Class MetadataScoringFilter
- java.lang.Object
  - org.apache.nutch.scoring.AbstractScoringFilter
    - org.apache.nutch.scoring.metadata.MetadataScoringFilter

All Implemented Interfaces:
- Configurable, Pluggable, ScoringFilter
 
public class MetadataScoringFilter
extends AbstractScoringFilter

For documentation, see org.apache.nutch.scoring.metadata.

Field Summary

Fields
- static String METADATA_CONTENT
- static String METADATA_DATUM
- static String METADATA_PARSED

Fields inherited from interface org.apache.nutch.scoring.ScoringFilter:
- X_POINT_ID

Constructor Summary

Constructors
- MetadataScoringFilter()

Method Summary

- CrawlDatum distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
  Takes the metadata keys listed in the "scoring.parse.md" property and looks for them in the parseData object.
- void passScoreAfterParsing(Text url, Content content, Parse parse)
  Takes the metadata that was placed in the content and copies it into the parse data.
- void passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)
  Takes the metadata keys listed in the "scoring.db.md" property from the datum object and injects them into the content.
- void setConf(Configuration conf)
  Handles conf assignment and reads the values of the "scoring.db.md", "scoring.content.md" and "scoring.parse.md" properties.

Methods inherited from class org.apache.nutch.scoring.AbstractScoringFilter:
generatorSortValue, getConf, indexerScore, initialScore, injectedScore, updateDbScore

Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.nutch.scoring.ScoringFilter:
orphanedScore
 
Field Detail

METADATA_DATUM
public static final String METADATA_DATUM
See Also:
- Constant Field Values

METADATA_CONTENT
public static final String METADATA_CONTENT
See Also:
- Constant Field Values

METADATA_PARSED
public static final String METADATA_PARSED
See Also:
- Constant Field Values
 
 
Method Detail

distributeScoreToOutlinks
public CrawlDatum distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) throws ScoringFilterException

Takes the metadata keys listed in the "scoring.parse.md" property and looks for them in the parseData object. Any of these keys that exist are propagated to the outlinks in the targets collection.

Specified by:
- distributeScoreToOutlinks in interface ScoringFilter
Overrides:
- distributeScoreToOutlinks in class AbstractScoringFilter
Parameters:
- fromUrl - URL of the source page
- parseData - ParseData instance, which stores relevant score value(s) in its metadata. NOTE: filters may modify this in place; all changes will be persisted.
- targets - <url, CrawlDatum> pairs. NOTE: filters can modify this in place; all changes will be persisted.
- adjust - a CrawlDatum instance, initially null, which implementations may use to pass adjustment values to the original CrawlDatum. When creating this instance, set its status to CrawlDatum.STATUS_LINKED.
- allCount - number of all collected outlinks from the source page
Returns:
- if needed, implementations may return an instance of CrawlDatum, with status CrawlDatum.STATUS_LINKED, which contains adjustments to be applied to the original CrawlDatum score(s) and metadata. This can be null if not needed.
Throws:
- ScoringFilterException - if there is a fatal error distributing score data from the current page to all of its outlinks
See Also:
- ScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text, org.apache.nutch.parse.ParseData, java.util.Collection<java.util.Map.Entry<org.apache.hadoop.io.Text, org.apache.nutch.crawl.CrawlDatum>>, org.apache.nutch.crawl.CrawlDatum, int)
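The propagation step described above can be sketched as follows. This is a simplified model, not the plugin's source: plain java.util.Map objects stand in for the ParseData metadata and for each outlink CrawlDatum's metadata, and the class and method names (OutlinkMetadataSketch, distribute) are hypothetical. It does not model the adjust parameter, allCount, or the return value.

```java
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model: plain Maps stand in for ParseData metadata
// and for each outlink CrawlDatum's metadata. These are NOT Nutch types.
class OutlinkMetadataSketch {

  /**
   * Copies each key in parseMdKeys (standing in for the "scoring.parse.md"
   * list) that has a value in parseMeta into every outlink's metadata map.
   */
  static void distribute(
      String[] parseMdKeys,
      Map<String, String> parseMeta,
      List<Map.Entry<String, Map<String, String>>> targets) {
    if (parseMdKeys == null || parseMeta == null || targets == null) {
      return; // nothing configured or nothing to propagate
    }
    for (Map.Entry<String, Map<String, String>> target : targets) {
      for (String key : parseMdKeys) {
        String value = parseMeta.get(key);
        if (value != null) { // keys missing from the parse metadata are skipped
          // The outlink's metadata is modified in place.
          target.getValue().put(key, value);
        }
      }
    }
  }
}
```

In this model every outlink receives the same values, since the lookup depends only on the source page's parse metadata.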
 
passScoreBeforeParsing
public void passScoreBeforeParsing(Text url, CrawlDatum datum, Content content)

Takes the metadata keys listed in the "scoring.db.md" property from the datum object and injects them into the content. From there, the metadata is transferred to the parseData object.

Specified by:
- passScoreBeforeParsing in interface ScoringFilter
Overrides:
- passScoreBeforeParsing in class AbstractScoringFilter
Parameters:
- url - URL of the page
- datum - source datum. NOTE: modifications to this value are not persisted.
- content - instance of content. Implementations may modify this in place, primarily by setting some metadata properties.
See Also:
- ScoringFilter.passScoreBeforeParsing(org.apache.hadoop.io.Text, org.apache.nutch.crawl.CrawlDatum, org.apache.nutch.protocol.Content)
- passScoreAfterParsing(org.apache.hadoop.io.Text, org.apache.nutch.protocol.Content, org.apache.nutch.parse.Parse)
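A minimal sketch of the datum-to-content copy described above, under stated assumptions: java.util.Map objects stand in for the CrawlDatum metadata and the Content metadata, and BeforeParsingSketch and passBeforeParsing are hypothetical names. The datum's map is only read, which matches the note that modifications to the datum are not persisted.

```java
import java.util.Map;

// Simplified, hypothetical model: Maps stand in for CrawlDatum metadata and
// Content metadata. These are NOT Nutch types.
class BeforeParsingSketch {

  /**
   * Copies each key in datumMdKeys (standing in for the "scoring.db.md" list)
   * that has a value in datumMeta into contentMeta. datumMeta is only read.
   */
  static void passBeforeParsing(
      String[] datumMdKeys, Map<String, String> datumMeta, Map<String, String> contentMeta) {
    if (datumMdKeys == null || datumMeta == null || contentMeta == null) {
      return; // nothing configured or nothing to copy
    }
    for (String key : datumMdKeys) {
      String value = datumMeta.get(key);
      if (value != null) { // keys the datum does not carry are skipped
        contentMeta.put(key, value);
      }
    }
  }
}
```

Keys that are not listed in the property stay on the datum only; in this model they never reach the content.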
 
passScoreAfterParsing
public void passScoreAfterParsing(Text url, Content content, Parse parse)

Takes the metadata that was placed in the content and copies it into the parse data.

Specified by:
- passScoreAfterParsing in interface ScoringFilter
Overrides:
- passScoreAfterParsing in class AbstractScoringFilter
Parameters:
- url - page URL
- content - original content. NOTE: modifications to this value are not persisted.
- parse - target instance to copy the score information to. Implementations may modify this in place, primarily by setting some metadata properties.
See Also:
- passScoreBeforeParsing(org.apache.hadoop.io.Text, org.apache.nutch.crawl.CrawlDatum, org.apache.nutch.protocol.Content)
- ScoringFilter.passScoreAfterParsing(org.apache.hadoop.io.Text, org.apache.nutch.protocol.Content, org.apache.nutch.parse.Parse)
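The two copy steps around parsing can be chained in a small model to show how a key travels from the datum to the parse data. This is a sketch under assumptions: the page above does not say which property selects the keys copied from content to parse data, so this model assumes it is "scoring.content.md" (the remaining property read by setConf). Maps stand in for all three metadata containers, and ParsingChainSketch, copyListed and toParseMeta are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the two copy steps; plain Maps stand in
// for CrawlDatum, Content and ParseData metadata. These are NOT Nutch types.
class ParsingChainSketch {

  /** Copies every listed key that has a value in 'from' into 'to'. */
  static void copyListed(String[] keys, Map<String, String> from, Map<String, String> to) {
    if (keys == null) {
      return; // property not set: nothing is copied
    }
    for (String key : keys) {
      String value = from.get(key);
      if (value != null) {
        to.put(key, value);
      }
    }
  }

  /**
   * Models datum -> content using dbKeys, then content -> parse using
   * contentKeys (assumed to be the "scoring.content.md" list).
   */
  static Map<String, String> toParseMeta(
      String[] dbKeys, String[] contentKeys,
      Map<String, String> datumMeta, Map<String, String> contentMeta) {
    copyListed(dbKeys, datumMeta, contentMeta); // like passScoreBeforeParsing
    Map<String, String> parseMeta = new HashMap<>();
    copyListed(contentKeys, contentMeta, parseMeta); // like passScoreAfterParsing
    return parseMeta;
  }
}
```

In this model, a key listed only for the first step reaches the content but stops there. It reaches the parse data only if the second list also names it.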
 
setConf
public void setConf(Configuration conf)

Handles conf assignment and reads the values of the "scoring.db.md", "scoring.content.md" and "scoring.parse.md" properties.

Specified by:
- setConf in interface Configurable
Overrides:
- setConf in class AbstractScoringFilter
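A minimal sketch of turning the three properties into lists of metadata keys, assuming each property holds a comma-separated list of key names. A plain Map stands in for Hadoop's Configuration. MetadataConfSketch, keysFor, and the constant names DB_MD, CONTENT_MD and PARSE_MD are hypothetical. This sketch trims whitespace and drops empty entries. The page does not say whether the real plugin tolerates spaces, so listing keys without spaces is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model: a Map stands in for Hadoop's Configuration.
class MetadataConfSketch {
  // Property names from the setConf description; the constant names are illustrative.
  static final String DB_MD = "scoring.db.md";
  static final String CONTENT_MD = "scoring.content.md";
  static final String PARSE_MD = "scoring.parse.md";

  /**
   * Splits a comma-separated property into metadata keys, trimming whitespace
   * and dropping empty entries. Returns null when the property is unset or empty.
   */
  static String[] keysFor(Map<String, String> conf, String property) {
    String raw = conf.get(property);
    if (raw == null) {
      return null;
    }
    List<String> keys = new ArrayList<>();
    for (String part : raw.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        keys.add(trimmed);
      }
    }
    return keys.isEmpty() ? null : keys.toArray(new String[0]);
  }
}
```

Each method's description names its own property. On that reading, a key that should travel from a CrawlDatum all the way to its outlinks would need to appear in every property along the way.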