Class NaiveBayesParseFilter
- java.lang.Object
  - org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter
 
- All Implemented Interfaces:
- Configurable, HtmlParseFilter, Pluggable
 
public class NaiveBayesParseFilter extends Object implements HtmlParseFilter

HTML parse filter that classifies the outlinks of a ParseResult as relevant or irrelevant, based on the relevancy of the page's parse text. Relevancy is learned from a training file in which you can give positive and negative example texts (see the description of parsefilter.naivebayes.trainfile). If the text is found irrelevant, each outlink gets a second chance: it is kept if it contains any of the words from the list given in parsefilter.naivebayes.wordlist.

CAUTION: When using this classifier, set parser.timeout to -1 or to a value greater than 30.
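
The decision flow described above can be sketched in plain Java. This is a minimal illustration, not the plugin's source: the class `SecondChanceSketch`, its methods, the `isRelevant` predicate standing in for the trained classifier, and the case-insensitive substring match are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical sketch of the "second chance" logic; not the Nutch implementation.
class SecondChanceSketch {

    // Returns true if the URL contains any word from the list.
    // Case-insensitive substring matching is an assumption of this sketch.
    static boolean containsAnyWord(String url, List<String> wordlist) {
        String lower = url.toLowerCase(Locale.ROOT);
        for (String word : wordlist) {
            if (!word.isEmpty() && lower.contains(word.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    // If the page text is relevant, every outlink is kept.
    // Otherwise only outlinks that pass the word-list check survive.
    static List<String> selectOutlinks(String parseText, List<String> outlinks,
            Predicate<String> isRelevant, List<String> wordlist) {
        if (isRelevant.test(parseText)) {
            return new ArrayList<>(outlinks);
        }
        List<String> kept = new ArrayList<>();
        for (String url : outlinks) {
            if (containsAnyWord(url, wordlist)) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = List.of("nutch", "crawl");
        List<String> links = List.of(
                "http://example.com/nutch-guide",
                "http://example.com/recipes");
        // The predicate always answers "irrelevant" here to exercise the second chance.
        System.out.println(selectOutlinks("some page text", links, text -> false, words));
    }
}
```

Passing the classifier in as a `Predicate<String>` keeps the sketch independent of how relevancy is actually computed.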

Field Summary
- static String DICTFILE_MODELFILTER
- static String TRAINFILE_MODELFILTER

Fields inherited from interface org.apache.nutch.parse.HtmlParseFilter
- X_POINT_ID
 
Constructor Summary
- NaiveBayesParseFilter()

Method Summary
- boolean classify(String text)
- boolean containsWord(String url, ArrayList<String> wordlist)
- ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
  Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
- boolean filterParse(String text)
- boolean filterUrl(String url)
- Configuration getConf()
- void setConf(Configuration conf)
- void train()
 
Field Detail

TRAINFILE_MODELFILTER
public static final String TRAINFILE_MODELFILTER
- See Also:
- Constant Field Values

DICTFILE_MODELFILTER
public static final String DICTFILE_MODELFILTER
- See Also:
- Constant Field Values

Method Detail

filterParse
public boolean filterParse(String text)

filterUrl
public boolean filterUrl(String url)

classify
public boolean classify(String text) throws IOException
- Throws:
- IOException
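
For readers unfamiliar with the technique the class is named after, the following is a minimal sketch of a two-class naive Bayes text classifier with add-one smoothing. It is not the plugin's implementation and does not read the plugin's training file. The class `TinyNaiveBayes`, its tokenizer, and its method signatures are hypothetical and exist only for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical two-class naive Bayes sketch; not the Nutch implementation.
class TinyNaiveBayes {
    private final Map<String, Integer> relevantCounts = new HashMap<>();
    private final Map<String, Integer> irrelevantCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int relevantTokens;
    private int irrelevantTokens;
    private int relevantDocs;
    private int irrelevantDocs;

    // Lowercases the text and splits on anything that is not a-z or 0-9.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    // Adds one labeled example text to the model.
    void train(boolean relevant, String text) {
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            vocabulary.add(token);
            if (relevant) {
                relevantCounts.merge(token, 1, Integer::sum);
                relevantTokens++;
            } else {
                irrelevantCounts.merge(token, 1, Integer::sum);
                irrelevantTokens++;
            }
        }
        if (relevant) {
            relevantDocs++;
        } else {
            irrelevantDocs++;
        }
    }

    // Log prior plus the sum of smoothed log likelihoods for one class.
    private double logScore(String text, Map<String, Integer> counts, int totalTokens, int docs) {
        int totalDocs = relevantDocs + irrelevantDocs;
        double score = Math.log((docs + 1.0) / (totalDocs + 2.0));
        // The extra +1 in the denominator reserves one slot for unseen words.
        double denominator = totalTokens + vocabulary.size() + 1.0;
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            score += Math.log((counts.getOrDefault(token, 0) + 1.0) / denominator);
        }
        return score;
    }

    // Returns true when the "relevant" class scores strictly higher.
    boolean classify(String text) {
        return logScore(text, relevantCounts, relevantTokens, relevantDocs)
                > logScore(text, irrelevantCounts, irrelevantTokens, irrelevantDocs);
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        nb.train(true, "apache nutch web crawler");
        nb.train(false, "chocolate cake recipe");
        System.out.println(nb.classify("web crawler"));
    }
}
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.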
 
setConf
public void setConf(Configuration conf)
- Specified by:
- setConf in interface Configurable
 
getConf
public Configuration getConf()
- Specified by:
- getConf in interface Configurable
 
filter
public ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)

Description copied from interface: HtmlParseFilter
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
- Specified by:
- filter in interface HtmlParseFilter
- Parameters:
- content - the Content for a given response
- parseResult - the result of running one or more Parsers on the content
- metaTags - a populated HTMLMetaTags object
- doc - a DocumentFragment (DOM) which can be processed in the filtering process
- Returns:
- a filtered ParseResult
- See Also:
- Parser.getParse(Content)
 
 