Package org.apache.nutch.parse
Interface HtmlParseFilter
- All Superinterfaces: Configurable, Pluggable
- All Known Implementing Classes: CCParseFilter, DebugParseFilter, HeadingsParseFilter, HTMLLanguageParser, JSParseFilter, MetaTagsParser, NaiveBayesParseFilter, RegexParseFilter, RelTagParser
 
public interface HtmlParseFilter extends Pluggable, Configurable

Extension point for DOM-based HTML parsers. It permits plugins to add metadata to HTML parses. All plugins found that implement this extension point are run sequentially on the parse.
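
The sequential behaviour described above can be modelled in a few lines of plain Java. This is a simplified sketch, not Nutch's own implementation: `SimpleParseResult` and `SimpleHtmlFilter` are hypothetical stand-ins for `ParseResult` and `HtmlParseFilter`, reduced so that the example runs without Nutch or Hadoop on the classpath. What it shows is that each filter receives the result returned by the previous one, so in this model a filter must return the (possibly modified) result for later filters to see it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilterChainSketch {

    /** Hypothetical stand-in for ParseResult: only a mutable metadata map. */
    static final class SimpleParseResult {
        final Map<String, String> metadata = new LinkedHashMap<>();
    }

    /** Hypothetical stand-in for HtmlParseFilter, with a reduced signature. */
    interface SimpleHtmlFilter {
        SimpleParseResult filter(String html, SimpleParseResult result);
    }

    /** Runs every filter in order; each one receives the result returned by the previous one. */
    static SimpleParseResult runAll(List<SimpleHtmlFilter> filters, String html, SimpleParseResult initial) {
        SimpleParseResult result = initial;
        for (SimpleHtmlFilter f : filters) {
            result = f.filter(html, result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<SimpleHtmlFilter> filters = new ArrayList<>();
        // First filter records the length of the raw HTML.
        filters.add((html, r) -> { r.metadata.put("length", String.valueOf(html.length())); return r; });
        // Second filter can observe what the first one added.
        filters.add((html, r) -> { r.metadata.put("seenLength", r.metadata.containsKey("length") ? "yes" : "no"); return r; });
        SimpleParseResult out = runAll(filters, "<html></html>", new SimpleParseResult());
        System.out.println(out.metadata); // prints {length=13, seenLength=yes}
    }
}
```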

Field Summary
- static String X_POINT_ID: The name of the extension point.

Method Summary
- ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc): Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.

Methods inherited from interface org.apache.hadoop.conf.Configurable
- getConf, setConf
 
Field Detail
- static final String X_POINT_ID
  The name of the extension point.
 
Method Detail
- ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
  Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
- Parameters:
- content - the Content for a given response
- parseResult - the result of running one or more Parsers on the content
- metaTags - a populated HTMLMetaTags object
- doc - a DocumentFragment (DOM) that can be processed during filtering
- Returns:
- a filtered ParseResult
- See Also:
- Parser.getParse(Content)
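
Implementations typically walk `doc` to pull values out of the page. The sketch below, which uses only the JDK's `org.w3c.dom` API, collects the text of every element with a given tag name from a `DocumentFragment`. The class and method names are hypothetical, and `fragmentOf` is a test helper that builds a fragment from well-formed XHTML; inside a real filter the fragment is supplied by the HTML parser, and storing the collected values in the parse metadata depends on the Nutch API, so that step is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;

public class HeadingExtractorSketch {

    /** Depth-first walk that collects the trimmed text of every element named tag (case-insensitive). */
    static void collect(Node node, String tag, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNodeName().equalsIgnoreCase(tag)) {
            out.add(node.getTextContent().trim());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, tag, out);
        }
    }

    /** Returns the text of all matching elements in document order. */
    static List<String> headings(DocumentFragment doc, String tag) {
        List<String> out = new ArrayList<>();
        collect(doc, tag, out);
        return out;
    }

    /** Hypothetical test helper: parses well-formed XHTML and wraps a copy of its root in a DocumentFragment. */
    static DocumentFragment fragmentOf(String xhtml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            DocumentFragment frag = d.createDocumentFragment();
            // Deep copy of the root element, owned by the same document, appended to the fragment.
            frag.appendChild(d.getDocumentElement().cloneNode(true));
            return frag;
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        DocumentFragment doc = fragmentOf("<html><body><h1>Title</h1><p>x</p><h1> Second </h1></body></html>");
        System.out.println(headings(doc, "h1")); // prints [Title, Second]
    }
}
```

Tag names are matched case-insensitively because, depending on the HTML parser in use, element names in the DOM may be upper- or lower-case.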
 
 