Package org.apache.nutch.parse
Interface Parser
All Superinterfaces:
Configurable, Pluggable

All Known Implementing Classes:
ExtParser, FeedParser, HtmlParser, JSParseFilter, TikaParser, ZipParser
 
public interface Parser extends Pluggable, Configurable

A parser for content generated by a Protocol implementation. This interface is implemented by extensions. Nutch's core contains no page parsing code.
Field Summary

Modifier and Type    Field         Description
static String        X_POINT_ID    The name of the extension point.

Method Summary

Modifier and Type    Method                 Description
ParseResult          getParse(Content c)    This method parses the given content and returns a map of <key, parse> pairs.

Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 
Field Detail

X_POINT_ID
static final String X_POINT_ID
The name of the extension point.
 
Method Detail

getParse
ParseResult getParse(Content c)
This method parses the given content and returns a map of <key, parse> pairs. Parse instances will be persisted under the given key.

Note: Meta-redirects should be followed only when they come from the original URL. For example, assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If this URL contains a meta redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.

Parameters:
c - Content to be parsed
Returns:
a map containing <key, parse> pairs
Since:
NUTCH-443
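
The meta-redirect rule in the note above can be illustrated with a minimal sketch. It does not use the real Nutch classes: RedirectSketch, StubParse, and shouldFollowRedirect are hypothetical stand-ins invented for this example, and a plain java.util.Map stands in for the <key, parse> map. The only point shown is that a redirect is followed when the entry keyed by the original URL itself indicates the redirect, and not when some other entry does.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-ins only: NOT the real Nutch Parse, ParseStatus, or ParseResult types.
class RedirectSketch {

    /** Hypothetical, simplified stand-in for a Parse together with its status. */
    static final class StubParse {
        final boolean redirect;       // true if the status indicates a meta-redirect
        final String redirectTarget;  // redirect target, or null when there is none

        StubParse(boolean redirect, String redirectTarget) {
            this.redirect = redirect;
            this.redirectTarget = redirectTarget;
        }
    }

    /**
     * Returns true only when the map holds an entry keyed by the original URL
     * and that entry's status indicates a redirect. Redirects reported under
     * any other key are ignored.
     */
    static boolean shouldFollowRedirect(Map<String, StubParse> parses, String originalUrl) {
        StubParse parse = parses.get(originalUrl);
        return parse != null && parse.redirect;
    }

    public static void main(String[] args) {
        String original = "foo.bar.com/redirect.html";
        Map<String, StubParse> parses = new HashMap<>();

        // A redirect reported under a different key (for example a sub-document).
        parses.put("foo.bar.com/embedded.html",
                new StubParse(true, "foo.bar.com/elsewhere.html"));
        System.out.println(shouldFollowRedirect(parses, original));

        // A redirect reported under the original URL itself.
        parses.put(original, new StubParse(true, "foo.bar.com/target.html"));
        System.out.println(shouldFollowRedirect(parses, original));
    }
}
```

In this sketch the first check does not follow the redirect, because only a different key carries one, while the second does. A real fetcher would perform the equivalent lookup on the ParseResult returned by getParse and inspect the ParseStatus of the matching Parse.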
 
 