Package org.apache.nutch.tools.warc
Class WARCExporter
- java.lang.Object
  - org.apache.hadoop.conf.Configured
    - org.apache.nutch.tools.warc.WARCExporter
 
 
All Implemented Interfaces:
Configurable, Tool
 
public class WARCExporter extends Configured implements Tool

MapReduce job to export Nutch segments as WARC files. The file format is documented in the [ISO Standard](http://bibnum.bnf.fr/warc/WARC_ISO_28500_version1_latestdraft.pdf). Generates WARC records of type 'response' if the configuration property 'store.http.headers' was set to true during fetching and the HTTP headers were stored verbatim; otherwise generates records of type 'resource'.
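The record layout the description above refers to can be sketched with the standard library alone. The class `WarcRecordSketch` and its methods are hypothetical and not part of Nutch; the sketch assumes WARC version 1.0 and writes only the mandatory header fields (WARC-Type, WARC-Record-ID, WARC-Date, Content-Length) plus WARC-Target-URI and Content-Type. It shows how the 'response'/'resource' choice changes the record: a 'response' block holds the verbatim HTTP headers followed by the body and is labelled `application/http; msgtype=response`, while a 'resource' block holds only the body under its own MIME type.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.UUID;

/** Hypothetical sketch of one WARC/1.0 record; not Nutch's implementation. */
final class WarcRecordSketch {

    private static final String CRLF = "\r\n";

    /**
     * 'response' when verbatim HTTP headers are available (that is,
     * 'store.http.headers' was true at fetch time), 'resource' otherwise.
     */
    static String recordType(boolean verbatimHeaders) {
        return verbatimHeaders ? "response" : "resource";
    }

    /**
     * Serialises header, block and the trailing CRLF CRLF.
     * block: raw HTTP headers plus body for 'response', body only for 'resource'.
     */
    static byte[] buildRecord(String targetUri, byte[] block, boolean verbatimHeaders,
                              String bodyMimeType, Instant fetchTime) {
        // A 'response' block is an HTTP message, so Content-Type describes the
        // message; a 'resource' block is the document itself.
        String contentType = verbatimHeaders
                ? "application/http; msgtype=response"
                : bodyMimeType;
        String header = "WARC/1.0" + CRLF
                + "WARC-Type: " + recordType(verbatimHeaders) + CRLF
                + "WARC-Record-ID: <urn:uuid:" + UUID.randomUUID() + ">" + CRLF
                // Second precision, e.g. 2024-01-01T00:00:00Z
                + "WARC-Date: " + fetchTime.truncatedTo(ChronoUnit.SECONDS) + CRLF
                + "WARC-Target-URI: " + targetUri + CRLF
                + "Content-Type: " + contentType + CRLF
                // Content-Length counts the bytes of the block only
                + "Content-Length: " + block.length + CRLF
                + CRLF;
        byte[] head = header.getBytes(StandardCharsets.UTF_8);
        byte[] tail = (CRLF + CRLF).getBytes(StandardCharsets.US_ASCII);
        byte[] record = new byte[head.length + block.length + tail.length];
        System.arraycopy(head, 0, record, 0, head.length);
        System.arraycopy(block, 0, record, head.length, block.length);
        System.arraycopy(tail, 0, record, head.length + block.length, tail.length);
        return record;
    }
}
```

Each record is self-delimiting through Content-Length, so a reader can skip from one record to the next without parsing the block.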
Nested Class Summary

| Modifier and Type | Class | Description |
|---|---|---|
| static class | WARCExporter.WARCMapReduce | |

Constructor Summary

| Constructor | Description |
|---|---|
| WARCExporter() | |
| WARCExporter(Configuration conf) | |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| int | generateWARC(String output, List<Path> segments, boolean onlySuccessfulResponses, boolean includeParseData, boolean includeParseText) | |
| static void | main(String[] args) | |
| int | run(String[] args) | |

Methods inherited from class org.apache.hadoop.conf.Configured: getConf, setConf

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable: getConf, setConf
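`run(String[] args)` is the `Tool` entry point, so it is where command-line arguments become the parameters of `generateWARC`. The helper below is a hypothetical sketch of that mapping, not the exporter's actual parser: the flag names `-onlySuccessfulResponses`, `-includeParseData` and `-includeParseText` are assumptions chosen to mirror the method's boolean parameters, and segment paths are kept as plain strings instead of Hadoop `Path` objects so the example compiles without Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: maps command-line arguments onto the parameters of
 * generateWARC. Flag names are assumed, not taken from Nutch.
 */
final class WarcExportArgs {
    final String output;
    final List<String> segments = new ArrayList<>();
    boolean onlySuccessfulResponses;
    boolean includeParseData;
    boolean includeParseText;

    private WarcExportArgs(String output) {
        this.output = output;
    }

    /** Expects: <output> <segment> ... [-onlySuccessfulResponses] [-includeParseData] [-includeParseText] */
    static WarcExportArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: WARCExporter <output> <segment> ..."
                    + " [-onlySuccessfulResponses] [-includeParseData] [-includeParseText]");
        }
        WarcExportArgs parsed = new WarcExportArgs(args[0]);
        for (int i = 1; i < args.length; i++) {
            switch (args[i]) {
                case "-onlySuccessfulResponses":
                    parsed.onlySuccessfulResponses = true;
                    break;
                case "-includeParseData":
                    parsed.includeParseData = true;
                    break;
                case "-includeParseText":
                    parsed.includeParseText = true;
                    break;
                default:
                    // Any other argument is treated as a segment path.
                    parsed.segments.add(args[i]);
            }
        }
        if (parsed.segments.isEmpty()) {
            throw new IllegalArgumentException("at least one segment is required");
        }
        return parsed;
    }
}
```

A `run` implementation built this way would finish by passing `output`, the segment list and the three flags to `generateWARC` and returning its result as the exit code. Check the usage message of the real tool before relying on any flag name.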
 
Constructor Detail

WARCExporter
public WARCExporter()

WARCExporter
public WARCExporter(Configuration conf)
 
Method Detail

generateWARC
public int generateWARC(String output, List<Path> segments, boolean onlySuccessfulResponses, boolean includeParseData, boolean includeParseText) throws IOException

Throws:
- IOException
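The three boolean parameters of `generateWARC` control which documents are exported and what accompanies them. The sketch below models that decision for a single fetched URL and returns the WARC-Type values that would be written. Everything in it is hypothetical: how a fetch counts as successful is left abstract (`fetchSucceeded`), and mapping parse data to a 'metadata' record and parse text to a 'conversion' record is an assumption based on the record types the ISO standard defines, not a description of Nutch's actual output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-URL record selection; not Nutch's implementation. */
final class WarcRecordPlan {

    /** What is known about one URL after fetching and (optionally) parsing. */
    static final class FetchedDoc {
        final boolean fetchSucceeded;   // how success is judged is not modelled
        final boolean verbatimHeaders;  // HTTP headers stored verbatim at fetch time
        final boolean hasParseData;
        final boolean hasParseText;

        FetchedDoc(boolean fetchSucceeded, boolean verbatimHeaders,
                   boolean hasParseData, boolean hasParseText) {
            this.fetchSucceeded = fetchSucceeded;
            this.verbatimHeaders = verbatimHeaders;
            this.hasParseData = hasParseData;
            this.hasParseText = hasParseText;
        }
    }

    /** Returns the WARC-Type of each record to write for this URL, in order. */
    static List<String> plan(FetchedDoc doc, boolean onlySuccessfulResponses,
                             boolean includeParseData, boolean includeParseText) {
        List<String> types = new ArrayList<>();
        if (onlySuccessfulResponses && !doc.fetchSucceeded) {
            return types; // the URL is skipped entirely
        }
        // Main record: the 'response'/'resource' rule from the class description.
        types.add(doc.verbatimHeaders ? "response" : "resource");
        if (includeParseData && doc.hasParseData) {
            types.add("metadata");   // assumed record type for parse data
        }
        if (includeParseText && doc.hasParseText) {
            types.add("conversion"); // assumed record type for parse text
        }
        return types;
    }
}
```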
 
 