Package org.apache.nutch.segment
Class ContentAsTextInputFormat
- java.lang.Object
  - org.apache.hadoop.mapreduce.InputFormat<K,V>
    - org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
      - org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<Text,Text>
        - org.apache.nutch.segment.ContentAsTextInputFormat
 
 
 
 
public class ContentAsTextInputFormat extends SequenceFileInputFormat<Text,Text>

An input format that reads Nutch Content objects and converts them to text, replacing newline characters with spaces. This format is useful for processing Nutch content in Hadoop Streaming jobs written in other languages.
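
The conversion described above can be illustrated with a minimal, standard-library-only sketch. It is not the Nutch implementation: the class `ContentFlattenSketch` and its methods `flatten` and `toStreamingLine` are hypothetical names, UTF-8 decoding is an assumption, and treating `\r\n` and `\r` as line breaks is a choice made for this example. The tab between key and value reflects the default way Hadoop Streaming passes a record to a script, which is why each record has to fit on a single line.

```java
import java.nio.charset.StandardCharsets;

public class ContentFlattenSketch {

    // Hypothetical helper: decode raw content bytes as UTF-8 (an assumption;
    // real content may use another charset) and replace each line break
    // with a single space so the record fits on one line.
    static String flatten(byte[] rawContent) {
        String text = new String(rawContent, StandardCharsets.UTF_8);
        return text.replaceAll("\\r\\n|\\r|\\n", " ");
    }

    // Hypothetical helper: format one record as a line-oriented consumer
    // would read it, that is key, tab, flattened value.
    static String toStreamingLine(String url, byte[] rawContent) {
        return url + "\t" + flatten(rawContent);
    }

    public static void main(String[] args) {
        byte[] page = "<html>\n<body>Hello\r\nNutch</body>\n</html>"
                .getBytes(StandardCharsets.UTF_8);
        // The multi-line page is emitted as a single tab-separated line.
        System.out.println(toStreamingLine("http://example.com/", page));
    }
}
```

A streaming script in any language can then split each input line on the first tab to recover the key and the flattened content.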

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
- FileInputFormat.Counter
 
Field Summary

Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
 
Constructor Summary

- ContentAsTextInputFormat()

Method Summary

- RecordReader<Text,Text> getRecordReader(InputSplit split, Job job, Mapper.Context context)

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat:
createRecordReader, getFormatMinSplitSize, listStatus

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
 
Method Detail

getRecordReader

public RecordReader<Text,Text> getRecordReader(InputSplit split, Job job, Mapper.Context context) throws IOException

Throws:
- IOException