Package org.apache.nutch.tools
Interface CommonCrawlFormat
- All Superinterfaces: AutoCloseable, Closeable
- All Known Implementing Classes: AbstractCommonCrawlFormat, CommonCrawlFormatJackson, CommonCrawlFormatJettinson, CommonCrawlFormatSimple, CommonCrawlFormatWARC
 
public interface CommonCrawlFormat extends Closeable

Interface for all CommonCrawl formatters. It declares the methods used to obtain JSON data.

- Author: gtotaro
 
Method Summary

- void close(): Optional method that can be implemented if the actual format needs a close procedure.
- List<String> getInLinks(): Gets the inlinks of this document.
- String getJsonData(): Gets a string representation of the JSON structure of the URL content.
- String getJsonData(String url, Content content, Metadata metadata): Returns a string representation of the JSON structure of the URL content.
- String getJsonData(String url, Content content, Metadata metadata, ParseData parseData): Returns a string representation of the JSON structure of the URL content.
- void setInLinks(List<String> inLinks): Sets the inlinks of this document.
 
Method Detail

getJsonData

String getJsonData() throws IOException

Gets a string representation of the JSON structure of the URL content.

- Returns: the JSON URL content string
- Throws: IOException - if there is a fatal I/O error obtaining JSON data
 
getJsonData

String getJsonData(String url, Content content, Metadata metadata) throws IOException

Returns a string representation of the JSON structure of the URL content. Takes into consideration both the Content and the Metadata.

- Parameters:
  - url - the canonical url
  - content - the url's Content
  - metadata - the url's Metadata
- Returns: the JSON URL content string
- Throws: IOException - if there is a fatal I/O error obtaining JSON data
 
getJsonData

String getJsonData(String url, Content content, Metadata metadata, ParseData parseData) throws IOException

Returns a string representation of the JSON structure of the URL content. Takes into consideration the Content, the Metadata and the ParseData.

- Parameters:
  - url - the canonical url
  - content - the url's Content
  - metadata - the url's Metadata
  - parseData - the url's ParseData
- Returns: the JSON URL content string
- Throws: IOException - if there is a fatal I/O error obtaining JSON data
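The two parameterised getJsonData overloads differ only in whether ParseData is supplied. One way an implementation might relate them is to have the shorter overload delegate to the longer one. The sketch below is an illustration only, not the Nutch implementation: the class OverloadSketch is hypothetical, Nutch's Content, Metadata and ParseData types are replaced by String and Map stand-ins so the block compiles on its own, and the JSON field names are assumptions.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in: the real interface takes Nutch's
// Content, Metadata and ParseData types, replaced here by String and Map.
final class OverloadSketch {

  /** Three-argument form: delegates with no parse data. */
  static String getJsonData(String url, String content,
      Map<String, String> metadata) throws IOException {
    return getJsonData(url, content, metadata, null);
  }

  /** Four-argument form: parse data is emitted only when present. */
  static String getJsonData(String url, String content,
      Map<String, String> metadata, Map<String, String> parseData)
      throws IOException {
    if (url == null) {
      throw new IOException("url must not be null");
    }
    // Field names below are assumptions made for this sketch.
    StringBuilder json = new StringBuilder("{");
    json.append("\"url\":").append(quote(url));
    json.append(",\"content\":").append(quote(content));
    json.append(",\"metadata\":").append(toJsonObject(metadata));
    if (parseData != null) {
      json.append(",\"parseData\":").append(toJsonObject(parseData));
    }
    return json.append("}").toString();
  }

  // Serialises a flat string map with keys in sorted order.
  private static String toJsonObject(Map<String, String> map) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Map.Entry<String, String> e : new TreeMap<>(map).entrySet()) {
      if (!first) {
        sb.append(",");
      }
      sb.append(quote(e.getKey())).append(":").append(quote(e.getValue()));
      first = false;
    }
    return sb.append("}").toString();
  }

  // Minimal escaping for the sketch: backslash and double quote only.
  private static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }
}
```

Delegation keeps the serialisation logic in one place, so the two overloads cannot drift apart. A real formatter would use a JSON library rather than hand-built strings.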
 
setInLinks

void setInLinks(List<String> inLinks)

Sets the inlinks of this document.

- Parameters:
  - inLinks - list of inlinks
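The interface does not say whether an implementation keeps the caller's list or copies it. As a minimal sketch of one reasonable choice, the hypothetical InLinksHolder class below copies the list in setInLinks and returns a read-only view from getInLinks. The class name and the copy-on-set behaviour are assumptions for illustration, not documented Nutch behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder showing one way to back setInLinks/getInLinks.
final class InLinksHolder {
  private List<String> inLinks = new ArrayList<>();

  void setInLinks(List<String> inLinks) {
    if (inLinks == null) {
      // Treat null as "no inlinks" (an assumption of this sketch).
      this.inLinks = new ArrayList<>();
    } else {
      // Copy so later changes to the caller's list do not leak in.
      this.inLinks = new ArrayList<>(inLinks);
    }
  }

  List<String> getInLinks() {
    // Read-only view: callers cannot modify the stored list.
    return Collections.unmodifiableList(inLinks);
  }
}
```

With this choice, a list passed to setInLinks can be reused or cleared by the caller afterwards without affecting the document's inlinks.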
 
close

void close()

Optional method that can be implemented if the actual format needs a close procedure.

- Specified by: close in interface AutoCloseable
- Specified by: close in interface Closeable
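Because the interface extends Closeable, a formatter can be managed with try-with-resources, which calls close() automatically when the block exits. The sketch below illustrates this with a hypothetical ClosingFormatSketch class that models only the Closeable contract. It is a stand-in for illustration, not one of the Nutch implementing classes, and the "{}" payload is a placeholder.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a CommonCrawlFormat implementation; only the
// Closeable contract is modelled here.
final class ClosingFormatSketch implements Closeable {
  private boolean closed = false;

  String getJsonData() throws IOException {
    if (closed) {
      throw new IOException("format already closed");
    }
    return "{}"; // placeholder JSON for the sketch
  }

  @Override
  public void close() {
    // A real format might flush or finish an output stream here.
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }

  /** Uses the format in try-with-resources and reports whether it was closed. */
  static boolean demo() throws IOException {
    ClosingFormatSketch format = new ClosingFormatSketch();
    try (ClosingFormatSketch f = format) {
      f.getJsonData();
    }
    return format.isClosed();
  }
}
```

A format with nothing to release can leave close() empty; callers using try-with-resources do not need to know the difference.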
 
 