Uses of Class
org.apache.nutch.indexer.NutchDocument
- 
Packages that use NutchDocument:
  org.apache.nutch.analysis.lang: Text document language identifier.
  org.apache.nutch.exchange: Control code for the exchange component, which acts in the indexing job and decides, based on plugin behavior, to which index writer a document should be routed.
  org.apache.nutch.exchange.jexl: Plugin of the Exchange component based on JEXL expressions.
  org.apache.nutch.indexer: Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
  org.apache.nutch.indexer.anchor: An indexing plugin for inbound anchor text.
  org.apache.nutch.indexer.arbitrary: Indexing filter to add arbitrary document data to the index from the output of a user-specified class.
  org.apache.nutch.indexer.basic: A basic indexing plugin; adds basic fields: url, host, title, content, etc.
  org.apache.nutch.indexer.feed: Indexing filter to index metadata from RSS feeds.
  org.apache.nutch.indexer.filter
  org.apache.nutch.indexer.geoip: This plugin implements an indexing filter which takes advantage of the GeoIP2-java API.
  org.apache.nutch.indexer.jexl: This plugin implements a dynamic indexing filter which uses JEXL expressions to allow filtering based on the page's metadata.
  org.apache.nutch.indexer.links
  org.apache.nutch.indexer.metadata: Indexing filter to add document metadata to the index.
  org.apache.nutch.indexer.more: A "more" indexing plugin; adds "more" index fields: last modified date, MIME type, content length.
  org.apache.nutch.indexer.replace: Indexing filter to allow pattern replacements on metadata.
  org.apache.nutch.indexer.staticfield: A simple plugin called at indexing that adds fields with static data.
  org.apache.nutch.indexer.subcollection: Indexing filter to assign documents to subcollections.
  org.apache.nutch.indexer.tld: Top Level Domain Indexing plugin.
  org.apache.nutch.indexer.urlmeta: URL Meta Tag Indexing Plugin.
  org.apache.nutch.indexwriter.cloudsearch
  org.apache.nutch.indexwriter.csv: Index writer plugin to write a plain CSV file.
  org.apache.nutch.indexwriter.dummy: Index writer plugin for debugging; writes pairs of <action, url> to a text file, where action is one of "add", "update", or "delete".
  org.apache.nutch.indexwriter.elastic: Index writer plugin for Elasticsearch.
  org.apache.nutch.indexwriter.kafka: Index writer plugin to produce JSON messages to Kafka.
  org.apache.nutch.indexwriter.opensearch1x: Index writer plugin for OpenSearch.
  org.apache.nutch.indexwriter.rabbit
  org.apache.nutch.indexwriter.solr: Index writer plugin for Apache Solr.
  org.apache.nutch.microformats.reltag: A microformats Rel-Tag Parser/Indexer/Querier plugin.
  org.apache.nutch.scoring: The ScoringFilter interface.
  org.apache.nutch.scoring.depth: Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
  org.apache.nutch.scoring.link: Scoring filter used in conjunction with WebGraph.
  org.apache.nutch.scoring.opic: Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
  org.apache.nutch.scoring.tld: Top Level Domain Scoring plugin.
  org.apache.nutch.tools: Miscellaneous tools.
  org.creativecommons.nutch: Sample plugins that parse and index Creative Commons metadata.
- 
- 
Uses of NutchDocument in org.apache.nutch.analysis.lang
Methods in org.apache.nutch.analysis.lang that return NutchDocument:
  NutchDocument LanguageIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Methods in org.apache.nutch.analysis.lang with parameters of type NutchDocument:
  NutchDocument LanguageIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
- 
Uses of NutchDocument in org.apache.nutch.exchange
Methods in org.apache.nutch.exchange with parameters of type NutchDocument:
  String[] Exchanges.indexWriters(NutchDocument nutchDocument)
    Returns all the indexers to which the document must be sent.
  boolean Exchange.match(NutchDocument doc)
    Determines if the document must go to the related index writers.
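The routing these two methods describe can be pictured as follows: each exchange tests a document, and the writers of every matching exchange are collected. This is a minimal sketch under stated assumptions. `SketchExchange` and `SketchExchanges` are hypothetical stand-ins, not the Nutch classes. A plain `Map` stands in for `NutchDocument`. Nutch's own handling of defaults, ordering, and duplicates may differ.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in: one exchange pairs a match rule with the index writers it routes to.
final class SketchExchange {
  final Predicate<Map<String, Object>> rule; // plays the role of Exchange.match(doc)
  final List<String> writerIds;

  SketchExchange(Predicate<Map<String, Object>> rule, List<String> writerIds) {
    this.rule = rule;
    this.writerIds = writerIds;
  }
}

final class SketchExchanges {
  // Collects the writer IDs of every exchange whose rule matches the document,
  // in the spirit of Exchanges.indexWriters(doc); duplicates are dropped, order kept.
  static String[] indexWriters(List<SketchExchange> exchanges, Map<String, Object> doc) {
    Set<String> ids = new LinkedHashSet<>();
    for (SketchExchange ex : exchanges) {
      if (ex.rule.test(doc)) {
        ids.addAll(ex.writerIds);
      }
    }
    return ids.toArray(new String[0]);
  }
}
```

With a language-specific exchange routing to a hypothetical `solr_en` writer and a catch-all exchange routing to `dummy`, an English document goes to both writers and any other document goes only to `dummy`.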
- 
Uses of NutchDocument in org.apache.nutch.exchange.jexl
Methods in org.apache.nutch.exchange.jexl with parameters of type NutchDocument:
  boolean JexlExchange.match(NutchDocument doc)
    Determines if the document must go to the related index writers.
- 
Uses of NutchDocument in org.apache.nutch.indexer
Fields in org.apache.nutch.indexer declared as NutchDocument:
  NutchDocument NutchIndexAction.doc
Methods in org.apache.nutch.indexer that return NutchDocument:
  NutchDocument NutchDocument.clone()
  NutchDocument IndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Adds fields or otherwise modifies the document that will be indexed for a parse.
  NutchDocument IndexingFilters.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Run all defined filters.
Methods in org.apache.nutch.indexer with parameters of type NutchDocument:
  NutchDocument IndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Adds fields or otherwise modifies the document that will be indexed for a parse.
  NutchDocument IndexingFilters.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Run all defined filters.
  void IndexWriter.update(NutchDocument doc)
  void IndexWriters.update(NutchDocument doc)
  void IndexWriter.write(NutchDocument doc)
  void IndexWriters.write(NutchDocument doc)
Constructors in org.apache.nutch.indexer with parameters of type NutchDocument:
  NutchIndexAction(NutchDocument doc, byte action)
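A filter chain of the kind `IndexingFilters.filter` describes ("Run all defined filters") might look like the sketch below. It assumes two things: filters run in order, each receiving the previous filter's output, and a `null` return means the document is dropped from indexing. `SketchIndexingFilter` is a hypothetical one-argument stand-in for `IndexingFilter`; the real method also takes `Parse`, `Text`, `CrawlDatum` and `Inlinks`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical single-method stand-in for IndexingFilter; a Map replaces NutchDocument.
interface SketchIndexingFilter {
  Map<String, Object> filter(Map<String, Object> doc);
}

final class SketchIndexingFilters {
  // Runs every configured filter in order, passing each one the previous result.
  // A null result is treated as "drop this document" and stops the chain.
  static Map<String, Object> filter(List<SketchIndexingFilter> filters, Map<String, Object> doc) {
    for (SketchIndexingFilter f : filters) {
      doc = f.filter(doc);
      if (doc == null) {
        return null;
      }
    }
    return doc;
  }
}
```

Returning the (possibly modified) document, rather than mutating it in place only, lets a filter replace or reject it.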
- 
Uses of NutchDocument in org.apache.nutch.indexer.anchor
Methods in org.apache.nutch.indexer.anchor that return NutchDocument:
  NutchDocument AnchorIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The AnchorIndexingFilter filter object, which supports boolean configuration settings for the deduplication of anchors.
Methods in org.apache.nutch.indexer.anchor with parameters of type NutchDocument:
  NutchDocument AnchorIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The AnchorIndexingFilter filter object, which supports boolean configuration settings for the deduplication of anchors.
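The effect of the boolean deduplication setting can be sketched as follows. Inbound anchor texts are collected, and repeats are optionally skipped. `SketchAnchorFilter` is hypothetical, and comparing lower-cased text is an assumption made for this sketch; the real filter's matching rules and field handling may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class SketchAnchorFilter {
  // Returns the anchor texts that would be added to the document's anchor field.
  // When deduplicate is true, an anchor whose lower-cased form was already seen is skipped.
  static List<String> anchorsToIndex(List<String> anchors, boolean deduplicate) {
    List<String> out = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    for (String anchor : anchors) {
      if (!deduplicate || seen.add(anchor.toLowerCase(Locale.ROOT))) {
        out.add(anchor);
      }
    }
    return out;
  }
}
```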
- 
Uses of NutchDocument in org.apache.nutch.indexer.arbitrary
Methods in org.apache.nutch.indexer.arbitrary that return NutchDocument:
  NutchDocument ArbitraryIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The ArbitraryIndexingFilter filter object uses reflection to instantiate the configured class and invoke the configured method.
Methods in org.apache.nutch.indexer.arbitrary with parameters of type NutchDocument:
  NutchDocument ArbitraryIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The ArbitraryIndexingFilter filter object uses reflection to instantiate the configured class and invoke the configured method.
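The reflection step described above (instantiate the configured class, invoke the configured method) can be sketched with the standard `java.lang.reflect` API. `SketchArbitraryFilter` is hypothetical. It simplifies the real plugin in two ways: it assumes a constructor with a single `String` argument and a method with no arguments. How the real plugin passes constructor and method arguments is not shown here.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

final class SketchArbitraryFilter {
  // Instantiates className through its single-String constructor, calls the
  // no-argument method methodName, and returns the result as a field value.
  static String fieldValue(String className, String ctorArg, String methodName) {
    try {
      Class<?> cls = Class.forName(className);
      Constructor<?> ctor = cls.getConstructor(String.class);
      Object instance = ctor.newInstance(ctorArg);
      Method method = cls.getMethod(methodName);
      return String.valueOf(method.invoke(instance));
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot call " + className + "." + methodName, e);
    }
  }
}
```

For example, configuring the JDK class `java.lang.StringBuilder` with constructor argument `"abc"` and method `reverse` yields the value `"cba"`.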
- 
Uses of NutchDocument in org.apache.nutch.indexer.basic
Methods in org.apache.nutch.indexer.basic that return NutchDocument:
  NutchDocument BasicIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The BasicIndexingFilter filter object, which supports a few configuration settings for adding basic searchable fields.
Methods in org.apache.nutch.indexer.basic with parameters of type NutchDocument:
  NutchDocument BasicIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The BasicIndexingFilter filter object, which supports a few configuration settings for adding basic searchable fields.
- 
Uses of NutchDocument in org.apache.nutch.indexer.feed
Methods in org.apache.nutch.indexer.feed that return NutchDocument:
  NutchDocument FeedIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
Methods in org.apache.nutch.indexer.feed with parameters of type NutchDocument:
  NutchDocument FeedIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
- 
Uses of NutchDocument in org.apache.nutch.indexer.filter
Methods in org.apache.nutch.indexer.filter that return NutchDocument:
  NutchDocument MimeTypeIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Methods in org.apache.nutch.indexer.filter with parameters of type NutchDocument:
  NutchDocument MimeTypeIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
- 
Uses of NutchDocument in org.apache.nutch.indexer.geoip
Methods in org.apache.nutch.indexer.geoip that return NutchDocument:
  static NutchDocument GeoIPDocumentCreator.createDocFromCityDb(String serverIp, NutchDocument doc, com.maxmind.geoip2.DatabaseReader reader)
  static NutchDocument GeoIPDocumentCreator.createDocFromCityService(String serverIp, NutchDocument doc, com.maxmind.geoip2.WebServiceClient client)
  static NutchDocument GeoIPDocumentCreator.createDocFromConnectionDb(String serverIp, NutchDocument doc, com.maxmind.geoip2.DatabaseReader reader)
  static NutchDocument GeoIPDocumentCreator.createDocFromCountryService(String serverIp, NutchDocument doc, com.maxmind.geoip2.WebServiceClient client)
  static NutchDocument GeoIPDocumentCreator.createDocFromDomainDb(String serverIp, NutchDocument doc, com.maxmind.geoip2.DatabaseReader reader)
  static NutchDocument GeoIPDocumentCreator.createDocFromInsightsService(String serverIp, NutchDocument doc, com.maxmind.geoip2.WebServiceClient client)
  static NutchDocument GeoIPDocumentCreator.createDocFromIspDb(String serverIp, NutchDocument doc, com.maxmind.geoip2.DatabaseReader reader)
  NutchDocument GeoIPIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Methods in org.apache.nutch.indexer.geoip with parameters of type NutchDocument:
  static void GeoIPDocumentCreator.addIfNotNull(NutchDocument doc, String name, Object value)
    Adds a field to the document, but only if the value isn't null.
  static NutchDocument GeoIPDocumentCreator.createDocFromCityDb(String serverIp, NutchDocument doc, com.maxmind.geoip2.DatabaseReader reader)
  static NutchDocument GeoIPDocumentCreator.createDocFromCityService(String serverIp, NutchDocument doc, com.maxmind.geoip2.WebServiceClient client)
  static NutchDocument GeoIPDocumentCreator.createDocFromConnectionDb(String serverIp, NutchDocument doc, com.maxmind.geoip2.DatabaseReader reader)
  static NutchDocument GeoIPDocumentCreator.createDocFromCountryService(String serverIp, NutchDocument doc, com.maxmind.geoip2.WebServiceClient client)
  static NutchDocument GeoIPDocumentCreator.createDocFromDomainDb(String serverIp, NutchDocument doc, com.maxmind.geoip2.DatabaseReader reader)
  static NutchDocument GeoIPDocumentCreator.createDocFromInsightsService(String serverIp, NutchDocument doc, com.maxmind.geoip2.WebServiceClient client)
  static NutchDocument GeoIPDocumentCreator.createDocFromIspDb(String serverIp, NutchDocument doc, com.maxmind.geoip2.DatabaseReader reader)
  NutchDocument GeoIPIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
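The `addIfNotNull` contract ("only if the value isn't null") keeps GeoIP lookups that return no value from producing empty fields. This is a minimal sketch of that contract. `SketchGeoDoc` is a hypothetical map-backed stand-in for `NutchDocument`, in which a field can hold several values. The field names used with it, such as `city`, are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SketchGeoDoc {
  // Map-backed stand-in for NutchDocument: each field name maps to a list of values.
  final Map<String, List<Object>> fields = new LinkedHashMap<>();

  void add(String name, Object value) {
    fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
  }

  // Mirrors the documented contract of GeoIPDocumentCreator.addIfNotNull:
  // a null lookup result is silently skipped instead of being added.
  static void addIfNotNull(SketchGeoDoc doc, String name, Object value) {
    if (value != null) {
      doc.add(name, value);
    }
  }
}
```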
- 
Uses of NutchDocument in org.apache.nutch.indexer.jexl
Methods in org.apache.nutch.indexer.jexl that return NutchDocument:
  NutchDocument JexlIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Methods in org.apache.nutch.indexer.jexl with parameters of type NutchDocument:
  NutchDocument JexlIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
- 
Uses of NutchDocument in org.apache.nutch.indexer.links
Methods in org.apache.nutch.indexer.links that return NutchDocument:
  NutchDocument LinksIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Methods in org.apache.nutch.indexer.links with parameters of type NutchDocument:
  NutchDocument LinksIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
- 
Uses of NutchDocument in org.apache.nutch.indexer.metadata
Methods in org.apache.nutch.indexer.metadata that return NutchDocument:
  NutchDocument MetadataIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Methods in org.apache.nutch.indexer.metadata with parameters of type NutchDocument:
  protected void MetadataIndexer.add(NutchDocument doc, String key, String value)
  NutchDocument MetadataIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
- 
Uses of NutchDocument in org.apache.nutch.indexer.more
Methods in org.apache.nutch.indexer.more that return NutchDocument:
  NutchDocument MoreIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Methods in org.apache.nutch.indexer.more with parameters of type NutchDocument:
  NutchDocument MoreIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
- 
Uses of NutchDocument in org.apache.nutch.indexer.replace
Methods in org.apache.nutch.indexer.replace that return NutchDocument:
  NutchDocument ReplaceIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Methods in org.apache.nutch.indexer.replace with parameters of type NutchDocument:
  NutchDocument ReplaceIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
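The package description says this filter allows "pattern replacements on metadata". A single such replacement can be sketched with `java.util.regex`. `SketchReplace` is hypothetical. It does not model how ReplaceIndexer reads its patterns from configuration or which fields they target. The title-suffix pattern in the example is illustrative only.

```java
import java.util.regex.Pattern;

final class SketchReplace {
  // Applies one regex/replacement pair to a metadata value, replacing every match.
  static String apply(String value, String regex, String replacement) {
    return Pattern.compile(regex).matcher(value).replaceAll(replacement);
  }
}
```

For instance, stripping a site-name suffix from a title with the pattern `\s*\|.*$` turns `Example Site | Home` into `Example Site`.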
- 
Uses of NutchDocument in org.apache.nutch.indexer.staticfield
Methods in org.apache.nutch.indexer.staticfield that return NutchDocument:
  NutchDocument StaticFieldIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The StaticFieldIndexer filter object, which adds fields according to the configuration setting.
Methods in org.apache.nutch.indexer.staticfield with parameters of type NutchDocument:
  NutchDocument StaticFieldIndexer.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The StaticFieldIndexer filter object, which adds fields according to the configuration setting.
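Adding static fields "as per configuration setting" comes down to parsing a configured list of name/value pairs once, then adding the same fields to every document. This sketch assumes a configuration string of the form `name:value,name2:value2`. That format, and the hypothetical `SketchStaticFields` class, are assumptions, not the plugin's documented syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SketchStaticFields {
  // Parses an assumed "name:value,name2:value2" string into the fields that
  // would be added unchanged to every indexed document.
  static Map<String, String> parse(String conf) {
    Map<String, String> fields = new LinkedHashMap<>();
    if (conf == null || conf.trim().isEmpty()) {
      return fields;
    }
    for (String pair : conf.split(",")) {
      int sep = pair.indexOf(':');
      if (sep < 0) {
        continue; // skip entries without a name/value separator
      }
      String name = pair.substring(0, sep).trim();
      if (!name.isEmpty()) {
        fields.put(name, pair.substring(sep + 1).trim());
      }
    }
    return fields;
  }
}
```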
- 
Uses of NutchDocument in org.apache.nutch.indexer.subcollection
Methods in org.apache.nutch.indexer.subcollection that return NutchDocument:
  NutchDocument SubcollectionIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Methods in org.apache.nutch.indexer.subcollection with parameters of type NutchDocument:
  NutchDocument SubcollectionIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
- 
Uses of NutchDocument in org.apache.nutch.indexer.tld
Methods in org.apache.nutch.indexer.tld that return NutchDocument:
  NutchDocument TLDIndexingFilter.filter(NutchDocument doc, Parse parse, Text urlText, CrawlDatum datum, Inlinks inlinks)
Methods in org.apache.nutch.indexer.tld with parameters of type NutchDocument:
  NutchDocument TLDIndexingFilter.filter(NutchDocument doc, Parse parse, Text urlText, CrawlDatum datum, Inlinks inlinks)
- 
Uses of NutchDocument in org.apache.nutch.indexer.urlmeta
Methods in org.apache.nutch.indexer.urlmeta that return NutchDocument:
  NutchDocument URLMetaIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Takes the meta tags listed in the "urlmeta.tags" property and looks for them inside the CrawlDatum object.
Methods in org.apache.nutch.indexer.urlmeta with parameters of type NutchDocument:
  NutchDocument URLMetaIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    Takes the meta tags listed in the "urlmeta.tags" property and looks for them inside the CrawlDatum object.
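The lookup described above can be sketched as follows: split the `urlmeta.tags` value into tag names, then copy each tag found in the datum's metadata into the document. This sketch assumes the property is comma-separated. It models the `CrawlDatum` metadata as a plain `Map<String, String>`. `SketchUrlMeta` is hypothetical, and the real plugin's field naming may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SketchUrlMeta {
  // For each tag named in a comma-separated "urlmeta.tags" value, copy the matching
  // entry from the datum metadata into the document fields; missing tags are skipped.
  static Map<String, String> collect(String urlmetaTags, Map<String, String> datumMetadata) {
    Map<String, String> fields = new LinkedHashMap<>();
    for (String tag : urlmetaTags.split(",")) {
      String name = tag.trim();
      String value = datumMetadata.get(name);
      if (!name.isEmpty() && value != null) {
        fields.put(name, value);
      }
    }
    return fields;
  }
}
```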
- 
Uses of NutchDocument in org.apache.nutch.indexwriter.cloudsearch
Methods in org.apache.nutch.indexwriter.cloudsearch with parameters of type NutchDocument:
  void CloudSearchIndexWriter.update(NutchDocument doc)
  void CloudSearchIndexWriter.write(NutchDocument doc)
- 
Uses of NutchDocument in org.apache.nutch.indexwriter.csv
Methods in org.apache.nutch.indexwriter.csv with parameters of type NutchDocument:
  void CSVIndexWriter.update(NutchDocument doc)
  void CSVIndexWriter.write(NutchDocument doc)
- 
Uses of NutchDocument in org.apache.nutch.indexwriter.dummy
Methods in org.apache.nutch.indexwriter.dummy with parameters of type NutchDocument:
  void DummyIndexWriter.update(NutchDocument doc)
  void DummyIndexWriter.write(NutchDocument doc)
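The dummy writer's package description says it "writes pairs of <action, url> to a text file". That behavior can be mimicked with any `Writer`. `SketchDummyWriter` is hypothetical, and the tab separator and one-line-per-call layout are assumptions. The real `write` and `update` take a `NutchDocument`; only its URL is needed here. The `delete` method models the "delete" action named in the description.

```java
import java.io.PrintWriter;

final class SketchDummyWriter {
  private final PrintWriter out;

  SketchDummyWriter(PrintWriter out) {
    this.out = out;
  }

  void write(String url) { record("add", url); }       // counterpart of write(NutchDocument)
  void update(String url) { record("update", url); }   // counterpart of update(NutchDocument)
  void delete(String url) { record("delete", url); }   // models the "delete" action

  // Emits one "<action>\t<url>" line per call and flushes so the output is visible at once.
  private void record(String action, String url) {
    out.print(action + "\t" + url + "\n");
    out.flush();
  }
}
```

`PrintWriter` is used because it does not throw checked `IOException`s, which keeps the sketch short; a file-backed version would instead check `PrintWriter.checkError()` or use a plain `Writer`.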
- 
Uses of NutchDocument in org.apache.nutch.indexwriter.elastic
Methods in org.apache.nutch.indexwriter.elastic with parameters of type NutchDocument:
  void ElasticIndexWriter.update(NutchDocument doc)
  void ElasticIndexWriter.write(NutchDocument doc)
- 
Uses of NutchDocument in org.apache.nutch.indexwriter.kafka
Methods in org.apache.nutch.indexwriter.kafka with parameters of type NutchDocument:
  void KafkaIndexWriter.update(NutchDocument doc)
  void KafkaIndexWriter.write(NutchDocument doc)
- 
Uses of NutchDocument in org.apache.nutch.indexwriter.opensearch1x
Methods in org.apache.nutch.indexwriter.opensearch1x with parameters of type NutchDocument:
  void OpenSearch1xIndexWriter.update(NutchDocument doc)
  void OpenSearch1xIndexWriter.write(NutchDocument doc)
- 
Uses of NutchDocument in org.apache.nutch.indexwriter.rabbit
Methods in org.apache.nutch.indexwriter.rabbit with parameters of type NutchDocument:
  void RabbitIndexWriter.update(NutchDocument doc)
  void RabbitIndexWriter.write(NutchDocument doc)
- 
Uses of NutchDocument in org.apache.nutch.indexwriter.solr
Methods in org.apache.nutch.indexwriter.solr with parameters of type NutchDocument:
  void SolrIndexWriter.update(NutchDocument doc)
  void SolrIndexWriter.write(NutchDocument doc)
- 
Uses of NutchDocument in org.apache.nutch.microformats.reltag
Methods in org.apache.nutch.microformats.reltag that return NutchDocument:
  NutchDocument RelTagIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Methods in org.apache.nutch.microformats.reltag with parameters of type NutchDocument:
  NutchDocument RelTagIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
- 
Uses of NutchDocument in org.apache.nutch.scoring
Methods in org.apache.nutch.scoring with parameters of type NutchDocument:
  float AbstractScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
  float ScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
    This method calculates an indexed document score/boost.
  float ScoringFilters.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
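The `initScore` parameter suggests that `ScoringFilters.indexerScore` threads a running score through each configured filter in turn. The sketch below shows that chaining under this assumption. `SketchScoringFilter` is a hypothetical reduced form of `ScoringFilter` that keeps only the score argument.

```java
import java.util.List;

// Hypothetical reduced ScoringFilter: url, doc, datums, parse and inlinks are omitted.
interface SketchScoringFilter {
  float indexerScore(float initScore);
}

final class SketchScoringFilters {
  // Assumed chaining: each filter receives the score produced by the previous one,
  // and the last result becomes the document's index-time score/boost.
  static float indexerScore(List<SketchScoringFilter> filters, float initScore) {
    float score = initScore;
    for (SketchScoringFilter f : filters) {
      score = f.indexerScore(score);
    }
    return score;
  }
}
```

With a filter that doubles the score followed by one that adds 1, a starting score of `1.0f` ends at `3.0f`. Because each step sees the previous result, filter order matters.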
- 
Uses of NutchDocument in org.apache.nutch.scoring.depth
Methods in org.apache.nutch.scoring.depth with parameters of type NutchDocument:
  float DepthScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
- 
Uses of NutchDocument in org.apache.nutch.scoring.link
Methods in org.apache.nutch.scoring.link with parameters of type NutchDocument:
  float LinkAnalysisScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
- 
Uses of NutchDocument in org.apache.nutch.scoring.opic
Methods in org.apache.nutch.scoring.opic with parameters of type NutchDocument:
  float OPICScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
    Dampen the boost value by scorePower.
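One way to read "dampen the boost value by scorePower" is to raise the page's CrawlDb score to the power `scorePower` and then scale the incoming `initScore` by the result. With `0 < scorePower < 1`, this compresses large differences between pages. That formula is an assumption for this sketch, as is the hypothetical `SketchOpicBoost` class; consult the OPICScoringFilter source for the exact expression.

```java
final class SketchOpicBoost {
  // Assumed damping: (dbScore ^ scorePower) * initScore.
  // A scorePower below 1 shrinks the gap between high- and low-scoring pages.
  static float indexerScore(float dbScore, float scorePower, float initScore) {
    return (float) Math.pow(dbScore, scorePower) * initScore;
  }
}
```

For example, with `scorePower = 0.5f`, a page with CrawlDb score 4 gets a boost of about 2 (for `initScore = 1f`), and a page with score 1 stays at 1.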
- 
Uses of NutchDocument in org.apache.nutch.scoring.tld
Methods in org.apache.nutch.scoring.tld with parameters of type NutchDocument:
  float TLDScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
- 
Uses of NutchDocument in org.apache.nutch.tools
Methods in org.apache.nutch.tools with parameters of type NutchDocument:
  static org.archive.io.warc.WARCRecordInfo WARCUtils.docToMetadata(NutchDocument doc)
- 
Uses of NutchDocument in org.creativecommons.nutch
Methods in org.creativecommons.nutch that return NutchDocument:
  NutchDocument CCIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Methods in org.creativecommons.nutch with parameters of type NutchDocument:
  void CCIndexingFilter.addUrlFeatures(NutchDocument doc, String urlString)
    Add the features represented by a license URL.
  NutchDocument CCIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
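A Creative Commons license URL encodes its features in its path, as in `.../licenses/by-nc-sa/2.0/`. Extracting "the features represented by a license URL" could therefore look like this sketch. It assumes the path has the form `/licenses/<features>/<version>/` and that each path token after `licenses` becomes one feature; the version number also becomes a token. `SketchCcFeatures` is hypothetical; the real method adds features to a `NutchDocument` rather than returning a list.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class SketchCcFeatures {
  // Splits the license URL path at "/" and "-", drops the leading "licenses"
  // segment, and returns each remaining token as one feature value.
  static List<String> urlFeatures(String licenseUrl) {
    List<String> features = new ArrayList<>();
    StringTokenizer tokens = new StringTokenizer(URI.create(licenseUrl).getPath(), "/-");
    if (tokens.hasMoreTokens()) {
      tokens.nextToken(); // skip the "licenses" segment
    }
    while (tokens.hasMoreTokens()) {
      features.add(tokens.nextToken());
    }
    return features;
  }
}
```

Under these assumptions, `http://creativecommons.org/licenses/by-nc-sa/2.0/` yields the features `by`, `nc`, `sa` and `2.0`.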
 
-