Package org.apache.nutch.crawl
Class LinkDbMerger
- java.lang.Object
  - org.apache.hadoop.conf.Configured
    - org.apache.nutch.crawl.LinkDbMerger

- All Implemented Interfaces:
- Configurable, Tool
 
public class LinkDbMerger extends Configured implements Tool

This tool merges several LinkDbs into one, optionally filtering URLs through the current URLFilters to skip prohibited URLs and links.

The tool can also be used for filtering alone; in that case, specify only one LinkDb in the arguments.

If more than one LinkDb contains information about the same URL, all inlinks are accumulated, but at most linkdb.max.inlinks inlinks will be added.

If activated, URLFilters are applied to both the target URLs and every incoming link URL. If a target URL is prohibited, the target URL and all inlinks to it are removed. If only some of the incoming links are prohibited, only those links are removed, and they do not count toward the maximum limit mentioned above.

- Author:
- Andrzej Bialecki
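
The merge and filter rules in the class description can be illustrated with a small, dependency-free sketch. This is not the Nutch implementation: the class name `LinkDbMergeSketch`, the representation of a LinkDb as a map from target URL to a list of inlink URLs, the `allowed` predicate standing in for URLFilters, and the `maxInlinks` parameter standing in for linkdb.max.inlinks are all assumptions made for illustration. Deduplicating inlinks and dropping targets that end up with no inlinks are likewise assumptions, not behavior stated on this page.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative sketch only; hypothetical names, not part of Nutch. */
final class LinkDbMergeSketch {

    /**
     * Merges simplified link databases (target URL -> inlink URLs).
     *
     * @param dbs        the databases to merge, in order
     * @param allowed    stand-in for URLFilters: returns false for prohibited URLs
     * @param maxInlinks stand-in for linkdb.max.inlinks
     */
    static Map<String, Set<String>> merge(List<Map<String, List<String>>> dbs,
                                          Predicate<String> allowed,
                                          int maxInlinks) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> db : dbs) {
            for (Map.Entry<String, List<String>> entry : db.entrySet()) {
                String target = entry.getKey();
                if (!allowed.test(target)) {
                    continue; // prohibited target: dropped together with all its inlinks
                }
                Set<String> inlinks =
                        merged.computeIfAbsent(target, k -> new LinkedHashSet<>());
                for (String from : entry.getValue()) {
                    if (!allowed.test(from)) {
                        continue; // prohibited inlink: skipped, does not count toward the cap
                    }
                    if (inlinks.size() >= maxInlinks) {
                        break; // cap reached for this target
                    }
                    inlinks.add(from);
                }
            }
        }
        // Assumption: targets left without any inlinks are not kept.
        merged.values().removeIf(Set::isEmpty);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dbA = Map.of(
                "http://example.org/",
                List.of("http://a.example/", "http://spam.example/x"));
        Map<String, List<String>> dbB = Map.of(
                "http://example.org/",
                List.of("http://b.example/", "http://c.example/"));
        // Filter out anything containing "spam"; keep at most 2 inlinks per target.
        System.out.println(merge(List.of(dbA, dbB), u -> !u.contains("spam"), 2));
    }
}
```

Passing a single database with a restrictive predicate corresponds to the filtering-only use mentioned in the description, while a predicate that always returns true corresponds to merging with filtering turned off.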
 
Nested Class Summary

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | LinkDbMerger.LinkDbMergeReducer | |
 - 
Constructor Summary

| Constructor | Description |
| --- | --- |
| LinkDbMerger() | |
| LinkDbMerger(Configuration conf) | |
 - 
Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static Job | createMergeJob(Configuration config, Path linkDb, boolean normalize, boolean filter) | |
| static void | main(String[] args) | Run the job |
| void | merge(Path output, Path[] dbs, boolean normalize, boolean filter) | |
| int | run(String[] args) | |
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 
Constructor Detail

LinkDbMerger
public LinkDbMerger()

LinkDbMerger
public LinkDbMerger(Configuration conf)
 
Method Detail

merge
public void merge(Path output, Path[] dbs, boolean normalize, boolean filter) throws Exception
- Throws:
- Exception

createMergeJob
public static Job createMergeJob(Configuration config, Path linkDb, boolean normalize, boolean filter) throws IOException
- Throws:
- IOException

main
public static void main(String[] args) throws Exception
Run the job
- Parameters:
- args - input arguments for the job
- Throws:
- Exception - if there is an error running the job
 
 