Class Model
- java.lang.Object
  - org.apache.nutch.scoring.similarity.cosine.Model

public class Model extends Object

This class creates a model used to store the document vector representation of the corpus.

Field Summary
- static ArrayList<DocVector> docVectors
- static boolean isModelCreated

Constructor Summary
- Model()

Method Summary
- static float computeCosineSimilarity(DocVector docVector)
- static DocVector createDocVector(String content, int mingram, int maxgram): Used to create a DocVector from given String text.
- static void createModel(Configuration conf)
- static int[] retrieveNgrams(Configuration conf): Retrieves mingram and maxgram from configuration

Method Detail

createModel

public static void createModel(Configuration conf) throws IOException

- Throws:
  - IOException

createDocVector

public static DocVector createDocVector(String content, int mingram, int maxgram)

Used to create a DocVector from given String text. Used during the parse stage of the crawl cycle to create a DocVector of the currently parsed page from the parseText attribute value.

- Parameters:
  - content - The text to tokenize
  - mingram - Value of mingram for tokenizing
  - maxgram - Value of maxgram for tokenizing
- Returns:
  - The created DocVector
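
To illustrate what tokenizing text with a mingram and maxgram can look like, here is a minimal, self-contained sketch. It is not the Nutch implementation: the class name NgramVectorSketch, the method termFrequencies, the lowercase-and-split tokenization, and the use of a plain Map in place of DocVector are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for building a document vector; not part of Nutch.
class NgramVectorSketch {

  // Counts word n-grams of every length from mingram to maxgram (inclusive).
  static Map<String, Integer> termFrequencies(String content, int mingram, int maxgram) {
    // Assumed tokenization: lowercase, then split on runs of non-letter, non-digit characters.
    List<String> tokens = new ArrayList<>();
    for (String t : content.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!t.isEmpty()) {
        tokens.add(t);
      }
    }

    Map<String, Integer> freq = new HashMap<>();
    // Guard against n-gram lengths below 1, which would produce empty grams.
    for (int n = Math.max(1, mingram); n <= maxgram; n++) {
      for (int i = 0; i + n <= tokens.size(); i++) {
        String gram = String.join(" ", tokens.subList(i, i + n));
        freq.merge(gram, 1, Integer::sum);
      }
    }
    return freq;
  }

  public static void main(String[] args) {
    // Unigrams and bigrams of a short sample text.
    System.out.println(termFrequencies("Nutch crawls the web; Nutch parses the web", 1, 2));
  }
}
```

With mingram = 1 and maxgram = 2, each single word and each pair of adjacent words becomes a term whose count is stored in the map.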

computeCosineSimilarity

public static float computeCosineSimilarity(DocVector docVector)
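
This page does not describe how the similarity is computed or how the stored docVectors are consulted. As a reference point only, the following sketch shows the standard cosine similarity between two term-frequency vectors: the dot product divided by the product of the Euclidean norms. The class CosineSketch, its methods, and the Map-based vectors are hypothetical and stand in for DocVector.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of cosine similarity; not the Nutch implementation.
class CosineSketch {

  // Returns dot(a, b) / (|a| * |b|), or 0 if either vector has zero length.
  static float cosine(Map<String, Integer> a, Map<String, Integer> b) {
    double dot = 0.0;
    for (Map.Entry<String, Integer> e : a.entrySet()) {
      Integer other = b.get(e.getKey());
      if (other != null) {
        dot += (double) e.getValue() * other;
      }
    }
    double normA = norm(a);
    double normB = norm(b);
    if (normA == 0.0 || normB == 0.0) {
      return 0f; // Assumed convention for empty vectors.
    }
    return (float) (dot / (normA * normB));
  }

  // Euclidean norm of a term-frequency vector.
  static double norm(Map<String, Integer> v) {
    double sum = 0.0;
    for (int x : v.values()) {
      sum += (double) x * x;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    Map<String, Integer> page = new HashMap<>();
    page.put("nutch", 2);
    page.put("crawler", 1);
    Map<String, Integer> reference = new HashMap<>();
    reference.put("nutch", 1);
    reference.put("search", 1);
    System.out.println(cosine(page, reference));
  }
}
```

For non-negative term counts the result lies between 0 (no shared terms) and 1 (same direction).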

retrieveNgrams

public static int[] retrieveNgrams(Configuration conf)

Retrieves mingram and maxgram from configuration.

- Parameters:
  - conf - Configuration to retrieve mingram and maxgram
- Returns:
  - ngram array as mingram at first index and maxgram at second index
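
The return convention (mingram at index 0, maxgram at index 1) can be sketched as follows. This is an illustration under stated assumptions: java.util.Properties stands in for the Hadoop Configuration, the keys sketch.mingram and sketch.maxgram are made-up names rather than real Nutch properties, and the default of 1 is an assumption.

```java
import java.util.Properties;

// Hypothetical illustration of the two-element return convention; not Nutch code.
class NgramRangeSketch {

  // Returns {mingram, maxgram}, read from made-up keys with an assumed default of 1.
  static int[] ngramRange(Properties conf) {
    int mingram = Integer.parseInt(conf.getProperty("sketch.mingram", "1").trim());
    int maxgram = Integer.parseInt(conf.getProperty("sketch.maxgram", "1").trim());
    return new int[] {mingram, maxgram};
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("sketch.mingram", "1");
    conf.setProperty("sketch.maxgram", "3");
    int[] ngrams = ngramRange(conf);
    System.out.println("mingram=" + ngrams[0] + ", maxgram=" + ngrams[1]);
  }
}
```

A caller would typically unpack the array once and pass ngrams[0] and ngrams[1] as the mingram and maxgram arguments of createDocVector.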