Class WeightedSpanTermExtractor
java.lang.Object
org.apache.lucene.search.highlight.WeightedSpanTermExtractor
Class used to extract WeightedSpanTerms from a Query based on whether Terms from the Query are contained in a supplied TokenStream.
To support additional queries that are unsupported by default, subclasses can override extract(Query, float, Map) to extract wrapped or delegate queries, and extractUnknownQuery(Query, Map) to process custom leaf queries:
WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor() {
    @Override
    protected void extract(Query query, float boost, Map<String, WeightedSpanTerm> terms) throws IOException {
        if (query instanceof QueryWrapper) {
            // Unwrap the delegate query and extract from it instead.
            extract(((QueryWrapper) query).getQuery(), boost, terms);
        } else {
            super.extract(query, boost, terms);
        }
    }

    @Override
    protected void extractUnknownQuery(Query query, Map<String, WeightedSpanTerm> terms) throws IOException {
        if (query instanceof CustomTermQuery) {
            // Custom leaf query: register its term directly.
            Term term = ((CustomTermQuery) query).getTerm();
            terms.put(term.field(), new WeightedSpanTerm(1, term.text()));
        }
    }
};
-
Nested Class Summary
Nested Classes
Modifier and Type: protected static class
Description: This class makes sure that if both position sensitive and insensitive versions of the same term are added, the position insensitive one wins.
-
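The "position insensitive one wins" rule can be illustrated with a small, self-contained sketch. This is not Lucene's implementation: SketchTerm and InsensitiveWinsMap are hypothetical stand-ins invented here for WeightedSpanTerm and the nested map class, and the sketch only assumes that a term carries a mutable position-sensitivity flag.

```java
import java.util.HashMap;
import java.util.Map;

public class PositionCheckSketch {

    /** Hypothetical stand-in for WeightedSpanTerm: a term text plus a sensitivity flag. */
    static final class SketchTerm {
        final String text;
        boolean positionSensitive;

        SketchTerm(String text, boolean positionSensitive) {
            this.text = text;
            this.positionSensitive = positionSensitive;
        }
    }

    /** Hypothetical map: once a term has been seen as insensitive, it stays insensitive. */
    static final class InsensitiveWinsMap extends HashMap<String, SketchTerm> {
        @Override
        public SketchTerm put(String key, SketchTerm value) {
            SketchTerm previous = super.put(key, value);
            // If the replaced entry was position insensitive, carry that over to the new entry.
            if (previous != null && !previous.positionSensitive) {
                value.positionSensitive = false;
            }
            return previous;
        }

        @Override
        public void putAll(Map<? extends String, ? extends SketchTerm> m) {
            // Route every entry through put() so the rule above is applied.
            for (Map.Entry<? extends String, ? extends SketchTerm> e : m.entrySet()) {
                put(e.getKey(), e.getValue());
            }
        }
    }

    /** Adds the same term twice with the given flags and returns the resulting flag. */
    static boolean finalFlag(boolean firstSensitive, boolean secondSensitive) {
        InsensitiveWinsMap terms = new InsensitiveWinsMap();
        terms.put("fox", new SketchTerm("fox", firstSensitive));
        terms.put("fox", new SketchTerm("fox", secondSensitive));
        return terms.get("fox").positionSensitive;
    }

    public static void main(String[] args) {
        System.out.println(finalFlag(false, true));
        System.out.println(finalFlag(true, false));
        System.out.println(finalFlag(true, true));
    }
}
```

In this sketch the stored term ends up position sensitive only when every added version was position sensitive, regardless of insertion order.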
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
protected void collectSpanQueryFields(SpanQuery spanQuery, Set<String> fieldNames)
protected void extract(Query query, float boost, Map<String, WeightedSpanTerm> terms)
protected void extractUnknownQuery(Query query, Map<String, WeightedSpanTerm> terms)
protected void extractWeightedSpanTerms(Map<String, WeightedSpanTerm> terms, SpanQuery spanQuery, float boost)
protected void extractWeightedTerms(Map<String, WeightedSpanTerm> terms, Query query, float boost)
protected boolean fieldNameComparator(String fieldNameToCheck)
    Necessary to implement matches for queries against defaultField
boolean getExpandMultiTermQuery()
protected LeafReaderContext getLeafContext()
TokenStream getTokenStream()
    Returns the tokenStream which may have been wrapped in a CachingTokenFilter.
Map<String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream)
    Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
Map<String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream, String fieldName)
    Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
Map<String,WeightedSpanTerm> getWeightedSpanTermsWithScores(Query query, float boost, TokenStream tokenStream, String fieldName, IndexReader reader)
    Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
boolean isCachedTokenStream()
protected boolean isQueryUnsupported(Class<? extends Query> clazz)
boolean isUsePayloads()
protected boolean mustRewriteQuery(SpanQuery spanQuery)
void setExpandMultiTermQuery(boolean expandMultiTermQuery)
protected final void setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze)
    A threshold of number of characters to analyze.
void setUsePayloads(boolean usePayloads)
void setWrapIfNotCachingTokenFilter(boolean wrap)
    By default, TokenStreams that are not of the type CachingTokenFilter are wrapped in a CachingTokenFilter to ensure an efficient reset - if you are already using a different caching TokenStream impl and you don't want it to be wrapped, set this to false.
-
Constructor Details
-
WeightedSpanTermExtractor
public WeightedSpanTermExtractor() -
WeightedSpanTermExtractor
-
-
Method Details
-
extract
protected void extract(Query query, float boost, Map<String, WeightedSpanTerm> terms) throws IOException
Parameters:
query - Query to extract Terms from
terms - Map to place created WeightedSpanTerms in
Throws:
IOException - If there is a low-level I/O error
-
isQueryUnsupported
-
extractUnknownQuery
protected void extractUnknownQuery(Query query, Map<String, WeightedSpanTerm> terms) throws IOException
Throws:
IOException
-
extractWeightedSpanTerms
protected void extractWeightedSpanTerms(Map<String, WeightedSpanTerm> terms, SpanQuery spanQuery, float boost) throws IOException
Parameters:
terms - Map to place created WeightedSpanTerms in
spanQuery - SpanQuery to extract Terms from
Throws:
IOException - If there is a low-level I/O error
-
extractWeightedTerms
protected void extractWeightedTerms(Map<String, WeightedSpanTerm> terms, Query query, float boost) throws IOException
Parameters:
terms - Map to place created WeightedSpanTerms in
query - Query to extract Terms from
Throws:
IOException - If there is a low-level I/O error
-
fieldNameComparator
Necessary to implement matches for queries against defaultField
-
getLeafContext
Throws:
IOException
-
getWeightedSpanTerms
public Map<String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream) throws IOException
Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
Parameters:
query - that caused hit
tokenStream - of text to be highlighted
Returns:
Map containing WeightedSpanTerms
Throws:
IOException - If there is a low-level I/O error
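A minimal usage sketch follows, assuming the Lucene core, highlighter, and memory modules are on the classpath and that StandardAnalyzer is an acceptable analyzer for the text. The class name ExtractTermsSketch, the field name "body", and the sample text are assumptions made for illustration, and this sketch uses the overload that also takes a field name so that only terms for that field are kept.

```java
import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.WeightedSpanTerm;
import org.apache.lucene.search.highlight.WeightedSpanTermExtractor;

public class ExtractTermsSketch {

    /** Extracts weighted terms for a single-term query against the given text. */
    static Map<String, WeightedSpanTerm> extract(String field, String text, String word)
            throws IOException {
        Query query = new TermQuery(new Term(field, word));
        WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor();
        try (Analyzer analyzer = new StandardAnalyzer();
             TokenStream tokenStream = analyzer.tokenStream(field, text)) {
            // A boost of 1 leaves the term weights unscaled.
            return extractor.getWeightedSpanTerms(query, 1f, tokenStream, field);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, WeightedSpanTerm> terms = extract("body", "the quick brown fox", "fox");
        for (Map.Entry<String, WeightedSpanTerm> e : terms.entrySet()) {
            System.out.println(e.getKey() + " weight=" + e.getValue().getWeight());
        }
    }
}
```

Creating a fresh extractor per call, as done here, avoids sharing the token stream state that the extractor keeps between invocations.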
-
getWeightedSpanTerms
public Map<String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream, String fieldName) throws IOException
Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
Parameters:
query - that caused hit
tokenStream - of text to be highlighted
fieldName - restricts Terms used based on field name
Returns:
Map containing WeightedSpanTerms
Throws:
IOException - If there is a low-level I/O error
-
getWeightedSpanTermsWithScores
public Map<String,WeightedSpanTerm> getWeightedSpanTermsWithScores(Query query, float boost, TokenStream tokenStream, String fieldName, IndexReader reader) throws IOException
Creates a Map of WeightedSpanTerms from the given Query and TokenStream. Uses a supplied IndexReader to properly weight terms (for gradient highlighting).
Parameters:
query - that caused hit
tokenStream - of text to be highlighted
fieldName - restricts Terms used based on field name
reader - to use for scoring
Returns:
Map of WeightedSpanTerms with quasi tf/idf scores
Throws:
IOException - If there is a low-level I/O error
-
collectSpanQueryFields
-
mustRewriteQuery
-
getExpandMultiTermQuery
public boolean getExpandMultiTermQuery()
-
setExpandMultiTermQuery
public void setExpandMultiTermQuery(boolean expandMultiTermQuery)
-
isUsePayloads
public boolean isUsePayloads()
-
setUsePayloads
public void setUsePayloads(boolean usePayloads)
-
isCachedTokenStream
public boolean isCachedTokenStream()
-
getTokenStream
Returns the tokenStream, which may have been wrapped in a CachingTokenFilter. The getWeightedSpanTerms* methods set the tokenStream, so don't call this method before one of them.
-
setWrapIfNotCachingTokenFilter
public void setWrapIfNotCachingTokenFilter(boolean wrap)
By default, TokenStreams that are not of the type CachingTokenFilter are wrapped in a CachingTokenFilter to ensure an efficient reset - if you are already using a different caching TokenStream impl and you don't want it to be wrapped, set this to false. This setting is ignored when a term vector based TokenStream is supplied, since it can be reset efficiently.
-
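The following sketch shows one way to turn wrapping off and then check the result with isCachedTokenStream(). It assumes the Lucene core, highlighter, and memory modules are on the classpath; the class name WrapSettingSketch, the field name "body", and the sample text are illustrative assumptions. A plain analyzer stream is used here only to keep the sketch short, whereas the setting is intended for callers that supply their own caching TokenStream.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.WeightedSpanTermExtractor;

public class WrapSettingSketch {

    /** Runs an extraction with the given wrap setting and reports whether the stream was cached. */
    static boolean cachedAfterExtraction(boolean wrap) throws IOException {
        WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor();
        extractor.setWrapIfNotCachingTokenFilter(wrap);

        Query query = new TermQuery(new Term("body", "fox"));
        try (Analyzer analyzer = new StandardAnalyzer();
             TokenStream tokenStream = analyzer.tokenStream("body", "the quick brown fox")) {
            extractor.getWeightedSpanTerms(query, 1f, tokenStream, "body");
            // Only meaningful after a getWeightedSpanTerms* call has set the stream.
            return extractor.isCachedTokenStream();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("wrap=false cached: " + cachedAfterExtraction(false));
        System.out.println("wrap=true cached: " + cachedAfterExtraction(true));
    }
}
```

With wrapping disabled the extractor should never report a cached stream; whether it reports one with wrapping enabled depends on whether the query type needed to read the token stream at all.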
setMaxDocCharsToAnalyze
protected final void setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze)
A threshold of number of characters to analyze. When a TokenStream based on term vectors with offsets and positions is supplied, this setting does not apply.
-