Package org.apache.lucene.analysis.email
Fast, general-purpose tokenizers for URLs and email addresses.
UAX29URLEmailTokenizer: implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. It also recognizes URLs and email addresses and keeps each one as a single token, following the relevant RFCs.
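The following is a minimal sketch of driving the tokenizer directly. It assumes Lucene 9.x with `lucene-analysis-common` on the classpath. The class name `UrlEmailTokenizerDemo` and the sample text are hypothetical. The sketch prints each term together with its token type, which makes it visible that a URL or an email address comes out as one token and is not split into its parts.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.email.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

// Hypothetical demo class; only the Lucene types are real API.
public class UrlEmailTokenizerDemo {

  /** Returns one "term/type" string per token emitted by the tokenizer. */
  public static List<String> tokenize(String text) throws IOException {
    List<String> out = new ArrayList<>();
    // TokenStream is Closeable, so try-with-resources releases the tokenizer.
    try (UAX29URLEmailTokenizer tokenizer = new UAX29URLEmailTokenizer()) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      TypeAttribute type = tokenizer.addAttribute(TypeAttribute.class);
      tokenizer.reset(); // required before the first incrementToken()
      while (tokenizer.incrementToken()) {
        out.add(term.toString() + "/" + type.type());
      }
      tokenizer.end(); // finalize offsets before close()
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(
        tokenize("Contact dev@lucene.apache.org or visit https://lucene.apache.org"));
  }
}
```

The tokenizer does no lowercasing on its own. In this sketch, `Contact` therefore keeps its capital letter.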
UAX29URLEmailAnalyzer includes UAX29URLEmailTokenizer, LowerCaseFilter and StopFilter.
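Here is a sketch of the analyzer chain in use. It assumes Lucene 9.x, where the no-argument `UAX29URLEmailAnalyzer()` constructor uses the English stop-word set. The field name `"body"` and the class name `UrlEmailAnalyzerDemo` are hypothetical. The output shows all three stages working together: lowercased terms, stop words such as "the" and "is" removed, and the URL kept intact.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.email.UAX29URLEmailAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class; only the Lucene types are real API.
public class UrlEmailAnalyzerDemo {

  /** Runs text through tokenizer -> LowerCaseFilter -> StopFilter and collects the terms. */
  public static List<String> analyze(String text) throws IOException {
    List<String> out = new ArrayList<>();
    // The field name "body" is arbitrary; this analyzer treats all fields the same way.
    try (UAX29URLEmailAnalyzer analyzer = new UAX29URLEmailAnalyzer();
        TokenStream stream = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset();
      while (stream.incrementToken()) {
        out.add(term.toString());
      }
      stream.end();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(analyze("The Lucene site is https://lucene.apache.org"));
  }
}
```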
Classes
UAX29URLEmailAnalyzer: Filters UAX29URLEmailTokenizer with LowerCaseFilter and StopFilter, using a list of English stop words.
UAX29URLEmailTokenizer: This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailTokenizerFactory: Factory for UAX29URLEmailTokenizer.
UAX29URLEmailTokenizerImpl: This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
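The next sketch builds the tokenizer through UAX29URLEmailTokenizerFactory rather than by calling its constructor directly. It assumes Lucene 9.x and that the factory accepts a `maxTokenLength` argument, which is the attribute name used in Solr configurations for this factory. The class name `UrlEmailFactoryDemo` is hypothetical. The factory consumes its arguments from a mutable map, so the sketch passes a `HashMap` and not an immutable `Map.of(...)`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.email.UAX29URLEmailTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class; only the Lucene types are real API.
public class UrlEmailFactoryDemo {

  /** Creates a tokenizer via the factory and collects its terms. */
  public static List<String> tokenize(String text) throws IOException {
    // Mutable map: the factory removes the arguments it recognizes.
    Map<String, String> args = new HashMap<>();
    args.put("maxTokenLength", "255"); // assumed to match the default limit
    UAX29URLEmailTokenizerFactory factory = new UAX29URLEmailTokenizerFactory(args);

    List<String> out = new ArrayList<>();
    try (Tokenizer tokenizer = factory.create()) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        out.add(term.toString());
      }
      tokenizer.end();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(tokenize("Write to dev@lucene.apache.org"));
  }
}
```

A factory is what configuration-driven setups, such as Lucene's SPI lookup or a Solr schema, instantiate by name. Direct construction, as in the tokenizer example, is the simpler choice in plain Java code.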