Package | Description |
---|---|
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. |
org.apache.lucene.analysis.ar | Analyzer for Arabic. |
org.apache.lucene.analysis.bg | Analyzer for Bulgarian. |
org.apache.lucene.analysis.br | Analyzer for Brazilian Portuguese. |
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters). |
org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters). |
org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words. |
org.apache.lucene.analysis.compound | A filter that decomposes compound words found in many Germanic languages into their word parts. |
org.apache.lucene.analysis.cz | Analyzer for Czech. |
org.apache.lucene.analysis.de | Analyzer for German. |
org.apache.lucene.analysis.el | Analyzer for Greek. |
org.apache.lucene.analysis.en | Analyzer for English. |
org.apache.lucene.analysis.es | Analyzer for Spanish. |
org.apache.lucene.analysis.fa | Analyzer for Persian. |
org.apache.lucene.analysis.fi | Analyzer for Finnish. |
org.apache.lucene.analysis.fr | Analyzer for French. |
org.apache.lucene.analysis.ga | Analyzer for Irish. |
org.apache.lucene.analysis.gl | Analyzer for Galician. |
org.apache.lucene.analysis.hi | Analyzer for Hindi. |
org.apache.lucene.analysis.hu | Analyzer for Hungarian. |
org.apache.lucene.analysis.hunspell | Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm. |
org.apache.lucene.analysis.icu | Analysis components based on ICU. |
org.apache.lucene.analysis.icu.segmentation | Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm. |
org.apache.lucene.analysis.id | Analyzer for Indonesian. |
org.apache.lucene.analysis.in | Analysis components for Indian languages. |
org.apache.lucene.analysis.it | Analyzer for Italian. |
org.apache.lucene.analysis.ja | Analyzer for Japanese. |
org.apache.lucene.analysis.lv | Analyzer for Latvian. |
org.apache.lucene.analysis.miscellaneous | Miscellaneous TokenStreams. |
org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters. |
org.apache.lucene.analysis.nl | Analyzer for Dutch. |
org.apache.lucene.analysis.no | Analyzer for Norwegian. |
org.apache.lucene.analysis.path | Analysis components for path-like strings such as filenames. |
org.apache.lucene.analysis.payloads | Provides various convenience classes for creating payloads on Tokens. |
org.apache.lucene.analysis.phonetic | Analysis components for phonetic search. |
org.apache.lucene.analysis.position | Filter for assigning position increments. |
org.apache.lucene.analysis.pt | Analyzer for Portuguese. |
org.apache.lucene.analysis.query | Automatically filters high-frequency stopwords. |
org.apache.lucene.analysis.reverse | Filter to reverse token text. |
org.apache.lucene.analysis.ru | Analyzer for Russian. |
org.apache.lucene.analysis.shingle | Word n-gram filters. |
org.apache.lucene.analysis.snowball | TokenFilter and Analyzer implementations that use Snowball stemmers. |
org.apache.lucene.analysis.standard | Standards-based analyzers implemented with JFlex. |
org.apache.lucene.analysis.stempel | Stempel: algorithmic stemmer. |
org.apache.lucene.analysis.sv | Analyzer for Swedish. |
org.apache.lucene.analysis.synonym | Analysis components for synonyms. |
org.apache.lucene.analysis.th | Analyzer for Thai. |
org.apache.lucene.analysis.tr | Analyzer for Turkish. |
org.apache.lucene.analysis.wikipedia | Tokenizer that is aware of Wikipedia syntax. |
org.apache.lucene.collation | CollationKeyFilter converts each token into its binary CollationKey using the provided Collator, then encodes the CollationKey as a String using IndexableBinaryStringTools so that it can be stored as an index term. |
org.apache.lucene.document | The logical representation of a Document for indexing and searching. |
org.apache.lucene.facet.enhancements | Enhanced category features. |
org.apache.lucene.facet.enhancements.association | Association category enhancements. |
org.apache.lucene.facet.index | Indexing of document categories. |
org.apache.lucene.facet.index.streaming | Expert: attributes streaming definition for indexing facets. |
org.apache.lucene.index.memory | High-performance single-document main-memory Apache Lucene fulltext search index. |
org.apache.lucene.queryParser | A simple query parser implemented with JavaCC. |
org.apache.lucene.search.highlight | Classes that provide "keyword in context" features, typically used to highlight search terms in the text of results pages. |
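All of these packages build on the same workflow: an Analyzer turns a Reader into a TokenStream, and the consumer pulls tokens one at a time via incrementToken(). A minimal sketch against the Lucene 3.x API this page documents (the version constant Version.LUCENE_36, field name "body", and sample text are illustrative assumptions, not part of this page):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class AnalysisDemo {
  public static void main(String[] args) throws Exception {
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36); // illustrative version
    TokenStream ts = analyzer.tokenStream("body", new StringReader("The Quick Brown Fox"));
    // Attribute instances are reused across calls; fetch them once before iterating.
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();                    // position the stream before the first token
    while (ts.incrementToken()) {  // advance token by token
      System.out.println(term.toString());
    }
    ts.end();                      // record final offset state
    ts.close();
  }
}
```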
Modifier and Type | Class | Description |
---|---|---|
class | ASCIIFoldingFilter | This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists. |
class | CachingTokenFilter | This class can be used if the token attributes of a TokenStream are intended to be consumed more than once. |
class | CannedTokenStream | TokenStream from a canned list of Tokens. |
class | CharTokenizer | An abstract base class for simple, character-oriented tokenizers. |
class | EmptyTokenizer | Emits no tokens. |
class | FilteringTokenFilter | Abstract base class for TokenFilters that may remove tokens. |
class | ISOLatin1AccentFilter | Deprecated. If you build a new index, use ASCIIFoldingFilter, which covers a superset of Latin 1. |
class | KeywordMarkerFilter | Marks terms as keywords via the KeywordAttribute. |
class | KeywordTokenizer | Emits the entire input as a single token. |
class | LengthFilter | Removes words that are too long or too short from the stream. |
class | LetterTokenizer | A LetterTokenizer is a tokenizer that divides text at non-letters. |
class | LimitTokenCountFilter | This TokenFilter limits the number of tokens while indexing. |
class | LookaheadTokenFilter<T extends LookaheadTokenFilter.Position> | An abstract TokenFilter to make it easier to build graph token filters requiring some lookahead. |
class | LowerCaseFilter | Normalizes token text to lower case. |
class | LowerCaseTokenizer | LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. |
class | MockFixedLengthPayloadFilter | TokenFilter that adds random fixed-length payloads. |
class | MockGraphTokenFilter | Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1. |
class | MockHoleInjectingTokenFilter | |
class | MockRandomLookaheadTokenFilter | Uses LookaheadTokenFilter to randomly peek at future tokens. |
class | MockTokenizer | Tokenizer for testing. |
class | MockVariableLengthPayloadFilter | TokenFilter that adds random variable-length payloads. |
class | NumericTokenStream | Expert: This class provides a TokenStream for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter. |
class | PorterStemFilter | Transforms the token stream as per the Porter stemming algorithm. |
class | StopFilter | Removes stop words from a token stream. |
class | TeeSinkTokenFilter | This TokenFilter provides the ability to set aside attribute states that have already been analyzed. |
static class | TeeSinkTokenFilter.SinkTokenStream | TokenStream output from a tee with optional filtering. |
class | TokenFilter | A TokenFilter is a TokenStream whose input is another TokenStream. |
class | Tokenizer | A Tokenizer is a TokenStream whose input is a Reader. |
class | TypeTokenFilter | Removes tokens whose types appear in a set of blocked types from a token stream. |
class | ValidatingTokenFilter | A TokenFilter that checks consistency of the tokens (e.g. that offsets are consistent with one another). |
class | WhitespaceTokenizer | A WhitespaceTokenizer is a tokenizer that divides text at whitespace. |
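The Tokenizer and TokenFilter classes above compose into chains: a Tokenizer produces the raw stream and each TokenFilter wraps the previous TokenStream. A hedged sketch using classes from this table (version constant, stop word set, and length bounds are illustrative choices):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.LengthFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public class ChainDemo {
  public static TokenStream buildChain(String text) {
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(text));
    ts = new LowerCaseFilter(Version.LUCENE_36, ts);            // normalize case
    ts = new StopFilter(Version.LUCENE_36, ts,
        StopAnalyzer.ENGLISH_STOP_WORDS_SET);                   // drop common stop words
    ts = new LengthFilter(true, ts, 2, 20);                     // keep terms of 2..20 chars
    return new PorterStemFilter(ts);                            // stem what remains
  }
}
```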
Modifier and Type | Field | Description |
---|---|---|
protected TokenStream | TokenFilter.input | The source of tokens for this filter. |
protected TokenStream | ReusableAnalyzerBase.TokenStreamComponents.sink | |

Modifier and Type | Method | Description |
---|---|---|
protected TokenStream | ReusableAnalyzerBase.TokenStreamComponents.getTokenStream() | Returns the sink TokenStream. |
TokenStream | Analyzer.reusableTokenStream(String fieldName, Reader reader) | Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. |
TokenStream | LimitTokenCountAnalyzer.reusableTokenStream(String fieldName, Reader reader) | |
TokenStream | MockAnalyzer.reusableTokenStream(String fieldName, Reader reader) | |
TokenStream | PerFieldAnalyzerWrapper.reusableTokenStream(String fieldName, Reader reader) | |
TokenStream | ReusableAnalyzerBase.reusableTokenStream(String fieldName, Reader reader) | This method uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents. |
abstract TokenStream | Analyzer.tokenStream(String fieldName, Reader reader) | Creates a TokenStream which tokenizes all the text in the provided Reader. |
TokenStream | LimitTokenCountAnalyzer.tokenStream(String fieldName, Reader reader) | |
TokenStream | MockAnalyzer.tokenStream(String fieldName, Reader reader) | |
TokenStream | PerFieldAnalyzerWrapper.tokenStream(String fieldName, Reader reader) | |
TokenStream | ReusableAnalyzerBase.tokenStream(String fieldName, Reader reader) | This method uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents and returns the sink of the components. |
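The reusableTokenStream variants let an Analyzer hand back the components it built on the previous call from the same thread, reset onto the new Reader, instead of allocating a fresh chain per document. A sketch of the pattern (analyzer choice and document strings are placeholders):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ReuseDemo {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    String[] docs = { "first document", "second document" };
    for (String doc : docs) {
      // Same thread, same analyzer: the underlying chain may be reused.
      TokenStream ts = analyzer.reusableTokenStream("body", new StringReader(doc));
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term.toString());
      }
      ts.end();  // the analyzer manages the lifecycle of reused components
    }
  }
}
```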
Modifier and Type | Method | Description |
---|---|---|
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output) | |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] posIncrements) | |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets) | |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, int[] posIncrements) | |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, int[] posIncrements, int[] posLengths, Integer finalOffset) | |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, int[] posIncrements, Integer finalOffset) | |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, Integer finalOffset) | |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements) | |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements, int[] posLengths, Integer finalOffset) | |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements, int[] posLengths, Integer finalOffset, boolean offsetsAreCorrect) | |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements, Integer finalOffset) | |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, String[] types) | |
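These overloads back most analysis unit tests: feed a stream plus the expected terms, offsets, and position increments as parallel arrays. A sketch using the five-argument overload from this table (class and method names are made up, and JUnit wiring details vary across 3.x test-framework versions):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.BaseTokenStreamTestCase;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public class LowerCaseChainTest extends BaseTokenStreamTestCase {
  public void testLowerCase() throws Exception {
    TokenStream ts = new LowerCaseFilter(Version.LUCENE_36,
        new WhitespaceTokenizer(Version.LUCENE_36, new StringReader("Foo BAR")));
    assertTokenStreamContents(ts,
        new String[] { "foo", "bar" },  // expected terms
        new int[] { 0, 4 },             // start offsets
        new int[] { 3, 7 },             // end offsets
        new int[] { 1, 1 });            // position increments
  }
}
```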
Constructor | Description |
---|---|
ASCIIFoldingFilter(TokenStream input) | |
CachingTokenFilter(TokenStream input) | |
FilteringTokenFilter(boolean enablePositionIncrements, TokenStream input) | |
ISOLatin1AccentFilter(TokenStream input) | Deprecated. |
KeywordMarkerFilter(TokenStream in, Set<?> keywordSet) | Create a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set. |
KeywordMarkerFilter(TokenStream in, CharArraySet keywordSet) | Create a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set. |
LengthFilter(boolean enablePositionIncrements, TokenStream in, int min, int max) | Build a filter that removes words that are too long or too short from the text. |
LengthFilter(TokenStream in, int min, int max) | Deprecated. Use LengthFilter(boolean, TokenStream, int, int) instead. |
LimitTokenCountFilter(TokenStream in, int maxTokenCount) | Build a filter that only accepts tokens up to a maximum number. |
LookaheadTokenFilter(TokenStream input) | |
LowerCaseFilter(TokenStream in) | Deprecated. Use LowerCaseFilter(Version, TokenStream) instead. |
LowerCaseFilter(Version matchVersion, TokenStream in) | Create a new LowerCaseFilter that normalizes token text to lower case. |
MockFixedLengthPayloadFilter(Random random, TokenStream in, int length) | |
MockGraphTokenFilter(Random random, TokenStream input) | |
MockHoleInjectingTokenFilter(Random random, TokenStream in) | |
MockRandomLookaheadTokenFilter(Random random, TokenStream in) | |
MockVariableLengthPayloadFilter(Random random, TokenStream in) | |
PorterStemFilter(TokenStream in) | |
StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords) | Deprecated. Use StopFilter(Version, TokenStream, Set) instead. |
StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase) | Deprecated. Use StopFilter(Version, TokenStream, Set) instead. |
StopFilter(Version matchVersion, TokenStream in, Set<?> stopWords) | Constructs a filter which removes words from the input TokenStream that are named in the Set. |
StopFilter(Version matchVersion, TokenStream input, Set<?> stopWords, boolean ignoreCase) | Deprecated. Use StopFilter(Version, TokenStream, Set) instead. |
TeeSinkTokenFilter(TokenStream input) | Instantiates a new TeeSinkTokenFilter. |
TokenFilter(TokenStream input) | Construct a token stream filtering the given input. |
TokenStreamComponents(Tokenizer source, TokenStream result) | Creates a new ReusableAnalyzerBase.TokenStreamComponents instance. |
TokenStreamToDot(String inputText, TokenStream in, PrintWriter out) | If inputText is non-null, and the TokenStream has offsets, we include the surface form in each arc's label. |
TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes) | |
TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes, boolean useWhiteList) | |
ValidatingTokenFilter(TokenStream in, String name, boolean offsetsAreCorrect) | The name arg is used to identify this stage when throwing exceptions (useful if you have more than one instance in your chain). |

Modifier and Type | Class | Description |
---|---|---|
class | ArabicLetterTokenizer | Deprecated. (3.1) Use StandardTokenizer instead. |
class | ArabicNormalizationFilter | A TokenFilter that applies ArabicNormalizer to normalize the orthography. |
class | ArabicStemFilter | A TokenFilter that applies ArabicStemmer to stem Arabic words. |

Constructor | Description |
---|---|
ArabicNormalizationFilter(TokenStream input) | |
ArabicStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | BulgarianStemFilter | A TokenFilter that applies BulgarianStemmer to stem Bulgarian words. |

Constructor | Description |
---|---|
BulgarianStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | BrazilianStemFilter | A TokenFilter that applies BrazilianStemmer. |

Constructor | Description |
---|---|
BrazilianStemFilter(TokenStream in) | Creates a new BrazilianStemFilter. |
BrazilianStemFilter(TokenStream in, Set<?> exclusiontable) | Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead. |

Modifier and Type | Class | Description |
---|---|---|
class | CJKBigramFilter | Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer. |
class | CJKTokenizer | Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead. |
class | CJKWidthFilter | A TokenFilter that normalizes CJK width differences: folds fullwidth ASCII variants into the equivalent Basic Latin, and folds halfwidth Katakana variants into the equivalent kana. |

Constructor | Description |
---|---|
CJKBigramFilter(TokenStream in) | |
CJKBigramFilter(TokenStream in, int flags) | Create a new CJKBigramFilter, specifying which writing systems should be bigrammed. |
CJKWidthFilter(TokenStream input) | |
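The deprecation note for CJKTokenizer above spells out its replacement chain; wired together it looks roughly like this (version constant is an illustrative assumption):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKBigramFilter;
import org.apache.lucene.analysis.cjk.CJKWidthFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class CjkChainDemo {
  public static TokenStream buildChain(String text) {
    TokenStream ts = new StandardTokenizer(Version.LUCENE_36, new StringReader(text));
    ts = new CJKWidthFilter(ts);   // fold fullwidth/halfwidth variants
    ts = new CJKBigramFilter(ts);  // emit overlapping bigrams over Han/kana runs
    return new LowerCaseFilter(Version.LUCENE_36, ts);
  }
}
```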
Modifier and Type | Class | Description |
---|---|---|
class | ChineseFilter | Deprecated. Use StopFilter instead, which has the same functionality. |
class | ChineseTokenizer | Deprecated. Use StandardTokenizer instead, which has the same functionality. |

Constructor | Description |
---|---|
ChineseFilter(TokenStream in) | Deprecated. |

Modifier and Type | Class | Description |
---|---|---|
class | SentenceTokenizer | Tokenizes input text into sentences. |
class | WordTokenFilter | A TokenFilter that breaks sentences into words. |

Modifier and Type | Method | Description |
---|---|---|
TokenStream | SmartChineseAnalyzer.reusableTokenStream(String fieldName, Reader reader) | |
TokenStream | SmartChineseAnalyzer.tokenStream(String fieldName, Reader reader) | |

Constructor | Description |
---|---|
WordTokenFilter(TokenStream in) | Construct a new WordTokenFilter. |

Modifier and Type | Class | Description |
---|---|---|
class | CompoundWordTokenFilterBase | Base class for decomposition token filters. |
class | DictionaryCompoundWordTokenFilter | A TokenFilter that decomposes compound words found in many Germanic languages. |
class | HyphenationCompoundWordTokenFilter | A TokenFilter that decomposes compound words found in many Germanic languages. |

Constructor | Description |
---|---|
CompoundWordTokenFilterBase(TokenStream input, String[] dictionary) | Deprecated. |
CompoundWordTokenFilterBase(TokenStream input, String[] dictionary, boolean onlyLongestMatch) | Deprecated. |
CompoundWordTokenFilterBase(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | Deprecated. |
CompoundWordTokenFilterBase(TokenStream input, Set<?> dictionary) | Deprecated. |
CompoundWordTokenFilterBase(TokenStream input, Set<?> dictionary, boolean onlyLongestMatch) | Deprecated. |
CompoundWordTokenFilterBase(TokenStream input, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | Deprecated. |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, String[] dictionary) | |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, String[] dictionary, boolean onlyLongestMatch) | |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, Set<?> dictionary) | |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, Set<?> dictionary, boolean onlyLongestMatch) | |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | |
DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary) | Deprecated. |
DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | Deprecated. |
DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary) | Deprecated. |
DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | Deprecated. |
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary) | Deprecated. Use the constructors taking Set. |
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | Deprecated. Use the constructors taking Set. |
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set<?> dictionary) | Creates a new DictionaryCompoundWordTokenFilter. |
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | Creates a new DictionaryCompoundWordTokenFilter. |
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary) | Deprecated. |
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | |
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set<?> dictionary) | Deprecated. |
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator) | Create a HyphenationCompoundWordTokenFilter with no dictionary. |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize) | Create a HyphenationCompoundWordTokenFilter with no dictionary. |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, String[] dictionary) | Deprecated. Use the constructors taking Set. |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | Deprecated. Use the constructors taking Set. |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, Set<?> dictionary) | Creates a new HyphenationCompoundWordTokenFilter instance. |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) | Creates a new HyphenationCompoundWordTokenFilter instance. |
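DictionaryCompoundWordTokenFilter scans each token for dictionary entries within the configured subword size bounds and emits the matches alongside the original token. A sketch with a toy word-part dictionary (the entries and input are illustrative, not a real lexicon):

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter;
import org.apache.lucene.util.Version;

public class CompoundDemo {
  public static TokenStream buildChain(String text) {
    Set<String> dictionary = new HashSet<String>(
        Arrays.asList("dampf", "schiff", "fahrt"));    // toy word-part lexicon
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(text));
    ts = new LowerCaseFilter(Version.LUCENE_36, ts);   // match the lowercase dictionary
    // For "Dampfschifffahrt" this should emit the original token
    // plus the parts "dampf", "schiff", and "fahrt".
    return new DictionaryCompoundWordTokenFilter(Version.LUCENE_36, ts, dictionary);
  }
}
```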
Modifier and Type | Class | Description |
---|---|---|
class | CzechStemFilter | A TokenFilter that applies CzechStemmer to stem Czech words. |

Constructor | Description |
---|---|
CzechStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | GermanLightStemFilter | A TokenFilter that applies GermanLightStemmer to stem German words. |
class | GermanMinimalStemFilter | A TokenFilter that applies GermanMinimalStemmer to stem German words. |
class | GermanNormalizationFilter | Normalizes German characters according to the heuristics of the German2 snowball algorithm. |
class | GermanStemFilter | A TokenFilter that stems German words. |

Constructor | Description |
---|---|
GermanLightStemFilter(TokenStream input) | |
GermanMinimalStemFilter(TokenStream input) | |
GermanNormalizationFilter(TokenStream input) | |
GermanStemFilter(TokenStream in) | Creates a GermanStemFilter instance. |
GermanStemFilter(TokenStream in, Set<?> exclusionSet) | Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead. |

Modifier and Type | Class | Description |
---|---|---|
class | GreekLowerCaseFilter | Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma to sigma. |
class | GreekStemFilter | A TokenFilter that applies GreekStemmer to stem Greek words. |

Constructor | Description |
---|---|
GreekLowerCaseFilter(TokenStream in) | Deprecated. Use GreekLowerCaseFilter(Version, TokenStream) instead. |
GreekLowerCaseFilter(Version matchVersion, TokenStream in) | Create a GreekLowerCaseFilter that normalizes Greek token text. |
GreekStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | EnglishMinimalStemFilter | A TokenFilter that applies EnglishMinimalStemmer to stem English words. |
class | EnglishPossessiveFilter | TokenFilter that removes possessives (trailing 's) from words. |
class | KStemFilter | A high-performance kstem filter for English. |

Constructor | Description |
---|---|
EnglishMinimalStemFilter(TokenStream input) | |
EnglishPossessiveFilter(TokenStream input) | Deprecated. Use EnglishPossessiveFilter(Version, TokenStream) instead. |
EnglishPossessiveFilter(Version version, TokenStream input) | |
KStemFilter(TokenStream in) | |
Modifier and Type | Class | Description |
---|---|---|
class | SpanishLightStemFilter | A TokenFilter that applies SpanishLightStemmer to stem Spanish words. |

Constructor | Description |
---|---|
SpanishLightStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | PersianNormalizationFilter | A TokenFilter that applies PersianNormalizer to normalize the orthography. |

Constructor | Description |
---|---|
PersianNormalizationFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | FinnishLightStemFilter | A TokenFilter that applies FinnishLightStemmer to stem Finnish words. |

Constructor | Description |
---|---|
FinnishLightStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | ElisionFilter | Removes elisions from a TokenStream. |
class | FrenchLightStemFilter | A TokenFilter that applies FrenchLightStemmer to stem French words. |
class | FrenchMinimalStemFilter | A TokenFilter that applies FrenchMinimalStemmer to stem French words. |
class | FrenchStemFilter | Deprecated. Use SnowballFilter with FrenchStemmer instead, which has the same functionality. |

Constructor | Description |
---|---|
ElisionFilter(TokenStream input) | Deprecated. Use ElisionFilter(Version, TokenStream) instead. |
ElisionFilter(TokenStream input, String[] articles) | Deprecated. Use ElisionFilter(Version, TokenStream, Set) instead. |
ElisionFilter(TokenStream input, Set<?> articles) | Deprecated. Use ElisionFilter(Version, TokenStream, Set) instead. |
ElisionFilter(Version matchVersion, TokenStream input) | Constructs an elision filter with standard stop words. |
ElisionFilter(Version matchVersion, TokenStream input, Set<?> articles) | Constructs an elision filter with a Set of stop words. |
FrenchLightStemFilter(TokenStream input) | |
FrenchMinimalStemFilter(TokenStream input) | |
FrenchStemFilter(TokenStream in) | Deprecated. |
FrenchStemFilter(TokenStream in, Set<?> exclusiontable) | Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead. |

Modifier and Type | Class | Description |
---|---|---|
class | IrishLowerCaseFilter | Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., 'nAthair' should become 'n-athair'). |

Constructor | Description |
---|---|
IrishLowerCaseFilter(TokenStream in) | Create an IrishLowerCaseFilter that normalises Irish token text. |

Modifier and Type | Class | Description |
---|---|---|
class | GalicianMinimalStemFilter | A TokenFilter that applies GalicianMinimalStemmer to stem Galician words. |
class | GalicianStemFilter | A TokenFilter that applies GalicianStemmer to stem Galician words. |

Constructor | Description |
---|---|
GalicianMinimalStemFilter(TokenStream input) | |
GalicianStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | HindiNormalizationFilter | A TokenFilter that applies HindiNormalizer to normalize the orthography. |
class | HindiStemFilter | A TokenFilter that applies HindiStemmer to stem Hindi words. |

Constructor | Description |
---|---|
HindiNormalizationFilter(TokenStream input) | |
HindiStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | HungarianLightStemFilter | A TokenFilter that applies HungarianLightStemmer to stem Hungarian words. |

Constructor | Description |
---|---|
HungarianLightStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | HunspellStemFilter | Stemming TokenFilter that uses hunspell affix rules and words to stem tokens. |

Constructor | Description |
---|---|
HunspellStemFilter(TokenStream input, HunspellDictionary dictionary) | Creates a new HunspellStemFilter that will stem tokens from the given TokenStream using affix rules in the provided HunspellDictionary. |
HunspellStemFilter(TokenStream input, HunspellDictionary dictionary, boolean dedup) | Creates a new HunspellStemFilter that will stem tokens from the given TokenStream using affix rules in the provided HunspellDictionary. |

Modifier and Type | Class | Description |
---|---|---|
class | ICUFoldingFilter | A TokenFilter that applies search term folding to Unicode text, applying foldings from UTR#30 Character Foldings. |
class | ICUNormalizer2Filter | Normalizes token text with ICU's Normalizer2. |
class | ICUTransformFilter | A TokenFilter that transforms text with ICU. |

Constructor | Description |
---|---|
ICUFoldingFilter(TokenStream input) | Create a new ICUFoldingFilter on the specified input. |
ICUNormalizer2Filter(TokenStream input) | Create a new Normalizer2Filter that combines NFKC normalization and Case Folding, and removes Default Ignorables (NFKC_Casefold). |
ICUNormalizer2Filter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer) | Create a new Normalizer2Filter with the specified Normalizer2. |
ICUTransformFilter(TokenStream input, com.ibm.icu.text.Transliterator transform) | Create a new ICUTransformFilter that transforms text on the given stream. |
Modifier and Type | Class | Description |
---|---|---|
class | ICUTokenizer | Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/). |

Modifier and Type | Class | Description |
---|---|---|
class | IndonesianStemFilter | A TokenFilter that applies IndonesianStemmer to stem Indonesian words. |

Constructor | Description |
---|---|
IndonesianStemFilter(TokenStream input) | |
IndonesianStemFilter(TokenStream input, boolean stemDerivational) | Create a new IndonesianStemFilter. |

Modifier and Type | Class | Description |
---|---|---|
class | IndicNormalizationFilter | A TokenFilter that applies IndicNormalizer to normalize text in Indian languages. |
class | IndicTokenizer | Deprecated. (3.6) Use StandardTokenizer instead. |

Constructor | Description |
---|---|
IndicNormalizationFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | ItalianLightStemFilter | A TokenFilter that applies ItalianLightStemmer to stem Italian words. |

Constructor | Description |
---|---|
ItalianLightStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | JapaneseBaseFormFilter | Replaces term text with the BaseFormAttribute. |
class | JapaneseKatakanaStemFilter | A TokenFilter that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC). |
class | JapanesePartOfSpeechStopFilter | Removes tokens that match a set of part-of-speech tags. |
class | JapaneseReadingFormFilter | A TokenFilter that replaces the term attribute with the reading of a token in either katakana or romaji form. |
class | JapaneseTokenizer | Tokenizer for Japanese that uses morphological analysis. |

Constructor | Description |
---|---|
JapaneseBaseFormFilter(TokenStream input) | |
JapaneseKatakanaStemFilter(TokenStream input) | |
JapaneseKatakanaStemFilter(TokenStream input, int minimumLength) | |
JapanesePartOfSpeechStopFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTags) | |
JapaneseReadingFormFilter(TokenStream input) | |
JapaneseReadingFormFilter(TokenStream input, boolean useRomaji) | |

Modifier and Type | Class | Description |
---|---|---|
class | LatvianStemFilter | A TokenFilter that applies LatvianStemmer to stem Latvian words. |

Constructor | Description |
---|---|
LatvianStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | EmptyTokenStream | An always exhausted token stream. |
class | PrefixAndSuffixAwareTokenFilter | Links two PrefixAwareTokenFilters. |
class | PrefixAwareTokenFilter | Joins two token streams and leaves the last token of the first stream available to be used when updating the token values in the second stream based on that token. |
class | SingleTokenTokenStream | A TokenStream containing a single token. |
class | StemmerOverrideFilter | Provides the ability to override any KeywordAttribute-aware stemmer with custom dictionary-based stemming. |

Modifier and Type | Method | Description |
---|---|---|
TokenStream | PrefixAwareTokenFilter.getPrefix() | |
TokenStream | PrefixAwareTokenFilter.getSuffix() | |

Modifier and Type | Method | Description |
---|---|---|
void | PrefixAwareTokenFilter.setPrefix(TokenStream prefix) | |
void | PrefixAwareTokenFilter.setSuffix(TokenStream suffix) | |

Constructor | Description |
---|---|
PrefixAndSuffixAwareTokenFilter(TokenStream prefix, TokenStream input, TokenStream suffix) | |
PrefixAwareTokenFilter(TokenStream prefix, TokenStream suffix) | |
StemmerOverrideFilter(Version matchVersion, TokenStream input, Map<?,String> dictionary) | Create a new StemmerOverrideFilter, performing dictionary-based stemming with the provided dictionary. |

Modifier and Type | Class | Description |
---|---|---|
class | EdgeNGramTokenFilter | Tokenizes the given token into n-grams of given size(s). |
class | EdgeNGramTokenizer | Tokenizes the input from an edge into n-grams of given size(s). |
class | NGramTokenFilter | Tokenizes the input into n-grams of the given size(s). |
class | NGramTokenizer | Tokenizes the input into n-grams of the given size(s). |

Constructor | Description |
---|---|
EdgeNGramTokenFilter(TokenStream input, String sideLabel, int minGram, int maxGram) | Creates an EdgeNGramTokenFilter that can generate n-grams in the sizes of the given range. |
EdgeNGramTokenFilter(TokenStream input, EdgeNGramTokenFilter.Side side, int minGram, int maxGram) | Creates an EdgeNGramTokenFilter that can generate n-grams in the sizes of the given range. |
NGramTokenFilter(TokenStream input) | Creates an NGramTokenFilter with default min and max n-grams. |
NGramTokenFilter(TokenStream input, int minGram, int maxGram) | Creates an NGramTokenFilter with given min and max n-grams. |
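Edge n-grams are the usual building block for prefix matching and autocomplete: each term is expanded into its leading (or trailing) substrings. A sketch using the Side-based constructor listed above (gram sizes are an arbitrary choice):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.util.Version;

public class EdgeNGramDemo {
  public static TokenStream buildChain(String text) {
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(text));
    // For the term "lucene" this emits "lu", "luc", "luce":
    // front-anchored grams of length 2 through 4.
    return new EdgeNGramTokenFilter(ts, EdgeNGramTokenFilter.Side.FRONT, 2, 4);
  }
}
```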
Modifier and Type | Class | Description |
---|---|---|
class | DutchStemFilter | Deprecated. Use SnowballFilter with DutchStemmer instead, which has the same functionality. |

Constructor | Description |
---|---|
DutchStemFilter(TokenStream _in) | Deprecated. |
DutchStemFilter(TokenStream _in, Map<?,?> stemdictionary) | Deprecated. |
DutchStemFilter(TokenStream _in, Set<?> exclusiontable) | Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead. |
DutchStemFilter(TokenStream _in, Set<?> exclusiontable, Map<?,?> stemdictionary) | Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead. |

Modifier and Type | Class | Description |
---|---|---|
class | NorwegianLightStemFilter | A TokenFilter that applies NorwegianLightStemmer to stem Norwegian words. |
class | NorwegianMinimalStemFilter | A TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian words. |

Constructor | Description |
---|---|
NorwegianLightStemFilter(TokenStream input) | |
NorwegianMinimalStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | PathHierarchyTokenizer | Tokenizer for path-like hierarchies. |
class | ReversePathHierarchyTokenizer | Tokenizer for domain-like hierarchies. |

Modifier and Type | Class | Description |
---|---|---|
class | DelimitedPayloadTokenFilter | Characters before the delimiter are the "token", those after are the payload. |
class | NumericPayloadTokenFilter | Assigns a payload to a token based on the Token.type(). |
class | TokenOffsetPayloadTokenFilter | Stores the token's Token.setStartOffset(int) and Token.setEndOffset(int) in the payload; the first 4 bytes are the start offset. |
class | TypeAsPayloadTokenFilter | Makes the Token.type() a payload. |

Constructor | Description |
---|---|
DelimitedPayloadTokenFilter(TokenStream input, char delimiter, PayloadEncoder encoder) | |
NumericPayloadTokenFilter(TokenStream input, float payload, String typeMatch) | |
TokenOffsetPayloadTokenFilter(TokenStream input) | |
TypeAsPayloadTokenFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | BeiderMorseFilter | TokenFilter for Beider-Morse phonetic encoding. |
class | DoubleMetaphoneFilter | Filter for DoubleMetaphone (supporting secondary codes). |
class | PhoneticFilter | Create tokens for phonetic matches. |

Constructor | Description |
---|---|
BeiderMorseFilter(TokenStream input, org.apache.commons.codec.language.bm.PhoneticEngine engine) | |
BeiderMorseFilter(TokenStream input, org.apache.commons.codec.language.bm.PhoneticEngine engine, org.apache.commons.codec.language.bm.Languages.LanguageSet languages) | Create a new BeiderMorseFilter. |
DoubleMetaphoneFilter(TokenStream input, int maxCodeLength, boolean inject) | |
PhoneticFilter(TokenStream in, org.apache.commons.codec.Encoder encoder, boolean inject) | |

Modifier and Type | Class | Description |
---|---|---|
class | PositionFilter | Sets the position increment of all tokens to a configured value, except the first token, which retains its original position increment. |

Constructor | Description |
---|---|
PositionFilter(TokenStream input) | Constructs a PositionFilter that assigns a position increment of zero to all but the first token from the given input stream. |
PositionFilter(TokenStream input, int positionIncrement) | Constructs a PositionFilter that assigns the given position increment to all but the first token from the given input stream. |

Modifier and Type | Class | Description |
---|---|---|
class | PortugueseLightStemFilter | A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words. |
class | PortugueseMinimalStemFilter | A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words. |
class | PortugueseStemFilter | A TokenFilter that applies PortugueseStemmer to stem Portuguese words. |

Constructor | Description |
---|---|
PortugueseLightStemFilter(TokenStream input) | |
PortugueseMinimalStemFilter(TokenStream input) | |
PortugueseStemFilter(TokenStream input) | |

Modifier and Type | Method | Description |
---|---|---|
TokenStream | QueryAutoStopWordAnalyzer.reusableTokenStream(String fieldName, Reader reader) | |
TokenStream | QueryAutoStopWordAnalyzer.tokenStream(String fieldName, Reader reader) | |

Modifier and Type | Class | Description |
---|---|---|
class | ReverseStringFilter | Reverses token text, for example "country" => "yrtnuoc". |

Constructor | Description |
---|---|
ReverseStringFilter(TokenStream in) | Deprecated. Use ReverseStringFilter(Version, TokenStream) instead. |
ReverseStringFilter(TokenStream in, char marker) | Deprecated. Use ReverseStringFilter(Version, TokenStream, char) instead. |
ReverseStringFilter(Version matchVersion, TokenStream in) | Create a new ReverseStringFilter that reverses all tokens in the supplied TokenStream. |
ReverseStringFilter(Version matchVersion, TokenStream in, char marker) | Create a new ReverseStringFilter that reverses and marks all tokens in the supplied TokenStream. |

Modifier and Type | Class | Description |
---|---|---|
class | RussianLetterTokenizer | Deprecated. Use StandardTokenizer instead, which has the same functionality. |
class | RussianLightStemFilter | A TokenFilter that applies RussianLightStemmer to stem Russian words. |
class | RussianLowerCaseFilter | Deprecated. Use LowerCaseFilter instead, which has the same functionality. |
class | RussianStemFilter | Deprecated. Use SnowballFilter with RussianStemmer instead, which has the same functionality. |

Constructor | Description |
---|---|
RussianLightStemFilter(TokenStream input) | |
RussianLowerCaseFilter(TokenStream in) | Deprecated. |
RussianStemFilter(TokenStream in) | Deprecated. |

Modifier and Type | Class | Description |
---|---|---|
class | ShingleFilter | A ShingleFilter constructs shingles (token n-grams) from a token stream. |
class | ShingleMatrixFilter | Deprecated. Will be removed in Lucene 4.0. |

Modifier and Type | Method | Description |
---|---|---|
TokenStream | ShingleAnalyzerWrapper.reusableTokenStream(String fieldName, Reader reader) | |
TokenStream | ShingleAnalyzerWrapper.tokenStream(String fieldName, Reader reader) | |

Constructor | Description |
---|---|
ShingleFilter(TokenStream input) | Construct a ShingleFilter with default shingle size: 2. |
ShingleFilter(TokenStream input, int maxShingleSize) | Constructs a ShingleFilter with the specified shingle size from the TokenStream input. |
ShingleFilter(TokenStream input, int minShingleSize, int maxShingleSize) | Constructs a ShingleFilter with the specified shingle size from the TokenStream input. |
ShingleFilter(TokenStream input, String tokenType) | Construct a ShingleFilter with the specified token type for shingle tokens and the default shingle size: 2. |
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize) | Deprecated. Creates a shingle filter using default settings. |
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize, Character spacerCharacter) | Deprecated. Creates a shingle filter using default settings. |
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize, Character spacerCharacter, boolean ignoringSinglePrefixOrSuffixShingle) | Deprecated. Creates a shingle filter using the default ShingleMatrixFilter.TokenSettingsCodec. |
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize, Character spacerCharacter, boolean ignoringSinglePrefixOrSuffixShingle, ShingleMatrixFilter.TokenSettingsCodec settingsCodec) | Deprecated. Creates a shingle filter with ad hoc parameter settings. |
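ShingleFilter turns adjacent terms into word n-grams ("shingles"), a common trick for phrase-like matching without positional queries. A sketch using the two-argument constructor above (input text is arbitrary):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.util.Version;

public class ShingleDemo {
  public static TokenStream buildChain(String text) {
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(text));
    // For "please divide this" emits the unigrams plus the bigrams
    // "please divide" and "divide this".
    return new ShingleFilter(ts, 2);
  }
}
```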
Modifier and Type | Class | Description |
---|---|---|
class | SnowballFilter | A filter that stems words using a Snowball-generated stemmer. |

Modifier and Type | Method | Description |
---|---|---|
TokenStream | SnowballAnalyzer.reusableTokenStream(String fieldName, Reader reader) | Deprecated. Returns a (possibly reused) StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter, a StopFilter, and a SnowballFilter. |
TokenStream | SnowballAnalyzer.tokenStream(String fieldName, Reader reader) | Deprecated. Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter, a StopFilter, and a SnowballFilter. |

Constructor | Description |
---|---|
SnowballFilter(TokenStream in, String name) | Construct the named stemming filter. |
SnowballFilter(TokenStream input, SnowballProgram stemmer) | |
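SnowballFilter selects a generated stemmer class by language name, so one filter class serves all Snowball languages. A sketch (the name "English" is one of the standard Snowball stemmer names; lowercasing first is the conventional setup, since Snowball stemmers expect lowercase input):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class SnowballDemo {
  public static TokenStream buildChain(String text) {
    TokenStream ts = new StandardTokenizer(Version.LUCENE_36, new StringReader(text));
    ts = new LowerCaseFilter(Version.LUCENE_36, ts);
    return new SnowballFilter(ts, "English");  // e.g. "running" -> "run"
  }
}
```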
Modifier and Type | Class | Description |
---|---|---|
class | ClassicFilter | Normalizes tokens extracted with ClassicTokenizer. |
class | ClassicTokenizer | A grammar-based tokenizer constructed with JFlex. |
class | StandardFilter | Normalizes tokens extracted with StandardTokenizer. |
class | StandardTokenizer | A grammar-based tokenizer constructed with JFlex. |
class | UAX29URLEmailTokenizer | This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29; URLs and email addresses are also tokenized according to the relevant RFCs. |

Constructor | Description |
---|---|
ClassicFilter(TokenStream in) | Constructs a filter over the given input. |
StandardFilter(TokenStream in) | Deprecated. Use StandardFilter(Version, TokenStream) instead. |
StandardFilter(Version matchVersion, TokenStream in) | |

Modifier and Type | Class | Description |
---|---|---|
class | StempelFilter | Transforms the token stream as per the stemming algorithm. |

Constructor | Description |
---|---|
StempelFilter(TokenStream in, StempelStemmer stemmer) | Create filter using the supplied stemming table. |
StempelFilter(TokenStream in, StempelStemmer stemmer, int minLength) | Create filter using the supplied stemming table. |

Modifier and Type | Class | Description |
---|---|---|
class | SwedishLightStemFilter | A TokenFilter that applies SwedishLightStemmer to stem Swedish words. |

Constructor | Description |
---|---|
SwedishLightStemFilter(TokenStream input) | |

Modifier and Type | Class | Description |
---|---|---|
class | SynonymFilter | Matches single- or multi-word synonyms in a token stream. |

Constructor | Description |
---|---|
SynonymFilter(TokenStream input, SynonymMap synonyms, boolean ignoreCase) | |

Modifier and Type | Class | Description |
---|---|---|
class | ThaiWordFilter | TokenFilter that uses BreakIterator to break each Token that is Thai into separate Tokens for each Thai word. |

Constructor | Description |
---|---|
ThaiWordFilter(TokenStream input) | Deprecated. Use the constructor with matchVersion instead. |
ThaiWordFilter(Version matchVersion, TokenStream input) | Creates a new ThaiWordFilter with the specified match version. |

Modifier and Type | Class | Description |
---|---|---|
class | TurkishLowerCaseFilter | Normalizes Turkish token text to lower case. |

Constructor | Description |
---|---|
TurkishLowerCaseFilter(TokenStream in) | Create a new TurkishLowerCaseFilter that normalizes Turkish token text to lower case. |

Modifier and Type | Class | Description |
---|---|---|
class | WikipediaTokenizer | Extension of StandardTokenizer that is aware of Wikipedia syntax. |

Modifier and Type | Class | Description |
---|---|---|
class | CollationKeyFilter | Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. |
class | ICUCollationKeyFilter | Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. |

Modifier and Type | Method | Description |
---|---|---|
TokenStream | CollationKeyAnalyzer.reusableTokenStream(String fieldName, Reader reader) | |
TokenStream | ICUCollationKeyAnalyzer.reusableTokenStream(String fieldName, Reader reader) | |
TokenStream | CollationKeyAnalyzer.tokenStream(String fieldName, Reader reader) | |
TokenStream | ICUCollationKeyAnalyzer.tokenStream(String fieldName, Reader reader) | |

Constructor | Description |
---|---|
CollationKeyFilter(TokenStream input, Collator collator) | |
ICUCollationKeyFilter(TokenStream input, com.ibm.icu.text.Collator collator) | |

Modifier and Type | Field | Description |
---|---|---|
protected TokenStream | AbstractField.tokenStream | |

Modifier and Type | Method | Description |
---|---|---|
TokenStream | Field.tokenStreamValue() | The TokenStream for this field to be used when indexing, or null. |
TokenStream | Fieldable.tokenStreamValue() | The TokenStream for this field to be used when indexing, or null. |
TokenStream | NumericField.tokenStreamValue() | Returns a NumericTokenStream for indexing the numeric value. |

Modifier and Type | Method | Description |
---|---|---|
void | Field.setTokenStream(TokenStream tokenStream) | Expert: sets the token stream to be used for indexing and causes isIndexed() and isTokenized() to return true. |

Constructor | Description |
---|---|
Field(String name, TokenStream tokenStream) | Create a tokenized and indexed field that is not stored. |
Field(String name, TokenStream tokenStream, Field.TermVector termVector) | Create a tokenized and indexed field that is not stored, optionally with storing term vectors. |
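The Field(String, TokenStream) constructors above let you index a pre-built token stream directly, rather than having the IndexWriter's Analyzer tokenize stored text. A sketch (writer setup is omitted, and the analyzer, field name, and version constant are placeholder choices):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.util.Version;

public class PreAnalyzedFieldDemo {
  public static Document buildDocument(String text) {
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
    Document doc = new Document();
    doc.add(new Field("body", ts));  // tokenized and indexed, but not stored
    return doc;                      // ready to pass to an IndexWriter
  }
}
```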
Modifier and Type | Class | Description |
---|---|---|
class | EnhancementsCategoryTokenizer | A tokenizer which adds to each category token a payload according to the CategoryEnhancements defined in the given EnhancementsIndexingParams. |

Modifier and Type | Method | Description |
---|---|---|
protected TokenStream | EnhancementsDocumentBuilder.getParentsStream(CategoryAttributesStream categoryAttributesStream) | |

Modifier and Type | Method | Description |
---|---|---|
CategoryListTokenizer | CategoryEnhancement.getCategoryListTokenizer(TokenStream tokenizer, EnhancementsIndexingParams indexingParams, TaxonomyWriter taxonomyWriter) | Get the CategoryListTokenizer which generates the category list for this enhancement. |
protected CategoryListTokenizer | EnhancementsDocumentBuilder.getCategoryListTokenizer(TokenStream categoryStream) | |
protected CategoryTokenizer | EnhancementsDocumentBuilder.getCategoryTokenizer(TokenStream categoryStream) | |

Constructor | Description |
---|---|
EnhancementsCategoryTokenizer(TokenStream input, EnhancementsIndexingParams indexingParams) | Constructor. |

Modifier and Type | Class | Description |
---|---|---|
class | AssociationListTokenizer | Tokenizer for associations of a category. |

Modifier and Type | Method | Description |
---|---|---|
CategoryListTokenizer | AssociationEnhancement.getCategoryListTokenizer(TokenStream tokenizer, EnhancementsIndexingParams indexingParams, TaxonomyWriter taxonomyWriter) | |

Constructor | Description |
---|---|
AssociationListTokenizer(TokenStream input, EnhancementsIndexingParams indexingParams, CategoryEnhancement enhancement) | |

Modifier and Type | Method | Description |
---|---|---|
protected TokenStream | CategoryDocumentBuilder.getParentsStream(CategoryAttributesStream categoryAttributesStream) | Get a stream of categories which includes the parents, according to policies defined in indexing parameters. |

Modifier and Type | Method | Description |
---|---|---|
protected CategoryListTokenizer | CategoryDocumentBuilder.getCategoryListTokenizer(TokenStream categoryStream) | Get a category list tokenizer (or a series of such tokenizers) to create the category list tokens. |
protected CategoryTokenizer | CategoryDocumentBuilder.getCategoryTokenizer(TokenStream categoryStream) | Get a CategoryTokenizer to create the category tokens. |
protected CountingListTokenizer | CategoryDocumentBuilder.getCountingListTokenizer(TokenStream categoryStream) | Get a CountingListTokenizer for creating counting list tokens. |

Modifier and Type | Class | Description |
---|---|---|
class | CategoryAttributesStream | An attribute stream built from an Iterable of CategoryAttribute. |
class | CategoryListTokenizer | A base class for category list tokenizers, which add category list tokens to category streams. |
class | CategoryParentsStream | This class adds parents to a CategoryAttributesStream. |
class | CategoryTokenizer | Basic class for setting the CharTermAttributes and PayloadAttributes of category tokens. |
class | CategoryTokenizerBase | A base class for all token filters which add term and payload attributes to tokens and are to be used in CategoryDocumentBuilder. |
class | CountingListTokenizer | CategoryListTokenizer for facet counting. |

Constructor | Description |
---|---|
CategoryListTokenizer(TokenStream input, FacetIndexingParams indexingParams) | |
CategoryTokenizer(TokenStream input, FacetIndexingParams indexingParams) | |
CategoryTokenizerBase(TokenStream input, FacetIndexingParams indexingParams) | Constructor. |
CountingListTokenizer(TokenStream input, FacetIndexingParams indexingParams) | |

Modifier and Type | Method | Description |
---|---|---|
<T> TokenStream | MemoryIndex.keywordTokenStream(Collection<T> keywords) | Convenience method; creates and returns a token stream that generates a token for each keyword in the given collection, "as is", without any transforming text analysis. |

Modifier and Type | Method | Description |
---|---|---|
void | MemoryIndex.addField(String fieldName, TokenStream stream) | Equivalent to addField(fieldName, stream, 1.0f). |
void | MemoryIndex.addField(String fieldName, TokenStream stream, float boost) | Iterates over the given token stream and adds the resulting terms to the index; equivalent to adding a tokenized, indexed, termVectorStored, unstored, Lucene Field. |
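MemoryIndex is a single-document, RAM-only index: addField consumes a TokenStream into the index, and search scores a query directly against that one document. A sketch (analyzer, field name, and sample text are illustrative):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;

public class MemoryIndexDemo {
  public static void main(String[] args) {
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    MemoryIndex index = new MemoryIndex();
    index.addField("body",
        analyzer.tokenStream("body", new StringReader("a quick brown fox")));
    // search() returns a relevance score; > 0 means the document matched.
    float score = index.search(new TermQuery(new Term("body", "fox")));
    System.out.println(score > 0 ? "match, score " + score : "no match");
  }
}
```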
Modifier and Type | Class | Description |
---|---|---|
static class | QueryParserTestBase.QPTestFilter | Filter which discards the token 'stop' and which expands the token 'phrase' into 'phrase1 phrase2'. |

Modifier and Type | Method | Description |
---|---|---|
TokenStream | QueryParserTestBase.QPTestAnalyzer.tokenStream(String fieldName, Reader reader) | |

Constructor | Description |
---|---|
QPTestFilter(TokenStream in) | |

Modifier and Type | Class | Description |
---|---|---|
class | OffsetLimitTokenFilter | This TokenFilter limits the number of tokens while indexing by adding up the current offset. |
class | TokenStreamFromTermPositionVector | |

Modifier and Type | Method | Description |
---|---|---|
static TokenStream | TokenSources.getAnyTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer) | A convenience method that tries a number of approaches to getting a token stream. |
static TokenStream | TokenSources.getAnyTokenStream(IndexReader reader, int docId, String field, Document doc, Analyzer analyzer) | A convenience method that tries to first get a TermPositionVector for the specified docId, then falls back to using the passed-in Document to retrieve the TokenStream. |
static TokenStream | TokenSources.getTokenStream(String field, String contents, Analyzer analyzer) | |
static TokenStream | TokenSources.getTokenStream(Document doc, String field, Analyzer analyzer) | |
static TokenStream | TokenSources.getTokenStream(IndexReader reader, int docId, String field) | |
static TokenStream | TokenSources.getTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer) | |
static TokenStream | TokenSources.getTokenStream(TermPositionVector tpv) | |
static TokenStream | TokenSources.getTokenStream(TermPositionVector tpv, boolean tokenPositionsGuaranteedContiguous) | Low level api. |
TokenStream | WeightedSpanTermExtractor.getTokenStream() | |
TokenStream | QueryScorer.init(TokenStream tokenStream) | |
TokenStream | QueryTermScorer.init(TokenStream tokenStream) | |
TokenStream | Scorer.init(TokenStream tokenStream) | Called to init the Scorer with a TokenStream. |

Modifier and Type | Method | Description |
---|---|---|
String | Highlighter.getBestFragment(TokenStream tokenStream, String text) | Highlights chosen terms in a text, extracting the most relevant section. |
String[] | Highlighter.getBestFragments(TokenStream tokenStream, String text, int maxNumFragments) | Highlights chosen terms in a text, extracting the most relevant sections. |
String | Highlighter.getBestFragments(TokenStream tokenStream, String text, int maxNumFragments, String separator) | Highlights terms in the text, extracting the most relevant sections and concatenating the chosen fragments with a separator (typically "..."). |
TextFragment[] | Highlighter.getBestTextFragments(TokenStream tokenStream, String text, boolean mergeContiguousFragments, int maxNumFragments) | Low level api to get the most relevant (formatted) sections of the document. |
Map<String,WeightedSpanTerm> | WeightedSpanTermExtractor.getWeightedSpanTerms(Query query, TokenStream tokenStream) | Creates a Map of WeightedSpanTerms from the given Query and TokenStream. |
Map<String,WeightedSpanTerm> | WeightedSpanTermExtractor.getWeightedSpanTerms(Query query, TokenStream tokenStream, String fieldName) | Creates a Map of WeightedSpanTerms from the given Query and TokenStream. |
Map<String,WeightedSpanTerm> | WeightedSpanTermExtractor.getWeightedSpanTermsWithScores(Query query, TokenStream tokenStream, String fieldName, IndexReader reader) | Creates a Map of WeightedSpanTerms from the given Query and TokenStream. |
TokenStream | QueryScorer.init(TokenStream tokenStream) | |
TokenStream | QueryTermScorer.init(TokenStream tokenStream) | |
TokenStream | Scorer.init(TokenStream tokenStream) | Called to init the Scorer with a TokenStream. |
void | Fragmenter.start(String originalText, TokenStream tokenStream) | Initializes the Fragmenter. |
void | NullFragmenter.start(String s, TokenStream tokenStream) | |
void | SimpleFragmenter.start(String originalText, TokenStream stream) | |
void | SimpleSpanFragmenter.start(String originalText, TokenStream tokenStream) | |
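The Highlighter methods above take a fresh TokenStream over the same text being highlighted: a Scorer weights the terms from the query, and the default Fragmenter and Formatter slice the text and mark matches. A sketch using getBestFragment (query, field name, and sample text are illustrative):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.util.Version;

public class HighlightDemo {
  public static void main(String[] args) throws Exception {
    String text = "The quick brown fox jumps over the lazy dog";
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    Highlighter highlighter =
        new Highlighter(new QueryScorer(new TermQuery(new Term("body", "fox"))));
    TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
    // With the default formatter, matches are wrapped in <B>...</B>.
    System.out.println(highlighter.getBestFragment(ts, text));
  }
}
```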
Constructor | Description |
---|---|
OffsetLimitTokenFilter(TokenStream input, int offsetLimit) | |
TokenGroup(TokenStream tokenStream) | |