Package | Description |
---|---|
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. |
org.apache.lucene.analysis.ar | Analyzer for Arabic. |
org.apache.lucene.analysis.bg | Analyzer for Bulgarian. |
org.apache.lucene.analysis.br | Analyzer for Brazilian Portuguese. |
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters). |
org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters). |
org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words. |
org.apache.lucene.analysis.compound | A filter that decomposes compound words found in many Germanic languages into their word parts. |
org.apache.lucene.analysis.cz | Analyzer for Czech. |
org.apache.lucene.analysis.de | Analyzer for German. |
org.apache.lucene.analysis.el | Analyzer for Greek. |
org.apache.lucene.analysis.en | Analyzer for English. |
org.apache.lucene.analysis.es | Analyzer for Spanish. |
org.apache.lucene.analysis.fa | Analyzer for Persian. |
org.apache.lucene.analysis.fi | Analyzer for Finnish. |
org.apache.lucene.analysis.fr | Analyzer for French. |
org.apache.lucene.analysis.ga | Analyzer for Irish. |
org.apache.lucene.analysis.gl | Analyzer for Galician. |
org.apache.lucene.analysis.hi | Analyzer for Hindi. |
org.apache.lucene.analysis.hu | Analyzer for Hungarian. |
org.apache.lucene.analysis.hunspell | Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm. |
org.apache.lucene.analysis.icu | Analysis components based on ICU. |
org.apache.lucene.analysis.id | Analyzer for Indonesian. |
org.apache.lucene.analysis.in | Analysis components for Indian languages. |
org.apache.lucene.analysis.it | Analyzer for Italian. |
org.apache.lucene.analysis.ja | Analyzer for Japanese. |
org.apache.lucene.analysis.lv | Analyzer for Latvian. |
org.apache.lucene.analysis.miscellaneous | Miscellaneous TokenStreams. |
org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters. |
org.apache.lucene.analysis.nl | Analyzer for Dutch. |
org.apache.lucene.analysis.no | Analyzer for Norwegian. |
org.apache.lucene.analysis.payloads | Provides various convenience classes for creating payloads on Tokens. |
org.apache.lucene.analysis.phonetic | Analysis components for phonetic search. |
org.apache.lucene.analysis.position | Filter for assigning position increments. |
org.apache.lucene.analysis.pt | Analyzer for Portuguese. |
org.apache.lucene.analysis.reverse | Filter to reverse token text. |
org.apache.lucene.analysis.ru | Analyzer for Russian. |
org.apache.lucene.analysis.shingle | Word n-gram filters. |
org.apache.lucene.analysis.snowball | TokenFilter and Analyzer implementations that use Snowball stemmers. |
org.apache.lucene.analysis.standard | Standards-based analyzers implemented with JFlex. |
org.apache.lucene.analysis.stempel | Stempel: algorithmic stemmer. |
org.apache.lucene.analysis.sv | Analyzer for Swedish. |
org.apache.lucene.analysis.synonym | Analysis components for synonyms. |
org.apache.lucene.analysis.th | Analyzer for Thai. |
org.apache.lucene.analysis.tr | Analyzer for Turkish. |
org.apache.lucene.collation | CollationKeyFilter converts each token into its binary CollationKey using the provided Collator, and then encodes the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term. |
org.apache.lucene.facet.enhancements | Enhanced category features. |
org.apache.lucene.facet.enhancements.association | Association category enhancements. |
org.apache.lucene.facet.index.streaming | Expert: attribute streaming definitions for indexing facets. |
org.apache.lucene.queryParser | A simple query parser implemented with JavaCC. |
org.apache.lucene.search.highlight | Classes that provide "keyword in context" features, typically used to highlight search terms in the text of results pages. |
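All the analyzer packages above build on the same composition pattern: a token filter wraps another token source and transforms tokens as they stream through it. The following is a minimal conceptual sketch of that decorator pattern in plain Java; it is not the Lucene API, and every name in it is illustrative only:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Illustrative token source: yields tokens one at a time, null when exhausted.
interface TokenSource {
    String next();
}

// Stand-in for a Tokenizer: emits a fixed list of tokens.
class ListTokenizer implements TokenSource {
    private final Iterator<String> it;
    ListTokenizer(List<String> tokens) { this.it = tokens.iterator(); }
    public String next() { return it.hasNext() ? it.next() : null; }
}

// Stand-in for a TokenFilter: wraps another source and rewrites each token.
class LowerCaseTokenFilter implements TokenSource {
    private final TokenSource input;
    LowerCaseTokenFilter(TokenSource input) { this.input = input; }
    public String next() {
        String t = input.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}
```

Real Lucene filters follow this shape but pull attribute state (term text, offsets, position increments) through the chain rather than plain strings.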
Classes in org.apache.lucene.analysis:

Modifier and Type | Class | Description |
---|---|---|
class | ASCIIFoldingFilter | This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists. |
class | CachingTokenFilter | This class can be used if the token attributes of a TokenStream are intended to be consumed more than once. |
class | FilteringTokenFilter | Abstract base class for TokenFilters that may remove tokens. |
class | ISOLatin1AccentFilter | Deprecated. If you build a new index, use ASCIIFoldingFilter instead, which covers a superset of Latin 1. |
class | KeywordMarkerFilter | Marks terms as keywords via the KeywordAttribute. |
class | LengthFilter | Removes words that are too long or too short from the stream. |
class | LimitTokenCountFilter | This TokenFilter limits the number of tokens while indexing. |
class | LookaheadTokenFilter<T extends LookaheadTokenFilter.Position> | An abstract TokenFilter to make it easier to build graph token filters requiring some lookahead. |
class | LowerCaseFilter | Normalizes token text to lower case. |
class | MockFixedLengthPayloadFilter | TokenFilter that adds random fixed-length payloads. |
class | MockGraphTokenFilter | Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1. |
class | MockHoleInjectingTokenFilter | |
class | MockRandomLookaheadTokenFilter | Uses LookaheadTokenFilter to randomly peek at future tokens. |
class | MockVariableLengthPayloadFilter | TokenFilter that adds random variable-length payloads. |
class | PorterStemFilter | Transforms the token stream as per the Porter stemming algorithm. |
class | StopFilter | Removes stop words from a token stream. |
class | TeeSinkTokenFilter | This TokenFilter provides the ability to set aside attribute states that have already been analyzed. |
class | TypeTokenFilter | Removes tokens whose types appear in a set of blocked types from a token stream. |
class | ValidatingTokenFilter | A TokenFilter that checks consistency of the tokens (e.g., that offsets are consistent with one another). |
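As a rough illustration of what a chain of two filters from the table above produces, here is a hedged sketch of the combined effect of LowerCaseFilter followed by StopFilter, modeled on plain string lists. This is illustrative only; the real filters operate on attribute-based TokenStreams, not Lists:

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Conceptual model of a LowerCaseFilter + StopFilter chain.
class StopFilterDemo {
    static List<String> analyze(List<String> tokens, Set<String> stopWords) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT)) // LowerCaseFilter step
                .filter(t -> !stopWords.contains(t))  // StopFilter step
                .collect(Collectors.toList());
    }
}
```

Ordering matters in the real chain for the same reason it does here: stop words are matched against the already-lowercased term text.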
Classes in org.apache.lucene.analysis.ar:

Modifier and Type | Class | Description |
---|---|---|
class | ArabicNormalizationFilter | A TokenFilter that applies ArabicNormalizer to normalize the orthography. |
class | ArabicStemFilter | A TokenFilter that applies ArabicStemmer to stem Arabic words. |

Classes in org.apache.lucene.analysis.bg:

Modifier and Type | Class | Description |
---|---|---|
class | BulgarianStemFilter | A TokenFilter that applies BulgarianStemmer to stem Bulgarian words. |

Classes in org.apache.lucene.analysis.br:

Modifier and Type | Class | Description |
---|---|---|
class | BrazilianStemFilter | A TokenFilter that applies BrazilianStemmer. |

Classes in org.apache.lucene.analysis.cjk:

Modifier and Type | Class | Description |
---|---|---|
class | CJKBigramFilter | Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer. |
class | CJKWidthFilter | A TokenFilter that normalizes CJK width differences: folds fullwidth ASCII variants into the equivalent Basic Latin forms, and halfwidth Katakana variants into the equivalent kana. |
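The overlapping-bigram scheme that CJKBigramFilter applies to runs of Han characters can be sketched as follows. This is a conceptual model, not the actual filter code, which operates on codepoints within an attribute-based stream:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch: every pair of adjacent characters in a run becomes
// one indexed term, and consecutive bigrams overlap by one character.
class BigramDemo {
    static List<String> bigrams(String run) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        return out;
    }
}
```

A three-character run therefore yields two overlapping terms, which is why bigram indexing finds any two-character word regardless of where it starts.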
Classes in org.apache.lucene.analysis.cn:

Modifier and Type | Class | Description |
---|---|---|
class | ChineseFilter | Deprecated. Use StopFilter instead, which has the same functionality. |

Classes in org.apache.lucene.analysis.cn.smart:

Modifier and Type | Class | Description |
---|---|---|
class | WordTokenFilter | A TokenFilter that breaks sentences into words. |

Classes in org.apache.lucene.analysis.compound:

Modifier and Type | Class | Description |
---|---|---|
class | CompoundWordTokenFilterBase | Base class for decomposition token filters. |
class | DictionaryCompoundWordTokenFilter | A TokenFilter that decomposes compound words found in many Germanic languages. |
class | HyphenationCompoundWordTokenFilter | A TokenFilter that decomposes compound words found in many Germanic languages. |

Classes in org.apache.lucene.analysis.cz:

Modifier and Type | Class | Description |
---|---|---|
class | CzechStemFilter | A TokenFilter that applies CzechStemmer to stem Czech words. |

Classes in org.apache.lucene.analysis.de:

Modifier and Type | Class | Description |
---|---|---|
class | GermanLightStemFilter | A TokenFilter that applies GermanLightStemmer to stem German words. |
class | GermanMinimalStemFilter | A TokenFilter that applies GermanMinimalStemmer to stem German words. |
class | GermanNormalizationFilter | Normalizes German characters according to the heuristics of the German2 Snowball algorithm. |
class | GermanStemFilter | A TokenFilter that stems German words. |

Classes in org.apache.lucene.analysis.el:

Modifier and Type | Class | Description |
---|---|---|
class | GreekLowerCaseFilter | Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma to sigma. |
class | GreekStemFilter | A TokenFilter that applies GreekStemmer to stem Greek words. |

Classes in org.apache.lucene.analysis.en:

Modifier and Type | Class | Description |
---|---|---|
class | EnglishMinimalStemFilter | A TokenFilter that applies EnglishMinimalStemmer to stem English words. |
class | EnglishPossessiveFilter | TokenFilter that removes possessives (trailing 's) from words. |
class | KStemFilter | A high-performance kstem filter for English. |

Classes in org.apache.lucene.analysis.es:

Modifier and Type | Class | Description |
---|---|---|
class | SpanishLightStemFilter | A TokenFilter that applies SpanishLightStemmer to stem Spanish words. |

Classes in org.apache.lucene.analysis.fa:

Modifier and Type | Class | Description |
---|---|---|
class | PersianNormalizationFilter | A TokenFilter that applies PersianNormalizer to normalize the orthography. |

Classes in org.apache.lucene.analysis.fi:

Modifier and Type | Class | Description |
---|---|---|
class | FinnishLightStemFilter | A TokenFilter that applies FinnishLightStemmer to stem Finnish words. |

Classes in org.apache.lucene.analysis.fr:

Modifier and Type | Class | Description |
---|---|---|
class | ElisionFilter | Removes elisions from a TokenStream. |
class | FrenchLightStemFilter | A TokenFilter that applies FrenchLightStemmer to stem French words. |
class | FrenchMinimalStemFilter | A TokenFilter that applies FrenchMinimalStemmer to stem French words. |
class | FrenchStemFilter | Deprecated. Use SnowballFilter with FrenchStemmer instead, which has the same functionality. |

Classes in org.apache.lucene.analysis.ga:

Modifier and Type | Class | Description |
---|---|---|
class | IrishLowerCaseFilter | Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., 'nAthair' becomes 'n-athair'). |

Classes in org.apache.lucene.analysis.gl:

Modifier and Type | Class | Description |
---|---|---|
class | GalicianMinimalStemFilter | A TokenFilter that applies GalicianMinimalStemmer to stem Galician words. |
class | GalicianStemFilter | A TokenFilter that applies GalicianStemmer to stem Galician words. |

Classes in org.apache.lucene.analysis.hi:

Modifier and Type | Class | Description |
---|---|---|
class | HindiNormalizationFilter | A TokenFilter that applies HindiNormalizer to normalize the orthography. |
class | HindiStemFilter | A TokenFilter that applies HindiStemmer to stem Hindi words. |

Classes in org.apache.lucene.analysis.hu:

Modifier and Type | Class | Description |
---|---|---|
class | HungarianLightStemFilter | A TokenFilter that applies HungarianLightStemmer to stem Hungarian words. |

Classes in org.apache.lucene.analysis.hunspell:

Modifier and Type | Class | Description |
---|---|---|
class | HunspellStemFilter | TokenFilter that uses Hunspell affix rules and words to stem tokens. |

Classes in org.apache.lucene.analysis.icu:

Modifier and Type | Class | Description |
---|---|---|
class | ICUFoldingFilter | A TokenFilter that applies search term folding to Unicode text, applying foldings from UTR#30 Character Foldings. |
class | ICUNormalizer2Filter | Normalizes token text with ICU's Normalizer2. |
class | ICUTransformFilter | A TokenFilter that transforms text with ICU. |

Classes in org.apache.lucene.analysis.id:

Modifier and Type | Class | Description |
---|---|---|
class | IndonesianStemFilter | A TokenFilter that applies IndonesianStemmer to stem Indonesian words. |

Classes in org.apache.lucene.analysis.in:

Modifier and Type | Class | Description |
---|---|---|
class | IndicNormalizationFilter | A TokenFilter that applies IndicNormalizer to normalize text in Indian languages. |

Classes in org.apache.lucene.analysis.it:

Modifier and Type | Class | Description |
---|---|---|
class | ItalianLightStemFilter | A TokenFilter that applies ItalianLightStemmer to stem Italian words. |

Classes in org.apache.lucene.analysis.ja:

Modifier and Type | Class | Description |
---|---|---|
class | JapaneseBaseFormFilter | Replaces term text with the BaseFormAttribute. |
class | JapaneseKatakanaStemFilter | A TokenFilter that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC). |
class | JapanesePartOfSpeechStopFilter | Removes tokens that match a set of part-of-speech tags. |
class | JapaneseReadingFormFilter | A TokenFilter that replaces the term attribute with the reading of a token in either katakana or romaji form. |

Classes in org.apache.lucene.analysis.lv:

Modifier and Type | Class | Description |
---|---|---|
class | LatvianStemFilter | A TokenFilter that applies LatvianStemmer to stem Latvian words. |

Classes in org.apache.lucene.analysis.miscellaneous:

Modifier and Type | Class | Description |
---|---|---|
class | StemmerOverrideFilter | Provides the ability to override any KeywordAttribute-aware stemmer with custom dictionary-based stemming. |

Classes in org.apache.lucene.analysis.ngram:

Modifier and Type | Class | Description |
---|---|---|
class | EdgeNGramTokenFilter | Tokenizes the given token into n-grams of given size(s). |
class | NGramTokenFilter | Tokenizes the input into n-grams of the given size(s). |
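Character n-gram generation of the kind NGramTokenFilter performs can be sketched as below; EdgeNGramTokenFilter keeps only the grams anchored at one edge of the token. This is a conceptual model, not the filter implementation, which works on attribute streams and supports a minimum/maximum gram range:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch: slide a window of width n across the term text,
// emitting one gram per position.
class NGramDemo {
    static List<String> ngrams(String term, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            out.add(term.substring(i, i + n));
        }
        return out;
    }
}
```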
Classes in org.apache.lucene.analysis.nl:

Modifier and Type | Class | Description |
---|---|---|
class | DutchStemFilter | Deprecated. Use SnowballFilter with DutchStemmer instead, which has the same functionality. |

Classes in org.apache.lucene.analysis.no:

Modifier and Type | Class | Description |
---|---|---|
class | NorwegianLightStemFilter | A TokenFilter that applies NorwegianLightStemmer to stem Norwegian words. |
class | NorwegianMinimalStemFilter | A TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian words. |

Classes in org.apache.lucene.analysis.payloads:

Modifier and Type | Class | Description |
---|---|---|
class | DelimitedPayloadTokenFilter | Characters before the delimiter are the "token"; those after are the payload. |
class | NumericPayloadTokenFilter | Assigns a payload to a token based on the Token.type(). |
class | TokenOffsetPayloadTokenFilter | Stores the token's Token.setStartOffset(int) and Token.setEndOffset(int) values in its payload; the first 4 bytes are the start offset. |
class | TypeAsPayloadTokenFilter | Makes the Token.type() a payload. |

Classes in org.apache.lucene.analysis.phonetic:

Modifier and Type | Class | Description |
---|---|---|
class | BeiderMorseFilter | TokenFilter for Beider-Morse phonetic encoding. |
class | DoubleMetaphoneFilter | Filter for DoubleMetaphone (supporting secondary codes). |
class | PhoneticFilter | Creates tokens for phonetic matches. |

Classes in org.apache.lucene.analysis.position:

Modifier and Type | Class | Description |
---|---|---|
class | PositionFilter | Sets the positionIncrement of all tokens to a configured value, except for the first token returned, which retains its original positionIncrement. |

Classes in org.apache.lucene.analysis.pt:

Modifier and Type | Class | Description |
---|---|---|
class | PortugueseLightStemFilter | A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words. |
class | PortugueseMinimalStemFilter | A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words. |
class | PortugueseStemFilter | A TokenFilter that applies PortugueseStemmer to stem Portuguese words. |

Classes in org.apache.lucene.analysis.reverse:

Modifier and Type | Class | Description |
---|---|---|
class | ReverseStringFilter | Reverses the token text, for example "country" => "yrtnuoc". |

Classes in org.apache.lucene.analysis.ru:

Modifier and Type | Class | Description |
---|---|---|
class | RussianLightStemFilter | A TokenFilter that applies RussianLightStemmer to stem Russian words. |
class | RussianLowerCaseFilter | Deprecated. Use LowerCaseFilter instead, which has the same functionality. |
class | RussianStemFilter | Deprecated. Use SnowballFilter with RussianStemmer instead, which has the same functionality. |

Classes in org.apache.lucene.analysis.shingle:

Modifier and Type | Class | Description |
---|---|---|
class | ShingleFilter | A ShingleFilter constructs shingles (token n-grams) from a token stream. |
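Shingle formation can be sketched as word-level n-grams over the token sequence. This is conceptual only; the real ShingleFilter also emits the original unigrams by default and manages position increments and holes in the stream:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch: join each run of `size` adjacent tokens into one
// space-separated shingle, overlapping by size - 1 tokens.
class ShingleDemo {
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }
}
```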
Classes in org.apache.lucene.analysis.snowball:

Modifier and Type | Class | Description |
---|---|---|
class | SnowballFilter | A filter that stems words using a Snowball-generated stemmer. |

Classes in org.apache.lucene.analysis.standard:

Modifier and Type | Class | Description |
---|---|---|
class | ClassicFilter | Normalizes tokens extracted with ClassicTokenizer. |
class | StandardFilter | Normalizes tokens extracted with StandardTokenizer. |

Classes in org.apache.lucene.analysis.stempel:

Modifier and Type | Class | Description |
---|---|---|
class | StempelFilter | Transforms the token stream as per the stemming algorithm. |

Classes in org.apache.lucene.analysis.sv:

Modifier and Type | Class | Description |
---|---|---|
class | SwedishLightStemFilter | A TokenFilter that applies SwedishLightStemmer to stem Swedish words. |

Classes in org.apache.lucene.analysis.synonym:

Modifier and Type | Class | Description |
---|---|---|
class | SynonymFilter | Matches single- or multi-word synonyms in a token stream. |

Classes in org.apache.lucene.analysis.th:

Modifier and Type | Class | Description |
---|---|---|
class | ThaiWordFilter | TokenFilter that uses a BreakIterator to break each Thai Token into separate Tokens, one per Thai word. |

Classes in org.apache.lucene.analysis.tr:

Modifier and Type | Class | Description |
---|---|---|
class | TurkishLowerCaseFilter | Normalizes Turkish token text to lower case. |

Classes in org.apache.lucene.collation:

Modifier and Type | Class | Description |
---|---|---|
class | CollationKeyFilter | Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. |
class | ICUCollationKeyFilter | Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. |

Classes in org.apache.lucene.facet.enhancements:

Modifier and Type | Class | Description |
---|---|---|
class | EnhancementsCategoryTokenizer | A tokenizer which adds a payload to each category token according to the CategoryEnhancements defined in the given EnhancementsIndexingParams. |

Classes in org.apache.lucene.facet.enhancements.association:

Modifier and Type | Class | Description |
---|---|---|
class | AssociationListTokenizer | Tokenizer for the associations of a category. |

Classes in org.apache.lucene.facet.index.streaming:

Modifier and Type | Class | Description |
---|---|---|
class | CategoryListTokenizer | A base class for category list tokenizers, which add category list tokens to category streams. |
class | CategoryParentsStream | This class adds parents to a CategoryAttributesStream. |
class | CategoryTokenizer | Basic class for setting the CharTermAttributes and PayloadAttributes of category tokens. |
class | CategoryTokenizerBase | A base class for all token filters which add term and payload attributes to tokens and are to be used in CategoryDocumentBuilder. |
class | CountingListTokenizer | CategoryListTokenizer for facet counting. |

Classes in org.apache.lucene.queryParser:

Modifier and Type | Class | Description |
---|---|---|
static class | QueryParserTestBase.QPTestFilter | Filter which discards the token 'stop' and expands the token 'phrase' into 'phrase1 phrase2'. |

Classes in org.apache.lucene.search.highlight:

Modifier and Type | Class | Description |
---|---|---|
class | OffsetLimitTokenFilter | This TokenFilter limits the number of tokens while indexing by adding up the current offset. |
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.