Package | Description |
---|---|
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. |
org.apache.lucene.analysis.standard | Standards-based analyzers implemented with JFlex. |
org.apache.lucene.collation | CollationKeyFilter converts each token into its binary CollationKey using the provided Collator, and then encodes the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term. |
org.apache.lucene.document | The logical representation of a Document for indexing and searching. |
Modifier and Type | Class | Description |
---|---|---|
class | ASCIIFoldingFilter | This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists. |
class | CachingTokenFilter | This class can be used if the token attributes of a TokenStream are intended to be consumed more than once. |
class | CharTokenizer | An abstract base class for simple, character-oriented tokenizers. |
class | FilteringTokenFilter | Abstract base class for TokenFilters that may remove tokens. |
class | ISOLatin1AccentFilter | Deprecated. If you build a new index, use ASCIIFoldingFilter, which covers a superset of Latin 1. |
class | KeywordMarkerFilter | Marks terms as keywords via the KeywordAttribute. |
class | KeywordTokenizer | Emits the entire input as a single token. |
class | LengthFilter | Removes words that are too long or too short from the stream. |
class | LetterTokenizer | A LetterTokenizer is a tokenizer that divides text at non-letters. |
class | LimitTokenCountFilter | This TokenFilter limits the number of tokens while indexing. |
class | LowerCaseFilter | Normalizes token text to lower case. |
class | LowerCaseTokenizer | LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. |
class | NumericTokenStream | Expert: This class provides a TokenStream for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter. |
class | PorterStemFilter | Transforms the token stream as per the Porter stemming algorithm. |
class | StopFilter | Removes stop words from a token stream. |
class | TeeSinkTokenFilter | This TokenFilter provides the ability to set aside attribute states that have already been analyzed. |
static class | TeeSinkTokenFilter.SinkTokenStream | TokenStream output from a tee with optional filtering. |
class | TokenFilter | A TokenFilter is a TokenStream whose input is another TokenStream. |
class | Tokenizer | A Tokenizer is a TokenStream whose input is a Reader. |
class | TypeTokenFilter | Removes tokens whose types appear in a set of blocked types from a token stream. |
class | WhitespaceTokenizer | A WhitespaceTokenizer is a tokenizer that divides text at whitespace. |
Modifier and Type | Field | Description |
---|---|---|
protected TokenStream | TokenFilter.input | The source of tokens for this filter. |
protected TokenStream | ReusableAnalyzerBase.TokenStreamComponents.sink | |
Modifier and Type | Method | Description |
---|---|---|
protected TokenStream | ReusableAnalyzerBase.TokenStreamComponents.getTokenStream() | Returns the sink TokenStream. |
TokenStream | Analyzer.reusableTokenStream(String fieldName, Reader reader) | Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. |
TokenStream | LimitTokenCountAnalyzer.reusableTokenStream(String fieldName, Reader reader) | |
TokenStream | PerFieldAnalyzerWrapper.reusableTokenStream(String fieldName, Reader reader) | |
TokenStream | ReusableAnalyzerBase.reusableTokenStream(String fieldName, Reader reader) | This method uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents. |
abstract TokenStream | Analyzer.tokenStream(String fieldName, Reader reader) | Creates a TokenStream which tokenizes all the text in the provided Reader. |
TokenStream | LimitTokenCountAnalyzer.tokenStream(String fieldName, Reader reader) | |
TokenStream | PerFieldAnalyzerWrapper.tokenStream(String fieldName, Reader reader) | |
TokenStream | ReusableAnalyzerBase.tokenStream(String fieldName, Reader reader) | This method uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents and returns the sink of the components. |
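The point of reusableTokenStream is to avoid rebuilding the whole analysis chain for every document: each thread caches the chain it built last time and resets it on reuse. A sketch of that per-thread caching pattern, with the chain modeled by a plain StringBuilder purely for illustration (Lucene stores the real components in a per-thread slot inside Analyzer):

```java
// Sketch of the caching pattern behind reusableTokenStream: each
// thread keeps the one chain it previously built and resets it on
// reuse instead of constructing a new one per document.
public class ReusableChainSketch {
    private int buildCount = 0; // counts how many chains were actually built
    private final ThreadLocal<StringBuilder> previous = new ThreadLocal<>();

    StringBuilder reusableStream() {
        StringBuilder chain = previous.get();
        if (chain == null) {        // first call on this thread: build once
            chain = new StringBuilder();
            buildCount++;
            previous.set(chain);
        } else {
            chain.setLength(0);     // later calls: reset the cached chain
        }
        return chain;
    }

    int builds() { return buildCount; }

    public static void main(String[] args) {
        ReusableChainSketch analyzer = new ReusableChainSketch();
        analyzer.reusableStream();
        analyzer.reusableStream();
        analyzer.reusableStream();
        System.out.println(analyzer.builds()); // 1: built once, reused twice
    }
}
```

This is why reusableTokenStream is only safe when the caller consumes the stream fully before asking for another one on the same thread.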
Constructor | Description |
---|---|
ASCIIFoldingFilter(TokenStream input) | |
CachingTokenFilter(TokenStream input) | |
FilteringTokenFilter(boolean enablePositionIncrements, TokenStream input) | |
ISOLatin1AccentFilter(TokenStream input) | Deprecated. |
KeywordMarkerFilter(TokenStream in, Set<?> keywordSet) | Create a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set. |
KeywordMarkerFilter(TokenStream in, CharArraySet keywordSet) | Create a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set. |
LengthFilter(boolean enablePositionIncrements, TokenStream in, int min, int max) | Build a filter that removes words that are too long or too short from the text. |
LengthFilter(TokenStream in, int min, int max) | Deprecated. Use LengthFilter(boolean, TokenStream, int, int) instead. |
LimitTokenCountFilter(TokenStream in, int maxTokenCount) | Build a filter that only accepts tokens up to a maximum number. |
LowerCaseFilter(TokenStream in) | Deprecated. Use LowerCaseFilter(Version, TokenStream) instead. |
LowerCaseFilter(Version matchVersion, TokenStream in) | Create a new LowerCaseFilter that normalizes token text to lower case. |
PorterStemFilter(TokenStream in) | |
StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords) | Deprecated. Use StopFilter(Version, TokenStream, Set) instead. |
StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase) | Deprecated. Use StopFilter(Version, TokenStream, Set) instead. |
StopFilter(Version matchVersion, TokenStream in, Set<?> stopWords) | Constructs a filter which removes words from the input TokenStream that are named in the Set. |
StopFilter(Version matchVersion, TokenStream input, Set<?> stopWords, boolean ignoreCase) | Deprecated. Use StopFilter(Version, TokenStream, Set) instead. |
TeeSinkTokenFilter(TokenStream input) | Instantiates a new TeeSinkTokenFilter. |
TokenFilter(TokenStream input) | Construct a token stream filtering the given input. |
TokenStreamComponents(Tokenizer source, TokenStream result) | Creates a new ReusableAnalyzerBase.TokenStreamComponents instance. |
TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes) | |
TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes, boolean useWhiteList) | |
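Several of the removing filters above (FilteringTokenFilter, LengthFilter, StopFilter, TypeTokenFilter) take an enablePositionIncrements flag. When it is set, the positions skipped by dropped tokens are folded into the next surviving token's position increment, so phrase matching still sees the gap. A plain-Java sketch of that bookkeeping (PosToken and removeTokens are illustrative names, not Lucene API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of the enablePositionIncrements behavior: dropped tokens
// leave a positional gap that is added to the next kept token's
// position increment instead of silently disappearing.
public class PositionIncrementSketch {
    static final class PosToken {
        final String text;
        final int posIncr;
        PosToken(String text, int posIncr) {
            this.text = text;
            this.posIncr = posIncr;
        }
    }

    static List<PosToken> removeTokens(List<String> in, Set<String> drop,
                                       boolean enablePositionIncrements) {
        List<PosToken> out = new ArrayList<>();
        int skipped = 0; // positions consumed by dropped tokens so far
        for (String s : in) {
            if (drop.contains(s)) { skipped++; continue; }
            int incr = enablePositionIncrements ? 1 + skipped : 1;
            out.add(new PosToken(s, incr));
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        for (PosToken t : removeTokens(List.of("the", "quick", "fox"),
                                       Set.of("the"), true)) {
            System.out.println(t.text + " +" + t.posIncr);
        }
        // quick +2   (gap where "the" was removed)
        // fox +1
    }
}
```

With the flag off, "quick" would get an increment of 1 and the phrase "the quick" would appear to start at the first position, which is why the flag-less constructors were deprecated in favor of Version-aware ones with correct defaults.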
Modifier and Type | Class | Description |
---|---|---|
class | ClassicFilter | Normalizes tokens extracted with ClassicTokenizer. |
class | ClassicTokenizer | A grammar-based tokenizer constructed with JFlex. |
class | StandardFilter | Normalizes tokens extracted with StandardTokenizer. |
class | StandardTokenizer | A grammar-based tokenizer constructed with JFlex. |
class | UAX29URLEmailTokenizer | This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs. |
Constructor | Description |
---|---|
ClassicFilter(TokenStream in) | Constructs a filter over the given input. |
StandardFilter(TokenStream in) | Deprecated. Use StandardFilter(Version, TokenStream) instead. |
StandardFilter(Version matchVersion, TokenStream in) | |
Modifier and Type | Class | Description |
---|---|---|
class | CollationKeyFilter | Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. |
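The property CollationKeyFilter relies on can be seen with the JDK's own java.text.Collator: a CollationKey's byte representation sorts in the collator's order, so encoding keys as index terms makes term ordering (and therefore range queries over terms) locale-aware. A small demonstration using only the standard library:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Shows the CollationKey contract: comparing keys' byte arrays gives
// the same ordering as the Collator's own compare().
public class CollationKeyDemo {
    // Unsigned lexicographic comparison of two byte arrays.
    static int binaryCompare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.FRENCH);
        String s1 = "côte", s2 = "coter";
        CollationKey k1 = collator.getCollationKey(s1);
        CollationKey k2 = collator.getCollationKey(s2);
        // The collator's ordering and the keys' binary ordering agree.
        System.out.println(Integer.signum(collator.compare(s1, s2))
            == Integer.signum(binaryCompare(k1.toByteArray(), k2.toByteArray())));
        // true
    }
}
```

CollationKeyFilter's extra step is encoding those bytes as a String (via IndexableBinaryStringTools) because index terms are character data, not raw bytes.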
Modifier and Type | Method | Description |
---|---|---|
TokenStream | CollationKeyAnalyzer.reusableTokenStream(String fieldName, Reader reader) | |
TokenStream | CollationKeyAnalyzer.tokenStream(String fieldName, Reader reader) | |
Constructor | Description |
---|---|
CollationKeyFilter(TokenStream input, Collator collator) | |
Modifier and Type | Field | Description |
---|---|---|
protected TokenStream | AbstractField.tokenStream | |
Modifier and Type | Method | Description |
---|---|---|
TokenStream | Field.tokenStreamValue() | The TokenStream for this field to be used when indexing, or null. |
TokenStream | Fieldable.tokenStreamValue() | The TokenStream for this field to be used when indexing, or null. |
TokenStream | NumericField.tokenStreamValue() | Returns a NumericTokenStream for indexing the numeric value. |
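The NumericTokenStream returned by NumericField.tokenStreamValue() indexes one value as several terms at decreasing precision: at each step, precisionStep low bits are shifted away, which lets NumericRangeQuery cover wide ranges with few terms. A sketch of that multi-precision ("trie") idea; the `shift=...` string form is illustrative only, as Lucene uses a compact char-based term encoding:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of trie-style numeric encoding: the same 64-bit value is
// emitted once per precision level, with precisionStep low bits
// shifted away each time.
public class TrieEncodingSketch {
    static List<String> numericTokens(long value, int precisionStep) {
        List<String> tokens = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            tokens.add("shift=" + shift + ":" + (value >>> shift));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(numericTokens(1024L, 16));
        // [shift=0:1024, shift=16:0, shift=32:0, shift=48:0]
    }
}
```

A smaller precisionStep produces more terms per value (a larger index) but lets range queries be answered by visiting fewer terms; the trade-off is the same one Lucene's precisionStep parameter controls.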
Modifier and Type | Method | Description |
---|---|---|
void | Field.setTokenStream(TokenStream tokenStream) | Expert: sets the token stream to be used for indexing and causes isIndexed() and isTokenized() to return true. |
Constructor | Description |
---|---|
Field(String name, TokenStream tokenStream) | Create a tokenized and indexed field that is not stored. |
Field(String name, TokenStream tokenStream, Field.TermVector termVector) | Create a tokenized and indexed field that is not stored, optionally storing term vectors. |
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.