Analyzer | Description | Example Use |
---|---|---|
Standard | Default; splits text on word boundaries, removes most punctuation, and lowercases tokens. | English prose, general search |
Simple | Splits on any non-letter character and lowercases tokens; digits are discarded. | Plain alphabetic text (a poor fit for part numbers, since digits are dropped) |
Whitespace | Splits on whitespace only, preserves case. | Code, serial numbers |
Keyword | Does not split; treats entire text as a single token. | Exact match fields, IDs, tags |
Pattern | Splits text with a regular expression (default `\W+`) and lowercases tokens. | Log files, custom tokenization |
Stop | Like Simple, but also removes stop words (English by default). | Basic English filtering |
Language | Language-specific analyzers that apply stemming and stop-word removal for a given language. | Multi-language text search |
Fingerprint | Lowercases, ASCII-folds, sorts, and deduplicates tokens, then joins them into a single token; useful for deduplication. | Address or name normalization |
Custom | Combines zero or more character filters, exactly one tokenizer, and zero or more token filters. | Highly specific domain use cases |
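
To make the differences in the table concrete, here is a minimal Python sketch that imitates four of the analyzers on the same input. It is only an approximation for illustration: the function names are hypothetical, the regexes stand in for the real tokenizers, and none of this is Elasticsearch's actual implementation.

```python
import re
import unicodedata


def whitespace_analyze(text):
    # Split on runs of whitespace only; case and punctuation are preserved.
    return text.split()


def simple_analyze(text):
    # Keep only runs of letters, then lowercase them.
    # Digits are not letters, so they disappear from the output.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def keyword_analyze(text):
    # No splitting: the entire input becomes a single token.
    return [text]


def fingerprint_analyze(text):
    # Lowercase, fold accented characters to ASCII, deduplicate,
    # sort, and join everything into one token.
    folded = unicodedata.normalize("NFKD", text.lower())
    folded = folded.encode("ascii", "ignore").decode("ascii")
    tokens = sorted(set(re.findall(r"\w+", folded)))
    return [" ".join(tokens)]


sample = "Part-No AB12cd"
print(whitespace_analyze(sample))  # ['Part-No', 'AB12cd']
print(simple_analyze(sample))      # ['part', 'no', 'ab', 'cd']
print(keyword_analyze(sample))     # ['Part-No AB12cd']
print(fingerprint_analyze("Yes yes, Gödel said this"))  # ['godel said this yes']
```

Note how the Simple-style function breaks `AB12cd` apart and loses the digits, while the Whitespace and Keyword variants keep identifiers intact. That is why the latter two are the usual choices for serial numbers and IDs.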
Last active: July 31, 2025 12:20