@sany2k8
Last active July 31, 2025 12:20

Types of Analyzers

| Analyzer | Description | Example Use |
|---|---|---|
| Standard | Default; splits text on word boundaries, removes most punctuation, lowercases tokens. | English prose, general search |
| Simple | Splits on any non-letter character (so digits are dropped) and lowercases tokens. | Plain alphabetic text, technical terms without digits |
| Whitespace | Splits on whitespace only; preserves case and punctuation. | Code, serial numbers |
| Keyword | Does not split; treats the entire text as a single token. | Exact-match fields, IDs, tags |
| Pattern | Splits on a regular expression (non-word characters by default) and lowercases tokens. | Log files, custom tokenization |
| Stop | Like Simple, but also removes stop words (English by default). | Basic English filtering |
| Language | Language-specific; applies stemming and stop words for the chosen language. | Multi-language text search |
| Fingerprint | Lowercases, deduplicates, and sorts tokens, then joins them into a single token; useful for deduplication. | Address or name normalization |
| Custom | Chains any char filters, a tokenizer, and token filters as needed. | Highly specific domain use cases |
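
To make the differences in the table concrete, here is a rough pure-Python sketch of how a few of these analyzers treat the same input. The analyzer names match Elasticsearch's built-in analyzers, which is assumed to be the engine meant here. The function names, the tiny `STOPWORDS` set, and the sample text are all hypothetical, and the functions only approximate the described behavior (no ASCII folding, no Unicode segmentation rules); they are not the engine's implementation.

```python
import re

# Illustrative subset only; a real English stop-word list is longer.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "to", "in"}


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are preserved."""
    return text.split()


def simple_analyzer(text):
    """Split on any non-letter character and lowercase the tokens."""
    letters_only = "".join(c if c.isalpha() else " " for c in text)
    return letters_only.lower().split()


def keyword_analyzer(text):
    """Emit the whole input as a single token."""
    return [text]


def stop_analyzer(text):
    """Same as simple_analyzer, then drop stop words."""
    return [t for t in simple_analyzer(text) if t not in STOPWORDS]


def fingerprint_analyzer(text):
    """Lowercase, deduplicate, sort, and join tokens into one token."""
    tokens = re.findall(r"\w+", text.lower())
    return [" ".join(sorted(set(tokens)))]


sample = "The Quick-Brown fox, model X200 the FOX"

for analyzer in (whitespace_analyzer, simple_analyzer, keyword_analyzer,
                 stop_analyzer, fingerprint_analyzer):
    print(f"{analyzer.__name__}: {analyzer(sample)}")
```

Note how the simple sketch turns `X200` into `x`, which is why the Simple analyzer is a poor fit for part numbers, while the whitespace sketch keeps `X200` and `fox,` exactly as written. Against a real cluster, the `_analyze` API is the authoritative way to check what a given analyzer emits.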