semble is a search index over this repo. It splits the code into chunks and ranks them by a blend of keyword (BM25) and semantic-embedding similarity, then returns the top matches with their file_path and line. It is a retrieval tool, not an analysis tool: it finds existing code that resembles what you describe. It does not reason, judge, or summarize. Think "show me the code most like this description," never "answer this question about the codebase."
Invocation: semble is installed only in the `misc` conda env. Always run it as `conda run -n misc --live-stream semble <args>`. Never call `semble` bare; never use `uvx`. The short query examples below drop the prefix for readability; prepend it whenever you actually run one.
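If you run several queries from one shell session, a small function can prepend the prefix for you. This is a sketch and is not part of semble or conda. The name `semble_q` is hypothetical. The function still goes through `conda run`, so it never calls semble bare.

```sh
# semble_q: hypothetical convenience wrapper, not provided by semble or conda.
# It forwards every argument unchanged (quoting preserved), so
#   semble_q search "retry with exponential backoff" -k 10
# runs
#   conda run -n misc --live-stream semble search "retry with exponential backoff" -k 10
semble_q() {
  conda run -n misc --live-stream semble "$@"
}
```

Shell functions do not carry over between separate shell invocations. If each command runs in a fresh shell, type the full prefix instead.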
Describe the code you expect to find — its behavior, identifiers, or domain terms — in words that would plausibly appear in or near it. Do not query with a task, a verdict, or a property the code doesn't state about itself.
Good (describes target code):
- `semble search "retry with exponential backoff"`
- `semble search "parse and validate JWT"`
- `semble search "open a database connection pool"`
- `semble search "write model weights to disk"`
Bad (a task or judgment, not code that exists by that name):
- `semble search "dead code"`: nothing is labeled "dead code"; you determine deadness by tracing usage, not by retrieving a string.
- `semble search "find the bug"`, `"security vulnerabilities"`, `"what should I refactor"`: conclusions you reach by reading code, not code you can look up.
semble surfaces candidates; you do the comparison. It is not a wishing pool. Example — finding duplicated logic:
- You cannot search `"duplicate code"`. Search for the behavior you suspect is duplicated, e.g. `semble search "normalize and slugify a string"`.
- semble returns the top matching chunks across all files, each with its `file_path` and line.
- Open and compare those chunks yourself to confirm whether they're actually duplicates.
- To expand from one known instance, run `find-related` on it to pull chunks that resemble it: `conda run -n misc --live-stream semble find-related src/text/format.py 88`
The same pattern covers "is this already implemented somewhere," "where else do we do X," and "find handlers like this one": search for the behavior, then reason over the hits.
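For the step where you open and compare chunks, a plain `diff` of the two line ranges makes a quick first check. This sketch is a hypothetical helper, and `compare_chunks` is not a semble command. It assumes you supply the chunk length yourself, because semble reports a `file_path` and line rather than a range. Identical text exits 0. Near-duplicates that differ only in names will still produce a diff, so read the output instead of relying on the exit status alone.

```sh
# compare_chunks: hypothetical helper, not part of semble.
# Usage: compare_chunks FILE_A START_A FILE_B START_B NLINES
# Extracts NLINES lines from each file, starting at the given line numbers,
# and prints a unified diff. Returns diff's status: 0 = identical, 1 = different.
compare_chunks() {
  end_a=$(( $2 + $5 - 1 ))
  end_b=$(( $4 + $5 - 1 ))
  tmp_a=$(mktemp)
  tmp_b=$(mktemp)
  sed -n "$2,${end_a}p" "$1" > "$tmp_a"
  sed -n "$4,${end_b}p" "$3" > "$tmp_b"
  diff -u "$tmp_a" "$tmp_b"
  rc=$?
  rm -f "$tmp_a" "$tmp_b"
  return "$rc"
}
```

For example, `compare_chunks src/text/format.py 88 src/util/slug.py 40 15` would diff 15 lines from each location. The second path and the length are made up for illustration.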
```sh
conda run -n misc --live-stream semble search "<describe the code>"             # search cwd, top 5 chunks
conda run -n misc --live-stream semble search "<query>" path/to/repo -k 10      # explicit path, more results
conda run -n misc --live-stream semble search "<query>" --content docs          # search prose/markdown instead of code
conda run -n misc --live-stream semble search "<query>" --content config        # search config files
conda run -n misc --live-stream semble search "<query>" --content all           # code + docs + config
conda run -n misc --live-stream semble find-related <file_path> <line>          # chunks similar to a known location
```

`--content` defaults to `code`; `-k`/`--top-k` defaults to 5; the path defaults to the current directory. The index is built on the first run, cached, and rebuilt only when files change.
- Exact string, symbol, or error message → grep. semble ranks by similarity and returns only the top k, so it can miss literal matches and never guarantees you all of them.
- A file by known name or path → read it directly.
- "Every occurrence of X" → grep; semble is ranked, not exhaustive.
- Full context after a hit → open the file at the returned `file_path:line`.
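For the exact-text and every-occurrence cases, plain `grep` covers what semble cannot. This is a minimal sketch that assumes a standard `grep`, `awk`, and POSIX sh. The function names `all_hits` and `show_context` are hypothetical, and the 10-line default radius is arbitrary.

```sh
# all_hits: hypothetical helper. Lists every literal occurrence of a string.
#   -r recurses, -n prints line numbers, -F treats the pattern as a fixed
#   string (no regex), so error messages with brackets or dots match verbatim.
# Output lines look like path:line:text. Directory defaults to the cwd.
all_hits() {
  grep -rnF -- "$1" "${2:-.}"
}

# show_context: hypothetical helper. Prints FILE around LINE, RADIUS lines on
# each side (default 10), prefixing each line with its line number.
show_context() {
  radius=${3:-10}
  start=$(( $2 - radius ))
  if [ "$start" -lt 1 ]; then start=1; fi
  end=$(( $2 + radius ))
  awk -v s="$start" -v e="$end" 'NR >= s && NR <= e { printf "%d: %s\n", NR, $0 }' "$1"
}
```

You can pair the two: take a `path:line` from `all_hits`, or from a semble hit, and pass the path and line to `show_context`.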
Reach for semble when the question is "where is the code that does X" or "what code is like this." Reach for grep when you already know the exact text or path.