Trayan Azarov tazarov

@tazarov
tazarov / openai_ef.py
Created May 22, 2025 17:33
OpenAI EF retry and timeouts
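import os
import httpx
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

openai_ef = OpenAIEmbeddingFunction(
    api_key=os.getenv("OPENAI_API_KEY"),
    model_name=QAExecutor.embeddings_model,  # QAExecutor comes from the author's project
)
# Set timeouts and retries on the wrapped OpenAI client (private attributes).
openai_ef._client._client.timeout = httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0)
openai_ef._client._client.max_retries = 3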
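To illustrate what a retry limit such as max_retries=3 implies, here is a minimal stdlib-only sketch. It assumes the limit counts retries after the first attempt (so up to four calls in total) and that timeouts surface as TimeoutError. The helper call_with_retries and its base_delay parameter are hypothetical names for illustration, not part of Chroma or the OpenAI SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_retries: int = 3, base_delay: float = 0.0) -> T:
    """Call fn, retrying on TimeoutError up to max_retries times after the first attempt."""
    attempt = 0
    while True:
        try:
            return fn()
        except TimeoutError:
            if attempt >= max_retries:
                # Retry budget exhausted: surface the last timeout to the caller.
                raise
            # Exponential backoff between attempts (no wait when base_delay is 0).
            time.sleep(base_delay * (2 ** attempt))
            attempt += 1
```

A call that times out twice and then succeeds returns normally on the third attempt. A call that always times out raises after the fourth attempt.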
#!/usr/bin/env bash
pip install maturin cffi patchelf
maturin build
pip install --no-index --find-links target/wheels/ chromadb

Generate flame graphs for Python processes, which is useful for performance profiling.

Note: This requires root privileges. In Docker, the container must run with privileged=true.

Live view:

py-spy top --pid 1
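As a hedged sketch, the helpers below build py-spy command lines for the live view shown above and for recording a flame graph to an SVG file. The function names and the default output filename profile.svg are assumptions for illustration and do not come from the gist. The record subcommand with --output and --pid follows py-spy's documented command-line interface.

```python
import shlex
from typing import List


def py_spy_top_cmd(pid: int) -> List[str]:
    """Build the argv for a live, top-like view of a running Python process."""
    return ["py-spy", "top", "--pid", str(pid)]


def py_spy_record_cmd(pid: int, output: str = "profile.svg") -> List[str]:
    """Build the argv for recording a flame graph of a running Python process."""
    return ["py-spy", "record", "--output", output, "--pid", str(pid)]


def as_shell(argv: List[str]) -> str:
    """Render an argv list as a shell-safe command string."""
    return shlex.join(argv)
```

The resulting lists can be passed to subprocess.run, or rendered with as_shell for copy-pasting into a terminal that has the required privileges.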
tazarov / main.cpp
Created November 26, 2024 17:24
Reproducing HNSW Knn search exception memory leak
#include <iostream>
#include <stdexcept>
#if defined(__APPLE__)
#include <mach/mach.h>
#endif
#include <thread>
tazarov / hnsw_label_bug.ipynb
Created August 5, 2024 11:50
Reproduces a bug in HNSW when replace_delete=True
tazarov / hnsw_stats.ipynb
Created August 2, 2024 10:37
Stats comparison between HNSW with and without replacement of deleted items
tazarov / hnsw.patch
Created August 2, 2024 10:29
Patches Chroma 0.5.5+ with replace deleted HNSW flag
Index: chromadb/segment/impl/vector/local_hnsw.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/chromadb/segment/impl/vector/local_hnsw.py b/chromadb/segment/impl/vector/local_hnsw.py
--- a/chromadb/segment/impl/vector/local_hnsw.py (revision Staged)
+++ b/chromadb/segment/impl/vector/local_hnsw.py (date 1722520928951)
@@ -202,6 +202,7 @@
max_elements=DEFAULT_CAPACITY,
tazarov / hnsw_leak.py
Last active August 2, 2024 13:07
Runs a worst-case scenario benchmark on Chroma HNSW index to demonstrate effects of fragmentation on frequent add/delete
import argparse
import gc
from abc import ABC
from typing import List, Any, TypedDict, Optional
from overrides import EnforceOverrides, override
from pydantic import BaseModel, Field
from rich.console import Console
from rich.progress import track
from rich.prompt import Confirm
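The fragmentation effect named in the hnsw_leak.py description can be sketched with a toy model. This is pure Python and does not use hnswlib or Chroma. It assumes that deleting an item only marks its slot as deleted, and that new items take a fresh slot unless replacement of deleted slots is enabled. The function simulate and its parameters are hypothetical names for illustration.

```python
def simulate(cycles: int, batch: int, replace_deleted: bool) -> int:
    """Return the number of slots allocated after repeated add-then-delete cycles."""
    allocated = 0      # total slots ever allocated in the toy index
    free_deleted = 0   # slots marked deleted and available for reuse
    for _ in range(cycles):
        # Add a batch: reuse deleted slots only when replacement is enabled.
        reused = min(free_deleted, batch) if replace_deleted else 0
        free_deleted -= reused
        allocated += batch - reused
        # Delete the whole batch again: its slots become reusable.
        free_deleted += batch
    return allocated


if __name__ == "__main__":
    print("without replacement:", simulate(10, 100, replace_deleted=False))
    print("with replacement:   ", simulate(10, 100, replace_deleted=True))
```

Under these assumptions the slot count grows with every cycle when deleted slots are never reused, and stays at one batch when they are. A real index has more factors (graph links, resizing), so the numbers are only indicative.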
[Unit]
Description = Chroma Service
After = network.target
[Service]
Type = simple
User = root
Group = root
WorkingDirectory = /chroma
ExecStart=/usr/local/bin/chroma run --host 127.0.0.1 --port 8000 --path /chroma/data --log-path /var/log/chroma.log
[Unit]
Description = Chroma Docker Service
After = network.target docker.service
Requires = docker.service
[Service]
Type = forking
User = root
Group = root
WorkingDirectory = /home/admin/chroma