jondurbin / example.md
Last active April 23, 2025 17:36
Deploying an LLM

"easy" vllm endpoint

You can call this endpoint and it will automatically select the most recent vllm image:

curl -XPOST https://api.chutes.ai/chutes/vllm \
  -H 'content-type: application/json' \
  -H 'Authorization: cpk...' \
  -d '{
    "tagline": "Mistral 24b Instruct",
    "model": "unsloth/Mistral-Small-24B-Instruct-2501",
    "public": true
  }'
jondurbin / example.py
Created April 10, 2025 11:23
kimi-vl example
import os
import base64
import glob

import openai

client = openai.Client(base_url="https://llm.chutes.ai/v1", api_key=os.environ["CHUTES_API_KEY"])

# Base64-encode up to eight local images to attach to the request.
image_base64s = []
for path in glob.glob("/home/jdurbin/Downloads/logo*.png")[:8]:
    with open(path, "rb") as infile:
        image_base64s.append(base64.b64encode(infile.read()).decode())  # loop body assumed
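From there, the encoded images would typically be sent as data URIs in an OpenAI-style chat request. A sketch continuing the snippet above (the message format and model id are assumptions, not taken from the gist):

# Build one user message holding every image plus a text prompt.
content = [
    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}
    for b64 in image_base64s
]
content.append({"type": "text", "text": "Describe each of these logos."})
response = client.chat.completions.create(
    model="moonshotai/Kimi-VL-A3B-Instruct",  # model id is a placeholder
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)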
jondurbin / example.py
Created March 25, 2025 10:13
Qwen2.5-VL-32b-Instruct inference example
import os
import base64
import glob

import openai

client = openai.Client(base_url="https://llm.chutes.ai/v1", api_key=os.environ["CHUTES_API_KEY"])

# Base64-encode up to eight local images to attach to the request.
image_base64s = []
for path in glob.glob("/home/jdurbin/Downloads/logo*.png")[:8]:
    with open(path, "rb") as infile:
        image_base64s.append(base64.b64encode(infile.read()).decode())  # loop body assumed
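As with the kimi-vl example, the images would go into an OpenAI-style chat request; a sketch continuing the snippet above, this time streaming the reply (message format assumed, model id taken from the gist title):

content = [
    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}
    for b64 in image_base64s
]
content.append({"type": "text", "text": "Compare these logos."})
stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-32B-Instruct",
    messages=[{"role": "user", "content": content}],
    stream=True,
)
for chunk in stream:
    # Print tokens as they arrive.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)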
jondurbin / spark_example.py
Last active March 18, 2025 10:37
Inference example with Spark-TTS on chutes
import base64

import requests

# Base64-encode a short reference clip used for voice cloning.
audio = base64.b64encode(open("test.wav", "rb").read()).decode()
result = requests.post(
    "https://chutes-spark-tts.chutes.ai/speak",
    json={
        "text": "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
        "sample_audio_b64": audio,
    },
)
jondurbin / csm1b_example.py
Created March 18, 2025 09:20
Example inference with csm-1b on chutes
import base64

import requests

# Reference audio supplying voice context for csm-1b.
audio = base64.b64encode(open("test.wav", "rb").read()).decode()
result = requests.post(
    "https://chutes-csm-1b.chutes.ai/speak",
    json={
        "speaker": 1,
        "context": [
            # prior conversation turns; the entry format here is an assumption
        ],
        "text": "How much wood would a woodchuck chuck?",  # prompt field assumed
    },
)
jondurbin / dolphin.txt
Last active March 16, 2025 07:24
Who is dolphin?
{
  "id": "27ab0d1289814bb28c7c30e38a98df8d",
  "object": "chat.completion",
  "created": 1742109451,
  "model": "cognitivecomputations/Dolphin3.0-Mistral-24B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
jondurbin / chutes-walkthrough.md
Created December 8, 2024 12:31
chutes quickstart
1. Install chutes (and bittensor if you don't already have a coldkey/hotkey):

       python3 -m venv chutes-venv
       source chutes-venv/bin/activate
       pip install chutes 'bittensor<8'

2. If you don't already have a coldkey/hotkey, create one (replace chutes/chuteshk with your desired coldkey/hotkey names), as sketched below.
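A minimal sketch of that step, assuming the standard btcli wallet commands from bittensor (the exact invocation is an assumption, not taken from the gist):

       # Create a coldkey, then a hotkey under it (names are placeholders).
       btcli wallet new_coldkey --wallet.name chutes
       btcli wallet new_hotkey --wallet.name chutes --wallet.hotkey chuteshk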
jondurbin / check_copyright.py
Last active July 12, 2024 00:41
Copyright check
jondurbin / create_tokenizer.py
Last active December 28, 2023 17:31
AR examples
import re
import gc
import os
import glob
import json
from copy import deepcopy
from datasets import concatenate_datasets, Dataset
from transformers import AutoTokenizer
from huggingface_hub import snapshot_download
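Only the imports are shown above; as a hedged sketch of the rest, here is one way a tokenizer-creation script along these lines could retrain a fast tokenizer on a local corpus with transformers' train_new_from_iterator (the corpus paths, base model, and vocab size are all placeholders, not the gist's actual values):

import glob
import json

from transformers import AutoTokenizer

def iter_corpus(paths, batch_size=1000):
    # Yield batches of text from JSONL files; a "text" field is assumed.
    batch = []
    for path in paths:
        with open(path) as infile:
            for line in infile:
                batch.append(json.loads(line)["text"])
                if len(batch) >= batch_size:
                    yield batch
                    batch = []
    if batch:
        yield batch

# Retrain the base tokenizer's algorithm on the new corpus (base model is a placeholder).
base = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = base.train_new_from_iterator(iter_corpus(glob.glob("corpus/*.jsonl")), vocab_size=32000)
tokenizer.save_pretrained("new-tokenizer")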
jondurbin / airoboros-m-7b-3.1.2.md
Last active October 21, 2023 12:52
airoboros-m-7b-3.1.2.md

Trained on 10x A6000 GPUs on runpod.io.

I ran many fine-tunes, including multiple full fine-tunes, fp16 LoRAs, and QLoRAs; the QLoRA below performed best in my testing.

dataset: https://hf.co/datasets/jondurbin/airoboros-3.1 (plus a few unpublished de-censoring instructions)

training script: https://github.com/jondurbin/qlora, specifically commit 8cd269bf9bd7753c92164934269019e12f23314f

export BASE_DIR=/workspace