@jondurbin
Created December 8, 2024 12:31
chutes quickstart
  1. Install chutes (and bittensor, if you don't already have a coldkey/hotkey):
python3 -m venv chutes-venv
source chutes-venv/bin/activate
pip install chutes 'bittensor<8'
  2. If you don't already have a coldkey/hotkey, create them (replace chutes/chuteshk with your desired coldkey/hotkey names):
btcli wallet new_coldkey --n_words 24 --wallet.name chutes
btcli wallet new_hotkey --wallet.name chutes --n_words 24 --wallet.hotkey chuteshk
  3. Register on the chutes platform:
chutes register
  4. Create an API key for use with plain HTTP calls:
chutes keys create --name yourkeyname --admin
  5. Create a chute

Create a file named llama3b.py with the following:

from chutes.chute import NodeSelector
from chutes.chute.template.vllm import build_vllm_chute

chute = build_vllm_chute(
    username="chutes",
    model_name="unsloth/Llama-3.2-3B-Instruct",
    image="chutes/vllm:0.6.3",
    node_selector=NodeSelector(
        gpu_count=1,
    ),
)
  6. Optionally watch for events to see things happening in real-time before you deploy:
pip install "python-socketio[client]"

Create watch_events.py

import socketio
import time

sio = socketio.Client()

@sio.event
def connect():
    print("Connected to server")

@sio.event
def disconnect():
    print("Disconnected from server")

@sio.event
def message(data):
    print(f"Received message: {data}")

@sio.on('*')
def catch_all(event, data):
    print(f"Received event '{event}' with data: {data}")

def main():
    try:
        print("Attempting connection...")
        # Connect to the event stream and keep the process alive
        sio.connect('wss://events.chutes.ai')
        while True:
            time.sleep(1)
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        if sio.connected:
            sio.disconnect()

if __name__ == "__main__":
    main()

Start watching for events:

python watch_events.py
  7. Deploy it

If you're running watch_events.py from step 6, run this in a different tab/window.

chutes deploy llama3b:chute --public

(remove --public if you want it to be private)

  8. Use it

Wait until the chute is "hot", either by watching watch_events.py or by running chutes chutes list and checking the status.

Get the "slug" for the chute (also via chutes chutes list), which is the subdomain used for inference, e.g. username-unsloth-llama-3-2-3b-instruct, then call it with the API key you created in step 4:

curl https://username-unsloth-llama-3-2-3b-instruct.chutes.ai/v1/chat/completions -d '{"model": "unsloth/Llama-3.2-3B-Instruct", "messages": [{"role": "user", "content": "What is the secret to life, the universe, everything?"}], "stream": true, "max_tokens": 1000}' -H 'Authorization: cpk_4...'
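The same request can also be made from Python. A minimal sketch, assuming the slug and API key placeholders from above; build_request is a hypothetical helper, not part of the chutes SDK:

```python
import json

# Hypothetical helper: assemble the URL, headers, and JSON body for a
# chat-completions call against a chute's subdomain.
def build_request(slug, api_key, model, prompt, max_tokens=1000):
    url = f"https://{slug}.chutes.ai/v1/chat/completions"
    headers = {"Authorization": api_key, "Content-Type": "application/json"}
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "max_tokens": max_tokens,
    }
    return url, headers, json.dumps(payload)

url, headers, body = build_request(
    "username-unsloth-llama-3-2-3b-instruct",  # your chute's slug
    "cpk_4...",                                # your API key from step 4
    "unsloth/Llama-3.2-3B-Instruct",
    "What is the secret to life, the universe, everything?",
)
# Send it with, e.g.: requests.post(url, headers=headers, data=body, stream=True)
```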