- Install `chutes` (and `bittensor`, if you don't already have a coldkey/hotkey):
  ```bash
  python3 -m venv chutes-venv
  source chutes-venv/bin/activate
  pip install chutes 'bittensor<8'
  ```
- If you don't already have a coldkey/hotkey, create them (replace `chutes`/`chuteshk` with your desired coldkey/hotkey names):
  ```bash
  btcli wallet new_coldkey --n_words 24 --wallet.name chutes
  btcli wallet new_hotkey --wallet.name chutes --n_words 24 --wallet.hotkey chuteshk
  ```
- Register on the chutes platform:
  ```bash
  chutes register
  ```
- Create an API key for use with plain HTTP calls:
  ```bash
  chutes keys create --name yourkeyname --admin
  ```
- Create a chute. Create a file named `llama3b.py` with the following:
  ```python
  from chutes.chute import NodeSelector
  from chutes.chute.template.vllm import build_vllm_chute

  chute = build_vllm_chute(
      username="chutes",
      model_name="unsloth/Llama-3.2-3B-Instruct",
      image="chutes/vllm:0.6.3",
      node_selector=NodeSelector(
          gpu_count=1,
      ),
  )
  ```
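The same template extends to other models and GPU counts. A purely illustrative variant (the model name and GPU count here are assumptions, not part of the quickstart -- size the `node_selector` to the model's VRAM needs) might look like:

```python
from chutes.chute import NodeSelector
from chutes.chute.template.vllm import build_vllm_chute

# Illustrative only: a larger model requesting two GPUs.
chute = build_vllm_chute(
    username="chutes",
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",  # hypothetical choice
    image="chutes/vllm:0.6.3",
    node_selector=NodeSelector(
        gpu_count=2,
    ),
)
```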
- Optionally, watch for events to see things happening in real time before you deploy:
  ```bash
  pip install 'python-socketio[client]'
  ```
  Create `watch_events.py`:
  ```python
  import time

  import socketio

  sio = socketio.Client()


  @sio.event
  def connect():
      print("Connected to server")


  @sio.event
  def disconnect():
      print("Disconnected from server")


  @sio.event
  def message(data):
      print(f"Received message: {data}")


  @sio.on('*')
  def catch_all(event, data):
      print(f"Received event '{event}' with data: {data}")


  def main():
      try:
          # Connect to the event stream, then keep the process alive.
          print("Attempting connection...")
          sio.connect('wss://events.chutes.ai')
          while True:
              time.sleep(1)
      except Exception as e:
          print(f"An error occurred: {e}")
      finally:
          if sio.connected:
              sio.disconnect()


  if __name__ == "__main__":
      main()
  ```
  Start watching for events:
  ```bash
  python watch_events.py
  ```
- Deploy it. If you're running `watch_events.py` from the previous step, do this in a different tab/window:
  ```bash
  chutes deploy llama3b:chute --public
  ```
  (remove `--public` if you want it to be private)
- Use it. Wait until the chute is "hot", either from the `watch_events.py` output or by running `chutes chutes list` and checking for status "hot". Get the "slug" for the chute (also via `chutes chutes list`), which will be the subdomain for performing inference, e.g. `username-unsloth-llama-3-2-3b-instruct`, and use the API key you created in step 4:
  ```bash
  curl https://username-unsloth-llama-3-2-3b-instruct.chutes.ai/v1/chat/completions \
    -H 'Authorization: cpk_4...' \
    -H 'Content-Type: application/json' \
    -d '{"model": "unsloth/Llama-3.2-3B-Instruct", "messages": [{"role": "user", "content": "What is the secret to life, the universe, everything?"}], "stream": true, "max_tokens": 1000}'
  ```
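Because the chute is served by vLLM, the endpoint speaks the OpenAI-compatible chat-completions protocol, so you can call it from Python too. A minimal standard-library sketch -- the slug and `cpk_...` key are placeholders for your own values, and `build_chat_request` is just a helper defined here, not part of the chutes SDK:

```python
import json
import urllib.request


def build_chat_request(slug: str, api_key: str, model: str, prompt: str,
                       max_tokens: int = 1000) -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completions request for a chute subdomain."""
    url = f"https://{slug}.chutes.ai/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # set True for token-by-token streaming, as in the curl example
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
    )


# Placeholder slug and key -- substitute the values from `chutes chutes list`
# and `chutes keys create`.
req = build_chat_request(
    "username-unsloth-llama-3-2-3b-instruct",
    "cpk_...",
    "unsloth/Llama-3.2-3B-Instruct",
    "What is the secret to life, the universe, everything?",
)
# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```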