@muellerzr
Created September 30, 2025 14:03
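#!/bin/bash
# Serve Kimi-K2-Instruct-0905 (Q4_K_S GGUF) with the llama.cpp CUDA server image.
# Whole layers are pinned to GPUs with -ot (--override-tensor), while selected
# ffn_down_exps expert tensors are kept on the CPU. The CPU rules are listed
# before the per-GPU rules, which assumes the first matching pattern wins.
DEVICE=0,1,2,3,4   # GPU indices exposed to the container
PORT=8001          # port used inside the container and published on the host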
docker run \
--gpus "\"device=$DEVICE\"" \
-v /mnt/raid:/models \
-p "$PORT:$PORT" \
ghcr.io/ggml-org/llama.cpp:server-cuda \
-m /models/Kimi-k2-Instruct-0905-GGUF/Kimi-K2-Instruct-0905-Q4_K_S-00001-of-00013.gguf \
--port "$PORT" \
--host 0.0.0.0 \
-ngl 99 \
--jinja \
--ctx-size 16384 \
--prio 3 \
--threads -1 \
--flash-attn on \
-ot "blk\.(0)\.=CUDA0" \
-ot "blk\.(1|2|3|4|5)\.ffn_(down)_exps\.=CPU" \
-ot "blk\.(7|8|9|10|11|12|13|14|15|16|17)\.ffn_(down)_exps\.=CPU" \
-ot "blk\.(21|22|23|24|25|26|27|28|29|30|31)\.ffn_(down)_exps\.=CPU" \
-ot "blk\.(35|36|37|38|39|40|41|42|43|44|45)\.ffn_(down)_exps\.=CPU" \
-ot "blk\.(49|50|51|52|53|54)\.ffn_(down)_exps\.=CPU" \
-ot "blk\.(1|2|3|4|5|6)\.=CUDA0" \
-ot "blk\.(7|8|9|10|11|12|13|14|15|16|17|18|19|20)\.=CUDA1" \
-ot "blk\.(21|22|23|24|25|26|27|28|29|30|31|32|33|34)\.=CUDA2" \
-ot "blk\.(35|36|37|38|39|40|41|42|43|44|45|46|47|48)\.=CUDA3" \
-ot "blk\.(49|50|51|52|53|54|55|56|57|58|59|60)\.=CUDA4"
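
The long layer alternations in the `-ot` rules are easy to mistype by hand. As a rough sketch only, they could be generated from layer ranges instead. The helper names `layer_alt` and `ot_rule` are hypothetical and not part of the script or of llama.cpp; they just print patterns shaped like the ones used above.

```bash
# Hypothetical helper: expand an inclusive layer range into a regex
# alternation, e.g. "7 9" -> "7|8|9".
layer_alt() {
  local i="$1" out="$1"
  while [ "$i" -lt "$2" ]; do
    i=$((i + 1))
    out="$out|$i"
  done
  printf '%s' "$out"
}

# Hypothetical helper: print one override pattern.
# usage: ot_rule FIRST LAST TENSOR_SUFFIX_REGEX BUFFER
ot_rule() {
  printf 'blk\\.(%s)\\.%s=%s' "$(layer_alt "$1" "$2")" "$3" "$4"
}

ot_rule 49 54 'ffn_(down)_exps\.' CPU; echo
# blk\.(49|50|51|52|53|54)\.ffn_(down)_exps\.=CPU
ot_rule 1 6 '' CUDA0; echo
# blk\.(1|2|3|4|5|6)\.=CUDA0
```

Each rule would then be passed as `-ot "$(ot_rule 49 54 'ffn_(down)_exps\.' CPU)"`, keeping the quotes so the pattern stays a single argument.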