@michaelgold
Created February 28, 2026 13:43
Qwen3.5-35B-A3B-Q4_K_M Docker Compose
```yaml
services:
  reasoning:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    container_name: qwen35-a3b-gguf
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    volumes:
      - ../models:/models
    ports:
      - "8080:8080"
    command: >
      -m /models/reasoning/Qwen3.5-35B-A3B/Qwen3.5-35B-A3B-Q4_K_M.gguf
      --host 0.0.0.0
      --port 8080
      --ctx-size 16384
      --temp 1.0
      --top-p 0.95
      --top-k 20
      --min-p 0.00
```
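Once the stack is up (`docker compose up -d`), the llama.cpp server listens on port 8080 and exposes an OpenAI-compatible chat endpoint at `/v1/chat/completions`. The sketch below shows one way to call it from Python using only the standard library; `BASE_URL`, `build_chat_request`, and `chat` are illustrative names, not part of the gist, and the sampling defaults mirror the `--temp 1.0 --top-p 0.95` flags in the compose command.

```python
import json
import urllib.request

# Matches the "8080:8080" port mapping in the compose file above.
BASE_URL = "http://localhost:8080"

def build_chat_request(prompt, temperature=1.0, top_p=0.95):
    """Build the JSON body for an OpenAI-style chat completion request.

    Defaults mirror the sampling flags passed to llama-server
    (--temp 1.0 --top-p 0.95); the server applies its own --top-k
    and --min-p settings from the compose command.
    """
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }

def chat(prompt):
    """Send a single-turn chat request and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

A quick smoke test from the shell works too: `curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"Hello"}]}'`.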