@michaelgold
Last active February 28, 2026 08:33
minimax 2.5 docker compose
```yaml
services:
  minimax:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    container_name: minimax-m25
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    volumes:
      - ./models:/models
    ports:
      - "8080:8080"
    command: >
      -m /models/minimax-m2.5/minimax-m2.5-Q4_K_M.gguf
      --host 0.0.0.0
      --port 8080
      -ngl 999
      --ctx-size 24576
      --cpu-moe
      --no-warmup
      --parallel 1
      --batch-size 1024
      --ubatch-size 256
      --threads 32
      --threads-batch 32

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    depends_on:
      - minimax
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      # Open WebUI expects an OpenAI-compatible base URL
      - OPENAI_API_BASE_URL=http://minimax:8080/v1
      - OPENAI_API_KEY=dummy
      # optional but nice:
      - WEBUI_AUTH=false
```
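Once the stack is up, you can smoke-test the llama.cpp server directly before opening the WebUI. A minimal sketch, assuming the port mapping above: llama.cpp's server returns 200 from `/health` once the model has loaded, and the `model` field in a chat request is effectively arbitrary for a single-model server.

```shell
# Returns non-zero (and prints nothing) until the model finishes loading
curl -sf http://localhost:8080/health && echo "server ready"

# Minimal OpenAI-compatible chat request against the llama.cpp server
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.5",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64
  }'
```

If `/health` never comes up, `docker compose logs minimax` usually shows the reason (missing GGUF path or GPU not visible are the common ones).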

michaelgold commented Feb 17, 2026

To bootstrap, install the Hugging Face CLI, authenticate, and download the quantized model:

```shell
curl -LsSf https://hf.co/cli/install.sh | bash
hf auth login
hf download ox-ox/MiniMax-M2.5-GGUF \
  --local-dir models/minimax-m2.5 \
  --include "*.gguf"
```
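The Q4_K_M quantization is large, so it is worth confirming the download landed where the compose file expects it (the path below is the one hard-coded in the `command:` above):

```shell
# List the downloaded GGUF file(s) and their sizes
ls -lh models/minimax-m2.5/*.gguf

# Total size on disk
du -sh models/minimax-m2.5
```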

Then run the stack:

```shell
docker compose up
```

Optional (reduce disk swapping): lower swappiness so the kernel prefers keeping model pages in RAM, then cycle swap off and on to clear anything already swapped out:

```shell
echo 'vm.swappiness=1' | sudo tee /etc/sysctl.d/99-llm.conf
sudo sysctl --system
sudo swapoff -a
sudo swapon -a
```
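To confirm the tuning took effect (Linux only; a sketch, not part of the original gist):

```shell
# Should print 1 after the sysctl change above
cat /proc/sys/vm/swappiness

# Shows active swap devices; empty output means swap did not come back on
swapon --show
```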
