Skip to content

Instantly share code, notes, and snippets.

@pszemraj
Created April 5, 2026 06:07
Show Gist options
  • Select an option

  • Save pszemraj/87eddd448bf831bdc2ec21a9dfb61da0 to your computer and use it in GitHub Desktop.

Select an option

Save pszemraj/87eddd448bf831bdc2ec21a9dfb61da0 to your computer and use it in GitHub Desktop.
script for launching gemma4 on 5090 via llama-server
#!/usr/bin/env bash
set -e
# a script for running https://hf.co/unsloth/gemma-4-31B-it-GGUF
# on 5090 hardware, targeting long ctx
MODEL_PATH="$HOME/model-weights/llm/gguf/gemma4/gemma-4-31B-it-UD-Q4_K_XL.gguf"
MMPROJ_PATH="$HOME/model-weights/llm/gguf/gemma4/mmproj-BF16.gguf"
CONTEXT_LENGTH=65536
PORT=8674
# Defaults: thinking enabled
ENABLE_THINKING=true
REASONING_FORMAT="deepseek"
# Parse args
while [[ $# -gt 0 ]]; do
case "$1" in
--instruct)
ENABLE_THINKING=false
REASONING_FORMAT="none"
shift
;;
-c|--ctx)
CONTEXT_LENGTH="$2"
shift 2
;;
-p|--port)
PORT="$2"
shift 2
;;
*)
echo "Unknown option: $1" >&2
echo "Usage: $0 [--instruct] [-c|--ctx N] [-p|--port N]" >&2
exit 1
;;
esac
done
if [[ ! -f "${MMPROJ_PATH}" ]]; then
echo "ERROR: mmproj not found at ${MMPROJ_PATH}. download from repo" >&2
exit 1
fi
echo "Mode: $(${ENABLE_THINKING} && echo 'thinking' || echo 'instruct')"
echo "Context: ${CONTEXT_LENGTH}"
echo "Port: ${PORT}"
sleep 3
llama-server \
-m "${MODEL_PATH}" \
--mmproj "${MMPROJ_PATH}" \
--jinja \
-ngl -1 \
-c "${CONTEXT_LENGTH}" \
-fa on \
-b 4096 \
-ub 512 \
-ctk q8_0 \
-ctv q8_0 \
--ctx-checkpoints 16 \
--no-context-shift \
-np 1 \
-t 8 \
-tb 8 \
--temp 1.0 \
--top-p 0.95 \
--top-k 64 \
--chat-template-kwargs "{\"enable_thinking\":${ENABLE_THINKING}}" \
--reasoning-format "${REASONING_FORMAT}" \
--port "${PORT}"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment