Gotta go play some games now, but here's a quick test:
$ cd llama.cpp
$ git remote -v | grep sam
sammcj git@github.com:sammcj/llama.cpp.git (fetch)
sammcj git@github.com:sammcj/llama.cpp.git (push)
$ git checkout glm-4-5
$ git rev-parse --short HEAD
3d15c4a94
# compile cpu only
$ ./build/bin/llama-server --version
version: 6038 (3d15c4a94)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
# test
#!/usr/bin/env bash
#ulimit -n 9999
model=/mnt/data/models/Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT/GLM-4.5-Thireus-Q8_0.gguf
numactl -N 0 -m 0 \
./build/bin/llama-server \
--model "$model" \
--alias ubergarm/GLM-4.5-Q8_0 \
--ctx-size 196608 \
-fa \
-ctk q8_0 -ctv q8_0 \
--parallel 1 \
--threads 128 \
--threads-batch 192 \
--numa numactl \
--host 127.0.0.1 \
--port 8080 \
--no-mmap
print_info: model type = 355B.A32B
print_info: model params = 358.34 B
print_info: general.name = GLM 4.5
print_info: vocab type = BPE
print_info: n_vocab = 151552
print_info: n_merges = 318088
print_info: BOS token = 151329 '<|endoftext|>'
print_info: EOS token = 151329 '<|endoftext|>'
print_info: EOT token = 151336 '<|user|>'
print_info: UNK token = 151329 '<|endoftext|>'
print_info: PAD token = 151329 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 151329 '<|endoftext|>'
print_info: EOG token = 151336 '<|user|>'
print_info: max token length = 1024
load_tensors: loading model tensors, this can take a while... (mmap = false)
model has unused tensor blk.92.eh_proj (size = 209715200 bytes) -- ignoring
model has unused tensor blk.92.embed_tokens (size = 3103784960 bytes) -- ignoring
model has unused tensor blk.92.enorm (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.hnorm (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.shared_head.head (size = 3103784960 bytes) -- ignoring
model has unused tensor blk.92.shared_head.norm (size = 20480 bytes) -- ignoring
llama_model_load: error loading model: missing tensor 'blk.3.exp_probs_b'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mnt/data/models/Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT/GLM-4.5-Thireus-Q8_0.gguf'
srv load_model: failed to load model, '/mnt/data/models/Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT/GLM-4.5-Thireus-Q8_0.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
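
For triage, the name of the failing tensor can be pulled out of a saved log with standard tools. This is a minimal sketch, assuming the output above was saved to a file; the `server.log` filename and the `missing_tensors` function name are hypothetical and not part of llama.cpp:

```shell
#!/usr/bin/env bash
# Print each tensor name reported as missing in a saved llama.cpp log.
# Assumes error lines have the form: ... missing tensor 'NAME'
# (function name and log path are illustrative, not part of llama.cpp)
missing_tensors() {
    # Keep only lines with the error and extract the quoted name.
    sed -n "s/.*missing tensor '\([^']*\)'.*/\1/p" "$1" | sort -u
}
```

Running `missing_tensors server.log` against the log above should print `blk.3.exp_probs_b`, the tensor the loader stopped on.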