@ubergarm
Created August 1, 2025 23:41
Testing the big GLM-4.5 (355B-A32B) on llama.cpp with some of Thireus' split GGUFs

Gotta go play some games now, but here's a quick test:

$ cd llama.cpp
$ git remote -v | grep sam
sammcj  [email protected]:sammcj/llama.cpp.git (fetch)
sammcj  [email protected]:sammcj/llama.cpp.git (push)
$ git checkout glm-4-5
$ git rev-parse --short HEAD
3d15c4a94
# compile cpu only
$ ./build/bin/llama-server --version
version: 6038 (3d15c4a94)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
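For anyone reproducing the "compile cpu only" step: a sketch of the standard llama.cpp CMake flow, assuming defaults (the CPU backend is what you get when no backend flags like -DGGML_CUDA are passed):

```shell
# sketch: plain CMake build of llama.cpp, CPU backend only (the default)
cmake -B build
cmake --build build --config Release -j "$(nproc)"
# sanity check the binary
./build/bin/llama-server --version
```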
# test
#!/usr/bin/env bash

#ulimit -n 9999

model=/mnt/data/models/Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT/GLM-4.5-Thireus-Q8_0.gguf

numactl -N 0 -m 0 \
./build/bin/llama-server \
    --model "$model" \
    --alias ubergarm/GLM-4.5-Q8_0 \
    --ctx-size 196608 \
    -fa \
    -ctk q8_0 -ctv q8_0 \
    --parallel 1 \
    --threads 128 \
    --threads-batch 192 \
    --numa numactl \
    --host 127.0.0.1 \
    --port 8080 \
    --no-mmap
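Side note: --ctx-size 196608 with q8_0 K/V is a sizable cache. A back-of-envelope estimate — the n_layer / n_head_kv / head_dim values below are hypothetical placeholders for illustration, not numbers from this log, so substitute the real ones from print_info:

```shell
# Back-of-envelope KV cache size for --ctx-size 196608 with -ctk/-ctv q8_0.
# NOTE: n_layer / n_head_kv / head_dim are PLACEHOLDER values, not from this log.
n_ctx=196608; n_layer=93; n_head_kv=8; head_dim=128
# q8_0 packs 32 values into 34 bytes (one fp16 scale + 32 int8 quants);
# factor of 2 covers both the K and V caches
kv_bytes=$(( 2 * n_ctx * n_layer * n_head_kv * head_dim * 34 / 32 ))
echo "~$(( kv_bytes / 1024 / 1024 / 1024 )) GiB KV cache"   # → ~37 GiB with these placeholder dims
```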

print_info: model type       = 355B.A32B
print_info: model params     = 358.34 B
print_info: general.name     = GLM 4.5
print_info: vocab type       = BPE
print_info: n_vocab          = 151552
print_info: n_merges         = 318088
print_info: BOS token        = 151329 '<|endoftext|>'
print_info: EOS token        = 151329 '<|endoftext|>'
print_info: EOT token        = 151336 '<|user|>'
print_info: UNK token        = 151329 '<|endoftext|>'
print_info: PAD token        = 151329 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 151329 '<|endoftext|>'
print_info: EOG token        = 151336 '<|user|>'
print_info: max token length = 1024
load_tensors: loading model tensors, this can take a while... (mmap = false)
model has unused tensor blk.92.eh_proj (size = 209715200 bytes) -- ignoring
model has unused tensor blk.92.embed_tokens (size = 3103784960 bytes) -- ignoring
model has unused tensor blk.92.enorm (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.hnorm (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.shared_head.head (size = 3103784960 bytes) -- ignoring
model has unused tensor blk.92.shared_head.norm (size = 20480 bytes) -- ignoring
llama_model_load: error loading model: missing tensor 'blk.3.exp_probs_b'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mnt/data/models/Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT/GLM-4.5-Thireus-Q8_0.gguf'
srv    load_model: failed to load model, '/mnt/data/models/Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT/GLM-4.5-Thireus-Q8_0.gguf'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
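The missing blk.3.exp_probs_b (which appears to be the expert-routing score-correction bias tensor) suggests this branch expects a tensor the GGUF conversion doesn't carry. One way to confirm what's actually in the file is the tensor dump script that ships in llama.cpp's gguf-py (path below assumes you're in a llama.cpp checkout; pip install gguf provides the same reader):

```shell
# sketch: dump the GGUF's tensor names and look for the one the loader wants
python gguf-py/scripts/gguf_dump.py "$model" | grep exp_probs_b \
    || echo "no exp_probs_b tensors in this GGUF"
```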