Julian Neytchev dobriak

@dobriak
dobriak / README.md
Created June 30, 2026 05:58
Calculate VRAM usage for a model and see if it will fit with vllm

vram_calc.py

Estimates GPU VRAM required to serve a model with vllm, reading model weights directly from local storage. Handles quantized models (FP8, NVFP4, GPTQ, AWQ, mixed-precision) and hybrid architectures with both full-attention and linear/Mamba-style attention layers.
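As a rough illustration of the arithmetic behind such an estimate (this is not the script's actual code), the sketch below adds the weight size to the KV cache size of standard full-attention layers. The function names and example numbers are hypothetical. Linear/Mamba-style layers, activations, and vllm's own runtime overhead are deliberately ignored.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache size in bytes for standard full-attention layers.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes an FP16/BF16 cache (an assumption, not a vllm default lookup).
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem


def estimate_vram_gib(weight_bytes, num_layers, num_kv_heads, head_dim,
                      context_len, bytes_per_elem=2):
    """Weights plus KV cache for one full-length sequence, in GiB."""
    total = weight_bytes + kv_cache_bytes(
        num_layers, num_kv_heads, head_dim, context_len, bytes_per_elem
    )
    return total / 2**30


# Hypothetical model: 16 GiB of weights, 32 layers, 8 KV heads of dim 128, 4096-token context.
print(estimate_vram_gib(16 * 2**30, 32, 8, 128, 4096))  # 16.5
```

Reading the weight size from the files on disk, rather than deriving it from a parameter count, is what lets an estimate like this cope with quantized and mixed-precision checkpoints.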

Requirements

  • uv. No other dependencies are needed; the script uses only the Python standard library.

Usage

@dobriak
dobriak / README.md
Created June 30, 2026 05:53
Calculate VRAM (and RAM) usage for GGUF models

gguf_vram_calc.py

VRAM estimator for llama.cpp models. Reads architecture metadata directly from GGUF files — no internet access, no config download required.

uv run gguf_vram_calc.py MODEL.gguf [options]
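To show what reading metadata directly from a GGUF file can look like, here is a minimal sketch that parses only the fixed-size file header. It is not taken from gguf_vram_calc.py. The function name `read_gguf_header` is hypothetical, and the layout assumed is the little-endian header used by GGUF format versions 2 and 3.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_FORMAT = "<4sIQQ"  # magic, uint32 version, uint64 tensor count, uint64 metadata KV count
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size header at the start of a GGUF file.

    Assumes a little-endian file in GGUF version 2 or 3; the metadata
    key/value pairs that follow the header are not decoded here.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("not enough bytes for a GGUF header")
    magic, version, tensor_count, kv_count = struct.unpack_from(HEADER_FORMAT, data, 0)
    if magic != GGUF_MAGIC:
        raise ValueError("missing GGUF magic bytes")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

A caller would open the model in binary mode and pass the first 24 bytes, for example `read_gguf_header(open("MODEL.gguf", "rb").read(24))`. The architecture details needed for a VRAM estimate (layer count, head counts, context length) live in the key/value section after this header.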

@dobriak
dobriak / README.md
Created June 27, 2026 05:45
Compile vllm from scratch, AMD R9700, ROCm, gfx1201

vllm — ROCm build for AMD RDNA4 (gfx1201)

Build and runtime notes for the AMD Radeon AI PRO R9700 (and RX 9070 XT) on ROCm 7.2.


Why build from source?

AMD's RDNA4 architecture (gfx1201 / Navi 48) is new enough that pre-built vllm wheels do not target it. The official pip package is compiled for CUDA, and the AMD-published ROCm wheels are built for MI300-series datacenter GPUs (gfx942). Installing either one will fail at import or silently miscompile kernels for the wrong ISA.

@dobriak
dobriak / qwen36-mtp-benchmark.md
Created June 4, 2026 00:08
Benchmarking Qwen3.6-MTP

Benchmarking Qwen3.6-27B MTP Performance

  • RTX 5090, Ryzen 9 9950X3D, 128GB DDR5
  • Debian 13 6.12.85-1, CUDA 13.2.78, llama.cpp b9200

benchy Tests

1. Control - No MTP bits in gguf

mmproj loaded

@dobriak
dobriak / llamacpp-gemma4-31B-mtp-benchmark.md
Last active June 4, 2026 14:26
llama.cpp Gemma 4 MTP Benchmark

llama.cpp Gemma 4 MTP Benchmark

RTX 4070 12GB VRAM, 64GB RAM

Get the Gemma-4-MTP PR

git fetch origin pull/23398/head:gemma-mtp
git checkout gemma-mtp
@dobriak
dobriak / parse-oai-modeldata.sh
Created March 19, 2026 06:13
Opencode, for some reason, does not know how to parse model data when used with a local llama.cpp server. This jq filter converts the output of the OpenAI-compatible `v1/models` endpoint into a format that can be pasted into the opencode config under the providers/llama.cpp models setting.
#!/usr/bin/env bash
LLAMA_CPP_BASE_URL=https://llamacpp.your-url.com
curl -s "${LLAMA_CPP_BASE_URL}/v1/models" | jq '[.data[] |
.status.args as $args |
{
(.id): {
name: .id,
limit: (
($args | index("--ctx-size")) as $idx |
if $idx then {context: ($args[$idx + 1] | tonumber), output: ($args[$idx + 1] | tonumber)} else empty end
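The visible part of the filter can be mirrored in Python as a rough sketch. The function name `models_to_opencode` is hypothetical, and the response shape (`data[].id`, `data[].status.args`) is taken from the filter above. The rest of the original filter is truncated in this preview, so only the `name` and `limit` fields are reproduced, and merging the entries into a single dict is an assumption about the final output shape.

```python
def models_to_opencode(response: dict) -> dict:
    """Map a llama.cpp `v1/models` response to opencode-style model entries.

    Models whose launch args lack `--ctx-size` are skipped, which mirrors
    the `else empty` branch in the jq filter.
    """
    models = {}
    for model in response.get("data", []):
        args = model.get("status", {}).get("args", [])
        if "--ctx-size" not in args:
            continue  # no context size known: drop the entry
        # The value follows the flag; it is used for both context and output limits.
        ctx = int(args[args.index("--ctx-size") + 1])
        models[model["id"]] = {
            "name": model["id"],
            "limit": {"context": ctx, "output": ctx},
        }
    return models


# Hypothetical response containing one model with a context size and one without.
sample = {
    "data": [
        {"id": "model-a", "status": {"args": ["--ctx-size", "32768", "--port", "8080"]}},
        {"id": "model-b", "status": {"args": ["--port", "8081"]}},
    ]
}
print(models_to_opencode(sample))
```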
@dobriak
dobriak / PostProc.cmd
Created December 9, 2013 03:36
Post processing script for SABnzbd on Windows. Moves processed files onto a NAS share.
@echo off
rem Post processing script for SABnzbd
set NASNAME=192.168.1.2
rem ping the nas just in case, exit if not online
ping %NASNAME% | find "TTL" > nul
IF ERRORLEVEL 1 GOTO ENDERROR
SET NASPATH=\\%NASNAME%\usb_storage\new
SET LOGFILE="%~d0%~p0\postprocessing.log"
@dobriak
dobriak / README.md
Created November 11, 2013 17:25 — forked from Filirom1/README.md

Install routing plugin

yum install rubygem-openshift-origin-routing-activemq.noarch

Create routing-plugin configuration file

cp /etc/openshift/plugins.d/openshift-origin-routing-activemq.conf.example /etc/openshift/plugins.d/openshift-origin-routing-activemq.conf

Add the routinginfo user to the activemq.xml configuration file. See files below.

@dobriak
dobriak / lvroot_extend.sh
Last active December 23, 2015 20:29
Extending root LV ext4 partition
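# Extend the root LV (ext4) onto a newly attached disk.
FREEDISK=/dev/vdb   # new, empty disk to add to the volume group
VGROOT=VolGroup     # volume group that holds the root LV
LVROOT=lv_root      # root logical volume
# No mkfs on the new disk: it becomes an LVM physical volume, not a filesystem.
pvcreate "${FREEDISK}"
vgextend "/dev/${VGROOT}" "${FREEDISK}"
# Grow the LV into all free extents, then grow the ext4 filesystem to match.
lvextend -l +100%FREE "/dev/${VGROOT}/${LVROOT}"
resize2fs "/dev/${VGROOT}/${LVROOT}"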
@dobriak
dobriak / .vimrc
Last active December 23, 2015 07:29
My vimrc file. Autoindentation and tabbing, editing new files in tabs with F7/F8 back and forth (:tab e <file> to open in a tab)
:set autoindent
:set shiftwidth=2
:set tabstop=2
:set expandtab
:map <F7> :tabp<CR>
:map <F8> :tabn<CR>
:set pastetoggle=<F2>