♻️
I may be slow to respond.

Neil Pandya NeilPandya

Artefact2 / README.md
Last active June 5, 2025 20:44
GGUF quantizations overview

Which GGUF is right for me? (Opinionated)

Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggml-org/llama.cpp#5962

In the meantime, use the largest quantization that fits entirely in your GPU's VRAM. If Q4_K_S fits comfortably, try a model with more parameters.
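The rule of thumb above comes down to simple arithmetic: weight size is roughly parameter count times bits per weight (bpw), divided by 8. The sketch below applies that rule under stated assumptions. The bpw figures are the nominal sizes of the GGUF block formats (bytes per block × 8 / weights per block). Mixed files such as Q4_K_S or Q4_K_M keep some tensors at higher precision, so real files run somewhat larger. Q4_K stands in for Q4_K_S here. `overhead_gib` is a hypothetical allowance for KV cache and compute buffers, not a llama.cpp setting, and "comfortably fits" is interpreted as "fits with that overhead to spare".

```python
# Nominal bits per weight for GGUF block formats (bytes per block * 8 / weights per block).
# Assumption: real _S/_M mixes are somewhat larger because some tensors use higher-precision types.
NOMINAL_BPW = {
    "F16": 16.0,
    "Q8_0": 8.5,      # 34 bytes per 32 weights
    "Q6_K": 6.5625,   # 210 bytes per 256 weights
    "Q5_K": 5.5,      # 176 bytes per 256 weights
    "Q4_K": 4.5,      # 144 bytes per 256 weights
    "Q3_K": 3.4375,   # 110 bytes per 256 weights
    "Q2_K": 2.625,    # 84 bytes per 256 weights
}


def estimate_weights_gib(n_params: float, bpw: float) -> float:
    """Rough size of the quantized weights in GiB."""
    return n_params * bpw / 8 / 2**30


def largest_fitting_quant(n_params: float, vram_gib: float, overhead_gib: float = 1.5):
    """Return the highest-precision format whose weights plus overhead fit in VRAM, else None.

    overhead_gib is a hypothetical allowance for KV cache and buffers; the real
    figure depends on context length, batch size and backend.
    """
    budget = vram_gib - overhead_gib
    for name, bpw in sorted(NOMINAL_BPW.items(), key=lambda kv: kv[1], reverse=True):
        if estimate_weights_gib(n_params, bpw) <= budget:
            return name
    return None


def suggest_model(param_counts, vram_gib: float, overhead_gib: float = 1.5):
    """Prefer the largest model whose ~4-bit (Q4_K) weights fit, then pick its best quant."""
    budget = vram_gib - overhead_gib
    fitting = [n for n in param_counts
               if estimate_weights_gib(n, NOMINAL_BPW["Q4_K"]) <= budget]
    if not fitting:
        return None
    n = max(fitting)
    return n, largest_fitting_quant(n, vram_gib, overhead_gib)
```

For example, with a 12 GiB card, `largest_fitting_quant(8e9, 12)` returns `"Q8_0"` (about 7.9 GiB of weights). However, `suggest_model([8e9, 14e9, 32e9], 12)` returns `(14e9, "Q5_K")`: the larger model still fits at roughly 5.5 bpw, which reflects the advice to move up in parameters once a ~4-bit quant fits with room to spare.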

llama.cpp feature matrix

See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix

younesbelkada / finetune_llama_v2.py
Last active April 7, 2025 18:27
Fine-tune Llama v2 models on the Guanaco dataset
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software