Converting HuggingFace Model to GGUF to Load into Ollama

Step 01: Download the Model from HF

First, clone or download the Hugging Face repository. Change the model name as needed.

from huggingface_hub import snapshot_download

# Change the repo id to the model you need.
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
# Download the full repository snapshot (weights, config, tokenizer) into ./hf_model.
snapshot_download(repo_id=model_name, local_dir="./hf_model")

Save the above code as model_downloader.py, then run it with:

python model_downloader.py
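
To confirm the download, list what landed in ./hf_model; for this model you should see config.json, the tokenizer files, and the *.safetensors weights. A quick check, assuming the local_dir used above:

import os

# Show the files snapshot_download placed in ./hf_model.
for name in sorted(os.listdir("./hf_model")):
    print(name)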

Step 02: Prepare llama.cpp

Clone the llama.cpp repository and install its Python dependencies:

Clone

git clone https://github.com/ggerganov/llama.cpp.git

Install

cd llama.cpp
pip install -r requirements.txt
cd ..
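
The Python requirements are all the conversion script needs. If you also want llama.cpp's native binaries (optional for this guide), the project builds with CMake; a minimal sketch, run from inside the llama.cpp directory:

cmake -B build
cmake --build build --config Release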

Step 03: Convert HF Model to GGUF

python llama.cpp/convert_hf_to_gguf.py hf_model --outfile Qwen2.5-0.5B-Instruct.gguf --outtype q8_0

Here --outfile sets the name of the output GGUF file, and --outtype sets the quantization level. This should create a GGUF file named Qwen2.5-0.5B-Instruct.gguf.
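
For an unquantized file instead, the converter also accepts f16 and f32 for --outtype, for example:

python llama.cpp/convert_hf_to_gguf.py hf_model --outfile Qwen2.5-0.5B-Instruct-f16.gguf --outtype f16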

Step 04: Create a Modelfile

vi Modelfile

Set parameters

Add the following to the Modelfile:

FROM /path_to_file/Qwen2.5-0.5B-Instruct.gguf
SYSTEM "You are a helpful coding assistant."
PARAMETER temperature 0.7
PARAMETER top_k 40
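
Ollama supports further Modelfile parameters if you need them, for example a larger context window and nucleus sampling:

PARAMETER num_ctx 4096
PARAMETER top_p 0.9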

Step 05: Create Model from Modelfile

ollama create qwen2.5-0.5b-instruct -f Modelfile

Step 06: Verify with Ollama

ollama ls
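
The new model should appear in the list. You can then chat with it directly, using the name given at ollama create:

ollama run qwen2.5-0.5b-instruct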