First, clone or download the Hugging Face model repository. Change the model name as needed.
from huggingface_hub import snapshot_download
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
snapshot_download(repo_id=model_name, local_dir="./hf_model")
Save the above code in model_downloader.py, then run it with:
python model_downloader.py
Clone the llama.cpp repository and install its Python requirements:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
cd ..
python llama.cpp/convert_hf_to_gguf.py hf_model --outfile Qwen2.5-0.5B-Instruct.gguf --outtype q8_0
Here --outfile sets the name of the output GGUF file, and --outtype sets the quantization level. This should create a GGUF file named Qwen2.5-0.5B-Instruct.gguf.
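The q8_0 type stores weights in blocks of 8-bit integers, each block sharing one float scale. A minimal sketch of that idea (illustrative only, not llama.cpp's actual implementation) can be written as:

```python
# Illustrative sketch of 8-bit block quantization (the idea behind q8_0):
# each block of floats becomes int8 codes plus one shared float scale.

def quantize_block(values):
    """Quantize a block of floats to int8 codes and a shared scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    codes = [round(v / scale) for v in values]  # each code fits in int8
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate floats from the int8 codes and the scale."""
    return [c * scale for c in codes]

block = [0.12, -0.50, 0.33, 1.27]
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
# The restored values are close to, but not generally identical to, the originals.
print(max(abs(a - b) for a, b in zip(block, restored)))
```

The per-block scale is why q8_0 files are roughly a quarter the size of fp32 weights while keeping the approximation error small.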
vi Modelfile
Add the following to the Modelfile:
FROM /path_to_file/Qwen2.5-0.5B-Instruct.gguf
SYSTEM "You are a helpful coding assistant."
PARAMETER temperature 0.7
PARAMETER top_k 40
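The two PARAMETER lines control sampling: temperature rescales the token probabilities (lower values make output more deterministic), and top_k restricts sampling to the k most likely tokens. A minimal sketch of the idea (illustrative only, not Ollama's implementation):

```python
import math
import random

def sample_top_k(logits, temperature=0.7, top_k=40, rng=random):
    """Sample a token index using temperature scaling and top-k filtering."""
    # Keep only the indices of the top_k highest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    # Softmax over the surviving logits (subtract max for numerical stability).
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw one index from the filtered, rescaled distribution.
    return rng.choices(top, weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0, -3.0]
print(sample_top_k(logits, temperature=0.7, top_k=2))  # only index 0 or 1 is possible
```

With top_k=2 only the two highest-logit tokens can ever be chosen, regardless of temperature.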
Create the Ollama model from the Modelfile, then list the installed models to verify it is registered:
ollama create qwen2.5-0.5b-instruct -f Modelfile
ollama ls