Here are the steps to get a model running on your not-so-powerful computer:
- Install llama.cpp (you can also build it from source with CMake):
brew install llama.cpp
Other installation options are described in the llama.cpp documentation.
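If you prefer building from source, a minimal CMake build looks roughly like this (assuming the ggml-org/llama.cpp repository and a plain CPU Release build; GPU backends need extra CMake flags):
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release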
- Download the Gemma 3 model from unsloth (https://huggingface.co/unsloth). The 1-billion-parameter version should run on most CPUs; it runs faster on a GPU, in which case you need a llama.cpp build with GPU support:
llama-cli -hf unsloth/gemma-3-1b-it-GGUF
You can also choose a different level of weight quantization; the model is offered in 1- to 16-bit variants.
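To pick a specific quantization, you can append its name to the -hf argument; for example, assuming the repository offers a Q4_K_M variant:
llama-cli -hf unsloth/gemma-3-1b-it-GGUF:Q4_K_M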
- Once the download is complete, you'll land in a "waiting for prompt" area. You can test the model there, but it's more fun to run it in server mode, accessible via an HTTP API.
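For a quick smoke test in that interactive mode, you can also pass a prompt directly on the command line (the -p flag supplies the initial prompt; exact behavior can vary between llama.cpp versions):
llama-cli -hf unsloth/gemma-3-1b-it-GGUF -p "Explain what GGUF is in one sentence."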
- To run the model in an OpenAI API-compatible server:
llama-server -hf unsloth/gemma-3-1b-it-GGUF
You should now be able to access the UI in your browser at http://localhost:8080.
The chat completions endpoint is available at http://localhost:8080/v1/chat/completions.
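As a sanity check, you can hit that endpoint with a request in the OpenAI chat format; a minimal example might look like this (the "model" field is largely ignored, since the server hosts a single model):
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3-1b-it",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'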