These are the steps to set up a self-hosted Gemma 4 chat with a web UI you can use from your phone and laptop, keeping all your data and models private. Under the hood it's just llama.cpp's built-in web UI served over Tailscale.
The setup gives you:
- A chat interface accessible from any device on your Tailscale network
- Web search via MCP so the model can look things up (important since models have a knowledge cutoff)
- Streaming responses, conversation history, and the same UI everywhere
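The core of the setup boils down to two commands. This is a minimal sketch, assuming `llama-server` and `tailscale` are already installed and on your PATH; the model filename, context size, and port below are placeholders, not values from this guide:

```shell
# Start llama.cpp's server; it also serves the built-in chat web UI
# on the port it binds. Model file and flags here are illustrative.
llama-server \
  -m ./gemma-q4_k_m.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -c 8192

# On the same machine, publish that port to your tailnet over HTTPS
# so any of your Tailscale devices can reach the UI.
tailscale serve --bg 8080
```

After this, the chat UI is reachable from any device on the tailnet at the machine's Tailscale HTTPS address, with traffic never leaving your network.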
My Setup: