Sultan Qasim Khan sultanqasim

View GitHub Profile
@fakerybakery
fakerybakery / mistral-convert.py
Last active March 24, 2025 10:36
Convert the text portion of Mistral 3.1 to HF format (IMPORTANT: does not convert the vision layers yet! The resulting model will be a text-only LLM.)
# Copyright 2023 Mistral AI and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
@Artefact2
Artefact2 / README.md
Last active May 11, 2025 00:58
GGUF quantizations overview

Which GGUF is right for me? (Opinionated)

Good question! I am collecting human data on how quantization affects model outputs. See here for more information: ggml-org/llama.cpp#5962

In the meantime, use the largest quantization that fully fits in your GPU's VRAM. If you can comfortably fit Q4_K_S, consider a model with more parameters instead of a larger quantization.
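A quick way to sanity-check "does it fit" is to estimate the weight footprint from parameter count and bits per weight. The sketch below is my own illustration, not from the gist; the bits-per-weight figures are approximate community ballparks, and real usage needs extra VRAM for the KV cache and compute buffers.

```python
# Rough VRAM estimate for quantized GGUF weights.
# APPROX_BPW values are approximate, assumed figures (not authoritative);
# actual GGUF files vary slightly by architecture and tensor mix.
APPROX_BPW = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_S": 5.5,
    "Q4_K_S": 4.6,
    "Q3_K_M": 3.9,
    "Q2_K": 3.4,
}

def est_model_gb(params_billions: float, quant: str) -> float:
    """Approximate size in GB (decimal) of the weights alone."""
    total_bits = params_billions * 1e9 * APPROX_BPW[quant]
    return total_bits / 8 / 1e9

if __name__ == "__main__":
    for q in APPROX_BPW:
        print(f"7B @ {q}: ~{est_model_gb(7, q):.1f} GB")
```

For example, a 7B model at Q4_K_S comes out around 4 GB of weights, which is why it fits comfortably on an 8 GB card with room left for context.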

llama.cpp feature matrix

See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix

@gbrow004
gbrow004 / ubuntu-MBP-16.md
Last active May 8, 2025 17:05
Ubuntu on Apple MacBook Pro 16-inch (2019)

Update!

This gist is out of date, and I can no longer help much, as I got rid of my Mac.

Please visit the T2 Linux website for more up-to-date information:

https://t2linux.org/

Acknowledgements

This gist is just a compilation of the hard work that others have put in. I'm not a software developer, so if there are any mistakes or better ways of doing things, I'd appreciate suggestions. Here's a list of the real heroes who made this possible: