Reference: https://github.com/ggml-org/whisper.cpp/blob/master/README.md
In .zshrc, add:
export DYLD_LIBRARY_PATH="/usr/local/lib:${DYLD_LIBRARY_PATH}"
Install PyTorch and NumPy:
uv venv --python 3.12
source .venv/bin/activate
uv pip install torch numpy
The default PyTorch installation now includes Apple Silicon
optimizations and Metal Performance Shaders (MPS)
support for GPU acceleration.
check-mps.py:
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'MPS available: {torch.backends.mps.is_available()}')
print(f'MPS built: {torch.backends.mps.is_built()}')
print()
# Prefer MPS, then CUDA, then fall back to CPU
if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using Apple Silicon GPU")
elif torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using NVIDIA GPU")
else:
    device = torch.device("cpu")
    print("Using CPU")
# Allocate a tensor on the selected device
tensor = torch.randn(1000, 1000).to(device)
print()
print(tensor)
Verify installation:
python check-mps.py
Example Output:
PyTorch version: 2.7.0
MPS available: True
MPS built: True
Using Apple Silicon GPU
tensor([[ 0.2364, -0.0264, -2.1872, ..., -1.7450, 1.0675, 0.9529],
[-1.2570, 0.3314, -0.3106, ..., 0.4044, 1.6333, -0.1500],
[-0.2596, -0.0834, 0.0127, ..., -2.2035, -0.8963, 0.7504],
...,
[ 1.1132, -1.0610, -1.0890, ..., 0.9274, -0.6170, 0.1616],
[-2.5608, 0.3655, -0.1584, ..., -1.7450, 0.5396, -1.3833],
[-0.1134, 0.0801, 0.3779, ..., 1.2989, 0.6712, -1.0711]],
device='mps:0')
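To see the MPS device actually provide a speedup, here is a small optional benchmark sketch (not from the upstream README); it assumes MPS is available, as verified above, and uses a single large matrix multiply as a rough smoke test. A suggested mps-benchmark.py:

import time
import torch

def time_matmul(device: torch.device, size: int = 4000) -> float:
    # Random square matrices allocated directly on the target device
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    torch.mm(a, b)  # warm-up so one-time initialization is not measured
    if device.type == "mps":
        torch.mps.synchronize()
    start = time.perf_counter()
    torch.mm(a, b)
    if device.type == "mps":
        torch.mps.synchronize()  # wait for the GPU before stopping the clock
    return time.perf_counter() - start

print(f"CPU: {time_matmul(torch.device('cpu')):.3f} s")
if torch.backends.mps.is_available():
    print(f"MPS: {time_matmul(torch.device('mps')):.3f} s")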
Install the Python packages needed for the Core ML model conversion (using uv pip to match the venv created above):
uv pip install ane_transformers openai-whisper coremltools
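To double-check that these packages resolved inside the venv, an optional one-off check using only the standard library (the strings below are the distribution names passed to pip above):

import importlib.metadata as md

# Print the installed version of each Core ML conversion dependency
for pkg in ("ane_transformers", "openai-whisper", "coremltools"):
    print(f"{pkg}: {md.version(pkg)}")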
coremlc is needed to build the models in the next step. It is part of Xcode, which will need to be downloaded from the Mac App Store.
Once Xcode is installed:
Set the correct developer directory:
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
Verify the installation:
xcrun --find coremlc
Build whisper.cpp with Core ML support and install files under /usr/local:
# using CMake
cmake -B build -DWHISPER_COREML=1
cmake --build build -j --config Release
cd build
sudo make install
sudo cp -a src/libwhisper* /usr/local/lib
cd ..
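As an optional sanity check that the installed library loads (not part of the upstream README), a minimal ctypes sketch; it assumes make install placed libwhisper.dylib and its ggml dependencies under /usr/local/lib. A suggested check-libwhisper.py:

import ctypes

# Load the freshly installed library by absolute path
lib = ctypes.CDLL("/usr/local/lib/libwhisper.dylib")

# whisper_print_system_info() is part of the public C API in whisper.h
lib.whisper_print_system_info.restype = ctypes.c_char_p
print(lib.whisper_print_system_info().decode())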
Back in the whisper.cpp root, download the ggml model:
./models/download-ggml-model.sh medium.en
Output:
Done! Model 'medium.en' saved in '/Users/jftuga/github.com/ggerganov/whisper.cpp/models/ggml-medium.en.bin'
Next, generate the Core ML model:
./models/generate-coreml-model.sh medium.en
Output:
Torch version 2.7.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.5.0 is the most recent version that has been tested.
/Users/jftuga/github.com/ggerganov/whisper.cpp/.venv/lib/python3.12/site-packages/coremltools/optimize/torch/palettization/fake_palettize.py:82: SyntaxWarning: invalid escape sequence '\_'
n_bits (:obj:`int`): Number of palettization bits. There would be :math:`2^{n\_bits}` unique weights in the ``LUT``.
ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=1024, n_audio_head=16, n_audio_layer=24, n_vocab=51864, n_text_ctx=448, n_text_state=1024, n_text_head=16, n_text_layer=24)
/Users/jftuga/github.com/ggerganov/whisper.cpp/models/convert-whisper-to-coreml.py:146: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert x.shape[1:] == self.positional_embedding.shape[::-1], "incorrect audio shape"
/Users/jftuga/github.com/ggerganov/whisper.cpp/.venv/lib/python3.12/site-packages/ane_transformers/reference/layer_norm.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert inputs.size(1) == self.num_channels
/Users/jftuga/github.com/ggerganov/whisper.cpp/models/convert-whisper-to-coreml.py:88: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
scale = float(dim_per_head)**-0.5
Converting PyTorch Frontend ==> MIL Ops: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4075/4076 [00:00<00:00, 10787.12 ops/s]
Running MIL frontend_pytorch pipeline: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 42.62 passes/s]
Running MIL default pipeline: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 89/89 [00:07<00:00, 11.31 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 61.59 passes/s]
done converting
/Users/jftuga/github.com/ggerganov/whisper.cpp/models/coreml-encoder-medium.en.mlmodelc/coremldata.bin
models/coreml-encoder-medium.en.mlmodelc -> models/ggml-medium.en-encoder.mlmodelc
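As a final optional check that the compiled Core ML encoder loads outside of whisper.cpp, a minimal sketch using coremltools (assumes coremltools 6.1 or newer, which provides CompiledMLModel; the path matches the rename shown above):

import coremltools as ct

# Load the compiled (.mlmodelc) encoder produced by generate-coreml-model.sh
encoder = ct.models.CompiledMLModel("models/ggml-medium.en-encoder.mlmodelc")
print("Loaded Core ML encoder:", encoder)

Per the whisper.cpp README, the binaries pick this encoder up automatically when the .mlmodelc sits next to the matching ggml-medium.en.bin; the first run is slow while the Neural Engine compiles the model, and subsequent runs are faster.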