Reference: https://github.com/ggml-org/whisper.cpp/blob/master/README.md
In .zshrc, add:
export DYLD_LIBRARY_PATH="/usr/local/lib:${DYLD_LIBRARY_PATH}"
Install PyTorch and NumPy:
uv venv --python 3.12
source .venv/bin/activate
uv pip install torch numpy
The default PyTorch installation now includes Apple Silicon
optimizations and Metal Performance Shaders (MPS)
support for GPU acceleration.
check-mps.py:
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'MPS available: {torch.backends.mps.is_available()}')
print(f'MPS built: {torch.backends.mps.is_built()}')
print()
# Prefer MPS, then CUDA, then fall back to CPU
if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using Apple Silicon GPU")
elif torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using NVIDIA GPU")
else:
    device = torch.device("cpu")
    print("Using CPU")
# Allocate a tensor on the selected device
tensor = torch.randn(1000, 1000).to(device)
print()
print(tensor)
Verify installation:
python check-mps.py
Example Output:
PyTorch version: 2.7.0
MPS available: True
MPS built: True
Using Apple Silicon GPU
tensor([[ 0.2364, -0.0264, -2.1872, ..., -1.7450, 1.0675, 0.9529],
[-1.2570, 0.3314, -0.3106, ..., 0.4044, 1.6333, -0.1500],
[-0.2596, -0.0834, 0.0127, ..., -2.2035, -0.8963, 0.7504],
...,
[ 1.1132, -1.0610, -1.0890, ..., 0.9274, -0.6170, 0.1616],
[-2.5608, 0.3655, -0.1584, ..., -1.7450, 0.5396, -1.3833],
[-0.1134, 0.0801, 0.3779, ..., 1.2989, 0.6712, -1.0711]],
device='mps:0')
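To see the MPS device actually provide a speedup, here is a small optional benchmark sketch (not from the upstream README); it assumes MPS is available, as verified above, and uses a single large matrix multiply as a rough smoke test. A suggested mps-benchmark.py:

import time
import torch

def time_matmul(device: torch.device, size: int = 4000) -> float:
    # Random square matrices allocated directly on the target device
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    torch.mm(a, b)  # warm-up so one-time initialization is not measured
    if device.type == "mps":
        torch.mps.synchronize()
    start = time.perf_counter()
    torch.mm(a, b)
    if device.type == "mps":
        torch.mps.synchronize()  # wait for the GPU before stopping the clock
    return time.perf_counter() - start

print(f"CPU: {time_matmul(torch.device('cpu')):.3f} s")
if torch.backends.mps.is_available():
    print(f"MPS: {time_matmul(torch.device('mps')):.3f} s")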
Install the Python packages needed for the Core ML model conversion (using uv pip to match the venv created above):
uv pip install ane_transformers openai-whisper coremltools
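To double-check that these packages resolved inside the venv, an optional one-off check using only the standard library (the strings below are the distribution names passed to pip above):

import importlib.metadata as md

# Print the installed version of each Core ML conversion dependency
for pkg in ("ane_transformers", "openai-whisper", "coremltools"):
    print(f"{pkg}: {md.version(pkg)}")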
coremlc is needed to build the models in the next step. It is part of Xcode, which will need to be downloaded from the Mac App Store.
Once Xcode is installed:
Set the correct developer directory:
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
Verify the installation:
xcrun --find coremlc
Build whisper.cpp with Core ML support and install files under /usr/local:
# using CMake
cmake -B build -DWHISPER_COREML=1
cmake --build build -j --config Release
cd build
sudo make install
sudo cp -a src/libwhisper* /usr/local/lib
cd ..
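As an optional sanity check that the installed library loads (not part of the upstream README), a minimal ctypes sketch; it assumes make install placed libwhisper.dylib and its ggml dependencies under /usr/local/lib. A suggested check-libwhisper.py:

import ctypes

# Load the freshly installed library by absolute path
lib = ctypes.CDLL("/usr/local/lib/libwhisper.dylib")

# whisper_print_system_info() is part of the public C API in whisper.h
lib.whisper_print_system_info.restype = ctypes.c_char_p
print(lib.whisper_print_system_info().decode())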
Back in the whisper.cpp root, download the ggml model:
./models/download-ggml-model.sh medium.en
Output:
Done! Model 'medium.en' saved in '/Users/jftuga/github.com/ggerganov/whisper.cpp/models/ggml-medium.en.bin'
Next, generate the Core ML model:
./models/generate-coreml-model.sh medium.en
Output:
Torch version 2.7.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.5.0 is the most recent version that has been tested.
/Users/jftuga/github.com/ggerganov/whisper.cpp/.venv/lib/python3.12/site-packages/coremltools/optimize/torch/palettization/fake_palettize.py:82: SyntaxWarning: invalid escape sequence '\_'
n_bits (:obj:`int`): Number of palettization bits. There would be :math:`2^{n\_bits}` unique weights in the ``LUT``.
ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=1024, n_audio_head=16, n_audio_layer=24, n_vocab=51864, n_text_ctx=448, n_text_state=1024, n_text_head=16, n_text_layer=24)
/Users/jftuga/github.com/ggerganov/whisper.cpp/models/convert-whisper-to-coreml.py:146: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert x.shape[1:] == self.positional_embedding.shape[::-1], "incorrect audio shape"
/Users/jftuga/github.com/ggerganov/whisper.cpp/.venv/lib/python3.12/site-packages/ane_transformers/reference/layer_norm.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert inputs.size(1) == self.num_channels
/Users/jftuga/github.com/ggerganov/whisper.cpp/models/convert-whisper-to-coreml.py:88: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
scale = float(dim_per_head)**-0.5
Converting PyTorch Frontend ==> MIL Ops: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4075/4076 [00:00<00:00, 10787.12 ops/s]
Running MIL frontend_pytorch pipeline: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 42.62 passes/s]
Running MIL default pipeline: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 89/89 [00:07<00:00, 11.31 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 61.59 passes/s]
done converting
/Users/jftuga/github.com/ggerganov/whisper.cpp/models/coreml-encoder-medium.en.mlmodelc/coremldata.bin
models/coreml-encoder-medium.en.mlmodelc -> models/ggml-medium.en-encoder.mlmodelc
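As a final optional check that the compiled Core ML encoder loads outside of whisper.cpp, a minimal sketch using coremltools (assumes coremltools 6.1 or newer, which provides CompiledMLModel; the path matches the rename shown above):

import coremltools as ct

# Load the compiled (.mlmodelc) encoder produced by generate-coreml-model.sh
encoder = ct.models.CompiledMLModel("models/ggml-medium.en-encoder.mlmodelc")
print("Loaded Core ML encoder:", encoder)

Per the whisper.cpp README, the binaries pick this encoder up automatically when the .mlmodelc sits next to the matching ggml-medium.en.bin; the first run is slow while the Neural Engine compiles the model, and subsequent runs are faster.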