At the moment, OpenAI's Whisper service limits files submitted for transcription to 25 MB. That's roughly 17 minutes of audio at a 192 kbit/s bit rate. To transcribe bigger audio files, it's necessary to split them. The OpenAI page about Whisper recommends splitting the file during moments of silence so that sentences are not split across two files.
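The relationship between the size limit, the bit rate and the playable duration is simple arithmetic, and a rough sketch of it is below. The function name is hypothetical, and it assumes a constant bit rate and decimal megabytes, so treat the result as an estimate only.

```php
<?php
// Hypothetical helper, not part of the Gist: estimates how many minutes of
// constant-bit-rate audio fit under an upload size limit.
function maxDurationMinutes(float $limitMb, int $bitRateKbps): float
{
    $limitBits = $limitMb * 1000 * 1000 * 8; // assumes decimal megabytes
    $bitsPerSecond = $bitRateKbps * 1000;

    return $limitBits / $bitsPerSecond / 60;
}

// Substitute the service's current limit and your own file's bit rate.
printf("%.1f minutes\n", maxDurationMinutes(25, 192));
```

Files encoded with a variable bit rate will not match this exactly, so it is safer to leave some headroom when choosing where to split.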
FFMPEG can both detect silences and split files, and doing this manually is easy. This repository implements a great PHP wrapper around FFMPEG that supports programmatic control of FFMPEG, which is really useful for anyone responsible for a WordPress site.
The repository code surfaces the FFMPEG features needed to split files but not its silence detection capability. This Gist provides two files, each containing a class, which together make it possible to capture silence information. I'm not an expert on the repo code, so there may be a better, simpler way, but it works for me. This Gist says a few things about the files and provides an example of using them.
The wrapper captures output from FFMPEG as it runs and provides that information as a set of event data that classes can register to listen for. SilenceDetectListener.php implements a class to handle the FFMPEG output about silences within an audio file. Silence information is returned as pairs of lines: the first gives the time code of the start of the silence, the second gives the end and the duration.
[silencedetect @ 000001daa80cb5c0] silence_start: 555.333016
[silencedetect @ 000001daa80cb5c0] silence_end: 563.114649 | silence_duration: 7.781633
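As an illustration, pairs of lines like the two above could be parsed along the following lines. This is only a sketch and not the code in SilenceDetectListener.php: the function name, the pattern and the 'start' / 'end' / 'duration' keys are my own assumptions.

```php
<?php
// Hypothetical parser, not the Gist's actual code: pairs each silence_start
// line with the silence_end line that follows it.
function parseSilences(string $output): array
{
    // The "s" modifier lets ".*?" cross the line break between the two lines.
    $pattern = '/silence_start:\s*(-?[\d.]+).*?silence_end:\s*(-?[\d.]+)'
             . '\s*\|\s*silence_duration:\s*([\d.]+)/s';
    preg_match_all($pattern, $output, $matches, PREG_SET_ORDER);

    $silences = [];
    foreach ($matches as $m) {
        $silences[] = [
            'start'    => (float) $m[1],
            'end'      => (float) $m[2],
            'duration' => (float) $m[3],
        ];
    }

    return $silences;
}

$sample = <<<'TXT'
[silencedetect @ 000001daa80cb5c0] silence_start: 555.333016
[silencedetect @ 000001daa80cb5c0] silence_end: 563.114649 | silence_duration: 7.781633
TXT;

print_r(parseSilences($sample));
```

The sketch assumes the whole output is available as one string. A silence_start without a following silence_end is not handled specially.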
The code includes a regular expression that parses an arbitrary number of these pairs into a set of records, which are stored on the format object. That leads on to the second file.
NullAudio.php implements a class that can act as the 'format' required by the repo code's save function. The save function runs FFMPEG to, for example, transcode from one format to another, say MP3 to WMA, so the format implementation normally represents the expected output format. In the case of silence detection there is no output audio, but the repo code's save function still requires a format, so this class implements a null format. The two other essential things it does are:
- register the silence detection class and
- define the specific silence detection parameters to be used (the noise threshold and the minimum duration).
In this implementation of the NullAudio class, the parameters are hard-coded, but the class can be modified to take parameters in a constructor or via a 'set' method.
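One way the parameters could be made configurable is sketched below. It is deliberately independent of the repo code: the class and method names are hypothetical, and how the resulting arguments would be handed to the wrapper is left out, because that depends on how NullAudio is written.

```php
<?php
// Hypothetical value object, not part of the Gist or the wrapper library.
final class SilenceDetectParams
{
    private float $noiseDb;
    private float $minDuration;

    public function __construct(float $noiseDb = -50.0, float $minDuration = 1.0)
    {
        if ($minDuration <= 0) {
            throw new InvalidArgumentException('Minimum duration must be positive.');
        }
        $this->noiseDb = $noiseDb;
        $this->minDuration = $minDuration;
    }

    // Builds the value for ffmpeg's -af option, e.g. silencedetect=n=-50dB:d=1
    public function toFilter(): string
    {
        return sprintf('silencedetect=n=%sdB:d=%s', $this->noiseDb, $this->minDuration);
    }

    // Arguments equivalent to: -af silencedetect=... -f null
    public function toArguments(): array
    {
        return ['-af', $this->toFilter(), '-f', 'null'];
    }
}

$params = new SilenceDetectParams(-40, 0.5);
echo implode(' ', $params->toArguments()), "\n";
```

The defaults mirror the hard-coded values of -50dB and 1 second, so existing behaviour would stay the same when no arguments are passed.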
require_once 'misc_pages/audio/NullAudio.php'; // forward slashes work on both Windows and Linux
// Instantiate the main class and open an audio file
$ffmpeg = FFMpeg\FFMpeg::create();
$audio = $ffmpeg->open( $filePath );
// NullAudio is a custom format that does not actually save the audio, but allows us to process it and detect silences
$format = new FFMpeg\Format\Audio\NullAudio();
// Using this format, the command run will be
// ffmpeg -i $filePath -af silencedetect=n=-50dB:d=1 -f null -
// (the silencedetect parameters are set in the NullAudio class)
$audio->save( $format, '-');
// Print the silences captured while FFMPEG ran
print_r( $format->silences );
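Once the silences are known, the next step is choosing where to cut. Below is a rough sketch of one approach, which cuts at the midpoint of the last silence that still keeps a chunk under a maximum length. The function name is hypothetical, and it assumes each silence record is an array with 'start' and 'end' keys in seconds, which may not match the structure actually stored in $format->silences.

```php
<?php
// Hypothetical helper, not part of the Gist: returns cut points (in seconds)
// placed at silence midpoints. Assumes $silences is sorted by start time.
function chooseSplitPoints(array $silences, float $totalSeconds, float $maxChunkSeconds): array
{
    $cuts = [];
    $chunkStart = 0.0;
    $candidate = null; // midpoint of the most recent silence seen

    foreach ($silences as $silence) {
        $mid = ($silence['start'] + $silence['end']) / 2.0;

        // If reaching this silence would make the chunk too long,
        // cut at the previous silence instead.
        if ($candidate !== null && $mid - $chunkStart > $maxChunkSeconds) {
            $cuts[] = $candidate;
            $chunkStart = $candidate;
        }
        $candidate = $mid;
    }

    // The tail of the file may also need one more cut.
    if ($candidate !== null && $totalSeconds - $chunkStart > $maxChunkSeconds) {
        $cuts[] = $candidate;
    }

    return $cuts;
}

$silences = [
    ['start' => 90.0,  'end' => 110.0],
    ['start' => 240.0, 'end' => 260.0],
    ['start' => 390.0, 'end' => 410.0],
];
print_r(chooseSplitPoints($silences, 600.0, 300.0));
```

If two silences are further apart than the maximum chunk length, the chunk between them will still be too long, so the result should be checked before uploading. The cut points can then be fed to the wrapper's file-splitting features.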