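This document outlines the implementation of a real-time speech transcription system. It receives phone-call audio from Twilio Media Streams over a WebSocket and transcribes it with AssemblyAI's real-time transcription service.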
- Initializes a transcriber with `pcm_mulaw` encoding and an 8000 Hz sample rate, which matches the 8 kHz μ-law audio that Twilio Media Streams delivers.
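```javascript
import { RealtimeService } from 'assemblyai';

// Twilio Media Streams deliver 8 kHz mu-law audio, so the transcriber is configured to match.
const transcriber = new RealtimeService({
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  encoding: 'pcm_mulaw',
  sampleRate: 8000
});
// Start connecting now; this promise is awaited before any audio is sent.
const transcriberConnectionPromise = transcriber.connect();
```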
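To make the `pcm_mulaw` / `8000` setting concrete: each byte of the stream is one G.711 μ-law sample, so one second of audio is 8000 bytes. The transcriber is configured for this same format, so the pipeline forwards the bytes unchanged. The sketch below decodes μ-law to 16-bit linear PCM only to illustrate the format, for example when debugging or saving call audio. `mulawToLinear`, `decodeMulaw`, and `durationMs` are hypothetical helpers, not part of the AssemblyAI or Twilio SDKs.

```javascript
// Hypothetical helpers (not part of any SDK) illustrating the G.711 mu-law
// format that Twilio sends and that the transcriber is configured for.

const MULAW_BIAS = 0x84; // 132, the bias added during mu-law encoding

// Decode one 8-bit mu-law byte into a 16-bit linear PCM sample.
function mulawToLinear(byte) {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return sign ? -magnitude : magnitude;
}

// Decode a whole Buffer of mu-law bytes into an Int16Array of PCM samples.
function decodeMulaw(buffer) {
  const pcm = new Int16Array(buffer.length);
  for (let i = 0; i < buffer.length; i++) {
    pcm[i] = mulawToLinear(buffer[i]);
  }
  return pcm;
}

// One byte per sample, so duration follows directly from the byte count.
function durationMs(byteLength, sampleRate = 8000) {
  return (byteLength * 1000) / sampleRate;
}
```

For example, a 160-byte payload holds 160 samples, which is 20 ms of audio at 8000 samples per second.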
- Processes partial and final transcripts as they arrive, skipping partials that contain no text.
- Clears the console before printing, so each update replaces the previous one on screen.
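```javascript
// Partial transcripts update continuously while the caller is speaking.
transcriber.on('transcript.partial', (partialTranscript) => {
  if (!partialTranscript.text) return; // nothing to show yet
  console.clear();
  console.log(partialTranscript.text);
});

// A final transcript is emitted once an utterance is complete.
transcriber.on('transcript.final', (finalTranscript) => {
  console.clear();
  console.log(finalTranscript.text);
});
```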
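Because `console.clear()` runs on every event, only the latest text stays on screen. To keep the whole conversation, one option is to store each final transcript and show the current partial after it. The sketch below is a hypothetical `TranscriptLog` class that the existing handlers would call; it assumes only that each event carries a `text` field, as in the handlers above, and it is not part of the AssemblyAI SDK.

```javascript
// Hypothetical helper: keeps finalized utterances plus the in-progress partial.
class TranscriptLog {
  constructor() {
    this.finals = [];  // completed utterances, in arrival order
    this.partial = ''; // latest partial text for the current utterance
  }

  // Call from the 'transcript.partial' handler; each partial replaces the last one.
  onPartial(text) {
    this.partial = text || '';
  }

  // Call from the 'transcript.final' handler; the final text supersedes the partial.
  onFinal(text) {
    if (text) this.finals.push(text);
    this.partial = '';
  }

  // Everything so far: all finals followed by the pending partial, if any.
  toString() {
    return [...this.finals, this.partial].filter(Boolean).join(' ');
  }
}
```

In the handlers, you would call `log.onPartial(partialTranscript.text)` or `log.onFinal(finalTranscript.text)` and then print `log.toString()` after clearing the console. Calling the helper from the existing handlers, rather than registering extra listeners, avoids depending on whether the transcriber supports more than one listener per event.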
- Logs the Twilio media stream's `connected`, `start`, and `stop` events.
- Forwards `media` payloads to the transcriber only after the transcriber connection has been established.
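```javascript
// Each Twilio media stream message is a JSON envelope identified by its `event` field.
ws.on('message', async (message) => {
  const msg = JSON.parse(message.toString()); // the message may arrive as a Buffer
  switch (msg.event) {
    case 'connected':
      console.info('Twilio media stream connected');
      break;
    case 'start':
      console.info('Twilio media stream started');
      break;
    case 'media':
      // Wait until the transcriber session is open before forwarding audio.
      await transcriberConnectionPromise;
      // The payload is base64-encoded mu-law audio; decode it to raw bytes.
      transcriber.sendAudio(Buffer.from(msg.media.payload, 'base64'));
      break;
    case 'stop':
      console.info('Twilio media stream stopped');
      break;
  }
});
```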
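The switch above depends on the JSON envelope that Twilio Media Streams sends over the WebSocket: every message has an `event` field, and `media` messages carry base64-encoded μ-law audio in `media.payload`. The sketch below moves that parsing into a pure function so it can be tested without a live call. `parseTwilioMessage` is a hypothetical helper; the fields it reads (`event`, `streamSid`, `start.mediaFormat`, `media.payload`) follow Twilio's Media Streams message format as we understand it, so verify them against Twilio's documentation.

```javascript
// Hypothetical helper: turn one raw Twilio Media Streams message into a small
// result object so the WebSocket handler only has to act on it.
function parseTwilioMessage(raw) {
  const msg = JSON.parse(raw.toString()); // accepts a string or a Buffer
  switch (msg.event) {
    case 'connected':
      return { type: 'connected' };
    case 'start':
      // mediaFormat describes the audio, e.g. { encoding: 'audio/x-mulaw', sampleRate: 8000, channels: 1 }
      return { type: 'start', streamSid: msg.streamSid, mediaFormat: msg.start?.mediaFormat };
    case 'media':
      // Base64-encoded mu-law audio, decoded to raw bytes for the transcriber.
      return { type: 'media', audio: Buffer.from(msg.media.payload, 'base64') };
    case 'stop':
      return { type: 'stop' };
    default:
      return { type: 'unknown', event: msg.event }; // other events such as 'mark'
  }
}
```

With this helper, the `media` branch of the handler reduces to `transcriber.sendAudio(parsed.audio)` after awaiting the connection.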
- Logs errors and connection states.
- Properly closes the transcription service when necessary.
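```javascript
transcriber.on('error', (error) => {
  console.error(error);
});
transcriber.on('close', () => {
  console.log('Disconnected from real-time service');
});

// When Twilio closes the media stream, close the transcriber session as well.
ws.on('close', async () => {
  console.log('Twilio media stream WebSocket disconnected');
  await transcriber.close();
});
```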
- Waits for the transcriber connection to be established.
```javascript
// Block until the transcriber connection is established.
await transcriberConnectionPromise;
```
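Putting the pieces together: each Twilio media stream opens its own WebSocket, so the transcriber is typically created per connection. Audio is forwarded only once the transcriber has connected, and the transcriber is closed when Twilio closes the socket. The sketch below wires that lifecycle with the transcriber passed in, so it can be exercised with fake objects. `handleMediaStream` is hypothetical; it assumes only the transcriber methods used above (`connect`, `sendAudio`, `close`) and a socket that emits `message` and `close` events.

```javascript
// Hypothetical per-call handler. `ws` emits 'message' and 'close';
// `transcriber` exposes connect(), sendAudio(buffer) and close(), as in the snippets above.
function handleMediaStream(ws, transcriber) {
  const connected = transcriber.connect(); // start connecting immediately
  let closed = false;

  ws.on('message', async (message) => {
    const msg = JSON.parse(message.toString());
    if (msg.event !== 'media') return;
    try {
      await connected; // never send audio before the session is open
    } catch {
      return; // connection failed; the returned promise reports the error
    }
    if (closed) return; // the call may have ended while we were waiting
    transcriber.sendAudio(Buffer.from(msg.media.payload, 'base64'));
  });

  ws.on('close', async () => {
    if (closed) return; // close the transcriber at most once
    closed = true;
    await transcriber.close();
  });

  return connected; // callers can await this to surface connection errors
}
```

The `closed` flag guards against two races: audio arriving after the call has ended, and a second `close` event closing the transcriber twice.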
- `transcriber`: an instance created with `new RealtimeService(...)`.
- API key: `process.env.ASSEMBLYAI_API_KEY`.
- Encoding: `'pcm_mulaw'`.
- Sample rate: `8000` (Hz).
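Because the API key comes from the environment, it helps to fail fast at startup when it is missing, rather than discovering the problem on the first call. The sketch below gathers the settings listed above into one object with the same shape as the options passed to `new RealtimeService(...)`; `loadTranscriberConfig` is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper: build the transcriber options from the environment,
// failing fast if the API key is missing or blank.
function loadTranscriberConfig(env = process.env) {
  const apiKey = env.ASSEMBLYAI_API_KEY;
  if (!apiKey || !apiKey.trim()) {
    throw new Error('ASSEMBLYAI_API_KEY is not set; export it before starting the server.');
  }
  return {
    apiKey,
    encoding: 'pcm_mulaw', // matches Twilio's 8-bit mu-law audio
    sampleRate: 8000,      // Twilio Media Streams audio is sampled at 8 kHz
  };
}
```

Usage: `const transcriber = new RealtimeService(loadTranscriberConfig());`. Passing `env` explicitly keeps the function easy to test without touching `process.env`.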
- Use `async`/`await` for asynchronous operations.
- Handle errors and close connections promptly to avoid resource leaks.
- Keep API keys in secure environment variables.
- Send audio only after the transcriber connection has been established.
- Be aware of any rate or concurrency limits on your Twilio and AssemblyAI accounts.
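If a connection attempt fails, for example because an account limit has been reached, retrying immediately tends to fail again. One common approach, not part of the original code, is to retry with exponential backoff. The sketch below wraps any promise-returning function; `connectWithRetry` and its `retries` and `baseDelayMs` options are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: retry a promise-returning function with exponential backoff.
async function connectWithRetry(connect, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break; // out of attempts
      const delay = baseDelayMs * 2 ** attempt; // 500 ms, 1 s, 2 s, ... with the defaults
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

For the transcriber, you might pass a function that creates a fresh instance on each attempt, such as `async () => { const t = new RealtimeService(config); await t.connect(); return t; }`, so that a failed session is never reused. Whether reuse is safe depends on the SDK, so this is a cautious default rather than a requirement.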
This implementation receives call audio from Twilio Media Streams over a WebSocket and forwards it to AssemblyAI's real-time transcription service, handling events for partial and final transcripts. It logs errors and connection state changes, sends audio only after the transcriber has connected, and handles resource cleanup.