@Anietz
Created February 7, 2025 08:04
Websocket Realtime Transcription

WebSocket Real-Time Speech Transcription Implementation

This document outlines the implementation of a WebSocket-based real-time speech transcription system that receives call audio from a Twilio media stream and transcribes it with AssemblyAI's real-time service.

Features

Connection Setup

  • Initializes a transcriber with specific encoding (pcm_mulaw) and sample rate (8000).
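const { RealtimeService } = require('assemblyai');

// Twilio media streams deliver 8 kHz mu-law audio
const transcriber = new RealtimeService({
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  encoding: 'pcm_mulaw',
  sampleRate: 8000
});
// Start connecting; this promise is awaited before any audio is sent
const transcriberConnectionPromise = transcriber.connect();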
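
The two settings mirror the audio that Twilio media streams deliver: 8-bit mu-law samples at 8000 Hz, so each byte of decoded payload is one sample. As a rough sketch (the helper name `payloadDurationMs` is hypothetical and not part of either SDK), the duration of a base64 payload could be estimated like this:

```javascript
// Hypothetical helper: estimates how much audio a base64-encoded
// mu-law payload contains. Assumes 1 byte per sample (8-bit mu-law)
// and the 8000 Hz sample rate used in the transcriber configuration.
const SAMPLE_RATE = 8000;

function payloadDurationMs(base64Payload) {
  const bytes = Buffer.from(base64Payload, 'base64').length;
  return (bytes * 1000) / SAMPLE_RATE;
}

// 160 bytes at 8000 samples per second is 20 ms of audio.
const samplePayload = Buffer.alloc(160).toString('base64');
console.log(payloadDurationMs(samplePayload)); // 20
```

A check like this can help confirm that the encoding and sample rate passed to the transcriber match what the stream actually carries.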

Event Handling

  • Processes partial and final transcripts on receipt.
  • Handles errors and connection states appropriately.
transcriber.on('transcript.partial', (partialTranscript) => {
  if (!partialTranscript.text) return;
  console.clear();
  console.log(partialTranscript.text);
});

transcriber.on('transcript.final', (finalTranscript) => {
  console.clear();
  console.log(finalTranscript.text);
});
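
The handlers above only print the latest text. If a running transcript is needed, one possible approach is to keep final transcripts and replace the partial as it updates. The sketch below uses a plain `EventEmitter` as a stand-in for the transcriber, and the `TranscriptLog` class is hypothetical; it only illustrates the partial/final distinction.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical helper: keeps finalized text plus the latest partial.
class TranscriptLog {
  constructor() {
    this.finals = [];
    this.partial = '';
  }

  // Works with any object that exposes on(eventName, listener).
  attach(emitter) {
    emitter.on('transcript.partial', (t) => {
      if (!t.text) return; // skip empty partials
      this.partial = t.text;
    });
    emitter.on('transcript.final', (t) => {
      this.finals.push(t.text);
      this.partial = ''; // the partial has been superseded
    });
    return this;
  }

  toString() {
    return [...this.finals, this.partial].filter(Boolean).join(' ');
  }
}

// Stand-in for the real transcriber, used only to drive the example.
const fakeTranscriber = new EventEmitter();
const transcriptLog = new TranscriptLog().attach(fakeTranscriber);
fakeTranscriber.emit('transcript.partial', { text: 'hello' });
fakeTranscriber.emit('transcript.final', { text: 'Hello world.' });
fakeTranscriber.emit('transcript.partial', { text: 'how are' });
console.log(transcriptLog.toString()); // Hello world. how are
```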

WebSocket Events

  • Logs when the Twilio media stream connects, starts, and stops.
  • Forwards audio to the transcriber only after the transcriber connection is established.
ws.on('message', async (message) => {
  const msg = JSON.parse(message);
  switch (msg.event) {
    case 'connected':
      console.info('Twilio media stream connected');
      break;
    case 'start':
      console.info('Twilio media stream started');
      break;
    case 'media':
      await transcriberConnectionPromise;
      transcriber.sendAudio(Buffer.from(msg.media.payload, 'base64'));
      break;
    case 'stop':
      console.info('Twilio media stream stopped');
      break;
  }
});
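
The handler above can also be factored into a function that takes the audio sink as a parameter, which makes the decoding step easy to exercise without a live call. This is a sketch only: `handleTwilioMessage` and the `sendAudio` callback are hypothetical names, and the sample message contains just the fields the handler reads.

```javascript
// Hypothetical refactoring of the message handler: parses one media
// stream message and forwards decoded audio to `sendAudio`.
// Returns the event name, or null if the message cannot be used.
function handleTwilioMessage(rawMessage, sendAudio) {
  let msg;
  try {
    msg = JSON.parse(rawMessage.toString());
  } catch {
    return null; // ignore frames that are not valid JSON
  }
  if (!msg || typeof msg !== 'object') return null;

  if (msg.event === 'media' && msg.media && msg.media.payload) {
    // The audio arrives base64-encoded; decode it to raw bytes
    sendAudio(Buffer.from(msg.media.payload, 'base64'));
  }
  return msg.event ?? null;
}

// Drive the function with a hand-written message.
const receivedChunks = [];
const sampleMessage = JSON.stringify({
  event: 'media',
  media: { payload: Buffer.from([1, 2, 3]).toString('base64') },
});
const eventName = handleTwilioMessage(sampleMessage, (chunk) => receivedChunks.push(chunk));
console.log(eventName); // media
console.log(receivedChunks[0].length); // 3
```

In the real handler, `sendAudio` would wrap `transcriber.sendAudio` and would still be called only after `transcriberConnectionPromise` has resolved.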

Error Handling

  • Logs errors reported by the transcriber.
  • Logs when the connection to the real-time service closes.
transcriber.on('error', (error) => {
  console.error(error);
});

transcriber.on('close', () => {
  console.log('Disconnected from real-time service');
});

Cleanup

  • When the Twilio WebSocket closes, waits for the transcriber connection to be established and then closes it.
await transcriberConnectionPromise;
await transcriber.close();
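
If the connection attempt itself fails, awaiting `transcriberConnectionPromise` rejects and any cleanup after it is skipped. A more defensive variant might look like the sketch below; `closeTranscriber` is a hypothetical helper, and the fake transcriber object exists only to make the example runnable.

```javascript
// Tracks which transcribers were already closed, without touching
// the transcriber objects themselves.
const closedTranscribers = new WeakSet();

// Hypothetical helper: waits for the pending connection (tolerating a
// failed attempt) and then closes the transcriber at most once.
// Resolves to true if close() was called, false if it was skipped.
async function closeTranscriber(transcriber, connectionPromise) {
  try {
    await connectionPromise;
  } catch (err) {
    console.error('Transcriber connection failed:', err.message);
  }
  if (closedTranscribers.has(transcriber)) return false;
  closedTranscribers.add(transcriber);
  await transcriber.close();
  return true;
}

// Stand-in object with the same close() shape as the transcriber.
const fakeClosable = {
  closeCalls: 0,
  async close() {
    this.closeCalls += 1;
  },
};

closeTranscriber(fakeClosable, Promise.reject(new Error('connect failed')))
  .then(() => closeTranscriber(fakeClosable, Promise.resolve()))
  .then(() => console.log(fakeClosable.closeCalls)); // 1
```

The guard matters when both the WebSocket `close` handler and an error path may try to clean up the same transcriber.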

Parameters

  • transcriber: A RealtimeService instance.
  • API Key: process.env.ASSEMBLYAI_API_KEY.
  • Encoding: 'pcm_mulaw'.
  • Sample Rate: 8000.

Usage Notes

  • Use async/await for async operations.
  • Handle errors and close connections appropriately to avoid resource leaks.

Important Notes

  • Keep API keys in secure environment variables.
  • Send audio data only after confirming the stream is connected.
  • Be aware of any rate or concurrency limits that apply to your Twilio and AssemblyAI accounts.
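
A missing API key may otherwise surface only when the transcriber tries to connect. One possible startup check is sketched below; `requireEnv` is a hypothetical helper, not part of any SDK, and the key value shown is a placeholder.

```javascript
// Hypothetical helper: reads a required environment variable and fails
// fast with a clear message if it is missing or blank.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example with an explicit env object so the snippet runs anywhere.
const exampleKey = requireEnv('ASSEMBLYAI_API_KEY', {
  ASSEMBLYAI_API_KEY: 'example-key',
});
console.log(exampleKey); // example-key
```

In the application itself the second argument would be omitted, so the value is read from `process.env`.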

Summary

This implementation receives call audio over a WebSocket from a Twilio media stream and forwards it to AssemblyAI's real-time transcription service, handling events for partial and final transcripts. It logs errors and connection state, sends audio only once the transcriber is connected, and closes the transcriber during cleanup.
