@saminwankwo
Last active October 20, 2025 03:58
Production-ready deepfake detection system architecture

🎭 Deepfake Detector - System Design

A scalable, queue-based deepfake detection system that analyzes videos and returns confidence scores, manipulation labels, and visual evidence.


🏗 Architecture Overview

The system uses a microservices architecture with clear separation between API ingestion, job processing, and ML inference. This enables independent scaling and fault isolation.

Client → API → Queue → Worker → ML Service → Results

Design Principles

| Principle | Implementation |
| --- | --- |
| Asynchronous processing | Long-running ML inference doesn't block the API |
| Horizontal scalability | Add more workers or ML instances as load increases |
| Resilience | Jobs persist in the queue across restarts |
| Cost optimization | Aggressive cleanup minimizes storage costs |

🔧 Component Architecture

1. API Layer

Technology: Node.js/Express

Responsibilities:

  • Accept video uploads (multipart/form-data)
  • Accept public video URLs
  • Validate inputs and enforce limits
  • Enqueue analysis jobs
  • Return job status and results

Key Endpoints

| Endpoint | Method | Purpose |
| --- | --- | --- |
| `/api/upload` | POST | File upload |
| `/api/analyze` | POST | URL analysis |
| `/api/job/:id` | GET | Job status + results |

Security Features

  • ✅ Redis-backed rate limiting (per-IP throttling)
  • ✅ File size limits (configurable, default 200MB)
  • ✅ Request timeouts
  • ✅ Input validation (MIME types, URL reachability)

2. Message Queue

Technology: Redis + BullMQ

Why BullMQ?

  • ✅ Job persistence across restarts
  • ✅ Built-in retry logic with exponential backoff
  • ✅ Priority queuing support
  • ✅ Horizontal worker scaling
  • ✅ Real-time job status tracking

Job Lifecycle

```mermaid
graph LR
    A[waiting] --> B[active]
    B --> C[completed]
    B --> D[failed]
    D -.retry 3x.-> A
```
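The "retry 3x" edge in the lifecycle uses exponential backoff, which reduces to a delay schedule that doubles per attempt. A Python sketch of that schedule, assuming a hypothetical 1 s base delay (BullMQ computes the equivalent internally for its `exponential` strategy):

```python
def backoff_delays(attempts: int = 3, base_ms: int = 1000) -> list[int]:
    """Delay before each retry attempt: base, 2x base, 4x base, ..."""
    return [base_ms * 2 ** i for i in range(attempts)]

# backoff_delays() == [1000, 2000, 4000] (milliseconds)
```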

3. Worker Pool

Technology: Node.js

Responsibilities:

  • Poll queue for new jobs
  • Download videos from URLs or filesystem
  • Validate format and size
  • Forward to ML service
  • Update job status
  • Cleanup temporary files

Download Strategy

| Source Type | Method | Notes |
| --- | --- | --- |
| Direct URLs | axios streaming | Handles CDN redirects |
| Social platforms | yt-dlp fallback | TikTok, YouTube, Instagram |
| Security | No shell execution | Prevents injection attacks |

Cleanup Policy

  • 🔴 Immediate: Delete source files after analysis
  • 🟡 Scheduled: Remove artifacts older than 24 hours
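The scheduled sweep reduces to a pure age check over file timestamps. A Python sketch of that predicate (function and variable names are hypothetical; a real sweep would walk the artifact directory and `os.remove` matches):

```python
MAX_AGE_S = 24 * 3600  # the 24-hour retention window from the policy above

def is_expired(mtime: float, now: float, max_age_s: int = MAX_AGE_S) -> bool:
    """True once an artifact's last-modified time is older than the retention window."""
    return now - mtime > max_age_s

def expired_paths(entries: dict[str, float], now: float) -> list[str]:
    """entries maps artifact path -> last-modified epoch seconds."""
    return [path for path, mtime in entries.items() if is_expired(mtime, now)]

now = 200_000.0
entries = {"heatmap_a.jpg": now - 90_000, "heatmap_b.jpg": now - 3_600}
# expired_paths(entries, now) == ["heatmap_a.jpg"] (90,000 s > 86,400 s)
```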

4. ML Service

Technology: Python/FastAPI

Pipeline Stages

Preprocessing → Parallel Analysis → Fusion → Evidence Generation

1. Preprocessing (ffmpeg)

  • Extract frames at 1 FPS
  • Extract audio (16kHz mono WAV)
  • Detect and crop faces
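The first two extraction steps map directly onto ffmpeg invocations. Building them as argument lists (never a concatenated shell string) matches the no-shell-execution policy stated earlier; a Python sketch with the 1 FPS and 16 kHz mono targets above, output naming hypothetical:

```python
def frame_cmd(video: str, out_dir: str) -> list[str]:
    # Extract one frame per second as numbered JPEGs
    return ["ffmpeg", "-i", video, "-vf", "fps=1", f"{out_dir}/frame_%05d.jpg"]

def audio_cmd(video: str, out_wav: str) -> list[str]:
    # Resample to 16 kHz (-ar) and downmix to mono (-ac 1) for the acoustic model
    return ["ffmpeg", "-i", video, "-ar", "16000", "-ac", "1", out_wav]

# These would be run via subprocess.run(cmd) with no shell=True,
# so a malicious filename can never be interpreted as shell syntax.
```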

2. Parallel Analysis

  • Visual: CNN-based frame analysis
  • Audio: Acoustic artifact detection
  • Lipsync: Audio-visual synchronization check

3. Fusion

  • Weighted ensemble of model predictions
  • Configurable weights per modality
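A minimal fusion function, assuming hypothetical per-modality weights and normalizing over whichever modalities actually produced a score (so a silent video can still be fused from visual alone):

```python
def fuse(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted ensemble over the modalities present in `scores`."""
    total = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total

# Hypothetical weights; in the real system these are configurable
WEIGHTS = {"visual": 0.5, "audio": 0.3, "lipsync": 0.2}

score = fuse({"visual": 0.9, "audio": 0.6, "lipsync": 0.8}, WEIGHTS)
# 0.9*0.5 + 0.6*0.3 + 0.8*0.2 = 0.79
```

Renormalizing by `total` keeps the output in [0, 1] even when a modality is missing.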

4. Evidence Generation

  • Attention heatmaps (Grad-CAM)
  • Suspicious frame thumbnails
  • Temporal manipulation timeline

Response Format

```json
{
  "confidence": 0.87,
  "label": "manipulated",
  "evidence": {
    "thumbnails": ["url1", "url2"],
    "heatmaps": ["url1", "url2"],
    "timeline": [
      {"frame": 120, "score": 0.92}
    ]
  }
}
```

🔄 Data Flow

Happy Path

1. Client submits video
   ↓
2. API validates → Enqueues job → Returns jobId
   ↓
3. Worker picks job → Downloads/reads video → Validates
   ↓
4. Worker forwards to ML service
   ↓
5. ML service analyzes → Returns results
   ↓
6. Worker updates Redis → Deletes temp files
   ↓
7. Client polls /job/:id → Receives results

Error Handling

| Error Type | Handling Strategy |
| --- | --- |
| Download failures | Retry with exponential backoff (3 attempts) |
| ML service errors | Job marked failed, error message stored |
| Timeout | Hard limit on processing time (configurable) |
| Invalid input | Immediate rejection with clear error message |

📈 Scalability

Horizontal Scaling

| Component | Scaling Strategy |
| --- | --- |
| API | Stateless, scale with load balancer |
| Workers | Add instances to process more concurrent jobs |
| ML Service | Add GPU instances for higher throughput |

Bottleneck Analysis

ML inference is typically the bottleneck

Solution: Scale ML instances horizontally or use GPU batching

Monitoring: Alert on queue depth >100 jobs
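GPU batching amounts to grouping extracted frames so a single forward pass scores many frames at once instead of one at a time. A minimal Python sketch of the grouping step (the inference call itself is elided):

```python
def batches(frames: list, batch_size: int) -> list[list]:
    """Split frames into fixed-size groups; the last group may be short."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]

# A 30 s clip at 1 FPS yields 30 frames; with batch_size=8 that is
# 4 forward passes instead of 30 single-frame calls.
groups = batches(list(range(30)), 8)
```

Larger batches raise GPU utilization at the cost of per-job latency, so the batch size is a tuning knob rather than a fixed constant.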

Resource Requirements (Estimated)

| Resource | Requirement |
| --- | --- |
| Processing time | 40-60 s for a 30 s 1080p video |
| GPU | NVIDIA T4 or better recommended |
| Memory | 4GB per worker, 8GB+ per ML instance |
| Storage | Minimal (aggressive cleanup) |

🔒 Security & Abuse Prevention

Rate Limiting

  • ✅ Global: Configurable requests per hour per IP
  • ✅ Per-endpoint limits for uploads and URL analysis
  • ✅ Redis-backed (survives API restarts)

Input Validation

  • ✅ File size limits enforced
  • ✅ MIME type verification (magic bytes, not extension)
  • ✅ URL validation and optional allowlist/blocklist
  • ✅ Timeout enforcement

Data Privacy

  • ✅ Temporary files deleted immediately after analysis
  • ✅ Results cached for configurable TTL (default 24 hours)
  • ✅ No long-term storage of uploaded content
  • ✅ Evidence URLs use time-limited signed tokens
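Time-limited signed tokens of this kind are typically HMAC-based: the server signs the path plus an expiry timestamp, and verification recomputes the signature and checks the clock. A self-contained Python sketch using only the standard library (the secret, paths, and query format are hypothetical):

```python
import hashlib
import hmac

SECRET = b"replace-me"  # hypothetical signing key, loaded from config in practice

def sign_url(path: str, expires: int) -> str:
    """Append an expiry and an HMAC-SHA256 signature to an evidence URL."""
    msg = f"{path}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&sig={sig}"

def verify(path: str, expires: int, sig: str, now: float) -> bool:
    """Reject expired or tampered links; compare_digest avoids timing leaks."""
    if now > expires:
        return False
    expected = hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Because the signature covers both path and expiry, neither can be altered without invalidating the link.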

🚀 Deployment Options

Development (Docker Compose)

```yaml
services:
  redis:
    image: redis:7-alpine

  api:
    build: ./api
    environment:
      - REDIS_URL=redis://redis:6379

  worker:
    build: ./worker
    environment:
      - REDIS_URL=redis://redis:6379
      - ML_SERVICE_URL=http://ml:8000

  ml:
    build: ./ml
    runtime: nvidia  # GPU support
```

Run: `docker-compose up`


Production Considerations

Managed Redis

| Provider | Use Case |
| --- | --- |
| Upstash | Serverless, global edge |
| AWS ElastiCache | Enterprise, AWS ecosystem |
| Redis Cloud | Managed, multi-cloud |

Compute

| Component | Instance Type |
| --- | --- |
| API/Workers | Standard CPU (auto-scaling) |
| ML Service | GPU instances (T4, A10G, or similar) |

Storage

| Type | Solution |
| --- | --- |
| Temporary | Local disk (ephemeral) |
| Evidence | CDN/object storage (S3, R2, Cloudflare) |
| Results | Redis (cache) or database (if long-term needed) |

Cost Estimate (1000 analyses/day)

| Service | Monthly Cost |
| --- | --- |
| Managed Redis | $10-20 |
| API/Worker compute | $50-100 |
| GPU compute | $200-400 |
| **Total** | ~$300-500 |

📊 Monitoring & Observability

Key Metrics

  • ⏱️ Job latency percentiles (p50, p95, p99)
  • 📥 Queue depth (waiting jobs)
  • 💼 Worker utilization
  • 🧠 ML inference time per stage
  • ⚠️ Error rates by type

Recommended Tools

| Purpose | Tool Options |
| --- | --- |
| Logging | Structured JSON (Pino, Winston) |
| Metrics | Prometheus + Grafana |
| Alerting | PagerDuty, Opsgenie |
| Tracing | OpenTelemetry (optional) |

Critical Alerts

| Alert | Threshold |
| --- | --- |
| Queue depth | >100 for >5 minutes |
| Worker crash rate | >10% |
| ML service response time | >2 minutes |
| Redis connection | Any failure |
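The queue-depth alert (">100 for >5 minutes") requires the breach to be sustained, not momentary, which is easy to get wrong in ad-hoc alerting code. A Python sketch of that check over sampled (timestamp, depth) readings; in practice a Prometheus `for: 5m` clause expresses the same thing:

```python
def should_alert(samples: list[tuple[float, int]],
                 threshold: int = 100, hold_s: int = 300) -> bool:
    """samples: (epoch seconds, queue depth), in time order.
    Fire only when depth stays above threshold for at least hold_s seconds."""
    breach_start = None
    for t, depth in samples:
        if depth > threshold:
            if breach_start is None:
                breach_start = t          # breach begins
            if t - breach_start >= hold_s:
                return True               # sustained long enough
        else:
            breach_start = None           # any dip resets the clock
    return False

# A depth of 150 held from t=0 to t=400 s exceeds the 300 s hold -> alert
```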

⚠️ Known Limitations

| Limitation | Description |
| --- | --- |
| Async-only | No real-time streaming analysis |
| Single video | No batch upload API yet |
| Social platform TOS | yt-dlp usage may violate some platform terms |
| Model drift | Detection models require periodic retraining |

🔮 Future Enhancements

  • Batch processing API (analyze multiple videos)
  • WebSocket support for real-time progress updates
  • Admin dashboard (queue management, analytics)
  • Model versioning and A/B testing framework
  • API authentication (API keys, OAuth)
  • Multi-region deployment for reduced latency

🛠 Technical Stack Summary

| Layer | Technology | Why |
| --- | --- | --- |
| API | Node.js/Express | Fast, lightweight, great async I/O |
| Queue | Redis + BullMQ | Proven reliability, horizontal scaling |
| Worker | Node.js | Same stack as API, easy streaming downloads |
| ML | Python/FastAPI | ML ecosystem, GPU support, async API |
| Deployment | Docker + Fly.io | Containerization, global edge deployment |


💬 Contact

Questions? Want to discuss architecture or collaborate?


Note: This document describes the high-level architecture and design patterns. Implementation details, specific models, and proprietary logic are not included.

Built with ❤️ for transparent AI safety


Last Updated: October 2025
