A scalable, queue-based deepfake detection system that analyzes videos and returns confidence scores, manipulation labels, and visual evidence.
- Architecture Overview
- Component Architecture
- Data Flow
- Scalability
- Security & Abuse Prevention
- Deployment Options
- Monitoring & Observability
- Known Limitations
- Technical Stack Summary
## Architecture Overview

The system uses a microservices architecture with clear separation between API ingestion, job processing, and ML inference. This enables independent scaling and fault isolation.
```
Client → API → Queue → Worker → ML Service → Results
```
| Principle | Implementation |
|---|---|
| Asynchronous processing | Long-running ML inference doesn't block the API |
| Horizontal scalability | Add more workers or ML instances as load increases |
| Resilience | Jobs persist in queue across restarts |
| Cost optimization | Aggressive cleanup minimizes storage costs |
## Component Architecture

### API

Technology: Node.js/Express
Responsibilities:
- Accept video uploads (multipart/form-data)
- Accept public video URLs
- Validate inputs and enforce limits
- Enqueue analysis jobs
- Return job status and results
| Endpoint | Method | Purpose |
|---|---|---|
| /api/upload | POST | File upload |
| /api/analyze | POST | URL analysis |
| /api/job/:id | GET | Job status + results |
- ✅ Redis-backed rate limiting (per-IP throttling)
- ✅ File size limits (configurable, default 200MB)
- ✅ Request timeouts
- ✅ Input validation (MIME types, URL reachability)
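The magic-byte MIME check above can be sketched as a small sniffing function. The signatures below cover a few common video containers and are illustrative only; the actual validation code and its format list are not part of this document:

```javascript
// Sketch: verify a video's container format from its leading bytes
// ("magic bytes") instead of trusting the file extension or the
// client-supplied Content-Type header. Signatures are illustrative.
function sniffVideoType(buf) {
  if (buf.length >= 12) {
    // MP4/MOV family: bytes 4-7 spell "ftyp"
    if (buf.subarray(4, 8).toString('ascii') === 'ftyp') return 'video/mp4';
    // AVI: "RIFF" at offset 0 and "AVI " at offset 8
    if (buf.subarray(0, 4).toString('ascii') === 'RIFF' &&
        buf.subarray(8, 12).toString('ascii') === 'AVI ') return 'video/x-msvideo';
  }
  // Matroska/WebM: EBML header 1A 45 DF A3
  if (buf.length >= 4 && buf.readUInt32BE(0) === 0x1a45dfa3) return 'video/webm';
  return null; // unknown container -> reject the upload
}
```

A `null` result is treated as an invalid input and rejected immediately, matching the error-handling policy described later.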
### Queue

Technology: Redis + BullMQ
- ✅ Job persistence across restarts
- ✅ Built-in retry logic with exponential backoff
- ✅ Priority queuing support
- ✅ Horizontal worker scaling
- ✅ Real-time job status tracking
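BullMQ's built-in exponential backoff (configured as `backoff: { type: 'exponential', delay: baseMs }` on a job) follows the schedule below; it is written out as a plain function here only to make the timing explicit:

```javascript
// Sketch: the exponential backoff schedule for failed jobs.
// BullMQ computes this internally; the 1s base delay is an
// illustrative default, not a value mandated by this system.
function backoffDelayMs(attemptsMade, baseMs = 1000) {
  // 1st retry: baseMs, 2nd: 2*baseMs, 3rd: 4*baseMs, ...
  return Math.round(baseMs * 2 ** (attemptsMade - 1));
}
```

With the default 3 attempts, a persistently failing download is retried after roughly 1s, 2s, and 4s before the job is marked failed.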
```mermaid
graph LR
  A[waiting] --> B[active]
  B --> C[completed]
  B --> D[failed]
  D -.retry 3x.-> A
```
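The lifecycle above can be made explicit as a transition table. In production this tracking comes from BullMQ itself; the sketch below only spells out the legal moves and the bounded retry edge:

```javascript
// Sketch of the job lifecycle as an explicit transition table.
// MAX_RETRIES mirrors the "retry 3x" edge in the diagram.
const MAX_RETRIES = 3;
const TRANSITIONS = {
  waiting: ['active'],
  active: ['completed', 'failed'],
  failed: ['waiting'], // re-enqueued by the retry policy
  completed: [],
};

function nextState(job, to) {
  if (!TRANSITIONS[job.state].includes(to)) {
    throw new Error(`illegal transition ${job.state} -> ${to}`);
  }
  if (to === 'waiting') {
    // each re-enqueue consumes one retry
    if (job.attempts >= MAX_RETRIES) throw new Error('retries exhausted');
    job.attempts += 1;
  }
  job.state = to;
  return job;
}
```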
### Worker

Technology: Node.js
Responsibilities:
- Poll queue for new jobs
- Download videos from URLs or filesystem
- Validate format and size
- Forward to ML service
- Update job status
- Cleanup temporary files
| Source Type | Method | Notes |
|---|---|---|
| Direct URLs | axios streaming | Handles CDN redirects |
| Social platforms | yt-dlp fallback | TikTok, YouTube, Instagram |
| Security | No shell execution | Prevents injection attacks |
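The routing between the two download methods can be sketched as a hostname check. The host list below is illustrative; when yt-dlp is selected it would be invoked with an argument array (`child_process.execFile`) so that no shell ever interprets the URL, per the no-shell-execution rule above:

```javascript
// Sketch: choose a download strategy by hostname. Direct/CDN links
// are streamed (e.g. via axios); known social platforms fall back
// to yt-dlp. The platform list here is a placeholder.
const YTDLP_HOSTS = new Set([
  'youtube.com', 'youtu.be', 'tiktok.com', 'instagram.com',
]);

function downloadStrategy(url) {
  const host = new URL(url).hostname.replace(/^www\./, '');
  return YTDLP_HOSTS.has(host) ? 'yt-dlp' : 'direct';
}
```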
- 🔴 Immediate: Delete source files after analysis
- 🟡 Scheduled: Remove artifacts older than 24 hours
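The scheduled pass reduces to a retention policy over file ages. A minimal sketch, with filesystem metadata injected as plain data so the policy stays testable (the real job would read `fs.stat` results and then unlink the selected paths):

```javascript
// Sketch of the 24-hour artifact cleanup policy.
const ARTIFACT_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

function expiredArtifacts(files, nowMs, ttlMs = ARTIFACT_TTL_MS) {
  // files: [{ path, mtimeMs }] -> paths older than the TTL
  return files.filter(f => nowMs - f.mtimeMs > ttlMs).map(f => f.path);
}
```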
### ML Service

Technology: Python/FastAPI
```
Preprocessing → Parallel Analysis → Fusion → Evidence Generation
```
1. Preprocessing (ffmpeg)
- Extract frames at 1 FPS
- Extract audio (16kHz mono WAV)
- Detect and crop faces
2. Parallel Analysis
- Visual: CNN-based frame analysis
- Audio: Acoustic artifact detection
- Lipsync: Audio-visual synchronization check
3. Fusion
- Weighted ensemble of model predictions
- Configurable weights per modality
4. Evidence Generation
- Attention heatmaps (Grad-CAM)
- Suspicious frame thumbnails
- Temporal manipulation timeline
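The fusion step (3) amounts to a weighted average over whichever modalities produced a score. The weights below are placeholders; the actual ensemble and its weights are deliberately not specified in this document:

```javascript
// Sketch of the weighted-ensemble fusion step. Weights are
// illustrative; they are configurable per modality in the real system.
const WEIGHTS = { visual: 0.5, audio: 0.25, lipsync: 0.25 };

function fuseScores(scores, weights = WEIGHTS) {
  let num = 0, den = 0;
  for (const [modality, w] of Object.entries(weights)) {
    if (scores[modality] != null) { // skip modalities with no score (e.g. silent video)
      num += w * scores[modality];
      den += w;
    }
  }
  // renormalize so a missing modality doesn't drag the score down
  return den > 0 ? num / den : 0;
}
```

Renormalizing by the sum of the applied weights keeps the fused confidence comparable between videos with and without an audio track.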
```json
{
  "confidence": 0.87,
  "label": "manipulated",
  "evidence": {
    "thumbnails": ["url1", "url2"],
    "heatmaps": ["url1", "url2"],
    "timeline": [
      {"frame": 120, "score": 0.92}
    ]
  }
}
```

## Data Flow

```
1. Client submits video
   ↓
2. API validates → Enqueues job → Returns jobId
   ↓
3. Worker picks job → Downloads/reads video → Validates
   ↓
4. Worker forwards to ML service
   ↓
5. ML service analyzes → Returns results
   ↓
6. Worker updates Redis → Deletes temp files
   ↓
7. Client polls /job/:id → Receives results
```
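From the client's side, the flow above is a submit-then-poll loop. A minimal sketch, with the fetch function injected so it can be exercised without a live server; the endpoint paths follow the API table earlier, but the exact request and response field names (`jobId`, `status`) are assumptions:

```javascript
// Sketch of a client: POST a video URL, then poll /api/job/:id
// until the job reaches a terminal state.
async function analyzeVideo(baseUrl, videoUrl, fetchFn,
                            { intervalMs = 2000, maxPolls = 60 } = {}) {
  const submit = await fetchFn(`${baseUrl}/api/analyze`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: videoUrl }),
  });
  const { jobId } = await submit.json();

  for (let i = 0; i < maxPolls; i++) {
    const res = await fetchFn(`${baseUrl}/api/job/${jobId}`);
    const job = await res.json();
    if (job.status === 'completed' || job.status === 'failed') return job;
    await new Promise(r => setTimeout(r, intervalMs)); // back off between polls
  }
  throw new Error('polling timed out');
}
```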
| Error Type | Handling Strategy |
|---|---|
| Download failures | Retry with exponential backoff (3 attempts) |
| ML service errors | Job marked failed, error message stored |
| Timeout | Hard limit on processing time (configurable) |
| Invalid input | Immediate rejection with clear error message |
## Scalability

| Component | Scaling Strategy |
|---|---|
| API | Stateless, scale with load balancer |
| Workers | Add instances to process more concurrent jobs |
| ML Service | Add GPU instances for higher throughput |
ML inference is typically the bottleneck.

- Solution: scale ML instances horizontally or use GPU batching
- Monitoring: alert on queue depth >100 jobs
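GPU batching here means grouping extracted frames so one forward pass scores many frames at once. The batch size of 32 below is a placeholder; the right value depends on GPU memory and model size:

```javascript
// Sketch: split a frame list into fixed-size batches for GPU inference.
function toBatches(frames, batchSize = 32) {
  const batches = [];
  for (let i = 0; i < frames.length; i += batchSize) {
    batches.push(frames.slice(i, i + batchSize));
  }
  return batches;
}
```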
| Resource | Requirement |
|---|---|
| Processing time | 40-60s for 1080p 30s video |
| GPU | NVIDIA T4 or better recommended |
| Memory | 4GB per worker, 8GB+ per ML instance |
| Storage | Minimal (aggressive cleanup) |
## Security & Abuse Prevention

- ✅ Global: Configurable requests per hour per IP
- ✅ Per-endpoint limits for uploads and URL analysis
- ✅ Redis-backed (survives API restarts)
- ✅ File size limits enforced
- ✅ MIME type verification (magic bytes, not extension)
- ✅ URL validation and optional allowlist/blocklist
- ✅ Timeout enforcement
- ✅ Temporary files deleted immediately after analysis
- ✅ Results cached for configurable TTL (default 24 hours)
- ✅ No long-term storage of uploaded content
- ✅ Evidence URLs use time-limited signed tokens
## Deployment Options

```yaml
services:
  redis:
    image: redis:7-alpine
  api:
    build: ./api
    environment:
      - REDIS_URL=redis://redis:6379
  worker:
    build: ./worker
    environment:
      - REDIS_URL=redis://redis:6379
      - ML_SERVICE_URL=http://ml:8000
  ml:
    build: ./ml
    runtime: nvidia  # GPU support
```

Run: `docker-compose up`
| Provider | Use Case |
|---|---|
| Upstash | Serverless, global edge |
| AWS ElastiCache | Enterprise, AWS ecosystem |
| Redis Cloud | Managed, multi-cloud |
| Component | Instance Type |
|---|---|
| API/Workers | Standard CPU (auto-scaling) |
| ML Service | GPU instances (T4, A10G, or similar) |
| Type | Solution |
|---|---|
| Temporary | Local disk (ephemeral) |
| Evidence | CDN/object storage (S3, R2, Cloudflare) |
| Results | Redis (cache) or database (if long-term needed) |
| Service | Monthly Cost |
|---|---|
| Managed Redis | $10-20 |
| API/Worker compute | $50-100 |
| GPU compute | $200-400 |
| Total | ~$300-500 |
## Monitoring & Observability

- ⏱️ Job latency percentiles (p50, p95, p99)
- 📥 Queue depth (waiting jobs)
- 💼 Worker utilization
- 🧠 ML inference time per stage
- ⚠️ Error rates by type
| Purpose | Tool Options |
|---|---|
| Logging | Structured JSON (Pino, Winston) |
| Metrics | Prometheus + Grafana |
| Alerting | PagerDuty, Opsgenie |
| Tracing | OpenTelemetry (optional) |
| Alert | Threshold |
|---|---|
| Queue depth | >100 for >5 minutes |
| Worker crash rate | >10% |
| ML service response time | >2 minutes |
| Redis connection | Any failure |
## Known Limitations

| Limitation | Description |
|---|---|
| Async-only | No real-time streaming analysis |
| Single video | No batch upload API yet |
| Social platform TOS | yt-dlp usage may violate some platform terms |
| Model drift | Detection models require periodic retraining |
- Batch processing API (analyze multiple videos)
- WebSocket support for real-time progress updates
- Admin dashboard (queue management, analytics)
- Model versioning and A/B testing framework
- API authentication (API keys, OAuth)
- Multi-region deployment for reduced latency
## Technical Stack Summary

| Layer | Technology | Why |
|---|---|---|
| API | Node.js/Express | Fast, lightweight, great async I/O |
| Queue | Redis + BullMQ | Proven reliability, horizontal scaling |
| Worker | Node.js | Same stack as API, easy streaming downloads |
| ML | Python/FastAPI | ML ecosystem, GPU support, async API |
| Deployment | Docker + Fly.io | Containerization, global edge deployment |
Questions? Want to discuss architecture or collaborate?
- 📧 Email: email
- 🔗 LinkedIn: saminwankwo
- 💻 GitHub: saminwankwo
Note: This document describes the high-level architecture and design patterns. Implementation details, specific models, and proprietary logic are not included.