A scalable, queue-based deepfake detection system that analyzes videos and returns confidence scores, manipulation labels, and visual evidence.
- Architecture Overview
- Component Architecture
- Data Flow
- Scalability
- Security & Abuse Prevention
- Deployment Options
- Monitoring & Observability
- Known Limitations
- Technical Stack Summary
## Architecture Overview

The system uses a microservices architecture with clear separation between API ingestion, job processing, and ML inference. This enables independent scaling and fault isolation.
```
Client → API → Queue → Worker → ML Service → Results
```
| Principle | Implementation |
|---|---|
| Asynchronous processing | Long-running ML inference doesn't block the API |
| Horizontal scalability | Add more workers or ML instances as load increases |
| Resilience | Jobs persist in queue across restarts |
| Cost optimization | Aggressive cleanup minimizes storage costs |
## Component Architecture

### API

Technology: Node.js/Express
Responsibilities:
- Accept video uploads (multipart/form-data)
- Accept public video URLs
- Validate inputs and enforce limits
- Enqueue analysis jobs
- Return job status and results
| Endpoint | Method | Purpose |
|---|---|---|
| /api/upload | POST | File upload |
| /api/analyze | POST | URL analysis |
| /api/job/:id | GET | Job status + results |
- ✅ Redis-backed rate limiting (per-IP throttling)
- ✅ File size limits (configurable, default 200MB)
- ✅ Request timeouts
- ✅ Input validation (MIME types, URL reachability)
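The magic-byte MIME check above can be sketched as a small sniffing function. The signatures below cover a few common video containers and are illustrative only; the actual validation code and its format list are not part of this document:

```javascript
// Sketch: verify a video's container format from its leading bytes
// ("magic bytes") instead of trusting the file extension or the
// client-supplied Content-Type header. Signatures are illustrative.
function sniffVideoType(buf) {
  if (buf.length >= 12) {
    // MP4/MOV family: bytes 4-7 spell "ftyp"
    if (buf.subarray(4, 8).toString('ascii') === 'ftyp') return 'video/mp4';
    // AVI: "RIFF" at offset 0 and "AVI " at offset 8
    if (buf.subarray(0, 4).toString('ascii') === 'RIFF' &&
        buf.subarray(8, 12).toString('ascii') === 'AVI ') return 'video/x-msvideo';
  }
  // Matroska/WebM: EBML header 1A 45 DF A3
  if (buf.length >= 4 && buf.readUInt32BE(0) === 0x1a45dfa3) return 'video/webm';
  return null; // unknown container -> reject the upload
}
```

A `null` result is treated as an invalid input and rejected immediately, matching the error-handling policy described later.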
### Queue

Technology: Redis + BullMQ
- ✅ Job persistence across restarts
- ✅ Built-in retry logic with exponential backoff
- ✅ Priority queuing support
- ✅ Horizontal worker scaling
- ✅ Real-time job status tracking
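BullMQ's built-in exponential backoff (configured as `backoff: { type: 'exponential', delay: baseMs }` on a job) follows the schedule below; it is written out as a plain function here only to make the timing explicit:

```javascript
// Sketch: the exponential backoff schedule for failed jobs.
// BullMQ computes this internally; the 1s base delay is an
// illustrative default, not a value mandated by this system.
function backoffDelayMs(attemptsMade, baseMs = 1000) {
  // 1st retry: baseMs, 2nd: 2*baseMs, 3rd: 4*baseMs, ...
  return Math.round(baseMs * 2 ** (attemptsMade - 1));
}
```

With the default 3 attempts, a persistently failing download is retried after roughly 1s, 2s, and 4s before the job is marked failed.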
```mermaid
graph LR
  A[waiting] --> B[active]
  B --> C[completed]
  B --> D[failed]
  D -.retry 3x.-> A
```
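The lifecycle above can be made explicit as a transition table. In production this tracking comes from BullMQ itself; the sketch below only spells out the legal moves and the bounded retry edge:

```javascript
// Sketch of the job lifecycle as an explicit transition table.
// MAX_RETRIES mirrors the "retry 3x" edge in the diagram.
const MAX_RETRIES = 3;
const TRANSITIONS = {
  waiting: ['active'],
  active: ['completed', 'failed'],
  failed: ['waiting'], // re-enqueued by the retry policy
  completed: [],
};

function nextState(job, to) {
  if (!TRANSITIONS[job.state].includes(to)) {
    throw new Error(`illegal transition ${job.state} -> ${to}`);
  }
  if (to === 'waiting') {
    // each re-enqueue consumes one retry
    if (job.attempts >= MAX_RETRIES) throw new Error('retries exhausted');
    job.attempts += 1;
  }
  job.state = to;
  return job;
}
```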
### Worker

Technology: Node.js
Responsibilities:
- Poll queue for new jobs
- Download videos from URLs or filesystem
- Validate format and size
- Forward to ML service
- Update job status
- Cleanup temporary files
| Source Type | Method | Notes |
|---|---|---|
| Direct URLs | axios streaming | Handles CDN redirects |
| Social platforms | yt-dlp fallback | TikTok, YouTube, Instagram |
| Security | No shell execution | Prevents injection attacks |
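The routing between the two download methods can be sketched as a hostname check. The host list below is illustrative; when yt-dlp is selected it would be invoked with an argument array (`child_process.execFile`) so that no shell ever interprets the URL, per the no-shell-execution rule above:

```javascript
// Sketch: choose a download strategy by hostname. Direct/CDN links
// are streamed (e.g. via axios); known social platforms fall back
// to yt-dlp. The platform list here is a placeholder.
const YTDLP_HOSTS = new Set([
  'youtube.com', 'youtu.be', 'tiktok.com', 'instagram.com',
]);

function downloadStrategy(url) {
  const host = new URL(url).hostname.replace(/^www\./, '');
  return YTDLP_HOSTS.has(host) ? 'yt-dlp' : 'direct';
}
```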
- 🔴 Immediate: Delete source files after analysis
- 🟡 Scheduled: Remove artifacts older than 24 hours
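The scheduled pass reduces to a retention policy over file ages. A minimal sketch, with filesystem metadata injected as plain data so the policy stays testable (the real job would read `fs.stat` results and then unlink the selected paths):

```javascript
// Sketch of the 24-hour artifact cleanup policy.
const ARTIFACT_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

function expiredArtifacts(files, nowMs, ttlMs = ARTIFACT_TTL_MS) {
  // files: [{ path, mtimeMs }] -> paths older than the TTL
  return files.filter(f => nowMs - f.mtimeMs > ttlMs).map(f => f.path);
}
```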
### ML Service

Technology: Python/FastAPI
```
Preprocessing → Parallel Analysis → Fusion → Evidence Generation
```
1. Preprocessing (ffmpeg)
- Extract frames at 1 FPS
- Extract audio (16kHz mono WAV)
- Detect and crop faces
2. Parallel Analysis
- Visual: CNN-based frame analysis
- Audio: Acoustic artifact detection
- Lipsync: Audio-visual synchronization check
3. Fusion
- Weighted ensemble of model predictions
- Configurable weights per modality
4. Evidence Generation
- Attention heatmaps (Grad-CAM)
- Suspicious frame thumbnails
- Temporal manipulation timeline
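The fusion step (3) amounts to a weighted average over whichever modalities produced a score. The weights below are placeholders; the actual ensemble and its weights are deliberately not specified in this document:

```javascript
// Sketch of the weighted-ensemble fusion step. Weights are
// illustrative; they are configurable per modality in the real system.
const WEIGHTS = { visual: 0.5, audio: 0.25, lipsync: 0.25 };

function fuseScores(scores, weights = WEIGHTS) {
  let num = 0, den = 0;
  for (const [modality, w] of Object.entries(weights)) {
    if (scores[modality] != null) { // skip modalities with no score (e.g. silent video)
      num += w * scores[modality];
      den += w;
    }
  }
  // renormalize so a missing modality doesn't drag the score down
  return den > 0 ? num / den : 0;
}
```

Renormalizing by the sum of the applied weights keeps the fused confidence comparable between videos with and without an audio track.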
```json
{
  "confidence": 0.87,
  "label": "manipulated",
  "evidence": {
    "thumbnails": ["url1", "url2"],
    "heatmaps": ["url1", "url2"],
    "timeline": [
      {"frame": 120, "score": 0.92}
    ]
  }
}
```

## Data Flow

```
1. Client submits video
   ↓
2. API validates → Enqueues job → Returns jobId
   ↓
3. Worker picks job → Downloads/reads video → Validates
   ↓
4. Worker forwards to ML service
   ↓
5. ML service analyzes → Returns results
   ↓
6. Worker updates Redis → Deletes temp files
   ↓
7. Client polls /job/:id → Receives results
```
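From the client's side, the flow above is a submit-then-poll loop. A minimal sketch, with the fetch function injected so it can be exercised without a live server; the endpoint paths follow the API table earlier, but the exact request and response field names (`jobId`, `status`) are assumptions:

```javascript
// Sketch of a client: POST a video URL, then poll /api/job/:id
// until the job reaches a terminal state.
async function analyzeVideo(baseUrl, videoUrl, fetchFn,
                            { intervalMs = 2000, maxPolls = 60 } = {}) {
  const submit = await fetchFn(`${baseUrl}/api/analyze`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: videoUrl }),
  });
  const { jobId } = await submit.json();

  for (let i = 0; i < maxPolls; i++) {
    const res = await fetchFn(`${baseUrl}/api/job/${jobId}`);
    const job = await res.json();
    if (job.status === 'completed' || job.status === 'failed') return job;
    await new Promise(r => setTimeout(r, intervalMs)); // back off between polls
  }
  throw new Error('polling timed out');
}
```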
| Error Type | Handling Strategy |
|---|---|
| Download failures | Retry with exponential backoff (3 attempts) |
| ML service errors | Job marked failed, error message stored |
| Timeout | Hard limit on processing time (configurable) |
| Invalid input | Immediate rejection with clear error message |
## Scalability

| Component | Scaling Strategy |
|---|---|
| API | Stateless, scale with load balancer |
| Workers | Add instances to process more concurrent jobs |
| ML Service | Add GPU instances for higher throughput |
ML inference is typically the bottleneck.

- Solution: scale ML instances horizontally or use GPU batching
- Monitoring: alert on queue depth >100 jobs
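GPU batching here means grouping extracted frames so one forward pass scores many frames at once. The batch size of 32 below is a placeholder; the right value depends on GPU memory and model size:

```javascript
// Sketch: split a frame list into fixed-size batches for GPU inference.
function toBatches(frames, batchSize = 32) {
  const batches = [];
  for (let i = 0; i < frames.length; i += batchSize) {
    batches.push(frames.slice(i, i + batchSize));
  }
  return batches;
}
```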
| Resource | Requirement |
|---|---|
| Processing time | 40-60s for 1080p 30s video |
| GPU | NVIDIA T4 or better recommended |
| Memory | 4GB per worker, 8GB+ per ML instance |
| Storage | Minimal (aggressive cleanup) |
## Security & Abuse Prevention

- ✅ Global: Configurable requests per hour per IP
- ✅ Per-endpoint limits for uploads and URL analysis
- ✅ Redis-backed (survives API restarts)
- ✅ File size limits enforced
- ✅ MIME type verification (magic bytes, not extension)
- ✅ URL validation and optional allowlist/blocklist
- ✅ Timeout enforcement
- ✅ Temporary files deleted immediately after analysis
- ✅ Results cached for configurable TTL (default 24 hours)
- ✅ No long-term storage of uploaded content
- ✅ Evidence URLs use time-limited signed tokens
## Deployment Options

```yaml
services:
  redis:
    image: redis:7-alpine
  api:
    build: ./api
    environment:
      - REDIS_URL=redis://redis:6379
  worker:
    build: ./worker
    environment:
      - REDIS_URL=redis://redis:6379
      - ML_SERVICE_URL=http://ml:8000
  ml:
    build: ./ml
    runtime: nvidia  # GPU support
```

Run: `docker-compose up`
| Provider | Use Case |
|---|---|
| Upstash | Serverless, global edge |
| AWS ElastiCache | Enterprise, AWS ecosystem |
| Redis Cloud | Managed, multi-cloud |
| Component | Instance Type |
|---|---|
| API/Workers | Standard CPU (auto-scaling) |
| ML Service | GPU instances (T4, A10G, or similar) |
| Type | Solution |
|---|---|
| Temporary | Local disk (ephemeral) |
| Evidence | CDN/object storage (S3, R2, Cloudflare) |
| Results | Redis (cache) or database (if long-term needed) |
| Service | Monthly Cost |
|---|---|
| Managed Redis | $10-20 |
| API/Worker compute | $50-100 |
| GPU compute | $200-400 |
| Total | ~$300-500 |
## Monitoring & Observability

- ⏱️ Job latency percentiles (p50, p95, p99)
- 📥 Queue depth (waiting jobs)
- 💼 Worker utilization
- 🧠 ML inference time per stage
- ⚠️ Error rates by type
| Purpose | Tool Options |
|---|---|
| Logging | Structured JSON (Pino, Winston) |
| Metrics | Prometheus + Grafana |
| Alerting | PagerDuty, Opsgenie |
| Tracing | OpenTelemetry (optional) |
| Alert | Threshold |
|---|---|
| Queue depth | >100 for >5 minutes |
| Worker crash rate | >10% |
| ML service response time | >2 minutes |
| Redis connection | Any failure |
## Known Limitations

| Limitation | Description |
|---|---|
| Async-only | No real-time streaming analysis |
| Single video | No batch upload API yet |
| Social platform TOS | yt-dlp usage may violate some platform terms |
| Model drift | Detection models require periodic retraining |
- Batch processing API (analyze multiple videos)
- WebSocket support for real-time progress updates
- Admin dashboard (queue management, analytics)
- Model versioning and A/B testing framework
- API authentication (API keys, OAuth)
- Multi-region deployment for reduced latency
## Technical Stack Summary

| Layer | Technology | Why |
|---|---|---|
| API | Node.js/Express | Fast, lightweight, great async I/O |
| Queue | Redis + BullMQ | Proven reliability, horizontal scaling |
| Worker | Node.js | Same stack as API, easy streaming downloads |
| ML | Python/FastAPI | ML ecosystem, GPU support, async API |
| Deployment | Docker + Fly.io | Containerization, global edge deployment |
Questions? Want to discuss architecture or collaborate?
- 📧 Email: email
- 🔗 LinkedIn: saminwankwo
- 💻 GitHub: saminwankwo
Note: This document describes the high-level architecture and design patterns. Implementation details, specific models, and proprietary logic are not included.