Below is a mental model I’ve found useful when turning FFmpeg (or yt‑dlp + FFmpeg) into a tiny “MCP server” that feels as friendly as, say, an image‑resize API:
- Job = (source → transform → destination)
  - source – URL, upload, or pipe
  - transform – one of a handful of named presets (audio‑only/mp3, “YouTube → HLS 360p”, sprite‑sheet, etc.)
  - destination – file download, cloud bucket, or a streamable response
- The user POSTs a tiny JSON blob; the server turns it into the full FFmpeg command, runs it, and returns a `job_id`.

```
POST /jobs
{
  "source": "https://youtu.be/abc123",
  "preset": "mp3",
  "callback": "https://example.com/webhook"
}
```
Presets are just JSON files checked into the repo:
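The original preset example seems to have been lost in formatting, so here is a hypothetical `presets/mp3.json` showing one plausible shape (all field names are illustrative; the `${SOURCE}`/`${DEST}` placeholders are filled in by the worker):

```json
{
  "name": "mp3",
  "description": "Extract audio as 128 kbps MP3",
  "tool": "ffmpeg",
  "args": ["-i", "${SOURCE}", "-vn", "-c:a", "libmp3lame", "-b:a", "128k", "${DEST}"],
  "output_ext": "mp3"
}
```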
Why it helps:
- Users never see CLI flags.
- You can tune presets without breaking API clients.
- Power users can still POST `"custom_args": ["-vf", "scale=1280:-1"]` if you expose an escape‑hatch field.
| verb | path | purpose |
|---|---|---|
| GET | /presets | list & describe presets |
| POST | /jobs | create a job (returns `job_id`) |
| GET | /jobs/{id} | live status JSON (queued, running, …) |
| GET | /jobs/{id}/log | stream stderr/stdout (`tail -f` style) |
| GET | /jobs/{id}/out | download the final artifact |
| DELETE | /jobs/{id} | cancel |
Everything else (auth, rate‑limit, etc.) sits in a gateway.
```
┌──────────┐  enqueue   ┌───────────────┐  spawn   ┌──────────┐
│   API    ├───────────▶│   Job Queue   ├─────────▶│  Worker  │
└──────────┘            └───────────────┘          │  ffmpeg  │
                                                   └──────────┘
```
- Queue = Redis, RabbitMQ, or just BullMQ.
- Worker pulls the preset, does string interpolation (`${SOURCE}`, `${DEST}`), then `execa()`s the full ffmpeg/yt‑dlp chain.
- Progress: parse FFmpeg’s stderr lines (`frame=…`) → push to Redis pub/sub → surface via `/jobs/{id}`.
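The two worker pieces above can be sketched as plain functions. This assumes FFmpeg’s default stderr stats format (lines like `frame=  120 fps= 30 … time=00:00:05.00 …`); the function names are illustrative:

```javascript
// Substitute ${SOURCE} / ${DEST}-style placeholders in a preset's arg list.
function interpolateArgs(args, vars) {
  return args.map((a) =>
    a.replace(/\$\{(\w+)\}/g, (_, name) => vars[name] ?? "")
  );
}

// Pull the frame count and timestamp out of one FFmpeg stderr stats line.
// Returns null for lines that are not progress lines.
function parseProgress(line) {
  const frame = line.match(/frame=\s*(\d+)/);
  const time = line.match(/time=(\d{2}):(\d{2}):(\d{2}\.\d+)/);
  if (!frame && !time) return null;
  const out = {};
  if (frame) out.frame = Number(frame[1]);
  if (time) {
    const [, h, m, s] = time;
    out.seconds = Number(h) * 3600 + Number(m) * 60 + Number(s);
  }
  return out;
}
```

In the worker loop you would feed each stderr line through `parseProgress` and publish non‑null results to pub/sub keyed by `job_id`, which is what `/jobs/{id}` reads back.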
| Feature | Implementation sketch |
|---|---|
| Self‑documenting | `/presets` plus OpenAPI (Swagger) generated from the same JSON |
| Dry‑run | `dry_run=true` → return the full command string |
| Webhook / SSE | POST callback or Server‑Sent Events for real‑time UI |
| Version pinning | `Accept-Version: 2025-05-02` header, or `/v1/` path |
| Batch jobs | accept an array in POST; respond with an array of IDs |
| Health & metrics | `/healthz`, `/metrics` (Prometheus) |
| CLI concept (yt‑dlp/ffmpeg) | API surface |
|---|---|
| `-S` sort descriptors | part of a “quality” preset param (`"quality": "720p"`) |
| `--compat-options` | hidden behind a `"legacy": true` toggle |
| `--ignore-errors` | job‑level flag `"best_effort": true` |
| long `-vf` filterchains | separate pipeline step in the preset |
| updates / channels | container image tag; handled by dev‑ops, not the API |
```bash
curl -s -X POST https://ff.mcp/api/jobs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"source":"https://youtu.be/dQw4w9WgXcQ","preset":"gif:5s"}' \
  | jq -r '.job_id'
```

Five seconds later:

```bash
curl -O https://ff.mcp/api/jobs/$ID/out   # rickroll.gif
```
Treat FFmpeg like a render farm with opinionated presets, not a Swiss‑army CLI. People shouldn’t need to know about `-b:v` vs. `-maxrate`; they just pick “1080p‑hq” or “mp3‑128k” and your server owns the rest.
Hope this gives you a clear mental framework. Once you’re comfortable, you can expose a second “expert” endpoint that takes raw CLI strings—just keep it separate so your simple path stays simple.