@joshooaj
Created March 17, 2026 17:08
Copilot-generated technical explanation of H.26x for beginners

A Beginner’s Guide to H.26x Video Streams

The Big Picture (Start Here)

An H.26x video stream is:

A time‑ordered series of compressed pictures, plus instructions describing how to decode them, packaged in a way that works for files, networks, and live streaming.

When debugging, you are usually asking one of these questions:

  • Why won’t this stream decode? → parameter sets, NAL units, reference frames
  • Why does playback stutter or reorder frames? → PTS vs DTS, B‑frames
  • Why can’t I start decoding mid‑stream? → missing SPS / IDR
  • Why is video timing wrong? → time_base / time_scale confusion

Keep those questions in mind as we walk through the pieces.


1. NAL Units (Network Abstraction Layer)

What problem do they solve?

They are containers that let encoded video data travel over:

  • files (MP4, MKV)
  • networks (RTP, RTSP)
  • transport streams (MPEG‑TS)

Mental model

A NAL unit is like an envelope.
Inside the envelope is either:

  • part of a picture, or
  • decoder instructions (metadata)

Every H.264/H.265 stream is fundamentally:

NAL | NAL | NAL | NAL | ...

Common NAL unit types you’ll see

| NAL type | Meaning |
| --- | --- |
| Slice NALs | Actual compressed picture data |
| SPS | Sequence Parameter Set (global config) |
| PPS | Picture Parameter Set (picture‑level config referenced by slices) |
| SEI | Supplemental info (timing, HDR, captions, etc.) |
| IDR slice | Special “reset” picture |

In Annex B streams (raw .h264, MPEG‑TS):

00 00 00 01 [NAL header][payload]

In MP4/MKV (length‑prefixed, no start codes):

[length (usually 4 bytes)][NAL header][payload]

Debugging tip:
If your decoder says “missing SPS”, it literally hasn’t seen the right NAL unit yet.
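The Annex B layout above can be scanned with a few lines of code. A minimal sketch in Python, over synthetic bytes (not a full parser; it ignores the emulation‑prevention bytes a real one must handle):

```python
def iter_nal_types(data: bytes):
    """Yield the H.264 nal_unit_type of each NAL unit in an Annex B stream.

    Finds 3/4-byte start codes (00 00 01 / 00 00 00 01); the type is the
    low 5 bits of the first byte after the start code.
    """
    i = 0
    while True:
        j = data.find(b"\x00\x00\x01", i)
        if j == -1:
            return
        yield data[j + 3] & 0x1F  # low 5 bits = nal_unit_type
        i = j + 3

# Synthetic stream: SPS (type 7), PPS (type 8), IDR slice (type 5)
stream = (b"\x00\x00\x00\x01\x67\x64\x00\x1f"
          b"\x00\x00\x00\x01\x68\xee\x3c\x80"
          b"\x00\x00\x01\x65\x88\x84")

print(list(iter_nal_types(stream)))  # [7, 8, 5]
```

Run that against a real `.h264` dump and the first types you should see are 7 (SPS), 8 (PPS), then 5 (IDR).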


2. Access Units (One Frame ≠ One NAL)

This trips up almost everyone at first.

Mental model

An access unit is “everything needed to decode ONE displayed picture”.

An access unit may contain:

  • multiple slice NALs (one frame split into chunks)
  • optional SPS/PPS
  • SEI
  • exactly one picture’s worth of image data

So:

[ Access Unit ] [ Access Unit ] [ Access Unit ]

Not:

[ Frame ][ Frame ][ Frame ]

Debugging tip:
If you’re counting NALs and expecting “1 NAL = 1 frame”, timestamps will never make sense.
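One way to internalize the grouping: walk a flat NAL sequence and cut a new access unit whenever a new picture starts. A simplified sketch (the NAL list and the first‑slice flag are made up for illustration; the real boundary rules in the spec are more involved):

```python
# Toy NAL sequence: (type, first_slice_of_its_picture)
nals = [
    ("SPS", False), ("PPS", False),
    ("IDR-slice", True), ("IDR-slice", False),  # one picture, two slices
    ("P-slice", True),
    ("SEI", False), ("B-slice", True), ("B-slice", False),
]

def group_access_units(nals):
    """Cut the flat NAL stream into access units (one displayed picture each).

    Simplified rule: once a slice has been seen, the next non-slice NAL or
    the first slice of a new picture begins a new access unit.
    """
    aus, pending, seen_slice = [], [], False
    for typ, first_slice in nals:
        is_slice = typ.endswith("slice")
        if seen_slice and (not is_slice or first_slice):
            aus.append(pending)
            pending, seen_slice = [], False
        pending.append(typ)
        seen_slice = seen_slice or is_slice
    if pending:
        aus.append(pending)
    return aus

aus = group_access_units(nals)
print(len(aus))  # 3 (eight NALs, but only three pictures)
print(aus[0])    # ['SPS', 'PPS', 'IDR-slice', 'IDR-slice']
```

Eight NALs in, three access units out: exactly the "1 NAL ≠ 1 frame" point.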


3. I, P, and B Frames (Prediction Types)

The core compression trick

Instead of encoding every frame fully, H.26x encodes differences.

Frame types

| Type | Depends on | Explanation |
| --- | --- | --- |
| I (Intra) | nothing | A self‑contained image |
| P (Predicted) | past frame(s) | “What changed since before?” |
| B (Bi‑predicted) | past and future frames | “What changed between before and after?” |

Why B‑frames matter

B‑frames:

  • compress very well
  • require future frames to decode

That single fact explains:

  • decode order vs display order
  • DTS vs PTS
  • reordering bugs

Debugging tip:
If you remove B‑frames, PTS == DTS and life gets much simpler.
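The dependency chain is also why packet loss looks the way it does. A toy sketch, using a hypothetical GOP and the simplified rule that everything after a lost reference stays corrupted until the next I‑frame:

```python
gop = ["I", "B", "B", "P", "B", "B", "P", "I", "B", "P"]

def corrupted_after_loss(frames, lost_index):
    """Indices that can no longer decode cleanly if frames[lost_index] is lost.

    Simplification: every frame after the loss inherits the damage until
    the next I-frame resets the reference chain.
    """
    bad = [lost_index]
    for i in range(lost_index + 1, len(frames)):
        if frames[i] == "I":
            break  # intra frame: no references, decoding recovers here
        bad.append(i)
    return bad

print(corrupted_after_loss(gop, 3))  # [3, 4, 5, 6] -> bad until the I at index 7
```

Lose one P‑frame and four frames go bad; that is the trade for the compression win.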


4. Decode Order vs Display Order

The key confusion

Frames are not always decoded in the order they are shown.

Example (display order):

I   B   B   P

To decode the B‑frames, the decoder must first decode the P‑frame:

Decode order: I → P → B → B
Display order: I → B → B → P

Why this exists

Because B‑frames reference the future.

Debugging tip:
If frames appear out of order unless you respect PTS, this is why.
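The reordering above can be reproduced in a couple of lines: consume frames in decode order, then sort the output by PTS to recover display order (frame names and PTS values here are illustrative):

```python
decode_order = [("I", 0), ("P", 3), ("B", 1), ("B", 2)]  # (frame, pts)

# The decoder consumes frames in decode order; the renderer sorts by PTS.
display_order = [name for name, pts in sorted(decode_order, key=lambda f: f[1])]
print(display_order)  # ['I', 'B', 'B', 'P']
```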


5. PTS and DTS (Presentation vs Decode Time)

This is the most important concept for playback issues.

Definitions

  • DTS (Decode Time Stamp)
    “When must this be decoded?”
  • PTS (Presentation Time Stamp)
    “When must this be shown?”

Mental model

DTS is for the decoder
PTS is for the viewer

With no B‑frames:

PTS == DTS

With B‑frames:

Decode order (DTS): I  P  B  B
Display order (PTS): I  B  B  P

What goes wrong in real systems

  • Missing DTS → decoder stalls
  • Wrong PTS → jerky playback
  • Constant DTS but reordered PTS → frames jump

Debugging tip:
If video is smooth but frames are wrong, inspect PTS.
If video freezes or never decodes, inspect DTS.
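Players bridge the two clocks with a small reorder buffer: accept frames in decode (DTS) order, release them in PTS order once enough frames have arrived. A minimal sketch (the buffer depth and frame names are assumptions; real players size the buffer from the stream's declared reordering delay):

```python
import heapq

def present(frames, depth):
    """Re-emit frames arriving in decode (DTS) order in PTS order,
    holding at most `depth` frames in a reorder buffer."""
    heap, out = [], []
    for pts, name in frames:       # arrival order == decode order
        heapq.heappush(heap, (pts, name))
        if len(heap) > depth:      # buffer full: release the earliest PTS
            out.append(heapq.heappop(heap)[1])
    while heap:                    # flush at end of stream
        out.append(heapq.heappop(heap)[1])
    return out

decode_order = [(0, "I"), (3, "P1"), (1, "B1"), (2, "B2"),
                (6, "P2"), (4, "B3"), (5, "B4")]
print(present(decode_order, depth=2))
# ['I', 'B1', 'B2', 'P1', 'B3', 'B4', 'P2']
```

Note the depth of 2 matches the two consecutive B‑frames; too small a buffer and frames come out in the wrong order, which is exactly the "frames jump" symptom above.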


6. IDR Frames (Instant Decoder Refresh)

What problem do they solve?

Random access and error recovery.

Mental model

An IDR frame is a hard reset point for the decoder.

At an IDR:

  • all reference history is discarded
  • decoding can start cleanly
  • no older frames are needed

Not all I‑frames are equal

  • I‑frame: intra‑coded
  • IDR frame: intra‑coded and resets reference state

Debugging tip:
If you join a live stream mid‑way and see garbage until the next IDR — that’s expected.
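In H.264 the distinction is visible right in the NAL header: nal_unit_type 5 is an IDR slice, type 1 a non‑IDR slice. A one‑line check (the byte values are just examples):

```python
def is_idr(nal_header_byte: int) -> bool:
    # Low 5 bits of the H.264 NAL header byte: 5 = IDR slice, 1 = non-IDR slice
    return (nal_header_byte & 0x1F) == 5

print(is_idr(0x65), is_idr(0x41))  # True False
```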


7. SPS (Sequence Parameter Set)

What is it?

The global configuration for decoding.

Contains things like:

  • resolution
  • profile / level
  • reference frame limits
  • timing info (frame rate hints)

Mental model

SPS is the decoder’s instruction manual.

Without it:

  • decoder does not know how to interpret the bitstream
  • even perfect frame data is useless

Practical reality

  • Sent at stream start
  • Often resent before IDRs
  • Must be available before decoding frames

Debugging tip:
“No SPS found” = decoder literally does not know the video’s shape.
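A little of the SPS can be read without a full bitstream parser: right after the NAL header byte, profile_idc, the constraint flags, and level_idc are plain 8‑bit fields. A sketch (the example bytes are hypothetical; resolution and everything past these fields needs Exp‑Golomb decoding):

```python
sps = bytes([0x67, 0x64, 0x00, 0x1F])  # hypothetical H.264 SPS NAL

profile_idc = sps[1]  # byte after the NAL header
level_idc = sps[3]    # follows the 8 constraint-flag bits
print(profile_idc, level_idc)  # 100 31 -> High profile, level 3.1
```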


8. time_scale / time_base (Where Timing Comes From)

This is subtle and often mis‑implemented.

The idea

Timestamps are integers, but they represent time using a scale.

Example:

time_scale = 90000
PTS = 180000
→ 2 seconds

Where it appears

  • MP4: timescale
  • RTP: clock rate (often 90 kHz)
  • FFmpeg: time_base

Mental model

Timestamps are “ticks” — time_scale tells you how long one tick is.

Common bugs

  • Mixing 1/1000 vs 1/90000
  • Assuming milliseconds
  • Rescaling incorrectly when remuxing

Debugging tip:
If audio and video slowly drift apart, your time base math is wrong.
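The conversion and rescaling math is exact integer arithmetic; doing it in floating point is where the drift comes from. A sketch using Python's Fraction (`rescale` mirrors what FFmpeg's `av_rescale_q` does, rounding only once at the end):

```python
from fractions import Fraction

def ticks_to_seconds(ticks: int, time_scale: int) -> Fraction:
    """Exact timestamp-to-seconds conversion: seconds = ticks / time_scale."""
    return Fraction(ticks, time_scale)

def rescale(ticks: int, from_scale: int, to_scale: int) -> int:
    """Move a timestamp between clock rates without accumulating error."""
    return round(Fraction(ticks * to_scale, from_scale))

print(ticks_to_seconds(180000, 90000))  # 2 (the example above)
print(rescale(180000, 90000, 1000))     # 2000 ms in a 1/1000 timescale
```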


9. H.265 Differences (Quick Notes)

H.265 (HEVC) keeps the same concepts but adds:

  • VPS (Video Parameter Set) above SPS
  • More frame types internally
  • Higher compression → more reordering

But:

PTS/DTS, NAL units, IDR, SPS all work the same way.


10. A Simple End‑to‑End Mental Model

Think of an H.26x stream like this:

  1. SPS/PPS: “Here’s how to decode what follows”
  2. IDR frame: “You can start decoding now”
  3. P/B frames: “Here are changes relative to others”
  4. DTS: “Decode in this order”
  5. PTS: “Show in this order”
  6. time_scale: “Here’s how fast time moves”

If any of those are missing or inconsistent, playback breaks.


Recommended Next Debugging Steps

Since you’re new but clearly technical, the fastest way to solidify this:

  • Use ffprobe -show_frames -show_packets
  • Look at:
    • pict_type
    • pts, dts
    • key_frame
  • Compare decode vs display order
  • Try the same stream with and without B‑frames
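Once you have that ffprobe output, the decode-vs-display comparison is just two sorts. A sketch over hypothetical records shaped like `-show_frames` fields:

```python
frames = [  # hypothetical ffprobe -show_frames style records
    {"pict_type": "I", "pts": 0, "dts": 0, "key_frame": 1},
    {"pict_type": "P", "pts": 3, "dts": 1, "key_frame": 0},
    {"pict_type": "B", "pts": 1, "dts": 2, "key_frame": 0},
    {"pict_type": "B", "pts": 2, "dts": 3, "key_frame": 0},
]

decode_order = [f["pict_type"] for f in sorted(frames, key=lambda f: f["dts"])]
display_order = [f["pict_type"] for f in sorted(frames, key=lambda f: f["pts"])]
print(decode_order)   # ['I', 'P', 'B', 'B']
print(display_order)  # ['I', 'B', 'B', 'P']
```

If the two lists come out identical on your stream, it has no B‑frames.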

If you want, next we can:

  • walk through a real ffprobe dump line‑by‑line
  • explain Annex B vs MP4 in detail
  • map RTP timestamps → PTS/DTS
  • or explain how encoders decide GOP structure

Just tell me where you want to go deeper.
