An H.26x video stream is:
A time‑ordered series of compressed pictures, plus instructions describing how to decode them, packaged in a way that works for files, networks, and live streaming.
When debugging, you are usually asking one of these questions:
- Why won’t this stream decode? → parameter sets, NAL units, reference frames
- Why does playback stutter or reorder frames? → PTS vs DTS, B‑frames
- Why can’t I start decoding mid‑stream? → missing SPS / IDR
- Why is video timing wrong? → time_base / time_scale confusion
Keep those questions in mind as we walk through the pieces.
NAL units (Network Abstraction Layer units) are containers that let encoded video data travel over:
- files (MP4, MKV)
- networks (RTP, RTSP)
- transport streams (MPEG‑TS)
A NAL unit is like an envelope.
Inside the envelope is either:
- part of a picture, or
- decoder instructions (metadata)
Every H.264/H.265 stream is fundamentally:
NAL | NAL | NAL | NAL | ...
| NAL Type | Meaning |
|---|---|
| Slice NALs | Actual compressed picture data |
| SPS | Sequence Parameter Set (global config) |
| PPS | Picture Parameter Set (per‑picture config) |
| SEI | Supplemental info (timing, HDR, captions, etc.) |
| IDR slice | Special “reset” picture |
In Annex B streams (raw .h264, MPEG‑TS):
00 00 00 01 [NAL header][payload]
In MP4/MKV:
[length][NAL payload]
✅ Debugging tip:
If your decoder says “missing SPS”, it literally hasn’t seen the right NAL unit yet.
This trips up almost everyone at first.
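To make the "stream = NAL | NAL | NAL" picture concrete, here is a minimal sketch that scans an Annex B buffer for start codes and reports each H.264 NAL type. The buffer contents are made up, and real parsers also have to handle emulation-prevention bytes inside payloads, which this skips:

```python
# Minimal Annex B NAL scanner (H.264). Illustrative sketch only:
# it finds 00 00 01 start codes and reads nal_unit_type from the
# low 5 bits of the first header byte. Emulation-prevention bytes
# and 4- vs 3-byte start-code trimming are glossed over.

H264_NAL_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}

def iter_nal_units(data: bytes):
    """Yield (nal_type, nal_bytes) for each NAL unit in an Annex B buffer."""
    starts = []
    i = 0
    while i + 3 <= len(data):
        if data[i:i + 3] == b"\x00\x00\x01":
            starts.append(i + 3)
            i += 3
        else:
            i += 1
    for n, start in enumerate(starts):
        end = starts[n + 1] - 3 if n + 1 < len(starts) else len(data)
        nal = data[start:end]
        yield nal[0] & 0x1F, nal  # low 5 bits of first byte = nal_unit_type

# Two fake NAL units: an SPS (header 0x67 → type 7), then an IDR slice (0x65 → type 5).
buf = b"\x00\x00\x00\x01\x67\xAA\xBB" + b"\x00\x00\x01\x65\xCC"
for nal_type, nal in iter_nal_units(buf):
    print(nal_type, H264_NAL_NAMES.get(nal_type, "other"))  # → 7 SPS, then 5 IDR slice
```

A scanner like this is often the fastest way to confirm whether an SPS is actually present in the bytes you are feeding the decoder.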
An access unit is “everything needed to decode ONE displayed picture”.
An access unit may contain:
- multiple slice NALs (one frame split into chunks)
- optional SPS/PPS
- SEI
- exactly one picture’s worth of image data
So:
[ Access Unit ] [ Access Unit ] [ Access Unit ]
Not:
[ Frame ][ Frame ][ Frame ]
✅ Debugging tip:
If you’re counting NALs and expecting “1 NAL = 1 frame”, timestamps will never make sense.
Instead of encoding every frame fully, H.26x encodes differences.
| Type | Depends on | Explanation |
|---|---|---|
| I (Intra) | nothing | A self‑contained image |
| P (Predicted) | past frame(s) | “What changed since before?” |
| B (Bi‑predicted) | past and future frames | “What changed between before and after?” |
B‑frames:
- compress very well
- require future frames to decode
That single fact explains:
- decode order vs display order
- DTS vs PTS
- reordering bugs
✅ Debugging tip:
If you remove B‑frames, PTS == DTS and life gets much simpler.
Frames are not always decoded in the order they are shown.
Example (display order):
I B B P
To decode the B‑frames, the decoder must first decode the P‑frame:
Decode order: I → P → B → B
Display order: I → B → B → P
Because B‑frames reference the future.
✅ Debugging tip:
If frames appear out of order unless you respect PTS, this is why.
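The decode-vs-display example above can be sketched in a few lines: frames arrive in decode order, and sorting by PTS recovers display order. The frame letters and timestamp values are made up for illustration:

```python
# Frames as the decoder receives them (decode order), with made-up PTS values.
decode_order = [("I", 0), ("P", 3), ("B", 1), ("B", 2)]  # (pict_type, pts)

# Sorting by PTS recovers display order.
display_order = sorted(decode_order, key=lambda f: f[1])

print([t for t, _ in decode_order])   # → ['I', 'P', 'B', 'B']
print([t for t, _ in display_order])  # → ['I', 'B', 'B', 'P']
```

This is exactly the reordering a player performs; skip the sort (i.e. ignore PTS) and you render the P-frame too early.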
PTS vs DTS is the most important concept for playback issues.
- DTS (Decode Time Stamp): “When must this be decoded?”
- PTS (Presentation Time Stamp): “When must this be shown?”
DTS is for the decoder; PTS is for the viewer.
With no B‑frames:
PTS == DTS
With B‑frames:
DTS: I P B B
PTS: I B B P
Typical symptoms:
- Missing DTS → decoder stalls
- Wrong PTS → jerky playback
- Constant DTS but reordered PTS → frames jump
✅ Debugging tip:
If video is smooth but frames are wrong, inspect PTS.
If video freezes or never decodes, inspect DTS.
IDR frames exist for random access and error recovery.
An IDR frame is a hard reset point for the decoder.
At an IDR:
- all reference history is discarded
- decoding can start cleanly
- no older frames are needed
- I‑frame: intra‑coded
- IDR frame: intra‑coded and resets reference state
✅ Debugging tip:
If you join a live stream mid‑way and see garbage until the next IDR — that’s expected.
The SPS is the global configuration for decoding.
Contains things like:
- resolution
- profile / level
- reference frame limits
- timing info (frame rate hints)
SPS is the decoder’s instruction manual.
Without it:
- decoder does not know how to interpret the bitstream
- even perfect frame data is useless
Parameter sets are:
- sent at stream start
- often resent before IDRs
- must be available before decoding frames
✅ Debugging tip:
“No SPS found” = decoder literally does not know the video’s shape.
Time bases are subtle and often mis‑implemented.
Timestamps are integers, but they represent time using a scale.
Example:
time_scale = 90000
PTS = 180000
→ 2 seconds
- MP4: timescale
- RTP: clock rate (often 90 kHz)
- FFmpeg: time_base
Timestamps are “ticks” — the time scale tells you how many ticks make up one second, so one tick lasts 1/time_scale seconds.
Common mistakes:
- Mixing 1/1000 vs 1/90000
- Assuming timestamps are milliseconds
- Rescaling incorrectly when remuxing
✅ Debugging tip:
If audio and video slowly drift apart, your time base math is wrong.
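The rescaling mistake above is worth seeing in code. This is a sketch of what FFmpeg's av_rescale_q does, using exact fractions so the ratio between bases is computed without float drift (the function name and constants here are illustrative, not a real API):

```python
# Sketch of time-base rescaling: a timestamp is meaningless without its
# time base, so converting between bases multiplies by their ratio.
from fractions import Fraction

def rescale(ts: int, src_base: Fraction, dst_base: Fraction) -> int:
    """Convert integer ticks from src_base (seconds per tick) to dst_base."""
    return round(ts * src_base / dst_base)

MPEG_90K = Fraction(1, 90000)  # RTP / MPEG-TS clock: 90000 ticks per second
MS = Fraction(1, 1000)         # millisecond time base

print(rescale(180000, MPEG_90K, MS))  # → 2000  (180000 ticks @ 90 kHz = 2 s = 2000 ms)
```

Doing this with floats, or assuming every timestamp is in milliseconds, is exactly how audio and video end up drifting apart over long playback.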
H.265 (HEVC) keeps the same concepts but adds:
- VPS (Video Parameter Set) above SPS
- More frame types internally
- Higher compression → more reordering
But:
PTS/DTS, NAL units, IDR, SPS all work the same way.
Think of an H.26x stream like this:
- SPS/PPS: “Here’s how to decode what follows”
- IDR frame: “You can start decoding now”
- P/B frames: “Here are changes relative to others”
- DTS: “Decode in this order”
- PTS: “Show in this order”
- time_scale: “Here’s how fast time moves”
If any of those are missing or inconsistent, playback breaks.
Since you’re new but clearly technical, the fastest way to solidify this:
- Use ffprobe -show_frames -show_packets
- Look at: pict_type, pts, dts, key_frame
- Compare decode vs display order
- Try the same stream with and without B‑frames
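Once you have ffprobe output, the JSON form (ffprobe -show_frames -of json) is the easiest to script against. The sample below is hand-written to mimic its shape; exact field names (e.g. pts vs pkt_pts) vary across ffprobe versions, so check your own dump first:

```python
# Sketch: inspecting frame metadata in the shape ffprobe's JSON output uses.
# The sample data is hand-written for illustration, not real ffprobe output.
import json

sample = json.dumps({"frames": [
    {"pict_type": "I", "pts": 0,    "key_frame": 1},
    {"pict_type": "P", "pts": 3000, "key_frame": 0},
    {"pict_type": "B", "pts": 1000, "key_frame": 0},
    {"pict_type": "B", "pts": 2000, "key_frame": 0},
]})

frames = json.loads(sample)["frames"]          # frames arrive in decode order
for f in sorted(frames, key=lambda f: f["pts"]):  # sort by PTS → display order
    print(f["pict_type"], f["pts"], "keyframe" if f["key_frame"] else "")
```

Comparing the list as-delivered against the PTS-sorted list makes the decode-vs-display reordering visible in your own streams.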
If you want, next we can:
- walk through a real ffprobe dump line‑by‑line
- explain Annex B vs MP4 in detail
- map RTP timestamps → PTS/DTS
- or explain how encoders decide GOP structure
Just tell me where you want to go deeper.