How to handle asymmetry of WAV data?

WAV files can store PCM audio (WAVE_FORMAT_PCM). The WAV file format specification says:

The data format and maximum and minimums values for PCM waveform samples of various sizes are as follows:

Sample Size	Data Format	Maximum Value	Minimum Value
One to eight bits	Unsigned integer	255 (0xFF)	0
Nine or more bits	Signed integer i	Largest positive value of i	Most negative value of i

For example, the maximum, minimum, and midpoint values for 8-bit and 16-bit PCM waveform data are as follows:

Format	Maximum Value	Minimum Value	Midpoint Value
8-bit PCM	255 (0xFF)	0	128 (0x80)
16-bit PCM	32767 (0x7FFF)	-32768 (-0x8000)	0

Both the signed and unsigned formats are asymmetrical: each has one more code below the midpoint than above it. How should the asymmetry be handled? The signed version is a two's-complement representation, and AES17 defines the meaning of full-scale amplitude in this case:

amplitude of a 997-Hz sine wave whose positive peak value reaches the positive digital full scale, leaving the negative maximum code unused.

NOTE In 2's-complement representation, the negative peak is 1 LSB away from the negative maximum code.

As does IEC 61606-3:

amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused

So, for example, for 16-bit audio, a signal that just reaches +32,767 and −32,767 would be full-scale, while one that reaches −32,768 exceeds full-scale.
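
As a rough illustration of that rule, here is a minimal sketch that computes the full-scale peak for a signed word and flags signals that go beyond it. The function names `full_scale_peak` and `exceeds_full_scale` are made up for this example and do not come from any of the standards quoted above.

```python
def full_scale_peak(bits: int) -> int:
    """Largest positive code of a signed two's-complement word of `bits` bits."""
    return 2 ** (bits - 1) - 1


def exceeds_full_scale(samples, bits: int) -> bool:
    """True if any signed sample lies outside the symmetric range -peak..+peak."""
    peak = full_scale_peak(bits)
    return any(abs(s) > peak for s in samples)


print(full_scale_peak(16))                      # 32767
print(exceeds_full_scale([32767, -32767], 16))  # False: exactly full-scale
print(exceeds_full_scale([32767, -32768], 16))  # True: uses the most negative code
```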

The 8-bit midpoint of 128 (rather than 127.5) shows that unsigned data has the same asymmetry as signed data. So, for 8-bit data, a signal that spans 1 to 255 is full-scale, and the value 0 exceeds full-scale.

WAVE Audio File Format Specifications says:

For float data, full scale is 1.

So, to correctly convert signed ints to float, divide by 2**(b-1) - 1, where b is the number of bits.

To correctly convert unsigned ints to float, subtract 2**(b-1), then, similarly, divide by 2**(b-1) - 1.

The float representation will then be limited to +1.0 full-scale in the positive direction, but can exceed −1.0 full-scale in the negative direction.
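
The two conversions above can be sketched in a few lines of Python. This is only an illustration of the formulas as stated here; `signed_to_float` and `unsigned_to_float` are hypothetical helper names, and the inputs are assumed to be plain integer sample values that have already been extracted from the file.

```python
def signed_to_float(sample: int, bits: int) -> float:
    """Scale a signed integer sample so that +/-(2**(bits-1) - 1) maps to +/-1.0."""
    return sample / (2 ** (bits - 1) - 1)


def unsigned_to_float(sample: int, bits: int) -> float:
    """Remove the midpoint offset 2**(bits-1), then scale like the signed case."""
    return (sample - 2 ** (bits - 1)) / (2 ** (bits - 1) - 1)


print(signed_to_float(32767, 16))             # 1.0
print(signed_to_float(-32767, 16))            # -1.0
print(f"{signed_to_float(-32768, 16):+.5f}")  # -1.00003
print(unsigned_to_float(255, 8))              # 1.0
print(unsigned_to_float(128, 8))              # 0.0
print(f"{unsigned_to_float(0, 8):+.3f}")      # -1.008
```

Dividing by 2**(b-1) - 1 rather than 2**(b-1) is what keeps positive and negative full-scale at exactly ±1.0, at the cost of letting the most negative code fall slightly below −1.0.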

Examples

Unsigned

The WAV format actually allows for fewer than 8 bits per sample:

The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero.

So I'll show 2-bit audio first (wBitsPerSample = 2), because it's simpler to follow:

WAV	Sample	int	float	Comment
0xC0	0b11	3	+1.0	full-scale
0x80	0b10	2	0.0	midpoint
0x40	0b01	1	−1.0	full-scale
0x00	0b00	0	−2.0
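
The 2-bit table can be reproduced with a short sketch. It assumes each sample occupies one byte with the sample bits left-justified, as in the quoted rule; `unpack_unsigned` and `unsigned_to_float` are hypothetical names used only for this example.

```python
def unpack_unsigned(container: int, bits: int) -> int:
    """Extract a `bits`-wide unsigned sample stored in the MSBs of one byte."""
    if not 1 <= bits <= 8:
        raise ValueError("unsigned WAV samples are 1 to 8 bits wide")
    return container >> (8 - bits)  # discard the zero padding in the low bits


def unsigned_to_float(sample: int, bits: int) -> float:
    """Subtract the midpoint 2**(bits-1), then divide by 2**(bits-1) - 1."""
    return (sample - 2 ** (bits - 1)) / (2 ** (bits - 1) - 1)


for byte in (0xC0, 0x80, 0x40, 0x00):
    value = unpack_unsigned(byte, 2)
    print(f"0x{byte:02X}  {value}  {unsigned_to_float(value, 2):+.1f}")
```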

For 8-bit audio, as mentioned above, 255 is full-scale, 128 is midpoint, 1 is negative full-scale, and 0 exceeds full-scale:

WAV	Sample	int	float	Comment
0xFF	0b1111_1111	255	+1.000	full-scale
0xFE	0b1111_1110	254	+0.992
0xFD	0b1111_1101	253	+0.984
...	...	...	...
0x82	0b1000_0010	130	+0.016
0x81	0b1000_0001	129	+0.008
0x80	0b1000_0000	128	0.000	midpoint
0x7F	0b0111_1111	127	−0.008
0x7E	0b0111_1110	126	−0.016
...	...	...	...
0x03	0b0000_0011	3	−0.984
0x02	0b0000_0010	2	−0.992
0x01	0b0000_0001	1	−1.000	full-scale
0x00	0b0000_0000	0	−1.008

Signed

For 16-bit audio, the interpretation is signed:

WAV	Sample	int	float	Comment
0x7FFF	0b0111_1111_1111_1111	+32,767	+1.00000	full-scale
0x7FFE	0b0111_1111_1111_1110	+32,766	+0.99997
0x7FFD	0b0111_1111_1111_1101	+32,765	+0.99994
...	...	...	...
0x0002	0b0000_0000_0000_0010	+2	+0.00006
0x0001	0b0000_0000_0000_0001	+1	+0.00003
0x0000	0b0000_0000_0000_0000	0	0.00000	midpoint
0xFFFF	0b1111_1111_1111_1111	−1	−0.00003
0xFFFE	0b1111_1111_1111_1110	−2	−0.00006
...	...	...	...
0x8003	0b1000_0000_0000_0011	−32,765	−0.99994
0x8002	0b1000_0000_0000_0010	−32,766	−0.99997
0x8001	0b1000_0000_0000_0001	−32,767	−1.00000	full-scale
0x8000	0b1000_0000_0000_0000	−32,768	−1.00003
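
To go from raw bytes to the floats in this table, something like the following sketch could be used. It assumes the data is a mono stream of 16-bit samples in little-endian byte order (the byte order WAV uses) and relies only on Python's standard `struct` module; `pcm16_to_floats` is a hypothetical name.

```python
import struct


def pcm16_to_floats(data: bytes) -> list:
    """Decode little-endian signed 16-bit samples and scale them by 1/32767."""
    peak = 2 ** 15 - 1
    return [s / peak for (s,) in struct.iter_unpack("<h", data)]


# Bytes for the samples 0x7FFF, 0x0000, 0x8001 and 0x8000, low byte first.
raw = bytes.fromhex("ff7f000001800080")
for value in pcm16_to_floats(raw):
    print(f"{value:+.5f}")
```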

As is 9-bit audio, with the 9 sample bits stored in the most significant bits of a 16-bit word:

WAV	Sample	int	float	Comment
0x7F80	0b0111_1111_1	+255	+1.000	full-scale
0x7F00	0b0111_1111_0	+254	+0.996
0x7E80	0b0111_1110_1	+253	+0.992
...	...	...	...
0x0100	0b0000_0001_0	+2	+0.008
0x0080	0b0000_0000_1	+1	+0.004
0x0000	0b0000_0000_0	0	0.000	midpoint
0xFF80	0b1111_1111_1	−1	−0.004
0xFF00	0b1111_1111_0	−2	−0.008
...	...	...	...
0x8180	0b1000_0001_1	−253	−0.992
0x8100	0b1000_0001_0	−254	−0.996
0x8080	0b1000_0000_1	−255	−1.000	full-scale
0x8000	0b1000_0000_0	−256	−1.004
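
A sketch of how the 9-bit rows could be decoded is shown below. It assumes the container has already been read as an unsigned 16-bit integer (as in the WAV column of the table); `unpack_signed` and its `container_bits` parameter are hypothetical names for this illustration.

```python
def unpack_signed(container: int, bits: int, container_bits: int = 16) -> int:
    """Extract a signed `bits`-wide sample from the MSBs of an unsigned container."""
    raw = container >> (container_bits - bits)  # drop the zero padding
    if raw >= 2 ** (bits - 1):                  # sign bit set: negative value
        raw -= 2 ** bits
    return raw


peak = 2 ** (9 - 1) - 1  # 255, full-scale for 9-bit audio
for word in (0x7F80, 0x0080, 0x0000, 0xFF80, 0x8080, 0x8000):
    value = unpack_signed(word, 9)
    print(f"0x{word:04X}  {value:+d}  {value / peak:+.3f}")
```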

endolith/WAV interpretation.md

endolith commented Jun 27, 2024 • edited