@endolith
Last active June 27, 2024 17:02
Interpretation of WAV file sample data and asymmetry

How to handle asymmetry of WAV data?

WAV files can store PCM audio (WAVE_FORMAT_PCM). The WAV file format specification says:

The data format and maximum and minimums values for PCM waveform samples of various sizes are as follows:

| Sample Size | Data Format | Maximum Value | Minimum Value |
|---|---|---|---|
| One to eight bits | Unsigned integer | 255 (0xFF) | 0 |
| Nine or more bits | Signed integer i | Largest positive value of i | Most negative value of i |

For example, the maximum, minimum, and midpoint values for 8-bit and 16-bit PCM waveform data are as follows:

| Format | Maximum Value | Minimum Value | Midpoint Value |
|---|---|---|---|
| 8-bit PCM | 255 (0xFF) | 0 | 128 (0x80) |
| 16-bit PCM | 32767 (0x7FFF) | -32768 (-0x8000) | 0 |
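The two spec tables boil down to a rule about the integer container that holds each sample. A minimal sketch, assuming the container is ceil(wBitsPerSample / 8) bytes wide (so a 2-bit sample sits in one byte and a 9-bit sample in a 16-bit word, as in the examples further down); `pcm_container_range` is a hypothetical helper name, not part of any library:

```python
import math

def pcm_container_range(bits_per_sample):
    """Return (minimum, maximum, midpoint) of the integer container that
    holds one PCM sample, following the spec tables quoted above.

    Assumes the container is ceil(bits / 8) bytes wide."""
    if bits_per_sample < 1:
        raise ValueError("bits_per_sample must be positive")
    container_bits = 8 * math.ceil(bits_per_sample / 8)
    if container_bits == 8:
        # One to eight bits: unsigned byte, midpoint 128 (0x80)
        return 0, 0xFF, 0x80
    # Nine or more bits: two's-complement signed integer, midpoint 0
    half = 2 ** (container_bits - 1)
    return -half, half - 1, 0

for bits in (2, 8, 9, 16, 24):
    print(bits, pcm_container_range(bits))
```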

Both the signed and unsigned formats are asymmetrical. How to handle the asymmetry? The signed version is two's complement representation, and AES17 defines the meaning of full-scale amplitude in this case:

amplitude of a 997-Hz sine wave whose positive peak value reaches the positive digital full scale, leaving the negative maximum code unused.

NOTE In 2's-complement representation, the negative peak is 1 LSB away from the negative maximum code.

As does IEC 61606-3:

amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused

So, for example, for 16-bit audio, a signal that just reaches +32,767 and −32,767 would be full-scale, while one that reaches −32,768 exceeds full-scale.

The 8-bit midpoint of 128 (0x80) shows that unsigned data has the same asymmetry as signed data: 255 is 127 steps above the midpoint, while 0 is 128 steps below it. So, for 8-bit data, a signal that spans 1 to 255 would be full-scale, and the value 0 exceeds full-scale.
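A minimal sketch of both cases: quantize a 997 Hz sine so that its positive peak just reaches positive full-scale, then offset it by the midpoint for the unsigned form. The 48 kHz sample rate and the names `full_scale_sine` and `to_unsigned` are assumptions for illustration:

```python
import math

def full_scale_sine(bits, n_samples, freq=997, rate=48000):
    """Quantize a full-scale sine to signed integers, per the AES17 /
    IEC 61606-3 definition: peaks reach +(2**(bits-1) - 1) and
    -(2**(bits-1) - 1), leaving the most negative code unused."""
    peak = 2 ** (bits - 1) - 1
    samples = []
    for n in range(n_samples):
        # Drop whole cycles (freq * n modulo rate) before calling sin()
        # to keep the argument small and the rounding error low
        phase = 2 * math.pi * ((freq * n) % rate) / rate
        samples.append(round(peak * math.sin(phase)))
    return samples

def to_unsigned(samples, bits):
    """Offset signed samples by the midpoint 2**(bits-1), giving the
    unsigned form used by 8-bit WAV (midpoint 128)."""
    midpoint = 2 ** (bits - 1)
    return [s + midpoint for s in samples]

s16 = full_scale_sine(16, 48000)
print(min(s16), max(s16))    # -32767 32767

u8 = to_unsigned(full_scale_sine(8, 48000), 8)
print(min(u8), max(u8))      # 1 255
```

Because 997 and 48000 share no common factor, one second of samples hits every phase step, including the exact positive and negative peaks.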

WAVE Audio File Format Specifications says:

For float data, full scale is 1.

So, to correctly convert signed ints to float, divide by 2**(b-1) - 1, where b is the number of bits.

To correctly convert unsigned ints to float, subtract 2**(b-1), then, similarly, divide by 2**(b-1) - 1.
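A minimal sketch of the two conversions, assuming the samples have already been unpacked into Python ints; `signed_to_float` and `unsigned_to_float` are hypothetical helper names, not a library API:

```python
def _full_scale(bits):
    """Positive full-scale code, 2**(bits-1) - 1."""
    if bits < 2:
        # For b = 1 the divisor 2**(b-1) - 1 would be 0
        raise ValueError("need at least 2 bits")
    return 2 ** (bits - 1) - 1

def signed_to_float(value, bits):
    """Signed PCM int -> float, with positive full-scale mapped to +1.0."""
    return value / _full_scale(bits)

def unsigned_to_float(value, bits):
    """Unsigned PCM int -> float: remove the midpoint offset 2**(bits-1),
    then scale exactly as for signed data."""
    return (value - 2 ** (bits - 1)) / _full_scale(bits)

print(signed_to_float(32767, 16))   # 1.0
print(unsigned_to_float(128, 8))    # 0.0
print(unsigned_to_float(1, 8))      # -1.0
```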

The float representation will then be limited to +1.0 full-scale in the positive direction, but can exceed −1.0 full-scale in the negative direction.
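Working the limit out explicitly: the most negative code is −2^(b−1) for signed data, and 0 − 2^(b−1) after removing the offset from unsigned data, so the conversions described above give the same lower bound in both cases:

$$
x_{\min} = \frac{-2^{b-1}}{2^{b-1} - 1} = -1 - \frac{1}{2^{b-1} - 1}
$$

For b = 2 this is −2.0, for b = 8 it is −128/127 ≈ −1.008, for b = 9 it is −256/255 ≈ −1.004, and for b = 16 it is −32768/32767 ≈ −1.00003, so the overshoot shrinks as the bit depth grows.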

Examples

Unsigned

The WAV format actually allows for fewer than 8 bits:

The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero.

So I'll show 2-bit audio first (wBitsPerSample = 2), because it's simpler to follow:

| WAV data | Sample bits | int | float | Comment |
|---|---|---|---|---|
| 0xC0 | 0b11 | 3 | +1.0 | full-scale |
| 0x80 | 0b10 | 2 | 0.0 | midpoint |
| 0x40 | 0b01 | 1 | −1.0 | full-scale |
| 0x00 | 0b00 | 0 | −2.0 | |
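Decoding these sub-byte samples means shifting the zero padding out before applying the unsigned conversion. A minimal sketch, assuming each 2- to 8-bit sample occupies its own byte as in the spec text quoted above; `decode_low_bit_unsigned` is a hypothetical name:

```python
def decode_low_bit_unsigned(stored_byte, bits):
    """Decode one 2- to 8-bit unsigned WAV sample whose amplitude bits sit
    in the most significant bits of a byte (the low bits are zero).
    Returns (integer sample, float)."""
    if not 2 <= bits <= 8:
        raise ValueError("this sketch handles 2 to 8 bits")
    sample = stored_byte >> (8 - bits)       # drop the zero padding bits
    midpoint = 2 ** (bits - 1)
    return sample, (sample - midpoint) / (midpoint - 1)

for stored in (0xC0, 0x80, 0x40, 0x00):
    print(f"0x{stored:02X}", decode_low_bit_unsigned(stored, 2))
```

The loop reproduces the int and float columns of the 2-bit table.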

For 8-bit audio, as mentioned above, 255 is positive full-scale, 128 is the midpoint, 1 is negative full-scale, and 0 exceeds full-scale:

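| WAV data | Sample bits | int | float | Comment |
|---|---|---|---|---|
| 0xFF | 0b1111_1111 | 255 | +1.000 | full-scale |
| 0xFE | 0b1111_1110 | 254 | +0.992 | |
| 0xFD | 0b1111_1101 | 253 | +0.984 | |
| ... | ... | ... | ... | |
| 0x82 | 0b1000_0010 | 130 | +0.016 | |
| 0x81 | 0b1000_0001 | 129 | +0.008 | |
| 0x80 | 0b1000_0000 | 128 | 0.000 | midpoint |
| 0x7F | 0b0111_1111 | 127 | −0.008 | |
| 0x7E | 0b0111_1110 | 126 | −0.016 | |
| ... | ... | ... | ... | |
| 0x03 | 0b0000_0011 | 3 | −0.984 | |
| 0x02 | 0b0000_0010 | 2 | −0.992 | |
| 0x01 | 0b0000_0001 | 1 | −1.000 | full-scale |
| 0x00 | 0b0000_0000 | 0 | −1.008 | |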
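To apply this to an actual file, here is a minimal sketch using Python's standard-library `wave` module, which returns the data chunk as raw bytes; for 8-bit files each byte is one unsigned sample. The helper name `read_8bit_wav_as_float` is hypothetical, and a tiny WAV is built in memory so the example is self-contained:

```python
import io
import wave

def read_8bit_wav_as_float(fileobj):
    """Read 8-bit PCM WAV data as floats (interleaved if multichannel),
    scaling so 255 -> +1.0, 128 -> 0.0, 1 -> -1.0 and 0 -> about -1.008."""
    with wave.open(fileobj, "rb") as w:
        if w.getsampwidth() != 1:
            raise ValueError("expected 8-bit samples")
        data = w.readframes(w.getnframes())
    # Iterating over bytes yields unsigned ints 0..255
    return [(byte - 128) / 127 for byte in data]

# Build a tiny mono 8-bit WAV in memory to try it on
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(1)
    w.setframerate(8000)
    w.writeframes(bytes([255, 129, 128, 127, 1, 0]))
buf.seek(0)

print([round(x, 3) for x in read_8bit_wav_as_float(buf)])
```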

Signed

For 16-bit audio, the interpretation is signed:

| WAV data | Sample bits | int | float | Comment |
|---|---|---|---|---|
| 0x7FFF | 0b0111_1111_1111_1111 | +32,767 | +1.00000 | full-scale |
| 0x7FFE | 0b0111_1111_1111_1110 | +32,766 | +0.99997 | |
| 0x7FFD | 0b0111_1111_1111_1101 | +32,765 | +0.99994 | |
| ... | ... | ... | ... | |
| 0x0002 | 0b0000_0000_0000_0010 | +2 | +0.00006 | |
| 0x0001 | 0b0000_0000_0000_0001 | +1 | +0.00003 | |
| 0x0000 | 0b0000_0000_0000_0000 | 0 | 0.00000 | midpoint |
| 0xFFFF | 0b1111_1111_1111_1111 | −1 | −0.00003 | |
| 0xFFFE | 0b1111_1111_1111_1110 | −2 | −0.00006 | |
| ... | ... | ... | ... | |
| 0x8003 | 0b1000_0000_0000_0011 | −32,765 | −0.99994 | |
| 0x8002 | 0b1000_0000_0000_0010 | −32,766 | −0.99997 | |
| 0x8001 | 0b1000_0000_0000_0001 | −32,767 | −1.00000 | full-scale |
| 0x8000 | 0b1000_0000_0000_0000 | −32,768 | −1.00003 | |
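WAV stores multi-byte samples little-endian, so 16-bit data can be decoded with the standard-library `struct` module. A minimal sketch; `decode_int16_le` is a hypothetical name, and any odd trailing byte is simply ignored:

```python
import struct

def decode_int16_le(data):
    """Decode little-endian 16-bit signed PCM bytes (as stored in a WAV
    data chunk) to floats, dividing by 32767 so that 0x7FFF -> +1.0."""
    count = len(data) // 2
    ints = struct.unpack(f"<{count}h", data[: 2 * count])
    return [i / 32767 for i in ints]

raw = struct.pack("<5h", 32767, 1, 0, -32767, -32768)
print([round(x, 5) for x in decode_int16_le(raw)])
```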

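9-bit audio is also signed. Following the rule quoted above, each 9-bit sample occupies the 9 most significant bits of a 16-bit word, and the low 7 bits are zero: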

| WAV data | Sample bits | int | float | Comment |
|---|---|---|---|---|
| 0x7F80 | 0b0111_1111_1 | +255 | +1.000 | full-scale |
| 0x7F00 | 0b0111_1111_0 | +254 | +0.996 | |
| 0x7E80 | 0b0111_1110_1 | +253 | +0.992 | |
| ... | ... | ... | ... | |
| 0x0100 | 0b0000_0001_0 | +2 | +0.008 | |
| 0x0080 | 0b0000_0000_1 | +1 | +0.004 | |
| 0x0000 | 0b0000_0000_0 | 0 | 0.000 | midpoint |
| 0xFF80 | 0b1111_1111_1 | −1 | −0.004 | |
| 0xFF00 | 0b1111_1111_0 | −2 | −0.008 | |
| ... | ... | ... | ... | |
| 0x8180 | 0b1000_0001_1 | −253 | −0.992 | |
| 0x8100 | 0b1000_0001_0 | −254 | −0.996 | |
| 0x8080 | 0b1000_0000_1 | −255 | −1.000 | full-scale |
| 0x8000 | 0b1000_0000_0 | −256 | −1.004 | |
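Decoding a sample that sits in the top bits of a signed container takes an arithmetic right shift, which discards the zero padding while keeping the sign. A minimal sketch, assuming the 16-bit word has already been read as its unsigned bit pattern (as in the table's first column); `decode_msb_aligned_signed` is a hypothetical name:

```python
def decode_msb_aligned_signed(word, bits, container_bits=16):
    """Decode a signed `bits`-bit sample stored in the most significant
    bits of a two's-complement container, given its unsigned bit pattern.
    Returns (integer sample, float)."""
    # Reinterpret the unsigned bit pattern as a signed container value
    if word >= 2 ** (container_bits - 1):
        word -= 2 ** container_bits
    # Python's >> on negative ints is arithmetic, so the sign is preserved
    sample = word >> (container_bits - bits)
    return sample, sample / (2 ** (bits - 1) - 1)

for word in (0x7F80, 0x0080, 0x0000, 0xFF80, 0x8080, 0x8000):
    value, x = decode_msb_aligned_signed(word, 9)
    print(f"0x{word:04X} {value:+d} {x:+.3f}")
```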

endolith commented Jun 27, 2024

See scipy/scipy#12507 for more context on the different ways to interpret this.

MATLAB's audioread, USB Audio, and Android all interpret integer PCM data as fixed-point, which produces slightly different values.
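For comparison, a minimal sketch of the fixed-point reading, assuming "fixed-point" here means the usual two's-complement fraction in which one LSB is worth 2**-(b-1): the most negative code maps to exactly −1.0 and positive full-scale falls just short of +1.0. Both function names are hypothetical:

```python
def fixed_point_to_float(value, bits):
    """Fixed-point reading: scale by 2**(bits-1), so -2**(bits-1) -> -1.0
    exactly and 2**(bits-1) - 1 -> 1 - 2**-(bits-1)."""
    return value / 2 ** (bits - 1)

def full_scale_to_float(value, bits):
    """Reading described in this gist: positive full-scale -> +1.0 exactly."""
    return value / (2 ** (bits - 1) - 1)

for code in (32767, 1, -32767, -32768):
    print(code, fixed_point_to_float(code, 16), full_scale_to_float(code, 16))
```

The two readings differ by a constant factor of 2**(b-1) / (2**(b-1) - 1), which is about 0.068 dB for 8-bit data and about 0.00027 dB for 16-bit data.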
