

@endolith
Last active June 27, 2024 17:02

Revisions

  1. endolith revised this gist May 18, 2020. 1 changed file with 4 additions and 4 deletions.
    8 changes: 4 additions & 4 deletions WAV interpretation.md
    @@ -51,7 +51,7 @@ So I'll show 2-bit audio first (wBitsPerSample = 2), because it's simpler to fol
    | WAV | Sample | int | float | Comment |
    |------|--------|-----|-------|------------|
    | 0xC0 | 0b11 | 3 | +1.0 | full-scale |
    | 0x80 | 0b10 | 2 | 0.0 | midpoint |
    | 0x80 | 0b10 | 2 | 0.0 | midpoint |
    | 0x40 | 0b01 | 1 | −1.0 | full-scale |
    | 0x00 | 0b00 | 0 | −2.0 | |

    @@ -65,7 +65,7 @@ For 8-bit audio, as mentioned above, 255 is full-scale, 128 is midpoint, 1 is ne
    | ... | ... | ... | ... | |
    | 0x82 | 0b1000_0010 | 130 | +0.016 | |
    | 0x81 | 0b1000_0001 | 129 | +0.008 | |
    | 0x80 | 0b1000_0000 | 128 | 0.000 | midpoint |
    | 0x80 | 0b1000_0000 | 128 | 0.000 | midpoint |
    | 0x7F | 0b0111_1111 | 127 | −0.008 | |
    | 0x7E | 0b0111_1110 | 126 | −0.016 | |
    | ... | ... | ... | ... | |
    @@ -86,7 +86,7 @@ For 16-bit audio, the interpretation is signed:
    | ... | ... | ... | ... | |
    | 0x0002 | 0b0000_0000_0000_0010 | +2 | +0.00006 | |
    | 0x0001 | 0b0000_0000_0000_0001 | +1 | +0.00003 | |
    | 0x0000 | 0b0000_0000_0000_0000 | 0 | 0.00000 | midpoint |
    | 0x0000 | 0b0000_0000_0000_0000 | 0 | 0.00000 | midpoint |
    | 0xFFFF | 0b1111_1111_1111_1111 | −1 | −0.00003 | |
    | 0xFFFE | 0b1111_1111_1111_1110 | −2 | −0.00006 | |
    | ... | ... | ... | ... | |
    @@ -105,7 +105,7 @@ As is 9-bit audio:
    | ... | ... | ... | ... | |
    | 0x0100 | 0b0000_0001_0 | +2 | +0.008 | |
    | 0x0080 | 0b0000_0000_1 | +1 | +0.004 | |
    | 0x0000 | 0b0000_0000_0 | 0 | 0.000 | midpoint |
    | 0x0000 | 0b0000_0000_0 | 0 | 0.000 | midpoint |
    | 0xFF80 | 0b1111_1111_1 | −1 | −0.004 | |
    | 0xFF00 | 0b1111_1111_0 | −2 | −0.008 | |
    | ... | ... | ... | ... | |
  2. endolith revised this gist May 4, 2020. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions WAV interpretation.md
    @@ -24,9 +24,9 @@ As does [IEC 61606-3](https://www.sis.se/api/document/preview/571704/):

    > amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused

    So, for example, for 16-bit audio, +32,767 and −32,767 would be the full-scale values, while −32,768 *exceeds* full-scale.
    So, for example, for 16-bit audio, a signal that just reaches +32,767 and −32,767 would be full-scale, while one that reaches −32,768 *exceeds* full-scale.

    The midpoint example for 8-bit clarifies that the symmetry of unsigned data is the same as for signed data. So, for 8-bit data, the value 255 is positive full-scale, the value 1 is negative full-scale, and the value 0 exceeds full-scale.
    The midpoint example for 8-bit clarifies that the symmetry of unsigned data is the same as for signed data. So, for 8-bit data, a signal that reaches from 1 to 255 would be full-scale, and the value 0 exceeds full-scale.

    [WAVE Audio File Format Specifications](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html) says:

  3. endolith revised this gist May 4, 2020. No changes.
  4. endolith revised this gist May 2, 2020. 1 changed file with 16 additions and 16 deletions.
    32 changes: 16 additions & 16 deletions WAV interpretation.md
    @@ -50,7 +50,7 @@ So I'll show 2-bit audio first (wBitsPerSample = 2), because it's simpler to fol

    | WAV | Sample | int | float | Comment |
    |------|--------|-----|-------|------------|
    | 0xc0 | 0b11 | 3 | +1.0 | full-scale |
    | 0xC0 | 0b11 | 3 | +1.0 | full-scale |
    | 0x80 | 0b10 | 2 | 0.0 | midpoint |
    | 0x40 | 0b01 | 1 | −1.0 | full-scale |
    | 0x00 | 0b00 | 0 | −2.0 | |
    @@ -59,15 +59,15 @@ For 8-bit audio, as mentioned above, 255 is full-scale, 128 is midpoint, 1 is ne

    | WAV | Sample | int | float | Comment |
    |------|-------------|-----|--------|------------|
    | 0xff | 0b1111_1111 | 255 | +1.000 | full-scale |
    | 0xfe | 0b1111_1110 | 254 | +0.992 | |
    | 0xfd | 0b1111_1101 | 253 | +0.984 | |
    | 0xFF | 0b1111_1111 | 255 | +1.000 | full-scale |
    | 0xFE | 0b1111_1110 | 254 | +0.992 | |
    | 0xFD | 0b1111_1101 | 253 | +0.984 | |
    | ... | ... | ... | ... | |
    | 0x82 | 0b1000_0010 | 130 | +0.016 | |
    | 0x81 | 0b1000_0001 | 129 | +0.008 | |
    | 0x80 | 0b1000_0000 | 128 | 0.000 | midpoint |
    | 0x7f | 0b0111_1111 | 127 | −0.008 | |
    | 0x7e | 0b0111_1110 | 126 | −0.016 | |
    | 0x7F | 0b0111_1111 | 127 | −0.008 | |
    | 0x7E | 0b0111_1110 | 126 | −0.016 | |
    | ... | ... | ... | ... | |
    | 0x03 | 0b0000_0011 | 3 | −0.984 | |
    | 0x02 | 0b0000_0010 | 2 | −0.992 | |
    @@ -80,15 +80,15 @@ For 16-bit audio, the interpretation is signed:

    | WAV | Sample | int | float | Comment |
    |--------|-----------------------|---------|----------|------------|
    | 0x7fff | 0b0111_1111_1111_1111 | +32,767 | +1.00000 | full-scale |
    | 0x7ffe | 0b0111_1111_1111_1110 | +32,766 | +0.99997 | |
    | 0x7ffd | 0b0111_1111_1111_1101 | +32,765 | +0.99994 | |
    | 0x7FFF | 0b0111_1111_1111_1111 | +32,767 | +1.00000 | full-scale |
    | 0x7FFE | 0b0111_1111_1111_1110 | +32,766 | +0.99997 | |
    | 0x7FFD | 0b0111_1111_1111_1101 | +32,765 | +0.99994 | |
    | ... | ... | ... | ... | |
    | 0x0002 | 0b0000_0000_0000_0010 | +2 | +0.00006 | |
    | 0x0001 | 0b0000_0000_0000_0001 | +1 | +0.00003 | |
    | 0x0000 | 0b0000_0000_0000_0000 | 0 | 0.00000 | midpoint |
    | 0xffff | 0b1111_1111_1111_1111 | −1 | −0.00003 | |
    | 0xfffe | 0b1111_1111_1111_1110 | −2 | −0.00006 | |
    | 0xFFFF | 0b1111_1111_1111_1111 | −1 | −0.00003 | |
    | 0xFFFE | 0b1111_1111_1111_1110 | −2 | −0.00006 | |
    | ... | ... | ... | ... | |
    | 0x8003 | 0b1000_0000_0000_0011 | −32,765 | −0.99994 | |
    | 0x8002 | 0b1000_0000_0000_0010 | −32,766 | −0.99997 | |
    @@ -99,15 +99,15 @@ As is 9-bit audio:

    | WAV | Sample | int | float | Comment |
    |--------|---------------|------|--------|------------|
    | 0x7f80 | 0b0111_1111_1 | +255 | +1.000 | full-scale |
    | 0x7f00 | 0b0111_1111_0 | +254 | +0.996 | |
    | 0x7e80 | 0b0111_1110_1 | +253 | +0.992 | |
    | 0x7F80 | 0b0111_1111_1 | +255 | +1.000 | full-scale |
    | 0x7F00 | 0b0111_1111_0 | +254 | +0.996 | |
    | 0x7E80 | 0b0111_1110_1 | +253 | +0.992 | |
    | ... | ... | ... | ... | |
    | 0x0100 | 0b0000_0001_0 | +2 | +0.008 | |
    | 0x0080 | 0b0000_0000_1 | +1 | +0.004 | |
    | 0x0000 | 0b0000_0000_0 | 0 | 0.000 | midpoint |
    | 0xff80 | 0b1111_1111_1 | −1 | −0.004 | |
    | 0xff00 | 0b1111_1111_0 | −2 | −0.008 | |
    | 0xFF80 | 0b1111_1111_1 | −1 | −0.004 | |
    | 0xFF00 | 0b1111_1111_0 | −2 | −0.008 | |
    | ... | ... | ... | ... | |
    | 0x8180 | 0b1000_0001_1 | −253 | −0.992 | |
    | 0x8100 | 0b1000_0001_0 | −254 | −0.996 | |
  5. endolith revised this gist May 2, 2020. 1 changed file with 21 additions and 2 deletions.
    23 changes: 21 additions & 2 deletions WAV interpretation.md
    @@ -28,15 +28,15 @@ So, for example, for 16-bit audio, +32,767 and −32,767 would be the full-scale

    The midpoint example for 8-bit clarifies that the symmetry of unsigned data is the same as for signed data. So, for 8-bit data, the value 255 is positive full-scale, the value 1 is negative full-scale, and the value 0 exceeds full-scale.

    [Audio File Format Specifications](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html) says:
    [WAVE Audio File Format Specifications](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html) says:

    > For float data, full scale is 1.

    So, to correctly convert signed ints to float, divide by `2**(b-1) - 1`, where *b* is the number of bits.

    To correctly convert unsigned ints to float, subtract `2**(b-1)`, then, similarly, divide by `2**(b-1) - 1`.

    The float representation will then be limited to +1.0 in the positive direction, but can exceed −1.0 in the negative direction.
    The float representation will then be limited to +1.0 full-scale in the positive direction, but can exceed −1.0 full-scale in the negative direction.

    ## Examples

    @@ -94,3 +94,22 @@ For 16-bit audio, the interpretation is signed:
    | 0x8002 | 0b1000_0000_0000_0010 | −32,766 | −0.99997 | |
    | 0x8001 | 0b1000_0000_0000_0001 | −32,767 | −1.00000 | full-scale |
    | 0x8000 | 0b1000_0000_0000_0000 | −32,768 | −1.00003 | |

    As is 9-bit audio:

    | WAV | Sample | int | float | Comment |
    |--------|---------------|------|--------|------------|
    | 0x7f80 | 0b0111_1111_1 | +255 | +1.000 | full-scale |
    | 0x7f00 | 0b0111_1111_0 | +254 | +0.996 | |
    | 0x7e80 | 0b0111_1110_1 | +253 | +0.992 | |
    | ... | ... | ... | ... | |
    | 0x0100 | 0b0000_0001_0 | +2 | +0.008 | |
    | 0x0080 | 0b0000_0000_1 | +1 | +0.004 | |
    | 0x0000 | 0b0000_0000_0 | 0 | 0.000 | midpoint |
    | 0xff80 | 0b1111_1111_1 | −1 | −0.004 | |
    | 0xff00 | 0b1111_1111_0 | −2 | −0.008 | |
    | ... | ... | ... | ... | |
    | 0x8180 | 0b1000_0001_1 | −253 | −0.992 | |
    | 0x8100 | 0b1000_0001_0 | −254 | −0.996 | |
    | 0x8080 | 0b1000_0000_1 | −255 | −1.000 | full-scale |
    | 0x8000 | 0b1000_0000_0 | −256 | −1.004 | |
  6. endolith created this gist May 2, 2020.
    96 changes: 96 additions & 0 deletions WAV interpretation.md
    @@ -0,0 +1,96 @@
    # How to handle asymmetry of WAV data?

    WAV files can store PCM audio (WAVE_FORMAT_PCM). [The WAV file format specification](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Docs/riffmci.pdf) says:

    > The data format and maximum and minimums values for PCM waveform samples of various sizes are as follows:
    > | Sample Size | Data Format | Maximum Value | Minimum Value |
    > | ----------------- | ------------------ | ----------------------------- | -------------------------- |
    > | One to eight bits | Unsigned integer | 255 (0xFF) | 0 |
    > | Nine or more bits | Signed integer *i* | Largest positive value of *i* | Most negative value of *i* |
    >
    > For example, the maximum, minimum, and midpoint values for 8-bit and 16-bit PCM waveform data are as follows:
    > | Format | Maximum Value | Minimum Value | Midpoint Value |
    > |------------|----------------|------------------|----------------|
    > | 8-bit PCM | 255 (0xFF) | 0 | 128 (0x80) |
    > | 16-bit PCM | 32767 (0x7FFF) | -32768 (-0x8000) | 0 |

    Both the signed and unsigned formats are asymmetrical. How to handle the asymmetry? The signed version is [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement) representation, and [AES17](https://www.scribd.com/document/256170486/AES-17-1998-r2009-pdf) defines the meaning of full-scale amplitude in this case:

    > amplitude of a 997-Hz sine wave whose positive peak value reaches the positive digital full scale, leaving the negative maximum code unused.
    >
    > NOTE In 2's-complement representation, the negative peak is 1 LSB away from the negative maximum code.

    As does [IEC 61606-3](https://www.sis.se/api/document/preview/571704/):

    > amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused

    So, for example, for 16-bit audio, +32,767 and −32,767 would be the full-scale values, while −32,768 *exceeds* full-scale.

    The midpoint example for 8-bit clarifies that the symmetry of unsigned data is the same as for signed data. So, for 8-bit data, the value 255 is positive full-scale, the value 1 is negative full-scale, and the value 0 exceeds full-scale.

    [Audio File Format Specifications](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html) says:

    > For float data, full scale is 1.

    So, to correctly convert signed ints to float, divide by `2**(b-1) - 1`, where *b* is the number of bits.

    To correctly convert unsigned ints to float, subtract `2**(b-1)`, then, similarly, divide by `2**(b-1) - 1`.

    The float representation will then be limited to +1.0 in the positive direction, but can exceed −1.0 in the negative direction.

    ## Examples

    ### Unsigned

    WAV format actually allows for less than 8 bits:

    > The bits that represent the sample amplitude are stored in the most significant bits of *i*, and the remaining bits are set to zero.

    So I'll show 2-bit audio first (wBitsPerSample = 2), because it's simpler to follow:

    | WAV | Sample | int | float | Comment |
    |------|--------|-----|-------|------------|
    | 0xc0 | 0b11 | 3 | +1.0 | full-scale |
    | 0x80 | 0b10 | 2 | 0.0 | midpoint |
    | 0x40 | 0b01 | 1 | −1.0 | full-scale |
    | 0x00 | 0b00 | 0 | −2.0 | |

    For 8-bit audio, as mentioned above, 255 is full-scale, 128 is midpoint, 1 is negative full-scale, and 0 exceeds full-scale:

    | WAV | Sample | int | float | Comment |
    |------|-------------|-----|--------|------------|
    | 0xff | 0b1111_1111 | 255 | +1.000 | full-scale |
    | 0xfe | 0b1111_1110 | 254 | +0.992 | |
    | 0xfd | 0b1111_1101 | 253 | +0.984 | |
    | ... | ... | ... | ... | |
    | 0x82 | 0b1000_0010 | 130 | +0.016 | |
    | 0x81 | 0b1000_0001 | 129 | +0.008 | |
    | 0x80 | 0b1000_0000 | 128 | 0.000 | midpoint |
    | 0x7f | 0b0111_1111 | 127 | −0.008 | |
    | 0x7e | 0b0111_1110 | 126 | −0.016 | |
    | ... | ... | ... | ... | |
    | 0x03 | 0b0000_0011 | 3 | −0.984 | |
    | 0x02 | 0b0000_0010 | 2 | −0.992 | |
    | 0x01 | 0b0000_0001 | 1 | −1.000 | full-scale |
    | 0x00 | 0b0000_0000 | 0 | −1.008 | |

    ### Signed

    For 16-bit audio, the interpretation is signed:

    | WAV | Sample | int | float | Comment |
    |--------|-----------------------|---------|----------|------------|
    | 0x7fff | 0b0111_1111_1111_1111 | +32,767 | +1.00000 | full-scale |
    | 0x7ffe | 0b0111_1111_1111_1110 | +32,766 | +0.99997 | |
    | 0x7ffd | 0b0111_1111_1111_1101 | +32,765 | +0.99994 | |
    | ... | ... | ... | ... | |
    | 0x0002 | 0b0000_0000_0000_0010 | +2 | +0.00006 | |
    | 0x0001 | 0b0000_0000_0000_0001 | +1 | +0.00003 | |
    | 0x0000 | 0b0000_0000_0000_0000 | 0 | 0.00000 | midpoint |
    | 0xffff | 0b1111_1111_1111_1111 | −1 | −0.00003 | |
    | 0xfffe | 0b1111_1111_1111_1110 | −2 | −0.00006 | |
    | ... | ... | ... | ... | |
    | 0x8003 | 0b1000_0000_0000_0011 | −32,765 | −0.99994 | |
    | 0x8002 | 0b1000_0000_0000_0010 | −32,766 | −0.99997 | |
    | 0x8001 | 0b1000_0000_0000_0001 | −32,767 | −1.00000 | full-scale |
    | 0x8000 | 0b1000_0000_0000_0000 | −32,768 | −1.00003 | |
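The gist's conversion rule can be sketched in Python as follows. For unsigned data, subtract `2**(b-1)`. Then divide by `2**(b-1) - 1`. This is a minimal illustration, not code from the gist. The function name `pcm_int_to_float` is hypothetical. The sketch assumes the integer has already been extracted at its true bit depth *b*. It also assumes the quoted spec's split: 1 to 8 bits are unsigned, and 9 or more bits are signed.

```python
def pcm_int_to_float(sample: int, bits: int) -> float:
    """Convert one integer PCM sample to float, treating 2**(bits-1) - 1 as full-scale.

    Hypothetical helper following the rule described in the gist:
    1 to 8 bits are unsigned (midpoint 2**(bits-1)), 9 or more bits are signed.
    """
    if bits < 2:
        # With 1 bit, 2**(bits-1) - 1 == 0, so the rule has no usable full-scale.
        raise ValueError("bits must be at least 2")
    full_scale = 2 ** (bits - 1) - 1
    if bits <= 8:
        # Unsigned data: shift so that the midpoint code maps to 0.
        sample -= 2 ** (bits - 1)
    # Positive full-scale maps to exactly +1.0; the most negative code lands below -1.0.
    return sample / full_scale


if __name__ == "__main__":
    cases = [(2, [3, 2, 1, 0]), (8, [255, 128, 1, 0]), (16, [32767, 0, -32767, -32768])]
    for bits, codes in cases:
        for code in codes:
            print(f"{bits:2d}-bit {code:>7}: {pcm_int_to_float(code, bits):+.5f}")
```

The printed values should agree with the float column of the 2-bit, 8-bit and 16-bit tables, to the precision shown there. Only the most negative code in each case falls below −1.0.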
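The 2-bit and 9-bit examples use sample sizes that are not a multiple of 8. The quoted spec stores those amplitude bits in the most significant bits of the container and sets the remaining bits to zero. The sample integer in the tables' int column can therefore be recovered with a right shift. The sketch below makes two assumptions. The helper name `unpack_left_justified` is hypothetical. The container value is the raw unsigned word from the WAV column (for example `0xFF80`), already assembled from its little-endian bytes.

```python
def unpack_left_justified(word: int, container_bits: int, bits: int) -> int:
    """Recover a `bits`-wide sample stored in the top bits of a container word.

    `word` is the raw container value as an unsigned integer, e.g. 0xFF80 for
    a 16-bit container. Following the spec table quoted in the gist, samples of
    1 to 8 bits are unsigned and samples of 9 or more bits are two's complement.
    """
    if not 1 <= bits <= container_bits:
        raise ValueError("bits must be between 1 and container_bits")
    if not 0 <= word < 2 ** container_bits:
        raise ValueError("word does not fit in the container")
    if bits > 8 and word >= 2 ** (container_bits - 1):
        # Signed data: reinterpret the raw word as a negative two's-complement value.
        word -= 2 ** container_bits
    # Drop the zero-filled low bits. Python's >> on a negative int is an
    # arithmetic shift, so the sign of signed samples is preserved.
    return word >> (container_bits - bits)


if __name__ == "__main__":
    # Rows from the 2-bit and 9-bit tables: (WAV word, container bits, sample bits).
    rows = [(0xC0, 8, 2), (0x40, 8, 2), (0x7F80, 16, 9), (0xFF80, 16, 9), (0x8000, 16, 9)]
    for word, container_bits, bits in rows:
        print(f"0x{word:04X} -> {unpack_left_justified(word, container_bits, bits)}")
```

Each printed integer should match the int column for that row. Dividing the unpacked 9-bit value by 255 (that is, `2**(9-1) - 1`) reproduces the 9-bit float column. Dividing the raw 16-bit container by 32,767 would give slightly different numbers (`0x7F80` would come out as about 0.996 instead of 1.000).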
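The same scaling can be applied to samples read from a file. The sketch below uses Python's standard-library `wave` module and handles only 1- and 2-byte containers. The function name is hypothetical. It assumes plain PCM (WAVE_FORMAT_PCM) data whose wBitsPerSample fills its container, so 8 or 16 bits. For something like 9-bit data, `getsampwidth()` reports only the 2-byte container width, so this function would divide by 32,767 rather than 255.

```python
import struct
import wave


def read_pcm_wav_as_float(source) -> list[float]:
    """Read an 8- or 16-bit PCM WAV file and scale it so +1.0 is positive full-scale.

    `source` may be a path or a binary file object. Channels stay interleaved.
    """
    with wave.open(source, "rb") as w:
        width = w.getsampwidth()  # bytes per sample; only 1 and 2 are handled here
        raw = w.readframes(w.getnframes())
    bits = 8 * width
    full_scale = 2 ** (bits - 1) - 1
    if width == 1:
        # 8-bit WAV data is unsigned with its midpoint at 128.
        return [(byte - 2 ** (bits - 1)) / full_scale for byte in raw]
    if width == 2:
        # readframes returns native byte order, so unpack with "=" (native order, standard size).
        count = len(raw) // 2
        return [s / full_scale for s in struct.unpack(f"={count}h", raw)]
    raise NotImplementedError(f"{bits}-bit containers are not handled in this sketch")
```

With this scaling, a 16-bit sample of −32,768 comes back slightly below −1.0, and an 8-bit sample of 0 does too. This matches the gist's point that only the negative direction can exceed full-scale.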