Last active
June 27, 2024 17:02
Revisions
endolith revised this gist
May 18, 2020 . 1 changed file with 4 additions and 4 deletions.

@@ -51,7 +51,7 @@ So I'll show 2-bit audio first (wBitsPerSample = 2), because it's simpler to fol
| WAV  | Sample | int | float | Comment    |
|------|--------|-----|-------|------------|
| 0xC0 | 0b11   | 3   | +1.0  | full-scale |
| 0x80 | 0b10   | 2   | 0.0   | midpoint   |
| 0x40 | 0b01   | 1   | −1.0  | full-scale |
| 0x00 | 0b00   | 0   | −2.0  |            |

@@ -65,7 +65,7 @@ For 8-bit audio, as mentioned above, 255 is full-scale, 128 is midpoint, 1 is ne
| ...  | ...         | ... | ...    |          |
| 0x82 | 0b1000_0010 | 130 | +0.016 |          |
| 0x81 | 0b1000_0001 | 129 | +0.008 |          |
| 0x80 | 0b1000_0000 | 128 | 0.000  | midpoint |
| 0x7F | 0b0111_1111 | 127 | −0.008 |          |
| 0x7E | 0b0111_1110 | 126 | −0.016 |          |
| ...  | ...         | ... | ...    |          |

@@ -86,7 +86,7 @@ For 16-bit audio, the interpretation is signed:
| ...    | ...                   | ... | ...      |          |
| 0x0002 | 0b0000_0000_0000_0010 | +2  | +0.00006 |          |
| 0x0001 | 0b0000_0000_0000_0001 | +1  | +0.00003 |          |
| 0x0000 | 0b0000_0000_0000_0000 | 0   | 0.00000  | midpoint |
| 0xFFFF | 0b1111_1111_1111_1111 | −1  | −0.00003 |          |
| 0xFFFE | 0b1111_1111_1111_1110 | −2  | −0.00006 |          |
| ...    | ...                   | ... | ...      |          |

@@ -105,7 +105,7 @@ As is 9-bit audio:
| ...    | ...           | ... | ...    |          |
| 0x0100 | 0b0000_0001_0 | +2  | +0.008 |          |
| 0x0080 | 0b0000_0000_1 | +1  | +0.004 |          |
| 0x0000 | 0b0000_0000_0 | 0   | 0.000  | midpoint |
| 0xFF80 | 0b1111_1111_1 | −1  | −0.004 |          |
| 0xFF00 | 0b1111_1111_0 | −2  | −0.008 |          |
| ...    | ...           | ... | ...    |          |
endolith revised this gist
May 4, 2020 . 1 changed file with 2 additions and 2 deletions.

@@ -24,9 +24,9 @@ As does [IEC 61606-3](https://www.sis.se/api/document/preview/571704/):

> amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused

So, for example, for 16-bit audio, a signal that just reaches +32,767 and −32,767 would be full-scale, while one that reaches −32,768 *exceeds* full-scale.

The midpoint example for 8-bit clarifies that the symmetry of unsigned data is the same as for signed data. So, for 8-bit data, a signal that reaches from 1 to 255 would be full-scale, and the value 0 exceeds full-scale.

[WAVE Audio File Format Specifications](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html) says:
endolith revised this gist
May 4, 2020 . No changes.
endolith revised this gist
May 2, 2020 . 1 changed file with 16 additions and 16 deletions.

@@ -50,7 +50,7 @@ So I'll show 2-bit audio first (wBitsPerSample = 2), because it's simpler to fol
| WAV  | Sample | int | float | Comment    |
|------|--------|-----|-------|------------|
| 0xC0 | 0b11   | 3   | +1.0  | full-scale |
| 0x80 | 0b10   | 2   | 0.0   | midpoint   |
| 0x40 | 0b01   | 1   | −1.0  | full-scale |
| 0x00 | 0b00   | 0   | −2.0  |            |

@@ -59,15 +59,15 @@ For 8-bit audio, as mentioned above, 255 is full-scale, 128 is midpoint, 1 is ne
| WAV  | Sample      | int | float  | Comment    |
|------|-------------|-----|--------|------------|
| 0xFF | 0b1111_1111 | 255 | +1.000 | full-scale |
| 0xFE | 0b1111_1110 | 254 | +0.992 |            |
| 0xFD | 0b1111_1101 | 253 | +0.984 |            |
| ...  | ...         | ... | ...    |            |
| 0x82 | 0b1000_0010 | 130 | +0.016 |            |
| 0x81 | 0b1000_0001 | 129 | +0.008 |            |
| 0x80 | 0b1000_0000 | 128 | 0.000  | midpoint   |
| 0x7F | 0b0111_1111 | 127 | −0.008 |            |
| 0x7E | 0b0111_1110 | 126 | −0.016 |            |
| ...  | ...         | ... | ...    |            |
| 0x03 | 0b0000_0011 | 3   | −0.984 |            |
| 0x02 | 0b0000_0010 | 2   | −0.992 |            |

@@ -80,15 +80,15 @@ For 16-bit audio, the interpretation is signed:
| WAV    | Sample                | int     | float    | Comment    |
|--------|-----------------------|---------|----------|------------|
| 0x7FFF | 0b0111_1111_1111_1111 | +32,767 | +1.00000 | full-scale |
| 0x7FFE | 0b0111_1111_1111_1110 | +32,766 | +0.99997 |            |
| 0x7FFD | 0b0111_1111_1111_1101 | +32,765 | +0.99994 |            |
| ...    | ...                   | ...     | ...      |            |
| 0x0002 | 0b0000_0000_0000_0010 | +2      | +0.00006 |            |
| 0x0001 | 0b0000_0000_0000_0001 | +1      | +0.00003 |            |
| 0x0000 | 0b0000_0000_0000_0000 | 0       | 0.00000  | midpoint   |
| 0xFFFF | 0b1111_1111_1111_1111 | −1      | −0.00003 |            |
| 0xFFFE | 0b1111_1111_1111_1110 | −2      | −0.00006 |            |
| ...    | ...                   | ...     | ...      |            |
| 0x8003 | 0b1000_0000_0000_0011 | −32,765 | −0.99994 |            |
| 0x8002 | 0b1000_0000_0000_0010 | −32,766 | −0.99997 |            |

@@ -99,15 +99,15 @@ As is 9-bit audio:
| WAV    | Sample        | int  | float  | Comment    |
|--------|---------------|------|--------|------------|
| 0x7F80 | 0b0111_1111_1 | +255 | +1.000 | full-scale |
| 0x7F00 | 0b0111_1111_0 | +254 | +0.996 |            |
| 0x7E80 | 0b0111_1110_1 | +253 | +0.992 |            |
| ...    | ...           | ...  | ...    |            |
| 0x0100 | 0b0000_0001_0 | +2   | +0.008 |            |
| 0x0080 | 0b0000_0000_1 | +1   | +0.004 |            |
| 0x0000 | 0b0000_0000_0 | 0    | 0.000  | midpoint   |
| 0xFF80 | 0b1111_1111_1 | −1   | −0.004 |            |
| 0xFF00 | 0b1111_1111_0 | −2   | −0.008 |            |
| ...    | ...           | ...  | ...    |            |
| 0x8180 | 0b1000_0001_1 | −253 | −0.992 |            |
| 0x8100 | 0b1000_0001_0 | −254 | −0.996 |            |
endolith revised this gist
May 2, 2020 . 1 changed file with 21 additions and 2 deletions.

@@ -28,15 +28,15 @@ So, for example, for 16-bit audio, +32,767 and −32,767 would be the full-scale

The midpoint example for 8-bit clarifies that the symmetry of unsigned data is the same as for signed data. So, for 8-bit data, the value 255 is positive full-scale, the value 1 is negative full-scale, and the value 0 exceeds full-scale.

[WAVE Audio File Format Specifications](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html) says:

> For float data, full scale is 1.

So, to correctly convert signed ints to float, divide by `2**(b-1) - 1`, where *b* is the number of bits.

To correctly convert unsigned ints to float, subtract `2**(b-1)`, then, similarly, divide by `2**(b-1) - 1`.

The float representation will then be limited to +1.0 full-scale in the positive direction, but can exceed −1.0 full-scale in the negative direction.

## Examples

@@ -94,3 +94,22 @@ For 16-bit audio, the interpretation is signed:
| 0x8002 | 0b1000_0000_0000_0010 | −32,766 | −0.99997 |            |
| 0x8001 | 0b1000_0000_0000_0001 | −32,767 | −1.00000 | full-scale |
| 0x8000 | 0b1000_0000_0000_0000 | −32,768 | −1.00003 |            |

As is 9-bit audio:

| WAV    | Sample        | int  | float  | Comment    |
|--------|---------------|------|--------|------------|
| 0x7f80 | 0b0111_1111_1 | +255 | +1.000 | full-scale |
| 0x7f00 | 0b0111_1111_0 | +254 | +0.996 |            |
| 0x7e80 | 0b0111_1110_1 | +253 | +0.992 |            |
| ...    | ...           | ...  | ...    |            |
| 0x0100 | 0b0000_0001_0 | +2   | +0.008 |            |
| 0x0080 | 0b0000_0000_1 | +1   | +0.004 |            |
| 0x0000 | 0b0000_0000_0 | 0    | 0.000  | midpoint   |
| 0xff80 | 0b1111_1111_1 | −1   | −0.004 |            |
| 0xff00 | 0b1111_1111_0 | −2   | −0.008 |            |
| ...    | ...           | ...  | ...    |            |
| 0x8180 | 0b1000_0001_1 | −253 | −0.992 |            |
| 0x8100 | 0b1000_0001_0 | −254 | −0.996 |            |
| 0x8080 | 0b1000_0000_1 | −255 | −1.000 | full-scale |
| 0x8000 | 0b1000_0000_0 | −256 | −1.004 |            |
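
The "WAV" and "int" columns of the 9-bit table differ only by the zero padding in the low bits of the container word. A minimal sketch of that mapping follows. It assumes the container is the smallest whole number of bytes that holds the sample, and that the word has already been assembled from its little-endian bytes into an unsigned integer. `sample_from_container` is a hypothetical helper name, not something defined by the WAV specification.

```python
def sample_from_container(word: int, bits: int) -> int:
    """Recover the sample integer from an MSB-aligned container word.

    `word` is the raw container value read as an unsigned integer.
    `bits` is the sample size (wBitsPerSample).
    Assumption: the container is the smallest whole number of bytes
    that can hold `bits` bits.
    """
    container_bits = 8 * ((bits + 7) // 8)
    if bits > 8 and word >= 1 << (container_bits - 1):
        # Nine or more bits: reinterpret the word as two's complement.
        word -= 1 << container_bits
    # Drop the zero padding below the sample bits.
    # Python's >> on negative ints is an arithmetic shift, so the sign survives.
    return word >> (container_bits - bits)


for raw in (0x7F80, 0x0080, 0xFF80, 0x8080, 0x8000):
    print(f"0x{raw:04X} -> {sample_from_container(raw, 9):+d}")
```

The same function covers the unsigned case: with `bits=2`, the container is one byte and `0xC0` shifts down to 3.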
endolith created this gist
May 2, 2020 .

@@ -0,0 +1,96 @@

# How to handle asymmetry of WAV data?

WAV files can store PCM audio (WAVE_FORMAT_PCM). [The WAV file format specification](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Docs/riffmci.pdf) says:

> The data format and maximum and minimums values for PCM waveform samples of various sizes are as follows:
>
> | Sample Size       | Data Format        | Maximum Value                 | Minimum Value              |
> | ----------------- | ------------------ | ----------------------------- | -------------------------- |
> | One to eight bits | Unsigned integer   | 255 (0xFF)                    | 0                          |
> | Nine or more bits | Signed integer *i* | Largest positive value of *i* | Most negative value of *i* |
>
> For example, the maximum, minimum, and midpoint values for 8-bit and 16-bit PCM waveform data are as follows:
>
> | Format     | Maximum Value  | Minimum Value    | Midpoint Value |
> |------------|----------------|------------------|----------------|
> | 8-bit PCM  | 255 (0xFF)     | 0                | 128 (0x80)     |
> | 16-bit PCM | 32767 (0x7FFF) | -32768 (-0x8000) | 0              |

Both the signed and unsigned formats are asymmetrical. How to handle the asymmetry?

The signed version is [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement) representation, and [AES17](https://www.scribd.com/document/256170486/AES-17-1998-r2009-pdf) defines the meaning of full-scale amplitude in this case:

> amplitude of a 997-Hz sine wave whose positive peak value reaches the positive digital full scale, leaving the negative maximum code unused.
>
> NOTE In 2's-complement representation, the negative peak is 1 LSB away from the negative maximum code.

As does [IEC 61606-3](https://www.sis.se/api/document/preview/571704/):

> amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused

So, for example, for 16-bit audio, +32,767 and −32,767 would be the full-scale values, while −32,768 *exceeds* full-scale.

The midpoint example for 8-bit clarifies that the symmetry of unsigned data is the same as for signed data. So, for 8-bit data, the value 255 is positive full-scale, the value 1 is negative full-scale, and the value 0 exceeds full-scale.

[Audio File Format Specifications](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html) says:

> For float data, full scale is 1.

So, to correctly convert signed ints to float, divide by `2**(b-1) - 1`, where *b* is the number of bits.

To correctly convert unsigned ints to float, subtract `2**(b-1)`, then, similarly, divide by `2**(b-1) - 1`.

The float representation will then be limited to +1.0 in the positive direction, but can exceed −1.0 in the negative direction.

## Examples

### Unsigned

WAV format actually allows for less than 8 bits:

> The bits that represent the sample amplitude are stored in the most significant bits of *i*, and the remaining bits are set to zero.

So I'll show 2-bit audio first (wBitsPerSample = 2), because it's simpler to follow:

| WAV  | Sample | int | float | Comment    |
|------|--------|-----|-------|------------|
| 0xc0 | 0b11   | 3   | +1.0  | full-scale |
| 0x80 | 0b10   | 2   | 0.0   | midpoint   |
| 0x40 | 0b01   | 1   | −1.0  | full-scale |
| 0x00 | 0b00   | 0   | −2.0  |            |

For 8-bit audio, as mentioned above, 255 is full-scale, 128 is midpoint, 1 is negative full-scale, and 0 exceeds full-scale:

| WAV  | Sample      | int | float  | Comment    |
|------|-------------|-----|--------|------------|
| 0xff | 0b1111_1111 | 255 | +1.000 | full-scale |
| 0xfe | 0b1111_1110 | 254 | +0.992 |            |
| 0xfd | 0b1111_1101 | 253 | +0.984 |            |
| ...  | ...         | ... | ...    |            |
| 0x82 | 0b1000_0010 | 130 | +0.016 |            |
| 0x81 | 0b1000_0001 | 129 | +0.008 |            |
| 0x80 | 0b1000_0000 | 128 | 0.000  | midpoint   |
| 0x7f | 0b0111_1111 | 127 | −0.008 |            |
| 0x7e | 0b0111_1110 | 126 | −0.016 |            |
| ...  | ...         | ... | ...    |            |
| 0x03 | 0b0000_0011 | 3   | −0.984 |            |
| 0x02 | 0b0000_0010 | 2   | −0.992 |            |
| 0x01 | 0b0000_0001 | 1   | −1.000 | full-scale |
| 0x00 | 0b0000_0000 | 0   | −1.008 |            |

### Signed

For 16-bit audio, the interpretation is signed:

| WAV    | Sample                | int     | float    | Comment    |
|--------|-----------------------|---------|----------|------------|
| 0x7fff | 0b0111_1111_1111_1111 | +32,767 | +1.00000 | full-scale |
| 0x7ffe | 0b0111_1111_1111_1110 | +32,766 | +0.99997 |            |
| 0x7ffd | 0b0111_1111_1111_1101 | +32,765 | +0.99994 |            |
| ...    | ...                   | ...     | ...      |            |
| 0x0002 | 0b0000_0000_0000_0010 | +2      | +0.00006 |            |
| 0x0001 | 0b0000_0000_0000_0001 | +1      | +0.00003 |            |
| 0x0000 | 0b0000_0000_0000_0000 | 0       | 0.00000  | midpoint   |
| 0xffff | 0b1111_1111_1111_1111 | −1      | −0.00003 |            |
| 0xfffe | 0b1111_1111_1111_1110 | −2      | −0.00006 |            |
| ...    | ...                   | ...     | ...      |            |
| 0x8003 | 0b1000_0000_0000_0011 | −32,765 | −0.99994 |            |
| 0x8002 | 0b1000_0000_0000_0010 | −32,766 | −0.99997 |            |
| 0x8001 | 0b1000_0000_0000_0001 | −32,767 | −1.00000 | full-scale |
| 0x8000 | 0b1000_0000_0000_0000 | −32,768 | −1.00003 |            |
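
The "float" columns in the tables above follow from the conversion rule stated earlier: divide by `2**(b-1) - 1`, after first subtracting `2**(b-1)` for unsigned data. A minimal sketch of that rule, assuming the sample has already been extracted as a plain integer (unsigned for one to eight bits, signed for nine or more). The name `pcm_to_float` is hypothetical, and rejecting `bits < 2` is my own choice, since the divisor would be zero for 1-bit data.

```python
def pcm_to_float(value: int, bits: int) -> float:
    """Convert one PCM sample to float so that full scale maps to +/-1.0.

    `value` is unsigned for 1 to 8 bits and signed for 9 or more bits,
    following the WAV PCM convention quoted above.
    """
    if bits < 2:
        # Assumption of this sketch: 2**(bits-1) - 1 would be 0 for 1 bit.
        raise ValueError("bits must be at least 2")
    full_scale = 2 ** (bits - 1) - 1
    if bits <= 8:
        value -= 2 ** (bits - 1)  # shift the unsigned midpoint to zero
    return value / full_scale


for v in (255, 128, 1, 0):
    print(f"8-bit  {v:>7} -> {pcm_to_float(v, 8):+.3f}")
for v in (32767, 0, -32767, -32768):
    print(f"16-bit {v:>7} -> {pcm_to_float(v, 16):+.5f}")
```

Note that the result is not clamped: the most negative code of each format converts to a value slightly below −1.0, matching the last row of each table.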