Add description of audio sample types #256

Merged
merged 4 commits into from
May 27, 2021
184 changes: 165 additions & 19 deletions index.src.html
@@ -99,6 +99,20 @@
main > dl > dd {
margin-bottom: 1em;
}

table {
width: 100%;
}

table#sample-types td, table#sample-types th {
text-align: center;
}

table#sample-types .even {
background-color: lightgrey;
}


</style>


@@ -2074,7 +2088,7 @@
:: The sample-rate, in Hz, for this {{AudioData}}.

: <dfn attribute for=AudioData>[[number of frames]]</dfn>
:: The number of frames (samples per channel) for this {{AudioData}}.
:: The number of [=frames=] for this {{AudioData}}.

: <dfn attribute for=AudioData>[[number of channels]]</dfn>
:: The number of audio channels for this {{AudioData}}.
@@ -2119,7 +2133,7 @@
{{AudioData/[[sample rate]]}}.

: <dfn attribute for=AudioData>numberOfFrames</dfn>
:: The number of frames (samples per channel) for this {{AudioData}}.
:: The number of [=frames=] for this {{AudioData}}.

The {{AudioData/numberOfFrames}} getter steps are to return
{{AudioData/[[number of frames]]}}.
@@ -2176,8 +2190,8 @@
6. Let |planeFrames| be the region of |resource| corresponding to
|options|.{{AudioDataCopyToOptions/planeIndex}}.
7. Copy elements of |planeFrames| into |destination|, starting with the
frame positioned at |options|.{{AudioDataCopyToOptions/frameOffset}}
and stopping after |copyElementCount| elements have been copied.
[=frame=] positioned at |options|.{{AudioDataCopyToOptions/frameOffset}}
and stopping after |copyElementCount| samples have been copied.

: <dfn method for=AudioData>clone()</dfn>
:: Creates a new AudioData with a reference to the same [=media resource=].
@@ -2246,51 +2260,183 @@
:: The index identifying the plane to copy from.

: <dfn dict-member for=AudioDataCopyToOptions>frameOffset</dfn>
:: An offset into the source plane data indicating which frame to begin
:: An offset into the source plane data indicating which [=frame=] to begin
copying from. Defaults to `0`.

: <dfn dict-member for=AudioDataCopyToOptions>frameCount</dfn>
:: The number of frames to copy. If not provided, the copy will include all
frames in the plane beginning with {{AudioDataCopyToOptions/frameOffset}}.
:: The number of [=frames=] to copy. If not provided, the copy will include all
[=frames=] in the plane beginning with {{AudioDataCopyToOptions/frameOffset}}.

Audio Sample Format{#audio-sample-format}
-----------------------------------------
Audio sample formats describe the numeric type used to represent a single
sample (e.g. 32-bit floating point) and the arrangement of samples from
different channels as either interleaved or planar.
## Audio Sample Format ##{#audio-sample-formats}

An audio sample format describes the numeric type used to represent a
single sample (e.g. 32-bit floating point) and the arrangement of samples from
different channels as either [=interleaved=] or [=planar=]. The <dfn>audio
sample type</dfn> refers solely to the numeric type and interval used to store
the data: {{U8}}, {{S16}}, {{S24}}, {{S32}}, or {{FLT}} for, respectively,
unsigned 8-bit, signed 16-bit, signed 24-bit (stored in 32 bits), signed 32-bit,
and 32-bit floating point numbers. The [[#audio-buffer-arrangement|audio buffer
arrangement]] refers solely to the way the samples are laid out in memory
([=planar=] or [=interleaved=]).

A <dfn>sample</dfn> refers to a single value that is the magnitude of a
signal at a particular point in time in a particular channel.

A <dfn>frame</dfn> (or sample-frame) refers to the set of values, one for each
channel of a multi-channel signal, that occur at the exact same time.

Note: Consequently if an audio signal is mono (has only one channel), a frame
and a sample refer to the same thing.

All audio [=samples=] in this specification use linear pulse-code
modulation (Linear PCM): the quantization levels are uniformly spaced.

Note: The Web Audio API, which is expected to be used with this specification,
also uses Linear PCM.

<xmp class='idl'>
enum AudioSampleFormat {
"U8",
"S16",
"S24",
"S32",
"FLT",
"U8P",
"S16P",
"S24P",
"S32P",
"FLTP",
};
</xmp>

: <dfn enum-value for=AudioSampleFormat>U8</dfn>
:: 8-bit unsigned integer samples with interleaved channel arrangement.
:: [[WEBIDL#idl-octet|8-bit unsigned integer]] [=samples=] with [=interleaved=] [[#audio-buffer-arrangement|channel arrangement]].

: <dfn enum-value for=AudioSampleFormat>S16</dfn>
:: 16-bit signed integer samples with interleaved channel arrangement.
:: [[WEBIDL#idl-short|16-bit signed integer]] [=samples=] with [=interleaved=] [[#audio-buffer-arrangement|channel arrangement]].

: <dfn enum-value for=AudioSampleFormat>S24</dfn>
:: [[WEBIDL#idl-long|32-bit signed integer]] [=samples=] with [=interleaved=] [[#audio-buffer-arrangement|channel arrangement]], holding the value in the 24 least significant bits.

: <dfn enum-value for=AudioSampleFormat>S32</dfn>
:: 32-bit signed integer samples with interleaved channel arrangement.
:: [[WEBIDL#idl-long|32-bit signed integer]] [=samples=] with [=interleaved=] [[#audio-buffer-arrangement|channel arrangement]].

: <dfn enum-value for=AudioSampleFormat>FLT</dfn>
:: 32-bit float samples with interleaved channel arrangement.
:: [[WEBIDL#idl-float|32-bit float]] [=samples=] with [=interleaved=] [[#audio-buffer-arrangement|channel arrangement]].

: <dfn enum-value for=AudioSampleFormat>U8P</dfn>
:: [[WEBIDL#idl-octet|8-bit unsigned integer]] [=samples=] with [=planar=] [[#audio-buffer-arrangement|channel arrangement]].

: <dfn enum-value for=AudioSampleFormat>S16P</dfn>
:: 16-bit signed integer samples with planar channel arrangement.
:: [[WEBIDL#idl-short|16-bit signed integer]] [=samples=] with [=planar=] [[#audio-buffer-arrangement|channel arrangement]].

: <dfn enum-value for=AudioSampleFormat>S24P</dfn>
:: [[WEBIDL#idl-long|32-bit signed integer]] [=samples=] with [=planar=] [[#audio-buffer-arrangement|channel arrangement]], holding the value in the 24 least significant bits.

: <dfn enum-value for=AudioSampleFormat>S32P</dfn>
:: 32-bit signed integer samples with planar channel arrangement.
:: [[WEBIDL#idl-long|32-bit signed integer]] [=samples=] with [=planar=] [[#audio-buffer-arrangement|channel arrangement]].

: <dfn enum-value for=AudioSampleFormat>FLTP</dfn>
:: 32-bit float samples with planar channel arrangement.
:: [[WEBIDL#idl-float|32-bit float]] [=samples=] with [=planar=] [[#audio-buffer-arrangement|channel arrangement]].
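
As a non-normative illustration, each enum value above can be split into an
audio sample type and a channel arrangement. The sketch below is only one way
to do this: the names `SampleFormat`, `SampleType`, `FormatDescription`, and
`describeFormat` are hypothetical and not defined by this specification. The
storage sizes follow the IDL types listed for each value, with 24-bit samples
stored in 32-bit integers.

```typescript
// Hypothetical helper types; they mirror the enum values listed above.
type SampleType = "U8" | "S16" | "S24" | "S32" | "FLT";
type SampleFormat =
  | "U8" | "S16" | "S24" | "S32" | "FLT"
  | "U8P" | "S16P" | "S24P" | "S32P" | "FLTP";

interface FormatDescription {
  sampleType: SampleType;
  planar: boolean;
  bytesPerSample: number;
}

// Storage size per sample, taken from the IDL type of each sample type.
// S24 occupies 4 bytes because it is held in a 32-bit signed integer.
const BYTES_PER_SAMPLE: Record<SampleType, number> = {
  U8: 1,
  S16: 2,
  S24: 4,
  S32: 4,
  FLT: 4,
};

// Splits a format name into its sample type and arrangement.
// A trailing "P" marks the planar variant of a format.
function describeFormat(format: SampleFormat): FormatDescription {
  const planar = format.endsWith("P");
  const sampleType = (planar ? format.slice(0, -1) : format) as SampleType;
  return { sampleType, planar, bytesPerSample: BYTES_PER_SAMPLE[sampleType] };
}
```

For example, `describeFormat("S24P")` reports a planar arrangement of `S24`
samples that each occupy 4 bytes.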


### Arrangement of audio buffer ### {#audio-buffer-arrangement}

When an {{AudioData}} has an {{AudioSampleFormat}} that is
<dfn>interleaved</dfn>, the audio samples from different channels are laid out
consecutively in the same buffer, in the order described in the section
[[#audio-channel-ordering]]. The {{AudioData}} has a single plane, which
therefore contains a number of elements equal to
{{AudioData/numberOfFrames}} * {{AudioData/numberOfChannels}}.

When an {{AudioData}} has an {{AudioSampleFormat}} that is
<dfn>planar</dfn>, the audio samples from different channels are laid out
in different buffers, themselves arranged in an order described in the section
[[#audio-channel-ordering]]. The {{AudioData}} has a number of planes equal to the
{{AudioData}}'s {{AudioData/numberOfChannels}}. Each plane contains
{{AudioData/numberOfFrames}} elements.

Note: The [[WEBAUDIO|Web Audio API]] currently uses {{FLTP}} exclusively.
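
The two arrangements described above can be sketched as follows. This is a
non-normative example under stated assumptions: `interleavedIndex` and
`deinterleave` are hypothetical helper names, and the input is assumed to be
an interleaved buffer of 32-bit float samples whose length is a whole number
of frames.

```typescript
// Hypothetical helper: position of one sample in an interleaved buffer.
// Samples of the same frame are adjacent, one per channel.
function interleavedIndex(
  frame: number,
  channel: number,
  numberOfChannels: number
): number {
  return frame * numberOfChannels + channel;
}

// Hypothetical helper: converts one interleaved plane into one plane per
// channel. Each resulting plane holds numberOfFrames elements.
function deinterleave(
  interleaved: Float32Array,
  numberOfChannels: number
): Float32Array[] {
  if (numberOfChannels < 1 || interleaved.length % numberOfChannels !== 0) {
    throw new RangeError("buffer length is not a whole number of frames");
  }
  const numberOfFrames = interleaved.length / numberOfChannels;
  const planes: Float32Array[] = [];
  for (let channel = 0; channel < numberOfChannels; channel++) {
    const plane = new Float32Array(numberOfFrames);
    for (let frame = 0; frame < numberOfFrames; frame++) {
      plane[frame] = interleaved[interleavedIndex(frame, channel, numberOfChannels)];
    }
    planes.push(plane);
  }
  return planes;
}
```

With two channels and three frames, the interleaved buffer holds six elements
in a single plane, while the planar result holds two planes of three elements
each.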

### Magnitude of the audio samples ### {#audio-samples-magnitude}

The <dfn>minimum value</dfn> and <dfn>maximum value</dfn> of an audio sample,
for a particular audio sample type, are the values below which
(respectively above which) audio clipping might occur. The sample types are
otherwise regular numeric types, which can hold values outside this interval
during intermediate processing.

The <dfn>bias value</dfn> for an audio sample type is the value that often
corresponds to the middle of the range (although the range is often not
symmetrical). An audio buffer consisting only of values equal to the
[=bias value=] is silent.

<table id="sample-types">
<thead>
<tr class="header">
<th>[=Audio sample type|Sample type=]</th>
<th>IDL type</th>
<th>[=Minimum value=]</th>
<th>[=Bias value=]</th>
<th>[=Maximum value=]</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>{{U8}}</td>
<td>[[WEBIDL#idl-octet|octet]]</td>
<td>0</td>
<td>128</td>
<td>+255</td>
</tr>
<tr class="even">
<td>{{S16}}</td>
<td>[[WEBIDL#idl-short|short]]</td>
<td>-32768</td>
<td>0</td>
<td>+32767</td>
</tr>
<tr class="odd">
<td>{{S24}}</td>
<td>[[WEBIDL#idl-long|long]]</td>
<td>-8388608</td>
<td>0</td>
<td>+8388607</td>
</tr>
<tr class="even">
<td>{{S32}}</td>
<td>[[WEBIDL#idl-long|long]]</td>
<td>-2147483648</td>
<td>0</td>
<td>+2147483647</td>
</tr>
<tr class="odd">
<td>{{FLT}}</td>
<td>[[WEBIDL#idl-float|float]]</td>
<td>-1.0</td>
<td>0.0</td>
<td>+1.0</td>
</tr>
</tbody>
</table>

Note: There is no data type that can hold 24 bits of information conveniently,
but audio content using 24-bit samples is common, so 32-bit integers are
commonly used to hold 24-bit content.
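
The table above can be used to map integer samples onto the float range. The
sketch below shows one common convention, not a conversion mandated by this
specification: it subtracts the bias value and divides by the distance between
the bias value and the minimum value, so that the minimum value maps to -1.0
and the bias value maps to 0.0. The names `PcmSampleType`, `SAMPLE_RANGES`,
and `toFloat` are hypothetical.

```typescript
// Hypothetical names; the numbers are copied from the table above.
type PcmSampleType = "U8" | "S16" | "S24" | "S32" | "FLT";

interface SampleRange {
  min: number;
  bias: number;
  max: number;
}

const SAMPLE_RANGES: Record<PcmSampleType, SampleRange> = {
  U8: { min: 0, bias: 128, max: 255 },
  S16: { min: -32768, bias: 0, max: 32767 },
  S24: { min: -8388608, bias: 0, max: 8388607 },
  S32: { min: -2147483648, bias: 0, max: 2147483647 },
  FLT: { min: -1.0, bias: 0.0, max: 1.0 },
};

// Maps a sample of the given type onto the float range.
// Assumption: scaling by (bias - min), so min maps exactly to -1.0 and the
// maximum integer value maps to slightly less than +1.0.
function toFloat(value: number, type: PcmSampleType): number {
  const { min, bias } = SAMPLE_RANGES[type];
  return (value - bias) / (bias - min);
}
```

Under this convention a {{U8}} sample of 128 and an {{S16}} sample of 0 both
convert to 0.0, which matches the statement that a buffer of bias values is
silent.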

### Audio channel ordering ### {#audio-channel-ordering}

When decoding, the ordering of the audio channels in the resulting {{AudioData}}
MUST be the same as what is present in the {{EncodedAudioChunk}}.

When encoding, the ordering of the audio channels in the resulting
{{EncodedAudioChunk}} MUST be the same as what is present in the given
{{AudioData}}.

In other words, no channel reordering is performed when encoding and decoding.

Note: The container either implies or specifies the channel mapping: the
channel attributed to a particular channel index.


VideoFrame Interface {#videoframe-interface}
--------------------------------------------