% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Introduction and Description} \label{vorbis:spec:intro}

\subsection{Overview}

This document provides a high-level description of the Vorbis codec's
construction. A bit-by-bit specification appears beginning in
\xref{vorbis:spec:codec}.
The later sections assume a high-level
understanding of the Vorbis decode process, which is
provided here.

\subsubsection{Application}
Vorbis is a general-purpose perceptual audio CODEC intended to allow
maximum encoder flexibility, thus allowing it to scale competitively
over an exceptionally wide range of bitrates. At the high
quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits)
it is in the same league as MPEG-2 and MPC. At the low end, the 1.0
encoder can encode high-quality CD and DAT rate stereo at below 48kbps
without resampling to a lower rate. Vorbis is also intended for
lower and higher sample rates (from 8kHz telephony to 192kHz digital
masters) and a range of channel representations (monaural,
polyphonic, stereo, quadraphonic, 5.1, ambisonic, or up to 255
discrete channels).


\subsubsection{Classification}
Vorbis I is a forward-adaptive monolithic transform CODEC based on the
Modified Discrete Cosine Transform. The codec is structured to allow
addition of a hybrid wavelet filterbank in Vorbis II to offer better
transient response and reproduction using a transform better suited to
localized time events.


\subsubsection{Assumptions}

The Vorbis CODEC design assumes a complex, psychoacoustically-aware
encoder and simple, low-complexity decoder. Vorbis decode is
computationally simpler than mp3, although it does require more
working memory as Vorbis has no static probability model; the vector
codebooks used in the first stage of decoding from the bitstream are
packed in their entirety into the Vorbis bitstream headers. In
packed form, these codebooks occupy only a few kilobytes; the extent
to which they are pre-decoded into a cache is the dominant factor in
decoder memory usage.


Vorbis provides none of its own framing, synchronization or protection
against errors; it is solely a method of accepting input audio,
dividing it into individual frames and compressing these frames into
raw, unformatted `packets'. The decoder then accepts these raw
packets in sequence, decodes them, synthesizes audio frames from
them, and reassembles the frames into a facsimile of the original
audio stream. Vorbis is a free-form variable bit rate (VBR) codec and packets have no
minimum size, maximum size, or fixed/expected size. Packets
are designed so that they may be truncated (or padded) and remain
decodable; this is not to be considered an error condition and is used
extensively in bitrate management, for example in peeling. Both the transport
mechanism and the decoder must allow for a packet of any size, including one
that ends before or after the point at which packet decode expects it to end.
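
To make the truncation rule concrete, the C sketch below shows one way a decoder might read fields from a packet so that running off the end is reported to the caller rather than treated as a fatal error. It assumes the least-significant-bit-first bit order used by Vorbis bitpacking; the names \texttt{bitreader}, \texttt{br\_init} and \texttt{br\_read} are hypothetical and are not taken from libvorbis or from this specification. How a decoder then treats a short packet is governed by the detailed decode sections, not by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal bit reader: bits are consumed least significant
   bit first within each byte, as in Vorbis bitpacking. */
typedef struct {
    const uint8_t *data;
    size_t len;     /* packet length in bytes */
    size_t bitpos;  /* index of the next bit to read */
} bitreader;

void br_init(bitreader *br, const uint8_t *data, size_t len)
{
    br->data = data;
    br->len = len;
    br->bitpos = 0;
}

/* Reads n bits (0..32) into *out and returns 0. If the packet ends
   first, nothing is stored, the reader is parked at end-of-packet and
   -1 is returned so the caller can handle the short packet. */
int br_read(bitreader *br, int n, uint32_t *out)
{
    assert(n >= 0 && n <= 32);
    if (br->bitpos + (size_t)n > br->len * 8) {
        br->bitpos = br->len * 8;
        return -1;
    }
    uint32_t v = 0;
    for (int i = 0; i < n; i++) {
        size_t p = br->bitpos + (size_t)i;
        uint32_t bit = (uint32_t)(br->data[p / 8] >> (p % 8)) & 1u;
        v |= bit << i;
    }
    br->bitpos += (size_t)n;
    *out = v;
    return 0;
}
```

For instance, reading 4, 4 and then 1 bit from the bytes 0xA5 0x01 yields 5, 10 and 1, and a further 8-bit read reports end-of-packet instead of reading beyond the buffer.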

Vorbis packets are thus intended to be used with a transport mechanism
that provides free-form framing, sync, positioning and error correction
in accordance with these design assumptions, such as Ogg (for file
transport) or RTP (for network multicast). For purposes of a few
examples in this document, we will assume that Vorbis is to be
embedded in an Ogg stream specifically, although this is by no means a
requirement or fundamental assumption in the Vorbis design.

The specification for embedding Vorbis into
an Ogg transport stream is in \xref{vorbis:over:ogg}.



\subsubsection{Codec Setup and Probability Model}

Vorbis' heritage is as a research CODEC and its current design
reflects a desire to allow multiple decades of continuous encoder
improvement before running out of room within the codec specification.
For these reasons, configurable aspects of codec setup intentionally
lean toward the extreme of forward adaptivity.

The single most controversial design decision in Vorbis (and the most
unusual for a Vorbis developer to keep in mind) is that the entire
probability model of the codec, the Huffman and VQ codebooks, is
packed into the bitstream header along with extensive CODEC setup
parameters (often several hundred fields). As a result, and unlike
MPEG audio layers, Vorbis cannot rely on a simple frame type flag
embedded in each audio packet, and cannot begin decode at an arbitrary
frame in the stream without having previously fetched the codec setup
header.


\begin{note}
Vorbis \emph{can} initiate decode at any arbitrary packet within a
bitstream so long as the codec has been initialized and set up with the
setup headers.
\end{note}

Thus, Vorbis headers are both required for decode to begin and
relatively large as bitstream headers go. The header size is
unbounded, although for streaming a rule-of-thumb of 4kB or less is
recommended (and Xiph.Org's Vorbis encoder follows this suggestion).

Our own design work indicates the primary liability of the
required header is in mindshare; it is an unusual design and thus
causes some amount of complaint among engineers as this runs against
current design trends (and also points out limitations in some
existing software/interface designs, such as Windows' ACM codec
framework). However, we find that it does not fundamentally limit
Vorbis' suitable application space.


\subsubsection{Format Specification}
The Vorbis format is well-defined by its decode specification; any
encoder that produces packets that are correctly decoded by the
reference Vorbis decoder described below may be considered a proper
Vorbis encoder. A decoder must faithfully and completely implement
the specification defined below (except where noted) to be considered
a proper Vorbis decoder.

\subsubsection{Hardware Profile}
Although Vorbis decode is computationally simple, it may still run
into specific limitations of an embedded design. For this reason,
embedded designs are allowed to deviate in limited ways from the
`full' decode specification yet still be certified compliant. These
optional omissions are labelled in the spec where relevant.


\subsection{Decoder Configuration}

Decoder setup consists of configuring multiple, self-contained
component abstractions that perform specific functions in the decode
pipeline. Each component instance of a specific type is
semantically interchangeable; decoder configuration consists both of
internal component configuration and of the arrangement of specific
instances into a decode pipeline. The components are arranged roughly
as follows:

\begin{center}
\includegraphics[width=\textwidth]{components}
\captionof{figure}{decoder pipeline configuration}
\end{center}

\subsubsection{Global Config}
Global codec configuration consists of a few audio-related fields
(sample rate, channels), Vorbis version (always `0' in Vorbis I),
bitrate hints, and the lists of component instances. All other
configuration is in the context of specific components.

\subsubsection{Mode}

Each Vorbis frame is coded according to a master `mode'. A bitstream
may use one or many modes.

The mode mechanism is used to encode a frame according to one of
multiple possible methods with the intention of choosing a method best
suited to that frame. Modes are, for example, the means by which frame
size is changed from frame to frame. The mode number of a frame serves as a
top-level configuration switch for all other specific aspects of frame
decode.

A `mode' configuration consists of a frame size setting, window type
(always 0, the Vorbis window, in Vorbis I), transform type (always
type 0, the MDCT, in Vorbis I) and a mapping number. The mapping
number specifies which mapping configuration instance to use for
low-level packet decode and synthesis.


\subsubsection{Mapping}

A mapping contains a channel coupling description and a list of
`submaps' that bundle sets of channel vectors together for grouped
encoding and decoding. These submaps are not references to external
components; the submap list is internal and specific to a mapping.

A `submap' is a configuration/grouping that applies to a subset of
floor and residue vectors within a mapping. The submap functions as a
last layer of indirection such that special floor or residue
settings can be applied not only to all the vectors in a given mode,
but also to specific vectors in a specific mode. Each submap specifies
the proper floor and residue instance number to use for decoding that
submap's spectral floor and spectral residue vectors.

As an example:

Assume a Vorbis stream that contains six channels in the standard 5.1
format. The sixth channel, as is normal in 5.1, is bass only.
Therefore it would be wasteful to encode a full-spectrum version of it
as with the other channels. The submapping mechanism can be used to
apply a full-range floor and residue encoding to channels 0 through 4,
and a bass-only representation to the bass channel, thus saving space.
In this example, channels 0--4 belong to submap 0 (which indicates use
of a full-range floor) and channel 5 belongs to submap 1, which uses a
bass-only representation.
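
A minimal sketch of the grouping in this example, assuming each channel carries a small integer naming the submap it belongs to (called \texttt{mux} here; both that name and the function \texttt{submap\_channels} are illustrative). Collecting the channels of one submap in channel order yields channels 0--4 for submap 0 and channel 5 for submap 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes to out[], in channel order, every channel
   whose submap number mux[ch] equals submap; returns how many. */
size_t submap_channels(const int *mux, size_t channels, int submap, int *out)
{
    assert(channels == 0 || (mux != NULL && out != NULL));
    size_t count = 0;
    for (size_t ch = 0; ch < channels; ch++)
        if (mux[ch] == submap)
            out[count++] = (int)ch;
    return count;
}
```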


\subsubsection{Floor}

Vorbis encodes a spectral `floor' vector for each PCM channel. This
vector is a low-resolution representation of the audio spectrum for
the given channel in the current frame, generally used akin to a
whitening filter. It is named a `floor' because the Xiph.Org
reference encoder has historically used it as a unit-baseline for
spectral resolution.

A floor encoding may be one of two types. Floor 0 uses a packed LSP
representation on a dB amplitude scale and Bark frequency scale.
Floor 1 represents the curve as a piecewise linear interpolated
representation on a dB amplitude scale and linear frequency scale.
The two floors are semantically interchangeable in
encoding/decoding. However, floor type 1 provides more stable
inter-frame behavior, and so is the preferred choice in all
coupled-stereo and high bitrate modes. Floor 1 is also considerably
less expensive to decode than floor 0.
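
As an illustration of piecewise linear interpolation done entirely in integer arithmetic, the sketch below computes the floor 1 amplitude at position \texttt{X} on the line between two neighboring points $(x_0, y_0)$ and $(x_1, y_1)$. It is modeled on the integer point-rendering step of the floor 1 decode description; treat the name \texttt{render\_point} and its rounding behavior (C integer division, truncating toward zero) as assumptions of this sketch rather than a normative statement.

```c
#include <assert.h>
#include <stdlib.h>

/* Integer interpolation of the floor amplitude at X on the line from
   (x0, y0) to (x1, y1). C integer division truncates toward zero. */
int render_point(int x0, int y0, int x1, int y1, int X)
{
    assert(x1 > x0);
    int dy  = y1 - y0;
    int adx = x1 - x0;
    int ady = abs(dy);
    int err = ady * (X - x0);   /* accumulated rise, unscaled */
    int off = err / adx;        /* scaled back to the run length */
    return dy < 0 ? y0 - off : y0 + off;
}
```

Working on the magnitude of the slope and applying the sign afterwards keeps rising and falling segments rounding the same way, toward $y_0$.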

Floor 0 is not to be considered deprecated, but it is of limited
modern use. No known Vorbis encoder past Xiph.Org's own beta 4 makes
use of floor 0.

The values coded/decoded by a floor are both compactly formatted and
make use of entropy coding to save space. For this reason, a floor
configuration generally refers to multiple codebooks in the codebook
component list. Entropy coding is thus provided as an abstraction,
and each floor instance may choose from any and all available
codebooks when coding/decoding.


\subsubsection{Residue}
The spectral residue is the fine structure of the audio spectrum
once the floor curve has been subtracted out. In simplest terms, it
is coded in the bitstream using cascaded (multi-pass) vector
quantization according to one of three specific packing/coding
algorithms numbered 0 through 2. The packing algorithm details are
configured per residue instance. As with the floor components, the
final VQ/entropy encoding is provided by external codebook instances
and each residue instance may choose from any and all available
codebooks.

\subsubsection{Codebooks}

Codebooks are a self-contained abstraction that perform entropy
decoding and, optionally, use the entropy-decoded integer value as an
offset into an index of output value vectors, returning the indicated
vector of values.

The entropy coding in a Vorbis I codebook is provided by a standard
Huffman binary tree representation. This tree is tightly packed using
one of several methods, depending on whether codeword lengths are
ordered or unordered, or the tree is sparse.
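
Because only codeword lengths are packed, the decoder has to rebuild the codewords itself. The sketch below assumes the rule of the codebook decode description, in which entries are taken in order and each receives the numerically lowest codeword of its length that does not conflict with any codeword already assigned; the brute-force search is for illustration only and \texttt{assign\_codewords} is a hypothetical name. With lengths 2, 4, 4, 4, 4, 2, 3, 3 it yields 00, 0100, 0101, 0110, 0111, 10, 110, 111.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical brute-force codeword assignment. Entries are processed in
   order; each used entry (length > 0) gets the numerically lowest codeword
   of its length that neither is a prefix of, nor has as a prefix, any
   codeword assigned earlier. Returns 0, or -1 if an entry cannot be placed
   (over-specified lengths). Lengths are limited to 16 for this sketch. */
int assign_codewords(const int *len, int n, uint32_t *code)
{
    for (int e = 0; e < n; e++) {
        int L = len[e];
        assert(L >= 0 && L <= 16);
        code[e] = 0;
        if (L == 0)
            continue;                     /* unused entry: no codeword */
        int placed = 0;
        for (uint32_t c = 0; c < (1u << L) && !placed; c++) {
            int clash = 0;
            for (int k = 0; k < e && !clash; k++) {
                if (len[k] == 0)
                    continue;
                int m = len[k] < L ? len[k] : L;  /* shorter of the two */
                if ((code[k] >> (len[k] - m)) == (c >> (L - m)))
                    clash = 1;            /* one is a prefix of the other */
            }
            if (!clash) {
                code[e] = c;
                placed = 1;
            }
        }
        if (!placed)
            return -1;
    }
    return 0;
}
```

A real decoder would use a linear-time construction, but the exhaustive search makes the ``lowest unused codeword'' rule easy to check by hand.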

The codebook vector index is similarly packed according to index
characteristic. Most commonly, the vector index is encoded as a
single list of possible scalar values that are then permuted into
a list of n-dimensional rows (lattice VQ).
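
For the lattice case, the value vector of an entry can be computed on demand from the scalar list by reading the entry number as a sequence of digits in base \texttt{lookup\_values}, one digit per dimension. The sketch assumes the lookup type 1 layout of the codebook decode description (a minimum, a delta, a list of integer multiplicands, and an optional running-sum flag \texttt{sequence\_p}); the function name \texttt{lattice\_vector} is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: value vector of one entry of a lattice (lookup
   type 1) codebook. The entry number is read as base-lookup_values
   digits, least significant digit first, each selecting a multiplicand. */
void lattice_vector(long entry, int dims, const int *mult, int lookup_values,
                    float minimum, float delta, int sequence_p, float *out)
{
    assert(entry >= 0 && lookup_values > 0);
    long divisor = 1;
    float last = 0.0f;
    for (int i = 0; i < dims; i++) {
        int off = (int)((entry / divisor) % lookup_values);
        out[i] = (float)mult[off] * delta + minimum + last;
        if (sequence_p)
            last = out[i];               /* running sum across dimensions */
        divisor *= lookup_values;
    }
}
```

With three scalar values $\{-1, 0, 1\}$ (multiplicands 0, 1, 2, minimum $-1$, delta 1) and two dimensions, the nine entries enumerate the full $3 \times 3$ lattice, so only three numbers need to be stored.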



\subsection{High-level Decode Process}

\subsubsection{Decode Setup}

Before decoding can begin, a decoder must initialize using the
bitstream headers matching the stream to be decoded. Vorbis uses
three header packets; all are required, in-order, by this
specification. Once set up, decode may begin at any audio packet
belonging to the Vorbis stream. In Vorbis I, all packets after the
three initial headers are audio packets.

The header packets are, in order, the identification
header, the comment header, and the setup header.

\paragraph{Identification Header}
The identification header identifies the bitstream as Vorbis and
gives the Vorbis version and the simple audio characteristics of the
stream, such as sample rate and number of channels.
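
A minimal parsing sketch, assuming the Vorbis I identification header layout: a packet type byte of 1, the six bytes \texttt{vorbis}, a little-endian 32-bit version, an 8-bit channel count, a 32-bit sample rate and three 32-bit bitrate hints, one byte holding the two blocksize exponents (short blocksize in the low four bits, since those are read first), and a final framing bit. The struct \texttt{vorbis\_id} and function \texttt{parse\_id\_header} are illustrative names, and only a subset of the checks a complete decoder performs is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t version;
    int channels;
    uint32_t sample_rate;
    int32_t bitrate_max, bitrate_nominal, bitrate_min;
    int blocksize0, blocksize1;
} vorbis_id;

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 if the packet is not a usable Vorbis I
   identification header (partial validation only). */
int parse_id_header(const uint8_t *p, size_t len, vorbis_id *id)
{
    assert(p != NULL && id != NULL);
    if (len < 30 || p[0] != 1 || memcmp(p + 1, "vorbis", 6) != 0)
        return -1;
    id->version         = le32(p + 7);
    id->channels        = p[11];
    id->sample_rate     = le32(p + 12);
    id->bitrate_max     = (int32_t)le32(p + 16);
    id->bitrate_nominal = (int32_t)le32(p + 20);
    id->bitrate_min     = (int32_t)le32(p + 24);
    id->blocksize0      = 1 << (p[28] & 0x0F);  /* low nibble read first */
    id->blocksize1      = 1 << (p[28] >> 4);
    int framing         = p[29] & 1;

    if (id->version != 0 || id->channels == 0 || id->sample_rate == 0)
        return -1;
    if (id->blocksize0 < 64 || id->blocksize1 > 8192 ||
        id->blocksize0 > id->blocksize1 || !framing)
        return -1;
    return 0;
}
```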

\paragraph{Comment Header}
The comment header includes user text comments (``tags'') and a vendor
string for the application/library that produced the bitstream. The
encoding and proper use of the comment header are described in \xref{vorbis:spec:comment}.

\paragraph{Setup Header}
The setup header includes extensive CODEC setup information as well as
the complete VQ and Huffman codebooks needed for decode.


\subsubsection{Decode Procedure}

The decoding and synthesis procedure for all audio packets is
fundamentally the same.
\begin{enumerate}
\item decode packet type flag
\item decode mode number
\item decode window shape (long windows only)
\item decode floor
\item decode residue into residue vectors
\item inverse channel coupling of residue vectors
\item generate floor curve from decoded floor data
\item compute dot product of floor and residue, producing audio spectrum vector
\item inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I
\item overlap/add left-hand output of transform with right-hand output of previous frame
\item store right-hand data from transform of current frame for future lapping
\item if not first frame, return results of overlap/add as audio result of current frame
\end{enumerate}

Note that clever rearrangement of the synthesis arithmetic is
possible; as an example, one can take advantage of symmetries in the
MDCT to store the right-hand transform data of a partial MDCT for a
50\% inter-frame buffer space savings, and then complete the transform
later before overlap/add with the next frame. This optimization
produces entirely equivalent output and is naturally perfectly legal.
The decoder must be \emph{entirely mathematically equivalent} to the
specification; it need not be a literal semantic implementation.

\paragraph{Packet type decode}

Vorbis I uses four packet types. The first three packet types mark each
of the three Vorbis headers described above. The fourth packet type
marks an audio packet. All other packet types are reserved; packets
marked with a reserved type should be ignored.

Following the three header packets, all packets in a Vorbis I stream
are audio. The first step of audio packet decode is to read and
verify the packet type; \emph{a non-audio packet when audio is expected
indicates stream corruption or a non-compliant stream. The decoder
must ignore the packet and not attempt decoding it to
audio}.
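
The sketch below classifies a packet from its first byte, assuming the Vorbis I conventions that an audio packet's first bit (the least significant bit of the first byte) is zero and that the three headers use type values 1, 3 and 5 followed by the bytes \texttt{vorbis}. Anything else, including an empty packet (an assumption of this sketch), is reported as reserved so the caller can ignore it; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    PKT_AUDIO, PKT_IDENT, PKT_COMMENT, PKT_SETUP, PKT_RESERVED
} pkt_kind;

/* Hypothetical classifier based on the first byte of a Vorbis I packet. */
pkt_kind classify_packet(const uint8_t *p, size_t len)
{
    assert(len == 0 || p != NULL);
    if (len == 0)
        return PKT_RESERVED;             /* nothing to decode: ignore */
    if ((p[0] & 1u) == 0)
        return PKT_AUDIO;                /* first bit read is 0 */
    if (len < 7 || memcmp(p + 1, "vorbis", 6) != 0)
        return PKT_RESERVED;
    switch (p[0]) {
    case 1:  return PKT_IDENT;
    case 3:  return PKT_COMMENT;
    case 5:  return PKT_SETUP;
    default: return PKT_RESERVED;        /* reserved header type */
    }
}
```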




\paragraph{Mode decode}
Vorbis allows an encoder to set up multiple, numbered packet `modes',
as described earlier, all of which may be used in a given Vorbis
stream. The mode is encoded as an integer used as a direct offset into
the mode instance index.
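
A sketch of extracting the mode number, assuming the rule from the audio packet decode description that the mode field is \texttt{ilog(mode\_count-1)} bits wide and immediately follows the single packet-type bit. Because a Vorbis I stream has at most 64 modes, the field always fits in the remaining seven bits of the first byte. \texttt{ilog} follows the specification's definition (one-based position of the highest set bit, zero for non-positive values); \texttt{audio\_mode\_number} is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* ilog(x): one-based position of the highest set bit; 0 when x <= 0. */
int ilog(int32_t x)
{
    int r = 0;
    while (x > 0) {
        r++;
        x >>= 1;
    }
    return r;
}

/* Hypothetical helper: the mode number of an audio packet, read from the
   bits after the packet-type bit in the first byte (LSb-first order).
   The caller must still reject a result >= mode_count. */
int audio_mode_number(uint8_t first_byte, int mode_count)
{
    assert(mode_count >= 1 && mode_count <= 64);
    int bits = ilog(mode_count - 1);
    return (first_byte >> 1) & ((1 << bits) - 1);
}
```

A stream with a single mode spends no bits at all on the mode number, which is why the field width depends on the mode count rather than being fixed.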


\paragraph{Window shape decode (long windows only)} \label{vorbis:spec:window}

Vorbis frames may be one of two PCM sample sizes specified during
codec setup. In Vorbis I, legal frame sizes are powers of two from 64
to 8192 samples. Aside from coupling, Vorbis handles channels as
independent vectors and these frame sizes are in samples per channel.

Vorbis uses an overlapping transform, namely the MDCT, to blend one
frame into the next, avoiding most inter-frame block boundary
artifacts. The MDCT output of one frame is windowed according to MDCT
requirements, overlapped 50\% with the output of the previous frame and
added. The window shape assures seamless reconstruction.

This is easy to visualize in the case of equal-sized windows:

\begin{center}
\includegraphics[width=\textwidth]{window1}
\captionof{figure}{overlap of two equal-sized windows}
\end{center}

It is slightly more complex in the case of overlapping unequal-sized
windows:

\begin{center}
\includegraphics[width=\textwidth]{window2}
\captionof{figure}{overlap of a long and a short window}
\end{center}

In the unequal-sized window case, the window shape of the long window
must be modified for seamless lapping as above. It is possible to
correctly infer the window shape to be applied to the current window from
knowing the sizes of the current, previous and next window. It is
legal for a decoder to use this method. However, in the case of a long
window (short windows require no modification), Vorbis also codes two
flag bits to specify pre- and post-window shape. Although not
strictly necessary for function, this minor redundancy allows a packet
to be fully decoded to the point of lapping entirely independently of
any other packet, allowing easier abstraction of decode layers as well
as allowing a greater level of easy parallelism in encode and
decode.

A description of valid window functions for use with an inverse MDCT
can be found in \cite{Sporer/Brandenburg/Edler}. Vorbis windows
all use the slope function
\[ y = \sin\left(\frac{\pi}{2}\,\sin^2\left(\frac{(x+0.5)\,\pi}{n}\right)\right), \]
where $n$ is the window length and $x$ is the sample index within the window.



\paragraph{floor decode}
Each floor is encoded/decoded in channel order; however, each floor
belongs to a `submap' that specifies which floor configuration to
use. All floors are decoded before residue decode begins.


\paragraph{residue decode}

Although the number of residue vectors equals the number of channels,
channel coupling may mean that the raw residue vectors extracted
during decode do not map directly to specific channels. When channel
coupling is in use, some vectors will correspond to coupled magnitude
or angle. The coupling relationships are described in the codec setup
and may differ from frame to frame, due to different mode numbers.

Vorbis codes residue vectors in groups by submap; the coding is done
in submap order from submap 0 through $n-1$. This differs from floors,
which are coded using a configuration provided by submap number but
are coded individually in channel order.



\paragraph{inverse channel coupling}

A detailed discussion of stereo in the Vorbis codec can be found in
the document \href{stereo.html}{Stereo Channel Coupling in the
Vorbis CODEC}. Vorbis is not limited to only stereo coupling, but
the stereo document also gives a good overview of the generic coupling
mechanism.

Vorbis coupling applies to pairs of residue vectors at a time;
decoupling is done in-place a pair at a time in the order and using
the vectors specified in the current mapping configuration. The
decoupling operation is the same for all pairs, converting square
polar representation (where one vector is magnitude and the second
angle) back to Cartesian representation.
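
A sketch of the decoupling arithmetic for one magnitude/angle pair, assuming the sign rules given for inverse coupling in the full decode description; \texttt{decouple} is a hypothetical name, and both vectors are updated in place as described above. For example, a magnitude of 3 with an angle of 1 becomes the pair (3, 2), while a magnitude of 3 with an angle of $-1$ becomes (2, 3).

```c
#include <assert.h>
#include <stddef.h>

/* Square polar to Cartesian decoupling of one magnitude/angle pair,
   element by element, in place. */
void decouple(float *mag, float *ang, size_t n)
{
    assert(n == 0 || mag != ang);        /* the pair must be two vectors */
    for (size_t i = 0; i < n; i++) {
        float M = mag[i], A = ang[i], newM, newA;
        if (M > 0) {
            if (A > 0) { newM = M; newA = M - A; }
            else       { newA = M; newM = M + A; }
        } else {
            if (A > 0) { newM = M; newA = M + A; }
            else       { newA = M; newM = M - A; }
        }
        mag[i] = newM;
        ang[i] = newA;
    }
}
```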

After each pair of vectors on the coupling list has been decoupled,
the resulting residue vectors represent the fine spectral detail
of each output channel.



\paragraph{generate floor curve}

The decoder may choose to generate the floor curve at any appropriate
time. It is reasonable to generate the output curve when the floor
data is decoded from the raw packet, or it can be generated after
inverse coupling and applied to the spectral residue directly,
combining generation and the dot product into one step and eliminating
some working space.

Both floor 0 and floor 1 generate a linear-range, linear-domain output
vector to be multiplied (dot product) by the linear-range,
linear-domain spectral residue.



\paragraph{compute floor/residue dot product}

This step is straightforward; for each output channel, the decoder
multiplies the floor curve and residue vectors element by element,
producing the finished audio spectrum of each channel.

% TODO/FIXME: The following two paragraphs have identical twins
% in section 4 (under "dot product")
One point is worth mentioning about this dot product: a common mistake
in a fixed-point implementation might be to assume that a 32-bit
fixed-point representation for floor and residue and direct
multiplication of the vectors is sufficient for acceptable spectral
depth in all cases because it happens to mostly work with the current
Xiph.Org reference encoder.

However, floor vector values can span \~{}140dB (\~{}24 bits unsigned), and
the audio spectrum vector should represent a minimum of 120dB (\~{}21
bits with sign), even when output is to a 16-bit PCM device. For the
residue vector to represent full scale if the floor is nailed to
$-140$dB, it must be able to span 0 to $+140$dB. For the residue vector
to reach full scale if the floor is nailed at 0dB, it must be able to
represent $-140$dB to $+0$dB. Thus, in order to handle full range
dynamics, a residue vector may span $-140$dB to $+140$dB entirely within
spec. A 280dB range is approximately 48 bits with sign; thus the
residue vector must be able to represent a 48-bit range and the dot
product must be able to handle an effective 48-bit times 24-bit
multiplication. This range may be achieved using large (64-bit or
larger) integers, or implementing a movable binary point
representation.
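
As a worked check of these figures (an estimate, not additional normative text), each bit of linear magnitude corresponds to $20\log_{10}2 \approx 6.02$dB, so dividing each dynamic range by that step recovers the bit counts quoted above:
\[
\frac{140}{6.02} \approx 23.3 \Rightarrow 24\mbox{ bits}, \qquad
\frac{120}{6.02} \approx 19.9 \Rightarrow 20 + 1\mbox{ (sign)} = 21\mbox{ bits}, \qquad
\frac{280}{6.02} \approx 46.5 \Rightarrow 47 + 1\mbox{ (sign)} = 48\mbox{ bits}.
\]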



\paragraph{inverse monolithic transform (MDCT)}

The audio spectrum is converted back into time domain PCM audio via an
inverse Modified Discrete Cosine Transform (MDCT). A detailed
description of the MDCT is available in \cite{Sporer/Brandenburg/Edler}.

Note that the PCM produced directly from the MDCT is not yet finished
audio; it must be lapped with surrounding frames using an appropriate
window (such as the Vorbis window) before the MDCT can be considered
orthogonal.



\paragraph{overlap/add data}
Windowed MDCT output is overlapped and added with the right hand data
of the previous window such that the 3/4 point of the previous window
is aligned with the 1/4 point of the current window (as illustrated in
the window overlap diagram). At this point, the audio data between the
center of the previous frame and the center of the current frame is
now finished and ready to be returned.


\paragraph{cache right hand data}
The decoder must cache the right hand portion of the current frame to
be lapped with the left hand portion of the next frame.
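
For the equal-sized case only, the overlap/add and caching steps reduce to a few lines. The sketch assumes \texttt{cur} already holds the windowed inverse-MDCT output of the current block and \texttt{saved} holds the cached right half of the previous block; the unequal-sized case, where only the short slope region actually overlaps, needs the window boundaries and is not shown. The function name \texttt{overlap\_add\_equal} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch for equal-sized blocks of n samples: laps the left
   half of the current windowed block onto the cached right half of the
   previous block, writes n/2 finished samples to out, then caches the
   current right half for the next call. */
void overlap_add_equal(const float *cur, size_t n, float *saved, float *out)
{
    assert(n % 2 == 0);
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++)
        out[i] = saved[i] + cur[i];
    memcpy(saved, cur + half, half * sizeof *saved);
}
```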



\paragraph{return finished audio data}

The overlapped portion produced from overlapping the previous and
current frame data is finished data to be returned by the decoder.
This data spans from the center of the previous window to the center
of the current window. In the case of same-sized windows, the amount
of data to return is one-half block consisting only of the
overlapped portions. When overlapping a short and long window, much of
the returned range is not actually overlap. This does not damage
transform orthogonality. Pay attention however to returning the
correct data range; the amount of data to be returned is:

\begin{Verbatim}[commandchars=\\\{\}]
window\_blocksize(previous\_window)/4+window\_blocksize(current\_window)/4
\end{Verbatim}

from the center of the previous window to the center of the current
window.

Data is not returned from the first frame; it must be used to `prime'
the decode engine. The encoder accounts for this priming when
calculating PCM offsets; after the first frame, the proper PCM output
offset is `0' (as no data has been returned yet).
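
The bookkeeping above can be expressed as a small helper. In this sketch a previous blocksize of 0 stands for ``no previous frame'' (an assumption of the example, not a convention of the specification), and \texttt{samples\_returned} is a hypothetical name. With a short blocksize of 256 and a long blocksize of 2048, a short frame following a long one returns $2048/4 + 256/4 = 576$ samples.

```c
#include <assert.h>

/* Hypothetical helper: number of finished samples returned for the
   current frame. prev_blocksize == 0 stands for "no previous frame". */
long samples_returned(long prev_blocksize, long cur_blocksize)
{
    assert(prev_blocksize >= 0 && cur_blocksize > 0);
    if (prev_blocksize == 0)
        return 0;           /* first frame only primes the lapping state */
    return prev_blocksize / 4 + cur_blocksize / 4;
}
```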