% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Introduction and Description} \label{vorbis:spec:intro}

\subsection{Overview}

This document provides a high-level description of the Vorbis codec's
construction.  A bit-by-bit specification appears beginning in
\xref{vorbis:spec:codec}.
The later sections assume a high-level
understanding of the Vorbis decode process, which is
provided here.

\subsubsection{Application}
Vorbis is a general purpose perceptual audio CODEC intended to allow
maximum encoder flexibility, thus allowing it to scale competitively
over an exceptionally wide range of bitrates.  At the high
quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits)
it is in the same league as MPEG-2 and MPC.  Similarly, the 1.0
encoder can encode high-quality CD and DAT rate stereo at below 48kbps
without resampling to a lower rate.  Vorbis is also intended for
lower and higher sample rates (from 8kHz telephony to 192kHz digital
masters) and a range of channel representations (monaural,
polyphonic, stereo, quadraphonic, 5.1, ambisonic, or up to 255
discrete channels).

\subsubsection{Classification}
Vorbis I is a forward-adaptive monolithic transform CODEC based on the
Modified Discrete Cosine Transform.  The codec is structured to allow
addition of a hybrid wavelet filterbank in Vorbis II to offer better
transient response and reproduction using a transform better suited to
localized time events.

\subsubsection{Assumptions}

The Vorbis CODEC design assumes a complex, psychoacoustically-aware
encoder and simple, low-complexity decoder. Vorbis decode is
computationally simpler than mp3, although it does require more
working memory as Vorbis has no static probability model; the vector
codebooks used in the first stage of decoding from the bitstream are
packed in their entirety into the Vorbis bitstream headers. In
packed form, these codebooks occupy only a few kilobytes; the extent
to which they are pre-decoded into a cache is the dominant factor in
decoder memory usage.

Vorbis provides none of its own framing, synchronization or protection
against errors; it is solely a method of accepting input audio,
dividing it into individual frames and compressing these frames into
raw, unformatted 'packets'. The decoder then accepts these raw
packets in sequence, decodes them, synthesizes audio frames from
them, and reassembles the frames into a facsimile of the original
audio stream. Vorbis is a free-form variable bit rate (VBR) codec and packets have no
minimum size, maximum size, or fixed/expected size.  Packets
are designed so that they may be truncated (or padded) and remain
decodable; this is not to be considered an error condition and is used
extensively in bitrate management in peeling.  Both the transport
mechanism and decoder must allow that a packet may be any size, or
end before or after packet decode expects.

Vorbis packets are thus intended to be used with a transport mechanism
that provides free-form framing, sync, positioning and error correction
in accordance with these design assumptions, such as Ogg (for file
transport) or RTP (for network multicast).  For purposes of a few
examples in this document, we will assume that Vorbis is to be
embedded in an Ogg stream specifically, although this is by no means a
requirement or fundamental assumption in the Vorbis design.

The specification for embedding Vorbis into
an Ogg transport stream is in \xref{vorbis:over:ogg}.
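
The requirement above that a packet may end before decode expects can
be illustrated with a small bit reader that reports end-of-packet as
an ordinary return value rather than as an error.  This is only a
sketch: the class name \texttt{BitReader} and its \texttt{read} method
are hypothetical, and the reader assumes the least-significant-bit-first
packing used by the Vorbis bitpacking convention.

```python
class BitReader:
    """LSb-first bit reader that signals end-of-packet instead of raising."""

    def __init__(self, packet: bytes):
        self.packet = packet
        self.bitpos = 0  # absolute bit position within the packet

    def read(self, nbits: int):
        """Return an nbits-wide unsigned value, or None if the packet ends first."""
        total_bits = 8 * len(self.packet)
        if self.bitpos + nbits > total_bits:
            self.bitpos = total_bits  # remain at end-of-packet from now on
            return None
        value = 0
        for i in range(nbits):
            byte = self.packet[self.bitpos >> 3]
            bit = (byte >> (self.bitpos & 7)) & 1
            value |= bit << i  # first bit read is the least significant
            self.bitpos += 1
        return value


reader = BitReader(bytes([0b00001101]))
first = reader.read(1)    # lowest bit of the byte
second = reader.read(3)   # next three bits
third = reader.read(4)    # remaining four bits
past_end = reader.read(1) # packet is exhausted: None, not an exception
```

A decoder built on such a reader can treat \texttt{None} as the normal
end-of-packet condition and stop decoding the current packet without
flagging the stream as damaged.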

\subsubsection{Codec Setup and Probability Model}

Vorbis' heritage is as a research CODEC and its current design
reflects a desire to allow multiple decades of continuous encoder
improvement before running out of room within the codec specification.
For these reasons, configurable aspects of codec setup intentionally
lean toward the extreme of forward adaptive.

The single most controversial design decision in Vorbis (and the most
unusual for a Vorbis developer to keep in mind) is that the entire
probability model of the codec, the Huffman and VQ codebooks, is
packed into the bitstream header along with extensive CODEC setup
parameters (often several hundred fields).  Unlike with MPEG audio
layers, it is therefore impossible to embed a simple frame type
flag in each audio packet, or to begin decode at any frame in the stream
without having previously fetched the codec setup header.

\begin{note}
Vorbis \emph{can} initiate decode at any arbitrary packet within a
bitstream so long as the codec has been initialized/setup with the
setup headers.
\end{note}

Thus, Vorbis headers are both required for decode to begin and
relatively large as bitstream headers go.  The header size is
unbounded, although for streaming a rule-of-thumb of 4kB or less is
recommended (and Xiph.Org's Vorbis encoder follows this suggestion).

Our own design work indicates the primary liability of the
required header is in mindshare; it is an unusual design and thus
causes some amount of complaint among engineers as this runs against
current design trends (and also points out limitations in some
existing software/interface designs, such as Windows' ACM codec
framework).  However, we find that it does not fundamentally limit
Vorbis' suitable application space.

\subsubsection{Format Specification}
The Vorbis format is well-defined by its decode specification; any
encoder that produces packets that are correctly decoded by the
reference Vorbis decoder described below may be considered a proper
Vorbis encoder.  A decoder must faithfully and completely implement
the specification defined below (except where noted) to be considered
a proper Vorbis decoder.

\subsubsection{Hardware Profile}
Although Vorbis decode is computationally simple, it may still run
into specific limitations of an embedded design.  For this reason,
embedded designs are allowed to deviate in limited ways from the
`full' decode specification yet still be certified compliant.  These
optional omissions are labelled in the spec where relevant.

\subsection{Decoder Configuration}

Decoder setup consists of configuration of multiple, self-contained
component abstractions that perform specific functions in the decode
pipeline.  Each different component instance of a specific type is
semantically interchangeable; decoder configuration consists both of
internal component configuration and of the arrangement of specific
instances into a decode pipeline.  Componentry arrangement is roughly
as follows:

\begin{center}
\includegraphics[width=\textwidth]{components}
\captionof{figure}{decoder pipeline configuration}
\end{center}

\subsubsection{Global Config}
Global codec configuration consists of a few audio related fields
(sample rate, channels), Vorbis version (always '0' in Vorbis I),
bitrate hints, and the lists of component instances.  All other
configuration is in the context of specific components.

\subsubsection{Mode}

Each Vorbis frame is coded according to a master 'mode'.  A bitstream
may use one or many modes.

The mode mechanism is used to encode a frame according to one of
multiple possible methods with the intention of choosing a method best
suited to that frame.  Modes are, for example, how the frame size
is changed from frame to frame. The mode number of a frame serves as a
top level configuration switch for all other specific aspects of frame
decode.

A 'mode' configuration consists of a frame size setting, window type
(always 0, the Vorbis window, in Vorbis I), transform type (always
type 0, the MDCT, in Vorbis I) and a mapping number.  The mapping
number specifies which mapping configuration instance to use for
low-level packet decode and synthesis.

\subsubsection{Mapping}

A mapping contains a channel coupling description and a list of
'submaps' that bundle sets of channel vectors together for grouped
encoding and decoding. These submaps are not references to external
components; the submap list is internal and specific to a mapping.

A 'submap' is a configuration/grouping that applies to a subset of
floor and residue vectors within a mapping.  The submap functions as a
last layer of indirection such that specific special floor or residue
settings can be applied not only to all the vectors in a given mode,
but also specific vectors in a specific mode.  Each submap specifies
the proper floor and residue instance number to use for decoding that
submap's spectral floor and spectral residue vectors.

As an example:

Assume a Vorbis stream that contains six channels in the standard 5.1
format.  The sixth channel, as is normal in 5.1, is bass only.
Therefore it would be wasteful to encode a full-spectrum version of it
as with the other channels.  The submapping mechanism can be used to
apply a full range floor and residue encoding to channels 0 through 4,
and a bass-only representation to the bass channel, thus saving space.
In this example, channels 0-4 belong to submap 0 (which indicates use
of a full-range floor) and channel 5 belongs to submap 1, which uses a
bass-only representation.
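
The 5.1 example above can be written down as a small data structure.
The following is an illustrative sketch only: the class and field
names (\texttt{Submap}, \texttt{Mapping}, \texttt{channel\_submap}) are
hypothetical, the floor and residue instance numbers are invented for
the example, and the channel coupling description that a real mapping
also carries is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Submap:
    floor: int    # index into the floor instance list
    residue: int  # index into the residue instance list


@dataclass
class Mapping:
    submaps: List[Submap]
    channel_submap: List[int]  # channel_submap[ch] is the submap used by channel ch

    def floor_for(self, channel: int) -> int:
        """Floor instance number used to decode the given channel."""
        return self.submaps[self.channel_submap[channel]].floor

    def residue_for(self, channel: int) -> int:
        """Residue instance number used to decode the given channel."""
        return self.submaps[self.channel_submap[channel]].residue


# Assumed instance numbering: floor/residue 0 are full range, 1 are bass only.
mapping_5_1 = Mapping(
    submaps=[Submap(floor=0, residue=0), Submap(floor=1, residue=1)],
    channel_submap=[0, 0, 0, 0, 0, 1],
)

floors_per_channel = [mapping_5_1.floor_for(ch) for ch in range(6)]
```

The indirection is the point of the design: the per-channel table
stays tiny, and changing the bass-only settings means editing one
submap rather than every channel entry.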

\subsubsection{Floor}

Vorbis encodes a spectral 'floor' vector for each PCM channel.  This
vector is a low-resolution representation of the audio spectrum for
the given channel in the current frame, generally used akin to a
whitening filter.  It is named a 'floor' because the Xiph.Org
reference encoder has historically used it as a unit-baseline for
spectral resolution.

A floor encoding may be of two types.  Floor 0 uses a packed LSP
representation on a dB amplitude scale and Bark frequency scale.
Floor 1 represents the curve as a piecewise linear interpolated
representation on a dB amplitude scale and linear frequency scale.
The two floors are semantically interchangeable in
encoding/decoding. However, floor type 1 provides more stable
inter-frame behavior, and so is the preferred choice in all
coupled-stereo and high bitrate modes.  Floor 1 is also considerably
less expensive to decode than floor 0.

Floor 0 is not to be considered deprecated, but it is of limited
modern use.  No known Vorbis encoder past Xiph.Org's own beta 4 makes
use of floor 0.

The values coded/decoded by a floor are both compactly formatted and
make use of entropy coding to save space.  For this reason, a floor
configuration generally refers to multiple codebooks in the codebook
component list.  Entropy coding is thus provided as an abstraction,
and each floor instance may choose from any and all available
codebooks when coding/decoding.

\subsubsection{Residue}
The spectral residue is the fine structure of the audio spectrum
once the floor curve has been subtracted out.  In simplest terms, it
is coded in the bitstream using cascaded (multi-pass) vector
quantization according to one of three specific packing/coding
algorithms numbered 0 through 2.  The packing algorithm details are
configured by residue instance.  As with the floor components, the
final VQ/entropy encoding is provided by external codebook instances
and each residue instance may choose from any and all available
codebooks.

\subsubsection{Codebooks}

Codebooks are a self-contained abstraction that performs entropy
decoding and, optionally, uses the entropy-decoded integer value as an
offset into an index of output value vectors, returning the indicated
vector of values.

The entropy coding in a Vorbis I codebook is provided by a standard
Huffman binary tree representation.  This tree is tightly packed using
one of several methods, depending on whether codeword lengths are
ordered or unordered, or the tree is sparse.

The codebook vector index is similarly packed according to index
characteristic.  Most commonly, the vector index is encoded as a
single list of possible values that are then permuted into
a list of n-dimensional rows (lattice VQ).
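
The lattice arrangement described above can be sketched as follows:
a short list of scalar values is expanded into n-dimensional rows by
treating the entropy-decoded entry number as a base-$k$ number, where
$k$ is the length of the value list.  The function name and parameters
are hypothetical, and the sketch deliberately omits the value scaling,
offset and sequence accumulation that the bit-level codebook
specification applies on top of this permutation.

```python
def lattice_vector(index, values, dimensions):
    """Expand a codebook entry number into a vector of `dimensions` values.

    Each successive vector element is selected by the next base-len(values)
    digit of `index`, starting with the least significant digit.
    """
    vector = []
    divisor = 1
    for _ in range(dimensions):
        vector.append(values[(index // divisor) % len(values)])
        divisor *= len(values)
    return vector


# Three possible scalar values permuted into 3**2 = 9 two-dimensional rows.
values = [-1.0, 0.0, 1.0]
rows = [lattice_vector(i, values, 2) for i in range(9)]
```

Storing three scalars instead of nine explicit rows is what keeps the
packed codebooks small; the saving grows quickly with dimension.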

\subsection{High-level Decode Process}

\subsubsection{Decode Setup}

Before decoding can begin, a decoder must initialize using the
bitstream headers matching the stream to be decoded.  Vorbis uses
three header packets; all are required, in-order, by this
specification. Once set up, decode may begin at any audio packet
belonging to the Vorbis stream. In Vorbis I, all packets after the
three initial headers are audio packets.

The header packets are, in order, the identification
header, the comment header, and the setup header.

\paragraph{Identification Header}
The identification header identifies the bitstream as Vorbis and gives
the Vorbis version and the simple audio characteristics of the stream, such as
sample rate and number of channels.

\paragraph{Comment Header}
The comment header includes user text comments (``tags'') and a vendor
string for the application/library that produced the bitstream.  The
encoding and proper use of the comment header is described in \xref{vorbis:spec:comment}.

\paragraph{Setup Header}
The setup header includes extensive CODEC setup information as well as
the complete VQ and Huffman codebooks needed for decode.

\subsubsection{Decode Procedure}

The decoding and synthesis procedure for all audio packets is
fundamentally the same.
\begin{enumerate}
\item decode packet type flag
\item decode mode number
\item decode window shape (long windows only)
\item decode floor
\item decode residue into residue vectors
\item inverse channel coupling of residue vectors
\item generate floor curve from decoded floor data
\item compute dot product of floor and residue, producing audio spectrum vector
\item inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I
\item overlap/add left-hand output of transform with right-hand output of previous frame
\item store right-hand data from transform of current frame for future lapping
\item if not first frame, return results of overlap/add as audio result of current frame
\end{enumerate}

Note that clever rearrangement of the synthesis arithmetic is
possible; as an example, one can take advantage of symmetries in the
MDCT to store the right-hand transform data of a partial MDCT for a
50\% inter-frame buffer space savings, and then complete the transform
later before overlap/add with the next frame.  This optimization
produces entirely equivalent output and is naturally perfectly legal.
The decoder must be \emph{entirely mathematically equivalent} to the
specification; it need not be a literal semantic implementation.

\paragraph{Packet type decode}

Vorbis I uses four packet types. The first three packet types mark each
of the three Vorbis headers described above. The fourth packet type
marks an audio packet. All other packet types are reserved; packets
marked with a reserved type should be ignored.

Following the three header packets, all packets in a Vorbis I stream
are audio.  The first step of audio packet decode is to read and
verify the packet type; \emph{a non-audio packet when audio is expected
indicates stream corruption or a non-compliant stream. The decoder
must ignore the packet and not attempt decoding it to
audio}.
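
The packet type check can be sketched as below.  The numeric values
are not stated in this overview; the sketch assumes the values given
in the bit-level sections of the specification, namely header type
octets 1, 3 and 5 followed by the six octets \texttt{vorbis}, and a
cleared first bit for audio packets.  The function name and the
returned strings are hypothetical.

```python
HEADER_TYPES = {1: "identification", 3: "comment", 5: "setup"}


def classify_packet(packet: bytes) -> str:
    """Classify a raw packet by its leading type field (illustrative only)."""
    if not packet:
        return "empty"
    first = packet[0]
    if first & 1 == 0:
        # The first bit read from an audio packet is the type flag, and it is 0.
        return "audio"
    if first in HEADER_TYPES and packet[1:7] == b"vorbis":
        return HEADER_TYPES[first]
    return "reserved"


def should_decode_as_audio(packet: bytes) -> bool:
    """After setup, anything that is not an audio packet is skipped."""
    return classify_packet(packet) == "audio"
```

A decoder that has consumed the three headers would call
\texttt{should\_decode\_as\_audio} on each subsequent packet and simply
drop any packet for which it returns false.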

\paragraph{Mode decode}
Vorbis allows an encoder to set up multiple, numbered packet 'modes',
as described earlier, all of which may be used in a given Vorbis
stream. The mode is encoded as an integer used as a direct offset into
the mode instance index.
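
Because the mode number is a direct offset, its width in the packet
depends only on how many modes the setup header declares.  The sketch
below assumes the rule from the bit-level decode section, where the
field is \texttt{ilog(mode\_count - 1)} bits wide and \texttt{ilog}
returns the position of the highest set bit (zero for values of zero
or less).  The helper name \texttt{mode\_number\_bits} is hypothetical.

```python
def ilog(x: int) -> int:
    """Position (1..n) of the highest set bit of x; 0 for x <= 0."""
    return x.bit_length() if x > 0 else 0


def mode_number_bits(mode_count: int) -> int:
    """Width in bits of the mode number field for a stream with mode_count modes."""
    return ilog(mode_count - 1)


# A stream with a single mode spends no bits on the mode number at all.
widths = {count: mode_number_bits(count) for count in (1, 2, 3, 4, 64)}
```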

\paragraph{Window shape decode (long windows only)} \label{vorbis:spec:window}

Vorbis frames may be one of two PCM sample sizes specified during
codec setup.  In Vorbis I, legal frame sizes are powers of two from 64
to 8192 samples.  Aside from coupling, Vorbis handles channels as
independent vectors and these frame sizes are in samples per channel.

Vorbis uses an overlapping transform, namely the MDCT, to blend one
frame into the next, avoiding most inter-frame block boundary
artifacts.  The MDCT output of one frame is windowed according to MDCT
requirements, overlapped 50\% with the output of the previous frame and
added.  The window shape assures seamless reconstruction.

This is easy to visualize in the case of equal-sized windows:

\begin{center}
\includegraphics[width=\textwidth]{window1}
\captionof{figure}{overlap of two equal-sized windows}
\end{center}

And slightly more complex in the case of overlapping unequal sized
windows:

\begin{center}
\includegraphics[width=\textwidth]{window2}
\captionof{figure}{overlap of a long and a short window}
\end{center}

In the unequal-sized window case, the window shape of the long window
must be modified for seamless lapping as above.  It is possible to
correctly infer window shape to be applied to the current window from
knowing the sizes of the current, previous and next window.  It is
legal for a decoder to use this method. However, in the case of a long
window (short windows require no modification), Vorbis also codes two
flag bits to specify pre- and post- window shape.  Although not
strictly necessary for function, this minor redundancy allows a packet
to be fully decoded to the point of lapping entirely independently of
any other packet, allowing easier abstraction of decode layers as well
as allowing a greater level of easy parallelism in encode and
decode.

A description of valid window functions for use with an inverse MDCT
can be found in \cite{Sporer/Brandenburg/Edler}.  Vorbis windows
all use the slope function
\[ y = \sin\left(\frac{\pi}{2} \, \sin^2\left(\frac{x+0.5}{n} \, \pi\right)\right) . \]
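
As a quick numerical check of why this slope function laps
seamlessly, the sketch below evaluates it for an equal-sized window
and verifies that the squared window values half a window apart sum
to one.  It assumes that $n$ is the full window length and that $x$
runs from $0$ to $n-1$; the function name is hypothetical, and
windows modified for long/short transitions are not covered.

```python
import math


def vorbis_window(n: int) -> list:
    """Evaluate y = sin(pi/2 * sin^2((x + 0.5) / n * pi)) for x = 0 .. n-1."""
    return [
        math.sin(0.5 * math.pi * math.sin((x + 0.5) / n * math.pi) ** 2)
        for x in range(n)
    ]


n = 64
w = vorbis_window(n)

# Largest deviation of w[x]^2 + w[x + n/2]^2 from 1 over the overlap region.
max_error = max(abs(w[x] ** 2 + w[x + n // 2] ** 2 - 1.0) for x in range(n // 2))
```

The identity holds exactly in real arithmetic, because shifting $x$ by
$n/2$ turns the inner $\sin^2$ into $\cos^2$ and the outer sine into
the matching cosine; \texttt{max\_error} only reflects floating point
rounding.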

\paragraph{floor decode}
Each floor is encoded/decoded in channel order; however, each floor
belongs to a 'submap' that specifies which floor configuration to
use.  All floors are decoded before residue decode begins.

\paragraph{residue decode}

Although the number of residue vectors equals the number of channels,
channel coupling may mean that the raw residue vectors extracted
during decode do not map directly to specific channels.  When channel
coupling is in use, some vectors will correspond to coupled magnitude
or angle.  The coupling relationships are described in the codec setup
and may differ from frame to frame, due to different mode numbers.

Vorbis codes residue vectors in groups by submap; the coding is done
in submap order from submap 0 through n-1.  This differs from floors
which are coded using a configuration provided by submap number, but
are coded individually in channel order.

\paragraph{inverse channel coupling}

A detailed discussion of stereo in the Vorbis codec can be found in
the document \href{stereo.html}{Stereo Channel Coupling in the
Vorbis CODEC}.  Vorbis is not limited to only stereo coupling, but
the stereo document also gives a good overview of the generic coupling
mechanism.

Vorbis coupling applies to pairs of residue vectors at a time;
decoupling is done in-place a pair at a time in the order and using
the vectors specified in the current mapping configuration.  The
decoupling operation is the same for all pairs, converting square
polar representation (where one vector is magnitude and the second
angle) back to Cartesian representation.

After each pair of vectors on the coupling list has been decoupled, in
order, the resulting residue vectors represent the fine spectral detail
of each output channel.
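
The square polar to Cartesian conversion can be sketched element by
element as below.  The branch structure follows the rule given in the
bit-level decode section of this specification as best it can be
restated here; it is offered as an illustration, and the normative
text governs in case of any discrepancy.  The function names are
hypothetical.

```python
def decouple_pair(m, a):
    """Convert one (magnitude, angle) element pair back to two channel values."""
    if m > 0:
        if a > 0:
            return m, m - a
        return m + a, m
    if a > 0:
        return m, m + a
    return m - a, m


def decouple_vectors(magnitude, angle):
    """Decouple a magnitude/angle vector pair in place, element by element."""
    for i in range(len(magnitude)):
        magnitude[i], angle[i] = decouple_pair(magnitude[i], angle[i])


magnitude = [3, 3, -3, -3]
angle = [1, -1, 1, -1]
decouple_vectors(magnitude, angle)
```

Only additions, subtractions and sign tests are involved, which is
consistent with the stated goal of a low-complexity decoder.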

\paragraph{generate floor curve}

The decoder may choose to generate the floor curve at any appropriate
time.  It is reasonable to generate the output curve when the floor
data is decoded from the raw packet, or it can be generated after
inverse coupling and applied to the spectral residue directly,
combining generation and the dot product into one step and eliminating
some working space.

Both floor 0 and floor 1 generate a linear-range, linear-domain output
vector to be multiplied (dot product) by the linear-range,
linear-domain spectral residue.

\paragraph{compute floor/residue dot product}

This step is straightforward; for each output channel, the decoder
multiplies the floor curve and residue vectors element by element,
producing the finished audio spectrum of each channel.

% TODO/FIXME: The following two paragraphs have identical twins
%   in section 4 (under "dot product")
One point is worth mentioning about this dot product; a common mistake
in a fixed point implementation might be to assume that a 32 bit
fixed-point representation for floor and residue and direct
multiplication of the vectors is sufficient for acceptable spectral
depth in all cases because it happens to mostly work with the current
Xiph.Org reference encoder.

However, floor vector values can span \~{}140dB (\~{}24 bits unsigned), and
the audio spectrum vector should represent a minimum of 120dB (\~{}21
bits with sign), even when output is to a 16 bit PCM device.  For the
residue vector to represent full scale if the floor is nailed to
$-140$dB, it must be able to span 0 to $+140$dB.  For the residue vector
to reach full scale if the floor is nailed at 0dB, it must be able to
represent $-140$dB to $+0$dB.  Thus, in order to handle full range
dynamics, a residue vector may span $-140$dB to $+140$dB entirely within
spec.  A 280dB range is approximately 48 bits with sign; thus the
residue vector must be able to represent a 48 bit range and the dot
product must be able to handle an effective 48 bit times 24 bit
multiplication.  This range may be achieved using large (64 bit or
larger) integers, or implementing a movable binary point
representation.
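
The bit widths quoted above can be checked with the usual rule of
thumb that one bit of linear amplitude corresponds to
$20\log_{10}2 \approx 6.02$dB.  The following is only a
back-of-the-envelope restatement of the figures in the text, with one
extra bit added where a sign is needed:

\[ \frac{140}{6.02} \approx 23.3 \mbox{ bits (floor, about 24 bits unsigned)} \]
\[ \frac{120}{6.02} + 1 \approx 20.9 \mbox{ bits (audio spectrum, about 21 bits with sign)} \]
\[ \frac{280}{6.02} + 1 \approx 47.5 \mbox{ bits (residue, about 48 bits with sign)} \]

A 48 bit by 24 bit product needs up to 72 bits before any rounding or
shifting, which is why plain 32 bit fixed point is not sufficient in
the general case.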

\paragraph{inverse monolithic transform (MDCT)}

The audio spectrum is converted back into time domain PCM audio via an
inverse Modified Discrete Cosine Transform (MDCT).  A detailed
description of the MDCT is available in \cite{Sporer/Brandenburg/Edler}.

Note that the PCM produced directly from the MDCT is not yet finished
audio; it must be lapped with surrounding frames using an appropriate
window (such as the Vorbis window) before the MDCT can be considered
orthogonal.

\paragraph{overlap/add data}
Windowed MDCT output is overlapped and added with the right hand data
of the previous window such that the 3/4 point of the previous window
is aligned with the 1/4 point of the current window (as illustrated in
the window overlap diagram). At this point, the audio data between the
center of the previous frame and the center of the current frame is
now finished and ready to be returned.

\paragraph{cache right hand data}
The decoder must cache the right hand portion of the current frame to
be lapped with the left hand portion of the next frame.

\paragraph{return finished audio data}

The overlapped portion produced from overlapping the previous and
current frame data is finished data to be returned by the decoder.
This data spans from the center of the previous window to the center
of the current window.  In the case of same-sized windows, the amount
of data to return is one-half block consisting of and only of the
overlapped portions. When overlapping a short and long window, much of
the returned range is not actually overlap.  This does not damage
transform orthogonality.  Pay attention, however, to returning the
correct data range; the amount of data to be returned is:

\begin{Verbatim}[commandchars=\\\{\}]
window\_blocksize(previous\_window)/4+window\_blocksize(current\_window)/4
\end{Verbatim}

from the center of the previous window to the center of the current
window.

Data is not returned from the first frame; it must be used to 'prime'
the decode engine.  The encoder accounts for this priming when
calculating PCM offsets; after the first frame, the proper PCM output
offset is '0' (as no data has been returned yet).
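
The return-size rule and the priming behaviour can be combined in a
few lines.  This is a minimal sketch with a hypothetical function
name; it represents ``no previous frame'' with \texttt{None} and
assumes block sizes are the per-channel frame sizes from codec setup.

```python
def samples_returned(previous_blocksize, current_blocksize):
    """Samples per channel made available by decoding the current frame."""
    if previous_blocksize is None:
        # The first frame only primes the decoder; nothing is returned.
        return 0
    return previous_blocksize // 4 + current_blocksize // 4


first_frame = samples_returned(None, 2048)
long_long = samples_returned(2048, 2048)
long_short = samples_returned(2048, 256)
short_short = samples_returned(256, 256)
```

For two equal-sized windows the result is half a block, matching the
description above; for a long window followed by a short one it is
$512 + 64 = 576$ samples, most of which lies outside the actual
overlap region.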