<html>
<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
<title>Ogg Vorbis Documentation</title>

<style type="text/css">
body {
  margin: 0 18px 0 18px;
  padding-bottom: 30px;
  font-family: Verdana, Arial, Helvetica, sans-serif;
  color: #333333;
  font-size: .8em;
}

a {
  color: #3366cc;
}

img {
  border: 0;
}

#xiphlogo {
  margin: 30px 0 16px 0;
}

#content p {
  line-height: 1.4;
}

h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {
  font-weight: bold;
  color: #ff9900;
  margin: 1.3em 0 8px 0;
}

h1 {
  font-size: 1.3em;
}

h2 {
  font-size: 1.2em;
}

h3 {
  font-size: 1.1em;
}

li {
  line-height: 1.4;
}

#copyright {
  margin-top: 30px;
  line-height: 1.5em;
  text-align: center;
  font-size: .8em;
  color: #888888;
  clear: both;
}
</style>

</head>

<body>

  Fish Logo and Xiph.Org

Ogg Vorbis stereo-specific channel coupling discussion

Abstract

The Vorbis audio CODEC provides channel coupling mechanisms designed
to reduce effective bitrate by both eliminating interchannel
redundancy and eliminating stereo image information labeled inaudible
or undesirable according to spatial psychoacoustic models. This
document describes the mechanical coupling mechanisms available
within the Vorbis specification, as well as the specific stereo
coupling models used by the reference <tt>libvorbis</tt> codec
provided by xiph.org.

Mechanisms

In encoder release beta 4 and earlier, Vorbis supported multiple
channel encoding, but the channels were encoded entirely separately
with no cross-analysis or redundancy elimination between channels.
This multichannel strategy is very similar to mp3's dual stereo mode,
and Vorbis uses the same name for its analogous uncoupled
multichannel modes.

However, the Vorbis spec provides for, and Vorbis release 1.0 rc1 and
later implement, a coupled channel strategy. Vorbis has two specific
mechanisms that may be used alone or in conjunction to implement
channel coupling. The first is channel interleaving via
residue backend type 2, and the second is square polar
mapping. These two general mechanisms are particularly well
suited to coupling due to the structure of Vorbis encoding, as we'll
explore below, and using both we can implement totally
lossless stereo image coupling [bit-for-bit decode-identical
to uncoupled modes], as well as various lossy models that seek to
eliminate inaudible or unimportant aspects of the stereo image in
order to reduce bitrate. The exact coupling implementation is
generalized to allow the encoder a great deal of flexibility in
implementation of a stereo or surround model without requiring any
significant complexity increase over the combinatorially simpler
mid/side joint stereo of mp3 and other current audio codecs.

A particular Vorbis bitstream may apply channel coupling directly to
more than a pair of channels; polar mapping is hierarchical such that
polar coupling may be extrapolated to an arbitrary number of channels
and is not restricted to only stereo, quadraphonics, ambisonics or 5.1
surround. However, the scope of this document restricts itself to the
stereo coupling case.

Square Polar Mapping

maximal correlation

Recall that the basic structure of a Vorbis I stream first generates
from input audio a spectral 'floor' function that serves as an
MDCT-domain whitening filter. This floor is meant to represent the
rough envelope of the frequency spectrum, using whatever metric the
encoder cares to define. This floor is subtracted from the log
frequency spectrum, effectively normalizing the spectrum by frequency.
Each input channel is associated with a unique floor function.
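
The floor subtraction described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function name <tt>remove_floor</tt>, the use of <tt>float</tt> arrays and the idea that both inputs are already log-domain values on the same scale are all hypothetical, and the sketch does not reproduce how <tt>libvorbis</tt> actually computes or stores its floors.

```c
#include <stddef.h>

/* Hypothetical helper (not a libvorbis function): subtract a channel's
   floor curve from its log-magnitude spectrum, bin by bin. Both inputs
   are assumed to be log-domain values on the same scale. The result is
   the whitened residue for that channel. */
static void remove_floor(const float *log_spectrum, const float *log_floor,
                         float *residue, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        residue[i] = log_spectrum[i] - log_floor[i];
    }
}
```

Because each channel has its own floor, calling such a routine once per channel removes that channel's spectral envelope independently of the other channel.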

The basic idea behind any stereo coupling is that the left and right
channels usually correlate. This correlation is even stronger if one
first accounts for energy differences in any given frequency band
across left and right; think for example of individual instruments
mixed into different portions of the stereo image, or a stereo
recording with a dominant feature not perfectly in the center. The
floor functions, each specific to a channel, provide the perfect means
of normalizing left and right energies across the spectrum to maximize
correlation before coupling. This feature of the Vorbis format is not
a convenient accident.

Because we strive to maximally correlate the left and right channels
and generally succeed in doing so, left and right residue is typically
nearly identical. We could use channel interleaving (discussed below)
alone to efficiently remove the redundancy between the left and right
channels as a side effect of entropy encoding, but a polar
representation gives additional benefits when left/right correlation
is strong.

point and diffuse imaging

The first advantage of a polar representation is that it effectively
separates the spatial audio information into a 'point image'
(magnitude) at a given frequency and located somewhere in the sound
field, and a 'diffuse image' (angle) that fills a large amount of
space simultaneously. Even if we preserve only the magnitude (point)
data, a detailed and carefully chosen floor function in each channel
provides us with a free, fine-grained, frequency relative intensity
stereo*. Angle information represents diffuse sound fields, such as
reverberation that fills the entire space simultaneously.

*Because the Vorbis model supports a number of different possible
stereo models and these models may be mixed, we do not use the term
'intensity stereo' when talking about Vorbis; instead we use the terms
'point stereo', 'phase stereo' and subcategories of each.

The majority of a stereo image is representable by polar magnitude
alone, as strong sounds tend to be produced at near-point sources;
even non-diffuse, fast, sharp echoes track very accurately using
magnitude representation almost alone (for those experimenting with
Vorbis tuning, this strategy works much better with the precise,
piecewise control of floor 1; the continuous approximation of floor 0
results in unstable imaging). Reverberation and diffuse sounds tend
to contain less energy and be psychoacoustically dominated by the
point sources embedded in them. Thus, we again tend to concentrate
more represented energy into a predictably smaller number of values.
Separating representation of point and diffuse imaging also allows us
to model and manipulate point and diffuse qualities separately.

controlling bit leakage and symbol crosstalk

Because polar representation concentrates represented energy into
fewer large values, we reduce bit 'leakage' during cascading
(multistage VQ encoding) as a secondary benefit. A single large,
monolithic VQ codebook is more efficient than a cascaded book due to
entropy 'crosstalk' among symbols between different stages of a
multistage cascade. Polar representation is a way of further
concentrating entropy into predictable locations so that codebook
design can take steps to improve multistage codebook efficiency. It
also allows us to cascade various elements of the stereo image
independently.

eliminating trigonometry and rounding

Rounding and computational complexity are potential problems with a
polar representation. As our encoding process involves quantization,
mixing a polar representation and quantization makes it potentially
impossible, depending on implementation, to construct a coupled stereo
mechanism that results in bit-identical decompressed output compared
to an uncoupled encoding, should the encoder desire it.

Vorbis uses a mapping that preserves the most useful qualities of
polar representation, relies only on addition/subtraction (during
decode; high quality encoding still requires some trig), and makes it
trivial before or after quantization to represent an angle/magnitude
through a one-to-one mapping from possible left/right value
permutations. We do this by basing our polar representation on the
unit square rather than the unit circle.

Given a magnitude and angle, we recover left and right using the
following function (note that A/B may be left/right or right/left
depending on the coupling definition used by the encoder):

      /* square polar decode: (magnitude, angle) -> (A, B) */
      if(magnitude>0){
        if(angle>0){
          A=magnitude;
          B=magnitude-angle;
        }else{
          B=magnitude;
          A=magnitude+angle;
        }
      }else{
        /* negative (or zero) magnitude: the sign of the angle term flips */
        if(angle>0){
          A=magnitude;
          B=magnitude+angle;
        }else{
          B=magnitude;
          A=magnitude-angle;
        }
      }

The function is antisymmetric for positive and negative magnitudes in
order to eliminate a redundant value when quantizing. For example, if
we're quantizing to integer values, we can visualize a magnitude of 5
and an angle of -2 as follows:

[Figure: square polar]

Applying the function above to these values gives B=5 and
A=5+(-2)=3.

This representation loses or replicates no values; if A and B each
range over the integers -5 through 5, the number of possible Cartesian
permutations is 121. Represented in square polar notation, the
possible values are:

 0, 0

-1,-2  -1,-1  -1, 0  -1, 1

 1,-2   1,-1   1, 0   1, 1

-2,-4  -2,-3  -2,-2  -2,-1  -2, 0  -2, 1  -2, 2  -2, 3

 2,-4   2,-3   ... following the pattern ...

 ...   5, 1   5, 2   5, 3   5, 4   5, 5   5, 6   5, 7   5, 8   5, 9

...for a grand total of 121 possible values, the same number as in
Cartesian representation (note that, for example, <tt>5,-10</tt> is
the same as <tt>-5,10</tt>, so there's no reason to represent
both. <tt>2,10</tt> cannot happen, and there's no reason to account
for it.) It's also obvious that this mapping is exactly reversible.
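
The reversibility claim and the count of 121 can be checked mechanically. The following C sketch is an illustration, not code from <tt>libvorbis</tt>: <tt>sp_decode</tt> follows the decode function given earlier, while <tt>sp_encode</tt>, <tt>sp_count_roundtrip</tt> and <tt>SP_MAX_R</tt> are hypothetical names, and the encode rule (the value of larger absolute size becomes the magnitude, with ties going to B) is simply the inverse derived from that decode function for integer inputs.

```c
#include <stdlib.h>
#include <string.h>

/* Decode: square polar (magnitude, angle) -> (A, B), following the
   function given in the text. */
static void sp_decode(int mag, int ang, int *A, int *B)
{
    if (mag > 0) {
        if (ang > 0) { *A = mag; *B = mag - ang; }
        else         { *B = mag; *A = mag + ang; }
    } else {
        if (ang > 0) { *A = mag; *B = mag + ang; }
        else         { *B = mag; *A = mag - ang; }
    }
}

/* Encode: hypothetical inverse of sp_decode for integer inputs. The
   value of larger absolute size becomes the magnitude; ties go to B. */
static void sp_encode(int A, int B, int *mag, int *ang)
{
    if (abs(A) > abs(B)) {
        *mag = A;
        *ang = (A > 0) ? A - B : B - A;   /* always > 0 here  */
    } else {
        *mag = B;
        *ang = (B > 0) ? A - B : B - A;   /* always <= 0 here */
    }
}

#define SP_MAX_R 16   /* largest range this sketch supports */

/* Round-trips every (A, B) with both values in -r..r. Returns the
   number of distinct (magnitude, angle) pairs produced, or -1 if any
   pair fails to decode back to its input or r is out of range. */
static int sp_count_roundtrip(int r)
{
    static unsigned char seen[2 * SP_MAX_R + 1][4 * SP_MAX_R + 1];
    int count = 0;

    if (r < 0 || r > SP_MAX_R) return -1;
    memset(seen, 0, sizeof seen);

    for (int A = -r; A <= r; A++) {
        for (int B = -r; B <= r; B++) {
            int mag, ang, A2, B2;
            sp_encode(A, B, &mag, &ang);
            sp_decode(mag, ang, &A2, &B2);
            if (A2 != A || B2 != B) return -1;
            /* angle lies in -2r..2r-1, so shift both indices to >= 0 */
            if (!seen[mag + r][ang + 2 * r]) {
                seen[mag + r][ang + 2 * r] = 1;
                count++;
            }
        }
    }
    return count;
}
```

With <tt>r</tt> set to 5, <tt>sp_count_roundtrip</tt> returns 121: every Cartesian pair survives the round trip and no two pairs share a square polar representation.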

Channel interleaving

We can remap an A/B vector using polar mapping into a magnitude/angle
vector, and it's clear that, in general, this concentrates energy in
the magnitude vector and reduces the amount of information to encode
in the angle vector. Encoding these vectors independently with
residue backend #0 or residue backend #1 will result in bitrate
savings. However, there are still implicit correlations between the
magnitude and angle vectors. The most obvious is that the amplitude
of the angle is bounded by its corresponding magnitude value.

Entropy coding the results, then, further benefits from the entropy
model being able to compress magnitude and angle simultaneously. For
this reason, Vorbis implements residue backend #2 which pre-interleaves
a number of input vectors (in the stereo case, two, A and B) into a
single output vector (with the elements in the order of
A_0, B_0, A_1, B_1, A_2 ... A_n-1, B_n-1) before entropy encoding. Thus
each vector to be coded by the vector quantization backend consists of
matching magnitude and angle values.
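
The element ordering described above can be illustrated with a short C sketch. The function name <tt>interleave_vectors</tt> and its signature are hypothetical; the sketch shows only the interleaving step itself, not the partitioning or codebook stages of the actual residue backend.

```c
#include <stddef.h>

/* Hypothetical helper: interleave ch input vectors of n elements each
   into one output vector of ch*n elements, in the order
   in[0][0], in[1][0], ..., in[0][1], in[1][1], ...
   For stereo (ch == 2) this yields A_0, B_0, A_1, B_1, ... */
static void interleave_vectors(const int *const *in, size_t ch, size_t n,
                               int *out)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t c = 0; c < ch; c++) {
            out[i * ch + c] = in[c][i];
        }
    }
}
```

Placing the matching elements next to each other is what lets a single codebook entry cover a magnitude value together with its angle value.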

The astute reader, at this point, will notice that in the theoretical
case in which we can use monolithic codebooks of arbitrarily large
size, we can directly interleave and encode left and right without
polar mapping; in that case, the polar mapping does not appear to lend
any benefit whatsoever to the efficiency of the entropy coding. In
fact, it is perfectly possible and reasonable to build a Vorbis
encoder that dispenses with polar mapping entirely and merely
interleaves the channels. Libvorbis based encoders may configure such
an encoding and it will work as intended.

However, when we leave the ideal/theoretical domain, we notice that
polar mapping does give additional practical benefits, as discussed in
the above section on polar mapping and summarized again here:

  • Polar mapping aids in controlling entropy 'leakage' between stages
    of a cascaded codebook.
  • Polar mapping separates the stereo image into point and diffuse
    components which may be analyzed and handled differently.

Stereo Models

Dual Stereo

Dual stereo refers to stereo encoding where the channels are entirely
separate; they are analyzed and encoded as entirely distinct entities.
This terminology is familiar from mp3.

Lossless Stereo

Using polar mapping and/or channel interleaving, it's possible to
couple Vorbis channels losslessly, that is, construct a stereo
coupling encoding that both saves space and decodes
bit-identically to dual stereo. OggEnc 1.0 and later use this
mode in all high-bitrate encoding.

Overall, this stereo mode is overkill; however, it offers a safe
alternative to users concerned about the slightest possible
degradation to the stereo image or archival quality audio.

Phase Stereo

Phase stereo is the least aggressive means of gracefully dropping
resolution from the stereo image; it affects only diffuse imaging.

It's often quoted that the human ear is deaf to signal phase above
about 4kHz; this is nearly true and a passable rule of thumb, but it
can be demonstrated that even an average user can tell the difference
between high frequency in-phase and out-of-phase noise. Obviously
then, the statement is not entirely true. However, it's also the case
that one must resort to a demonstration nearly this extreme before
finding a counterexample.

'Phase stereo' is simply a more aggressive quantization of the polar
angle vector; above 4kHz it's generally quite safe to quantize noise
and noisy elements to only a handful of allowed phases, or to thin the
phase with respect to the magnitude. The phases of high amplitude
pure tones may or may not be preserved more carefully (they are
relatively rare and L/R tend to be in phase, so there is generally
little reason not to spend a few more bits on them).

example: eight phase stereo

Vorbis may implement phase stereo coupling by preserving the entirety
of the magnitude vector (essential to fine amplitude and energy
resolution overall) and quantizing the angle vector to one of only
four possible values. Given that the magnitude vector may be positive
or negative, this results in left and right phase having eight
possible permutations, thus 'eight phase stereo':

[Figure: eight phase]

Left and right may be in phase (positive or negative), the most common
case by far, or out of phase by 90 or 180 degrees.
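
One way to read this in terms of the square polar decode function given earlier is that, for a magnitude m, the angle is restricted to the four values 0, |m|, -|m| and -2|m|. The C sketch below illustrates that reading only; the original figure is not reproduced here, so the exact choice of four angle values is an assumption, and <tt>eight_phase_angle</tt> is a hypothetical helper rather than anything in <tt>libvorbis</tt>. <tt>sp_decode</tt> repeats the decode function from the text.

```c
#include <stdlib.h>

/* Decode: square polar (magnitude, angle) -> (A, B), following the
   function given in the text. */
static void sp_decode(int mag, int ang, int *A, int *B)
{
    if (mag > 0) {
        if (ang > 0) { *A = mag; *B = mag - ang; }
        else         { *B = mag; *A = mag + ang; }
    } else {
        if (ang > 0) { *A = mag; *B = mag + ang; }
        else         { *B = mag; *A = mag - ang; }
    }
}

/* Hypothetical helper: the angle for phase choice k under the assumed
   reading of eight phase stereo.
     k == 0      in phase            (A == B)
     k == 1, 2   90 degrees apart    (one of A, B is zero)
     otherwise   180 degrees apart   (A == -B)                     */
static int eight_phase_angle(int mag, int k)
{
    int m = abs(mag);
    switch (k) {
    case 0:  return 0;
    case 1:  return m;
    case 2:  return -m;
    default: return -2 * m;
    }
}
```

Under this reading, a magnitude of 5 decodes to (A,B) = (5,5), (5,0), (0,5) or (-5,5), and a magnitude of -5 decodes to (-5,-5), (-5,0), (0,-5) or (5,-5), for eight left/right combinations in total.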

example: four phase stereo

Similarly, four phase stereo takes the quantization one step further;
it allows only in-phase and 180 degree out-of-phase signals:

[Figure: four phase]

example: point stereo

Point stereo eliminates the possibility of out-of-phase signal
entirely. Any diffuse quality to a sound source tends to collapse
inward to a point somewhere within the stereo image. A practical
example would be balanced reverberations within a large, live space;
normally the sound is diffuse and soft, giving a sonic impression of
volume. In point stereo, the reverberations would still exist, but
sound fairly firmly centered within the image (assuming the
reverberation was centered overall; if the reverberation is stronger
to the left, then the point of localization in point stereo would be
to the left). This effect is most noticeable at low and mid
frequencies and using headphones (which grant perfect stereo
separation). Point stereo is a graceful but generally easy to
detect degradation to the sound quality and is thus used in frequency
ranges where it is least noticeable.

Mixed Stereo

Mixed stereo is the simultaneous use of more than one of the above
stereo encoding models, generally using more aggressive modes in
higher frequencies, lower amplitudes or 'nearly' in-phase sound.

It is also the case that near-DC frequencies should be encoded using
lossless coupling to avoid frame blocking artifacts.

Vorbis Stereo Modes

Vorbis, as of 1.0, uses lossless stereo and a number of mixed modes
constructed out of lossless and point stereo. Phase stereo was used
in the rc2 encoder, but is not currently used for simplicity's sake. It
will likely be re-added to the stereo model in the future.

  The Xiph Fish Logo is a
  trademark (™) of Xiph.Org.

  These pages © 1994 - 2005 Xiph.Org. All rights reserved.

</body>
</html>