Tree - source-git/mingw-libvorbis

Blame doc/stereo.html

Packit	06404a
Packit	06404a	`<html>`
Packit	06404a	`<head>`
Packit	06404a
Packit	06404a	`<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>`
Packit	06404a	`<title>Ogg Vorbis Documentation</title>`
Packit	06404a
Packit	06404a	`<style type="text/css">`
Packit	06404a	`body {`
Packit	06404a	`margin: 0 18px 0 18px;`
Packit	06404a	`padding-bottom: 30px;`
Packit	06404a	`font-family: Verdana, Arial, Helvetica, sans-serif;`
Packit	06404a	`color: #333333;`
Packit	06404a	`font-size: .8em;`
Packit	06404a	`}`
Packit	06404a
Packit	06404a	`a {`
Packit	06404a	`color: #3366cc;`
Packit	06404a	`}`
Packit	06404a
Packit	06404a	`img {`
Packit	06404a	`border: 0;`
Packit	06404a	`}`
Packit	06404a
Packit	06404a	`#xiphlogo {`
Packit	06404a	`margin: 30px 0 16px 0;`
Packit	06404a	`}`
Packit	06404a
Packit	06404a	`#content p {`
Packit	06404a	`line-height: 1.4;`
Packit	06404a	`}`
Packit	06404a
Packit	06404a	`h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a {`
Packit	06404a	`font-weight: bold;`
Packit	06404a	`color: #ff9900;`
Packit	06404a	`margin: 1.3em 0 8px 0;`
Packit	06404a	`}`
Packit	06404a
Packit	06404a	`h1 {`
Packit	06404a	`font-size: 1.3em;`
Packit	06404a	`}`
Packit	06404a
Packit	06404a	`h2 {`
Packit	06404a	`font-size: 1.2em;`
Packit	06404a	`}`
Packit	06404a
Packit	06404a	`h3 {`
Packit	06404a	`font-size: 1.1em;`
Packit	06404a	`}`
Packit	06404a
Packit	06404a	`li {`
Packit	06404a	`line-height: 1.4;`
Packit	06404a	`}`
Packit	06404a
Packit	06404a	`#copyright {`
Packit	06404a	`margin-top: 30px;`
Packit	06404a	`line-height: 1.5em;`
Packit	06404a	`text-align: center;`
Packit	06404a	`font-size: .8em;`
Packit	06404a	`color: #888888;`
Packit	06404a	`clear: both;`
Packit	06404a	`}`
Packit	06404a	`</style>`
Packit	06404a
Packit	06404a	`</head>`
Packit	06404a
Packit	06404a	`<body>`
Packit	06404a
Packit	06404a
Packit	06404a
Packit	06404a
Packit	06404a
Packit	06404a	`Ogg Vorbis stereo-specific channel coupling discussion`
Packit	06404a
Packit	06404a	`Abstract`
Packit	06404a
Packit	06404a	`The Vorbis audio CODEC provides channel coupling`
Packit	06404a	`mechanisms designed to reduce effective bitrate by both eliminating`
Packit	06404a	`interchannel redundancy and eliminating stereo image information`
Packit	06404a	`labeled inaudible or undesirable according to spatial psychoacoustic`
Packit	06404a	`models. This document describes both the mechanical coupling`
Packit	06404a	`mechanisms available within the Vorbis specification and the`
Packit	06404a	`specific stereo coupling models used by the reference`
Packit	06404a	`<tt>libvorbis</tt> codec provided by xiph.org.`
Packit	06404a
Packit	06404a	`Mechanisms`
Packit	06404a
Packit	06404a	`In encoder release beta 4 and earlier, Vorbis supported multiple`
Packit	06404a	`channel encoding, but the channels were encoded entirely separately`
Packit	06404a	`with no cross-analysis or redundancy elimination between channels.`
Packit	06404a	`This multichannel strategy is very similar to mp3's dual`
Packit	06404a	`stereo mode, and Vorbis uses the same name for its analogous`
Packit	06404a	`uncoupled multichannel modes.`
Packit	06404a
Packit	06404a	`However, the Vorbis spec provides for, and Vorbis release 1.0 rc1 and`
Packit	06404a	`later implement, a coupled channel strategy. Vorbis has two specific`
Packit	06404a	`mechanisms that may be used alone or in conjunction to implement`
Packit	06404a	`channel coupling. The first is channel interleaving via`
Packit	06404a	`residue backend type 2, and the second is square polar`
Packit	06404a	`mapping. These two general mechanisms are particularly well`
Packit	06404a	`suited to coupling due to the structure of Vorbis encoding, as we'll`
Packit	06404a	`explore below. Using both, we can implement totally`
Packit	06404a	`lossless stereo image coupling [bit-for-bit decode-identical`
Packit	06404a	`to uncoupled modes] as well as various lossy models that seek to`
Packit	06404a	`eliminate inaudible or unimportant aspects of the stereo image in`
Packit	06404a	`order to reduce bitrate. The exact coupling implementation is`
Packit	06404a	`generalized to allow the encoder a great deal of flexibility in`
Packit	06404a	`implementation of a stereo or surround model without requiring any`
Packit	06404a	`significant complexity increase over the combinatorially simpler`
Packit	06404a	`mid/side joint stereo of mp3 and other current audio codecs.`
Packit	06404a
Packit	06404a	`A particular Vorbis bitstream may apply channel coupling directly to`
Packit	06404a	`more than a pair of channels; polar mapping is hierarchical such that`
Packit	06404a	`polar coupling may be extrapolated to an arbitrary number of channels`
Packit	06404a	`and is not restricted to only stereo, quadraphonics, ambisonics or 5.1`
Packit	06404a	`surround. However, the scope of this document restricts itself to the`
Packit	06404a	`stereo coupling case.`
Packit	06404a
Packit	06404a
Packit	06404a	`Square Polar Mapping`
Packit	06404a
Packit	06404a	`maximal correlation`
Packit	06404a
Packit	06404a	`Recall that the basic structure of a Vorbis I stream first generates`
Packit	06404a	`from input audio a spectral 'floor' function that serves as an`
Packit	06404a	`MDCT-domain whitening filter. This floor is meant to represent the`
Packit	06404a	`rough envelope of the frequency spectrum, using whatever metric the`
Packit	06404a	`encoder cares to define. This floor is subtracted from the log`
Packit	06404a	`frequency spectrum, effectively normalizing the spectrum by frequency.`
Packit	06404a	`Each input channel is associated with a unique floor function.`
Packit	06404a
Packit	06404a	`The basic idea behind any stereo coupling is that the left and right`
Packit	06404a	`channels usually correlate. This correlation is even stronger if one`
Packit	06404a	`first accounts for energy differences in any given frequency band`
Packit	06404a	`across left and right; think for example of individual instruments`
Packit	06404a	`mixed into different portions of the stereo image, or a stereo`
Packit	06404a	`recording with a dominant feature not perfectly in the center. The`
Packit	06404a	`floor functions, each specific to a channel, provide the perfect means`
Packit	06404a	`of normalizing left and right energies across the spectrum to maximize`
Packit	06404a	`correlation before coupling. This feature of the Vorbis format is not`
Packit	06404a	`a convenient accident.`
Packit	06404a
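As a rough illustration of the normalization described above, the following C sketch subtracts a per-channel floor from a log-magnitude spectrum. The function name `remove_floor_db`, the dB units and the flat arrays are assumptions made for this example and are not taken from libvorbis. It only shows why a floor that tracks its own channel removes level differences between left and right.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: subtract a floor curve from a log-magnitude
   spectrum. Both inputs are assumed to be in dB; the output is the
   whitened residue for that channel, also in dB. */
void remove_floor_db(const float *spectrum_db, const float *floor_db,
                     float *residue_db, size_t n)
{
    assert(spectrum_db != NULL && floor_db != NULL && residue_db != NULL);
    for (size_t i = 0; i < n; i++)
        residue_db[i] = spectrum_db[i] - floor_db[i];
}
```

If the right channel is the left channel lowered by 6 dB, and its floor is lowered by the same amount, both calls produce the same residue vector, which is the strongly correlated case that coupling exploits.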
Packit	06404a	`Because we strive to maximally correlate the left and right channels`
Packit	06404a	`and generally succeed in doing so, left and right residue is typically`
Packit	06404a	`nearly identical. We could use channel interleaving (discussed below)`
Packit	06404a	`alone to efficiently remove the redundancy between the left and right`
Packit	06404a	`channels as a side effect of entropy encoding, but a polar`
Packit	06404a	`representation gives benefits when left/right correlation is`
Packit	06404a	`strong.`
Packit	06404a
Packit	06404a	`point and diffuse imaging`
Packit	06404a
Packit	06404a	`The first advantage of a polar representation is that it effectively`
Packit	06404a	`separates the spatial audio information into a 'point image'`
Packit	06404a	`(magnitude) at a given frequency and located somewhere in the sound`
Packit	06404a	`field, and a 'diffuse image' (angle) that fills a large amount of`
Packit	06404a	`space simultaneously. Even if we preserve only the magnitude (point)`
Packit	06404a	`data, a detailed and carefully chosen floor function in each channel`
Packit	06404a	`provides us with a free, fine-grained, frequency relative intensity`
Packit	06404a	`stereo*. Angle information represents diffuse sound fields, such as`
Packit	06404a	`reverberation that fills the entire space simultaneously.`
Packit	06404a
Packit	06404a	`*Because the Vorbis model supports a number of different possible`
Packit	06404a	`stereo models and these models may be mixed, we do not use the term`
Packit	06404a	`'intensity stereo' when talking about Vorbis; instead we use the terms`
Packit	06404a	`'point stereo', 'phase stereo' and subcategories of each.`
Packit	06404a
Packit	06404a	`The majority of a stereo image is representable by polar magnitude`
Packit	06404a	`alone, as strong sounds tend to be produced at near-point sources;`
Packit	06404a	`even non-diffuse, fast, sharp echoes track very accurately using`
Packit	06404a	`magnitude representation almost alone (for those experimenting with`
Packit	06404a	`Vorbis tuning, this strategy works much better with the precise,`
Packit	06404a	`piecewise control of floor 1; the continuous approximation of floor 0`
Packit	06404a	`results in unstable imaging). Reverberation and diffuse sounds tend`
Packit	06404a	`to contain less energy and be psychoacoustically dominated by the`
Packit	06404a	`point sources embedded in them. Thus, we again tend to concentrate`
Packit	06404a	`more represented energy into a predictably smaller number of values.`
Packit	06404a	`Separating representation of point and diffuse imaging also allows us`
Packit	06404a	`to model and manipulate point and diffuse qualities separately.`
Packit	06404a
Packit	06404a	`controlling bit leakage and symbol crosstalk`
Packit	06404a
Packit	06404a	`Because polar`
Packit	06404a	`representation concentrates represented energy into fewer large`
Packit	06404a	`values, we reduce bit 'leakage' during cascading (multistage VQ`
Packit	06404a	`encoding) as a secondary benefit. A single large, monolithic VQ`
Packit	06404a	`codebook is more efficient than a cascaded book due to entropy`
Packit	06404a	`'crosstalk' among symbols between different stages of a multistage cascade.`
Packit	06404a	`Polar representation is a way of further concentrating entropy into`
Packit	06404a	`predictable locations so that codebook design can take steps to`
Packit	06404a	`improve multistage codebook efficiency. It also allows us to cascade`
Packit	06404a	`various elements of the stereo image independently.`
Packit	06404a
Packit	06404a	`eliminating trigonometry and rounding`
Packit	06404a
Packit	06404a	`Rounding and computational complexity are potential problems with a`
Packit	06404a	`polar representation. As our encoding process involves quantization,`
Packit	06404a	`mixing a polar representation and quantization makes it potentially`
Packit	06404a	`impossible, depending on implementation, to construct a coupled stereo`
Packit	06404a	`mechanism that results in bit-identical decompressed output compared`
Packit	06404a	`to an uncoupled encoding should the encoder desire it.`
Packit	06404a
Packit	06404a	`Vorbis uses a mapping that preserves the most useful qualities of`
Packit	06404a	`polar representation, relies only on addition/subtraction (during`
Packit	06404a	`decode; high quality encoding still requires some trig), and makes it`
Packit	06404a	`trivial before or after quantization to represent an angle/magnitude`
Packit	06404a	`through a one-to-one mapping from possible left/right value`
Packit	06404a	`permutations. We do this by basing our polar representation on the`
Packit	06404a	`unit square rather than the unit-circle.`
Packit	06404a
Packit	06404a	`Given a magnitude and angle, we recover left and right using the`
Packit	06404a	`following function (note that A/B may be left/right or right/left`
Packit	06404a	`depending on the coupling definition used by the encoder):`
Packit	06404a
Packit	06404a
Packit	06404a	`if(magnitude>0){`
Packit	06404a	`  if(angle>0){`
Packit	06404a	`    A=magnitude;`
Packit	06404a	`    B=magnitude-angle;`
Packit	06404a	`  }else{`
Packit	06404a	`    B=magnitude;`
Packit	06404a	`    A=magnitude+angle;`
Packit	06404a	`  }`
Packit	06404a	`}else{`
Packit	06404a	`  if(angle>0){`
Packit	06404a	`    A=magnitude;`
Packit	06404a	`    B=magnitude+angle;`
Packit	06404a	`  }else{`
Packit	06404a	`    B=magnitude;`
Packit	06404a	`    A=magnitude-angle;`
Packit	06404a	`  }`
Packit	06404a	`}`
Packit	06404a
Packit	06404a
Packit	06404a	`The function is antisymmetric for positive and negative magnitudes in`
Packit	06404a	`order to eliminate a redundant value when quantizing. For example, if`
Packit	06404a	`we're quantizing to integer values, we can visualize a magnitude of 5`
Packit	06404a	`and an angle of -2 as follows:`
Packit	06404a
Packit	06404a	`[Figure: magnitude 5, angle -2 on the square polar grid; this decodes to A=3, B=5]`
Packit	06404a
Packit	06404a	`This representation loses or replicates no values; if A and B each`
Packit	06404a	`range over the integers -5 through 5, the number of possible Cartesian`
Packit	06404a	`permutations is 121. Represented in square polar notation, the`
Packit	06404a	`possible values are:`
Packit	06404a
Packit	06404a
Packit	06404a	`0, 0`
Packit	06404a
Packit	06404a	`-1,-2 -1,-1 -1, 0 -1, 1`
Packit	06404a
Packit	06404a	`1,-2 1,-1 1, 0 1, 1`
Packit	06404a
Packit	06404a	`-2,-4 -2,-3 -2,-2 -2,-1 -2, 0 -2, 1 -2, 2 -2, 3`
Packit	06404a
Packit	06404a	`2,-4 2,-3 ... following the pattern ...`
Packit	06404a
Packit	06404a	`... 5, 1 5, 2 5, 3 5, 4 5, 5 5, 6 5, 7 5, 8 5, 9`
Packit	06404a
Packit	06404a
Packit	06404a
Packit	06404a	`...for a grand total of 121 possible values, the same number as in`
Packit	06404a	`Cartesian representation (note that, for example, <tt>5,-10</tt> is`
Packit	06404a	`the same as <tt>-5,10</tt>, so there's no reason to represent`
Packit	06404a	`both. 2,10 cannot happen, and there's no reason to account for it.)`
Packit	06404a	`It's also obvious that this mapping is exactly reversible.`
Packit	06404a
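The reversibility claim can be checked mechanically. The sketch below pairs the decode function from the text with one possible integer forward mapping, `sp_encode`, derived here by inverting the decode branches. It is not libvorbis's coupling routine, and all function names are hypothetical. `sp_count_round_trips` counts the A/B pairs that survive an encode/decode round trip with the angle staying within twice the magnitude.

```c
#include <assert.h>
#include <stdlib.h>

/* Decode: square polar (magnitude, angle) -> (A, B).
   Same branches as the function given in the text. */
void sp_decode(int magnitude, int angle, int *A, int *B)
{
    if (magnitude > 0) {
        if (angle > 0) { *A = magnitude; *B = magnitude - angle; }
        else           { *B = magnitude; *A = magnitude + angle; }
    } else {
        if (angle > 0) { *A = magnitude; *B = magnitude + angle; }
        else           { *B = magnitude; *A = magnitude - angle; }
    }
}

/* Encode (assumed inverse): the value with the larger absolute value
   becomes the magnitude; ties go to B. */
void sp_encode(int A, int B, int *magnitude, int *angle)
{
    if (abs(A) > abs(B)) {
        *magnitude = A;
        *angle = (A > 0) ? A - B : B - A;   /* strictly positive here */
    } else {
        *magnitude = B;
        *angle = (B > 0) ? A - B : B - A;   /* zero or negative here */
    }
}

/* Count the (A, B) pairs in [-limit, limit] x [-limit, limit] that
   round-trip exactly and keep |angle| <= 2*|magnitude|. */
int sp_count_round_trips(int limit)
{
    int ok = 0;
    assert(limit >= 0);
    for (int A = -limit; A <= limit; A++) {
        for (int B = -limit; B <= limit; B++) {
            int m, ang, a2, b2;
            sp_encode(A, B, &m, &ang);
            sp_decode(m, ang, &a2, &b2);
            if (a2 == A && b2 == B && abs(ang) <= 2 * abs(m))
                ok++;
        }
    }
    return ok;
}
```

Under these assumptions, `sp_count_round_trips(5)` accounts for all 121 Cartesian pairs, and `sp_encode(3, 5, ...)` yields the magnitude 5, angle -2 case used in the example above.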
Packit	06404a	`Channel interleaving`
Packit	06404a
Packit	06404a	`We can remap an A/B vector using polar mapping into a magnitude/angle`
Packit	06404a	`vector, and it's clear that, in general, this concentrates energy in`
Packit	06404a	`the magnitude vector and reduces the amount of information to encode`
Packit	06404a	`in the angle vector. Encoding these vectors independently with`
Packit	06404a	`residue backend #0 or residue backend #1 will result in bitrate`
Packit	06404a	`savings. However, there are still implicit correlations between the`
Packit	06404a	`magnitude and angle vectors. The most obvious is that the amplitude`
Packit	06404a	`of the angle is bounded by its corresponding magnitude value.`
Packit	06404a
Packit	06404a	`Entropy coding the results, then, further benefits from the entropy`
Packit	06404a	`model being able to compress magnitude and angle simultaneously. For`
Packit	06404a	`this reason, Vorbis implements residue backend #2 which pre-interleaves`
Packit	06404a	`a number of input vectors (in the stereo case, two, A and B) into a`
Packit	06404a	`single output vector (with the elements in the order of`
Packit	06404a	`A_0, B_0, A_1, B_1, A_2 ... A_n-1, B_n-1) before entropy encoding. Thus`
Packit	06404a	`each vector to be coded by the vector quantization backend consists of`
Packit	06404a	`matching magnitude and angle values.`
Packit	06404a
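The interleaving order described above can be sketched as follows. `interleave` is a hypothetical helper written for this example, not the residue backend itself; it only shows the element ordering that residue type 2 presents to the entropy coder, generalized to any number of input vectors.

```c
#include <assert.h>
#include <stddef.h>

/* Interleave `channels` vectors of length n into a single vector
   ordered v0[0], v1[0], ..., v0[1], v1[1], ... and so on.
   `out` must have room for channels * n elements. */
void interleave(const int *const *vectors, size_t channels, size_t n,
                int *out)
{
    assert(vectors != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t c = 0; c < channels; c++)
            out[i * channels + c] = vectors[c][i];
}
```

In the stereo case the two inputs would be the magnitude and angle vectors, so each adjacent pair in the output holds one magnitude value and its matching angle.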
Packit	06404a	`The astute reader, at this point, will notice that in the theoretical`
Packit	06404a	`case in which we can use monolithic codebooks of arbitrarily large`
Packit	06404a	`size, we can directly interleave and encode left and right without`
Packit	06404a	`polar mapping; in fact, the polar mapping does not appear to lend any`
Packit	06404a	`benefit whatsoever to the efficiency of the entropy coding. In fact,`
Packit	06404a	`it is perfectly possible and reasonable to build a Vorbis encoder that`
Packit	06404a	`dispenses with polar mapping entirely and merely interleaves the`
Packit	06404a	`channels. Libvorbis-based encoders may configure such an encoding and`
Packit	06404a	`it will work as intended.`
Packit	06404a
Packit	06404a	`However, when we leave the ideal/theoretical domain, we notice that`
Packit	06404a	`polar mapping does give additional practical benefits, as discussed in`
Packit	06404a	`the above section on polar mapping and summarized again here:`
Packit	06404a
Packit	06404a
Packit	06404a	`- Polar mapping aids in controlling entropy 'leakage' between stages`
Packit	06404a	`  of a cascaded codebook.`
Packit	06404a	`- Polar mapping separates the stereo image`
Packit	06404a	`  into point and diffuse components which may be analyzed and handled`
Packit	06404a	`  differently.`
Packit	06404a
Packit	06404a
Packit	06404a	`Stereo Models`
Packit	06404a
Packit	06404a	`Dual Stereo`
Packit	06404a
Packit	06404a	`Dual stereo refers to stereo encoding where the channels are entirely`
Packit	06404a	`separate; they are analyzed and encoded as entirely distinct entities.`
Packit	06404a	`This terminology is familiar from mp3.`
Packit	06404a
Packit	06404a	`Lossless Stereo`
Packit	06404a
Packit	06404a	`Using polar mapping and/or channel interleaving, it's possible to`
Packit	06404a	`couple Vorbis channels losslessly, that is, construct a stereo`
Packit	06404a	`coupling encoding that both saves space and decodes`
Packit	06404a	`bit-identically to dual stereo. OggEnc 1.0 and later use this`
Packit	06404a	`mode in all high-bitrate encoding.`
Packit	06404a
Packit	06404a	`Overall, this stereo mode is overkill; however, it offers a safe`
Packit	06404a	`alternative to users concerned about the slightest possible`
Packit	06404a	`degradation to the stereo image or archival quality audio.`
Packit	06404a
Packit	06404a	`Phase Stereo`
Packit	06404a
Packit	06404a	`Phase stereo is the least aggressive means of gracefully dropping`
Packit	06404a	`resolution from the stereo image; it affects only diffuse imaging.`
Packit	06404a
Packit	06404a	`It's often quoted that the human ear is deaf to signal phase above`
Packit	06404a	`about 4kHz; this is nearly true and a passable rule of thumb, but it`
Packit	06404a	`can be demonstrated that even an average user can tell the difference`
Packit	06404a	`between high frequency in-phase and out-of-phase noise. Obviously`
Packit	06404a	`then, the statement is not entirely true. However, it's also the case`
Packit	06404a	`that one must resort to nearly such an extreme demonstration before`
Packit	06404a	`finding the counterexample.`
Packit	06404a
Packit	06404a	`'Phase stereo' is simply a more aggressive quantization of the polar`
Packit	06404a	`angle vector; above 4kHz it's generally quite safe to quantize noise`
Packit	06404a	`and noisy elements to only a handful of allowed phases, or to thin the`
Packit	06404a	`phase with respect to the magnitude. The phases of high amplitude`
Packit	06404a	`pure tones may or may not be preserved more carefully (they are`
Packit	06404a	`relatively rare and L/R tend to be in phase, so there is generally`
Packit	06404a	`little reason not to spend a few more bits on them).`
Packit	06404a
Packit	06404a	`example: eight phase stereo`
Packit	06404a
Packit	06404a	`Vorbis may implement phase stereo coupling by preserving the entirety`
Packit	06404a	`of the magnitude vector (essential to fine amplitude and energy`
Packit	06404a	`resolution overall) and quantizing the angle vector to one of only`
Packit	06404a	`four possible values. Given that the magnitude vector may be positive`
Packit	06404a	`or negative, this results in left and right phase having eight`
Packit	06404a	`possible permutations, thus 'eight phase stereo':`
Packit	06404a
Packit	06404a	`[Figure: eight phase stereo]`
Packit	06404a
Packit	06404a	`Left and right may be in phase (positive or negative), the most common`
Packit	06404a	`case by far, or out of phase by 90 or 180 degrees.`
Packit	06404a
Packit	06404a	`example: four phase stereo`
Packit	06404a
Packit	06404a	`Similarly, four phase stereo takes the quantization one step further;`
Packit	06404a	`it allows only in-phase and 180 degree out-of-phase signals:`
Packit	06404a
Packit	06404a	`[Figure: four phase stereo]`
Packit	06404a
Packit	06404a	`example: point stereo`
Packit	06404a
Packit	06404a	`Point stereo eliminates the possibility of out-of-phase signal`
Packit	06404a	`entirely. Any diffuse quality to a sound source tends to collapse`
Packit	06404a	`inward to a point somewhere within the stereo image. A practical`
Packit	06404a	`example would be balanced reverberations within a large, live space;`
Packit	06404a	`normally the sound is diffuse and soft, giving a sonic impression of`
Packit	06404a	`volume. In point-stereo, the reverberations would still exist, but`
Packit	06404a	`sound fairly firmly centered within the image (assuming the`
Packit	06404a	`reverberation was centered overall; if the reverberation is stronger`
Packit	06404a	`to the left, then the point of localization in point stereo would be`
Packit	06404a	`to the left). This effect is most noticeable at low and mid`
Packit	06404a	`frequencies and using headphones (which grant perfect stereo`
Packit	06404a	`separation). Point stereo is a graceful but generally easy to`
Packit	06404a	`detect degradation to the sound quality and is thus used in frequency`
Packit	06404a	`ranges where it is least noticeable.`
Packit	06404a
Packit	06404a	`Mixed Stereo`
Packit	06404a
Packit	06404a	`Mixed stereo is the simultaneous use of more than one of the above`
Packit	06404a	`stereo encoding models, generally using more aggressive modes in`
Packit	06404a	`higher frequencies, lower amplitudes or 'nearly' in-phase sound.`
Packit	06404a
Packit	06404a	`It is also the case that near-DC frequencies should be encoded using`
Packit	06404a	`lossless coupling to avoid frame blocking artifacts.`
Packit	06404a
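A minimal sketch of one mixed mode, assuming a single hypothetical crossover bin `first_point_bin`: bins below it keep their angle values (lossless coupling), while bins at or above it have the angle dropped (point stereo). Real encoders choose such boundaries from psychoacoustic tuning and would likely also re-derive the magnitude for the point-stereo bins; neither is modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: apply point stereo to the upper part of an
   angle vector. Bins below first_point_bin are left untouched;
   bins at or above it get angle 0, so both channels decode to the
   magnitude value for that bin. */
void apply_point_stereo(int *angle, size_t n, size_t first_point_bin)
{
    assert(angle != NULL);
    for (size_t i = first_point_bin; i < n; i++)
        angle[i] = 0;
}
```

With an angle of zero, the decode function given earlier returns A equal to B, which is why point stereo cannot represent out-of-phase content.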
Packit	06404a	`Vorbis Stereo Modes`
Packit	06404a
Packit	06404a	`Vorbis, as of 1.0, uses lossless stereo and a number of mixed modes`
Packit	06404a	`constructed out of lossless and point stereo. Phase stereo was used`
Packit	06404a	`in the rc2 encoder, but is not currently used for simplicity's sake. It`
Packit	06404a	`will likely be re-added to the stereo model in the future.`
Packit	06404a
Packit	06404a
Packit	06404a	`The Xiph Fish Logo is a`
Packit	06404a	`trademark (™) of Xiph.Org.`
Packit	06404a
Packit	06404a	`These pages © 1994 - 2005 Xiph.Org. All rights reserved.`
Packit	06404a
Packit	06404a
Packit	06404a	`</body>`
Packit	06404a	`</html>`
Packit	06404a
Packit	06404a
Packit	06404a
Packit	06404a
Packit	06404a
Packit	06404a
