Topic:

Sample granularity editing of a Vorbis file; inferred arbitrary sample
length starting offsets / PCM stream lengths

Overview:

Vorbis, like mp3, is a frame-based* audio compression scheme in which
audio is broken up into discrete short time segments.  These segments
are 'atomic'; that is, one must recover the entire short time segment
from the frame packet.  There's no way to recover only a part of the
PCM time segment from part of the coded packet without expanding the
entire packet and then discarding a portion of the resulting PCM audio.

* In mp3, the data segment representing a given time period is called
  a 'frame'; the roughly equivalent Vorbis construct is a 'packet'.

Thus, when we edit a Vorbis stream, the finest physical editing
granularity is on these packet boundaries.  (The mp3 case is actually
somewhat more complex: mp3 editing involves more than just snipping on
a frame boundary, because time data can be spread backward or forward
over frames.  In Vorbis, packets are all stand-alone.)  At the physical
packet level, then, Vorbis is still limited to streams that contain an
integral number of packets.

However, Vorbis streams may still exactly represent and be edited to a
PCM stream of arbitrary length and starting offset without padding the
beginning or end of the decoded stream or requiring that the desired
edit points be packet aligned.  Vorbis makes use of Ogg stream
framing, and this framing provides time-stamping data, called a
'granule position'; our starting offset and finished stream length may
be inferred from correct usage of the granule position data.

Time stamping mechanism:

Vorbis packets are bundled into Ogg pages (note that pages do not
necessarily contain integral numbers of packets, but that isn't
important in this discussion.  More about Ogg framing can be found in
ogg/doc/framing.html).  Each page that contains a packet boundary is
stamped with the absolute sample-granularity offset of the data, that
is, 'complete samples-to-date' up to the last completed packet of that
page.  (The same mechanism is used for, e.g., video, where the number
represents complete 2-D frames, and so on.)

(It's possible but rare for a packet to span more than two pages such
that page[s] in the middle have no packet boundary; these pages have
a granule position of '-1'.)

This granule position mechanism in Ogg is used by Vorbis to indicate
when the PCM data intended to be represented in a Vorbis segment begins
a number of samples into the data represented by the first packet[s]
and/or ends before the physical PCM data represented in the last
packet[s].

File length a non-integral number of frames:

A file to be encoded in Vorbis will probably not encode into an
integral number of packets; such a file is encoded with the last
packet containing 'extra'* samples.  These samples are not padding; they
will be discarded in decode.

*(For best results, the encoder should use extra samples that preserve
the character of the last frame.  Simply setting them to zero will
introduce a 'cliff' that's hard to encode, resulting in spread-frame
noise.  Libvorbis extrapolates the last frame past the end of data to
produce the extra samples.  Even simply duplicating the last value is
better than clamping the signal to zero).
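
The simplest fallback mentioned above, duplicating the last value, can
be sketched as follows.  This is not what libvorbis does (it
extrapolates), and it assumes a single fixed block length, which real
Vorbis streams do not have; pad_to_block and its parameters are
hypothetical names used only for illustration.

```python
def pad_to_block(pcm, block):
    """Pad a list of samples up to a multiple of `block` samples.

    The extra samples repeat the last real value rather than dropping
    to zero.  Returns (padded_samples, original_length); the original
    length is what the last granule position should report.
    """
    original = len(pcm)
    extra = (-original) % block          # samples needed to fill the last block
    fill = pcm[-1] if pcm else 0         # hold the last value (0 for empty input)
    return pcm + [fill] * extra, original
```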

The encoder indicates to the decoder that the file is actually shorter
than all of the samples ('original' + 'extra') by setting the granule
position in the last page to a short value; that is, the last
timestamp is the original length of the file discarding extra samples.
The decoder will see that the number of samples it has decoded in the
last page is too many: it is 'original' + 'extra', whereas the
granulepos says that through the last packet we only have 'original'
number of samples.  The decoder then ignores the 'extra' samples.
This behavior is to occur only when the end-of-stream bit is set in
the page (indicating last page of the logical stream).
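
The end-of-stream rule can be sketched as a small decoder-side helper.
It assumes the decoder tracks the granule position at the end of the
previous page and knows how many samples it decoded from the current
page; the name samples_to_keep and its arguments are hypothetical.

```python
def samples_to_keep(prev_granulepos, granulepos, decoded, eos):
    """Number of the `decoded` samples from this page to return.

    prev_granulepos: granule position at the end of the previous page
    granulepos:      granule position stamped on this page
    decoded:         samples actually decoded from this page
    eos:             True if the end-of-stream bit is set on this page
    """
    # Trimming applies only on the last page of the logical stream,
    # and only if the page carries a real granule position.
    if not eos or granulepos < 0:
        return decoded
    accounted = granulepos - prev_granulepos
    # Clamp so that an illegal stamp (claiming more samples than exist,
    # or running backward) is handled without failing.
    return max(0, min(decoded, accounted))
```

For example, with the previous page ending at 4096, a final page
stamped 4500 and 1024 decoded samples, 404 samples are returned and
620 'extra' samples are dropped.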

Note that it is not legal for the granule position of the last page to
indicate that there are more samples in the file than actually exist;
however, implementations should handle such an illegal file gracefully
in the interests of robust programming.

Beginning point not on integral packet boundary:

It is possible that we will want the PCM data represented by a Vorbis
stream to begin at a position later than where the decoded PCM data
really begins after an integral packet boundary, a situation analogous
to the above description where the PCM data does not end at an
integral packet boundary.  The easiest example is taking a clip out of
a larger Vorbis stream, and choosing a beginning point of the clip
that is not on a packet boundary; we need to ignore a few samples to
get the desired beginning point.

The process of marking the desired beginning point is similar to
marking an arbitrary ending point. If the encoder wishes sample zero
to be some location past the actual beginning of data, it associates a
'short' granule position value with the completion of the second*
audio packet.  The granule position is associated with the second
packet simply by making sure the second packet completes its page.

*(We associate the short value with the second packet for two reasons.
 a) The first packet only primes the overlap/add buffer.  No data is
 returned before decoding the second packet; this places the decision
 information at the point of decision.  b) Placing the short value on
 the first packet would make the value negative (as the first packet
 normally represents position zero); a negative value would break the
 requirement that granule positions increase, since the headers have
 position values of zero.)

The decoder sees that on the first page that will return
data from the overlap/add queue, we have more samples than the granule
position accounts for, and discards the 'surplus' from the beginning
of the queue.
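
Both sides of this arrangement can be sketched in a few lines.  The
sketch assumes the second audio packet completes its page, so the
first data-bearing page accounts for exactly the samples returned
after decoding that packet; short_granulepos and leading_surplus are
hypothetical names.

```python
def short_granulepos(first_output, skip):
    """Encoder side: granule position to stamp on the page that the
    second audio packet completes.

    first_output: samples the decoder returns after the second packet
    skip:         samples to drop from the start of the decoded data
    """
    if not 0 <= skip <= first_output:
        raise ValueError("skip must lie within the first returned block")
    return first_output - skip


def leading_surplus(first_output, granulepos):
    """Decoder side: samples to discard from the front of the queue."""
    # A stamp at or above first_output means there is nothing to discard.
    return max(0, first_output - granulepos)
```

If the decoder returns 576 samples after the second packet and the
clip should begin 100 samples in, the page is stamped 476 and the
decoder discards 100 samples.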

Note that short granule values (indicating less than the actually
returned amount of data) are not legal in the Vorbis spec outside of
indicating beginning and ending sample positions.  However, decoders
should, at minimum, tolerate inadvertent short values elsewhere in the
stream (just as they should tolerate out-of-order/non-increasing
granulepos values, although this too is illegal).

Beginning point at arbitrary positive timestamp (no 'zero' sample):

It's also possible that the granule position of the first page of an
audio stream is a 'long value', that is, a value larger than the
amount of PCM audio decoded.  This implies only that we are starting
playback at some point into the logical stream, a potentially common
occurrence in streaming applications where the decoder may be
connecting into a live stream.  The decoder should not treat the long
value specially.
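
The short and long cases on the first data-bearing page can be handled
by one piece of arithmetic, sketched below under the assumption that
the decoder knows how many samples it decoded up to that page's
granule position.  The name first_page_start is hypothetical.

```python
def first_page_start(granulepos, decoded):
    """Return (discard, start_position) for the first data-bearing page.

    discard:        samples to drop from the front of the decoded data
    start_position: absolute sample position of the first sample kept
    """
    if granulepos < decoded:
        # Short value: the stream begins partway into the decoded data.
        return decoded - granulepos, 0
    # Exact or long value: keep everything; playback simply starts at
    # a nonzero position in the logical stream.
    return 0, granulepos - decoded
```

A decoder joining a live stream whose first page is stamped 1000000
after decoding 4096 samples would keep all of them and treat the first
one as sample 995904.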

A long value elsewhere in the stream would normally occur only when a
page is lost or out of sequence, as indicated by the page's sequence
number.  A long value under any other situation is not legal; however,
a decoder should tolerate both possibilities.