This document describes some things to know about the Ogg format, as well
as implementation details in GStreamer.

INTRODUCTION
============

ogg and the granulepos
----------------------

An ogg stream contains pages with a serial number and a granulepos.
The granulepos is a 64 bit signed integer.  It is a value that in some way
represents a time since the start of the stream.
The interpretation as such is however both codec-specific and
stream-specific.

ogg has no notion of time: it only knows about bytes and granulepos values
on pages.

The granule position is just a number; the only guarantee for a valid ogg
stream is that within a logical stream, this number never decreases.

While logically a granulepos value can be constructed for every ogg packet,
the page is marked with only one granulepos value: the granulepos of the
last packet to end on that page.

theora and the granulepos
-------------------------

The granulepos in theora is an encoding of the frame number of the last
key frame ("i frame"), and the number of frames since the last key frame
("p frame").  The granulepos is constructed as the sum of the first number,
shifted to the left by granuleshift bits, and the second number:
granulepos = (iframe << granuleshift) + pframe

(This means that given a frame number or a timestamp, one cannot generate
 the one and only granulepos for that page; several granulepos possibilities
 correspond to this frame number.  You also need the last keyframe, as well
 as the granuleshift.
 However, given a granulepos, the theora codec can still map that to a
 unique timestamp and frame number for that theora stream.)

 Note: currently theora stores the "presentation time" as the granulepos;
       i.e. a first data page with one packet contains one video frame and
       will be marked with 0/0.  Changing that to be 1/0 (so that it
       represents the number of decodable frames up to that point, like
       for Vorbis) is being discussed.
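
The encoding described above can be sketched in a few lines of Python.  This
is only an illustration, not code from GStreamer or libtheora: the function
names are made up for this example, and the frame rate is assumed to be given
as a numerator/denominator pair.  It follows the "presentation time"
convention from the note, where the first frame is 0/0.

```python
from fractions import Fraction


def pack_granulepos(iframe, pframe, granuleshift):
    """Combine the keyframe number and frames-since-keyframe count."""
    if pframe >= (1 << granuleshift):
        raise ValueError("pframe does not fit in granuleshift bits")
    return (iframe << granuleshift) + pframe


def unpack_granulepos(granulepos, granuleshift):
    """Split a granulepos back into (iframe, pframe)."""
    iframe = granulepos >> granuleshift
    pframe = granulepos & ((1 << granuleshift) - 1)
    return iframe, pframe


def frame_number(granulepos, granuleshift):
    """Absolute frame number: keyframe number plus frames since it."""
    iframe, pframe = unpack_granulepos(granulepos, granuleshift)
    return iframe + pframe


def frame_time(granulepos, granuleshift, fps_n, fps_d):
    """Time in seconds of the frame, for a frame rate of fps_n/fps_d."""
    return Fraction(frame_number(granulepos, granuleshift) * fps_d, fps_n)


# Two different granulepos values that describe the same frame (67):
a = pack_granulepos(64, 3, 6)   # keyframe 64, 3 frames later
b = pack_granulepos(60, 7, 6)   # keyframe 60, 7 frames later
print(a, b, frame_number(a, 6), frame_number(b, 6))
```

The two values differ, yet both map to the same frame number and hence the
same time, which is why a frame number alone is not enough to reconstruct
the granulepos.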

vorbis and granulepos
---------------------

In Vorbis, the granulepos represents the number of samples that can be
decoded from all packets up to that point.

In GStreamer, the vorbisenc element produces a stream where:
- OFFSET is the time corresponding to the granulepos
- OFFSET_END is the granulepos of the produced vorbis buffer
- TIMESTAMP is the timestamp matching the beginning of the buffer
- DURATION is set to the length in time of the buffer
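
As an illustration of the four fields listed above, the following Python
sketch computes them for a sequence of packets.  It is not the vorbisenc
code: the function names and the dictionary layout are invented for this
example, and it assumes the stream starts at time 0 and that times are
expressed in nanoseconds, as GStreamer clock times are.

```python
GST_SECOND = 1_000_000_000  # GStreamer clock times are in nanoseconds


def samples_to_ns(samples, rate):
    """Convert a sample count to nanoseconds at the given sample rate."""
    return samples * GST_SECOND // rate


def annotate_buffers(samples_per_packet, rate):
    """Return the buffer fields for each packet, in order."""
    buffers = []
    granulepos = 0  # samples decodable from all packets so far
    for n in samples_per_packet:
        start = samples_to_ns(granulepos, rate)
        granulepos += n
        end = samples_to_ns(granulepos, rate)
        buffers.append({
            "OFFSET": end,             # time corresponding to the granulepos
            "OFFSET_END": granulepos,  # the granulepos itself
            "TIMESTAMP": start,        # beginning of the buffer
            "DURATION": end - start,   # length in time of the buffer
        })
    return buffers


for buf in annotate_buffers([480, 960], 48000):
    print(buf)
```

Note that OFFSET always equals TIMESTAMP + DURATION in this simple model,
because the granulepos marks the end of the buffer.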

Ogg media mapping
-----------------

Ogg defines a mapping for each media type that it embeds.

For Vorbis:

  - 3 header packets, on pages with granulepos 0.
     - 1 page with 1 packet: the identification header
     - N pages with 2 packets: comments and codebooks
  - granulepos is samplenumber of next page
  - one packet can contain a variable number of samples, but holds exactly
    one frame that should be handed to the vorbis decoder.

For Theora:

  - 3 header packets, on pages with granulepos 0.
     - 1 page with 1 packet: the identification header
     - N pages with 2 packets: comments and codebooks
  - granulepos is framenumber of last packet in page, where framenumber
    is a combination of keyframe number and p frames since keyframe.
  - one packet contains 1 frame


DEMUXING
========

ogg demuxer
-----------

This ogg demuxer has two modes of operation, which share a significant
amount of code. The first mode is the streaming mode, which is automatically
selected when the demuxer is connected to a non-getrange based element. The
second mode is the random access mode, which is selected when the demuxer is
connected to a getrange based element; in this mode the ogg demuxer can do
full seeking efficiently.

1) the streaming mode.

In this mode, the ogg demuxer receives buffers in the _chain() function which
are then simply submitted to the ogg sync layer. Pages are then processed when
the sync layer detects them, pads are created for new chains and packets are
sent to the peer elements of the pads.

In this mode, no seeking is possible. This is the typical case when the
stream is read from a network source.

In this mode, no setup is done at startup; the pages are just read and decoded.
A new logical chain is detected when one of the pages has the BOS flag set. At
this point the existing pads are removed and new pads are created for all the
logical streams in this new chain.

2) the random access mode.

In this mode, the ogg file is first scanned to detect the position and length
of all chains. This scanning is performed using a recursive binary search
algorithm that is explained below.

    find_chains(start, end)
    {
      ret1 = read_next_pages (start);
      ret2 = read_prev_page (end);

      if (WAS_HEADER (ret1)) {
      }
      else {
      }

    }
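
The pseudocode above leaves both branches empty.  As a rough illustration of
the idea only, and not the demuxer's actual code, here is a Python sketch of
a recursive bisection over chain boundaries.  It assumes a hypothetical
callback serial_at(offset) that returns the serial number of the page
covering a byte offset, that each chain carries a single serial number, and
that serial numbers are not reused between chains.  A real demuxer would
also have to resynchronise on page boundaries after each jump.

```python
def find_chains(serial_at, start, end):
    """Return [(first_offset, serial), ...] for the chains in [start, end]."""
    first = serial_at(start)
    if first == serial_at(end):
        # Same serial at both ends: a single chain covers the whole range.
        return [(start, first)]

    # Invariant: serial_at(lo) == first and serial_at(hi) != first.
    lo, hi = start, end
    while hi - lo > 1:
        mid = lo + (hi - lo) // 2
        if serial_at(mid) == first:
            lo = mid
        else:
            hi = mid

    # hi is the first offset of the next chain; recurse on the remainder.
    return [(start, first)] + find_chains(serial_at, hi, end)


# Toy layout: serial 111 in [0, 40), 222 in [40, 60), 333 in [60, 100).
def toy_serial_at(offset):
    if offset < 40:
        return 111
    if offset < 60:
        return 222
    return 333


print(find_chains(toy_serial_at, 0, 99))
```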

  a) read first and last pages

   start                                                      end
    V                                                          V
    +-----------------------+-------------+--------------------+
    |  111                  |  222        |  333               |
   BOS                     BOS           BOS                  EOS


   after reading start, serial 111, BOS, chain[0] = 111
   after reading end,   serial 333, EOS

   start serialno != end serialno, binary search start, (end-start)/2

   start                    bisect                            end
    V                         V                                V
    +-----------------------+-------------+--------------------+
    |  111                  |  222        |  333               |


   after reading start, serial 111, BOS, chain[0] = 111
   after reading end,   serial 222, EOS

   while (



testcases
---------

 a) stream without BOS

    +----------------------------------------------------------+
       111                                                     |
                                                              EOS

 b) chained stream, first chain without BOS

    +-------------------+--------------------------------------+
       111              | 222                                  |
                       BOS                                    EOS


 c) chained stream

    +-------------------+--------------------------------------+
    |  111              | 222                                  |
   BOS                 BOS                                    EOS


 d) chained stream, second without BOS

    +-------------------+--------------------------------------+
    |  111              | 222                                  |
   BOS                                                        EOS

What can an ogg demuxer do?
---------------------------

An ogg demuxer can read pages and get the granulepos from them.
It can ask the decoder elements to convert a granulepos to time.

An ogg demuxer can also get the granulepos of the first and the last page of a
stream to get the start and end timestamp of that stream.
It can also get the length in bytes of the stream
(when the peer is seekable, that is).

An ogg demuxer is therefore basically able to seek to any byte position and
timestamp.

When asked to seek to a given granulepos, the ogg demuxer should always convert
the value to a timestamp using the peer decoder element conversion function. It
can then binary search the file to eventually end up on the page with the given
granulepos or a granulepos with the same timestamp.
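
The binary search described above can be sketched as follows.  This is a
simplified model, not the demuxer implementation: pages are represented as
an in-memory list of (byte_offset, granulepos) tuples with valid granulepos
values, and granulepos_to_time stands in for the conversion function of the
peer decoder.  A real demuxer bisects on byte offsets and has to find page
boundaries itself.  The search relies on the guarantee that the granulepos
never decreases within a logical stream.

```python
def seek_page(pages, granulepos_to_time, target_time):
    """Index of the first page whose time is >= target_time, or None."""
    lo, hi = 0, len(pages)
    while lo < hi:
        mid = (lo + hi) // 2
        if granulepos_to_time(pages[mid][1]) < target_time:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(pages) else None


# Toy Vorbis-like stream at 48000 Hz: granulepos is a sample count.
toy_pages = [(0, 0), (4000, 48000), (8000, 96000), (12000, 144000)]


def toy_to_time(granulepos):
    return granulepos / 48000


# Seeking to a granulepos: convert it to a timestamp first, then search.
target = toy_to_time(96000)
print(seek_page(toy_pages, toy_to_time, target))
```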

Seeking in ogg currently
------------------------

When seeking in an ogg, the decoders can choose to forward the seek event as a
granulepos or a timestamp to the ogg demuxer.

In the case of a granulepos, the ogg demuxer will seek back to the beginning of
the stream and skip pages until it finds one with the requested timestamp.

In the case of a timestamp, the ogg demuxer also seeks back to the beginning of
the stream. For each page it reads, it asks the decoder element to convert the
granulepos back to a timestamp. The ogg demuxer keeps on skipping pages until
the page has a timestamp greater than or equal to the requested one.

It is therefore important that the decoder elements in vorbis can either
convert a granulepos into a timestamp, or never seek on timestamp on the
oggdemuxer.

The default format on the oggdemuxer source pads is currently defined as the
granulepos of the packets; it is also the value of the OFFSET field in the
GstBuffer.

MUXING
======

Oggmux
------

The ogg muxer's job is to output complete Ogg pages such that the absolute
time represented by the valid (i.e., not -1) granulepos values on those pages
never decreases. This has to be true for all logical streams in the group at
the same time.

To achieve this, encoders are required to pass along the exact time that the
granulepos represents for each ogg packet that they push to the ogg muxer.
This is ESSENTIAL: without this exact time representation of the granulepos,
the muxer cannot produce valid streams.

The ogg muxer has a packet queue per sink pad.  From this queue a page can
be flushed when:
  - total byte size of queued packets exceeds a given value
  - total time duration of queued packets exceeds a given value
  - total byte size of queued packets exceeds maximum Ogg page size
  - eos of the pad
  - encoder sent a command to flush out an ogg page after this new packet
    (in 0.8, through a flush event; in 0.10, with a GstOggBuffer)
  - muxer wants a flush to happen (so it can output pages)
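
The flush conditions listed above could be checked along these lines.  The
class, its field names and the order in which the conditions are tested are
assumptions made for this sketch; they are not taken from the oggmux source.
The encoder-requested and muxer-requested flushes are folded into a single
flag for brevity.

```python
from dataclasses import dataclass, field

# An Ogg page body holds at most 255 segments of at most 255 bytes each.
MAX_PAGE_BODY = 255 * 255


@dataclass
class PacketQueue:
    max_bytes: int          # hypothetical byte-size limit
    max_duration: int       # hypothetical duration limit, in nanoseconds
    packets: list = field(default_factory=list)  # (size_bytes, duration_ns)
    eos: bool = False
    flush_requested: bool = False  # set by the encoder or by the muxer

    def flush_reason(self):
        """Return why a page should be flushed now, or None to keep queuing."""
        total_bytes = sum(size for size, _ in self.packets)
        total_duration = sum(duration for _, duration in self.packets)
        if not self.packets:
            return None
        if self.eos:
            return "eos"
        if self.flush_requested:
            return "requested"
        if total_bytes > MAX_PAGE_BODY:
            return "page-full"
        if total_bytes > self.max_bytes:
            return "bytes"
        if total_duration > self.max_duration:
            return "duration"
        return None


queue = PacketQueue(max_bytes=4096, max_duration=500_000_000)
queue.packets.append((1000, 20_000_000))
print(queue.flush_reason())
queue.packets.append((4000, 20_000_000))
print(queue.flush_reason())
```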

The ogg muxer also has a page queue per sink pad.  This queue collects
Ogg pages from the corresponding packet queue.  Each page is also marked
with the timestamp that the granulepos in the header represents.

A page can be flushed from this collection of page queues when:
- ideally, every page queue has at least one page with a valid granulepos
  -> choose the page, from all queues, with the lowest timestamp value
- if not, muxer can wait if the following limits aren't reached:
  - total byte size of any page queue exceeds a limit
  - total time duration of any page queue exceeds a limit
- if one of these limits is reached, then:
  - request a page flush from packet queue to page queue for each queue
    that does not have pages
  - now take the page from all queues with the lowest timestamp value
  - make sure all later-coming data is marked as old, either to be still
    output (but producing an invalid stream, though it can be fixed later)
    or dropped (which means it's gone forever)
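
The "ideal" case from the list above, where every page queue has a page
available, can be sketched as follows.  The data layout (a dictionary of
per-pad lists holding (timestamp, page) tuples) and the function name are
invented for this example, and every queued page is assumed to carry a valid
granulepos.  The byte and duration limits and the forced flush are left out.

```python
def pick_page(page_queues):
    """Pop the page with the lowest timestamp across all queues.

    Returns (pad, timestamp, page), or None when some queue is empty and
    the muxer therefore has to wait.
    """
    if any(not queue for queue in page_queues.values()):
        return None
    # Only the head of each queue needs to be compared.
    pad = min(page_queues, key=lambda name: page_queues[name][0][0])
    timestamp, page = page_queues[pad].pop(0)
    return pad, timestamp, page


queues = {
    "video": [(40, "v0"), (80, "v1")],
    "audio": [(23, "a0"), (46, "a1")],
}
while True:
    picked = pick_page(queues)
    if picked is None:
        break
    print(picked)
```

The loop stops as soon as one queue runs dry, even though the other still
holds a page; this is the point where the limits described above decide
whether the muxer keeps waiting or forces a flush.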

The oggmuxer uses the offset fields to fill in the granulepos in the pages.

GStreamer implementation details
--------------------------------
As said before, the basic rule is that the ogg muxer needs an exact time
representation for each granulepos.  This needs to be provided by the encoder.

Potential problems are:
 - initial offsets for a raw stream need to be preserved somehow.  Example:
   if the first audio sample has time 0.5 s, the granulepos in the vorbis
   encoder needs to be adjusted to take this into account.
 - initial offsets may need to be on rate boundaries.  Example:
   if the framerate is 5 fps, and the first video frame has time 0.1 s, the
   granulepos cannot correctly represent this timestamp.
   This can be handled out-of-band (initial offset in another muxing format,
   skeleton track with initial offsets, ...)
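
The rate-boundary problem in the second example can be checked with exact
arithmetic.  The helper names below are made up for this sketch, and it
assumes the granulepos simply counts whole frames from time 0.

```python
from fractions import Fraction


def representable(timestamp, framerate):
    """True if the timestamp falls exactly on a frame boundary."""
    return (timestamp * framerate).denominator == 1


def split_offset(timestamp, framerate):
    """Split a timestamp into whole frames and a leftover offset.

    The leftover is the part a frame-counting granulepos cannot express
    and that would have to be carried out-of-band.
    """
    frames = (timestamp * framerate) // 1
    leftover = timestamp - Fraction(frames) / framerate
    return frames, leftover


fps = Fraction(5)
first_frame = Fraction(1, 10)  # 0.1 s, with a frame period of 0.2 s
print(representable(first_frame, fps))
print(split_offset(first_frame, fps))
```

At 5 fps only multiples of 0.2 s land on a frame boundary, so a start time
of 0.1 s leaves 0.1 s that no granulepos value can represent.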

Given that the basic rule for muxing is that the muxer needs an exact timestamp
matching the granulepos, we need some way of communicating this time value
from encoders to the Ogg muxer.  So we need a mechanism to communicate
a granulepos and its time representation for each GstBuffer.

(This is an instance of a more generic problem - having a way to attach
 more fields to a GstBuffer)

Possible ways:
- setting TIMESTAMP to this value: bad - this value represents the end time
  of the buffer, and thus conflicts with GStreamer's idea of what TIMESTAMP
  is.  This would cause problems muxing the encoded stream in other muxing
  formats, or for streaming.  Note that this is what was done in GStreamer 0.8
- setting DURATION to GP_TIME - TIMESTAMP: bad - this breaks the concept of
  duration for this frame.  Take the video example above; each buffer would
  have a correct timestamp, but always a 0.1 s duration as opposed to the
  correct 0.2 s duration
- subclassing GstBuffer: clean, but requires a common header used between
  ogg muxer and all encoders that can be muxed into ogg.  Also, what if
  a format can be muxed into more than one container, and they each have
  their own "extra" info to communicate ?
- adding key/value pairs to GstBuffer: clean, but requires changes to
  core.  Also, the overhead of allocating e.g. a GstStructure for *each* buffer
  may be expensive.
- "cheating":
  - abuse OFFSET to store the timestamp matching this granulepos
  - abuse OFFSET_END to store the granulepos value
  The drawback here is that before, it made sense to use OFFSET and OFFSET_END
  to store a byte count.  Given that this is not used for anything critical
  (you can't store a raw theora or vorbis stream in a file anyway),
  this is what's being done for now.

In practice
-----------
- all encoders of formats that can be muxed into Ogg produce a stream where:
  - OFFSET is abused to be the timestamp corresponding exactly to the
    granulepos
  - OFFSET_END is abused to be the granulepos of the encoded buffer
  - TIMESTAMP is the timestamp matching the beginning of the buffer
  - DURATION is the length in time of the buffer

- initial delays should be handled in the GStreamer encoders by mangling
  the granulepos of the encoded packet to take the delay into account as
  best as possible and store that in OFFSET;
  this then brings TIMESTAMP + DURATION to within less
  than a frame period of the granulepos's time representation.
  The ogg muxer will then create new ogg packets with this OFFSET as
  the granulepos.  So in effect, the granulepos produced by the encoders
  does not get used directly.

TODO
----
- decide on a proper mechanism for communicating extra per-buffer fields
- the ogg muxer sets timestamp and duration on outgoing ogg pages based on
  timestamp/duration of incoming ogg packets.
  Note that:
  - since the ogg muxer *has* to output pages sorted by gp time, representing
    end time of the page, this means that the buffer's timestamps are not
    necessarily monotonically increasing
  - timestamp + duration of buffers don't match up; the duration represents
    the length of the ogg page *for that stream*.  Hence, for a normal
    two-stream file, the sum of all durations is twice the length of the
    muxed file.

TESTING
-------
Proper muxing can be tested by generating test files with command lines like:
- video and audio start from 0:
gst-launch -v videotestsrc ! theoraenc ! oggmux audiotestsrc ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg

- video starts after audio:
gst-launch -v videotestsrc timestamp-offset=500000000 ! theoraenc ! oggmux audiotestsrc ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg

- audio starts after video:
gst-launch -v videotestsrc ! theoraenc ! oggmux audiotestsrc timestamp-offset=500000000 ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg

The resulting files can be verified with oggz-validate for correctness.