This document describes some things to know about the Ogg format, as well
as implementation details in GStreamer.

INTRODUCTION
============

ogg and the granulepos
----------------------

An ogg stream contains pages with a serial number and a granulepos.
The granulepos is a 64-bit signed integer. It is a value that in some way
represents a time since the start of the stream.
Its interpretation, however, is both codec-specific and stream-specific.

ogg has no notion of time: it only knows about bytes and granulepos values
on pages.

The granule position is just a number; the only guarantee for a valid ogg
stream is that within a logical stream, this number never decreases.

While logically a granulepos value can be constructed for every ogg packet,
the page is marked with only one granulepos value: the granulepos of the
last packet to end on that page.

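As a minimal sketch of where the serial number and the granulepos live, the
snippet below parses the fixed 27-byte page header laid out in RFC 3533. The
helper name parse_page_header and the synthetic page are illustrative
assumptions, not GStreamer code.

```python
import struct

# Fixed part of an Ogg page header (RFC 3533), little-endian:
# capture pattern, version, header type flags, granulepos (signed 64 bit),
# serial number, page sequence number, CRC, number of segments.
PAGE_HEADER = struct.Struct("<4sBBqIIIB")

FLAG_CONTINUED = 0x01
FLAG_BOS = 0x02
FLAG_EOS = 0x04


def parse_page_header(data):
    """Return (serialno, granulepos, flags) for the page starting at data[0]."""
    magic, version, flags, granulepos, serialno, _seq, _crc, _nsegs = \
        PAGE_HEADER.unpack_from(data)
    if magic != b"OggS" or version != 0:
        raise ValueError("not an Ogg page")
    return serialno, granulepos, flags


# Hypothetical page: serial 111, granulepos 4096, BOS flag set, no segments.
page = PAGE_HEADER.pack(b"OggS", 0, FLAG_BOS, 4096, 111, 0, 0, 0)
serialno, granulepos, flags = parse_page_header(page)
```

The sketch ignores the segment table and the CRC check, which a real reader
would need in order to find the next page.
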
theora and the granulepos
-------------------------

The granulepos in theora is an encoding of the frame number of the last
key frame ("i frame"), and the number of frames since the last key frame
("p frame"). The granulepos is constructed as the sum of the first number,
shifted to the left by granuleshift bits, and the second number:
  granulepos = (iframe << granuleshift) + pframe

(This means that given a frame number or a timestamp, one cannot generate
the one and only granulepos for that page; several granulepos possibilities
correspond to this frame number. You also need the last keyframe, as well
as the granuleshift.
However, given a granulepos, the theora codec can still map that to a
unique timestamp and frame number for that theora stream.)

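The encoding described above can be sketched as follows. The function names
and the granuleshift value of 6 are assumptions made for illustration; a real
stream carries its granuleshift in the theora headers.

```python
def theora_granulepos(keyframe, frames_since_keyframe, granuleshift):
    """Pack the last keyframe number and the frame count since it."""
    if frames_since_keyframe >= (1 << granuleshift):
        raise ValueError("too many frames since the last keyframe")
    return (keyframe << granuleshift) + frames_since_keyframe


def theora_split(granulepos, granuleshift):
    """Unpack a granulepos into (keyframe, frames_since_keyframe)."""
    keyframe = granulepos >> granuleshift
    frames_since_keyframe = granulepos - (keyframe << granuleshift)
    return keyframe, frames_since_keyframe


def theora_frame_number(granulepos, granuleshift):
    """Frame number that a granulepos refers to."""
    keyframe, frames_since_keyframe = theora_split(granulepos, granuleshift)
    return keyframe + frames_since_keyframe


SHIFT = 6  # example value only
gp_a = theora_granulepos(64, 3, SHIFT)  # keyframe 64, 3 frames later
gp_b = theora_granulepos(60, 7, SHIFT)  # keyframe 60, 7 frames later
```

Both gp_a and gp_b decode to frame 67, which shows why a frame number alone
does not determine the granulepos, while each granulepos still maps to exactly
one frame number.
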
Note: currently theora stores the "presentation time" as the granulepos;
      i.e. a first data page with one packet contains one video frame and
      will be marked with 0/0. Changing that to be 1/0 (so that it
      represents the number of decodable frames up to that point, like
      for Vorbis) is being discussed.

vorbis and granulepos
---------------------

In Vorbis, the granulepos represents the number of samples that can be
decoded from all packets up to that point.

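Since the Vorbis granulepos is a sample count, converting it to a time only
needs the sample rate of the stream. A minimal sketch, assuming the rate is
already known from the stream headers and that time is expressed in
nanoseconds; the function name is hypothetical:

```python
NSEC_PER_SEC = 1_000_000_000


def vorbis_granulepos_to_time(granulepos, sample_rate):
    """Return the time in nanoseconds, or None for an unset granulepos."""
    if granulepos < 0:
        return None
    # Integer arithmetic avoids floating point rounding on long streams.
    return granulepos * NSEC_PER_SEC // sample_rate
```

For example, a granulepos of 22050 at 44100 Hz corresponds to half a second.
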
In GStreamer, the vorbisenc element produces a stream where:
 - OFFSET is the time corresponding to the granulepos
     (rather than the number of bytes produced before)
 - OFFSET_END is the granulepos of the produced vorbis buffer
 - TIMESTAMP is the timestamp matching the beginning of the buffer
 - DURATION is set to the length in time of the buffer

Ogg media mapping
-----------------

Ogg defines a mapping for each media type that it embeds.

For Vorbis:

  - 3 header packets, on pages with granulepos 0.
     - 1 page with 1 packet: header identification
     - N pages with 2 packets: comments and codebooks
  - granulepos is samplenumber of next page
  - one packet can contain a variable number of samples but one frame
    that should be handed to the vorbis decoder.

For Theora:

  - 3 header packets, on pages with granulepos 0.
     - 1 page with 1 packet: header identification
     - N pages with 2 packets: comments and codebooks
  - granulepos is framenumber of last packet in page, where framenumber
    is a combination of keyframe number and p frames since keyframe.
  - one packet contains 1 frame


DEMUXING
========

ogg demuxer
-----------

This ogg demuxer has two modes of operation, which share a significant
amount of code. The first mode is the streaming mode, which is automatically
selected when the demuxer is connected to a non-getrange based element. When
connected to a getrange based element, the ogg demuxer can do full seeking
with great efficiency.

1) the streaming mode.

In this mode, the ogg demuxer receives buffers in the _chain() function which
are then simply submitted to the ogg sync layer. Pages are then processed when
the sync layer detects them, pads are created for new chains and packets are
sent to the peer elements of the pads.

In this mode, no seeking is possible. This is the typical case when the
stream is read from a network source.

In this mode, no setup is done at startup; the pages are just read and decoded.
A new logical chain is detected when one of the pages has the BOS flag set. At
this point the existing pads are removed and new pads are created for all the
logical streams in this new chain.

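The chain handling in streaming mode can be modelled roughly as below. This is
a toy sketch, not the oggdemux code: the class and method names are invented,
pads are plain lists, and it assumes that all BOS pages of a chain arrive
before any of its data pages.

```python
class StreamingDemux:
    """Toy model of streaming mode: one pad per serial number."""

    def __init__(self):
        self.pads = {}              # serialno -> payloads pushed on that pad
        self.in_bos_group = False   # True while reading the BOS pages of a chain

    def handle_page(self, serialno, bos, payload):
        if bos:
            if not self.in_bos_group:
                # First BOS page of a new chain: remove the existing pads.
                self.pads.clear()
                self.in_bos_group = True
            self.pads[serialno] = []
        else:
            self.in_bos_group = False
        pad = self.pads.get(serialno)
        if pad is not None:         # pages of unknown streams are ignored
            pad.append(payload)


demux = StreamingDemux()
for page in [(111, True, "h111"), (112, True, "h112"),
             (111, False, "a"), (112, False, "b")]:
    demux.handle_page(*page)
first_chain = sorted(demux.pads)

for page in [(222, True, "h222"), (222, False, "c")]:
    demux.handle_page(*page)
second_chain = sorted(demux.pads)
```

After the second BOS group, only the pad for serial 222 remains, matching the
"remove existing pads, create new pads" behaviour described above.
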
2) the random access mode.

In this mode, the ogg file is first scanned to detect the position and length
of all chains. This scanning is performed using a recursive binary search
algorithm that is explained below.

  find_chains(start, end)
  {
    ret1 = read_next_pages (start);
    ret2 = read_prev_page (end);

    if (WAS_HEADER (ret1)) {
    }
    else {
    }

  }

  a) read first and last pages

    start                                                    end
    V                                                          V
    +-----------------------+-------------+--------------------+
    |          111          |     222     |        333         |
    BOS                     BOS           BOS                EOS


   after reading start, serial 111, BOS, chain[0] = 111
   after reading end, serial 333, EOS

   start serialno != end serialno, binary search start, (end-start)/2

    start                      bisect                        end
    V                             V                            V
    +-----------------------+-------------+--------------------+
    |          111          |     222     |        333         |


   after reading start, serial 111, BOS, chain[0] = 111
   after reading end, serial 222, EOS

   while (

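Because the outline above is incomplete, the following is only one possible
reading of the bisection, on a deliberately simplified model: the file is a
list with one serial number per page, every chain holds a single logical
stream, and serial numbers are not reused across chains. None of these names
come from the demuxer source.

```python
def find_chains(serials, start, end):
    """Return [(serialno, first_page, last_page)] for pages start..end."""
    first_serial = serials[start]
    if serials[end] == first_serial:
        # Same serial at both ends: a single chain covers the whole range.
        return [(first_serial, start, end)]
    # Bisect for the last page that still belongs to the first chain.
    # Invariant: serials[lo] == first_serial and serials[hi] != first_serial.
    lo, hi = start, end
    while hi - lo > 1:
        mid = lo + (hi - lo) // 2
        if serials[mid] == first_serial:
            lo = mid
        else:
            hi = mid
    # The remaining chains start at hi; find them the same way.
    return [(first_serial, start, lo)] + find_chains(serials, hi, end)


# Three chains, sized like the diagram above.
serials = [111] * 23 + [222] * 13 + [333] * 20
chains = find_chains(serials, 0, len(serials) - 1)
```

A real implementation bisects on byte offsets and has to resynchronise on a
page boundary after every jump, which this sketch leaves out.
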
testcases
---------

 a) stream without BOS

    +----------------------------------------------------------+
                                111                            |
                                                             EOS

 b) chained stream, first chain without BOS

    +-------------------+--------------------------------------+
             111        |                 222                  |
                        BOS                                  EOS


 c) chained stream

    +-------------------+--------------------------------------+
    |        111        |                 222                  |
    BOS                 BOS                                  EOS


 d) chained stream, second without BOS

    +-------------------+--------------------------------------+
    |        111        |                 222                  |
    BOS                                                      EOS

What can an ogg demuxer do?
---------------------------

An ogg demuxer can read pages and get the granulepos from them.
It can ask the decoder elements to convert a granulepos to time.

An ogg demuxer can also get the granulepos of the first and the last page of a
stream to get the start and end timestamp of that stream.
It can also get the length in bytes of the stream
(when the peer is seekable, that is).

An ogg demuxer is therefore basically able to seek to any byte position and
timestamp.

When asked to seek to a given granulepos, the ogg demuxer should always convert
the value to a timestamp using the peer decoder element's conversion function.
It can then binary search the file to eventually end up on the page with the
given granulepos or a granulepos with the same timestamp.

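A rough sketch of such a timestamp-driven binary search follows. It works on
an in-memory list of pages rather than on a file, and to_time stands in for
the conversion function of the peer decoder; both are assumptions made to keep
the example short.

```python
def seek_to_time(pages, to_time, target):
    """Binary search for the first page whose time reaches target.

    pages:   list of (byte_offset, granulepos) in file order
    to_time: stand-in for the decoder's granulepos-to-time conversion
    Returns the byte offset of that page, or None if target is past the end.
    """
    lo, hi = 0, len(pages)
    while lo < hi:
        mid = (lo + hi) // 2
        if to_time(pages[mid][1]) < target:
            lo = mid + 1
        else:
            hi = mid
    return pages[lo][0] if lo < len(pages) else None


# Hypothetical audio stream at 1024 samples per second.
pages = [(0, 1024), (4000, 2048), (8000, 3072), (12000, 4096)]


def to_time(granulepos):
    return granulepos / 1024
```

The search is only valid because the granulepos never decreases within a
logical stream, so the converted times are sorted as well.
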
Seeking in ogg currently
------------------------

When seeking in an ogg stream, the decoders can choose to forward the seek
event as a granulepos or a timestamp to the ogg demuxer.

In the case of a granulepos, the ogg demuxer will seek back to the beginning of
the stream and skip pages until it finds one with the requested timestamp.

In the case of a timestamp, the ogg demuxer also seeks back to the beginning of
the stream. For each page it reads, it asks the decoder element to convert the
granulepos back to a timestamp. The ogg demuxer keeps on skipping pages until
the page has a timestamp greater than or equal to the requested one.

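The timestamp case amounts to a linear scan from the start of the stream. A
minimal sketch, in which the list of granulepos values, the to_time callback
and the decision to skip pages marked -1 are all assumptions:

```python
def linear_seek(granulepos_per_page, to_time, target):
    """Return the index of the first page whose time is >= target, or None."""
    for index, granulepos in enumerate(granulepos_per_page):
        if granulepos < 0:
            continue  # no packet ends on this page, nothing to convert
        if to_time(granulepos) >= target:
            return index
    return None


def samples_to_seconds(granulepos):
    return granulepos / 44100  # hypothetical 44.1 kHz audio stream


page_granulepos = [0, -1, 44100, 88200]
```

Every page before the result is read and discarded, which is why this is much
slower than a bisection on large files.
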
It is therefore important that the decoder elements in vorbis can either
convert a granulepos into a timestamp or never seek on timestamp on the
oggdemuxer.

The default format on the oggdemuxer source pads is currently defined as the
granulepos of the packets; it is also the value of the OFFSET field in the
GstBuffer.

MUXING
======

Oggmux
------

The ogg muxer's job is to output complete Ogg pages such that the absolute
time represented by the valid (i.e. not -1) granulepos values on those pages
never decreases. This has to be true for all logical streams in the group at
the same time.

To achieve this, encoders are required to pass along the exact time that the
granulepos represents for each ogg packet that they push to the ogg muxer.
This is ESSENTIAL: without this exact time representation of the granulepos,
the muxer cannot produce valid streams.

The ogg muxer has a packet queue per sink pad. From this queue a page can
be flushed when:
  - total byte size of queued packets exceeds a given value
  - total time duration of queued packets exceeds a given value
  - total byte size of queued packets exceeds maximum Ogg page size
  - eos of the pad
  - encoder sent a command to flush out an ogg page after this new packet
    (in 0.8, through a flush event; in 0.10, with a GstOggBuffer)
  - muxer wants a flush to happen (so it can output pages)

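The flush conditions listed above can be condensed into one predicate. This is
a sketch only: the class, its field names and the two thresholds are
hypothetical, and the page size limit is taken as 255 segments of 255 bytes.

```python
from dataclasses import dataclass, field

MAX_PAGE_BODY = 255 * 255  # upper bound on the payload of one Ogg page


@dataclass
class PacketQueue:
    max_bytes: int         # hypothetical tunable
    max_duration: float    # hypothetical tunable, in seconds
    packets: list = field(default_factory=list)  # (size, duration) tuples
    eos: bool = False
    flush_requested: bool = False  # set by the encoder or by the muxer

    def should_flush(self):
        """True if the queued packets should be turned into a page now."""
        size = sum(packet[0] for packet in self.packets)
        duration = sum(packet[1] for packet in self.packets)
        return bool(self.packets) and (
            size > self.max_bytes
            or duration > self.max_duration
            or size > MAX_PAGE_BODY
            or self.eos
            or self.flush_requested
        )


queue = PacketQueue(max_bytes=4096, max_duration=0.5)
queue.packets.append((1000, 0.1))
small = queue.should_flush()
queue.packets.append((4000, 0.1))
large = queue.should_flush()
```

Here small is False because no limit is hit yet, and large is True because the
queued bytes exceed max_bytes.
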
The ogg muxer also has a page queue per sink pad. This queue collects
Ogg pages from the corresponding packet queue. Each page is also marked
with the timestamp that the granulepos in the header represents.

A page can be flushed from this collection of page queues when:
- ideally, every page queue has at least one page with a valid granulepos
  -> choose the page, from all queues, with the lowest timestamp value
- if not, muxer can wait if the following limits aren't reached:
  - total byte size of any page queue exceeds a limit
  - total time duration of any page queue exceeds a limit
- if one of these limits is reached, then:
  - request a page flush from packet queue to page queue for each queue
    that does not have pages
  - now take the page from all queues with the lowest timestamp value
  - make sure all later-coming data is marked as old, either to be still
    output (but producing an invalid stream, though it can be fixed later)
    or dropped (which means it's gone forever)

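The ideal case, in which every page queue has a page available, can be
sketched as below. The function name and the (timestamp, page) tuples are
illustrative assumptions, and the byte and duration limits are not modelled.

```python
def pick_page(page_queues):
    """Pop the page with the lowest timestamp, or return None to keep waiting.

    page_queues: one list per sink pad, holding (timestamp, page) tuples in
    the order in which the pages were produced.
    """
    if any(not queue for queue in page_queues):
        return None  # some stream has no page yet: wait, or force a flush
    best = min(page_queues, key=lambda queue: queue[0][0])
    return best.pop(0)


video = [(0.2, "v0"), (0.4, "v1")]
audio = [(0.1, "a0"), (0.5, "a1")]
order = []
while True:
    picked = pick_page([video, audio])
    if picked is None:
        break
    order.append(picked[1])
```

The loop stops as soon as the video queue runs empty, even though an audio
page is still queued; that is the point at which the limits described above
would decide whether to wait or to force a flush.
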
The oggmuxer uses the offset fields to fill in the granulepos in the pages.

GStreamer implementation details
--------------------------------
As said before, the basic rule is that the ogg muxer needs an exact time
representation for each granulepos. This needs to be provided by the encoder.

Potential problems are:
 - initial offsets for a raw stream need to be preserved somehow. Example:
   if the first audio sample has time 0.5 s, the granulepos in the vorbis
   encoder needs to be adjusted to take this into account.
 - initial offsets may need to be on rate boundaries. Example:
   if the framerate is 5 fps, and the first video frame has time 0.1 s, the
   granulepos cannot correctly represent this timestamp.
   This can be handled out-of-band (initial offset in another muxing format,
   skeleton track with initial offsets, ...)

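The rate boundary problem is plain arithmetic: at 5 fps a frame lasts 0.2 s,
so only multiples of 0.2 s can be expressed as a frame count. A small check,
with a hypothetical helper name and exact fractions to avoid rounding noise:

```python
from fractions import Fraction


def on_frame_boundary(timestamp, framerate):
    """True if timestamp (in seconds) is a whole number of frame periods."""
    frames = Fraction(timestamp) * Fraction(framerate)
    return frames.denominator == 1


first_frame = Fraction(1, 10)  # the 0.1 s offset from the example above
representable = on_frame_boundary(first_frame, 5)
```

With these values representable is False, since 0.1 s corresponds to half a
frame period.
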
Given that the basic rule for muxing is that the muxer needs an exact timestamp
matching the granulepos, we need some way of communicating this time value
from encoders to the Ogg muxer. So we need a mechanism to communicate
a granulepos and its time representation for each GstBuffer.

(This is an instance of a more generic problem - having a way to attach
more fields to a GstBuffer)

Possible ways:
- setting TIMESTAMP to this value: bad - this value represents the end time
  of the buffer, and thus conflicts with GStreamer's idea of what TIMESTAMP
  is. This would cause problems muxing the encoded stream in other muxing
  formats, or for streaming. Note that this is what was done in GStreamer 0.8
- setting DURATION to GP_TIME - TIMESTAMP: bad - this breaks the concept of
  duration for this frame. Take the video example above; each buffer would
  have a correct timestamp, but always a 0.1 s duration as opposed to the
  correct 0.2 s duration
- subclassing GstBuffer: clean, but requires a common header used between
  ogg muxer and all encoders that can be muxed into ogg. Also, what if
  a format can be muxed into more than one container, and they each have
  their own "extra" info to communicate?
- adding key/value pairs to GstBuffer: clean, but requires changes to
  core. Also, the overhead of allocating e.g. a GstStructure for *each* buffer
  may be expensive.
- "cheating":
  - abuse OFFSET to store the timestamp matching this granulepos
  - abuse OFFSET_END to store the granulepos value
  The drawback here is that before, it made sense to use OFFSET and OFFSET_END
  to store a byte count. Given that this is not used for anything critical
  (you can't store a raw theora or vorbis stream in a file anyway),
  this is what's being done for now.

In practice
-----------
- all encoders of formats that can be muxed into Ogg produce a stream where:
  - OFFSET is abused to be the timestamp corresponding exactly to the
    granulepos
  - OFFSET_END is abused to be the granulepos of the encoded buffer
  - TIMESTAMP is the timestamp matching the beginning of the buffer
  - DURATION is the length in time of the buffer

- initial delays should be handled in the GStreamer encoders by mangling
  the granulepos of the encoded packet to take the delay into account as
  best as possible and store that in OFFSET;
  this then brings TIMESTAMP + DURATION to within less
  than a frame period of the granulepos's time representation.
  The ogg muxer will then create new ogg packets with this OFFSET as
  the granulepos. So in effect, the granulepos produced by the encoders
  does not get used directly.

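To make the four fields concrete, here is a sketch that fills them in for
consecutive audio packets of a stream that starts at time 0, with the
granulepos counted in samples as for Vorbis. EncodedBuffer and mark_buffers
are stand-ins invented for this example, not GstBuffer API.

```python
from dataclasses import dataclass

NSEC_PER_SEC = 1_000_000_000


@dataclass
class EncodedBuffer:
    timestamp: int   # TIMESTAMP: start of the buffer, in ns
    duration: int    # DURATION: length of the buffer, in ns
    offset: int      # OFFSET: time matching the granulepos, in ns
    offset_end: int  # OFFSET_END: the granulepos itself


def mark_buffers(samples_per_packet, rate):
    """Fill in the four fields for consecutive audio packets."""
    buffers = []
    granulepos = 0
    for samples in samples_per_packet:
        start = granulepos * NSEC_PER_SEC // rate
        granulepos += samples
        end = granulepos * NSEC_PER_SEC // rate
        buffers.append(EncodedBuffer(start, end - start, end, granulepos))
    return buffers


buffers = mark_buffers([1000, 2000], rate=8000)
```

In this simple case TIMESTAMP + DURATION equals OFFSET for every buffer; with
an initial delay the two would only agree to within a frame period, as noted
above.
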
TODO
----
- decide on a proper mechanism for communicating extra per-buffer fields
- the ogg muxer sets timestamp and duration on outgoing ogg pages based on
  timestamp/duration of incoming ogg packets.
  Note that:
  - since the ogg muxer *has* to output pages sorted by gp time, representing
    end time of the page, this means that the buffers' timestamps are not
    necessarily monotonically increasing
  - timestamp + duration of buffers don't match up; the duration represents
    the length of the ogg page *for that stream*. Hence, for a normal
    two-stream file, the sum of all durations is twice the length of the
    muxed file.

TESTING
-------
Proper muxing can be tested by generating test files with command lines like:
- video and audio start from 0:
gst-launch -v videotestsrc ! theoraenc ! oggmux audiotestsrc ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg

- video starts after audio:
gst-launch -v videotestsrc timestamp-offset=500000000 ! theoraenc ! oggmux audiotestsrc ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg

- audio starts after video:
gst-launch -v videotestsrc ! theoraenc ! oggmux audiotestsrc timestamp-offset=500000000 ! audioconvert ! vorbisenc ! identity ! oggmux0. oggmux0. ! filesink location=test.ogg

The resulting files can be verified with oggz-validate for correctness.