libxmlb
=======

Introduction
------------

XML is slow to parse, and strings inside the document cannot be memory mapped
because they do not have a trailing NUL char. The libxmlb library takes XML
source and converts it to a structured binary representation with a
deduplicated string table, in which every string includes its trailing NUL.
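
To make the string table idea concrete, here is a minimal C sketch of a deduplicating, NUL-terminated string table. It is a hypothetical illustration (the `strtab` type and its functions are invented here), not libxmlb's real on-disk format or API. Every distinct string is stored once together with its trailing NUL, so a lookup can hand back a pointer straight into the buffer instead of copying.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string table: each distinct string is stored once,
 * including its trailing NUL, in one contiguous buffer. */
#define STRTAB_CAP 4096

struct strtab {
    char buf[STRTAB_CAP];
    size_t len; /* bytes used so far */
};

/* Return the offset of `s`, appending it (with its NUL) only if an
 * identical string is not already stored. Returns (size_t)-1 when full. */
static size_t strtab_intern(struct strtab *t, const char *s)
{
    size_t n = strlen(s) + 1; /* +1 keeps the NUL in the table */

    /* Linear scan for brevity: the buffer is a run of NUL-terminated
     * strings, so stepping by strlen()+1 visits each one in turn. */
    for (size_t off = 0; off < t->len; off += strlen(t->buf + off) + 1) {
        if (strcmp(t->buf + off, s) == 0)
            return off; /* duplicate: reuse the existing copy */
    }
    if (t->len + n > STRTAB_CAP)
        return (size_t)-1;
    memcpy(t->buf + t->len, s, n);
    t->len += n;
    return t->len - n;
}

/* The result points into the buffer and is already NUL-terminated,
 * so the caller can use it as a C string without copying. */
static const char *strtab_get(const struct strtab *t, size_t off)
{
    assert(off < t->len);
    return t->buf + off;
}
```

Interning `"title"`, `"price"` and then `"title"` again stores only two strings (12 bytes) and returns offset 0 for both `"title"` calls. A production implementation would likely use a hash table instead of the linear scan used here.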

This allows an application to mmap the binary XML file, run an XPath query and
return some strings without parsing the entire document. This is all done
using (almost) zero allocations and without copying any of the binary data.
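
The zero-copy part depends on those embedded NULs. The POSIX-only sketch below maps a file read-only with `mmap()` and returns strings as pointers into the mapping. `map_file()` and `string_at()` are hypothetical names invented for this example; libxmlb's own loader is not shown, and the file here is just a flat run of NUL-terminated strings rather than a real `.xmlb` file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure (or for an
 * empty file, which cannot be mapped). */
static const char *map_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Return the string at byte offset `off` as a pointer into the mapping:
 * no allocation and no copy. Returns NULL if `off` is out of range or
 * no NUL terminator exists before the end of the mapping. */
static const char *string_at(const char *map, size_t len, size_t off)
{
    if (off >= len || memchr(map + off, '\0', len - off) == NULL)
        return NULL;
    return map + off;
}
```

The only allocation is the mapping itself; the caller must `munmap()` it once none of the returned pointers are in use any more.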

Because each node in the binary XML file encodes the position of the 'next'
node at the same level, skipping whole subtrees is trivial. A 10MB binary XML
file can be loaded from disk **and** queried in a few milliseconds.
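
Here is a sketch of why storing the 'next' sibling makes skipping subtrees cheap. It assumes a hypothetical layout, not libxmlb's: nodes are kept in document order, a node's first child immediately follows it, and each node records the index of its next sibling (0 meaning none). Searching the children of a node then touches only those children, never their descendants.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node record: nodes are stored in document (pre-order)
 * order, so a node's first child, if any, is the very next entry. */
struct node {
    const char *element; /* element name */
    unsigned depth;      /* 0 for the root */
    size_t next;         /* index of the next sibling, 0 if none */
};

/* Find the child of `parent` named `name`. Returns its index or -1.
 * `*visited` counts how many nodes were examined. */
static long find_child(const struct node *nodes, size_t n_nodes,
                       size_t parent, const char *name, unsigned *visited)
{
    size_t i = parent + 1;

    /* No entry at depth+1 right after the parent means no children. */
    if (i >= n_nodes || nodes[i].depth != nodes[parent].depth + 1)
        return -1;

    for (;;) {
        (*visited)++;
        if (strcmp(nodes[i].element, name) == 0)
            return (long)i;
        if (nodes[i].next == 0)
            return -1;
        /* Jump straight to the next sibling, skipping every
         * descendant of nodes[i] without looking at it. */
        i = nodes[i].next;
    }
}
```

For a `bookstore` with two `book` children, each holding a `title` and a `price`, searching `bookstore` for a missing `magazine` child examines only the two `book` nodes and jumps over both of their subtrees.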

The binary XML is not designed to be small. It is usually about half the size
of the text XML when much of the tag content is duplicated, but it can be
larger than the original XML file. This isn't important: the fast query speed
and the ability to mmap strings without copies more than make up for the larger
on-disk size. If you want to compress your XML, this library probably isn't for
you; just use gzip, which gives an excellent compression ratio for data like
this.

For example:

```
$ xb-tool compile fedora.xmlb fedora.xml.gz

$ du -h fedora.xml*
12M     fedora.xmlb
3.6M    fedora.xml.gz

$ time xb-tool query fedora.xmlb "components/component[@type=desktop]/id[text()=firefox.desktop]"
RESULT: firefox.desktop
real    0m0.011s
user    0m0.010s
sys     0m0.001s
```

XPath
=====

This library implements only a small subset of XPath. The tables below list
what is supported; in short, queries are restricted to element names,
attributes and text.

The examples below use the following XML document:

```xml
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book percentage="99">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
```

Selecting Nodes
---------------

XPath uses path expressions to select nodes in an XML document. The only
things libxmlb can return are nodes.

| Example | Description | Supported |
| --- | --- | --- |
| `/bookstore` | Returns the root `bookstore` element | ✔ |
| `/bookstore/book` | Returns all `book` elements | ✔ |
| `//book` | Returns all `book` elements, no matter where they are | ✖ |
| `bookstore//book` | Returns `book` elements that are descendants of `bookstore` | ✖ |
| `@lang` | Returns attributes named `lang` | ✖ |
| `/bookstore/.` | Returns the `bookstore` node | ✖ |
| `/bookstore/book/*` | Returns all `title` and `price` nodes of each `book` node | ✔ |
| `/bookstore/book/child::*` | Returns all `title` and `price` nodes of each `book` node | ✔ |
| `/bookstore/book/title/..` | Returns the `book` nodes that have a `title` | ✔ |
| `/bookstore/book/parent::*` | Returns `bookstore`, the parent of `book` | ✔ |
| `/bookstore/book/parent::bookstore` | Returns the parent of `book` if it is named `bookstore` | ✖ |
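
The supported path steps in the table (element names, `*` and `..`) can be modelled with a small evaluator. This is a hypothetical sketch over an in-memory node array, not libxmlb's query engine: `eval_path()` and the `struct node` layout are invented, only absolute paths are accepted, and the `child::` and `parent::` axis forms are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HITS 64

/* Hypothetical pre-order node array: a node's first child, if any,
 * is the next entry; `next` links siblings (0 means none). */
struct node {
    const char *element;
    unsigned depth;
    long parent; /* index of the parent, -1 for the root */
    size_t next;
};

/* Does one path step (not NUL-terminated, `len` bytes) match? */
static int step_matches(const char *step, size_t len, const char *element)
{
    if (len == 1 && step[0] == '*')
        return 1;
    return strlen(element) == len && strncmp(step, element, len) == 0;
}

static void add_unique(size_t *set, size_t *n, size_t idx)
{
    for (size_t k = 0; k < *n; k++)
        if (set[k] == idx)
            return;
    if (*n < MAX_HITS)
        set[(*n)++] = idx;
}

/* Evaluate an absolute path built from element names, '*' and '..'.
 * Writes matching node indices to `out` (MAX_HITS slots) and returns
 * how many there are. */
static size_t eval_path(const struct node *nodes, size_t n_nodes,
                        const char *path, size_t *out)
{
    size_t cur[MAX_HITS], nxt[MAX_HITS];
    size_t n_cur = 0;
    int first_step = 1;

    if (path[0] != '/' || n_nodes == 0)
        return 0;

    for (const char *p = path + 1; *p != '\0';) {
        const char *slash = strchr(p, '/');
        size_t len = slash ? (size_t)(slash - p) : strlen(p);
        size_t n_nxt = 0;

        if (first_step) {
            /* The first step can only match the root element. */
            if (step_matches(p, len, nodes[0].element))
                add_unique(nxt, &n_nxt, 0);
            first_step = 0;
        } else if (len == 2 && strncmp(p, "..", 2) == 0) {
            for (size_t k = 0; k < n_cur; k++)
                if (nodes[cur[k]].parent >= 0)
                    add_unique(nxt, &n_nxt, (size_t)nodes[cur[k]].parent);
        } else {
            for (size_t k = 0; k < n_cur; k++) {
                size_t c = cur[k] + 1; /* first child, if any */
                if (c >= n_nodes || nodes[c].depth != nodes[cur[k]].depth + 1)
                    continue;
                for (;;) { /* walk the siblings via `next` */
                    if (step_matches(p, len, nodes[c].element))
                        add_unique(nxt, &n_nxt, c);
                    if (nodes[c].next == 0)
                        break;
                    c = nodes[c].next;
                }
            }
        }
        memcpy(cur, nxt, n_nxt * sizeof cur[0]);
        n_cur = n_nxt;
        p = slash ? slash + 1 : p + len;
    }
    memcpy(out, cur, n_cur * sizeof cur[0]);
    return n_cur;
}
```

On the sample `bookstore` document, `/bookstore/book` yields the two `book` nodes, `/bookstore/book/*` yields their four children, and `/bookstore/book/title/..` climbs back to the two `book` nodes. Results are de-duplicated so that `..` reached from several children returns each parent once.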

Predicates
----------

Predicates are used to find a specific node or a node that contains a specific
value. Predicates are always embedded in square brackets.

| Example | Description | Supported |
| --- | --- | --- |
| `/bookstore/book[1]` | Returns the first `book` element | ✔ |
| `/bookstore/book[first()]` | Returns the first `book` element | ✔ |
| `/bookstore/book[last()]` | Returns the last `book` element | ✔ |
| `/bookstore/book[last()-1]` | Returns the second-to-last `book` element | ✖ |
| `/bookstore/book[position()<3]` | Returns the first two `book` elements | ✔ |
| `/bookstore/book[upper-case(text())=='HARRY POTTER']` | Returns the first book | ✔ |
| `/bookstore/book[@percentage>=90]` | Returns the books whose `percentage` is at least 90 | ✔ |
| `/bookstore/book/title[@lang]` | Returns titles with an attribute named `lang` | ✔ |
| `/bookstore/book/title[@lang='en']` | Returns titles whose `lang` attribute equals `en` | ✔ |
| `/bookstore/book/title[@lang!='en']` | Returns titles whose `lang` attribute is not `en` | ✔ |
| `/bookstore/book/title[@lang<='zz_ZZ']` | Returns titles whose `lang` attribute is `<=` `zz_ZZ` | ✔ |
| `/bookstore/book[price>35.00]` | Returns the books with a price greater than 35 | ✖ |
| `/bookstore/book[price>35.00]/title` | Returns the titles of books with a price greater than 35 | ✖ |
| `/bookstore/book/title[text()='Learning XML']` | Returns the `title` node with matching text | ✔ |
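
Attribute predicates such as `[@lang]`, `[@lang='en']` and `[@lang!='en']` boil down to an attribute lookup plus a string comparison. The hypothetical helper below evaluates just those three forms against one element's attributes. It is an illustration, not libxmlb's parser, and it follows XPath 1.0 in treating any comparison against a missing attribute as false.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct attr {
    const char *name;
    const char *value;
};

/* Look up an attribute by name (`name_len` bytes, not NUL-terminated). */
static const char *attr_get(const struct attr *attrs, size_t n_attrs,
                            const char *name, size_t name_len)
{
    for (size_t i = 0; i < n_attrs; i++)
        if (strlen(attrs[i].name) == name_len &&
            strncmp(attrs[i].name, name, name_len) == 0)
            return attrs[i].value;
    return NULL;
}

/* Evaluate the text between '[' and ']' against one element's
 * attributes. Only three forms are understood:
 *   @name           the attribute exists
 *   @name='value'   the attribute equals value
 *   @name!='value'  the attribute exists and differs from value
 * Returns 1 (match), 0 (no match) or -1 (unsupported syntax). */
static int eval_attr_predicate(const char *pred,
                               const struct attr *attrs, size_t n_attrs)
{
    if (pred[0] != '@')
        return -1;

    const char *name = pred + 1;
    size_t name_len = strcspn(name, "!=");
    const char *rest = name + name_len;
    const char *actual = attr_get(attrs, n_attrs, name, name_len);
    int negate = 0;

    if (*rest == '\0')
        return actual != NULL;
    if (*rest == '!') {
        negate = 1;
        rest++;
    }

    /* Expect ='value' with single quotes around the value. */
    size_t rlen = strlen(rest);
    if (rlen < 3 || rest[0] != '=' || rest[1] != '\'' || rest[rlen - 1] != '\'')
        return -1;

    /* As in XPath 1.0, comparing a missing attribute is always false. */
    if (actual == NULL)
        return 0;

    size_t want_len = rlen - 3;
    int equal = strlen(actual) == want_len &&
                strncmp(actual, rest + 2, want_len) == 0;
    return negate ? !equal : equal;
}
```

For a `title` with `lang="en"`, `@lang='en'` returns 1 and `@lang!='en'` returns 0; a form outside the three handled here, such as `price>35`, returns -1.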

Compilation
-----------

libxmlb is a standard Meson project. It can be built and installed using the
following basic steps:

```
$ meson setup build
$ ninja -C build
# ninja -C build install
# ldconfig
```

By default this installs the library into `/usr/local`. On some Linux
distributions you may need to configure the linker path in `/etc/ld.so.conf`
so that the library can be found. Running `ldconfig` refreshes the linker
cache.