libxmlb
=======
 
Introduction
------------

XML is slow to parse, and strings inside the document cannot be memory mapped
because they do not have a trailing NUL character. The libxmlb library takes XML
source and converts it to a structured binary representation with a deduplicated
string table in which every string is stored with its trailing NUL.
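 
The string table idea can be illustrated with a minimal Python sketch. This is
not libxmlb's actual on-disk format; the helper names `build_string_table` and
`read_string` are made up for this example. Each unique string is stored once,
with its trailing NUL, and is then referred to by byte offset.

```python
def build_string_table(strings):
    """Return (table_bytes, offsets); each unique string is stored once."""
    table = bytearray()
    offsets = {}
    for s in strings:
        if s not in offsets:  # deduplicate: only the first occurrence is stored
            offsets[s] = len(table)
            table += s.encode("utf-8") + b"\x00"  # trailing NUL included
    return bytes(table), offsets


def read_string(table, offset):
    """Read the NUL-terminated string that starts at the given byte offset."""
    end = table.index(b"\x00", offset)
    return table[offset:end].decode("utf-8")


# Hypothetical tag names; "component" and "id" each appear twice in the input.
table, offsets = build_string_table(["component", "id", "component", "name", "id"])
print(len(offsets))                       # 3 unique strings
print(read_string(table, offsets["id"]))  # id
```

Because every entry already ends in NUL, a C consumer could hand out a pointer
into the table directly instead of copying the string.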
 
This allows an application to mmap the binary XML file, do an XPath query and
return some strings without actually parsing the entire document. This is all
done using (almost) zero allocations and no actual copying of the binary data.
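 
As a rough illustration of the mmap approach, the Python sketch below maps a
file of NUL-terminated strings and reads one of them by offset without parsing
anything before it. The file layout, the offset `16` and the helper `string_at`
are assumptions for this example only, not the real libxmlb format or API.

```python
import mmap
import os
import tempfile

# Hypothetical layout for this sketch: a plain blob of NUL-terminated strings.
blob = b"firefox.desktop\x00desktop\x00Firefox\x00"

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(blob)
    path = f.name


def string_at(buf, offset):
    """Return the bytes from offset up to (not including) the next NUL."""
    end = buf.find(b"\x00", offset)
    return buf[offset:end]


# Map the file read-only; pages are only loaded when they are touched.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    result = string_at(mm, 16).decode("utf-8")

os.unlink(path)
print(result)  # desktop
```

Python still copies the slice it returns; in C the NUL-terminated string inside
the mapping could be used in place.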
 
Because each node in the binary XML file encodes the 'next' node at the same
level, skipping whole subtrees is trivial. A 10MB binary XML file can be loaded
from disk **and** queried in a few milliseconds.
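 
The subtree skipping can be sketched in Python as follows. The `Node` layout
(a `first_child` index and a `next` sibling index into a flat list) is an
assumption made for illustration and is not the actual binary format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    first_child: Optional[int]  # index of the first child, or None
    next: Optional[int]         # index of the next node at the same level, or None


# Flattened tree in document order (hypothetical data):
# 0 components
#   1 component
#     2 id
#     3 name
#   4 component
#     5 id
NODES = [
    Node("components", 1, None),
    Node("component", 2, 4),
    Node("id", None, 3),
    Node("name", None, None),
    Node("component", 5, None),
    Node("id", None, None),
]


def children(nodes, idx, visited):
    """Yield the direct children of nodes[idx], recording every node touched."""
    child = nodes[idx].first_child
    while child is not None:
        visited.append(child)
        yield child
        child = nodes[child].next  # jump straight over the child's subtree


visited = []
top_level = list(children(NODES, 0, visited))
print(top_level)  # [1, 4]
print(visited)    # [1, 4]
```

Nodes 2, 3 and 5 are never touched, which is what makes walking one level of a
large document cheap.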
 
The binary XML is not supposed to be small. It's usually about half the size of
the text XML data where a lot of the tag content is duplicated, but can actually
be larger than the original XML file. This isn't important; the fast query speed
and the ability to mmap strings without copies more than make up for the larger
on-disk size. If you want to compress your XML, this library probably isn't for
you; just use gzip, which gives you an almost perfect compression ratio for
data like this.
 
For example:

    $ xb-tool compile fedora.xmlb fedora.xml.gz

    $ du -h fedora.xml*
    12M     fedora.xmlb
    3.6M    fedora.xml.gz

    $ time xb-tool query fedora.xmlb "components/component[@type=desktop]/id[text()=firefox.desktop]"
    RESULT: firefox.desktop
    real    0m0.011s
    user    0m0.010s
    sys     0m0.001s
 
XPath
=====

This library only implements a tiny subset of XPath. See the examples for the
full list, but it is basically restricted to element names, attributes and text.
 
We will use the following XML document in the examples below.

    <bookstore>
      <book>
        <title lang="en">Harry Potter</title>
        <price>29.99</price>
      </book>
      <book percentage="99">
        <title lang="en">Learning XML</title>
        <price>39.95</price>
      </book>
    </bookstore>
 
Selecting Nodes
---------------

XPath uses path expressions to select nodes in an XML document. Nodes are the
only thing that libxmlb can return.
 
| Example | Description | Supported |
| --- | --- | --- |
| `/bookstore` | Returns the root bookstore element | ✔ |
| `/bookstore/book` | Returns all `book` elements | ✔ |
| `//book` | Returns books no matter where they are | ✖ |
| `bookstore//book` | Returns books that are descendants of `bookstore` | ✖ |
| `@lang` | Returns attributes that are named `lang` | ✖ |
| `/bookstore/.` | Returns the `bookstore` node | ✖ |
| `/bookstore/book/*` | Returns all `title` and `price` nodes of each `book` node | ✔ |
| `/bookstore/book/child::*` | Returns all `title` and `price` nodes of each `book` node | ✔ |
| `/bookstore/book/title/..` | Returns the `book` nodes with a title | ✔ |
| `/bookstore/book/parent::*` | Returns `bookstore`, the parent of `book` | ✔ |
| `/bookstore/book/parent::bookstore` | Returns the parent `bookstore` of `book` | ✖ |
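 
To make a few of the supported rows concrete, here is a small Python sketch
that uses the standard library's `xml.etree.ElementTree` as a stand-in. It is
not libxmlb and its XPath subset differs: paths are evaluated relative to the
root element, so `/bookstore/book` is written as `./book`.

```python
import xml.etree.ElementTree as ET

XML = """
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book percentage="99">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the bookstore element

books = root.findall("./book")             # like /bookstore/book
children = root.findall("./book/*")        # like /bookstore/book/*
parents = root.findall("./book/title/..")  # like /bookstore/book/title/..

print([b.tag for b in books])     # ['book', 'book']
print([c.tag for c in children])  # ['title', 'price', 'title', 'price']
print([p.tag for p in parents])   # ['book', 'book']
```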
 
Predicates
----------

Predicates are used to find a specific node or a node that contains a specific
value. Predicates are always embedded in square brackets.
 
| Example | Description | Supported |
| --- | --- | --- |
| `/bookstore/book[1]` | Returns the first book element | ✔ |
| `/bookstore/book[first()]` | Returns the first book element | ✔ |
| `/bookstore/book[last()]` | Returns the last book element | ✔ |
| `/bookstore/book[last()-1]` | Returns the last but one book element | ✖ |
| `/bookstore/book[position()<3]` | Returns the first two books | ✔ |
| `/bookstore/book[upper-case(text())=='HARRY POTTER']` | Returns the first book | ✔ |
| `/bookstore/book[@percentage>=90]` | Returns the book with `>=` 90% completion | ✔ |
| `/bookstore/book/title[@lang]` | Returns titles with an attribute named `lang` | ✔ |
| `/bookstore/book/title[@lang='en']` | Returns titles that have a `lang` equal to `en` | ✔ |
| `/bookstore/book/title[@lang!='en']` | Returns titles that have a `lang` not equal to `en` | ✔ |
| `/bookstore/book/title[@lang<='zz_ZZ']` | Returns titles whose `lang` is `<=` `zz_ZZ` | ✔ |
| `/bookstore/book[price>35.00]` | Returns the books with a price greater than 35 | ✖ |
| `/bookstore/book[price>35.00]/title` | Returns the titles of books with a price greater than 35 | ✖ |
| `/bookstore/book/title[text()='Learning XML']` | Returns the `title` node with matching content | ✔ |
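 
A few of these predicates can be tried out with Python's
`xml.etree.ElementTree`, again only as a stand-in for libxmlb. ElementTree
writes the text match as `[.='...']` (Python 3.7 or later) and has no numeric
comparison, so the `@percentage>=90` case is done with an ordinary filter here.

```python
import xml.etree.ElementTree as ET

XML = """
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book percentage="99">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(XML)

first = root.findall("./book[1]")                       # like book[1]
last = root.findall("./book[last()]")                   # like book[last()]
with_lang = root.findall("./book/title[@lang]")         # like title[@lang]
english = root.findall("./book/title[@lang='en']")      # like title[@lang='en']
exact = root.findall("./book/title[.='Learning XML']")  # like title[text()='Learning XML']

# No numeric predicates in ElementTree: filter in Python instead.
nearly_done = [
    b for b in root.findall("./book")
    if float(b.get("percentage", "0")) >= 90
]

print(first[0].find("title").text)        # Harry Potter
print(last[0].find("title").text)         # Learning XML
print([t.text for t in exact])            # ['Learning XML']
print([b.get("percentage") for b in nearly_done])  # ['99']
```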
 
Compilation
-----------

libxmlb is a standard Meson project. It can be compiled using the following basic steps:
 
```
# meson build
# ninja -C build
# ninja -C build install
# ldconfig
```
 
By default this installs the library into `/usr/local`. On some Linux distributions you may
need to configure the linker path in `/etc/ld.so.conf` so that the library can be located.
The call to `ldconfig` is needed to refresh the linker cache.