libxmlb
=======

Introduction
------------

XML is slow to parse and strings inside the document cannot be memory mapped as
they do not have a trailing NUL character. The libxmlb library takes XML source
and converts it to a structured binary representation with a deduplicated string
table -- where the strings have the NULs included.

This allows an application to mmap the binary XML file, do an XPath query and
return some strings without actually parsing the entire document. This is all
done using (almost) zero allocations and no actual copying of the binary data.
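
To illustrate why the trailing NUL matters, here is a minimal C sketch of
zero-copy string access from a memory-mapped file. It is not the libxmlb
implementation or its file format: the `demo_` names, the string table contents
and the offset-based lookup are all assumptions made for this example, and it
relies on POSIX `mmap`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical on-disk layout for illustration only (not the real libxmlb
 * format): a blob of NUL-terminated strings, addressed by byte offset. */
const char demo_strtab[] = "component\0id\0firefox.desktop";

/* Write the demo string table to a temporary file and map it read-only.
 * Returns NULL on failure; on success *len_out receives the mapped size. */
const char *demo_map_strtab(size_t *len_out)
{
    size_t len = sizeof(demo_strtab); /* includes the final NUL */
    FILE *f = tmpfile();
    void *map;

    if (f == NULL)
        return NULL;
    if (fwrite(demo_strtab, 1, len, f) != len || fflush(f) != 0) {
        fclose(f);
        return NULL;
    }
    map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fileno(f), 0);
    fclose(f); /* the mapping stays valid after the file is closed */
    if (map == MAP_FAILED)
        return NULL;
    *len_out = len;
    return map;
}

/* Return the string stored at `offset` as a pointer into the mapping itself:
 * no allocation and no copy, because the terminating NUL is already there. */
const char *demo_str_at(const char *map, size_t len, size_t offset)
{
    assert(offset < len);
    /* a well-formed table always has a NUL between offset and the end */
    assert(memchr(map + offset, '\0', len - offset) != NULL);
    return map + offset;
}
```

Calling `demo_str_at(map, len, 13)` on the mapping returned by
`demo_map_strtab()` yields `firefox.desktop` as an ordinary C string that points
directly into the mapped pages.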

Because each node in the binary XML file records the 'next' node at the same
level, skipping whole subtrees is trivial. A 10MB binary XML file can be loaded
from disk **and** queried in a few milliseconds.
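
The effect of storing a 'next' pointer can be sketched in C as follows. The
record layout, the `demo_` names and the use of array indexes instead of file
offsets are assumptions for illustration and do not reflect the real libxmlb
format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node record for illustration only (not the real libxmlb
 * layout). Nodes are stored in document order; `next` is the index of the
 * next sibling at the same level, or 0 when there is none. */
struct demo_node {
    const char *name; /* element name (would be a string table offset) */
    size_t parent;    /* index of the parent; the root is its own parent */
    size_t next;      /* index of the next sibling, 0 if this is the last */
};

/* Two <book> elements, each with a <title> and a <price>, in <bookstore>. */
const struct demo_node demo_nodes[] = {
    /* 0 */ {"bookstore", 0, 0},
    /* 1 */ {"book",      0, 4},
    /* 2 */ {"title",     1, 3},
    /* 3 */ {"price",     1, 0},
    /* 4 */ {"book",      0, 0},
    /* 5 */ {"title",     4, 6},
    /* 6 */ {"price",     4, 0},
};

/* Count the children of `parent` called `name`. Moving from one child to the
 * next is a single jump through `next`, so the descendants of each child are
 * never visited. `visited` receives the number of nodes actually inspected. */
size_t demo_count_children(size_t parent, const char *name, size_t *visited)
{
    size_t n_nodes = sizeof(demo_nodes) / sizeof(demo_nodes[0]);
    size_t i = parent + 1; /* in document order the first child follows its parent */
    size_t count = 0;

    *visited = 0;
    if (i >= n_nodes || demo_nodes[i].parent != parent)
        return 0; /* no children */
    for (;;) {
        assert(i < n_nodes);
        (*visited)++;
        if (strcmp(demo_nodes[i].name, name) == 0)
            count++;
        if (demo_nodes[i].next == 0)
            break; /* last sibling reached */
        i = demo_nodes[i].next;
    }
    return count;
}
```

With this table, `demo_count_children(0, "book", &visited)` returns 2 after
inspecting only 2 of the 7 nodes, because the `title` and `price` descendants of
the first `book` are jumped over.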

The binary XML is not supposed to be small. It's usually about half the size of
the text XML data where a lot of the tag content is duplicated, but can actually
be larger than the original XML file. This isn't important; the fast query speed
and the ability to mmap strings without copies more than make up for the larger
on-disk size. If you want to compress your XML, this library probably isn't for
you -- just use gzip -- it gives you an almost perfect compression ratio for
data like this.

For example:

    $ xb-tool compile fedora.xmlb fedora.xml.gz

    $ du -h fedora.xml*
    12M         fedora.xmlb
    3.6M        fedora.xml.gz

    $ xb-tool query fedora.xmlb "components/component[@type=desktop]/id[text()=firefox.desktop]"
    RESULT: firefox.desktop
    real        0m0.011s
    user        0m0.010s
    sys         0m0.001s

XPath
=====

This library only implements a tiny subset of XPath. See the examples for the
full list, but it's basically restricted to element names, attributes and text.

We will use the following XML document in the examples below.

    <bookstore>
      <book>
        <title lang="en">Harry Potter</title>
        <price>29.99</price>
      </book>
      <book percentage="99">
        <title lang="en">Learning XML</title>
        <price>39.95</price>
      </book>
    </bookstore>

Selecting Nodes
---------------

XPath uses path expressions to select nodes in an XML document. The only things
that libxmlb can return are nodes.

| Example | Description | Supported |
| --- | --- | --- |
| `/bookstore` | Returns the root `bookstore` element | ✔ |
| `/bookstore/book` | Returns all `book` elements | ✔ |
| `//book` | Returns books no matter where they are | ✖ |
| `bookstore//book` | Returns books that are descendants of `bookstore` | ✖ |
| `@lang` | Returns attributes that are named `lang` | ✖ |
| `/bookstore/.` | Returns the `bookstore` node | ✖ |
| `/bookstore/book/*` | Returns all `title` and `price` nodes of each `book` node | ✔ |
| `/bookstore/book/child::*` | Returns all `title` and `price` nodes of each `book` node | ✔ |
| `/bookstore/book/title/..` | Returns the `book` nodes with a title | ✔ |
| `/bookstore/book/parent::*` | Returns `bookstore`, the parent of `book` | ✔ |
| `/bookstore/book/parent::bookstore` | Returns the parent `bookstore` of `book` | ✖ |

Predicates
----------

Predicates are used to find a specific node or a node that contains a specific
value. Predicates are always embedded in square brackets.

| Example | Description | Supported |
| --- | --- | --- |
| `/bookstore/book[1]` | Returns the first book element | ✔ |
| `/bookstore/book[first()]` | Returns the first book element | ✔ |
| `/bookstore/book[last()]` | Returns the last book element | ✔ |
| `/bookstore/book[last()-1]` | Returns the last but one book element | ✖ |
| `/bookstore/book[position()<3]` | Returns the first two books | ✔ |
| `/bookstore/book[upper-case(text())=='HARRY POTTER']` | Returns the first book | ✔ |
| `/bookstore/book[@percentage>=90]` | Returns the book with `>=` 90% completion | ✔ |
| `/bookstore/book/title[@lang]` | Returns titles with an attribute named `lang` | ✔ |
| `/bookstore/book/title[@lang='en']` | Returns titles that have a `lang` equal to `en` | ✔ |
| `/bookstore/book/title[@lang!='en']` | Returns titles that have a `lang` not equal to `en` | ✔ |
| `/bookstore/book/title[@lang<='zz_ZZ']` | Returns titles whose `lang` is `<=` `zz_ZZ` | ✔ |
| `/bookstore/book[price>35.00]` | Returns the books with a price greater than 35 | ✖ |
| `/bookstore/book[price>35.00]/title` | Returns the titles of books with a price greater than 35 | ✖ |
| `/bookstore/book/title[text()='Learning XML']` | Returns the `title` node with matching content | ✔ |

Compilation
-----------

libxmlb is a standard meson project. It can be compiled using the following basic steps:

```
# meson build
# ninja -C build
# ninja -C build install
# ldconfig
```

This will by default install the library into `/usr/local`. On some Linux distributions you may
need to configure the linker path in `/etc/ld.so.conf` to be able to locate it.
The call to `ldconfig` is needed to refresh the linker cache.