Blame doc/memorymanagement.txt

Memory management
=================

There are two types of nodes:

* those connected to an existing tree

* those unconnected. These may be the top node of a tree

Nodes consist of a C-level libxml2 node, Node for short, and
optionally a Python-level proxy node, Proxy. Zero, one or more Proxies can
exist for a single Node.

Proxies are garbage collected automatically by Python. Nodes are not
garbage collected at all. Instead, explicit mechanisms exist to free
a Node and the tree it may be the top of.

A Node can be safely freed when:

* no Proxy is connected to this Node

* no Proxy can be created for this Node

A Proxy cannot be created for a Node when:

* no Proxy exists for any node that is connected to that Node

This is the case when:

* the Node is in a tree that has no Proxy connected to any of its nodes.

This means that the whole tree in such a condition can be freed.

Detecting whether a Node is in a tree that has no Proxies connected to
it can be done by relying on Python's garbage collection
algorithm. Each Proxy can have a reference to the Proxy that points to
the top of the tree. In case of a document tree, this reference is to
the Document Proxy. When no more references exist in the system to the
top Proxy, this means no more Proxies exist that point to the Node
tree the top Proxy is the top of. If this Node tree is unconnected,
i.e. it is not a subtree, this means that tree can be safely garbage
collected.
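The top-reference scheme described above can be sketched in pure Python.
This is only an illustration under stated assumptions: ``Node``, ``Proxy``
and the ``freed`` list are hypothetical stand-ins rather than the actual
implementation, and the sketch relies on Python releasing an object once
its last reference disappears.

```python
import gc
import weakref

freed = []  # names of top Nodes whose trees were "freed" (demo bookkeeping)


class Node:
    """Hypothetical stand-in for a C-level libxml2 node."""

    def __init__(self, name):
        self.name = name


class Proxy:
    """Hypothetical Python-level proxy for a Node."""

    def __init__(self, node, top=None):
        self.node = node
        # Every non-top Proxy keeps its top Proxy alive through this reference.
        self.top = top
        if top is None:
            # This Proxy is the top of an unconnected tree. Once Python
            # collects it, no Proxy into the tree remains, so the Node
            # tree can be released. The callback must not reference self.
            weakref.finalize(self, freed.append, node.name)


# Tree "h" with two more Proxies pointing into it.
H = Proxy(Node("h"))
I = Proxy(Node("i"), top=H)
K = Proxy(Node("k"), top=H)

del H         # I and K still reference the top Proxy
gc.collect()
assert freed == []

del I, K      # the last references to the top Proxy are gone
gc.collect()
assert freed == ["h"]
```

In this sketch the tree is released only after the last Proxy into it has
gone away, which is the property the scheme depends on.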

A special case exists for document references. Each Proxy will always
have a reference to the Document Proxy, as any Node will have such a
reference to the Document Node. This means that a Document Node can
only be garbage collected when no Proxies remain that refer to the
Document. This is a separate system from the top-Node references, even
though the top Node in many cases will be the Document. This is because
there is no way to get to a node that is not connected to the Document
tree from a Document Proxy.

This approach requires a system that can keep track of the top of the
tree in any case. Usually this is simple: when a Proxy gets connected,
the tree top becomes the tree top of whatever node it is connected
to.

Sometimes this is more difficult: a Proxy may exist pointing to a node
in a subtree that just got connected. Its top reference cannot be
updated. This is a problem in the following case:

    a
  b    c         h
d  e  f  g     i  j
              k

Now imagine we have a proxy to k, K, and a proxy to i, I. They both
have a pointer to proxy H.

Now imagine i gets moved under g through proxy I. Proxy I will have an
updated pointer to proxy A. However, proxy K cannot be updated and still
points to H, from which it is now in fact disconnected.

Proxy H cannot be removed now until proxy A is removed. In addition,
proxy A has a refcount that is too low because proxy K doesn't point
to it but should.
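The stale top reference can be reproduced with a small pure-Python model
of the two trees. The ``Node`` and ``Proxy`` classes and the ``move``
helper are hypothetical names introduced only for this sketch.

```python
class Node:
    """Hypothetical stand-in for a C-level libxml2 node."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = []
        if parent is not None:
            parent.append(self)

    def append(self, child):
        # Unlink the child from its old parent before linking it here.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)

    def top(self):
        # Walk up to the top Node of the tree this Node is in.
        node = self
        while node.parent is not None:
            node = node.parent
        return node


class Proxy:
    """Hypothetical proxy; top is the Proxy of the tree top (None for a top)."""

    def __init__(self, node, top=None):
        self.node = node
        self.top = top


def move(proxy, new_parent, new_top):
    """Reparent proxy.node; only the proxy used for the move is updated."""
    new_parent.append(proxy.node)
    proxy.top = new_top


# The two trees from the diagram.
a = Node("a")
b, c = Node("b", a), Node("c", a)
d, e = Node("d", b), Node("e", b)
f, g = Node("f", c), Node("g", c)
h = Node("h")
i, j = Node("i", h), Node("j", h)
k = Node("k", i)

A, H = Proxy(a), Proxy(h)
I = Proxy(i, top=H)
K = Proxy(k, top=H)

move(I, g, A)  # i gets moved under g through proxy I

assert I.top is A         # I was updated
assert K.node.top() is a  # k is now really in a's tree
assert K.top is H         # but K still points to H
```

Nothing in the move gives access to K, so its top reference stays behind.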

Another strategy involves keeping a reference count on the underlying
nodes, counting one reference per proxy. A node can only be freed if
there is no descendant-or-self that has a refcount higher than 0. A node,
when no more Python references to it exist, will check for refcounts first.
The drawback of this is potentially heavy tree-walking each time a proxy
is removed.
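A minimal sketch of the refcount strategy follows. All names here
(``Node``, ``acquire``, ``release``, ``find_referenced``, ``freed``) are
hypothetical. The sketch also assumes that only a whole unconnected tree
is ever freed, so the check starts at the top of the tree. The text above
does not spell this out.

```python
class Node:
    """Hypothetical stand-in for a C-level libxml2 node."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.refcount = 0  # one count per live proxy
        if parent is not None:
            parent.children.append(self)


def find_referenced(node):
    """Walk descendant-or-self; return (found, number of nodes visited)."""
    visited = 0
    stack = [node]
    while stack:
        current = stack.pop()
        visited += 1
        if current.refcount > 0:
            return True, visited
        stack.extend(current.children)
    return False, visited


freed = []  # names of tree tops that were freed (demo bookkeeping)


def acquire(node):
    """Called when a proxy is created for node."""
    node.refcount += 1


def release(node):
    """Called when a proxy for node goes away; returns nodes visited."""
    node.refcount -= 1
    top = node
    while top.parent is not None:
        top = top.parent
    # Assumption: the walk starts at the top of the tree the node is in.
    referenced, visited = find_referenced(top)
    if not referenced:
        freed.append(top.name)
    return visited


h = Node("h")
i, j = Node("i", h), Node("j", h)
k = Node("k", i)

acquire(i)
acquire(k)
release(i)                # k is still referenced, nothing is freed
assert freed == []
visited = release(k)      # the whole tree is walked before it is freed
assert freed == ["h"]
assert visited == 4
```

The final release visits every node in the tree, which illustrates the
tree-walking cost mentioned above.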