Tree - source-git/libdwarf - CentOS Git server

source-git / libdwarf

Files

Commit: 4778df5151285153d0d9331dfb45a7e51910c92b
Blob Blame History Raw
Some thoughts about tsearch.

David Anderson
Updated 15 February 2014.

Tsearch() was defined and implemented fairly early in
the history of UNIX, but I don't know exactly when.
The earliest documentation I have is from the 1991
edition of UNIX System V Release 4 Programmer's Reference
Manual.

It is useful because it enables one to make a searchable tree
of, well, any data one might have need to search.   Tsearch never needs
to understand the format of the data.

The functions defined there are  tsearch(), tfind(),
tdelete(), and twalk().  The description is just
as difficult to understand as the documentation in release 3.54
of the Linux man-pages project (the most recent I have on hand).

The Single Unix Specification page on tsearch() etc is slightly
different text from the Programmer's Reference Manual and the 
Linux man page, but no clearer.

The confusion is partly due to the interface definition.  Until the
late 20th century only limited attention was given to interface
design.  By the late 20th century it seems clear that any
newly designed tsearch-like functionality would have a significantly
different interface.   An example of an improved interface is
the GNU/Linux function hsearch_r().  Making the return value
a small integer defining what the function actually did
(and using other arguments to return additional values as
necessary) significantly simplifies the discussion.

Here though we are talking about the old and standard interface.
We attempt to clarify the messy parts. See the examples provided
for actual code.

The functions and tables of tsearch are not thread safe.
Nor thread-aware.  Any access to one of tables at the
same time the table is being updated will lead to chaos
eventually.

Any table tsearch maintains consists of records
of undefined (meaning the definition
is local to the internals) content each record 
of which contains, as one element, a copy of a KEY pointer.

In the following I freely use tsearch for dwarf_tsearch
etc. The functionality and interfaces are identical.

=========
Further reading 

The canonical reference for binary trees and the
algorithm reference for most of the tsearch code is 
Donald E. Knuth,
"The Art of Computer Programming" Volume 3 Sorting and Searching,
Second Edition. 

"Algorithms" Fourth Edition, Robert Sedgewick and Kevin Wayne,
is the basis for the red-black tree coding.


Wikipedia has quite a few entries of interest.
  http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree
  http://en.wikipedia.org/wiki/Tree_traversal
  http://en.wikipedia.org/wiki/Red–black_tree
are just a few.

After implementing tsearch I looked briefly at
some other projects.

freecode.net: See the 'GNU libavl' project.
  The libavl libary is a GNU project in C with each tree type
  having a uniquely-named but consistent set of interfaces.
  The interface design is much more sensible than
  Unix TSEARCH.  You  will probably find it in most any
  Linux distribution.  The 2.0.3 tarfile source has no
  configure step and the 2.0.3 Makefile failed for me,
  it placed -lm too early on the command line building texitree
  means the version mentioned on freecode.net:
  ftp://ftp.gnu.org/pub/gnu/avl/avl-2.0.3.tar.gz
  is rather old but shows internal evidence of
  last being updated in 2007 (*.pdf and other file timestamps
  in the tar file). The Makefile fails for me doing plain 'make'.  
  Doing 'make program' to just build the source works fine.
  The interface definitions are more sensible than tsearch,
  rb_delete returns the deleted item when it succeeds, for example.

freecode.net: See the 'libredblack' project.
  This project tarfile was last updated in 2003.
   It is on sourceforge at http://sourceforge.net/projects/redblacktree/
  Implemented in C.
  Configure worked (version 1.3) but the make step failed
  running the python app ./rbgen due to the presence
  of the string  %prefix on line 9 of rbgen.
  Removing that one line of python commentary from Eric Raymond
  fixed the build!
  There is a good amount of C #define magic making it harder to
  read the source than one might expect.
  The interface definitions are more sensible than tsearch,
  rb_delete returns the deleted item when it succeeds, for example.

freecode.net: See the project 'Template based B+ Tree'.
  Is in C++ and can be used for searches which are file or
  memory based through use of C++ templates by the implemenation.
  Have not attempted to build it.

=========
tsearch:
void *tsearch(const void *key, void **rootp,
    int (*compar)(const void *l, const void *r));

Our terminology comes from the above declaration.
tsearch() returns a pointer.  If tsearch 
fails due to an out of memory or internal error
it returns NULL.  Otherwise it returns a non-null
pointer (see Return value, below).

KEY: 
First we will define KEY as it is ordinarily used. 
KEY is a pointer to an object you define and initialize.
The object must remain in stable
storage for as long as the table has a copy of the KEY
pointing to the object.   Example:
    struct mystruct *key_a = malloc(sizeof(struct mystruct));
    initialize(key_a, my data to fill in struct...);
    void *r = tsearch(key_a,&treeroot,comparfunc);

ROOTP:
This must be the address of a void* datum.
Before the first call of tsearch the value of *rootp must
be NULL (set by you somehow).
The contents are maintained by tsearch thereafter.

COMPAR:
A function you write whose argments (when tsearch internals call it)
are KEY pointers from two records (your records).  The function should
return -1 if the record KEY l points to is considered less than
the recorda KEY r points to. Return zero of the values
are considered to match.  Return 1 otherwise.
Example:  int comparefunc(const void *l,const void *r) { 
struct mystruct *lp = l;
struct mystruct *rp = r;
if(lp->myv <  rp->myv) return -1;
if(lp->myv ==  rp->myv) return 0;
return 1; }
Of course the comparison need not be of simple values, it could involve
anything. This is just a simple sketch.

Return value:
If tsearch() returns NULL something went wrong. There was insufficient memory
or an internal error of some kind.

Lets call the KEY passed in KEYa.
Otherwise tsearch() returns a pointer to a KEY for this object.
Dereference to get an actual KEY (a pointer to your object).
Lets call this dereferenced KEY key_deref.
  struct mystruct *key_deref = *(struct mystruct *)r;
if KEYa == key_deref then:
  KEYa  was added to the tree.
  Hence the tree now has a copy of
  the value of key_a.
else:
  KEYa matched a record in the tree and
  you need to free any space you allocated
  to build the key for this tsearch call. 
  In our example:
  free(key_a).
  

I hope the above clarifies the use of tsearch somewhat.
============
Tfind:
void *find(const void *key, void **rootp,
    int (*compar)(const void *l, const void *r));

The interfaces are the same but the return value is
(in contrast with tsearch) reasonably clear:

A non-null return is a key (pointer).
It never adds a new record, it simple tries to find a record.
A NULL return means either that the search-ed for record did not exist or that
something internal went wrong or perhaps that something incorrect
was detected at runtime in the arguments passed in.

============
Tdelete:
void *tdelete(const void *key, void **rootp,
    int (*compar)(const void *l, const void *r));

The arguments are the same as tsearch/tfind, 
but the return value is a bit odd.

If something goes wrong or if the record does not exist, 
it returns NULL.  That is straightforward.

If the record is deleted tdelete is supposed to return
a pointer to the parent of the deleted record.
The content of the deleted record is NOT deleted.

If the record deleted was the last record in the
tree,  a NULL is returned and  *rootp is set to NULL.
Thus restoring initial conditions of an empty tree.

If the record deleted was the root (of the records
in the tree as of the call) then what is this supposed
to return?  No available documentation suggests an
answer. The current dwarf_tdelete implementations
return some record or other in this case.

In the case of dwarf_tdelete() for a hash 'tree'
if the node was the last in a hash chain then
NULL is returned. (ugly, but it is difficult to
figure out what else to do).

All this means that it is a waste of time to 
inspect the return value from tdelete.

So to do a tdelete and do any necessary
memory free do:
    key = xxxx
    t = tfind(key,&root,compar_func)
    
    if (t) {
        tdelete(key,&root,compar_func)
    }
    free key
    free r

where what 'free key' means is to free whatever 'key = xxxx'
allocated. Which means do the same for 'free r'.

Of course if your keys point into to a static array of data
then free is unnecessary, but how often will the data
be in static memory?  
And if it is in static storage, why do any free at all?



============
Tdestroy:
void *tdestroy((void * root,
    void (*free_node)(void * key));

This frees all memory the tree system has and
along the way it calls 'free_node' for each
node in the tree. 

It empties the tree, but, oddly, the root argument
is a simple pointer to the root, not a pointer-to-pointer.
Hence this routine can free the tree, but cannot
assign a NULL to the root.  So you should do
    root = 0;
after calling tdestroy().


============
Twalk:
void *twalk((const void * /*root*/,
    void (* /*action*/)(const void * /*nodep*/,
    const DW_VISIT  /*which*/,
    const int  /*depth*/));

============
Tdump:
void tdump(const void *rootp,
    char *(*keyprint)(const void *key),
    const char * msg);

This is unique to the dwarf_tsearch library.
It prints (to standard out) a representation of the tree.
The output is indented one space per level and ordered such
that turning the output 90 degrees clockwise shows a kind
of picture of the tree structure in the way trees
are normally shown in books.
It may be of slight interest for debugging trees if the
trees are not too large.

The 'msg' argument is just a character string of interest to you, something
identifying the output. It is printed once and otherwise ignored.

The 'keyprint' argument is a pointer to
a function you write.  The function 
is called for each node in turn. It
should return a pointer to a string with whatever
data from the passed-in-by-tdump key you wish to show. 
Normally the pointer returned from keyprint should be pointing to a 
static area so there is no issue with memory leakage to worry about.
tdump will print the returned string immediately and will not refer
to it again.


============
tsearch use using pointers (declarations left out):
    The struct here is entirely ours: tsearch neither knows nor cares
    how it is laid out.

    mt = struct example_tentry *mt =
        (struct example_tentry *)calloc(sizeof(struct example_tentry),1); 
    mt->mt_key = keyvalue;
    mt->mt_data = datavalue;
    errno = 0;
    /* tsearch adds an entry if its not present already. */
    retval = dwarf_tsearch(mt,tree, mt_compare_func  );
    if(retval == 0) {
        printf("FAIL ENOMEM in search on  %d, give up insertrecsbypointer\n",i);
        exit(1);
    } else {
        struct example_tentry *re = 0;
        re = *(struct example_tentry **)retval;
        if(re != mt) {
            /* found existing entry. */
            mt_free_func(mt);
        } else {
            /* New entry mt was added. */
        }
    }

In this case the mt_compare_func() might if using a struct look like:
    int
    mt_compare_func(const void *l, const void *r)
    {
        const struct example_tentry *ml = l;
        const struct example_tentry *mr = r;
        /* If the key were a string, one could use strcmp()
           instead of the comparisons here */
        if(ml->mt_key < mr->mt_key) {
            return -1;
        }
        if(ml->mt_key > mr->mt_key) {
            return 1;
        }
        return 0;
    }




============
tsearch use using values (declarations left out):
            void *mt = (void *)key;
            errno = 0;
            /* tsearch adds an entry if its not present already. */
            retval = dwarf_tsearch(mt,tree, value_compare_func  );
            if(retval == 0) {
                printf("FAIL ENOMEM in search on  %d, give up insertrecsbypointer\n",i);
                exit(1);
            } else {
                /* successful search.mt might have been added by the call, 
                   or maybe it was already in the tree.
                   It is impossible to tell which */
            }
In this case the value_compare_func might look like:
             int
             value_compare_func(const void *l, const void *r)
             {
                 VALTYPE lp = (VALTYPE)l;
                 VALTYPE rp = (VALTYPE)r;
                 if(lp < rp) {
                     return -1;
                 }
                 if(lp > rp) {
                     return 1;
                 }
                 return 0;
             }
In this case the free_node function would look like
void free_node { /* Do nothing. */ }


In this case, there is never any free() calls you need to make
as the key is not a pointer to anything.
===================
If I were designing the interfaces in 2014 I might do
as follows.

I think the GNU hsearch_r interfaces are a fine
approach and could be extended to tsearch().  But
I propose something slightly different.

struct tsearch_base; /* opaque struct, content defined by
   the search code, content not made public */
int compare_func(void *l, void*r); /* you define comparison function */
int free_func(void *n); /* you define free function */

None of the proposed interfaces touch errno.

All functions return 0 on success and an errno on failure.
They return EINVAL if an argument is incorrect somehow.

int tcreate(struct tsearch_base **base,compare_func,free_func)
    allocate a struct_tsearch_base record and puts a pointer to
    it in *base. Records the compare and free_func pointers.
    A call on this by uses is a requirement .

    returns:
    ENOMEM if out of memory.
    EINVAL if something wrong with an argument or an internal problem.

int tsearch(void *key,struct tsearch_base *base);
    returns:
    ENOMEM if out of memory.
    EINVAL if something wrong with an argument or an internal problem.

int tfind(void *key,struct tsearch_base *base);
    returns:
    ESRCH if not found.
    EINVAL if something wrong with an argument or an internal problem.

int tdelete(void *key,struct tsearch_base *base,void **keyout);
    If successful, tdelete deletes the record key identifies
    and returns that record's recorded key through *keyout.
    So if a free()or other action is required you can take that action.

    The 'base' struct tsearch_base record is NOT freed, 
    it remains, whether the tree has any records left or not.

    returns:
    ESRCH if key not found.
    EINVAL if something wrong with an argument or an internal problem.

int tsize(struct tsearch_base *base,unsigned long*count_out);
    Stores a count of the records currently recorded in the tree in *count_out.

    EINVAL if something wrong with an argument or an internal problem.
    

int tdestroy(struct tsearch_base **base);
    A call is a requirement on users -- to free up the tree space.
    Calls free_func  on each node in the tree and
    frees all tree memory and does 
        *base = NULL
    to reset your tree pointer.

    returns:
    EINVAL if something wrong with an argument or an internal problem.
=================
source-git / libdwarf

Source Code

Files