This file documents how GtkTextView works, at least partially. You
probably want to read the text widget overview in the reference manual
before reading this; it documents GtkTextBuffer, GtkTextView,
GtkTextMark, etc. from the application programmer's (public API)
standpoint.

The BTree
===

The heart of the text widget is a data structure called GtkTextBTree,
which does all the hard work behind the public GtkTextBuffer object.
The purpose of the btree is to make most operations O(log N) or
better, so application programmers can just use whatever API is
convenient without worrying about O(N) performance pitfalls.

The BTree is a tree of paragraphs (newline-terminated lines). The
leaves of the tree are paragraphs, represented by GtkTextLine. The
nodes of the tree above the leaves are represented by
GtkTextBTreeNode. The nodes store aggregate counts, so that we can,
for example, skip 100 paragraphs or 100 characters without having to
traverse 100 nodes in a list.

Because each leaf holds a whole paragraph, you might guess that many
operations are O(N), where N is the number of bytes in a paragraph,
and you would be right. The text widget is efficient for huge numbers
of paragraphs, but will choke on extremely long blocks of text without
intervening newlines.

("Newline" is a slight lie: we also honor \r, \r\n, and Unicode
paragraph separators such as U+2029 PARAGRAPH SEPARATOR. So,
annoyingly, the paragraph break may be more than one byte.)
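
Finding a paragraph break therefore means checking for several byte
sequences, not a single character. Pango provides
pango_find_paragraph_boundary() for this kind of scan; the function
below is only a stand-alone sketch of the idea, assuming UTF-8 input,
and paragraph_delimiter_length() is a hypothetical name, not GTK or
Pango API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the length in bytes of the paragraph
 * delimiter starting at text[i], or 0 if there is none.  Recognizes
 * "\n", "\r", "\r\n" and U+2029 PARAGRAPH SEPARATOR, whose UTF-8
 * encoding is the three bytes 0xE2 0x80 0xA9. */
size_t
paragraph_delimiter_length (const char *text, size_t len, size_t i)
{
  const unsigned char *s = (const unsigned char *) text;

  assert (text != NULL);
  if (i >= len)
    return 0;
  if (s[i] == '\n')
    return 1;
  if (s[i] == '\r')
    /* "\r\n" is one two-byte delimiter, not two paragraph breaks. */
    return (i + 1 < len && s[i + 1] == '\n') ? 2 : 1;
  if (i + 2 < len && s[i] == 0xE2 && s[i + 1] == 0x80 && s[i + 2] == 0xA9)
    return 3;
  return 0;
}
```

Because a delimiter can be one, two or three bytes long, code that
splits text into paragraphs cannot simply search for a single byte.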

The idea of the btree is something like:

               ------ Node (lines = 6)
              /          Line 0
             /           Line 1
            /            Line 2
           /             Line 3
          /              Line 4
         /               Line 5
 Node (lines = 12)
         \
          \---------- Node (lines = 6)
                         Line 6
                         Line 7
                         Line 8
                         Line 9
                         Line 10
                         Line 11
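
To make the aggregate counts concrete, here is a minimal sketch of
looking up a line by number in a tree shaped like the diagram above.
The Node and Line types and find_line() are hypothetical
simplifications, not the real GtkTextBTreeNode code: each step skips
whole children by subtracting their line counts, so the work depends
on the depth and fan-out of the tree rather than on the total number
of lines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GtkTextLine and
 * GtkTextBTreeNode. */
typedef struct Line {
  int number;                 /* for illustration only */
} Line;

typedef struct Node {
  int level;                  /* 0 means the children are Lines */
  int num_lines;              /* aggregate count of lines below */
  int num_children;
  union {
    struct Node **nodes;      /* used when level > 0 */
    Line **lines;             /* used when level == 0 */
  } children;
} Node;

/* Find line n (0-based) below node, or NULL if out of range. */
Line *
find_line (Node *node, int n)
{
  if (n < 0 || n >= node->num_lines)
    return NULL;

  while (node->level > 0)
    {
      int i = 0;
      /* Skip every child whose lines all come before line n. */
      while (n >= node->children.nodes[i]->num_lines)
        {
          n -= node->children.nodes[i]->num_lines;
          i++;
          assert (i < node->num_children);
        }
      node = node->children.nodes[i];
    }
  return node->children.lines[n];
}
```

For line 8, the root skips its first child (6 lines) and asks the
second child for its line 2.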

In addition to aggregate line counts, each node keeps aggregate
character counts and information about the tag toggles that appear
below it.
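
Aggregate counts are only useful if they stay correct as text changes.
One way to do that, sketched below with a hypothetical Node type that
has a parent pointer, is to apply the change in line and character
counts to every ancestor of the modified leaf. Only the nodes on that
path change, so the update costs O(depth).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with a parent pointer and aggregate counts. */
typedef struct Node {
  struct Node *parent;        /* NULL for the root */
  int num_lines;
  int num_chars;
} Node;

/* After characters or lines are inserted (positive deltas) or deleted
 * (negative deltas) below leaf_node, fix the counts on the path to
 * the root.  Sibling subtrees are untouched. */
void
adjust_counts (Node *leaf_node, int delta_lines, int delta_chars)
{
  for (Node *node = leaf_node; node != NULL; node = node->parent)
    {
      node->num_lines += delta_lines;
      node->num_chars += delta_chars;
      assert (node->num_lines >= 0 && node->num_chars >= 0);
    }
}
```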

Structure of a GtkTextLine
===

A GtkTextLine contains a single paragraph of text. It should probably
be renamed GtkTextPara someday, but ah well. GtkTextLine is used for
the leaf nodes of the BTree.

A line is a list of GtkTextLineSegment. Line segments contain the
actual data found in the text buffer.

Here are the types of line segment (see gtktextsegment.h,
gtktextchild.h, etc.):

  Character:         contains a block of UTF-8 text.

  Mark:              marks a position in the buffer, such as a cursor.

  Tag toggle:        indicates that a tag is toggled on or off at
                     this point. When you apply a tag to a range of
                     text, we add a toggle-on at the start of the
                     range and a toggle-off at the end (and do any
                     necessary merging with existing toggles, so we
                     always have the minimum number possible).

  Child widget:      stores a child widget that behaves as a single
                     Unicode character from an editing perspective
                     (strictly, it stores a list of child widgets, one
                     per GtkTextView displaying the buffer).

  Image:             stores a GdkPixbuf that behaves as a single
                     character from an editing perspective.

Each line segment has a "class" which identifies its type, and also
provides some virtual functions for handling that segment.
The functions in the class are:

 - SplitFunc, divides the segment so another segment can be inserted.

 - DeleteFunc, finalizes the segment.

 - CleanupFunc, called after a line has been modified by adding or
   removing segments, to try merging segments that can be merged,
   e.g. two adjacent character segments with no marks or toggles
   in between.

 - LineChangeFunc, called when a segment moves to a different line;
   according to comments in the code this function may not be needed
   anymore.

 - SegCheckFunc, does sanity-checking when debugging is enabled.
   Basically equivalent to assert(segment is not broken).
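
The class is essentially a hand-written vtable. The sketch below
models only DeleteFunc and CleanupFunc: a character class whose
cleanup merges directly adjacent character segments, and a mark class
whose cleanup does nothing, so a mark keeps the text on either side of
it apart. All type and function names here are hypothetical; the real
definitions live in gtktextsegment.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Segment Segment;

/* Hypothetical segment class: a type name plus function pointers. */
typedef struct {
  const char *name;
  void (*delete_func) (Segment *seg);
  void (*cleanup_func) (Segment *seg);
} SegmentClass;

struct Segment {
  const SegmentClass *type;
  Segment *next;
  size_t byte_count;
  char *text;                 /* NUL-terminated; NULL if not text */
};

static void
seg_free (Segment *seg)
{
  free (seg->text);
  free (seg);
}

static void
no_cleanup (Segment *seg)
{
  (void) seg;                 /* marks never merge with anything */
}

static void char_cleanup (Segment *seg);

static const SegmentClass char_class = { "character", seg_free, char_cleanup };
static const SegmentClass mark_class = { "mark", seg_free, no_cleanup };

/* CleanupFunc for text: absorb following character segments. */
static void
char_cleanup (Segment *seg)
{
  while (seg->next != NULL && seg->next->type == &char_class)
    {
      Segment *next = seg->next;
      char *text = realloc (seg->text, seg->byte_count + next->byte_count + 1);
      assert (text != NULL);
      memcpy (text + seg->byte_count, next->text, next->byte_count + 1);
      seg->text = text;
      seg->byte_count += next->byte_count;
      seg->next = next->next;
      next->type->delete_func (next);
    }
}

Segment *
segment_new (const SegmentClass *type, const char *text)
{
  Segment *seg = calloc (1, sizeof *seg);
  assert (seg != NULL);
  seg->type = type;
  if (text != NULL)
    {
      seg->byte_count = strlen (text);
      seg->text = malloc (seg->byte_count + 1);
      assert (seg->text != NULL);
      memcpy (seg->text, text, seg->byte_count + 1);
    }
  return seg;
}

/* After modifying a line, give every segment a chance to merge. */
void
cleanup_line (Segment *head)
{
  for (Segment *seg = head; seg != NULL; seg = seg->next)
    seg->type->cleanup_func (seg);
}
```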

The segment class also contains two data fields:

 - the name of the segment type, used for debugging

 - a boolean flag for whether the segment has right or left
   gravity. A segment with right gravity ends up on the right of a
   newly-inserted segment that's placed at the same character offset,
   and a segment with left gravity ends up on the left of a
   newly-inserted segment. For example, the insertion cursor has
   right gravity: as you type, new text is inserted at the cursor,
   and the cursor ends up to the right of it.
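
Gravity can be illustrated with plain character offsets instead of
segment-list order. In this hypothetical sketch, inserting text moves
every mark that lies after the insertion point, while a mark sitting
exactly at the insertion point moves only if it has right gravity. The
Mark type and insert_chars_update_marks() are assumptions for
illustration, not GTK API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mark: a character offset plus its gravity. */
typedef struct {
  int offset;
  bool left_gravity;
} Mark;

/* Update marks after inserting n_chars characters at offset pos. */
void
insert_chars_update_marks (Mark *marks, size_t n_marks, int pos, int n_chars)
{
  assert (n_chars >= 0);
  for (size_t i = 0; i < n_marks; i++)
    {
      Mark *m = &marks[i];
      /* Right-gravity marks at pos end up after the new text;
       * left-gravity marks at pos stay before it. */
      if (m->offset > pos || (m->offset == pos && !m->left_gravity))
        m->offset += n_chars;
    }
}
```

The first mark in the usage below behaves like the insertion cursor.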

The segment itself contains a header, plus some variable-length data
that depends on the type of the segment. The header contains the
length of the segment in characters and in bytes. Some segments have a
length of zero. Segments with nonzero length are referred to as
"indexable" and would generally be user-visible; indexable segments
include text, images, and widgets. Segments with zero length occupy
positions between characters, and include marks and tag toggles.

The GtkText*Body structs are the type-specific portions of
GtkTextLineSegment.
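
A header-plus-body layout like this is commonly written as a struct
containing a union. The sketch below is a hypothetical simplification
of that shape (the SegType values, field names and
segment_at_char_offset() are assumptions), together with a walk that
finds the indexable segment containing a given character offset.
Zero-length segments such as marks are passed over because they
occupy no characters.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SEG_CHAR, SEG_MARK, SEG_TOGGLE, SEG_PIXBUF } SegType;

/* Hypothetical segment: a common header followed by a type-specific
 * body, in the spirit of the GtkText*Body structs. */
typedef struct Seg {
  SegType type;
  struct Seg *next;
  int char_count;             /* 0 for marks and toggles */
  int byte_count;
  union {
    const char *chars;        /* SEG_CHAR */
    const char *mark_name;    /* SEG_MARK */
    int tag_id;               /* SEG_TOGGLE */
    void *pixbuf;             /* SEG_PIXBUF, counts as one character */
  } body;
} Seg;

/* Return the segment holding character offset of a line and store the
 * offset within that segment in *seg_offset; NULL if past the end. */
Seg *
segment_at_char_offset (Seg *line, int offset, int *seg_offset)
{
  assert (offset >= 0);
  for (Seg *seg = line; seg != NULL; seg = seg->next)
    {
      if (offset < seg->char_count)
        {
          *seg_offset = offset;
          return seg;
        }
      offset -= seg->char_count;   /* no-op for zero-length segments */
    }
  return NULL;
}
```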

Character segments have the actual character data allocated in the
same malloc() block as the GtkTextLineSegment, to save both malloc()
overhead and the overhead of a pointer to the character data.
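
In modern C, the usual way to put a header and its text in one
allocation is a flexible array member, as in the hypothetical sketch
below; the real code may use a different idiom. One malloc() creates
the segment, and one free() releases both header and text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical character segment: the text is stored inline after
 * the header instead of behind a separate pointer. */
typedef struct CharSegment {
  struct CharSegment *next;
  size_t byte_count;
  char chars[];               /* byte_count bytes plus a trailing NUL */
} CharSegment;

/* Copy the first len bytes of text into a new segment. */
CharSegment *
char_segment_new (const char *text, size_t len)
{
  CharSegment *seg = malloc (sizeof (CharSegment) + len + 1);
  if (seg == NULL)
    return NULL;
  seg->next = NULL;
  seg->byte_count = len;
  memcpy (seg->chars, text, len);
  seg->chars[len] = '\0';
  return seg;
}
```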

Storing and tracking tags in the BTree
===

A GtkTextTag is an object representing some text attributes. A tag
can affect zero attributes (for example one used only for internal
application bookkeeping), a single attribute such as "bold", or any
number of attributes (such as large and bold and centered for a
"header" tag).

The tags that can be applied to a given buffer are stored in the
GtkTextTagTable for that buffer. The tag table is just a collection of
tags.

The real work of applying/removing tags happens in the function
_gtk_text_btree_tag(). Essentially, we remove from the given range all
toggle segments for the tag being applied or removed; then we add
toggle segments at either end of the range (for example, a toggle-on
at the start and a toggle-off at the end when applying the tag); then,
for any lines we modified, we call the segments' CleanupFunc routines
to merge segments that can be merged.
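
The core of that procedure can be sketched on a much simpler model:
the toggles for a single tag on one line, stored as a sorted array of
character offsets. Everything here (Toggles, tag_range()) is
hypothetical and ignores segments, multiple lines and the btree
summaries; it shows only how dropping the toggles inside the range and
adding end toggles only where the tag state changes keeps the toggle
count minimal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TOGGLES 64

/* Hypothetical model: toggles for ONE tag, as sorted offsets.
 * Character p is tagged when an odd number of toggles are <= p. */
typedef struct {
  int offsets[MAX_TOGGLES];
  size_t n;
} Toggles;

static bool
tag_on_at (const Toggles *t, int p)
{
  size_t count = 0;
  for (size_t i = 0; i < t->n; i++)
    if (t->offsets[i] <= p)
      count++;
  return count % 2 == 1;
}

/* Apply (add == true) or remove (add == false) the tag on characters
 * [start, end).  Toggles inside [start, end] are dropped; new toggles
 * go at the ends only where the tag state actually changes. */
void
tag_range (Toggles *t, int start, int end, bool add)
{
  assert (0 <= start && start < end);
  assert (t->n + 2 <= MAX_TOGGLES);

  bool before = tag_on_at (t, start - 1);   /* state just before start */
  bool at_end = tag_on_at (t, end);         /* original state at end */
  Toggles out = { .n = 0 };

  for (size_t i = 0; i < t->n && t->offsets[i] < start; i++)
    out.offsets[out.n++] = t->offsets[i];
  if (before != add)
    out.offsets[out.n++] = start;
  if (at_end != add)
    out.offsets[out.n++] = end;
  for (size_t i = 0; i < t->n; i++)
    if (t->offsets[i] > end)
      out.offsets[out.n++] = t->offsets[i];

  *t = out;
}
```

Applying the tag to an overlapping range extends the existing tagged
run instead of adding two more toggles, which is the "minimum number
possible" property from the segment list above.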

This is complicated somewhat because we keep information about the tag
toggles in the btree, allowing us to locate tagged regions or
add/remove tags in O(log N) instead of O(N) time. Tag information is
stored in "struct Summary" (that's a bad name; it could probably use
renaming). Each GtkTextBTreeNode has a list of Summary structs hanging
off it, one for each tag that's toggled somewhere below the node. The
Summary simply contains a count of the tag toggle segments found below
the node.
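
The point of the per-node toggle counts is that a search can skip any
subtree whose count is zero. The sketch below assumes a single tag, so
each hypothetical Node carries one toggle_count standing in for that
tag's Summary entry, with leaves standing in for lines;
first_toggled_line() only descends into subtrees that contain toggles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node for one tag: toggle_count is the number of that
 * tag's toggles anywhere below the node.  A leaf (n_children == 0)
 * stands in for a single line. */
typedef struct Node {
  int toggle_count;
  int line_count;
  int n_children;
  struct Node **children;
} Node;

/* Return the number of the first line containing a toggle for the
 * tag, or -1.  Children with toggle_count == 0 are skipped whole,
 * using their line counts to keep track of the line number. */
int
first_toggled_line (const Node *node)
{
  int line = 0;

  if (node->toggle_count == 0)
    return -1;

  while (node->n_children > 0)
    {
      int i = 0;
      while (node->children[i]->toggle_count == 0)
        {
          line += node->children[i]->line_count;
          i++;
          assert (i < node->n_children);
        }
      node = node->children[i];
    }
  return line;
}
```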

Views of the BTree (GtkTextLayout)
===

Each BTree has one or more views that display the tree. Originally
there was some idea that a view could be any object, so there are
still some uses of "gpointer view_id" left in the code. However, at
some point we decided that all views had to be a GtkTextLayout, and
so the btree does assume that from time to time.

The BTree maintains some per-line and per-node data that is specific
to each view. The per-line data is in GtkTextLineData and the per-node
data is in another badly-named struct called NodeData (should be
PerViewNodeData or something). The purpose of these is to store:

 - aggregate height, so we can calculate the Y position of each
   paragraph in O(log N) time, and can get the full height
   of the buffer in O(1) time. The height is per-view since
   each GtkTextView may have a different size allocation.

 - maximum width (the longest line), so we can calculate the width of
   the entire buffer in O(1) time in order to properly set up the
   horizontal scrollbar.

 - a flag for whether the line is "valid": valid lines have not been
   modified since we last computed their width and height. Invalid
   lines need to have their width and height recomputed.
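
Because aggregate height is a sum, width a maximum, and validity a
logical AND, a node's per-view data can be recomputed from its
children alone, which is what lets the root report the size of the
whole buffer in O(1). ViewData and combine_view_data() below are
hypothetical names used only to illustrate this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-view data for a line or a node. */
typedef struct {
  int height;                 /* sum of line heights below */
  int width;                  /* width of the widest line below */
  bool valid;                 /* false if any line below is invalid */
} ViewData;

/* Recompute a node's per-view data from its children's. */
ViewData
combine_view_data (const ViewData *children, size_t n)
{
  ViewData result = { 0, 0, true };

  assert (children != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      result.height += children[i].height;
      if (children[i].width > result.width)
        result.width = children[i].width;
      result.valid = result.valid && children[i].valid;
    }
  return result;
}
```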

At all times, we have a width and height for each view that can be
used. This starts out as 0x0. Lines can be incrementally revalidated,
which causes the width and height of the buffer to grow. So if you
open a new text widget with a lot of text in it, you can watch the
scrollbar adjust as the height is computed in an idle handler. Lines
whose height has never been computed are taken to have a height of 0.
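
Incremental revalidation can be sketched as a function that an idle
handler would call repeatedly: each call validates a bounded number of
invalid lines, and the total height grows as it goes. This
hypothetical sketch scans a flat array and uses a fixed line_height;
the real code measures lines with Pango and uses the btree's validity
flags to find invalid lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-line view data: height stays 0 until validated. */
typedef struct {
  int height;
  bool valid;
} LineInfo;

/* One idle-handler step: validate at most max_lines invalid lines.
 * line_height stands in for measuring a line with Pango.  Returns the
 * number of lines validated; 0 means everything is already valid. */
size_t
validate_some_lines (LineInfo *lines, size_t n, size_t max_lines,
                     int line_height)
{
  size_t done = 0;

  assert (line_height >= 0);
  for (size_t i = 0; i < n && done < max_lines; i++)
    {
      if (!lines[i].valid)
        {
          lines[i].height = line_height;
          lines[i].valid = true;
          done++;
        }
    }
  return done;
}

/* Current total height; never-validated lines count as 0. */
int
total_height (const LineInfo *lines, size_t n)
{
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += lines[i].height;
  return sum;
}
```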

Iterators (GtkTextIter)
===

Iterators are fairly complex in order to avoid re-traversing the btree,
or a line in the btree, each time the iterator is used. That is, they
save a bunch of pointers: to the current segment, the current line,
etc.

Two "validity stamps" are kept in the btree that are used to detect
and handle possibly-invalid pointers in iterators. The
"chars_changed_stamp" is incremented whenever a segment with
char_count > 0 (an indexable segment) is added or removed. It is an
application bug if the application uses an iterator with a
chars_changed_stamp different from the current stamp of the BTree.
That is, you can't use an iterator after adding/removing characters.

The "segments_changed_stamp" is incremented any time we change any
segments, and tells outstanding iterators that any pointers to
GtkTextLineSegment that they may be holding are now invalid. For
example, if you are iterating over a character segment and insert a
mark in the middle of the segment, the character segment will be split
in half and the original segment will be freed. This increments
segments_changed_stamp, causing your iterator to drop its current
segment pointer and count from the beginning of the line again to find
the new segment.
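
The two stamp checks can be sketched as follows. Buffer, Iter and
iter_revalidate() are hypothetical and cover a single line. A
mismatched chars_changed_stamp makes the iterator unusable (GTK itself
reports such iterators with a warning), while a mismatched
segments_changed_stamp only forces the cached segment pointer to be
recomputed from the iterator's character offset in the line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified buffer and iterator. */
typedef struct Seg {
  struct Seg *next;
  int char_count;
} Seg;

typedef struct {
  Seg *line;                  /* first segment of the only line */
  unsigned chars_changed_stamp;
  unsigned segments_changed_stamp;
} Buffer;

typedef struct {
  Buffer *buffer;
  int line_offset;            /* character offset within the line */
  Seg *segment;               /* cached; may be stale */
  int segment_offset;
  unsigned chars_changed_stamp;
  unsigned segments_changed_stamp;
} Iter;

/* Return false if characters changed since the iterator was made
 * (an application bug); otherwise refresh a stale segment pointer by
 * counting from the start of the line again. */
bool
iter_revalidate (Iter *iter)
{
  Buffer *buf = iter->buffer;

  assert (buf != NULL);
  if (iter->chars_changed_stamp != buf->chars_changed_stamp)
    return false;

  if (iter->segments_changed_stamp != buf->segments_changed_stamp)
    {
      int offset = iter->line_offset;
      Seg *seg = buf->line;
      while (seg != NULL && offset >= seg->char_count)
        {
          offset -= seg->char_count;
          seg = seg->next;
        }
      iter->segment = seg;
      iter->segment_offset = offset;
      iter->segments_changed_stamp = buf->segments_changed_stamp;
    }
  return true;
}
```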

Iterators also cache some random information such as the current line
number, just because it's free to do so.

GtkTextLayout
===

If you think of GtkTextBTree as the backend for GtkTextBuffer,
GtkTextLayout is the backend for GtkTextView. GtkTextLayout was also
used for a canvas item at one point, which is why its methods are not
underscore-prefixed and the header gets installed. But GtkTextLayout
is really intended to be private.

The main task of GtkTextLayout is to validate lines (compute their
width and height) by converting the lines to a PangoLayout and using
Pango functions. GtkTextLayout is also used for visual iteration and
for mapping visual locations to logical buffer positions.

Validating a line involves creating the GtkTextLineDisplay for that
line. To save memory, GtkTextLineDisplay objects are always created
transiently; we don't keep them around.

The layout has three signals:

 - "invalidated" means some line was changed, so GtkTextView
   needs to install idle handlers to revalidate.

 - "changed" means some lines were validated, so the aggregate
   width/height of the BTree is now different.

 - "allocate_child" means we need to size allocate a
   child widget.

gtk_text_layout_get_line_display() is sort of the "heart" of
GtkTextLayout. This function validates a line.

Line validation involves:

 - converting any GtkTextTag on the line to a PangoAttrList

 - adding the preedit string

 - keeping track of "visible marks" (the cursor)

A given set of tags is composited into a GtkTextAttributes. (In the Tk
code this was called a "style", and there are still relics of this in
the code, such as "invalidate_cached_style()", that should be cleaned
up.)

There's a single-GtkTextAttributes cache, "layout->one_style_cache",
which is used to avoid recomputing the mapping from tags to attributes
for every segment. The one_style_cache is stored in the GtkTextLayout
instead of just a local variable in gtk_text_layout_get_line_display()
so we can use it across multiple lines. Any time we see a segment that
may change the current style (such as a tag toggle), the cache has to
be dropped.

The function that computes a GtkTextAttributes from the GtkTextTags
that apply to a given segment is _gtk_text_attributes_fill_from_tags().
It "mashes" a list of tags into a single set of text attributes. If no
tag affects a given attribute, a default value is used. These defaults
sometimes come from widget->style on the GtkTextView, and sometimes
come from a property of the GtkTextView such as "pixels_above_lines".
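
The "mashing" step can be sketched as: start from the defaults, then
let each tag override only the attributes it actually sets, applying
tags in ascending priority so the highest-priority tag wins. GtkTextTag
does have per-property "-set" flags (e.g. "weight-set") and a
priority, but the Tag and Attrs structs and fill_from_tags() below are
hypothetical simplifications, not the real
_gtk_text_attributes_fill_from_tags().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag: each attribute has a "set" flag saying whether
 * this tag affects it at all. */
typedef struct {
  int priority;
  bool weight_set;
  int weight;
  bool pixels_above_set;
  int pixels_above;
} Tag;

typedef struct {
  int weight;
  int pixels_above;
} Attrs;

/* tags must be sorted by ascending priority, so the highest-priority
 * tag is applied last and wins any conflict. */
Attrs
fill_from_tags (Attrs defaults, const Tag **tags, size_t n)
{
  Attrs a = defaults;

  for (size_t i = 0; i < n; i++)
    {
      assert (i == 0 || tags[i - 1]->priority <= tags[i]->priority);
      if (tags[i]->weight_set)
        a.weight = tags[i]->weight;
      if (tags[i]->pixels_above_set)
        a.pixels_above = tags[i]->pixels_above;
    }
  return a;
}
```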

GtkTextView
===

Once you understand GtkTextLayout and GtkTextBTree, the actual
GtkTextView widget is not that complicated.

The main complexity is the interaction between scrolling and line
validation, which is documented with a long comment in gtktextview.c.

The other thing to know about is that the text view has "border
windows" on its sides, used to draw line numbers and such; these
scroll along with the main window.

Invisible text
===

Invisible text doesn't work yet. It is a property that can be set by a
GtkTextTag, so you determine whether text is invisible using the same
mechanism you would use to check whether the text is bold or orange.

The intended behavior of invisible text is that it should vanish
completely, as if it did not exist. The use-case we were thinking of
was a code editor with function folding, where you can hide all
function bodies. That could be implemented by creating a
"function_body" GtkTextTag and toggling its "invisible" attribute to
hide/show the function bodies.

Lines are normally validated in an idle handler, but as an exception,
lines that are onscreen are always validated synchronously. Thus
invisible text raises the danger that we might have a huge number of
invisible lines "onscreen"; this needs to be handled efficiently.

At one point we were considering making "invisible" a per-paragraph
attribute (meaning the invisibility state of the first character in
the paragraph makes the whole paragraph visible or not
visible). Several existing tag attributes work this way, such as the
margin width. I don't remember why we were going to do this, but it
may have been due to some implementation difficulty that will become
clear if you try implementing invisible text. ;-)

To finish invisible text support, all the cursor navigation functions
and the like (the _display_lines() stuff) will need to skip invisible
text. Also, various functions with _visible in the name, such as
gtk_text_iter_get_visible_text(), have to be audited to be sure they
don't return invisible text. And user operations such as cut-and-paste
need to copy only visible text.
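
Copying only visible text could look something like the sketch below,
assuming each run of text already has its invisibility resolved from
its tags, the same way bold or foreground color would be. Run and
copy_visible_text() are hypothetical; a real implementation would walk
segments with iterators rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical run of text with "invisible" already resolved. */
typedef struct {
  const char *text;
  bool invisible;
} Run;

/* Copy only the visible runs into out (out_size bytes, NUL included).
 * A run that does not fit is dropped whole rather than split, so a
 * UTF-8 sequence is never cut in half.  Returns the length copied. */
size_t
copy_visible_text (const Run *runs, size_t n, char *out, size_t out_size)
{
  size_t len = 0;

  assert (out != NULL && out_size > 0);
  for (size_t i = 0; i < n; i++)
    {
      if (runs[i].invisible)
        continue;               /* e.g. a folded function body */
      size_t run_len = strlen (runs[i].text);
      if (len + run_len >= out_size)
        break;
      memcpy (out + len, runs[i].text, run_len);
      len += run_len;
    }
  out[len] = '\0';
  return len;
}
```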