Blame docs/text_widget_internals.txt

Packit Service fb6fa5
This file documents how GtkTextView works, at least partially.  You
probably want to read the text widget overview in the reference manual
to get an application programmer's overview of the public API before
reading this. The overview in the reference manual documents
GtkTextBuffer, GtkTextView, GtkTextMark, etc. from a public API
standpoint.

The BTree
===

The heart of the text widget is a data structure called GtkTextBTree,
which implements all the hard work of the public GtkTextBuffer object.
The purpose of the btree is to make most operations O(log N) or better,
so application programmers can just use whatever API is convenient
without worrying about O(N) performance pitfalls.

The BTree is a tree of paragraphs (newline-terminated lines).  The
leaves of the tree are paragraphs, represented by a GtkTextLine. The
nodes of the tree above the leaves are represented by
GtkTextBTreeNode. The nodes are used to store aggregate data counts,
so we can for example skip 100 paragraphs or 100 characters, without
having to traverse 100 nodes in a list.

You might guess from this that many operations are O(N) where N is the
number of bytes in a paragraph, and you would be right. The text
widget is efficient for huge numbers of paragraphs, but will choke on
extremely long blocks of text without intervening newlines.

("newline" is a slight lie; we also honor \r, \r\n, and some funky
Unicode characters for paragraph breaks. Annoyingly, this means the
paragraph break may be more than one byte.)

The idea of the btree is something like:

               ------ Node (lines = 6)
              /          Line 0
             /           Line 1
            /            Line 2
           /             Line 3
          /              Line 4
         /               Line 5
 Node (lines = 12)
         \
          \---------- Node (lines = 6)
                         Line 6
                         Line 7
                         Line 8
                         Line 9
                         Line 10
                         Line 11

In addition to keeping aggregate line counts at each node, we keep
character counts and information about the tag toggles appearing
below each node.
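
To make the role of the aggregate counts concrete, here is a minimal
sketch of looking up a paragraph by number. The Line and Node structs
and find_line() are hypothetical, simplified stand-ins, not the real
GtkTextLine / GtkTextBTreeNode definitions; the point is only that a
whole subtree is skipped by subtracting its count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GtkTextLine and GtkTextBTreeNode. */
typedef struct Line {
    struct Line *next;       /* next paragraph under the same node */
    int          num_chars;  /* characters in this paragraph */
} Line;

typedef struct Node {
    struct Node *next;         /* next sibling at the same level */
    struct Node *child_nodes;  /* first child; NULL on the lowest level */
    Line        *lines;        /* first paragraph; lowest level only */
    int          num_lines;    /* aggregate: paragraphs below this node */
    int          num_chars;    /* aggregate: characters below this node */
} Node;

/* Return paragraph number line_no (0-based), or NULL if out of range.
   Whole subtrees are skipped by subtracting their aggregate count. */
Line *find_line(Node *root, int line_no)
{
    Node *node = root;
    Line *line;

    if (line_no < 0 || line_no >= root->num_lines)
        return NULL;

    while (node->child_nodes != NULL) {
        node = node->child_nodes;
        while (line_no >= node->num_lines) {
            line_no -= node->num_lines;
            node = node->next;
            assert(node != NULL);  /* aggregates must be consistent */
        }
    }

    line = node->lines;
    while (line_no-- > 0)
        line = line->next;
    return line;
}
```

With the 12-line tree drawn above, asking for line 7 skips the first
node in one step (7 - 6 = 1) and then walks one paragraph into the
second node.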

Structure of a GtkTextLine
===

A GtkTextLine contains a single paragraph of text. It should probably
be renamed GtkTextPara someday but ah well.  GtkTextLine is used for
the leaf nodes of the BTree.

A line is a list of GtkTextLineSegment. Line segments contain the
actual data found in the text buffer.

Here are the types of line segment (see gtktextsegment.h,
gtktextchild.h, etc.):

  Character:         contains a block of UTF-8 text.

  Mark:              marks a position in the buffer, such as a cursor.

  Tag toggle:        indicates that a tag is toggled on or toggled off at
                     this point. When you apply a tag to a range of
                     text, we add a toggle on at the start of the
                     range, and a toggle off at the end.  (We also do
                     any necessary merging with existing toggles, so
                     we always have the minimum number possible.)

  Child widget:      stores a child widget that behaves as a single
                     Unicode character from an editing perspective.
                     (Well, stores a list of child widgets, one per
                     GtkTextView displaying the buffer.)

  Image:             stores a GdkPixbuf that behaves as a single
                     character from an editing perspective.

Each line segment has a "class" which identifies its type, and also
provides some virtual functions for handling that segment.
The functions in the class are:

 - SplitFunc, divides the segment so another segment can be inserted.

 - DeleteFunc, finalizes the segment.

 - CleanupFunc, after modifying a line by adding/removing segments,
   this function is used to try merging segments that can be merged,
   e.g. two adjacent character segments with no marks or toggles
   in between.

 - LineChangeFunc, called when a segment moves to a different line;
   according to comments in the code this function may not be needed
   anymore.

 - SegCheckFunc, does sanity-checking when debugging is enabled.
   Basically equivalent to assert(segment is not broken).

The segment class also contains two data fields:

 - the name of the segment type, used for debugging

 - a boolean flag for whether the segment has right or left
   gravity. A segment with right gravity ends up on the right of a
   newly-inserted segment that's placed at the same character offset,
   and a segment with left gravity ends up on the left of a
   newly-inserted segment. For example the insertion cursor
   has right gravity, because as you type, new text is inserted
   and the cursor ends up on the right.
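
As a rough illustration of gravity, here is a minimal sketch of
inserting a segment into a line's segment list. Seg, SegClass and
insert_segment() are hypothetical names, not the real segment API, and
the sketch assumes the offset falls on a segment boundary (the real
code would call the SplitFunc otherwise).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down segment class: just the two data fields.
   The real class also carries the virtual functions listed above. */
typedef struct {
    const char *name;          /* for debugging */
    bool        left_gravity;  /* false means right gravity */
} SegClass;

typedef struct Seg {
    const SegClass *klass;
    struct Seg     *next;
    int             char_count;  /* 0 for marks and tag toggles */
} Seg;

/* Insert new_seg at character offset `offset` in the list *head.
   Zero-length segments sitting exactly at that offset stay on the left
   only if they have left gravity. */
void insert_segment(Seg **head, Seg *new_seg, int offset)
{
    Seg **link = head;

    while (*link != NULL) {
        Seg *seg = *link;

        if (offset > 0) {
            /* Assumption: no split needed, offset is on a boundary. */
            assert(seg->char_count <= offset);
            offset -= seg->char_count;
        } else if (seg->char_count > 0 || !seg->klass->left_gravity) {
            break;  /* indexable or right-gravity: goes after new_seg */
        }
        link = &seg->next;
    }

    new_seg->next = *link;
    *link = new_seg;
}
```

Inserting at an offset where a left-gravity mark and a right-gravity
mark both sit puts the new segment between the two, which is the
behavior described for the insertion cursor.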

The segment itself contains a header, plus some
variable-length data that depends on the type of the segment.
The header contains the length of the segment in characters and in
bytes. Some segments have a length of zero. Segments with nonzero
length are referred to as "indexable" and would generally be
user-visible; indexable segments include text, images, and widgets.
Segments with zero length occupy positions between characters, and
include marks and tag toggles.

The GtkText*Body structs are the type-specific portions of
GtkTextLineSegment.

Character segments have the actual character data allocated in the
same malloc() block as the GtkTextLineSegment, to save both malloc()
overhead and the overhead of a pointer to the character data.
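
The single-allocation trick can be sketched with a C99 flexible array
member. CharSeg and char_seg_new() are hypothetical names and the
layout is an assumption for illustration; the real segment struct has
more header fields and a different body layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical character segment: header and text in one malloc() block. */
typedef struct {
    int  byte_count;  /* length in bytes */
    int  char_count;  /* length in Unicode characters */
    char chars[];     /* UTF-8 text plus a trailing NUL */
} CharSeg;

/* Copy byte_len bytes of valid UTF-8 into a new segment.
   Returns NULL if the allocation fails. */
CharSeg *char_seg_new(const char *text, size_t byte_len)
{
    CharSeg *seg = malloc(sizeof *seg + byte_len + 1);
    size_t i;

    if (seg == NULL)
        return NULL;

    memcpy(seg->chars, text, byte_len);
    seg->chars[byte_len] = '\0';
    seg->byte_count = (int) byte_len;

    /* Count characters: every byte that is not a UTF-8 continuation
       byte (10xxxxxx) starts a new character. */
    seg->char_count = 0;
    for (i = 0; i < byte_len; i++)
        if (((unsigned char) text[i] & 0xC0) != 0x80)
            seg->char_count++;

    return seg;
}
```

One free() releases both the header and the text, and there is no
separate pointer to chase when reading the characters.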

Storing and tracking tags in the BTree
===

A GtkTextTag is an object representing some text attributes.  A tag
can affect zero attributes (for example one used only for internal
application bookkeeping), a single attribute such as "bold", or any
number of attributes (such as large and bold and centered for a
"header" tag).

The tags that can be applied to a given buffer are stored in the
GtkTextTagTable for that buffer. The tag table is just a collection of
tags.

The real work of applying/removing tags happens in the function
_gtk_text_btree_tag(). Essentially we remove all tag toggle segments
that affect the tag being applied or removed from the given range;
then we add a toggle-on and a toggle-off segment at either end of the
range; then for any lines we modified, we call the CleanupFunc
routines for the segments, to merge segments that can be merged.

This is complicated somewhat because we keep information about the tag
toggles in the btree, allowing us to locate tagged regions or
add/remove tags in O(log N) instead of O(N) time. Tag information is
stored in "struct Summary" (that's a bad name, it could probably use
renaming). Each GtkTextBTreeNode has a list of Summary hanging off of
it, one for each tag that's toggled somewhere below the node. The
Summary simply contains a count of tag toggle segments found below the
node.
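
The per-node toggle counts might be maintained roughly like this.
This is a sketch under assumptions: Summary and Node here are
simplified, tags are identified by a plain int instead of a
GtkTextTag pointer, and adjust_toggle_count() / node_toggle_count()
are hypothetical names. The real bookkeeping in the btree has more
cases than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-node record: how many toggles of one tag lie below. */
typedef struct Summary {
    struct Summary *next;
    int             tag_id;        /* stand-in for a GtkTextTag pointer */
    int             toggle_count;
} Summary;

typedef struct Node {
    struct Node *parent;
    Summary     *summaries;  /* one entry per tag toggled below this node */
} Node;

/* Number of toggles for tag_id below node; 0 if the tag has no Summary. */
int node_toggle_count(const Node *node, int tag_id)
{
    const Summary *s;

    for (s = node->summaries; s != NULL; s = s->next)
        if (s->tag_id == tag_id)
            return s->toggle_count;
    return 0;
}

/* Add delta toggles for tag_id at node and at every ancestor.
   Summaries are created on demand and freed when they reach zero.
   Returns 0 on success, -1 if an allocation fails. */
int adjust_toggle_count(Node *node, int tag_id, int delta)
{
    for (; node != NULL; node = node->parent) {
        Summary **link = &node->summaries;

        while (*link != NULL && (*link)->tag_id != tag_id)
            link = &(*link)->next;

        if (*link == NULL) {
            Summary *s = calloc(1, sizeof *s);
            if (s == NULL)
                return -1;
            s->tag_id = tag_id;
            *link = s;
        }

        (*link)->toggle_count += delta;
        assert((*link)->toggle_count >= 0);

        if ((*link)->toggle_count == 0) {
            Summary *dead = *link;
            *link = dead->next;
            free(dead);
        }
    }
    return 0;
}
```

A search for a tagged region can then skip any subtree whose node
reports a count of zero for that tag, which is where the O(log N)
behavior comes from.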

Views of the BTree (GtkTextLayout)
===

Each BTree has one or more views that display the tree.  Originally
there was some idea that a view could be any object, so there are some
"gpointer view_id" left in the code. However, at some point we decided
that all views had to be a GtkTextLayout and so the btree does assume
that from time to time.

The BTree maintains some per-line and per-node data that is specific
to each view. The per-line data is in GtkTextLineData and the per-node
data is in another badly-named struct called NodeData (should be
PerViewNodeData or something). The purpose of these is to store:

 - aggregate height, so we can calculate the Y position of each
   paragraph in O(log N) time, and can get the full height
   of the buffer in O(1) time. The height is per-view since
   each GtkTextView may have a different size allocation.

 - maximum width (the longest line), so we can calculate the width of
   the entire buffer in O(1) time in order to properly set up the
   horizontal scrollbar.

 - a flag for whether the line is "valid" - valid lines have not been
   modified since we last computed their width and height. Invalid
   lines need to have their width and height recomputed.

At all times, we have a width and height for each view that can be
used. This starts out as 0x0. Lines can be incrementally revalidated,
which causes the width and height of the buffer to grow. So if you
open a new text widget with a lot of text in it, you can watch the
scrollbar adjust as the height is computed in an idle handler.  Lines
whose height has never been computed are taken to have a height of 0.
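
Here is a minimal sketch of how validating one line could update the
per-view aggregates. ViewNode, ViewLine and validate_line() are
hypothetical, simplified stand-ins for NodeData and GtkTextLineData.
One deliberate simplification: the maximum width only ever grows
here, whereas a line getting narrower would require recomputing the
maximum from the children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-view data for a node: aggregates over its subtree. */
typedef struct ViewNode {
    struct ViewNode *parent;
    int              height;  /* sum of validated line heights below */
    int              width;   /* widest validated line below */
} ViewNode;

/* Hypothetical per-view data for a line. */
typedef struct {
    ViewNode *parent;
    int       width;
    int       height;  /* 0 until the line is first validated */
    bool      valid;
} ViewLine;

/* Record freshly measured dimensions and push the change up the tree. */
void validate_line(ViewLine *line, int new_width, int new_height)
{
    int delta = new_height - line->height;
    ViewNode *node;

    line->width = new_width;
    line->height = new_height;
    line->valid = true;

    for (node = line->parent; node != NULL; node = node->parent) {
        node->height += delta;
        if (new_width > node->width)
            node->width = new_width;  /* simplification: grow only */
    }
}
```

The root node's height and width are then readable in O(1) at any
time, and they start at 0x0 and grow as lines are validated, matching
the scrollbar behavior described above.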

Iterators (GtkTextIter)
===

Iterators are fairly complex in order to avoid re-traversing the btree
or a line in the btree each time the iterator is used. That is, they
save a bunch of pointers - to the current segment, the current line,
etc.

Two "validity stamps" are kept in the btree that are used to detect
and handle possibly-invalid pointers in iterators. The
"chars_changed_stamp" is incremented whenever a segment with
char_count > 0 (an indexable segment) is added or removed. It is an
application bug if the application uses an iterator with a
chars_changed_stamp different from the current stamp of the BTree.
That is, you can't use an iterator after adding/removing characters.

The "segments_changed_stamp" is incremented any time we change any
segments, and tells outstanding iterators that any pointers to
GtkTextLineSegment that they may be holding are now invalid. For
example, if you are iterating over a character segment, and insert a
mark in the middle of the segment, the character segment will be split
in half and the original segment will be freed. This increments
segments_changed_stamp, causing your iterator to drop its current
segment pointer and count from the beginning of the line again to find
the new segment.
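
The two-stamp check can be sketched as follows. Tree, Iter,
iter_init() and iter_check() are hypothetical and far simpler than the
real GtkTextIter internals; only the two stamp names come from the
text above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int chars_changed_stamp;     /* bumped when indexable segments change */
    int segments_changed_stamp;  /* bumped when any segment changes */
} Tree;

typedef struct {
    Tree       *tree;
    int         chars_changed_stamp;
    int         segments_changed_stamp;
    const void *cached_segment;  /* stand-in for a segment pointer */
} Iter;

typedef enum { ITER_OK, ITER_SEGMENT_STALE, ITER_INVALID } IterStatus;

/* Point the iterator at a segment and remember the current stamps. */
void iter_init(Iter *iter, Tree *tree, const void *segment)
{
    iter->tree = tree;
    iter->chars_changed_stamp = tree->chars_changed_stamp;
    iter->segments_changed_stamp = tree->segments_changed_stamp;
    iter->cached_segment = segment;
}

/* Compare the iterator's stamps with the tree's before using it. */
IterStatus iter_check(Iter *iter)
{
    if (iter->chars_changed_stamp != iter->tree->chars_changed_stamp)
        return ITER_INVALID;  /* application bug: characters changed */

    if (iter->segments_changed_stamp != iter->tree->segments_changed_stamp) {
        /* Drop the cached pointer; a real iterator would now re-find
           its segment by counting from the start of the line. */
        iter->cached_segment = NULL;
        iter->segments_changed_stamp = iter->tree->segments_changed_stamp;
        return ITER_SEGMENT_STALE;
    }

    return ITER_OK;
}
```

Inserting a mark bumps only segments_changed_stamp, so the iterator
recovers; inserting text bumps both, so the iterator is reported as
invalid.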

Iterators also cache some random information such as the current line
number, just because it's free to do so.

GtkTextLayout
===

If you think of GtkTextBTree as the backend for GtkTextBuffer,
GtkTextLayout is the backend for GtkTextView. GtkTextLayout was also
used for a canvas item at one point, which is why its methods are not
underscore-prefixed and the header gets installed. But GtkTextLayout
is really intended to be private.

The main task of GtkTextLayout is to validate lines (compute their
width and height) by converting the lines to a PangoLayout and using
Pango functions. GtkTextLayout is also used for visual iteration, and
mapping visual locations to logical buffer positions.

Validating a line involves creating the GtkTextLineDisplay for that
line. To save memory, GtkTextLineDisplay objects are always created
transiently; we don't keep them around.

The layout has three signals:

 - "invalidated" means some line was changed, so GtkTextView
   needs to install idle handlers to revalidate.

 - "changed" means some lines were validated, so the aggregate
   width/height of the BTree is now different.

 - "allocate_child" means we need to size allocate a
   child widget.

gtk_text_layout_get_line_display() is sort of the "heart" of
GtkTextLayout. This function validates a line.

Line validation involves:

 - converting any GtkTextTag on the line to PangoAttrList

 - adding the preedit string

 - keeping track of "visible marks" (the cursor)

A given set of tags is composited to a GtkTextAttributes. (In the Tk
code this was called a "style" and there are still relics of this in
the code, such as "invalidate_cached_style()", that should be cleaned
up.)

There's a single-GtkTextAttributes cache, "layout->one_style_cache",
which is used to avoid recomputing the mapping from tags to attributes
for every segment. The one_style_cache is stored in the GtkTextLayout
instead of just a local variable in gtk_text_layout_get_line_display()
so we can use it across multiple lines. Any time we see a segment that
may change the current style (such as a tag toggle), the cache has to
be dropped.
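
The caching pattern looks roughly like this. Everything here is a
simplified assumption for illustration: Style stands in for
GtkTextAttributes, a single "bold" flag stands in for the full set of
tags, and layout_style_line() is a hypothetical walk over one line's
segments. Only the one_style_cache name comes from the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SEG_CHARS, SEG_MARK, SEG_TOGGLE } SegKind;

typedef struct { bool bold; } Style;  /* stand-in for GtkTextAttributes */

typedef struct {
    Style one_style_cache;
    bool  cache_valid;
    bool  bold_on;       /* tag state carried from line to line */
    int   computations;  /* how often the style was recomputed */
} Layout;

/* Walk the segments of one line, recomputing the style only when the
   cache was dropped. Returns the number of character segments styled. */
int layout_style_line(Layout *layout, const SegKind *segs, size_t n_segs)
{
    int styled = 0;
    size_t i;

    for (i = 0; i < n_segs; i++) {
        switch (segs[i]) {
        case SEG_TOGGLE:
            layout->bold_on = !layout->bold_on;
            layout->cache_valid = false;  /* style may have changed */
            break;
        case SEG_CHARS:
            if (!layout->cache_valid) {
                layout->one_style_cache.bold = layout->bold_on;
                layout->cache_valid = true;
                layout->computations++;
            }
            styled++;  /* the cached style would be applied here */
            break;
        case SEG_MARK:
            break;  /* marks do not affect the style */
        }
    }
    return styled;
}
```

Because the cache lives in the Layout rather than in a local variable,
a second line with no toggles reuses the style computed for the
previous line.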

To compute a GtkTextAttributes from the GtkTextTag that apply to a
given segment, the function is _gtk_text_attributes_fill_from_tags().
This "mashes" a list of tags into a single set of text attributes.
If no tags affect a given attribute, a default value is used. These
defaults sometimes come from widget->style on the GtkTextView, and
sometimes come from a property of the GtkTextView such as
"pixels_above_lines".
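
The "mashing" step can be sketched like this. Attrs, Tag and
fill_from_tags() are hypothetical and cover only three attributes;
the sketch also assumes the tags arrive sorted from lowest to highest
priority so that later tags override earlier ones, which should be
checked against the real function before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, tiny subset of text attributes. */
typedef struct {
    int weight;
    int size;
    int pixels_above_lines;
} Attrs;

/* Hypothetical tag: a value plus a "set" flag per attribute. */
typedef struct {
    bool  weight_set;
    bool  size_set;
    bool  pixels_above_lines_set;
    Attrs values;
} Tag;

/* dest must already hold the defaults. Tags are assumed to be sorted
   by ascending priority, so a later tag wins over an earlier one. */
void fill_from_tags(Attrs *dest, const Tag *const *tags, size_t n_tags)
{
    size_t i;

    for (i = 0; i < n_tags; i++) {
        const Tag *tag = tags[i];

        if (tag->weight_set)
            dest->weight = tag->values.weight;
        if (tag->size_set)
            dest->size = tag->values.size;
        if (tag->pixels_above_lines_set)
            dest->pixels_above_lines = tag->values.pixels_above_lines;
    }
}
```

Attributes that no tag sets keep whatever default the caller put in
dest, which mirrors how defaults come from the GtkTextView.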

GtkTextView
===

Once you understand GtkTextLayout and GtkTextBTree, the actual
GtkTextView widget is not that complicated.

The main complexity is the interaction between scrolling and line
validation, which is documented with a long comment in gtktextview.c.

The other thing to know about is just that the text view has "border
windows" on the sides, used to draw line numbers and such; these
scroll along with the main window.

Invisible text
===

Invisible text doesn't work yet. It is a property that can be set by a
GtkTextTag; so you determine whether text is invisible using the same
mechanism you would use to check whether the text is bold, or orange.

The intended behavior of invisible text is that it should vanish
completely, as if it did not exist. The use-case we were thinking of
was a code editor with function folding, where you can hide all
function bodies. That could be implemented by creating a
"function_body" GtkTextTag and toggling its "invisible" attribute to
hide/show the function bodies.

Lines are normally validated in an idle handler, but as an exception,
lines that are onscreen are always validated synchronously. Thus
invisible text raises the danger that we might have a huge number of
invisible lines "onscreen" - this needs to be handled efficiently.

At one point we were considering making "invisible" a per-paragraph
attribute (meaning the invisibility state of the first character in
the paragraph makes the whole paragraph visible or not
visible). Several existing tag attributes work this way, such as the
margin width. I don't remember why we were going to do this, but it
may have been due to some implementation difficulty that will become
clear if you try implementing invisible text. ;-)

To finish invisible text support, all the cursor navigation functions
and the like (the _display_lines() stuff) will need to skip
invisible text. Also, various functions with _visible in the name,
such as gtk_text_iter_get_visible_text(), have to be audited to be
sure they don't return invisible text. And user operations such as
cut-and-paste need to copy only visible text.