Date: Sun, 14 Sep 1997 20:17:06 -0700 (PDT)
From: Josh MacDonald <jmacd@CS.Berkeley.EDU>
Subject: [gtk-list] gtktext widget internal documentation

Pete convinced me to just write up the text widget and let someone else
finish it.  I'm pretty busy and have other commitments now.  Sorry.  I think
I'm not the most qualified for some of the remaining work anyway, because I
don't really know Gtk and its event model very well.  Most of the work so
far was possible without knowing Gtk all that well; it was simply a data
structure exercise (though after reading this you might say it was a fairly
complicated data structure exercise).  I'm happy to answer questions.


High level description:

There are several layers of data structure to the widget.  They are
separated from each other as much as possible.  The first is a gapped
text segment similar to the data structure Emacs uses for representing
text.  Then there is a property list, which stores text properties for
various ranges of text.  There is no direct relation between the text
property list and the gapped text segment.  Finally there is a drawn
line parameter cache to speed calculations when drawing and redrawing
lines on screen.  In addition to these data structures, there are
structures to help iterate over text in the buffer.

The gapped text segment is quite simple.  Its parameters are (all
parameters I mention here are in the structure GtkText):

  guchar* text;
  guint text_len;
  guint gap_position;
  guint gap_size;
  guint text_end;

TEXT is the buffer, TEXT_LEN is its allocated length.  TEXT_END is the
length of the text, including the gap.  GAP_POSITION is the start of
the gap, and GAP_SIZE is the gap's length.  Therefore, TEXT_END -
GAP_SIZE is the length of the text in the buffer.  The macro
TEXT_LENGTH returns this value.  To get the value of a character in
the buffer, use the macro TEXT_INDEX(TEXT,INDEX).  This macro tests
whether the index is less than GAP_POSITION: if so it returns
TEXT[INDEX], otherwise it returns TEXT[GAP_SIZE+INDEX].  The function
MOVE_GAP_TO_POINT moves the gap to a particular index.  The
function MAKE_FORWARD_SPACE lengthens the gap to provide room for a
certain number of characters.
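
As a rough illustration of the gap arithmetic described above, here is
a minimal standalone sketch.  It is not the code in gtktext.c: the
GapBuffer struct and the gap_* function names are hypothetical, the
GLib typedefs are replaced by local stand-ins, and the growth policy in
gap_make_space is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for the GLib typedefs used in the field list above. */
typedef unsigned char guchar;
typedef unsigned int  guint;

/* Hypothetical cut-down struct holding only the five gap fields. */
typedef struct {
  guchar *text;         /* the buffer */
  guint   text_len;     /* allocated length of the buffer */
  guint   gap_position; /* index where the gap starts */
  guint   gap_size;     /* length of the gap */
  guint   text_end;     /* length of the text, including the gap */
} GapBuffer;

/* Same arithmetic as described for TEXT_LENGTH. */
guint gap_text_length (const GapBuffer *b)
{
  return b->text_end - b->gap_size;
}

/* Same arithmetic as described for TEXT_INDEX. */
guchar gap_text_index (const GapBuffer *b, guint index)
{
  assert (index < gap_text_length (b));
  return index < b->gap_position ? b->text[index]
                                 : b->text[index + b->gap_size];
}

/* Slide the gap so that it starts at POINT. */
void gap_move_to (GapBuffer *b, guint point)
{
  assert (point <= gap_text_length (b));
  if (point < b->gap_position)
    memmove (b->text + point + b->gap_size, b->text + point,
             b->gap_position - point);
  else if (point > b->gap_position)
    memmove (b->text + b->gap_position,
             b->text + b->gap_position + b->gap_size,
             point - b->gap_position);
  b->gap_position = point;
}

/* Grow the gap until it can hold at least LEN characters. */
void gap_make_space (GapBuffer *b, guint len)
{
  if (b->gap_size >= len)
    return;
  guint extra = len - b->gap_size;
  guint tail = b->text_end - (b->gap_position + b->gap_size);
  if (b->text_end + extra > b->text_len) {
    b->text_len = 2 * (b->text_end + extra);   /* assumed growth policy */
    b->text = realloc (b->text, b->text_len);
    assert (b->text != NULL);
  }
  /* Shift the text after the gap to the right by EXTRA bytes. */
  memmove (b->text + b->gap_position + b->gap_size + extra,
           b->text + b->gap_position + b->gap_size, tail);
  b->gap_size += extra;
  b->text_end += extra;
}

/* Insert LEN characters at POINT: move the gap, make room, copy in. */
void gap_insert (GapBuffer *b, guint point, const char *s, guint len)
{
  gap_move_to (b, point);
  gap_make_space (b, len);
  memcpy (b->text + b->gap_position, s, len);
  b->gap_position += len;
  b->gap_size -= len;
}
```

Starting from a zeroed GapBuffer, inserting "hello" at 0 and then "XY"
at 2 leaves the logical text "heXYllo"; readers never see the gap as
long as they go through gap_text_index.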

The property list is a doubly linked list (GList) of text property
data for each contiguous set of characters with similar properties.
The data field of the GList points to a TextProperty structure, which
contains:

  TextFont* font;
  GdkColor* back_color;
  GdkColor* fore_color;
  guint length;

Currently, only font and color data are contained in the property
list, but it can be extended by modifying the INSERT_TEXT_PROPERTY,
TEXT_PROPERTIES_EQUAL, and a few other procedures.  The text property
structure does not contain an absolute offset, only a length.  As a
result, inserting a character into the buffer simply requires moving
the gap to the correct position, making room in the buffer, and either
inserting a new property or extending the old one.  This logic is done
by INSERT_TEXT_PROPERTY.  A similar procedure exists to delete from
the text property list, DELETE_TEXT_PROPERTY.  Since the property
structure doesn't contain an offset, insertion into the list is an
O(1) operation.  All such operations act on the insertion point, which
is the POINT field of the GtkText structure.
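
To show why storing only a length keeps insertion local, here is a
minimal sketch of a length-only property list.  Everything in it is a
simplification and an assumption: Run, PropPoint, run_new and
prop_insert_char are hypothetical names, a single font_id stands in
for the font and colors, a hand-rolled list replaces GList, and the
special handling of the last index is ignored.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int guint;   /* stand-in for the GLib typedef */

/* One run of characters sharing the same property. */
typedef struct Run Run;
struct Run {
  Run  *prev, *next;
  int   font_id;   /* hypothetical stand-in for font and colors */
  guint length;    /* characters covered; note: no absolute offset */
};

/* The insertion point: a run plus an offset into that run. */
typedef struct {
  Run  *run;
  guint offset;
} PropPoint;

/* Allocate a run and link it between PREV and NEXT (either may be NULL). */
Run *run_new (int font_id, guint length, Run *prev, Run *next)
{
  Run *r = malloc (sizeof *r);
  assert (r != NULL);
  r->font_id = font_id;
  r->length = length;
  r->prev = prev;
  r->next = next;
  if (prev) prev->next = r;
  if (next) next->prev = r;
  return r;
}

/* Record one character with property FONT_ID inserted at the point,
   leaving the point just after it.  Only runs adjacent to the point
   are touched, so the cost does not depend on the buffer size. */
void prop_insert_char (PropPoint *pt, int font_id)
{
  Run *r = pt->run;
  assert (pt->offset < r->length);
  if (r->font_id == font_id) {
    r->length++;                       /* extend the current run */
    pt->offset++;
  } else if (pt->offset == 0 && r->prev && r->prev->font_id == font_id) {
    r->prev->length++;                 /* extend the previous run */
  } else if (pt->offset == 0) {
    run_new (font_id, 1, r->prev, r);  /* new run in front of the point */
  } else {
    /* Split the current run around the point and put a new run between. */
    run_new (r->font_id, r->length - pt->offset, r, r->next);
    r->length = pt->offset;
    Run *n = run_new (font_id, 1, r, r->next);
    pt->run = n->next;
    pt->offset = 0;
  }
}

/* Sum of all run lengths, found by rewinding to the head of the list. */
guint prop_total_length (const Run *r)
{
  guint total = 0;
  while (r->prev)
    r = r->prev;
  for (; r; r = r->next)
    total += r->length;
  return total;
}
```

Because no run carries an absolute offset, none of the runs after the
point need to be rewritten when a character is inserted.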

The GtkPropertyMark structure is used for keeping track of the mapping
between absolute buffer offsets and positions in the property list.
These will be referred to as property marks.  Generally, there are
four property marks the system keeps track of.  Two are trivial, the
beginning and the end of the buffer are easy to find.  The other two
are the insertion point (POINT) and the cursor point (CURSOR_MARK).
All operations on the text buffer are done using a property mark as a
sort of cursor to keep track of the alignment of the property list and
the absolute buffer offset.  The GtkPropertyMark structure contains:

  GList* property;
  guint offset;
  guint index;

PROPERTY is a pointer to the current property list element.  INDEX is
the absolute buffer index, and OFFSET is the offset of INDEX from the
beginning of PROPERTY.  It is essential to keep property marks valid,
or else you will have the wrong text properties at each property mark
transition.  An important point is that all property marks are invalid
after a buffer modification unless care is taken to keep them
accurate.  That is the difficulty of the insert and delete operations,
because as the next section describes, line data is cached and by
necessity contains text property marks.  The functions for operating
on and computing property marks are:

 void advance_mark     (GtkPropertyMark* mark);
 void decrement_mark   (GtkPropertyMark* mark);
 void advance_mark_n   (GtkPropertyMark* mark, gint n);
 void decrement_mark_n (GtkPropertyMark* mark, gint n);
 void move_mark_n      (GtkPropertyMark* mark, gint n);

 GtkPropertyMark find_mark      (GtkText* text, guint mark_position);
 GtkPropertyMark find_mark_near (GtkText* text, guint mark_position,
                                 const GtkPropertyMark* near);

ADVANCE_MARK and DECREMENT_MARK modify the mark by plus or minus one
buffer index.  ADVANCE_MARK_N and DECREMENT_MARK_N modify the mark by
plus or minus N indices.  MOVE_MARK_N accepts a positive or negative
argument.  FIND_MARK returns a mark at MARK_POSITION using a linear
search from the nearest known property mark (the beginning, the end,
the point, etc).  FIND_MARK_NEAR also does a linear search, but
searches from the NEAR argument.  A number of macros exist at the top
of the file for doing things like getting the current text property,
or some component of the current property.  See the MARK_* macros.
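
The way a mark keeps PROPERTY, OFFSET and INDEX aligned while it moves
can be sketched as follows.  This is an illustration under
assumptions, not the gtktext.c code: Prop and Mark are hypothetical
cut-down structures, the functions take an unsigned count instead of
a gint, and every property is assumed to have length >= 1.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int guint;   /* stand-in for the GLib typedef */

/* Cut-down property list element: only the length matters here. */
typedef struct Prop Prop;
struct Prop {
  Prop *prev, *next;
  guint length;
};

/* Mirrors the three GtkPropertyMark fields listed above. */
typedef struct {
  Prop *property;  /* current property list element */
  guint offset;    /* offset of INDEX from the start of PROPERTY */
  guint index;     /* absolute buffer index */
} Mark;

/* Move the mark forward by N indices, hopping properties as needed. */
void mark_advance_n (Mark *m, guint n)
{
  while (n > 0) {
    guint room = m->property->length - m->offset;
    if (n < room) {
      m->offset += n;
      m->index += n;
      return;
    }
    assert (m->property->next != NULL);
    m->property = m->property->next;
    m->offset = 0;
    m->index += room;
    n -= room;
  }
}

/* Move the mark backward by N indices. */
void mark_decrement_n (Mark *m, guint n)
{
  while (n > 0) {
    if (n <= m->offset) {
      m->offset -= n;
      m->index -= n;
      return;
    }
    assert (m->property->prev != NULL);
    n -= m->offset + 1;
    m->index -= m->offset + 1;
    m->property = m->property->prev;
    m->offset = m->property->length - 1;
  }
}

/* Linear search for POSITION starting from a known mark NEAR. */
Mark mark_find_near (guint position, const Mark *near)
{
  Mark m = *near;
  if (position >= m.index)
    mark_advance_n (&m, position - m.index);
  else
    mark_decrement_n (&m, m.index - position);
  return m;
}
```

The cost of mark_find_near is proportional to the number of
properties between NEAR and the target, which is why starting from
the nearest known mark matters.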

Next there is a LineParams structure which contains all the
information necessary to draw one line of text on screen.  When I say
"line" here, I do not mean one line of text separated by newlines,
rather I mean one row of text on screen.  It is a matter of policy how
visible lines are chosen and there are currently two policies,
line-wrap and no-line-wrap.  I suspect it would not be difficult to
implement new policies for doing such things as justification.  The
LineParams structure includes the following fields:

  guint font_ascent;
  guint font_descent;
  guint pixel_width;
  guint displayable_chars;
  guint wraps : 1;

  PrevTabCont tab_cont;
  PrevTabCont tab_cont_next;

  GtkPropertyMark start;
  GtkPropertyMark end;

FONT_ASCENT and FONT_DESCENT are the maximum ascent and descent of any
character in the line.  PIXEL_WIDTH is the number of pixels wide the
drawn region is, though I don't think it's actually being used
currently.  You may wish to remove this field, eventually, though I
suspect it will come in handy implementing horizontal scrolling.
DISPLAYABLE_CHARS is the number of characters in the line actually
drawn.  This may be less than the number of characters in the line
when line wrapping is off (see below).  The bitflag WRAPS tells
whether the next line is a continuation of this line.  START and END
are the marks at the beginning and end of the line.  Note that END is
the actual last character, not one past it, so the smallest line
(containing, for example, one newline) has START == END.  TAB_CONT and
TAB_CONT_NEXT are for computation of tab positions.  I will discuss
them later.

A point about the end of the buffer.  You may be tempted to consider
working with the buffer as an array of length TEXT_LENGTH(TEXT), but
you have to be careful that the editor allows you to position your
cursor at the last index of the buffer, one past the last character.
The macro LAST_INDEX(TEXT, MARK) returns true if MARK is positioned at
this index.  If you see or add a special case in the code for this
end-of-buffer case, make sure to use LAST_INDEX if you can.  Very
often, the last index is treated as a newline.

[ One way the last index is special is that, although it is always
  part of some property, it will never be part of a property of
  length 1 unless there are no other characters in the text. That
  is, its properties are always that of the preceding character,
  if any.
  There is a fair bit of special case code to maintain this condition -
  which is needed so that the user has control over the properties of
  characters inserted at the last position. OWT 2/9/98 ]

Tab stops are variable width.  A list of tab stops is contained in the
GtkText structure:

  GList *tab_stops;
  gint default_tab_width;

The elements of tab_stops are integers cast to gpointer.  This is a
little bogus, but works.  For example:

  text->default_tab_width = 4;
  text->tab_stops = NULL;
  text->tab_stops = g_list_prepend (text->tab_stops, (void*)8);
  text->tab_stops = g_list_prepend (text->tab_stops, (void*)8);

is how these fields are initialized, currently.  This means that the
first two tabs occur at 8 and 16, and every 4 characters thereafter.
Tab stops are used in the computation of line geometry (to fill in a
LineParams structure), and the width of the space character in the
current font is used.  The PrevTabCont structure, of which two are
stored per line, is used to compute the geometry of lines which may
have wrapped and carried part of a tab with them:

  guint pixel_offset;
  TabStopMark tab_start;

PIXEL_OFFSET is the number of pixels at which the line should start,
and tab_start is a tab stop mark, which is similar to a property mark,
only it keeps track of the mapping between line position (column) and
the next tab stop.  A TabStopMark contains:

  GList* tab_stops;
  gint to_next_tab;

TAB_STOPS is a pointer into the TAB_STOPS field of the GtkText
structure.  TO_NEXT_TAB is the number of characters before the next
tab.  The functions ADVANCE_TAB_MARK and ADVANCE_TAB_MARK_N advance
these marks.  The LineParams structure contains two PrevTabCont
structures, which each contain a tab stop.  The first (TAB_CONT) is
for computing the beginning pixel offset, as mentioned above.  The
second (TAB_CONT_NEXT) is used to initialize the TAB_CONT field of the
next line if it wraps.
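
A minimal sketch of the tab stop bookkeeping, counting in characters
only.  It assumes the list entries are widths (8, 8, then the default
of 4), which matches the "8 and 16, and every 4 thereafter" example
above.  TabMark, tab_mark_start, tab_mark_advance and column_after
are hypothetical names, a plain array replaces the GList, and the
multiplication by the width of a space that the real geometry code
needs is left out.

```c
#include <assert.h>

/* Hypothetical analogue of TabStopMark using an array of widths. */
typedef struct {
  const int *stops;         /* remaining explicit tab widths */
  int        n_stops;       /* how many explicit widths remain */
  int        default_width; /* used once the explicit widths run out */
  int        to_next_tab;   /* characters before the next tab stop */
} TabMark;

/* A mark positioned at column 0. */
TabMark tab_mark_start (const int *stops, int n_stops, int default_width)
{
  TabMark m = { stops, n_stops, default_width, 0 };
  assert (default_width > 0);
  m.to_next_tab = n_stops > 0 ? stops[0] : default_width;
  return m;
}

/* Advance the mark past one character and return the number of
   columns that character occupies. */
int tab_mark_advance (TabMark *m, char ch)
{
  int cols = (ch == '\t') ? m->to_next_tab : 1;
  m->to_next_tab -= cols;
  if (m->to_next_tab == 0) {
    if (m->n_stops > 1) {
      m->stops++;                 /* move on to the next explicit width */
      m->n_stops--;
      m->to_next_tab = m->stops[0];
    } else {
      m->n_stops = 0;             /* explicit widths exhausted */
      m->to_next_tab = m->default_width;
    }
  }
  return cols;
}

/* Column reached after laying out the string S from column 0. */
int column_after (const char *s, const int *stops, int n_stops,
                  int default_width)
{
  TabMark m = tab_mark_start (stops, n_stops, default_width);
  int col = 0;
  for (; *s; s++)
    col += tab_mark_advance (&m, *s);
  return col;
}
```

With stops {8, 8} and a default width of 4, successive tabs land on
columns 8, 16, 20 and 24.  A wrapped line would start from a saved
TabMark instead of tab_mark_start, which is the role TAB_CONT plays.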

Since computing the parameters of a line is fairly complicated, I
have one interface that should be all you ever need to figure out
something about a line.  The function FIND_LINE_PARAMS computes the
parameters of a single line.  The function LINE_PARAMS_ITERATE is used
for computing the properties of some number (> 0) of sequential lines.

line_params_iterate (GtkText* text,
		     const GtkPropertyMark* mark0,
		     const PrevTabCont* tab_mark0,
		     gboolean alloc,
		     gpointer data,
		     LineIteratorFunction iter);

where LineIteratorFunction is:

typedef gint (*LineIteratorFunction) (GtkText* text,
                                      LineParams* lp,
                                      gpointer data);

The arguments are a text widget (TEXT), the property mark at the
beginning of the first line (MARK0), the tab stop mark at the
beginning of that line (TAB_MARK0), whether to heap-allocate the
LineParams structure (ALLOC), some client data (DATA), and a function
to call with the parameters of each line.  TAB_MARK0 may be NULL, but
if so MARK0 MUST BE A REAL LINE START (not a continued line start; it
is preceded by a newline).  If TAB_MARK0 is not NULL, MARK0 may be any
line start (continued or not).  See the code for examples.  The
function ITER is called with each LineParams computed.  If ALLOC was
true, LINE_PARAMS_ITERATE heap-allocates the LineParams and does not
free them.  Otherwise, no storage is permanently allocated.  ITER
should return TRUE when it wishes to continue no longer.
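
The callback protocol can be sketched like this.  The iterator below
is only a stand-in: iterate_cached_lines walks an array of
precomputed lines, whereas the real LINE_PARAMS_ITERATE computes each
LineParams from the buffer.  HeightData and height_iterator are
hypothetical, LineParams is cut down to two fields, and the GLib
typedefs are local stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the GLib typedefs. */
typedef unsigned int guint;
typedef int          gint;
typedef void        *gpointer;

typedef struct GtkText GtkText;   /* opaque here; never dereferenced */

/* Cut-down LineParams: only the fields this callback reads. */
typedef struct {
  guint font_ascent;
  guint font_descent;
} LineParams;

typedef gint (*LineIteratorFunction) (GtkText *text,
                                      LineParams *lp,
                                      gpointer data);

/* Hypothetical stand-in for LINE_PARAMS_ITERATE: calls ITER on each
   line in order and stops as soon as ITER returns non-zero. */
void iterate_cached_lines (GtkText *text, LineParams *lines, size_t n,
                           gpointer data, LineIteratorFunction iter)
{
  assert (iter != NULL);
  for (size_t i = 0; i < n; i++)
    if (iter (text, &lines[i], data))
      break;
}

/* Client data: accumulate line heights until a pixel budget is met. */
typedef struct {
  guint pixel_budget;
  guint pixel_height;
  guint line_count;
} HeightData;

/* Returns TRUE (non-zero) once enough pixels have been collected. */
gint height_iterator (GtkText *text, LineParams *lp, gpointer data)
{
  HeightData *hd = data;
  (void) text;
  hd->pixel_height += lp->font_ascent + lp->font_descent;
  hd->line_count++;
  return hd->pixel_height >= hd->pixel_budget;
}
```

With a very large budget the same callback simply totals the height
of every line, which is the shape of the scroll bar computation.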

There are currently two uses of LINE_PARAMS_ITERATE:

* Compute the total buffer height for setting the parameters of the
  scroll bars.  This is done in SET_VERTICAL_SCROLL each time the
  window is resized.  When horizontal scrolling is added, depending on
  the policy chosen, the max line width can be computed here as well.

* Computing geometry of some pixel height worth of lines.  This is

The GtkText structure contains a cache of the LineParams data for all
visible lines:

  GList *current_line;
  GList *line_start_cache;

  guint first_line_start_index;
  guint first_cut_pixels;
  guint first_onscreen_hor_pixel;
  guint first_onscreen_ver_pixel;

LINE_START_CACHE is a doubly linked list of LineParams.  CURRENT_LINE
is a transient piece of data which is set in various places such as
the mouse click code.  Generally, it is the line that the cursor
property mark CURSOR_MARK is on.  LINE_START_CACHE points to the first
visible line and may contain PREV pointers if the cached data of
offscreen lines is kept around.  I haven't come up with a policy.  The
cache can keep more lines than are visible if desired, but the result
is that inserts and deletes will then become slower as the entire
cache has to be "corrected".  Right now it doesn't delete from the
cache (it should).  As a result, scrolling through the whole buffer
once will fill the cache with an entry for each line, and subsequent
modifications will be slower than they should be.
FIRST_LINE_START_INDEX is the index of the *REAL* line start of
the first line.  That is, if the first visible line is a continued
line, this is the index of the real line start (preceded by a
newline).  FIRST_CUT_PIXELS is the number of pixels which are not
drawn on the first visible line.  If FIRST_CUT_PIXELS is zero, the
whole line is visible.  FIRST_ONSCREEN_HOR_PIXEL is not used.
FIRST_ONSCREEN_VER_PIXEL is the absolute pixel which starts the
visible region.  This is used for setting the vertical scroll bar.

Other miscellaneous things in the GtkText structure:

Gtk specific things:

  GtkWidget widget;

  GdkWindow *text_area;

  GtkAdjustment *hadj;
  GtkAdjustment *vadj;

  GdkGC *gc;

  GdkPixmap* line_wrap_bitmap;
  GdkPixmap* line_arrow_bitmap;

These are pretty self explanatory, especially if you know Gtk.
LINE_WRAP_BITMAP and LINE_ARROW_BITMAP are two bitmaps used to
indicate that a line wraps and is continued offscreen, respectively.

Some flags:

  guint has_cursor : 1;
  guint is_editable : 1;
  guint line_wrap : 1;
  guint freeze : 1;
  guint has_selection : 1;
  guint own_selection : 1;

HAS_CURSOR is true iff the cursor is visible.  IS_EDITABLE is true iff
the user is allowed to modify the buffer.  If IS_EDITABLE is false,
HAS_CURSOR is guaranteed to be false.  If IS_EDITABLE is true,
HAS_CURSOR starts out false and is set to true the first time the user
clicks in the window.  LINE_WRAP is where the line-wrap policy is
set.  True means wrap lines, false means continue lines offscreen,

The text properties list:

  GList *text_properties;
  GList *text_properties_end;

A scratch area used for constructing a contiguous piece of the buffer
which may otherwise span the gap.  It is not strictly necessary
but simplifies the drawing code because it does not need to deal with
the gap.

  guchar* scratch_buffer;
  guint   scratch_buffer_len;
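
A minimal sketch of what filling the scratch area involves: copying a
logical range that may straddle the gap into contiguous memory.  The
function name copy_span and its parameter list are hypothetical, and
the caller is assumed to have made SCRATCH at least LEN bytes long.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins for the GLib typedefs. */
typedef unsigned char guchar;
typedef unsigned int  guint;

/* Copy LEN logical characters starting at logical index START out of
   a gapped buffer into SCRATCH, skipping over the gap if the range
   crosses it. */
void copy_span (const guchar *text, guint gap_position, guint gap_size,
                guint start, guint len, guchar *scratch)
{
  assert (text != NULL && scratch != NULL);
  if (start + len <= gap_position) {
    /* Entirely before the gap. */
    memcpy (scratch, text + start, len);
  } else if (start >= gap_position) {
    /* Entirely after the gap. */
    memcpy (scratch, text + start + gap_size, len);
  } else {
    /* Straddles the gap: copy the two halves separately. */
    guint before = gap_position - start;
    memcpy (scratch, text + start, before);
    memcpy (scratch + before, text + gap_position + gap_size,
            len - before);
  }
}
```

The drawing code can then treat the scratch area as an ordinary
array of characters.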

The last vertical scrollbar position.  Currently this looks the same
as FIRST_ONSCREEN_VER_PIXEL.  I can't remember why I have two values.
Perhaps someone should clean this up.

  gint last_ver_value;

The cursor:

  gint            cursor_pos_x;
  gint            cursor_pos_y;
  GtkPropertyMark cursor_mark;
  gchar           cursor_char;
  gchar           cursor_char_offset;
  gint            cursor_virtual_x;
  gint            cursor_drawn_level;

CURSOR_POS_X and CURSOR_POS_Y are the screen coordinates of the
cursor.  CURSOR_MARK is the buffer position.  CURSOR_CHAR is
TEXT_INDEX (TEXT, CURSOR_MARK.INDEX) if a drawable character, or 0 if
it is whitespace, which is treated specially.  CURSOR_CHAR_OFFSET is
the pixel offset above the base of the line at which it should be
drawn.  Note that the base of the line is not the "baseline" in the
traditional font metric sense.  A line (LineParams) is FONT_ASCENT +
FONT_DESCENT pixels high, and the "baseline" is FONT_DESCENT pixels
above the base of the line.  I think this
requires a drawing.

0                      AAAAAAA
1                      AAAAAAA
2                     AAAAAAAAA
3                     AAAAAAAAA
4                    AAAAA AAAAA
5                    AAAAA AAAAA
6                   AAAAA   AAAAA
7                  AAAAA     AAAAA
8                  AAAAA     AAAAA
9                 AAAAAAAAAAAAAAAAA
10                AAAAAAAAAAAAAAAAA
11               AAAAA         AAAAA
12               AAAAA         AAAAA
13              AAAAAA         AAAAAA

This line is 20 pixels high, has FONT_ASCENT=14, FONT_DESCENT=6.  Its
"base" is at y=20.  Characters are drawn at y=14.  The LINE_START
macro returns the pixel height.  The LINE_CONTAINS macro is true if
the line contains a certain buffer index.  The LINE_STARTS_AT macro is
true if the line starts at a certain buffer index.  The
LINE_START_PIXEL is the pixel offset the line should be drawn at,
according to the tab continuation of the previous line.

Exposure and drawing:

Exposure is handled from the EXPOSE_TEXT function.  It assumes that
the LINE_START_CACHE and all its parameters are accurate and simply
exposes any line which is in the exposure region.  It calls the
CLEAR_AREA function to clear the background and/or lay down a pixmap
background.  The text widget has a scrollable pixmap background, which
is implemented in CLEAR_AREA.  CLEAR_AREA does the math to figure out
how to tile the pixmap itself so that it can scroll the text with a
copy area call.  If the CURSOR argument to EXPOSE_TEXT is true, it
also draws the cursor.

The function DRAW_LINE draws a single line, doing all the tab and
color computations necessary.  The function DRAW_LINE_WRAP draws the
line wrap bitmap at the end of the line if it wraps.  TEXT_EXPOSE will
expand the cached line data list if it has to by calling
and undraw the cursor.  They count the number of draws and undraws so
that the cursor may be undrawn even if the cursor is already undrawn
and the re-draw will not occur too early.  This is useful in handling

Handling of the cursor is a little messed up, I should add.  It has to
be undrawn and drawn at various places.  Something better needs to be
done about this, because it currently doesn't do the right thing in
certain places.  I can't remember where very well.  Look for the calls

RECOMPUTE_GEOMETRY is called when the geometry of the window changes
or when it is first drawn.  This is probably not done right.  My
biggest weakness in writing this code is that I've never written a
widget before so I got most of the event handling stuff wrong as far
as Gtk is concerned.  Fortunately, most of the code is unrelated and
simply an exercise in data structure manipulation.


Scrolling is fairly straightforward.  It looks at the top line, and
advances it pixel by pixel until the FIRST_CUT_PIXELS equals the line
height and then advances the LINE_START_CACHE.  When it runs out of
lines it fetches more.  The function SCROLL_INT is used to scroll from
inside the code, it calls the appropriate functions and handles
updating the scroll bars.  It dispatches a change event which causes
Gtk to call the correct scroll action, which then enters SCROLL_UP or
SCROLL_DOWN.  Careful with the cursor during these changes.
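
A minimal sketch of the downward scrolling rule, under several
assumptions: ScrollView and scroll_view_down are hypothetical, an
array of precomputed line heights stands in for LINE_START_CACHE, the
remainder of a line is consumed in one step rather than pixel by
pixel (the end state is the same), and the sketch stops at the last
known line where the widget would fetch more.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int guint;   /* stand-in for the GLib typedef */

/* Hypothetical view state. */
typedef struct {
  const guint *line_heights;            /* height of each known line */
  size_t       n_lines;
  size_t       first_line;              /* first visible line */
  guint        first_cut_pixels;        /* hidden pixels of that line */
  guint        first_onscreen_ver_pixel;/* absolute top of visible area */
} ScrollView;

/* Scroll down by up to PIXELS and return the distance actually moved. */
guint scroll_view_down (ScrollView *v, guint pixels)
{
  guint moved = 0;
  assert (v->first_line < v->n_lines);
  while (pixels > 0) {
    guint left = v->line_heights[v->first_line] - v->first_cut_pixels;
    if (pixels < left) {
      /* The top line stays partly visible: just cut more of it. */
      v->first_cut_pixels += pixels;
      moved += pixels;
      break;
    }
    if (v->first_line + 1 == v->n_lines)
      break;                    /* out of lines; fetch more here */
    /* The top line is fully cut: advance to the next line. */
    v->first_line++;
    v->first_cut_pixels = 0;
    moved += left;
    pixels -= left;
  }
  v->first_onscreen_ver_pixel += moved;
  return moved;
}
```

Returning the distance actually moved lets the caller update the
scroll bar with the real position rather than the requested one.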

Insertion, deletion:

There's some confusion right now over what to do with the cursor when
it's offscreen due to scrolling.  This is a policy decision.  I don't
know what's best.  Spencer criticized me for forcing it to stay
onscreen.  It shouldn't be hard to make stuff work with the cursor

Currently I've got functions to do insertion and deletion of a single
character.  It's fairly complicated.  In order to do efficient pasting
into the buffer, or write code that modifies the buffer while the
buffer is drawn, it needs to do multiple characters at a time.  This
is the hardest part of what remains.  Currently, gtk_text_insert does
not re-expose the modified lines.  It needs to.  Pete did this wrong at
one point and I disabled modification completely; I don't know what
the current state of things is.  The functions
Here's pseudo code for insert.  Delete is quite similar.

  insert character into the buffer
  update the text property list
  move the point
  undraw the cursor
  correct all LineParams cache entries after the insertion point
  compute the new height of the modified line
  compare with the old height of the modified line
  remove the old LineParams from the cache
  insert the new LineParams into the cache
  if the lines are of different height, do a copy area to move the
    area below the insertion down
  expose the current line
  update the cursor mark
  redraw the cursor
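
The "correct all LineParams cache entries" step above can be sketched
as follows.  This is an assumption-laden illustration, not the
gtktext.c logic: CachedLine keeps plain buffer indices instead of
full property marks, the function name is hypothetical, and the
modified line is only stretched by one index here, since the
pseudo code recomputes it afterwards anyway.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int guint;   /* stand-in for the GLib typedef */

/* Hypothetical cache entry: indices of the first and last character
   of a drawn line (END is inclusive, as for LineParams). */
typedef struct {
  guint start;
  guint end;
} CachedLine;

/* After one character has been inserted at buffer index POINT, shift
   every cached line that lies after the point and stretch the line
   that contains it.  Returns the position of the modified line in the
   cache, or N if no cached line contains POINT. */
size_t correct_cache_after_insert (CachedLine *lines, size_t n,
                                   guint point)
{
  size_t modified = n;
  assert (lines != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (lines[i].start > point) {
      lines[i].start++;          /* whole line moved one index later */
      lines[i].end++;
    } else if (lines[i].end >= point) {
      lines[i].end++;            /* this line received the character */
      modified = i;
    }
  }
  return modified;
}
```

Every entry in the cache is visited, which is why keeping many
offscreen lines cached makes inserts and deletes slower.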

What needs to be done:

Horizontal scrolling, robustness, testing, selection handling.  If you
want to work on the text widget, pay attention to the debugging
facilities I've written at the end of gtktext.c.  I'm sorry I waited
so long to try and pass this off.  I'm super busy with school and
work, and when I have free time my highest priority is another version
of PRCS.

Feel free to ask me questions.