Date: Sun, 14 Sep 1997 20:17:06 -0700 (PDT)
From: Josh MacDonald
To: gnome@athena.nuclecu.unam.mx, gtk-list@redhat.com
Subject: [gtk-list] gtktext widget internal documentation

Pete convinced me to just write up the text widget and let someone else finish it. I'm pretty busy and have other commitments now. Sorry. I think I'm not the most qualified for some of the remaining work anyway, because I don't really know Gtk and its event model very well. Most of the work so far was possible without knowing Gtk all that well; it was simply a data structure exercise (though after reading this you might say it was a fairly complicated data structure exercise). I'm happy to answer questions.

-josh

High level description:

There are several layers of data structure to the widget. They are separated from each other as much as possible. The first is a gapped text segment similar to the data structure Emacs uses for representing text. Then there is a property list, which stores text properties for various ranges of text. There is no direct relation between the text property list and the gapped text segment. Finally there is a drawn line parameter cache to speed calculations when drawing and redrawing lines on screen. In addition to these data structures, there are structures to help iterate over text in the buffer.

The gapped text segment is quite simple. Its parameters are (all parameters I mention here are in the structure GtkText):

  guchar* text;
  guint text_len;
  guint gap_position;
  guint gap_size;
  guint text_end;

TEXT is the buffer, TEXT_LEN is its allocated length. TEXT_END is the length of the text, including the gap. GAP_POSITION is the start of the gap, and GAP_SIZE is the gap's length. Therefore, TEXT_END - GAP_SIZE is the length of the text in the buffer. The macro TEXT_LENGTH returns this value. To get the value of a character in the buffer, use the macro TEXT_INDEX(TEXT,INDEX).
This macro tests whether the index is less than the GAP_POSITION and returns TEXT[INDEX] if so, or TEXT[GAP_SIZE+INDEX] otherwise. The function MOVE_GAP_TO_POINT positions the gap to a particular index. The function MAKE_FORWARD_SPACE lengthens the gap to provide room for a certain number of characters.

The property list is a doubly linked list (GList) of text property data for each contiguous set of characters with the same properties. The data field of the GList points to a TextProperty structure, which contains:

  TextFont* font;
  GdkColor* back_color;
  GdkColor* fore_color;
  guint length;

Currently, only font and color data are contained in the property list, but it can be extended by modifying the INSERT_TEXT_PROPERTY, TEXT_PROPERTIES_EQUAL, and a few other procedures. The text property structure does not contain an absolute offset, only a length. As a result, inserting a character into the buffer simply requires moving the gap to the correct position, making room in the buffer, and either inserting a new property or extending the old one. This logic is done by INSERT_TEXT_PROPERTY. A similar procedure exists to delete from the text property list, DELETE_TEXT_PROPERTY. Since the property structure doesn't contain an offset, insertion into the list is an O(1) operation. All such operations act on the insertion point, which is the POINT field of the GtkText structure.

The GtkPropertyMark structure is used for keeping track of the mapping between absolute buffer offsets and positions in the property list. These will be referred to as property marks. Generally, there are four property marks the system keeps track of. Two are trivial: the beginning and the end of the buffer are easy to find. The other two are the insertion point (POINT) and the cursor point (CURSOR_MARK). All operations on the text buffer are done using a property mark as a sort of cursor to keep track of the alignment of the property list and the absolute buffer offset.
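
To make the gapped text segment described earlier more concrete, here is a minimal sketch in plain C. It is not the code from gtktext.c: the GapBuf type, the gb_* function names, and the GB_SLACK growth constant are all hypothetical, and only the five field names and the index test come from the description above. It shows the same three operations: indexing across the gap, moving the gap, and lengthening it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GB_SLACK 16  /* assumed extra room added whenever the gap grows */

/* Hypothetical stand-in for the gap-related fields of GtkText. */
typedef struct {
  unsigned char *text;  /* the buffer */
  size_t text_len;      /* allocated length of the buffer */
  size_t gap_position;  /* index at which the gap starts */
  size_t gap_size;      /* length of the gap */
  size_t text_end;      /* length of the text, including the gap */
} GapBuf;

/* Number of real characters: TEXT_END - GAP_SIZE. */
static size_t gb_length (const GapBuf *b)
{
  return b->text_end - b->gap_size;
}

/* Same test as TEXT_INDEX: indices at or past the gap skip over it. */
static unsigned char gb_index (const GapBuf *b, size_t i)
{
  return i < b->gap_position ? b->text[i] : b->text[i + b->gap_size];
}

/* Slide the gap so that it starts at pos (0 <= pos <= gb_length). */
static void gb_move_gap (GapBuf *b, size_t pos)
{
  if (pos < b->gap_position)
    memmove (b->text + pos + b->gap_size, b->text + pos,
             b->gap_position - pos);
  else if (pos > b->gap_position)
    memmove (b->text + b->gap_position,
             b->text + b->gap_position + b->gap_size,
             pos - b->gap_position);
  b->gap_position = pos;
}

/* Lengthen the gap until it can hold n characters. Returns 0 on failure. */
static int gb_make_room (GapBuf *b, size_t n)
{
  if (b->gap_size >= n)
    return 1;
  size_t grow = (n - b->gap_size) + GB_SLACK;
  size_t need = b->text_end + grow;
  if (need > b->text_len) {
    unsigned char *p = realloc (b->text, need * 2);
    if (p == NULL)
      return 0;
    b->text = p;
    b->text_len = need * 2;
  }
  /* Shift everything after the gap to the right by grow bytes. */
  size_t tail = b->gap_position + b->gap_size;
  memmove (b->text + tail + grow, b->text + tail, b->text_end - tail);
  b->gap_size += grow;
  b->text_end += grow;
  return 1;
}

/* Insert n bytes of s at logical index pos. Returns 0 on failure. */
static int gb_insert (GapBuf *b, size_t pos, const char *s, size_t n)
{
  if (pos > gb_length (b))
    return 0;
  if (n == 0)
    return 1;
  if (!gb_make_room (b, n))
    return 0;
  gb_move_gap (b, pos);
  memcpy (b->text + b->gap_position, s, n);  /* fill the front of the gap */
  b->gap_position += n;
  b->gap_size -= n;
  return 1;
}
```

Starting from a zero-initialized GapBuf, repeated inserts at or near the same position only copy the inserted bytes; the memmove cost is paid when the insertion position jumps, which is why the structure suits an editor.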

The GtkPropertyMark structure contains:

  GList* property;
  guint offset;
  guint index;

PROPERTY is a pointer to the current property list element. INDEX is the absolute buffer index, and OFFSET is the offset of INDEX from the beginning of PROPERTY. It is essential to keep property marks valid, or else you will have the wrong text properties at each property mark transition. An important point is that all property marks are invalid after a buffer modification unless care is taken to keep them accurate. That is the difficulty of the insert and delete operations, because as the next section describes, line data is cached and by necessity contains text property marks.

The functions for operating on and computing property marks are:

  void advance_mark (GtkPropertyMark* mark);
  void decrement_mark (GtkPropertyMark* mark);
  void advance_mark_n (GtkPropertyMark* mark, gint n);
  void decrement_mark_n (GtkPropertyMark* mark, gint n);
  void move_mark_n (GtkPropertyMark* mark, gint n);

  GtkPropertyMark find_mark (GtkText* text, guint mark_position);
  GtkPropertyMark find_mark_near (GtkText* text, guint mark_position, const GtkPropertyMark* near);

ADVANCE_MARK and DECREMENT_MARK modify the mark by plus or minus one buffer index. ADVANCE_MARK_N and DECREMENT_MARK_N modify the mark by plus or minus N indices. MOVE_MARK_N accepts a positive or negative argument. FIND_MARK returns a mark at MARK_POSITION using a linear search from the nearest known property mark (the beginning, the end, the point, etc). FIND_MARK_NEAR also does a linear search, but searches from the NEAR argument. A number of macros exist at the top of the file for doing things like getting the current text property, or some component of the current property. See the MARK_* macros.

Next there is a LineParams structure which contains all the information necessary to draw one line of text on screen. When I say "line" here, I do not mean one line of text separated by newlines; rather, I mean one row of text on screen.
It is a matter of policy how visible lines are chosen and there are currently two policies, line-wrap and no-line-wrap. I suspect it would not be difficult to implement new policies for doing such things as justification. The LineParams structure includes the following fields:

  guint font_ascent;
  guint font_descent;
  guint pixel_width;
  guint displayable_chars;
  guint wraps : 1;

  PrevTabCont tab_cont;
  PrevTabCont tab_cont_next;

  GtkPropertyMark start;
  GtkPropertyMark end;

FONT_ASCENT and FONT_DESCENT are the maximum ascent and descent of any character in the line. PIXEL_WIDTH is the width in pixels of the drawn region, though I don't think it's actually being used currently. You may wish to remove this field eventually, though I suspect it will come in handy when implementing horizontal scrolling. DISPLAYABLE_CHARS is the number of characters in the line actually drawn. This may be less than the number of characters in the line when line wrapping is off (see below). The bitflag WRAPS tells whether the next line is a continuation of this line. START and END are the marks at the beginning and end of the line. Note that END is the actual last character, not one past it, so the smallest line (containing, for example, one newline) has START == END. TAB_CONT and TAB_CONT_NEXT are for computation of tab positions. I will discuss them later.

A point about the end of the buffer. You may be tempted to consider working with the buffer as an array of length TEXT_LENGTH(TEXT), but you have to be careful, because the editor allows you to position your cursor at the last index of the buffer, one past the last character. The macro LAST_INDEX(TEXT, MARK) returns true if MARK is positioned at this index. If you see or add a special case in the code for this end-of-buffer case, make sure to use LAST_INDEX if you can. Very often, the last index is treated as a newline.
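
Since the line cache stores property marks and the last index needs care, it may help to see how a mark walks a list of length-only property runs. The following is a simplified sketch in plain C, not the gtktext.c implementation: Run stands in for a GList node holding a TextProperty, font_id is a placeholder for the font and color fields, and the mark_* and run_insert_at names are hypothetical. It assumes no zero-length runs and, as a simplification, leaves a mark at the last index inside the final run with OFFSET equal to that run's length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GList node holding a TextProperty. */
typedef struct Run {
  struct Run *prev, *next;
  int font_id;      /* placeholder for the font and color fields */
  unsigned length;  /* characters covered; note: no absolute offset */
} Run;

/* The same three fields as GtkPropertyMark. */
typedef struct {
  Run *property;    /* run containing the index */
  unsigned offset;  /* offset of index from the start of that run */
  unsigned index;   /* absolute buffer index */
} Mark;

/* Move forward one buffer index, stepping to the next run when the
 * current one is exhausted. At the end of the buffer the mark stays
 * in the final run. */
static void mark_advance (Mark *m)
{
  m->index++;
  m->offset++;
  if (m->offset == m->property->length && m->property->next != NULL) {
    m->property = m->property->next;
    m->offset = 0;
  }
}

/* Move back one buffer index; does nothing at index 0. */
static void mark_retreat (Mark *m)
{
  if (m->index == 0)
    return;
  if (m->offset == 0) {
    m->property = m->property->prev;
    m->offset = m->property->length;
  }
  m->index--;
  m->offset--;
}

/* Linear search from the start of the list. The caller must pass an
 * index no larger than the total length of all runs. */
static Mark mark_find (Run *first, unsigned index)
{
  Mark m = { first, 0, 0 };
  while (m.index < index)
    mark_advance (&m);
  return m;
}

/* Insert n characters with the current run's properties at the mark.
 * Only one length field changes, so this is O(1); every other mark
 * at a later index is now stale and must be corrected by the caller. */
static void run_insert_at (Mark *m, unsigned n)
{
  m->property->length += n;
  m->offset += n;
  m->index += n;
}
```

The design trade-off is visible in run_insert_at: the list itself is cheap to update because no run stores an absolute position, but any mark past the insertion point keeps its old INDEX until something fixes it, which is the bookkeeping burden the cached line data inherits.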

[ One way the last index is special is that, although it is always part of some property, it will never be part of a property of length 1 unless there are no other characters in the text. That is, its properties are always those of the preceding character, if any. There is a fair bit of special case code to maintain this condition - which is needed so that the user has control over the properties of characters inserted at the last position. OWT 2/9/98 ]

Tab stops are variable width. A list of tab stops is contained in the GtkText structure:

  GList *tab_stops;
  gint default_tab_width;

The elements of tab_stops are integers cast to gpointer. This is a little bogus, but works. For example:

  text->default_tab_width = 4;
  text->tab_stops = NULL;
  text->tab_stops = g_list_prepend (text->tab_stops, (void*)8);
  text->tab_stops = g_list_prepend (text->tab_stops, (void*)8);

is how these fields are initialized, currently. This means that the first two tabs occur at 8 and 16, and every 4 characters thereafter.

Tab stops are used in the computation of line geometry (to fill in a LineParams structure), and the width of the space character in the current font is used. The PrevTabCont structure, of which two are stored per line, is used to compute the geometry of lines which may have wrapped and carried part of a tab with them:

  guint pixel_offset;
  TabStopMark tab_start;

PIXEL_OFFSET is the number of pixels at which the line should start, and TAB_START is a tab stop mark, which is similar to a property mark, only it keeps track of the mapping between line position (column) and the next tab stop. A TabStopMark contains:

  GList* tab_stops;
  gint to_next_tab;

TAB_STOPS is a pointer into the TAB_STOPS field of the GtkText structure. TO_NEXT_TAB is the number of characters before the next tab. The functions ADVANCE_TAB_MARK and ADVANCE_TAB_MARK_N advance these marks. The LineParams structure contains two PrevTabCont structures, which each contain a tab stop.
The first (TAB_CONT) is for computing the beginning pixel offset, as mentioned above. The second (TAB_CONT_NEXT) is used to initialize the TAB_CONT field of the next line if it wraps.

Since computing the parameters of a line is fairly complicated, I have one interface that should be all you ever need to figure out something about a line. The function FIND_LINE_PARAMS computes the parameters of a single line. The function LINE_PARAMS_ITERATE is used for computing the properties of some number (> 0) of sequential lines.

  void line_params_iterate (GtkText* text,
                            const GtkPropertyMark* mark0,
                            const PrevTabCont* tab_mark0,
                            gboolean alloc,
                            gpointer data,
                            LineIteratorFunction iter);

where LineIteratorFunction is:

  typedef gint (*LineIteratorFunction) (GtkText* text,
                                        LineParams* lp,
                                        gpointer data);

The arguments are a text widget (TEXT), the property mark at the beginning of the first line (MARK0), the tab stop mark at the beginning of that line (TAB_MARK0), whether to heap-allocate the LineParams structure (ALLOC), some client data (DATA), and a function to call with the parameters of each line (ITER). TAB_MARK0 may be NULL, but if so MARK0 MUST BE A REAL LINE START (not a continued line start; it is preceded by a newline). If TAB_MARK0 is not NULL, MARK0 may be any line start (continued or not). See the code for examples. The function ITER is called with each LineParams computed. If ALLOC was true, LINE_PARAMS_ITERATE heap-allocates the LineParams and does not free them. Otherwise, no storage is permanently allocated. ITER should return TRUE when it wants the iteration to stop.

There are currently two uses of LINE_PARAMS_ITERATE:

* Computing the total buffer height for setting the parameters of the scroll bars. This is done in SET_VERTICAL_SCROLL each time the window is resized. When horizontal scrolling is added, depending on the policy chosen, the max line width can be computed here as well.

* Computing geometry of some pixel height worth of lines.
  This is done in FETCH_LINES, FETCH_LINES_BACKWARD, FETCH_LINES_FORWARD, etc.

The GtkText structure contains a cache of the LineParams data for all visible lines:

  GList *current_line;
  GList *line_start_cache;

  guint first_line_start_index;
  guint first_cut_pixels;
  guint first_onscreen_hor_pixel;
  guint first_onscreen_ver_pixel;

LINE_START_CACHE is a doubly linked list of LineParams. CURRENT_LINE is a transient piece of data which is set in various places such as the mouse click code. Generally, it is the line containing the cursor property mark CURSOR_MARK. LINE_START_CACHE points to the first visible line and may contain PREV pointers if the cached data of offscreen lines is kept around. I haven't come up with a policy. The cache can keep more lines than are visible if desired, but the result is that inserts and deletes will then become slower as the entire cache has to be "corrected". Right now it doesn't delete from the cache (it should). As a result, scrolling through the whole buffer once will fill the cache with an entry for each line, and subsequent modifications will be slower than they should be.

FIRST_LINE_START_INDEX is the index of the *REAL* line start of the first line. That is, if the first visible line is a continued line, this is the index of the real line start (preceded by a newline). FIRST_CUT_PIXELS is the number of pixels which are not drawn on the first visible line. If FIRST_CUT_PIXELS is zero, the whole line is visible. FIRST_ONSCREEN_HOR_PIXEL is not used. FIRST_ONSCREEN_VER_PIXEL is the absolute pixel which starts the visible region. This is used for setting the vertical scroll bar.

Other miscellaneous things in the GtkText structure:

Gtk specific things:

  GtkWidget widget;
  GdkWindow *text_area;
  GtkAdjustment *hadj;
  GtkAdjustment *vadj;
  GdkGC *gc;
  GdkPixmap* line_wrap_bitmap;
  GdkPixmap* line_arrow_bitmap;

These are pretty self explanatory, especially if you know Gtk.
LINE_WRAP_BITMAP and LINE_ARROW_BITMAP are two bitmaps used to indicate that a line wraps and is continued offscreen, respectively.

Some flags:

  guint has_cursor : 1;
  guint is_editable : 1;
  guint line_wrap : 1;
  guint freeze : 1;
  guint has_selection : 1;
  guint own_selection : 1;

HAS_CURSOR is true iff the cursor is visible. IS_EDITABLE is true iff the user is allowed to modify the buffer. If IS_EDITABLE is false, HAS_CURSOR is guaranteed to be false. If IS_EDITABLE is true, HAS_CURSOR starts out false and is set to true the first time the user clicks in the window. LINE_WRAP is where the line-wrap policy is set. True means wrap lines, false means continue lines offscreen, horizontally.

The text properties list:

  GList *text_properties;
  GList *text_properties_end;

A scratch area used for constructing a contiguous piece of the buffer which may otherwise span the gap. It is not strictly necessary but simplifies the drawing code because it does not need to deal with the gap.

  guchar* scratch_buffer;
  guint scratch_buffer_len;

The last vertical scrollbar position. Currently this looks the same as FIRST_ONSCREEN_VER_PIXEL. I can't remember why I have two values. Perhaps someone should clean this up.

  gint last_ver_value;

The cursor:

  gint cursor_pos_x;
  gint cursor_pos_y;
  GtkPropertyMark cursor_mark;
  gchar cursor_char;
  gchar cursor_char_offset;
  gint cursor_virtual_x;
  gint cursor_drawn_level;

CURSOR_POS_X and CURSOR_POS_Y are the screen coordinates of the cursor. CURSOR_MARK is the buffer position. CURSOR_CHAR is TEXT_INDEX (TEXT, CURSOR_MARK.INDEX) if it is a drawable character, or 0 if it is whitespace, which is treated specially. CURSOR_CHAR_OFFSET is the pixel offset above the base of the line at which it should be drawn. Note that the base of the line is not the "baseline" in the traditional font metric sense. A line (LineParams) is FONT_ASCENT+FONT_DESCENT high (use the macro LINE_HEIGHT). The "baseline" is FONT_DESCENT above the base of the line. I think this requires a drawing.

 0                     AAAAAAA
 1                     AAAAAAA
 2                    AAAAAAAAA
 3                    AAAAAAAAA
 4                   AAAAA AAAAA
 5                   AAAAA AAAAA
 6                  AAAAA   AAAAA
 7                 AAAAA     AAAAA
 8                 AAAAA     AAAAA
 9                AAAAAAAAAAAAAAAAA
10                AAAAAAAAAAAAAAAAA
11               AAAAA       AAAAA
12               AAAAA       AAAAA
13              AAAAAA       AAAAAA
14______________AAAAA___________AAAAA__________________________________
15
16
17
18
19
20

This line is 20 pixels high, has FONT_ASCENT=14, FONT_DESCENT=6. Its "base" is at y=20. Characters are drawn at y=14. The LINE_START macro returns the pixel height. The LINE_CONTAINS macro is true if the line contains a certain buffer index. The LINE_STARTS_AT macro is true if the line starts at a certain buffer index. The LINE_START_PIXEL is the pixel offset the line should be drawn at, according to the tab continuation of the previous line.

Exposure and drawing:

Exposure is handled from the EXPOSE_TEXT function. It assumes that the LINE_START_CACHE and all its parameters are accurate and simply exposes any line which is in the exposure region. It calls the CLEAR_AREA function to clear the background and/or lay down a pixmap background. The text widget has a scrollable pixmap background, which is implemented in CLEAR_AREA. CLEAR_AREA does the math to figure out how to tile the pixmap itself so that it can scroll the text with a copy area call. If the CURSOR argument to EXPOSE_TEXT is true, it also draws the cursor.

The function DRAW_LINE draws a single line, doing all the tab and color computations necessary. The function DRAW_LINE_WRAP draws the line wrap bitmap at the end of the line if it wraps. EXPOSE_TEXT will expand the cached line data list if it has to by calling FETCH_LINES_FORWARD. The functions DRAW_CURSOR and UNDRAW_CURSOR draw and undraw the cursor. They count the number of draws and undraws so that the cursor may be undrawn even if the cursor is already undrawn and the re-draw will not occur too early. This is useful in handling scrolling.

Handling of the cursor is a little messed up, I should add. It has to be undrawn and drawn at various places.
Something better needs to be done about this, because it currently doesn't do the right thing in certain places. I can't remember where very well. Look for the calls to DRAW_CURSOR and UNDRAW_CURSOR.

RECOMPUTE_GEOMETRY is called when the geometry of the window changes or when it is first drawn. This is probably not done right. My biggest weakness in writing this code is that I've never written a widget before, so I got most of the event handling stuff wrong as far as Gtk is concerned. Fortunately, most of the code is unrelated and simply an exercise in data structure manipulation.

Scrolling:

Scrolling is fairly straightforward. It looks at the top line, and advances it pixel by pixel until FIRST_CUT_PIXELS equals the line height, and then advances the LINE_START_CACHE. When it runs out of lines it fetches more. The function SCROLL_INT is used to scroll from inside the code; it calls the appropriate functions and handles updating the scroll bars. It dispatches a change event which causes Gtk to call the correct scroll action, which then enters SCROLL_UP or SCROLL_DOWN. Be careful with the cursor during these changes.

Insertion, deletion:

There's some confusion right now over what to do with the cursor when it's offscreen due to scrolling. This is a policy decision. I don't know what's best. Spencer criticized me for forcing it to stay onscreen. It shouldn't be hard to make stuff work with the cursor offscreen.

Currently I've got functions to do insertion and deletion of a single character. It's fairly complicated. In order to do efficient pasting into the buffer, or write code that modifies the buffer while the buffer is drawn, it needs to handle multiple characters at a time. This is the hardest part of what remains. Currently, gtk_text_insert does not re-expose the modified lines. It needs to. Pete did this wrong at one point and I disabled modification completely; I don't know what the current state of things is.

The functions INSERT_CHAR_LINE_EXPOSE and DELETE_CHAR_LINE_EXPOSE do the work. Here's pseudo code for insert. Delete is quite similar.

  insert character into the buffer
  update the text property list
  move the point
  undraw the cursor
  correct all LineParams cache entries after the insertion point
  compute the new height of the modified line
  compare with the old height of the modified line
  remove the old LineParams from the cache
  insert the new LineParams into the cache
  if the lines are of different height, do a copy area to move the
    area below the insertion down
  expose the current line
  update the cursor mark
  redraw the cursor

What needs to be done:

Horizontal scrolling, robustness, testing, selection handling. If you want to work on the text widget, pay attention to the debugging facilities I've written at the end of gtktext.c. I'm sorry I waited so long to try and pass this off. I'm super busy with school and work, and when I have free time my highest priority is another version of PRCS.

Feel free to ask me questions.