General ideas of Pixops
=======================

 - Gain speed by special-casing the common case, and using
   generic code to handle the uncommon case.

 - Most of the time in scaling an image is spent in the center;
   however, code that can handle edges properly is slow
   because it needs to deal with the possibility of running
   off the edge. So make the fast-case code handle only
   the centers, and use generic, slow code for the edges.

Structure of Pixops
===================

The code of pixops can roughly be grouped into four parts:

 - Filter computation functions

 - Functions for scaling or compositing lines and pixels
   using precomputed filters

 - pixops_process, the central driver that iterates through
   the image calling pixel or line functions as necessary

 - Wrapper functions (pixops_scale/composite/composite_color)
   that compute the filter, choose the line and pixel functions
   and then call pixops_process with the filter, line,
   and pixel functions.


pixops_process is a pretty scary looking function:

static void
pixops_process (guchar         *dest_buf,
                int             render_x0,
                int             render_y0,
                int             render_x1,
                int             render_y1,
                int             dest_rowstride,
                int             dest_channels,
                gboolean        dest_has_alpha,
                const guchar   *src_buf,
                int             src_width,
                int             src_height,
                int             src_rowstride,
                int             src_channels,
                gboolean        src_has_alpha,
                double          scale_x,
                double          scale_y,
                int             check_x,
                int             check_y,
                int             check_size,
                guint32         color1,
                guint32         color2,
                PixopsFilter   *filter,
                PixopsLineFunc  line_func,
                PixopsPixelFunc pixel_func)

(Some of the arguments should be moved into structures.
It's basically "all the arguments to pixops_composite_color plus three more")

The arguments can be divided up into:


Information about the destination buffer

   guchar *dest_buf, int dest_rowstride, int dest_channels, gboolean dest_has_alpha,

Information about the source buffer

   const guchar *src_buf, int src_rowstride, int src_channels, gboolean src_has_alpha,
   int src_width, int src_height,

Information on how to scale the source buffer and the region of the
scaled source to render onto the destination buffer

   int render_x0, int render_y0, int render_x1, int render_y1
   double scale_x, double scale_y

Information about a constant color or check pattern onto which to composite

   int check_x, int check_y, int check_size, guint32 color1, guint32 color2

Information precomputed to use during the scale operation

   PixopsFilter *filter, PixopsLineFunc line_func, PixopsPixelFunc pixel_func


Filter computation
==================

The PixopsFilter structure looks like:

struct _PixopsFilter
{
  int *weights;
  int n_x;
  int n_y;
  double x_offset;
  double y_offset;
};

'weights' is an array of size:

 weights[SUBSAMPLE][SUBSAMPLE][n_x][n_y]

SUBSAMPLE is a constant - currently 16 in pixops.c.

In order to compute a scaled destination pixel we convolve
an array of n_x by n_y source pixels with one of
the SUBSAMPLE * SUBSAMPLE filter matrices stored
in weights. The choice of filter matrix is determined
by the fractional part of the source location.

To compute dest[i,j] we do the following:

 x = i / scale_x + x_offset;
 y = j / scale_y + y_offset;
 x_int = floor(x)
 y_int = floor(y)

 C = weights[SUBSAMPLE*(x - x_int)][SUBSAMPLE*(y - y_int)]
 total = sum[l=0..n_x-1, m=0..n_y-1] (C[l,m] * src[x_int + l, y_int + m])

The filter weights are integers scaled so that the weights
in each filter matrix sum to 65536.

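As a rough illustration of the convolution above, the following C sketch
computes total for one channel of a single destination pixel. The helper
name filter_total, the single-channel source layout and the flat indexing
of weights are assumptions made for this example and are not taken from
pixops.c. The source position (x, y) is passed in already computed and is
assumed to be non-negative.

```c
#include <assert.h>

#define SUBSAMPLE 16   /* value quoted in the text above */

/* Hypothetical helper, not part of pixops.c.
 * weights is treated as a flat weights[SUBSAMPLE][SUBSAMPLE][n_x][n_y];
 * src is a single-channel image with src_rowstride bytes per row.
 * The caller must ensure the n_x by n_y window lies inside src. */
static int
filter_total (const int *weights, int n_x, int n_y,
              const unsigned char *src, int src_rowstride,
              double x, double y)
{
  assert (x >= 0.0 && y >= 0.0);       /* lets a cast stand in for floor() */

  int x_int = (int) x;
  int y_int = (int) y;

  /* The fractional part of the source position selects the filter matrix */
  int sub_x = (int) (SUBSAMPLE * (x - x_int));
  int sub_y = (int) (SUBSAMPLE * (y - y_int));
  const int *C = weights + (sub_x * SUBSAMPLE + sub_y) * n_x * n_y;

  int total = 0;
  for (int l = 0; l < n_x; l++)
    for (int m = 0; m < n_y; m++)
      total += C[l * n_y + m] * src[(y_int + m) * src_rowstride + (x_int + l)];

  return total;
}
```

With a 2 by 2 filter whose four weights are all 16384 (summing to 65536)
and source values 10, 20, 30 and 40, this yields a total of 100 * 16384,
which is 25 after dividing by 65536.
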
When the source does not have alpha, we simply compute each channel
as above, so total is in the range [0,255*65536]

 dest = total / 65536

When the source does have alpha, then we need to compute using
"pre-multiplied alpha":

 a_total = sum (C[l,m] * src_a[x_int + l, y_int + m])
 c_total = sum (C[l,m] * src_a[x_int + l, y_int + m] * src_c[x_int + l, y_int + m])

This gives us a result for c_total in the range of [0,255*a_total]

 c_dest = c_total / a_total


Mathematical aside:

The process of producing a destination image consists
of:

 - Producing a continuous approximation to the source
   image via interpolation.

 - Sampling that continuous approximation with a filter.

This is representable as:

 S(x,y) = sum[i=-inf,inf; j=-inf,inf] A(frac(x),frac(y))[i,j] * S[floor(x)+i,floor(y)+j]

 D[i,j] = Integral(s=-inf,inf; t=-inf,inf) B(i+s,j+t) S((i+s)/scale_x,(j+t)/scale_y)

By reordering the sums and integrals, you get something of the form:

 D[i,j] = sum[l=-inf,inf; m=-inf,inf] C[l,m] S[i+l,j+m]

The arrays in weights are the C[l,m] above, and are thus
determined by the interpolating algorithm in use and the
sampling filter:

                                   INTERPOLATE         SAMPLE
 ART_FILTER_NEAREST                nearest neighbour   point
 ART_FILTER_TILES                  nearest neighbour   box
 ART_FILTER_BILINEAR (scale < 1)   nearest neighbour   box
 ART_FILTER_BILINEAR (scale > 1)   bilinear            point
 ART_FILTER_HYPER                  bilinear            box


Pixel Functions
===============

typedef void (*PixopsPixelFunc) (guchar *dest, int dest_x, int dest_channels,
                                 int dest_has_alpha, int src_has_alpha,
                                 int check_size, guint32 color1,
                                 guint32 color2,
                                 int r, int g, int b, int a);

The arguments here are:

 dest: location to store the output pixel
 dest_x: x coordinate of destination (for handling checks)
 dest_has_alpha, dest_channels: Information about the destination pixbuf
 src_has_alpha: Information about the source pixbuf

 check_size, color1, color2: Information for color background for composite_color variant

 r,g,b,a - scaled red, green, blue and alpha

r,g,b are
premultiplied by alpha.

 a is in [0,65536*255]
 r is in [0,255*a]
 g is in [0,255*a]
 b is in [0,255*a]

If src_has_alpha is false, then a will be 65536*255, allowing optimization.


Line functions
==============

typedef guchar *(*PixopsLineFunc) (int *weights, int n_x, int n_y,
                                   guchar *dest, int dest_x, guchar *dest_end,
                                   int dest_channels, int dest_has_alpha,
                                   guchar **src, int src_channels, gboolean src_has_alpha,
                                   int x_init, int x_step, int src_width,
                                   int check_size, guint32 color1, guint32 color2);

The arguments are:

 weights, n_x, n_y

   Filter weights for this row - dimensions weights[SUBSAMPLE][n_x][n_y]

 dest, dest_x, dest_end, dest_channels, dest_has_alpha

   The destination buffer. The function starts writing at *dest and
   advances by dest_channels until dest == dest_end. Reading from
   src for these pixels is guaranteed not to go outside of the
   buffer bounds.

 src, src_channels, src_has_alpha

   src[n_y] - an array of pointers to the start of the source rows
   for each filter coordinate.

 x_init, x_step

   Information about x positions in source image.

 src_width - unused

 check_size, color1, color2: Information for color background for composite_color variant

 The total for the destination pixel at dest + i is given by

   SUM (l=0..n_x - 1, m=0..n_y - 1)
     src[m][((x_init + i * x_step) >> SCALE_SHIFT) + l] * weights[m][l]


Algorithms for compositing
==========================

Compositing alpha on non alpha:

 R = As * Rs + (1 - As) * Rd
 G = As * Gs + (1 - As) * Gd
 B = As * Bs + (1 - As) * Bd

This can be regrouped as:

 Cd + As * (Cs - Cd)

Compositing alpha on alpha:

 A = As + (1 - As) * Ad
 R = (As * Rs + (1 - As) * Rd * Ad) / A
 G = (As * Gs + (1 - As) * Gd * Ad) / A
 B = (As * Bs + (1 - As) * Bd * Ad) / A

The way to think of this is in terms of the "area":

The final pixel is composed of area As of the source pixel
and (1 - As) * Ad of the target pixel. So the final pixel
is a weighted average with those weights. Note that the
weights do not add up to one - hence the non-constant
division.

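The alpha-on-alpha formulas above can be written out directly in floating
point. This is only a sketch for checking the arithmetic: the Pixel type
and the composite_over name are hypothetical, components are assumed to be
non-premultiplied values in [0,1], and pixops itself works in integers.

```c
#include <assert.h>

/* Hypothetical pixel type for illustration: non-premultiplied, each in [0,1] */
typedef struct { double r, g, b, a; } Pixel;

/* Composite source s over destination d using the formulas above */
static Pixel
composite_over (Pixel s, Pixel d)
{
  assert (s.a >= 0.0 && s.a <= 1.0);
  assert (d.a >= 0.0 && d.a <= 1.0);

  Pixel out;
  double wd = (1.0 - s.a) * d.a;   /* "area" contributed by the target pixel */

  out.a = s.a + wd;                /* A = As + (1 - As) * Ad */
  if (out.a == 0.0)
    {
      /* Nothing is covered, so the colour is arbitrary; use zero */
      out.r = out.g = out.b = 0.0;
      return out;
    }

  /* Weighted average; the weights As and wd sum to A, not to one */
  out.r = (s.a * s.r + wd * d.r) / out.a;
  out.g = (s.a * s.g + wd * d.g) / out.a;
  out.b = (s.a * s.b + wd * d.b) / out.a;
  return out;
}
```

When the destination is opaque (Ad = 1), A is always 1 and the result
reduces to the "alpha on non alpha" case, Cd + As * (Cs - Cd).
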
Integer tricks for compositing
==============================



MMX Code
========

Line functions are provided in MMX for a few special
cases:

 n_x = n_y = 2

   src_channels = 3              dest_channels = 3            op = scale
   src_channels = 4 with alpha   dest_channels = 4 no alpha   op = composite
   src_channels = 4 with alpha   dest_channels = 4 no alpha   op = composite_color

For the case n_x = n_y = 2 - primarily hit when scaling up with bilinear
scaling - we can take advantage of the fact that multiple destination
pixels will be composed from the same source pixels.

That is, a destination pixel is a linear combination of the source
pixels around it:


  S0                     S1





       D  D' D'' ...




  S2                     S3

Each MMX register is 64 bits wide, so we can unpack a source pixel
into the low 8 bits of four 16-bit words, and store it in an MMX
register.

For each destination pixel, we first make sure that we have pixels S0 ... S3
loaded into registers mm0 ... mm3. (This will often involve not doing anything,
or moving mm1 and mm3 into mm0 and mm2 and then reloading mm1 and mm3 with
new values.)

Then we load up the appropriate weights for the 4 corner pixels
based on the offsets of the destination pixel within the source
pixels.

We have preexpanded the weights to 64 bits wide and truncated the
range to 8 bits, so an original filter value of

 0x5321 would be expanded to

 0x0053005300530053

For source buffers without alpha, we simply do a multiply-add
of the weights, giving us a 16-bit quantity for the result
that we shift right by 8 and store in the destination buffer.

When the source buffer has alpha, then things become more
complicated - when we load up mm0 ... mm3, we premultiply
the alpha, so they contain:

 (a*0xff >> 8) (r*a >> 8) (g*a >> 8) (b*a >> 8)

Then when we multiply by the weights and add, we end up with
premultiplied r,g,b,a in the range of 0 .. 0xff * 0xff; call them
A,R,G,B

We then need to composite with the dest pixels - which
we do by:

 r_dest = (R + ((0xff * 0xff - A) >> 8) * r_dest) >> 8

(0xff * 0xff)
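
The weight pre-expansion described in the MMX section above can be
sketched in plain C. The function name expand_weight is hypothetical, and
the sketch assumes the original weight fits in 16 bits so that its top
8 bits are what survive truncation; it is not taken from the MMX source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: truncate a 16-bit filter weight to its top 8 bits
 * and replicate it into the four 16-bit words of a 64-bit value, matching
 * the layout of a pixel unpacked into an MMX register. */
static uint64_t
expand_weight (unsigned int weight)
{
  assert (weight <= 0xffff);          /* assumption: weight fits in 16 bits */

  uint64_t w8 = weight >> 8;          /* 0x5321 -> 0x53 */

  return w8 | (w8 << 16) | (w8 << 32) | (w8 << 48);
}
```

For example, expand_weight (0x5321) gives 0x0053005300530053, the value
quoted in the text.
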