The code in this directory implements optimized, filtered scaling
for pixmap data.

This code is copyright Red Hat, Inc, 2000 and licensed under the terms
of the GNU Lesser General Public License (LGPL).

(If you want to use it in a project where that license is not
appropriate, please contact me, and most likely something can be
worked out.)

Owen Taylor <otaylor@redhat.com>

PRINCIPLES
==========

The general principle of this code is that it first computes a filter
matrix for the given filtering mode, and then calls a general driver
routine, passing in functions to composite pixels and lines.

(The pixel functions are used for handling edge cases, and the line
functions are simply used for the middle parts of the image.)

The system is designed so that the line functions can be simple,
don't have to worry about special cases, and can be selected to
be specific to the particular formats involved. This allows them
to be hyper-optimized. Since most of the computation time is
spent in these functions, this results in an overall fast design.

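The split between pixel functions and line functions can be sketched
roughly as follows. This is only an illustration of the idea on a single
row with a 3-tap filter; the names (pixel_func, line_func, filter_row and
so on) and the 8.8 fixed-point weights are assumptions made for this
sketch and are not the actual pixops interfaces.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types; the real pixops signatures differ. */
typedef uint8_t (*pixel_func)(const uint8_t *src, size_t len, size_t x,
                              const int weights[3]);
typedef void (*line_func)(uint8_t *dest, const uint8_t *src,
                          size_t x0, size_t x1, const int weights[3]);

/* Edge handler: clamps neighbour indices so it is safe at any x. */
static uint8_t filter_pixel_clamped(const uint8_t *src, size_t len, size_t x,
                                    const int weights[3])
{
    size_t left = (x == 0) ? 0 : x - 1;
    size_t right = (x + 1 >= len) ? len - 1 : x + 1;
    int sum = weights[0] * src[left] + weights[1] * src[x]
            + weights[2] * src[right];
    return (uint8_t)((sum + 128) >> 8);   /* weights sum to 256 */
}

/* Fast path: no bounds checks. Caller guarantees 1 <= x0 and x1 <= len - 1. */
static void filter_line_interior(uint8_t *dest, const uint8_t *src,
                                 size_t x0, size_t x1, const int weights[3])
{
    for (size_t x = x0; x < x1; x++) {
        int sum = weights[0] * src[x - 1] + weights[1] * src[x]
                + weights[2] * src[x + 1];
        dest[x] = (uint8_t)((sum + 128) >> 8);
    }
}

/* Driver: pixel function at the two edges, line function in the middle. */
static void filter_row(uint8_t *dest, const uint8_t *src, size_t len,
                       const int weights[3], pixel_func pixel, line_func line)
{
    if (len == 0)
        return;
    dest[0] = pixel(src, len, 0, weights);
    if (len > 2)
        line(dest, src, 1, len - 1, weights);
    if (len > 1)
        dest[len - 1] = pixel(src, len, len - 1, weights);
}
```

Because the driver only hands interior spans to the line function, that
function can drop every boundary test from its inner loop, which is what
makes format-specific, heavily optimized variants practical.
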
MMX assembly code for Intel (and compatible) processors is included
for a number of the most common special cases:

 scaling from RGB to RGB
 compositing from RGBA to RGBx
 compositing against a color from RGBA and storing in an RGBx buffer

Alpha compositing 8 bit RaGaBaAa onto RbGbBb is defined in terms of
rounding the exact result (lowercase names are real values in [0,1],
uppercase names are 8 bit integers in [0,255]):

  cc = ca * aa + (1 - aa) * cb

  Cc = ROUND [255. * (Ca/255. * Aa/255. + (1 - Aa/255.) * Cb/255.)]

ROUND(i / 255.) can be computed exactly for i in [0,255*255] as:

  t = i + 0x80; result = (t + (t >> 8)) >> 8;  [ call this To8(i) ]

So,

  t = Ca * Aa + (255 - Aa) * Cb + 0x80;
  Cc = (t + (t >> 8)) >> 8;

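As a minimal sketch, the To8() rounding and the RGBA-onto-RGB formula
above can be written in C like this. The function names to8 and
composite_over_opaque are made up for the example; they are not taken
from the pixops sources.

```c
#include <stdint.h>

/* ROUND(i / 255.) for i in [0, 255*255], using the shift trick above. */
static uint8_t to8(uint32_t i)
{
    uint32_t t = i + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Composite one source channel ca (non-premultiplied, alpha aa) onto an
 * opaque destination channel cb. The argument to to8() never exceeds
 * 255*255 because aa + (255 - aa) == 255. */
static uint8_t composite_over_opaque(uint8_t ca, uint8_t aa, uint8_t cb)
{
    return to8((uint32_t)ca * aa + (uint32_t)(255 - aa) * cb);
}
```

With aa == 0 the destination comes back unchanged, and with aa == 255 the
result is exactly the source channel.
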
Alpha compositing 8 bit RaGaBaAa onto RbGbBbAb is a little harder, for
non-premultiplied alpha. The premultiplied result is simple:

  ac = aa + (1 - aa) * ab
  cc = ca + (1 - aa) * cb

Which can be computed in integer terms as:

  Cc = Ca + To8 ((255 - Aa) * Cb)
  Ac = Aa + To8 ((255 - Aa) * Ab)

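The premultiplied case might look like the following in C. The
premul_pixel struct and the premul_over name are assumptions for this
sketch. It also assumes valid premultiplied input (each color channel is
no larger than its alpha), which keeps every sum within 0..255.

```c
#include <stdint.h>

/* ROUND(i / 255.) for i in [0, 255*255]. */
static uint8_t to8(uint32_t i)
{
    uint32_t t = i + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Hypothetical pixel layout with premultiplied color channels. */
typedef struct {
    uint8_t r, g, b, a;
} premul_pixel;

/* Composite src over dst; both are premultiplied. */
static premul_pixel premul_over(premul_pixel src, premul_pixel dst)
{
    uint32_t inv = 255u - src.a;   /* (255 - Aa) */
    premul_pixel out;

    out.r = (uint8_t)(src.r + to8(inv * dst.r));
    out.g = (uint8_t)(src.g + to8(inv * dst.g));
    out.b = (uint8_t)(src.b + to8(inv * dst.b));
    out.a = (uint8_t)(src.a + to8(inv * dst.a));
    return out;
}
```
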
For non-premultiplied alpha, we need to divide the color components by
the alpha:

       +- (ca * aa + (1 - aa) * ab * cb) / ac; aa != 0
  cc = |
       +- cb; aa == 0

To calculate this as an integer, we note the alternate form:

  cc = cb + aa * (ca - cb) / ac

[ 'cc = ca + (ac - aa) * (cb - ca) / ac' can also be useful numerically,
  but isn't important here ]

We can express this in integers as:

  Ac_tmp = Aa * 255 + (255 - Aa) * Ab;

       +- Cb + (255 * Aa * (Ca - Cb) + Ac_tmp / 2) / Ac_tmp ; Ca > Cb
  Cc = |
       +- Cb - (255 * Aa * (Cb - Ca) + Ac_tmp / 2) / Ac_tmp ; Ca <= Cb

Or, playing bit tricks to avoid the conditional:

  Cc = Cb + (255 * Aa * (Ca - Cb) + (((Ca - Cb) >> 8) ^ (Ac_tmp / 2)) ) / Ac_tmp

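A rough C version of the conditional integer form is given below. The
function name is hypothetical, and returning Cb when Ac_tmp is 0 (both
pixels fully transparent) is an assumption of this sketch, chosen to
avoid the division by zero. The bit-trick variant is not shown, since it
relies on an arithmetic right shift of a negative value, which C leaves
implementation-defined.

```c
#include <stdint.h>

/* Composite one non-premultiplied channel: source (ca, aa) over
 * destination (cb, ab), following the two-branch formula above. */
static uint8_t composite_channel_nonpremul(uint8_t ca, uint8_t aa,
                                           uint8_t cb, uint8_t ab)
{
    int32_t ac_tmp = aa * 255 + (255 - aa) * ab;   /* 255 * 255 * ac */
    int32_t diff;

    if (ac_tmp == 0)
        return cb;   /* assumption: color is arbitrary when alpha is 0 */

    if (ca > cb) {
        diff = (255 * aa * (ca - cb) + ac_tmp / 2) / ac_tmp;
        return (uint8_t)(cb + diff);
    }
    diff = (255 * aa * (cb - ca) + ac_tmp / 2) / ac_tmp;
    return (uint8_t)(cb - diff);
}
```

Since Ac_tmp is never smaller than 255 * Aa, the rounded quotient never
exceeds the distance between Ca and Cb, so the result stays between the
two input channel values.
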
TODO
====

* ART_FILTER_HYPER is not correctly implemented. It is currently
  implemented as a filter that is derived by doing linear interpolation
  on the source image and then averaging that with a box filter.

  It should be defined as follows (see art_filterlevel.h):

   "HYPER is the highest quality reconstruction function. It is derived
    from the hyperbolic filters in Wolberg's "Digital Image Warping,"
    and is formally defined as the hyperbolic-filter sampling the ideal
    hyperbolic-filter interpolated image (the filter is designed to be
    idempotent for 1:1 pixel mapping). It is the slowest and highest
    quality."

  The current HYPER is probably as slow, but lower quality. Also, there
  are some subtle errors in the current HYPER calculation that show up
  as dark stripes if you scale a constant-color image.

* There are some roundoff errors in the compositing routines.
  The _nearest() variants do it right; most of the other code
  is wrong to some degree or another.

  For instance, in composite_line_22_4a4(), we have:

    dest[0] = ((0xff0000 - a) * dest[0] + r) >> 24;

  If a is 0 (which implies r == 0), then we have:

    (0xff0000 * dest[0]) >> 24

  which gives results that are 1 too low:

    255 => 254, 1 => 0.

  So, this should be something like:

    ((0xff0000 - a) * dest[0] + r + 0xffffff) >> 24;

  (Not checked, caveat emptor)

  An alternative formulation of this as:

    dest[0] + ((r - a * dest[0] + 0xffffff) >> 24)

  may be better numerically, but would need consideration for overflow.

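  The effect is easy to reproduce in isolation. The two helpers below are
  hypothetical stand-ins for the expression quoted above, not the actual
  code of composite_line_22_4a4(). They assume a lies in [0, 0xff0000]
  and r <= a * 255, so that every intermediate value fits in 32 unsigned
  bits. Like the suggestion above, the biased form is unverified beyond
  the cases shown.

```c
#include <stdint.h>

/* The expression as quoted above: plain truncation. */
static uint8_t blend_truncating(uint32_t a, uint32_t r, uint8_t d)
{
    return (uint8_t)(((0xff0000u - a) * d + r) >> 24);
}

/* The suggested fix: add 0xffffff before shifting. */
static uint8_t blend_biased(uint32_t a, uint32_t r, uint8_t d)
{
    return (uint8_t)(((0xff0000u - a) * d + r + 0xffffffu) >> 24);
}
```

  With a == 0 and r == 0, blend_truncating() turns 255 into 254 and 1
  into 0, while blend_biased() returns the destination value unchanged.
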
* The generic functions could be sped up considerably by
  switching around conditionals and inner loops in various
  places.

* Right now, in several of the most common cases, there are
  optimized MMX routines, but no optimized C routines.

  For instance, there is a

    pixops_composite_line_22_4a4_mmx()

  but no

    pixops_composite_line_22_4a4()

  Also, it may be desirable to include a few more special cases, in
  particular:

    pixops_composite_line_22_4a3()

* Scaling down images by large scale factors is _slow_ since huge filter
  matrices are computed (e.g., to scale down by a factor of 100, we compute
  101x101 filter matrices). At some point, it would be more efficient to
  switch over to subsampling when scaling down; one should never need a filter
  matrix bigger than 16x16.