The Primitives Library

Introduction
------------
The purpose of the primitives library is to give the FreeRDP code easy
access to *run-time* optimization via SIMD operations.  When the library
is initialized, the processor's features are checked dynamically (for
example, support for SSE3 or NEON), and the function pointers are set
to the fastest available implementation of each operation.  Every
routine also has a generic C implementation that serves as a fallback.

Run-time optimization has the advantage of allowing a single executable
to run fast on multiple platforms with different SIMD capabilities.

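The dispatch idea can be sketched in C as follows.  The sketch assumes GCC or Clang on x86, because it uses __builtin_cpu_supports() and the target function attribute.  All names (add_16s_generic, add_16s_sse2, select_add_16s) are hypothetical and are not the primitives library's API.  The operation is a saturating add of two signed 16-bit arrays.  On other compilers or architectures, the sketch always selects the generic C fallback.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#include <immintrin.h>
#endif

/* Function-pointer type shared by every implementation of the operation. */
typedef void (*add_16s_fn)(const int16_t *a, const int16_t *b,
                           int16_t *dst, size_t len);

/* Generic C fallback: always available, saturates to the int16_t range. */
static void add_16s_generic(const int16_t *a, const int16_t *b,
                            int16_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        dst[i] = (int16_t)sum;
    }
}

#ifdef HAVE_X86_DISPATCH
/* Compiled for SSE2 even if the rest of the file is not, so it must only
 * be called after the run-time check in select_add_16s() succeeds. */
__attribute__((target("sse2")))
static void add_16s_sse2(const int16_t *a, const int16_t *b,
                         int16_t *dst, size_t len)
{
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {          /* 8 x int16_t per register */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epi16(va, vb));
    }
    add_16s_generic(a + i, b + i, dst + i, len - i);   /* scalar tail */
}
#endif

/* Run once at initialization: pick the fastest variant this CPU supports. */
static add_16s_fn select_add_16s(void)
{
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("sse2"))
        return add_16s_sse2;
#endif
    return add_16s_generic;
}
```

Call select_add_16s() once at initialization and store the result in a structure of function pointers.  That gives the arrangement described above, and every later call costs only one indirect call.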

Use In Code
-----------
The function pointers are held in a singleton structure, which is
accessed through primitives_get().  The function pointers can then be
called through that structure, e.g.

    primitives_t *prims = primitives_get();
    prims->shiftC_16s(buffer, shifts, buffer, 256);

Of course, calling through a function pointer and setting up the SIMD
operations has some overhead, so it would be counterproductive to call
the primitives library for very small operations, e.g. initializing an
array of eight values to a constant.  The primitives library is intended
for larger-scale operations, e.g. arrays of 64 or more elements.

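The same advice can be applied at the call site.  The sketch below uses a hypothetical table type (my_prims_t) with a single set_16s slot, standing in for the structure returned by primitives_get().  Small arrays are filled with a plain loop, and only arrays of 64 elements or more go through the function pointer.  The threshold of 64 comes from the rule of thumb above and would need tuning per platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-slot table standing in for primitives_t. */
typedef struct {
    void (*set_16s)(int16_t val, int16_t *dst, size_t len);
} my_prims_t;

/* Stand-in for whichever implementation was selected at init time. */
static void set_16s_selected(int16_t val, int16_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = val;
}

static my_prims_t g_prims = { set_16s_selected };

/* Rule-of-thumb break-even size from the text; tune per platform. */
#define MIN_PRIMS_LEN 64

/* Fill dst with val, skipping the indirect call for tiny arrays. */
static void fill_16s(int16_t val, int16_t *dst, size_t len)
{
    if (len < MIN_PRIMS_LEN) {
        for (size_t i = 0; i < len; i++)  /* small: a plain loop is cheaper */
            dst[i] = val;
        return;
    }
    g_prims.set_16s(val, dst, len);       /* large: call through the table */
}
```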

Initialization and Cleanup
--------------------------
The library is initialized on the first call to either primitives_init()
or primitives_get().  Cleanup (if any) is done by primitives_deinit().

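One way to get this behaviour is a lazily initialized singleton.  The getter runs the initializer on first use, and the initializer does nothing if it has already run.  This is a minimal single-threaded sketch with hypothetical names (my_prims_init, my_prims_get, my_prims_deinit), not FreeRDP's implementation.  Code that may call the getter from several threads at once would need a proper once-only guard such as pthread_once().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table with a single entry. */
typedef struct {
    void (*zero_32u)(uint32_t *dst, size_t len);
} my_prims_t;

static void zero_32u_generic(uint32_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = 0;
}

static my_prims_t g_prims;
static bool g_initialized = false;   /* not thread-safe: see lead-in */

/* Fill the table; calling it again after success does nothing. */
static void my_prims_init(void)
{
    if (g_initialized)
        return;
    g_prims.zero_32u = zero_32u_generic;  /* CPU checks would go here */
    g_initialized = true;
}

/* Return the singleton, initializing it on first use. */
static my_prims_t *my_prims_get(void)
{
    if (!g_initialized)
        my_prims_init();
    return &g_prims;
}

/* Release resources (none here) and allow re-initialization. */
static void my_prims_deinit(void)
{
    g_prims.zero_32u = NULL;
    g_initialized = false;
}
```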

Intel Integrated Performance Primitives (IPP)
---------------------------------------------
If FreeRDP is compiled with IPP support (-DWITH_IPP=ON), IPP functions
are used, where available, to fill the function pointers.  Where
possible, function names and parameter lists follow the IPP conventions,
so that the IPP functions can be plugged into the function pointers
without a wrapper layer.  Use of IPP is completely optional, and in many
cases the SSE code in the primitives library itself performs as well as
IPP or better.

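As an illustration of IPP-compatible naming, the sketch below declares a hypothetical table slot, set_8u.  Its parameters follow the order of IPP's ippsSet_8u() (value, destination, length), and the slot is filled with a generic C routine.  The IPP assignment appears only as a comment under the WITH_IPP macro.  It needs the IPP headers, plus a slot type whose parameter and return types match ippsSet_8u() exactly, so that no cast or wrapper is required.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slot type: (val, pDst, len) as in ippsSet_8u();
 * 0 means success, as with IPP's ippStsNoErr. */
typedef int (*set_8u_fn)(uint8_t val, uint8_t *pDst, int len);

typedef struct {
    set_8u_fn set_8u;
} my_prims_t;

/* Generic C fallback with the same argument order as the IPP routine. */
static int generic_set_8u(uint8_t val, uint8_t *pDst, int len)
{
    if (pDst == NULL || len <= 0)
        return -1;                       /* hypothetical error code */
    memset(pDst, val, (size_t)len);
    return 0;
}

static void my_prims_init_set(my_prims_t *prims)
{
    prims->set_8u = generic_set_8u;
#ifdef WITH_IPP
    /* Hypothetical: with a slot type matching ippsSet_8u's exact
     * prototype, the IPP routine could be stored directly here:
     * prims->set_8u = ippsSet_8u; */
#endif
}
```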

Coverage
--------
The primitives library is not meant to be comprehensive; it does not
offer entrypoints for every operation and operand type.  Instead, its
coverage is focused on operations known to be performance bottlenecks
in the code.  For instance, 16-bit signed operations are used widely in
the RemoteFX software, so you'll find 16s (16-bit signed) versions of
several operations, but there is no attempt to provide (unused) copies
of the same code for 8u (8-bit unsigned), 16u, 32s, etc.


New Optimizations
-----------------
As the need arises, new optimizations can be added to the library,
including NEON, AVX, and perhaps OpenCL or other SIMD and parallel
implementations.  CPU feature detection is done in winpr/sysinfo.

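A new instruction set is easiest to add if it only overrides the entries it actually implements.  The sketch below installs the generic C routines first.  It then applies each tier in order of increasing capability, so any slot a higher tier leaves alone keeps the best lower-tier version.  All names are hypothetical.  The feature bitmask stands in for the real detection in winpr/sysinfo, and the "SSE2" and "AVX2" routines are plain C stand-ins rather than SIMD code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits; real values would come from CPU detection. */
enum { FEAT_SSE2 = 1u << 0, FEAT_AVX2 = 1u << 1 };

typedef void (*copy_8u_fn)(const uint8_t *src, uint8_t *dst, size_t len);
typedef void (*zero_8u_fn)(uint8_t *dst, size_t len);

typedef struct {
    copy_8u_fn copy_8u;
    zero_8u_fn zero_8u;
} my_prims_t;

/* Stand-ins: a real tier would use that tier's intrinsics instead. */
static void copy_generic(const uint8_t *s, uint8_t *d, size_t n) { memcpy(d, s, n); }
static void copy_sse2(const uint8_t *s, uint8_t *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}
static void copy_avx2(const uint8_t *s, uint8_t *d, size_t n) { memmove(d, s, n); }
static void zero_generic(uint8_t *d, size_t n) { memset(d, 0, n); }
static void zero_sse2(uint8_t *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = 0;
}

/* Tier initializers: each overrides only the slots it implements. */
static void init_generic(my_prims_t *p) { p->copy_8u = copy_generic; p->zero_8u = zero_generic; }
static void init_sse2(my_prims_t *p)    { p->copy_8u = copy_sse2; p->zero_8u = zero_sse2; }
static void init_avx2(my_prims_t *p)    { p->copy_8u = copy_avx2; /* no AVX2 zero yet */ }

/* Lowest tier first, so a missing higher-tier slot keeps the best lower one. */
static void my_prims_init(my_prims_t *p, unsigned features)
{
    init_generic(p);
    if (features & FEAT_SSE2)
        init_sse2(p);
    if (features & FEAT_AVX2)
        init_avx2(p);
}
```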

Adding Entrypoints
------------------
As the need for new operations or operand types arises, new entrypoints
can be added:
  1) Function prototypes and pointers are added to
     include/freerdp/primitives.h.
  2) New module initialization and cleanup function prototypes are added
     to prim_internal.h and called in primitives.c (from primitives_init()
     and primitives_deinit()).
  3) Operation names and parameter lists should be compatible with IPP.
     IPP manuals are available online at software.intel.com.
  4) A generic C entrypoint must be available as a fallback.
  5) prim_templates.h contains macro-based templates for simple operations,
     such as applying a single SSE operation to arrays of data.
     These templates can often be used to extend the operations without
     writing a lot of new code.

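The steps above can be sketched in miniature.  All names here (opC_32u_fn, my_prims_t, DEFINE_GENERIC_OPC_32U, init_andor) are hypothetical.  The macro follows the spirit of step 5, but it generates plain C loops, whereas the macros in prim_templates.h generate SSE code.  Plain C loops are exactly what step 4 requires as the fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 1: a prototype type plus slots in the table. */
typedef int (*opC_32u_fn)(const uint32_t *pSrc, uint32_t val,
                          uint32_t *pDst, uint32_t len);

typedef struct {
    opC_32u_fn andC_32u;
    opC_32u_fn orC_32u;
} my_prims_t;

/* Step 5 (generic flavour): one macro stamps out a family of
 * "array OP constant" routines; OP is the C operator to apply. */
#define DEFINE_GENERIC_OPC_32U(name, OP)                                 \
    static int name(const uint32_t *pSrc, uint32_t val, uint32_t *pDst, \
                    uint32_t len)                                       \
    {                                                                   \
        for (uint32_t i = 0; i < len; i++)                              \
            pDst[i] = pSrc[i] OP val;                                   \
        return 0;                                                       \
    }

/* Step 4: generic C fallbacks generated from the template. */
DEFINE_GENERIC_OPC_32U(general_andC_32u, &)
DEFINE_GENERIC_OPC_32U(general_orC_32u, |)

/* Step 2: module initializer, called from the library's init routine. */
static void init_andor(my_prims_t *prims)
{
    prims->andC_32u = general_andC_32u;
    prims->orC_32u = general_orC_32u;
    /* An optimized variant would override these after a CPU check. */
}
```
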
Cache Management
----------------
I haven't found much speed improvement from prefetching, and in some
cases it even seems to have a negative impact.  Done correctly, perhaps
prefetches, fences, etc. could accelerate the routines further.

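For reference, software prefetching means hinting that data some fixed distance ahead should be pulled into the cache before the loop reaches it.  The sketch below assumes GCC or Clang for __builtin_prefetch().  The prefetch distance and the 64-byte cache line are hypothetical values.  As the note above says, this kind of change may not help and can even hurt, so any such change should be measured before it is kept.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
/* rw = 0 (read), locality = 0 (streaming data, little reuse expected). */
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 0)
#else
#define PREFETCH_READ(addr) ((void)(addr))
#endif

/* Hypothetical distance in elements: 64 x int16_t = 128 bytes ahead. */
#define PREFETCH_DISTANCE 64

/* Sum an int16_t array, prefetching ahead of the current position. */
static int64_t sum_16s_prefetch(const int16_t *src, size_t len)
{
    int64_t total = 0;
    for (size_t i = 0; i < len; i++) {
        /* One hint per 32 elements (an assumed 64-byte line), and only
         * for addresses that are still inside the array. */
        if ((i % 32) == 0 && i + PREFETCH_DISTANCE < len)
            PREFETCH_READ(&src[i + PREFETCH_DISTANCE]);
        total += src[i];
    }
    return total;
}
```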

Testing
-------
In the test subdirectory is an executable (prim_test) that tests both the
functionality and the speed of the primitives library operations.  Any
new modules should be added to that test, following the conventions
already established in that directory.  The program can be run on
different target hardware to compare the performance of the generic C,
optimized, and IPP implementations
with various array sizes.
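
A minimal sketch of this kind of comparison uses two hypothetical functions: addC_16u_generic as the reference and addC_16u_unrolled as the candidate under test.  The sketch checks that both produce identical output for several array sizes.  It also prints a rough timing for each, using the standard clock() function.  It only illustrates the approach and is not the structure of prim_test itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Reference: dst[i] = src[i] + k, wrapping modulo 2^16. */
static void addC_16u_generic(const uint16_t *src, uint16_t k, uint16_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (uint16_t)(src[i] + k);
}

/* Candidate: same result, unrolled by 4 (stand-in for a SIMD version). */
static void addC_16u_unrolled(const uint16_t *src, uint16_t k, uint16_t *dst, size_t len)
{
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        dst[i]     = (uint16_t)(src[i] + k);
        dst[i + 1] = (uint16_t)(src[i + 1] + k);
        dst[i + 2] = (uint16_t)(src[i + 2] + k);
        dst[i + 3] = (uint16_t)(src[i + 3] + k);
    }
    for (; i < len; i++)
        dst[i] = (uint16_t)(src[i] + k);
}

typedef void (*addC_16u_fn)(const uint16_t *, uint16_t, uint16_t *, size_t);

/* Time `reps` calls of fn for one array size; returns CPU seconds. */
static double time_it(addC_16u_fn fn, const uint16_t *src, uint16_t *dst,
                      size_t len, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(src, 7, dst, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Returns the number of sizes whose outputs differ (0 means all matched). */
static int compare_sizes(void)
{
    static const size_t sizes[] = { 1, 7, 64, 256, 4096 };
    static uint16_t src[4096], ref[4096], out[4096];
    int failures = 0;

    for (size_t i = 0; i < 4096; i++)
        src[i] = (uint16_t)(i * 37u);

    for (size_t s = 0; s < sizeof(sizes) / sizeof(sizes[0]); s++) {
        size_t len = sizes[s];
        addC_16u_generic(src, 7, ref, len);
        addC_16u_unrolled(src, 7, out, len);
        if (memcmp(ref, out, len * sizeof(uint16_t)) != 0)
            failures++;
        printf("len %5zu: generic %.4fs, candidate %.4fs\n", len,
               time_it(addC_16u_generic, src, ref, len, 1000),
               time_it(addC_16u_unrolled, src, out, len, 1000));
    }
    return failures;
}
```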