Copyright 2001 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library. If not,
see https://www.gnu.org/licenses/.




                      INTEL PENTIUM-4 MPN SUBROUTINES


This directory contains mpn functions optimized for Intel Pentium-4.

The mmx subdirectory has routines using MMX instructions, and the sse2
subdirectory has routines using SSE2 instructions. All P4s have these; the
separate directories exist only so that configure can omit that code if the
assembler doesn't support it.


STATUS

                                cycles/limb

        mpn_add_n/sub_n            4 normal, 6 in-place

        mpn_mul_1                  4 normal, 6 in-place
        mpn_addmul_1               6
        mpn_submul_1               7

        mpn_mul_basecase           6 cycles/crossproduct (approx)

        mpn_sqr_basecase           3.5 cycles/crossproduct (approx)
                                   or 7.0 cycles/triangleproduct (approx)

        mpn_l/rshift               1.75
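
The two mpn_sqr_basecase figures above describe the same speed counted two
ways. An n-limb square has n*n crossproducts, but a squaring routine needs
to form only about half of them (the "triangle"), since u[i]*u[j] and
u[j]*u[i] are equal. The portable C sketch below illustrates that counting.
It is not the assembly code in this directory: the name
sqr_basecase_sketch, the 32-bit limb type and the 64-bit accumulator are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: rp[0..2n-1] = up[0..n-1]^2, with 32-bit limbs. */
static void sqr_basecase_sketch(uint32_t *rp, const uint32_t *up, size_t n)
{
    assert(n > 0);

    for (size_t i = 0; i < 2 * n; i++)
        rp[i] = 0;

    /* Triangle: each product u[i]*u[j] with i < j is formed once. */
    for (size_t i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (size_t j = i + 1; j < n; j++) {
            uint64_t t = (uint64_t)up[i] * up[j] + rp[i + j] + carry;
            rp[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        rp[i + n] = (uint32_t)carry;   /* limb i+n is still untouched here */
    }

    /* Double the off-diagonal sum with a one-bit left shift. */
    uint32_t top = 0;
    for (size_t i = 0; i < 2 * n; i++) {
        uint32_t v = rp[i];
        rp[i] = (uint32_t)(v << 1) | top;
        top = v >> 31;
    }

    /* Add the diagonal squares u[i]^2 at limb position 2i. */
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sq = (uint64_t)up[i] * up[i];
        uint64_t lo = (uint64_t)rp[2 * i] + (uint32_t)sq + carry;
        rp[2 * i] = (uint32_t)lo;
        uint64_t hi = (uint64_t)rp[2 * i + 1] + (sq >> 32) + (lo >> 32);
        rp[2 * i + 1] = (uint32_t)hi;
        carry = hi >> 32;
    }
}
```

The inner multiply runs n(n-1)/2 times plus n diagonal squares, against n*n
multiplies for a general product, which is why the per-triangleproduct cost
is quoted at about twice the per-crossproduct cost.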



The shifts ought to be able to go at 1.5 c/l, but not much effort has been
applied to them yet.

In-place operations, and all addmul, submul, mul_basecase and sqr_basecase
calls, suffer from pipeline anomalies associated with write combining and
movd reads and writes to the same or nearby locations. The movq
instructions do not trigger the same hardware problems. Unfortunately,
using movq and splitting/combining seems to require too many extra
instructions to help. Perhaps future chip steppings will be better.
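
To see why addmul and submul fall into the same category as in-place
operations, note that each step loads a destination limb and then stores
back to the same address. A minimal C sketch of that access pattern
follows, assuming 32-bit limbs; addmul_1_sketch is a hypothetical name and
this is not the routine in this directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: rp[0..n-1] += up[0..n-1] * v, returns the carry limb. */
static uint32_t addmul_1_sketch(uint32_t *rp, const uint32_t *up,
                                size_t n, uint32_t v)
{
    assert(n > 0);

    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* rp[i] is read here ... */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        /* ... and written back to the same location. */
        rp[i] = (uint32_t)t;
        carry = t >> 32;
    }
    return (uint32_t)carry;
}
```

The 64-bit sum cannot overflow: the largest product plus two 32-bit
addends is exactly 2^64 - 1.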



NOTES

The Pentium-4 pipeline, "Netburst", provides for quite a number of
surprises. Many traditional x86 instructions run very slowly, requiring
use of alternative instructions for acceptable performance.

adcl and sbbl are quite slow at 8 cycles for reg->reg. paddq of 32 bits
within a 64-bit mmx register seems better, though the combination
paddq/psrlq when propagating a carry still has a 4 cycle latency.
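
The paddq/psrlq idea can be sketched in portable C: 32-bit limbs are summed
in a 64-bit accumulator, the low half is stored, and a right shift by 32
leaves the carry in bit 0 for the next limb. This is only an illustration
of the technique under those assumptions; add_n_sketch is a hypothetical
name, and the comments naming instructions indicate a rough correspondence
rather than what a compiler would emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: rp[] = up[] + vp[] over n limbs, returns carry-out. */
static uint32_t add_n_sketch(uint32_t *rp, const uint32_t *up,
                             const uint32_t *vp, size_t n)
{
    assert(n > 0);

    uint64_t acc = 0;             /* plays the role of a 64-bit mmx register */
    for (size_t i = 0; i < n; i++) {
        acc += up[i];             /* like paddq */
        acc += vp[i];             /* like paddq */
        rp[i] = (uint32_t)acc;    /* store the low 32 bits, like movd */
        acc >>= 32;               /* like psrlq $32: carry moves to bit 0 */
    }
    return (uint32_t)acc;
}
```

No flags register is involved, so there is no adcl in the dependent chain;
the carry travels through the accumulator instead.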

incl and decl should be avoided; use add $1 and sub $1 instead. Apparently
the carry flag is not separately renamed, so incl and decl depend on all
previous flags-setting instructions.

shll and shrl have a 4 cycle latency, or 8 times the latency of the fastest
integer instructions (addl, subl, orl, andl, and some more). shldl and
shrdl seem to have 13 and 15 cycles latency, respectively. Bizarre.

movq mmx -> mmx does have 6 cycle latency, as noted in the documentation.
A pxor/por or similar combination at 2 cycles latency can be used instead.
The movq, however, executes in the float unit, thereby saving MMX execution
resources. With the right juggling, data moves shouldn't be on a dependent
chain.

L1 is write-through, but the write-combining sounds like it does enough to
not require explicit destination prefetching.

No use has been found for the xmm registers so far, but not much effort has
been expended. A configure test for whether the operating system knows
fxsave/fxrstor will be needed if they're used.



REFERENCES

Intel Pentium-4 processor manuals,

        http://developer.intel.com/design/pentium4/manuals

"Intel Pentium 4 Processor Optimization Reference Manual", Intel, 2001,
order number 248966. Available on-line:

        http://developer.intel.com/design/pentium4/manuals/248966.htm



----------------
Local variables:
mode: text
fill-column: 76
End: