Copyright 2003, 2004, 2006, 2008 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library. If not,
see https://www.gnu.org/licenses/.



AMD64 MPN SUBROUTINES


This directory contains mpn functions for AMD64 chips. It is also useful
for 64-bit Pentiums and "Core 2".


RELEVANT OPTIMIZATION ISSUES

The Opteron and Athlon64 can sustain up to 3 instructions per cycle, but in
practice that is only possible for integer instructions. Almost any three
integer instructions can issue simultaneously, however, including any 3 ALU
operations (shifts included). Up to two memory operations can issue each
cycle.

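As a rough illustration of the issue limits above, the following C sketch
computes a lower bound on the cycles needed per loop iteration. The function
name issue_bound and the instruction counts used with it are made up for the
example; the bound ignores latency chains, decode limits and branches, so
real loops will usually run slower than it suggests.

```c
#include <assert.h>

/* Issue limits quoted in the text for Opteron/Athlon64. */
#define MAX_INSNS_PER_CYCLE  3
#define MAX_MEMOPS_PER_CYCLE 2

/* Hypothetical helper: lower bound on cycles per loop iteration, derived
   from the issue limits alone.  insns is the total number of integer
   instructions in the loop body, memops the number of memory operations
   among them. */
static double issue_bound(int insns, int memops)
{
    assert(insns >= 0 && memops >= 0 && memops <= insns);

    double by_insns  = (double) insns  / MAX_INSNS_PER_CYCLE;
    double by_memops = (double) memops / MAX_MEMOPS_PER_CYCLE;

    /* The tighter of the two limits decides. */
    return by_insns > by_memops ? by_insns : by_memops;
}
```

For a hypothetical loop body of 6 instructions with 3 memory operations,
issue_bound(6, 3) is 2.0 (instruction issue is the limit); with 4
instructions and 3 memory operations, issue_bound(4, 3) is 1.5 (memory
operations are the limit).
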
Scheduling typically requires that load-use instructions be split into
separate load and use instructions. That requires more decode resources,
and it is rarely a win. The Opteron/Athlon64 have a deep out-of-order core.


Optimizing for the 64-bit Pentium4 is probably a waste of time, as the most
critical instructions are very poorly implemented there. Perhaps we could
save a cycle or two, but the most common loops now run at between 10 and 22
cycles, so a saved cycle isn't too exciting.


The new spin of the venerable P6 core, the "Core 2", is much better than the
Pentium4 for the GMP loops. Its integer pipeline is somewhat similar to
the Opteron/Athlon64 pipeline, except that the GMP favourites ADC/SBB and
MUL are slower. Furthermore, an INC/DEC followed by ADC/SBB incurs a
pipeline stall of around 10 cycles. The default mpn_add_n and mpn_sub_n
code suffers badly from the stall. The code in the core2 subdirectory uses
the almost forgotten instruction JRCXZ for loop control, and updates the
induction variable using LEA.

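To make the role of the carry concrete, here is a portable C sketch of the
limb-by-limb addition that mpn_add_n performs. It is not the GMP code: the
name add_n_sketch and the fixed 64-bit limb type are assumptions made for the
example. The variable cy stands in for the CPU carry flag, which an assembly
loop keeps live from one ADC to the next. Loop control therefore has to leave
that flag alone. LEA modifies no flags and JRCXZ branches on RCX without
reading any, which is presumably why the core2 code prefers them to INC/DEC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for mpn_add_n with 64-bit limbs:
   rp[0..n-1] = up[0..n-1] + vp[0..n-1], least significant limb first.
   Returns the carry out of the most significant limb (0 or 1). */
static uint64_t add_n_sketch(uint64_t *rp, const uint64_t *up,
                             const uint64_t *vp, size_t n)
{
    uint64_t cy = 0;            /* plays the role of the carry flag */

    assert(n >= 1);
    for (size_t i = 0; i < n; i++) {
        uint64_t u  = up[i];
        uint64_t s  = u + vp[i];    /* wraps modulo 2^64 */
        uint64_t c1 = s < u;        /* carry out of u + v */
        uint64_t r  = s + cy;
        uint64_t c2 = r < s;        /* carry out of adding the carry in */

        rp[i] = r;
        cy = c1 | c2;               /* c1 and c2 are never both set */
    }
    return cy;
}
```

Adding the two-limb values {2^64-1, 2^64-1} and {1, 0} with this sketch
yields {0, 0} and a returned carry of 1, since the carry ripples through
both limbs.

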
REFERENCES

"System V Application Binary Interface AMD64 Architecture Processor
Supplement", draft version 0.99, December 2007.
http://www.x86-64.org/documentation/abi.pdf