Copyright 1996, 1999, 2001, 2002, 2004 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library.  If not,
see https://www.gnu.org/licenses/.




This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

  Load and Store timing

On the PA7000, no memory instruction can issue in the two cycles following
a store.  For the PA7100, this is reduced to one cycle.

The PA7100 has a lockup-free cache, so it helps to schedule each load
well ahead of the instruction that depends on it.
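
As a rough illustration of keeping a load away from its first use, the C
sketch below complements a limb vector while fetching the limb for the next
iteration before the current one is consumed.  The function name
sketch_com_pipelined is hypothetical and is not part of GMP, and a C
compiler is free to reschedule the statements; hand-written assembly would
pin the order explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GMP code: res[i] = ~src[i] for 0 <= i < n,
   written so that each load is issued one iteration ahead of its use.  */
static void
sketch_com_pipelined (uint32_t *res, const uint32_t *src, size_t n)
{
  if (n == 0)
    return;

  uint32_t cur = src[0];        /* prologue: first load */
  for (size_t i = 1; i < n; i++)
    {
      uint32_t next = src[i];   /* load for the next iteration ... */
      res[i - 1] = ~cur;        /* ... placed before cur is consumed */
      cur = next;
    }
  res[n - 1] = ~cur;            /* epilogue: drain the last limb */
}
```

A real loop would separate load and use by more than one iteration and
would also respect the store-to-memory-instruction delay described above.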

STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some software pipelining is needed to avoid the
   xmpyu-fstds delay):

	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)

	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib	Loop

2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   (asymptotically) on the PA7100, using the instructions below.  With proper
   software pipelining and the unrolling level below, the speed becomes 8
   cycles/limb.

	fldds	s1_ptr
	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	addc
	addc
	addc
	addc
	addc	%r0,%r0,cy-limb

	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	add
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib

3. For the PA8000 we have to stick to using 32-bit limbs until compiler
   support emerges.  But we want to use 64-bit operations whenever possible,
   in particular for loads and stores.  It is possible to handle mpn_add_n
   efficiently by rotating (when s1/s2 are aligned) or by masking plus
   bit-field inserting (when they are not).  The speed should double
   compared to the code used today.
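
For reference, the C sketch below spells out what the loops in items 1 and 2
are meant to compute, and illustrates the idea behind item 3 of adding two
32-bit limbs per 64-bit operation.  It is not GMP source.  The sketch_*
names are hypothetical, and limbs are assumed to be 32 bits wide and stored
least significant first.  In the add_n sketch each 64-bit operand is
assembled with shifts so that the C stays endian-neutral; on big-endian
PA-RISC a single 64-bit load would presumably deliver the two limbs in the
opposite order, which is our reading of why item 3 mentions rotating.

```c
#include <stddef.h>
#include <stdint.h>

/* Item 1: res[] = s1[] * s2_limb, returning the carry-out limb.  */
static uint32_t
sketch_mul_1 (uint32_t *res, const uint32_t *s1, size_t n, uint32_t s2_limb)
{
  uint32_t cy = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* 32x32->64 product (the work xmpyu does) plus carry-in.  */
      uint64_t prod = (uint64_t) s1[i] * s2_limb + cy;
      res[i] = (uint32_t) prod;         /* low word is stored */
      cy = (uint32_t) (prod >> 32);     /* high word feeds the next limb */
    }
  return cy;
}

/* Item 2: res[] += s1[] * s2_limb, returning the carry-out limb.  */
static uint32_t
sketch_addmul_1 (uint32_t *res, const uint32_t *s1, size_t n,
                 uint32_t s2_limb)
{
  uint32_t cy = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1.  */
      uint64_t acc = (uint64_t) s1[i] * s2_limb + res[i] + cy;
      res[i] = (uint32_t) acc;
      cy = (uint32_t) (acc >> 32);
    }
  return cy;
}

/* Item 3: res[] = s1[] + s2[], two limbs per 64-bit addition.  */
static uint32_t
sketch_add_n_pairs (uint32_t *res, const uint32_t *s1, const uint32_t *s2,
                    size_t n)
{
  uint32_t cy = 0;
  size_t i = 0;
  for (; i + 1 < n; i += 2)
    {
      uint64_t a = ((uint64_t) s1[i + 1] << 32) | s1[i];
      uint64_t b = ((uint64_t) s2[i + 1] << 32) | s2[i];
      uint64_t sum = a + b;
      uint32_t c1 = sum < a;            /* carry out of a + b */
      sum += cy;
      uint32_t c2 = sum < cy;           /* carry out of adding carry-in */
      res[i] = (uint32_t) sum;
      res[i + 1] = (uint32_t) (sum >> 32);
      cy = c1 | c2;                     /* at most one of them is set */
    }
  if (i < n)                            /* odd limb left over */
    {
      uint64_t sum = (uint64_t) s1[i] + s2[i] + cy;
      res[i] = (uint32_t) sum;
      cy = (uint32_t) (sum >> 32);
    }
  return cy;
}
```

The assembly sketches in items 1 and 2 compute the same values as the first
two functions, but split the work into FPU multiplies, stores to the stack,
integer reloads and add-with-carry chains.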




LABEL SYNTAX

The HP-UX assembler takes labels starting in column 0 with no colon,

	L$loop  ldws,mb -4(0,%r25),%r22

Gas on hppa GNU/Linux, however, requires a colon,

	L$loop: ldws,mb -4(0,%r25),%r22

This is covered by using LDEF() from asm-defs.m4.  An alternative would be
to use ".label", which is accepted by both,

		.label  L$loop
		ldws,mb -4(0,%r25),%r22

but that's not as nice to look at if you're used to assembler code having
labels in column 0.




REFERENCES

Hewlett Packard, "HP Assembler Reference Manual", 9th edition, June 1998,
part number 92432-90012.



----------------
Local variables:
mode: text
fill-column: 76
End: