Blame mpn/cray/README
Packit	5c3484	`Copyright 2000-2002 Free Software Foundation, Inc.`
Packit	5c3484
Packit	5c3484	`This file is part of the GNU MP Library.`
Packit	5c3484
Packit	5c3484	`The GNU MP Library is free software; you can redistribute it and/or modify`
Packit	5c3484	`it under the terms of either:`
Packit	5c3484
Packit	5c3484	`* the GNU Lesser General Public License as published by the Free`
Packit	5c3484	`Software Foundation; either version 3 of the License, or (at your`
Packit	5c3484	`option) any later version.`
Packit	5c3484
Packit	5c3484	`or`
Packit	5c3484
Packit	5c3484	`* the GNU General Public License as published by the Free Software`
Packit	5c3484	`Foundation; either version 2 of the License, or (at your option) any`
Packit	5c3484	`later version.`
Packit	5c3484
Packit	5c3484	`or both in parallel, as here.`
Packit	5c3484
Packit	5c3484	`The GNU MP Library is distributed in the hope that it will be useful, but`
Packit	5c3484	`WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY`
Packit	5c3484	`or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License`
Packit	5c3484	`for more details.`
Packit	5c3484
Packit	5c3484	`You should have received copies of the GNU General Public License and the`
Packit	5c3484	`GNU Lesser General Public License along with the GNU MP Library. If not,`
Packit	5c3484	`see https://www.gnu.org/licenses/.`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`The code in this directory works for Cray vector systems such as C90,`
Packit	5c3484	`J90, T90 (both the CFP variant and the IEEE variant) and SV1. (For`
Packit	5c3484	`the T3E and T3D systems, see the 'alpha' subdirectory at the same`
Packit	5c3484	`level as the directory containing this file.)`
Packit	5c3484
Packit	5c3484	`The cfp subdirectory is for systems utilizing the traditional Cray`
Packit	5c3484	`floating-point format, and the ieee subdirectory is for the newer`
Packit	5c3484	`systems that use the IEEE floating-point format.`
Packit	5c3484
Packit	5c3484	`There are several issues that reduce speed on Cray systems. For`
Packit	5c3484	`systems with cfp floating point, the main obstacle is the forming of`
Packit	5c3484	`128-bit products. For IEEE systems, adding, and in particular`
Packit	5c3484	`computing the carry, is the main issue. There are no vectorizing`
Packit	5c3484	`unsigned-less-than instructions, and the sequence that implements that`
Packit	5c3484	`operation is very long.`
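
To make the carry problem concrete, here is a minimal C sketch of the two usual ways to obtain the carry out of a limb addition: with an unsigned comparison, and with logic operations plus one shift when no such comparison is available. The function names are hypothetical, and this is not claimed to be the instruction sequence the Cray code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Carry out of a + b, using an unsigned less-than comparison. */
static uint64_t carry_by_compare(uint64_t a, uint64_t b)
{
  uint64_t s = a + b;
  return s < a;
}

/* Same carry, using only logic operations and one shift.  The carry out
   of the top bit is set when both top bits are set, or when at least one
   is set and the top bit of the sum is clear. */
static uint64_t carry_by_logic(uint64_t a, uint64_t b)
{
  uint64_t s = a + b;
  return ((a & b) | ((a | b) & ~s)) >> 63;
}

/* Carry vector for n limbs.  Iterations are independent of each other,
   so the loop has no carry chain. */
static void carry_vector_sketch(uint64_t *cy, const uint64_t *up,
                                const uint64_t *vp, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    cy[i] = carry_by_logic(up[i], vp[i]);
}
```

The logic form costs several operations per limb, which is in line with the remark above that the replacement sequence is long.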
Packit	5c3484
Packit	5c3484	`Shifting is the only operation that is simple to make fast. All Cray`
Packit	5c3484	`systems have bitblt instructions (Vi Vj,Vj<Ak and Vi Vj,Vj>Ak) that`
Packit	5c3484	`should be really useful.`
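
As an illustration of why shifting is easy, the sketch below shifts an n-limb number left in portable C. Each result limb depends only on two adjacent source limbs, so there is no carry chain; presumably the double-shift instructions mentioned above combine such a pair in one step. The function name and the 64-bit limb type are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift {up, n} left by cnt bits (1 <= cnt <= 63), store the result at
   {rp, n}, and return the bits shifted out of the top limb.  Walking from
   the high end down lets rp and up be the same array. */
static uint64_t lshift_sketch(uint64_t *rp, const uint64_t *up,
                              size_t n, unsigned cnt)
{
  uint64_t out;
  size_t i;

  assert(n > 0 && cnt >= 1 && cnt <= 63);
  out = up[n - 1] >> (64 - cnt);
  for (i = n - 1; i > 0; i--)
    rp[i] = (up[i] << cnt) | (up[i - 1] >> (64 - cnt));
  rp[0] = up[0] << cnt;
  return out;
}
```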
Packit	5c3484
Packit	5c3484	`For best speed for cfp systems, we need a mul_basecase, since that`
Packit	5c3484	`reduces the need for carry propagation to a minimum. Depending on the`
Packit	5c3484	`size (vn) of the smaller of the two operands (V), we should split U and V`
Packit	5c3484	`in different chunk sizes:`
Packit	5c3484
Packit	5c3484	`U split in 2 32-bit parts`
Packit	5c3484	`V split according to the table:`
Packit	5c3484	`parts                    4     5     6     7     8`
Packit	5c3484	`bits/part               16    13    11    10     8`
Packit	5c3484	`max allowed vn           1     8    32    64   256`
Packit	5c3484	`number of multiplies     8    10    12    14    16`
Packit	5c3484	`peak cycles/limb         4     5     6     7     8`
Packit	5c3484
Packit	5c3484	`U split in 3 22-bit parts`
Packit	5c3484	`V split according to the table:`
Packit	5c3484	`parts                    3     4     5`
Packit	5c3484	`bits/part               22    16    13`
Packit	5c3484	`max allowed vn          16  1024  8192`
Packit	5c3484	`number of multiplies     9    12    15`
Packit	5c3484	`peak cycles/limb       4.5     6   7.5`
Packit	5c3484
Packit	5c3484	`U split in 4 16-bit parts`
Packit	5c3484	`V split according to the table:`
Packit	5c3484	`parts                    4`
Packit	5c3484	`bits/part               16`
Packit	5c3484	`max allowed vn       65536`
Packit	5c3484	`number of multiplies    16`
Packit	5c3484	`peak cycles/limb         8`
Packit	5c3484
Packit	5c3484	`(A T90 CPU can accumulate two products per cycle.)`
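
The table entries above appear to follow a simple rule, sketched here in C. The 48-bit accumulation budget is an assumption inferred from the numbers (it would match the coefficient width of the traditional Cray floating-point format); the text does not state it. Under that assumption, the maximum vn is 2^(48 - ubits - vbits), the number of multiplies is the product of the part counts, and the peak cycles/limb is half the number of multiplies, given two products per cycle. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_BITS 48  /* assumed accumulation budget, inferred from the tables */

/* Largest vn such that vn accumulated products of a ubits-bit part and a
   vbits-bit part still fit in ACC_BITS bits. */
static uint64_t max_vn_sketch(unsigned ubits, unsigned vbits)
{
  assert(ubits + vbits <= ACC_BITS);
  return UINT64_C(1) << (ACC_BITS - ubits - vbits);
}

/* Every part of a U limb is multiplied by every part of a V limb. */
static unsigned multiplies_sketch(unsigned uparts, unsigned vparts)
{
  return uparts * vparts;
}

/* Peak cycles per limb, assuming two accumulated products per cycle. */
static double peak_cycles_sketch(unsigned uparts, unsigned vparts)
{
  return multiplies_sketch(uparts, vparts) / 2.0;
}
```

For example, with U in 3 parts of 22 bits and V in 4 parts of 16 bits, this gives 2^10 = 1024 for the maximum vn, 12 multiplies, and 6 cycles/limb, matching the second table.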
Packit	5c3484
Packit	5c3484	`IDEA:`
Packit	5c3484	`* Rewrite mpn_add_n:`
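Packit	5c3484	`  short cy[n + 1];`
Packit	5c3484	`  /* Pass 1: add limbs independently and record each carry-out.  */`
Packit	5c3484	`  #pragma _CRI ivdep`
Packit	5c3484	`  for (i = 0; i < n; i++)`
Packit	5c3484	`    { s = up[i] + vp[i];`
Packit	5c3484	`      cy[i + 1] = s < up[i];    /* before the store, in case rp == up */`
Packit	5c3484	`      rp[i] = s; }`
Packit	5c3484	`  /* Pass 2: add the recorded carries; count limbs that wrap around.  */`
Packit	5c3484	`  more_carries = 0;`
Packit	5c3484	`  #pragma _CRI ivdep`
Packit	5c3484	`  for (i = 1; i < n; i++)`
Packit	5c3484	`    { s = rp[i] + cy[i];`
Packit	5c3484	`      rp[i] = s;`
Packit	5c3484	`      more_carries += s < cy[i]; }`
Packit	5c3484	`  /* Rare case: pass 2 produced new carries, so ripple them serially.  */`
Packit	5c3484	`  cys = 0;`
Packit	5c3484	`  if (more_carries)`
Packit	5c3484	`    {`
Packit	5c3484	`      for (i = 1; i < n; i++)`
Packit	5c3484	`        { c = rp[i] < cy[i];    /* carry out of limb i in pass 2 */`
Packit	5c3484	`          rp[i] += cys;`
Packit	5c3484	`          cys = c | (rp[i] < cys); }`
Packit	5c3484	`    }`
Packit	5c3484	`  return cys + cy[n];`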
Packit	5c3484
Packit	5c3484	`* Write mpn_add3_n for adding three operands. First add operands 1`
Packit	5c3484	`and 2, and generate cy[]. Then add operand 3 to the partial result,`
Packit	5c3484	`and accumulate carry into cy[]. Finally propagate carry just like`
Packit	5c3484	`in the new mpn_add_n.`
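
The two ideas above can be sketched in portable C as follows. This is an illustration under stated assumptions, not GMP code: limb_t stands in for mp_limb_t, the _sketch names are hypothetical, the carry array is passed in as scratch space of n + 1 entries instead of being a variable-length array, and the Cray ivdep pragmas are left out. The serial fix-up here also accounts for carries generated at any limb by the second pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* stand-in for mp_limb_t */

/* Add the saved carries cy[1..n-1] into rp[1..n-1], resolve any carries
   that this addition itself produces, and return the carry out of the
   top limb, including cy[n]. */
static limb_t propagate_sketch(limb_t *rp, const unsigned short *cy, size_t n)
{
  limb_t more_carries = 0, cys = 0;
  size_t i;

  /* Independent iterations: a candidate for vectorization. */
  for (i = 1; i < n; i++)
    {
      limb_t s = rp[i] + cy[i];
      rp[i] = s;
      more_carries += s < cy[i];
    }
  /* Rare case: some limb wrapped around above, so ripple serially. */
  if (more_carries)
    for (i = 1; i < n; i++)
      {
        limb_t c = rp[i] < cy[i];  /* did limb i wrap in the loop above? */
        rp[i] += cys;
        cys = c | (rp[i] < cys);
      }
  return cys + cy[n];
}

/* rp = up + vp over n limbs; cy is scratch with n + 1 entries. */
static limb_t add_n_sketch(limb_t *rp, const limb_t *up, const limb_t *vp,
                           size_t n, unsigned short *cy)
{
  size_t i;
  assert(n > 0);
  for (i = 0; i < n; i++)
    {
      limb_t s = up[i] + vp[i];
      cy[i + 1] = s < up[i];  /* computed before the store, so rp may be up */
      rp[i] = s;
    }
  return propagate_sketch(rp, cy, n);
}

/* rp = up + vp + wp over n limbs; each cy entry can now be 0, 1 or 2. */
static limb_t add3_n_sketch(limb_t *rp, const limb_t *up, const limb_t *vp,
                            const limb_t *wp, size_t n, unsigned short *cy)
{
  size_t i;
  assert(n > 0);
  for (i = 0; i < n; i++)
    {
      limb_t s = up[i] + vp[i];
      limb_t t = s + wp[i];
      cy[i + 1] = (s < up[i]) + (t < s);
      rp[i] = t;
    }
  return propagate_sketch(rp, cy, n);
}
```

Sharing one propagation routine between the two functions mirrors the remark that the three-operand variant propagates carries just as the two-operand one does.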
Packit	5c3484
Packit	5c3484	`IDEA:`
Packit	5c3484
Packit	5c3484	`Store fewer bits, perhaps 62, per limb. That brings mpn_add_n time`
Packit	5c3484	`down to 2.5 cycles/limb and mpn_addmul_1 time to 4 cycles/limb. By`
Packit	5c3484	`storing even fewer bits per limb, perhaps 56, it would be possible to`
Packit	5c3484	`write a mul_basecase that would run at effectively 1 cycle/limb.`
Packit	5c3484	`(Use VM here to better handle the rhomb-shaped multiply area, perhaps`
Packit	5c3484	`rounding operand sizes up to the next power of 2.)`
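
A minimal sketch of the fewer-bits-per-limb idea, assuming 62 data bits stored in a 64-bit word: the sum of two limbs plus a carry stays below 2^63, so it cannot overflow the word, and the carry is obtained with a shift instead of an unsigned comparison. The names and the choice of 62 bits are assumptions for the example; the sketch does not reproduce the cycle counts quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 62
#define LIMB_MASK ((UINT64_C(1) << LIMB_BITS) - 1)

/* rp = up + vp over n limbs that each hold LIMB_BITS data bits.
   Returns the carry out of the top limb. */
static uint64_t add_n_nails_sketch(uint64_t *rp, const uint64_t *up,
                                   const uint64_t *vp, size_t n)
{
  uint64_t cy = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      uint64_t s = up[i] + vp[i] + cy;  /* below 2^63, no word overflow */
      rp[i] = s & LIMB_MASK;            /* keep the low 62 bits */
      cy = s >> LIMB_BITS;              /* carry comes from a shift */
    }
  return cy;
}
```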