mpn/s390_64/README

Copyright 2011 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

* the GNU Lesser General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your
option) any later version.

or

* the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any
later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library. If not,
see https://www.gnu.org/licenses/.



There are five generations of 64-bit s390 processors: z900, z990, z9,
z10, and z196. The current GMP code was optimised for the two oldest,
z900 and z990.


mpn_copyi

This code uses a loop around MVC, and it almost surely runs very close
to optimally. A small improvement would be to use a single MVC for a
256-byte block, where we now use two (an extra MVC is issued whenever
the size is a multiple of 256 bytes).
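
A rough model of the MVC count, offered only as a sketch: it assumes the
loop issues one 256-byte MVC per full block and then always issues one
more MVC for the tail (a single MVC moves at most 256 bytes). That is
consistent with the description above but is not taken from the actual
assembly. All function names are hypothetical, and `memmove` stands in
for MVC (the two agree as long as rp <= up).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One MVC moves at most 256 bytes. */
#define MVC_MAX 256

/* Hypothetical model of the current scheme: one MVC per full 256-byte
   block, plus a tail MVC that is issued even when the tail is empty. */
static size_t mvc_count_current(size_t bytes)
{
    assert(bytes > 0);
    return bytes / MVC_MAX + 1;
}

/* Hypothetical model of the improved scheme: never issue an empty tail. */
static size_t mvc_count_ideal(size_t bytes)
{
    return (bytes + MVC_MAX - 1) / MVC_MAX;
}

/* Copy n 64-bit limbs from up to rp in increasing address order, in
   chunks of at most 256 bytes, as a loop around MVC would.  Returns the
   number of chunk moves, which equals mvc_count_ideal(8 * n). */
static size_t copyi_chunked(uint64_t *rp, const uint64_t *up, size_t n)
{
    unsigned char *dst = (unsigned char *) rp;
    const unsigned char *src = (const unsigned char *) up;
    size_t bytes = n * sizeof(uint64_t);
    size_t moves = 0;

    while (bytes > 0) {
        size_t len = bytes < MVC_MAX ? bytes : MVC_MAX;
        memmove(dst, src, len);   /* stand-in for one MVC */
        dst += len;
        src += len;
        bytes -= len;
        moves++;
    }
    return moves;
}
```

Under this model, a 256-byte copy takes 2 MVCs where 1 would do, and a
512-byte copy takes 3 where 2 would do; sizes that are not multiples of
256 are unaffected.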


mpn_copyd

We have tried several feed-in variants here: a branch tree, a jump
table, and a computed goto. The fastest (on z990) turned out to be the
computed goto.
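
The feed-in problem can be illustrated in C. The sketch below is
hypothetical and uses a `switch` with fall-through (a jump-table style
feed-in), not the computed goto the assembly uses. It first copies the
top n mod 4 limbs, then enters a loop that moves 4 limbs per iteration
from high to low addresses. Loading all four limbs before storing them
mirrors an LMG followed by an STMG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n 64-bit limbs from up to rp, highest address first.
   Decreasing order keeps the copy correct when rp >= up overlap. */
static void copyd_sketch(uint64_t *rp, const uint64_t *up, size_t n)
{
    size_t i = n;

    /* Feed-in: peel off n mod 4 limbs so the loop always moves 4. */
    switch (n % 4) {
    case 3: i--; rp[i] = up[i]; /* fall through */
    case 2: i--; rp[i] = up[i]; /* fall through */
    case 1: i--; rp[i] = up[i]; /* fall through */
    default: break;
    }

    /* The main loop relies on i being a multiple of 4. */
    assert(i % 4 == 0);

    /* Main loop: load 4 limbs, then store them (like LMG then STMG). */
    while (i > 0) {
        i -= 4;
        uint64_t w3 = up[i + 3], w2 = up[i + 2], w1 = up[i + 1], w0 = up[i];
        rp[i + 3] = w3;
        rp[i + 2] = w2;
        rp[i + 1] = w1;
        rp[i] = w0;
    }
}
```

Each feed-in variant mentioned above is a different way of dispatching
on n mod 4 before the unrolled loop starts.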

An approach not tried is to use EX on LMG and STMG, modifying the
register range on the fly. With that trick, separate feed-in paths
could be avoided completely.


mpn_lshift, mpn_rshift

The current code runs at pipeline decode bandwidth on z990.
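
For reference, the operation these functions perform can be written in
C as below. This is a hypothetical sketch of the mpn_lshift and
mpn_rshift semantics (shift count 1 to 63, shifted-out bits returned),
not the s390 code. Each result limb needs only two shifts and an OR,
with no carry chain between limbs, which is consistent with the code
being limited by decode bandwidth rather than by a dependency chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift {up, n} left by cnt bits into {rp, n}; return the bits shifted
   out of the top limb, in the low cnt bits.  Works from the top limb
   down, so rp >= up overlap is fine. */
static uint64_t lshift_ref(uint64_t *rp, const uint64_t *up, size_t n,
                           unsigned cnt)
{
    assert(n >= 1 && cnt >= 1 && cnt < 64);
    uint64_t high = up[n - 1];
    uint64_t ret = high >> (64 - cnt);
    for (size_t i = n - 1; i > 0; i--) {
        uint64_t low = up[i - 1];
        rp[i] = (high << cnt) | (low >> (64 - cnt));
        high = low;
    }
    rp[0] = high << cnt;
    return ret;
}

/* Shift {up, n} right by cnt bits into {rp, n}; return the bits shifted
   out of the bottom limb, in the high cnt bits.  Works from the bottom
   limb up, so rp <= up overlap is fine. */
static uint64_t rshift_ref(uint64_t *rp, const uint64_t *up, size_t n,
                           unsigned cnt)
{
    assert(n >= 1 && cnt >= 1 && cnt < 64);
    uint64_t low = up[0];
    uint64_t ret = low << (64 - cnt);
    for (size_t i = 0; i + 1 < n; i++) {
        uint64_t high = up[i + 1];
        rp[i] = (low >> cnt) | (high << (64 - cnt));
        low = high;
    }
    rp[n - 1] = low >> cnt;
    return ret;
}
```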


mpn_add_n, mpn_sub_n

The current code is 4-way unrolled. It should be unrolled more, to at
least 8x, in order to reach 2.5 c/l (cycles per limb).
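
To show what the unrolling refers to, here is a hypothetical C sketch
of the mpn_add_n operation with a 4-way unrolled main loop and a feed-in
for n mod 4 (the names are invented for illustration). Each limb step
adds two limbs plus the incoming carry, the work an add-logical-with-
carry instruction does in the assembly. Unrolling further, for example
to 8 limbs per iteration, spreads the fixed loop overhead over more
limbs. mpn_sub_n has the same shape, with borrows instead of carries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* a + b + *cy, with the carry-out stored back into *cy (0 or 1). */
static inline uint64_t add_limb(uint64_t a, uint64_t b, uint64_t *cy)
{
    uint64_t s = a + b;
    uint64_t c1 = s < a;        /* carry from a + b */
    uint64_t r = s + *cy;
    uint64_t c2 = r < s;        /* carry from adding the incoming carry */
    *cy = c1 | c2;              /* at most one of c1, c2 can be set */
    return r;
}

/* {rp, n} = {up, n} + {vp, n}; returns the carry out. */
static uint64_t add_n_sketch(uint64_t *rp, const uint64_t *up,
                             const uint64_t *vp, size_t n)
{
    uint64_t cy = 0;
    size_t i = 0;

    /* Feed-in: handle n mod 4 limbs so the main loop runs whole blocks. */
    for (; i < n % 4; i++)
        rp[i] = add_limb(up[i], vp[i], &cy);

    /* Main loop, 4-way unrolled. */
    for (; i < n; i += 4) {
        rp[i]     = add_limb(up[i],     vp[i],     &cy);
        rp[i + 1] = add_limb(up[i + 1], vp[i + 1], &cy);
        rp[i + 2] = add_limb(up[i + 2], vp[i + 2], &cy);
        rp[i + 3] = add_limb(up[i + 3], vp[i + 3], &cy);
    }
    return cy;
}
```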


mpn_mul_1, mpn_addmul_1, mpn_submul_1

The current code is very naive, but due to the non-pipelined nature of
MLGR on z900 and z990, more sophisticated code would not gain much.
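
A hypothetical C rendering of the naive approach: one 64x64->128-bit
multiply (the job of MLGR) per limb, followed by carry propagation. It
relies on the `unsigned __int128` type, a GCC/Clang extension on 64-bit
targets, and the function names are invented for illustration.
mpn_submul_1 is analogous, subtracting the product and returning a
borrow limb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* {rp, n} = {up, n} * v; returns the high limb. */
static uint64_t mul_1_sketch(uint64_t *rp, const uint64_t *up, size_t n,
                             uint64_t v)
{
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        u128 t = (u128) up[i] * v + cy;   /* one full product per limb */
        rp[i] = (uint64_t) t;
        cy = (uint64_t) (t >> 64);
    }
    return cy;
}

/* {rp, n} += {up, n} * v; returns the carry limb. */
static uint64_t addmul_1_sketch(uint64_t *rp, const uint64_t *up, size_t n,
                                uint64_t v)
{
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1, so this cannot overflow. */
        u128 t = (u128) up[i] * v + rp[i] + cy;
        rp[i] = (uint64_t) t;
        cy = (uint64_t) (t >> 64);
    }
    return cy;
}
```

Every iteration waits on one multiply, so when the multiplier itself is
not pipelined, rearranging the surrounding code has little to work with.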

On z10, one would need to cluster at least 4 MLGR instructions together
in order to reduce stalling.

On z196, one surely wants to use unrolling and pipelining, to perhaps
reach around 12 c/l. A major issue here, and on z10, is ALCGR's 3-cycle
stall.
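
At the C level, the clustering and unrolling ideas might look like the
hypothetical sketch below: four independent products are formed first,
so their multiplies can overlap, and only then is the carry chain run
through them. It assumes n is a multiple of 4 and uses the
`unsigned __int128` GCC/Clang extension. A compiler is free to reorder
this code, so it only illustrates the data flow an assembly loop would
need, not a guaranteed instruction schedule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* {rp, n} += {up, n} * v for n a multiple of 4; returns the carry limb. */
static uint64_t addmul_1_clustered(uint64_t *rp, const uint64_t *up,
                                   size_t n, uint64_t v)
{
    assert(n % 4 == 0);
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i += 4) {
        /* Cluster: four independent 64x64->128 products. */
        u128 p0 = (u128) up[i] * v;
        u128 p1 = (u128) up[i + 1] * v;
        u128 p2 = (u128) up[i + 2] * v;
        u128 p3 = (u128) up[i + 3] * v;

        /* Carry chain: only these additions depend on each other. */
        u128 t;
        t = p0 + rp[i] + cy;     rp[i]     = (uint64_t) t; cy = (uint64_t) (t >> 64);
        t = p1 + rp[i + 1] + cy; rp[i + 1] = (uint64_t) t; cy = (uint64_t) (t >> 64);
        t = p2 + rp[i + 2] + cy; rp[i + 2] = (uint64_t) t; cy = (uint64_t) (t >> 64);
        t = p3 + rp[i + 3] + cy; rp[i + 3] = (uint64_t) t; cy = (uint64_t) (t >> 64);
    }
    return cy;
}
```

The multiplies no longer sit on the critical path between iterations;
what remains serial is the carry chain, which is where ALCGR's stall
would show up.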


mpn_mul_2, mpn_addmul_2

At least for older machines (z900, z990) with very slow MLGR, we
should use Karatsuba's algorithm on 2-limb units, making mul_2 and
addmul_2 the main multiplication primitives. Newer machines might
benefit less from this approach, perhaps z10 in particular, where MLGR
clustering matters more.

With Karatsuba, one could hope for around 16 cycles per accumulated
128 cross product on z990.
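
The identity behind Karatsuba on 2-limb units, with B = 2^64, is
(a1 B + a0)(b1 B + b0) = a1 b1 B^2 + (a1 b0 + a0 b1) B + a0 b0, where
the cross term equals a1 b1 + a0 b0 + (a1 - a0)(b0 - b1). That needs
three MLGRs instead of four. The hypothetical C sketch below computes
one 2x2-limb product this way, using the subtractive form so that
|a1 - a0| and |b0 - b1| each fit in a limb while the sign is tracked
separately. It shows only the arithmetic core, not how mul_2 or
addmul_2 would accumulate it, and it relies on the `unsigned __int128`
GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* r[0..3] = (a1*B + a0) * (b1*B + b0), B = 2^64, using three
   64x64->128 multiplies instead of four. */
static void mul_2x2_karatsuba(uint64_t r[4], uint64_t a0, uint64_t a1,
                              uint64_t b0, uint64_t b1)
{
    u128 lo = (u128) a0 * b0;
    u128 hi = (u128) a1 * b1;

    /* |a1 - a0| and |b0 - b1| fit in one limb; track the sign apart. */
    uint64_t da = a1 >= a0 ? a1 - a0 : a0 - a1;
    uint64_t db = b0 >= b1 ? b0 - b1 : b1 - b0;
    int neg = (a1 < a0) != (b0 < b1);
    u128 mid = (u128) da * db;

    /* cross = a1*b0 + a0*b1 = lo + hi + (a1 - a0)*(b0 - b1).  It needs
       129 bits: the low 128 in cross, the top bit in cross_c. */
    u128 s = lo + hi;
    uint64_t s_c = s < lo;
    u128 cross;
    uint64_t cross_c;
    if (!neg) {
        cross = s + mid;
        cross_c = s_c + (cross < s);
    } else {
        cross = s - mid;
        cross_c = s_c - (s < mid);   /* true cross is never negative */
    }

    /* r = lo + cross*B + hi*B^2, propagating carries limb by limb. */
    u128 t = (u128) (uint64_t) (lo >> 64) + (uint64_t) cross;
    r[0] = (uint64_t) lo;
    r[1] = (uint64_t) t;
    t = (t >> 64) + (uint64_t) hi + (uint64_t) (cross >> 64);
    r[2] = (uint64_t) t;
    t = (t >> 64) + (uint64_t) (hi >> 64) + cross_c;
    r[3] = (uint64_t) t;
}
```

The subtractive form is chosen because the additive form,
(a1 + a0)(b1 + b0), would need 65-bit operands and therefore more than
one MLGR for the middle product.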