Tree - source-git/gmp - CentOS Git server

source-git / gmp

Blame doc/tasks.html

Blob History Raw

Packit	5c3484
Packit	5c3484	`<html>`
Packit	5c3484	`<head>`
Packit	5c3484	`<title>GMP Itemized Development Tasks</title>`
Packit	5c3484	`<link rel="shortcut icon" href="favicon.ico">`
Packit	5c3484	`<link rel="stylesheet" href="gmp.css">`
Packit	5c3484	`<meta http-equiv="Content-Type" content="text/html; charset=utf-8">`
Packit	5c3484	`</head>`
Packit	5c3484
Packit	5c3484	`<center>`
Packit	5c3484
Packit	5c3484	`GMP Itemized Development Tasks`
Packit	5c3484
Packit	5c3484	`</center>`
Packit	5c3484
Packit	5c3484	`<font size=-1>`
Packit	5c3484
Packit	5c3484	`Copyright 2000-2004, 2006, 2008, 2009 Free Software Foundation, Inc.`
Packit	5c3484
Packit	5c3484	`This file is part of the GNU MP Library.`
Packit	5c3484
Packit	5c3484	`The GNU MP Library is free software; you can redistribute it and/or modify`
Packit	5c3484	`it under the terms of either:`
Packit	5c3484
Packit	5c3484	`* the GNU Lesser General Public License as published by the Free`
Packit	5c3484	`Software Foundation; either version 3 of the License, or (at your`
Packit	5c3484	`option) any later version.`
Packit	5c3484
Packit	5c3484	`or`
Packit	5c3484
Packit	5c3484	`* the GNU General Public License as published by the Free Software`
Packit	5c3484	`Foundation; either version 2 of the License, or (at your option) any`
Packit	5c3484	`later version.`
Packit	5c3484
Packit	5c3484	`or both in parallel, as here.`
Packit	5c3484
Packit	5c3484	`The GNU MP Library is distributed in the hope that it will be useful, but`
Packit	5c3484	`WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY`
Packit	5c3484	`or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License`
Packit	5c3484	`for more details.`
Packit	5c3484
Packit	5c3484	`You should have received copies of the GNU General Public License and the`
Packit	5c3484	`GNU Lesser General Public License along with the GNU MP Library. If not,`
Packit	5c3484	`see https://www.gnu.org/licenses/.`
Packit	5c3484
Packit	5c3484	`</font>`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`This file current as of 29 Jan 2014. An up-to-date version is available at`
Packit	5c3484	`https://gmplib.org/tasks.html.`
Packit	5c3484	`Please send comments about this page to gmp-devel<font>@</font>gmplib.org.`
Packit	5c3484
Packit	5c3484	`These are itemized GMP development tasks. Not all the tasks`
Packit	5c3484	`listed here are suitable for volunteers, but many of them are.`
Packit	5c3484	`Please see the projects file for more`
Packit	5c3484	`sizeable projects.`
Packit	5c3484
Packit	5c3484	`CAUTION: This file needs updating. Many of the tasks here have`
Packit	5c3484	`either already been taken care of, or have become irrelevant.`
Packit	5c3484
Packit	5c3484	`Correctness and Completeness`
Packit	5c3484
Packit	5c3484	`_LONG_LONG_LIMB in gmp.h is not namespace clean. Reported`
Packit	5c3484	`by Patrick Pelissier.`
Packit	5c3484
Packit	5c3484	`We sort of mentioned _LONG_LONG_LIMB in past releases, so`
Packit	5c3484	`need to be careful about changing it. It used to be a define`
Packit	5c3484	`applications had to set for long long limb systems, but that in`
Packit	5c3484	`particular is no longer relevant now that it's established automatically.`
Packit	5c3484	`The various reuse.c tests need to force reallocation by calling`
Packit	5c3484	`_mpz_realloc with a small (1 limb) size.`
Packit	5c3484	`One reuse case is missing from mpX/tests/reuse.c:`
Packit	5c3484	`mpz_XXX(a,a,a).`
Packit	5c3484	Make the string reading functions allow the `0x' prefix when the base is
Packit	5c3484	`explicitly 16. They currently only allow that prefix when the base is`
Packit	5c3484	`unspecified (zero).`
Packit	5c3484	`mpf_eq is not always correct, when one operand is`
Packit	5c3484	`1000000000... and the other operand is 0111111111..., i.e., extremely`
Packit	5c3484	`close. There is a special case in mpf_sub for this`
Packit	5c3484	`situation; put similar code in mpf_eq. [In progress.]`
Packit	5c3484	`mpf_eq doesn't implement what gmp.texi specifies. It should`
Packit	5c3484	`not use just whole limbs, but partial limbs. [In progress.]`
Packit	5c3484	`mpf_set_str doesn't validate it's exponent, for instance`
Packit	5c3484	`garbage 123.456eX789X is accepted (and an exponent 0 used), and overflow`
Packit	5c3484	`of a long is not detected.`
Packit	5c3484	`mpf_add doesn't check for a carry from truncated portions of`
Packit	5c3484	`the inputs, and in that respect doesn't implement the "infinite precision`
Packit	5c3484	`followed by truncate" specified in the manual.`
Packit	5c3484	`Windows DLLs: tests/mpz/reuse.c and tests/mpf/reuse.c initialize global`
Packit	5c3484	`variables with pointers to mpz_add etc, which doesn't work`
Packit	5c3484	`when those routines are coming from a DLL (because they're effectively`
Packit	5c3484	`function pointer global variables themselves). Need to rearrange perhaps`
Packit	5c3484	`to a set of calls to a test function rather than iterating over an array.`
Packit	5c3484	`mpz_pow_ui: Detect when the result would be more memory than`
Packit	5c3484	`a size_t can represent and raise some suitable exception,`
Packit	5c3484	`probably an alloc call asking for SIZE_T_MAX, and if that`
Packit	5c3484	`somehow succeeds then an abort. Various size overflows of`
Packit	5c3484	`this kind are not handled gracefully, probably resulting in segvs.`
Packit	5c3484
Packit	5c3484	`In mpz_n_pow_ui, detect when the count of low zero bits`
Packit	5c3484	`exceeds an unsigned long. There's a (small) chance of this`
Packit	5c3484	`happening but still having enough memory to represent the value.`
Packit	5c3484	`Reported by Winfried Dreckmann in for instance mpz_ui_pow_ui (x,`
Packit	5c3484	`4UL, 1431655766UL).`
Packit	5c3484	`mpf: Detect exponent overflow and raise some exception.`
Packit	5c3484	`It'd be nice to allow the full mp_exp_t range since that's`
Packit	5c3484	`how it's been in the past, but maybe dropping one bit would make it`
Packit	5c3484	`easier to test if e1+e2 goes out of bounds.`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Machine Independent Optimization`
Packit	5c3484
Packit	5c3484	`mpf_cmp: For better cache locality, don't test for low zero`
Packit	5c3484	`limbs until the high limbs fail to give an ordering. Reduce code size by`
Packit	5c3484	`turning the three mpn_cmp's into a single loop stopping when`
Packit	5c3484	`the end of one operand is reached (and then looking for a non-zero in the`
Packit	5c3484	`rest of the other).`
Packit	5c3484	`mpf_mul_2exp, mpf_div_2exp: The use of`
Packit	5c3484	`mpn_lshift for any size<=prec means repeated`
Packit	5c3484	`mul_2exp and div_2exp calls accumulate low zero`
Packit	5c3484	`limbs until size==prec+1 is reached. Those zeros will slow down`
Packit	5c3484	`subsequent operations, especially if the value is otherwise only small.`
Packit	5c3484	`If low bits of the low limb are zero, use mpn_rshift so as`
Packit	5c3484	`to not increase the size.`
Packit	5c3484	`mpn_dc_sqrtrem, mpn_sqrtrem2: Don't use`
Packit	5c3484	`mpn_add_1 and mpn_sub_1 for 1 limb operations,`
Packit	5c3484	`instead ADDC_LIMB and SUBC_LIMB.`
Packit	5c3484	`mpn_sqrtrem2: Use plain variables for sp[0] and`
Packit	5c3484	`rp[0] calculations, so the compiler needn't worry about`
Packit	5c3484	`aliasing between sp and rp.`
Packit	5c3484	`mpn_sqrtrem: Some work can be saved in the last step when`
Packit	5c3484	`the remainder is not required, as noted in Paul's paper.`
Packit	5c3484	`mpq_add, mpq_sub: The gcd fits a single limb`
Packit	5c3484	`with high probability and in this case binvert_limb could`
Packit	5c3484	`be used to calculate the inverse just once for the two exact divisions`
Packit	5c3484	`"op1.den / gcd" and "op2.den / gcd", rather than letting`
Packit	5c3484	`mpn_bdiv_q_1 do it each time. This would require calling`
Packit	5c3484	`mpn_pi1_bdiv_q_1.`
Packit	5c3484	`mpn_gcdext: Don't test count_leading_zeros for`
Packit	5c3484	`zero, instead check the high bit of the operand and avoid invoking`
Packit	5c3484	`count_leading_zeros. This is an optimization on all`
Packit	5c3484	`machines, and significant on machines with slow`
Packit	5c3484	`count_leading_zeros, though it's possible an already`
Packit	5c3484	`normalized operand might not be encountered very often.`
Packit	5c3484	`Rewrite umul_ppmm to use floating-point for generating the`
Packit	5c3484	`most significant limb (if GMP_LIMB_BITS &lt= 52 bits).`
Packit	5c3484	`(Peter Montgomery has some ideas on this subject.)`
Packit	5c3484	`Improve the default umul_ppmm code in longlong.h: Add partial`
Packit	5c3484	`products with fewer operations.`
Packit	5c3484	`Consider inlining mpz_set_ui. This would be both small and`
Packit	5c3484	`fast, especially for compile-time constants, but would make application`
Packit	5c3484	`binaries depend on having 1 limb allocated to an mpz_t,`
Packit	5c3484	`preventing the "lazy" allocation scheme below.`
Packit	5c3484	`Consider inlining mpz_[cft]div_ui and maybe`
Packit	5c3484	`mpz_[cft]div_r_ui. A __gmp_divide_by_zero`
Packit	5c3484	`would be needed for the divide by zero test, unless that could be left to`
Packit	5c3484	`mpn_mod_1 (not sure currently whether all the risc chips`
Packit	5c3484	`provoke the right exception there if using mul-by-inverse).`
Packit	5c3484	`Consider inlining: mpz_fits_s*_p. The setups for`
Packit	5c3484	`LONG_MAX etc would need to go into gmp.h, and on Cray it`
Packit	5c3484	`might, unfortunately, be necessary to forcibly include <limits.h>`
Packit	5c3484	`since there's no apparent way to get SHRT_MAX with an`
Packit	5c3484	`expression (since short and unsigned short can`
Packit	5c3484	`be different sizes).`
Packit	5c3484	`mpz_powm and mpz_powm_ui aren't very fast on one`
Packit	5c3484	`or two limb moduli, due to a lot of function call overheads. These could`
Packit	5c3484	`perhaps be handled as special cases.`
Packit	5c3484	`Make sure mpz_powm_ui is never slower than the corresponding`
Packit	5c3484	`computation using mpz_powm.`
Packit	5c3484	`mpz_powm REDC should do multiplications by g[]`
Packit	5c3484	`using the division method when they're small, since the REDC form of a`
Packit	5c3484	`small multiplier is normally a full size product. Probably would need a`
Packit	5c3484	`new tuned parameter to say what size multiplier is "small", as a function`
Packit	5c3484	`of the size of the modulus.`
Packit	5c3484	`mpn_gcd might be able to be sped up on small to moderate`
Packit	5c3484	`sizes by improving find_a, possibly just by providing an`
Packit	5c3484	`alternate implementation for CPUs with slowish`
Packit	5c3484	`count_leading_zeros.`
Packit	5c3484	`mpf_set_str produces low zero limbs when a string has a`
Packit	5c3484	`fraction but is exactly representable, eg. 0.5 in decimal. These could be`
Packit	5c3484	`stripped to save work in later operations.`
Packit	5c3484	`mpz_and, mpz_ior and mpz_xor should`
Packit	5c3484	`use mpn_and_n etc for the benefit of the small number of`
Packit	5c3484	`targets with native versions of those routines. Need to be careful not to`
Packit	5c3484	`pass size==0. Is some code sharing possible between the mpz`
Packit	5c3484	`routines?`
Packit	5c3484	`mpf_add: Don't do a copy to avoid overlapping operands`
Packit	5c3484	`unless it's really necessary (currently only sizes are tested, not`
Packit	5c3484	`whether r really is u or v).`
Packit	5c3484	`mpf_add: Under the check for v having no effect on the`
Packit	5c3484	`result, perhaps test for r==u and do nothing in that case, rather than`
Packit	5c3484	`currently it looks like an MPN_COPY_INCR will be done to`
Packit	5c3484	`reduce prec+1 limbs to prec.`
Packit	5c3484	`mpf_div_ui: Instead of padding with low zeros, call`
Packit	5c3484	`mpn_divrem_1 asking for fractional quotient limbs.`
Packit	5c3484	`mpf_div_ui: Eliminate TMP_ALLOC. When r!=u`
Packit	5c3484	`there's no overlap and the division can be called on those operands.`
Packit	5c3484	`When r==u and is prec+1 limbs, then it's an in-place division. If r==u`
Packit	5c3484	`and not prec+1 limbs, then move the available limbs up to prec+1 and do`
Packit	5c3484	`an in-place there.`
Packit	5c3484	`mpf_div_ui: Whether the high quotient limb is zero can be`
Packit	5c3484	`determined by testing the dividend for high<divisor. When non-zero, the`
Packit	5c3484	`division can be done on prec dividend limbs instead of prec+1. The result`
Packit	5c3484	`size is also known before the division, so that can be a tail call (once`
Packit	5c3484	`the TMP_ALLOC is eliminated).`
Packit	5c3484	`mpn_divrem_2 could usefully accept unnormalized divisors and`
Packit	5c3484	`shift the dividend on-the-fly, since this should cost nothing on`
Packit	5c3484	`superscalar processors and avoid the need for temporary copying in`
Packit	5c3484	`mpn_tdiv_qr.`
Packit	5c3484	`mpf_sqrt: If r!=u, and if u doesn't need to be padded with`
Packit	5c3484	`zeros, then there's no need for the tp temporary.`
Packit	5c3484	`mpq_cmp_ui could form the num1*den2 and`
Packit	5c3484	`num2*den1 products limb-by-limb from high to low and look at`
Packit	5c3484	`each step for values differing by more than the possible carry bit from`
Packit	5c3484	`the uncalculated portion.`
Packit	5c3484	`mpq_cmp could do the same high-to-low progressive multiply`
Packit	5c3484	`and compare. The benefits of karatsuba and higher multiplication`
Packit	5c3484	`algorithms are lost, but if it's assumed only a few high limbs will be`
Packit	5c3484	`needed to determine an order then that's fine.`
Packit	5c3484	`mpn_add_1, mpn_sub_1, mpn_add,`
Packit	5c3484	`mpn_sub: Internally use __GMPN_ADD_1 etc`
Packit	5c3484	`instead of the functions, so they get inlined on all compilers, not just`
Packit	5c3484	`gcc and others with inline recognised in gmp.h.`
Packit	5c3484	`__GMPN_ADD_1 etc are meant mostly to support application`
Packit	5c3484	`inline mpn_add_1 etc and if they don't come out good for`
Packit	5c3484	`internal uses then special forms can be introduced, for instance many`
Packit	5c3484	`internal uses are in-place. Sometimes a block of code is executed based`
Packit	5c3484	`on the carry-out, rather than using it arithmetically, and those places`
Packit	5c3484	`might want to do their own loops entirely.`
Packit	5c3484	`__gmp_extract_double on 64-bit systems could use just one`
Packit	5c3484	`bitfield for the mantissa extraction, not two, when endianness permits.`
Packit	5c3484	`Might depend on the compiler allowing long long bit fields`
Packit	5c3484	`when that's the only actual 64-bit type.`
Packit	5c3484	`tal-notreent.c could keep a block of memory permanently allocated.`
Packit	5c3484	`Currently the last nested TMP_FREE releases all memory, so`
Packit	5c3484	`there's an allocate and free every time a top-level function using`
Packit	5c3484	`TMP is called. Would need`
Packit	5c3484	`mp_set_memory_functions to tell tal-notreent.c to release`
Packit	5c3484	`any cached memory when changing allocation functions though.`
Packit	5c3484	`__gmp_tmp_alloc from tal-notreent.c could be partially`
Packit	5c3484	`inlined. If the current chunk has enough room then a couple of pointers`
Packit	5c3484	`can be updated. Only if more space is required then a call to some sort`
Packit	5c3484	`of __gmp_tmp_increase would be needed. The requirement that`
Packit	5c3484	`TMP_ALLOC is an expression might make the implementation a`
Packit	5c3484	`bit ugly and/or a bit sub-optimal.`
Packit	5c3484
Packit	5c3484	`#define TMP_ALLOC(n)`
Packit	5c3484	`((ROUND_UP(n) > current->end - current->point ?`
Packit	5c3484	`__gmp_tmp_increase (ROUND_UP (n)) : 0),`
Packit	5c3484	`current->point += ROUND_UP (n),`
Packit	5c3484	`current->point - ROUND_UP (n))`
Packit	5c3484
Packit	5c3484	`__mp_bases has a lot of data for bases which are pretty much`
Packit	5c3484	`never used. Perhaps the table should just go up to base 16, and have`
Packit	5c3484	`code to generate data above that, if and when required. Naturally this`
Packit	5c3484	`assumes the code would be smaller than the data saved.`
Packit	5c3484	`__mp_bases field big_base_inverted is only used`
Packit	5c3484	`if USE_PREINV_DIVREM_1 is true, and could be omitted`
Packit	5c3484	`otherwise, to save space.`
Packit	5c3484	`mpz_get_str, mtox: For power-of-2 bases, which`
Packit	5c3484	`are of course fast, it seems a little silly to make a second pass over`
Packit	5c3484	`the mpn_get_str output to convert to ASCII. Perhaps combine`
Packit	5c3484	`that with the bit extractions.`
Packit	5c3484	`mpz_gcdext: If the caller requests only the S cofactor (of`
Packit	5c3484	`A), and A<B, then the code ends up generating the cofactor T (of B) and`
Packit	5c3484	`deriving S from that. Perhaps it'd be possible to arrange to get S in`
Packit	5c3484	`the first place by calling mpn_gcdext with A+B,B. This`
Packit	5c3484	`might only be an advantage if A and B are about the same size.`
Packit	5c3484	`mpz_n_pow_ui does a good job with small bases and stripping`
Packit	5c3484	`powers of 2, but it's perhaps a bit too complicated for what it gains.`
Packit	5c3484	`The simpler mpn_pow_1 is a little faster on small exponents.`
Packit	5c3484	`(Note some of the ugliness in mpz_n_pow_ui is due to`
Packit	5c3484	`supporting mpn_mul_2.)`
Packit	5c3484
Packit	5c3484	`Perhaps the stripping of 2s in mpz_n_pow_ui should be`
Packit	5c3484	`confined to single limb operands for simplicity and since that's where`
Packit	5c3484	`the greatest gain would be.`
Packit	5c3484
Packit	5c3484	`Ideally mpn_pow_1 and mpz_n_pow_ui would be`
Packit	5c3484	`merged. The reason mpz_n_pow_ui writes to an`
Packit	5c3484	`mpz_t is that its callers leave it to make a good estimate`
Packit	5c3484	`of the result size. Callers of mpn_pow_1 already know the`
Packit	5c3484	`size by separate means (mp_bases).`
Packit	5c3484	`mpz_invert should call mpn_gcdext directly.`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Machine Dependent Optimization`
Packit	5c3484
Packit	5c3484	`invert_limb on various processors might benefit from the`
Packit	5c3484	`little Newton iteration done for alpha and ia64.`
Packit	5c3484	`Alpha 21264: mpn_addlsh1_n could be implemented with`
Packit	5c3484	`mpn_addmul_1, since that code at 3.5 is a touch faster than`
Packit	5c3484	`a separate lshift and add_n at`
Packit	5c3484	`1.75+2.125=3.875. Or very likely some specific addlsh1_n`
Packit	5c3484	`code could beat both.`
Packit	5c3484	`Alpha 21264: Improve feed-in code for mpn_mul_1,`
Packit	5c3484	`mpn_addmul_1, and mpn_submul_1.`
Packit	5c3484	`Alpha 21164: Rewrite mpn_mul_1, mpn_addmul_1,`
Packit	5c3484	`and mpn_submul_1 for the 21164. This should use both integer`
Packit	5c3484	`multiplies and floating-point multiplies. For the floating-point`
Packit	5c3484	`operations, the single-limb multiplier should be split into three 21-bit`
Packit	5c3484	`chunks, or perhaps even better in four 16-bit chunks. Probably possible`
Packit	5c3484	`to reach 9 cycles/limb.`
Packit	5c3484	`Alpha: GCC 3.4 will introduce __builtin_ctzl,`
Packit	5c3484	`__builtin_clzl and __builtin_popcountl using`
Packit	5c3484	`the corresponding CIX ct instructions, and`
Packit	5c3484	`__builtin_alpha_cmpbge. These should give GCC more`
Packit	5c3484	`information about scheduling etc than the asm blocks`
Packit	5c3484	`currently used in longlong.h and gmp-impl.h.`
Packit	5c3484	`Alpha Unicos: Apparently there's no alloca on this system,`
Packit	5c3484	`making configure choose the slower`
Packit	5c3484	`malloc-reentrant allocation method. Is there a better way?`
Packit	5c3484	`Maybe variable-length arrays per notes below.`
Packit	5c3484	`Alpha Unicos 21164, 21264: .align is not used since it pads`
Packit	5c3484	`with garbage. Does the code get the intended slotting required for the`
Packit	5c3484	`claimed speeds? .align at the start of a function would`
Packit	5c3484	`presumably be safe no matter how it pads.`
Packit	5c3484	`ARM V5: count_leading_zeros can use the clz`
Packit	5c3484	`instruction. For GCC 3.4 and up, do this via __builtin_clzl`
Packit	5c3484	`since then gcc knows it's "predicable".`
Packit	5c3484	`Itanium: GCC 3.4 introduces __builtin_popcount which can be`
Packit	5c3484	`used instead of an asm block. The builtin should give gcc`
Packit	5c3484	`more opportunities for scheduling, bundling and predication.`
Packit	5c3484	`__builtin_ctz similarly (it just uses popcount as per`
Packit	5c3484	`current longlong.h).`
Packit	5c3484	`UltraSPARC/64: Optimize mpn_mul_1, mpn_addmul_1,`
Packit	5c3484	`for s2 < 2^32 (or perhaps for any zero 16-bit s2 chunk). Not sure how`
Packit	5c3484	`much this can improve the speed, though, since the symmetry that we rely`
Packit	5c3484	`on is lost. Perhaps we can just gain cycles when s2 < 2^16, or more`
Packit	5c3484	`accurately, when two 16-bit s2 chunks which are 16 bits apart are zero.`
Packit	5c3484	`UltraSPARC/64: Write native mpn_submul_1, analogous to`
Packit	5c3484	`mpn_addmul_1.`
Packit	5c3484	`UltraSPARC/64: Write umul_ppmm. Using four`
Packit	5c3484	`"mulx"s either with an asm block or via the generic C code is`
Packit	5c3484	`about 90 cycles. Try using fp operations, and also try using karatsuba`
Packit	5c3484	`for just three "mulx"s.`
Packit	5c3484	`UltraSPARC/32: Rewrite mpn_lshift, mpn_rshift.`
Packit	5c3484	`Will give 2 cycles/limb. Trivial modifications of mpn/sparc64 should do.`
Packit	5c3484	`UltraSPARC/32: Write special mpn_Xmul_1 loops for s2 < 2^16.`
Packit	5c3484	`UltraSPARC/32: Use mulx for umul_ppmm if`
Packit	5c3484	`possible (see commented out code in longlong.h). This is unlikely to`
Packit	5c3484	`save more than a couple of cycles, so perhaps isn't worth bothering with.`
Packit	5c3484	`UltraSPARC/32: On Solaris gcc doesn't give us __sparc_v9__`
Packit	5c3484	`or anything to indicate V9 support when -mcpu=v9 is selected. See`
Packit	5c3484	`gcc/config/sol2-sld-64.h. Will need to pass something through from`
Packit	5c3484	`./configure to select the right code in longlong.h. (Currently nothing`
Packit	5c3484	`is lost because mulx for multiplying is commented out.)`
Packit	5c3484	`UltraSPARC/32: mpn_divexact_1 and`
Packit	5c3484	`mpn_modexact_1c_odd can use a 64-bit inverse and take`
Packit	5c3484	`64-bits at a time from the dividend, as per the 32-bit divisor case in`
Packit	5c3484	`mpn/sparc64/mode1o.c. This must be done in assembler, since the full`
Packit	5c3484	`64-bit registers (%gN) are not available from C.`
Packit	5c3484	`UltraSPARC/32: mpn_divexact_by3c can work 64-bits at a time`
Packit	5c3484	`using mulx, in assembler. This would be the same as for`
Packit	5c3484	`sparc64.`
Packit	5c3484	`UltraSPARC: binvert_limb might save a few cycles from`
Packit	5c3484	`masking down to just the useful bits at each point in the calculation,`
Packit	5c3484	`since mulx speed depends on the highest bit set. Either`
Packit	5c3484	`explicit masks or small types like short and`
Packit	5c3484	`int ought to work.`
Packit	5c3484	`Sparc64 HAL R1 popc: This chip reputedly implements`
Packit	5c3484	`popc properly (see gcc sparc.md). Would need to recognise`
Packit	5c3484	`it as sparchalr1 or something in configure / config.sub /`
Packit	5c3484	`config.guess. popc_limb in gmp-impl.h could use this (per`
Packit	5c3484	`commented out code). count_trailing_zeros could use it too.`
Packit	5c3484	`PA64: Improve mpn_addmul_1, mpn_submul_1, and`
Packit	5c3484	`mpn_mul_1. The current code runs at 11 cycles/limb. It`
Packit	5c3484	`should be possible to saturate the cache, which will happen at 8`
Packit	5c3484	`cycles/limb (7.5 for mpn_mul_1). Write special loops for s2 < 2^32;`
Packit	5c3484	`it should be possible to make them run at about 5 cycles/limb.`
Packit	5c3484	`PPC601: See which of the power or powerpc32 code runs better. Currently`
Packit	5c3484	`the powerpc32 is used, but only because it's the default for`
Packit	5c3484	`powerpc*.`
Packit	5c3484	`PPC630: Rewrite mpn_addmul_1, mpn_submul_1, and`
Packit	5c3484	`mpn_mul_1. Use both integer and floating-point operations,`
Packit	5c3484	`possibly two floating-point and one integer limb per loop. Split operands`
Packit	5c3484	`into four 16-bit chunks for fast fp operations. Should easily reach 9`
Packit	5c3484	`cycles/limb (using one int + one fp), but perhaps even 7 cycles/limb`
Packit	5c3484	`(using one int + two fp).`
Packit	5c3484	`PPC630: mpn_rshift could do the same sort of unrolled loop`
Packit	5c3484	`as mpn_lshift. Some judicious use of m4 might let the two`
Packit	5c3484	`share source code, or with a register to control the loop direction`
Packit	5c3484	`perhaps even share object code.`
Packit	5c3484	`Implement mpn_mul_basecase and mpn_sqr_basecase`
Packit	5c3484	`for important machines. Helping the generic sqr_basecase.c with an`
Packit	5c3484	`mpn_sqr_diagonal might be enough for some of the RISCs.`
Packit	5c3484	`POWER2/POWER2SC: Schedule mpn_lshift/mpn_rshift.`
Packit	5c3484	`Will bring time from 1.75 to 1.25 cycles/limb.`
Packit	5c3484	`X86: Optimize non-MMX mpn_lshift for shifts by 1. (See`
Packit	5c3484	`Pentium code.)`
Packit	5c3484	`X86: Good authority has it that in the past an inline rep`
Packit	5c3484	`movs would upset GCC register allocation for the whole function.`
Packit	5c3484	`Is this still true in GCC 3? It uses rep movs itself for`
Packit	5c3484	`__builtin_memcpy. Examine the code for some simple and`
Packit	5c3484	`complex functions to find out. Inlining rep movs would be`
Packit	5c3484	`desirable, it'd be both smaller and faster.`
Packit	5c3484	`Pentium P54: mpn_lshift and mpn_rshift can come`
Packit	5c3484	`down from 6.0 c/l to 5.5 or 5.375 by paying attention to pairing after`
Packit	5c3484	`shrdl and shldl, see mpn/x86/pentium/README.`
Packit	5c3484	`Pentium P55 MMX: mpn_lshift and mpn_rshift`
Packit	5c3484	`might benefit from some destination prefetching.`
Packit	5c3484	`PentiumPro: mpn_divrem_1 might be able to use a`
Packit	5c3484	`mul-by-inverse, hoping for maybe 30 c/l.`
Packit	5c3484	`K7: mpn_lshift and mpn_rshift might be able to`
Packit	5c3484	`do something branch-free for unaligned startups, and shaving one insn`
Packit	5c3484	`from the loop with alternative indexing might save a cycle.`
Packit	5c3484	`PPC32: Try using fewer registers in the current mpn_lshift.`
Packit	5c3484	`The pipeline is now extremely deep, perhaps unnecessarily deep.`
Packit	5c3484	`Fujitsu VPP: Vectorize main functions, perhaps in assembly language.`
Packit	5c3484	`Fujitsu VPP: Write mpn_mul_basecase and`
Packit	5c3484	`mpn_sqr_basecase. This should use a "vertical multiplication`
Packit	5c3484	`method", to avoid carry propagation. splitting one of the operands in`
Packit	5c3484	`11-bit chunks.`
Packit	5c3484	`Pentium: mpn_lshift by 31 should use the special rshift`
Packit	5c3484	`by 1 code, and vice versa mpn_rshift by 31 should use the`
Packit	5c3484	`special lshift by 1. This would be best as a jump across to the other`
Packit	5c3484	`routine, could let both live in lshift.asm and omit rshift.asm on finding`
Packit	5c3484	`mpn_rshift already provided.`
Packit	5c3484	`Cray T3E: Experiment with optimization options. In particular,`
Packit	5c3484	`-hpipeline3 seems promising. We should at least up -O to -O2 or -O3.`
Packit	5c3484	`Cray: mpn_com and mpn_and_n etc very probably`
Packit	5c3484	`wants a pragma like MPN_COPY_INCR.`
Packit	5c3484	`Cray vector systems: mpn_lshift, mpn_rshift,`
Packit	5c3484	`mpn_popcount and mpn_hamdist are nice and small`
Packit	5c3484	`and could be inlined to avoid function calls.`
Packit	5c3484	`Cray: Variable length arrays seem to be faster than the tal-notreent.c`
Packit	5c3484	`scheme. Not sure why, maybe they merely give the compiler more`
Packit	5c3484	`information about aliasing (or the lack thereof). Would like to modify`
Packit	5c3484	`TMP_ALLOC to use them, or introduce a new scheme. Memory`
Packit	5c3484	`blocks wanted unconditionally are easy enough, those wanted only`
Packit	5c3484	`sometimes are a problem. Perhaps a special size calculation to ask for a`
Packit	5c3484	`dummy length 1 when unwanted, or perhaps an inlined subroutine`
Packit	5c3484	`duplicating code under each conditional. Don't really want to turn`
Packit	5c3484	`everything into a dog's dinner just because Cray don't offer an`
Packit	5c3484	`alloca.`
Packit	5c3484	`Cray: mpn_get_str on power-of-2 bases ought to vectorize.`
Packit	5c3484	`Does it? bits_per_digit and the inner loop over bits in a`
Packit	5c3484	`limb might prevent it. Perhaps special cases for binary, octal and hex`
Packit	5c3484	`would be worthwhile (very possibly for all processors too).`
Packit	5c3484	`S390: BSWAP_LIMB_FETCH looks like it could be done with`
Packit	5c3484	`lrvg, as per glibc sysdeps/s390/s390-64/bits/byteswap.h.`
Packit	5c3484	`This is only for 64-bit mode or something is it, since 32-bit mode has`
Packit	5c3484	`other code? Also, is it worth using for BSWAP_LIMB too, or`
Packit	5c3484	`would that mean a store and re-fetch? Presumably that's what comes out`
Packit	5c3484	`in glibc.`
Packit	5c3484	`Improve count_leading_zeros for 64-bit machines:`
Packit	5c3484
Packit	5c3484	`if ((x &gt&gt 32) == 0) { x &lt&lt= 32; cnt += 32; }`
Packit	5c3484	`if ((x &gt&gt 48) == 0) { x &lt&lt= 16; cnt += 16; }`
Packit	5c3484	`...`
Packit	5c3484	`IRIX 6 MIPSpro compiler has an __inline which could perhaps`
Packit	5c3484	`be used in __GMP_EXTERN_INLINE. What would be the right way`
Packit	5c3484	`to identify suitable versions of that compiler?`
Packit	5c3484	`IRIX cc is rumoured to have an _int_mult_upper`
Packit	5c3484	`(in <intrinsics.h> like Cray), but it didn't seem to`
Packit	5c3484	`exist on some IRIX 6.5 systems tried. If it does actually exist`
Packit	5c3484	`somewhere it would very likely be an improvement over a function call to`
Packit	5c3484	`umul.asm.`
Packit	5c3484	`mpn_get_str final divisions by the base with`
Packit	5c3484	`udiv_qrnd_unnorm could use some sort of multiply-by-inverse`
Packit	5c3484	`on suitable machines. This ends up happening for decimal by presenting`
Packit	5c3484	`the compiler with a run-time constant, but the same for other bases would`
Packit	5c3484	`be good. Perhaps use could be made of the fact base<256.`
Packit	5c3484	`mpn_umul_ppmm, mpn_udiv_qrnnd: Return a`
Packit	5c3484	`structure like div_t to avoid going through memory, in`
Packit	5c3484	`particular helping RISCs that don't do store-to-load forwarding. Clearly`
Packit	5c3484	`this is only possible if the ABI returns a structure of two`
Packit	5c3484	`mp_limb_ts in registers.`
Packit	5c3484
Packit	5c3484	`On PowerPC, structures are returned in memory on AIX and Darwin. In SVR4`
Packit	5c3484	`they're returned in registers, except that draft SVR4 had said memory, so`
Packit	5c3484	`it'd be prudent to check which is done. We can jam the compiler into the`
Packit	5c3484	`right mode if we know how, since all this is purely internal to libgmp.`
Packit	5c3484	`(gcc has an option, though of course gcc doesn't matter since we use`
Packit	5c3484	`inline asm there.)`
Packit	5c3484
Packit	5c3484
Packit	5c3484	`New Functionality`
Packit	5c3484
Packit	5c3484	`Maybe add mpz_crr (Chinese Remainder Reconstruction).`
Packit	5c3484	Let `0b' and `0B' mean binary input everywhere.
Packit	5c3484	`mpz_init and mpq_init could do lazy allocation.`
Packit	5c3484	`Set ALLOC(var) to 0 to indicate nothing allocated, and let`
Packit	5c3484	`_mpz_realloc do the initial alloc. Set`
Packit	5c3484	`z->_mp_d to a dummy that mpz_get_ui and`
Packit	5c3484	`similar can unconditionally fetch from. Niels Möller has had a go at`
Packit	5c3484	`this.`
Packit	5c3484
Packit	5c3484	`The advantages of the lazy scheme would be:`
Packit	5c3484
Packit	5c3484	`Initial allocate would be the size required for the first value`
Packit	5c3484	`stored, rather than getting 1 limb in mpz_init and then`
Packit	5c3484	`more or less immediately reallocating.`
Packit	5c3484	`mpz_init would only store magic values in the`
Packit	5c3484	`mpz_t fields, and could be inlined.`
Packit	5c3484	`A fixed initializer could even be used by applications, like`
Packit	5c3484	`mpz_t z = MPZ_INITIALIZER;, which might be convenient`
Packit	5c3484	`for globals.`
Packit	5c3484
Packit	5c3484	`The advantages of the current scheme are:`
Packit	5c3484
Packit	5c3484	`mpz_set_ui and other similar routines needn't check the`
Packit	5c3484	`size allocated and can just store unconditionally.`
Packit	5c3484	`mpz_set_ui and perhaps others like`
Packit	5c3484	`mpz_tdiv_r_ui and a prospective`
Packit	5c3484	`mpz_set_ull could be inlined.`
Packit	5c3484
Packit	5c3484	`Add mpf_out_raw and mpf_inp_raw. Make sure`
Packit	5c3484	`format is portable between 32-bit and 64-bit machines, and between`
Packit	5c3484	`little-endian and big-endian machines. A format which MPFR can use too`
Packit	5c3484	`would be good.`
Packit	5c3484	`mpn_and_n ... mpn_copyd: Perhaps make the mpn`
Packit	5c3484	`logops and copys available in gmp.h, either as library functions or`
Packit	5c3484	`inlines, with the availability of library functions instantiated in the`
Packit	5c3484	`generated gmp.h at build time.`
Packit	5c3484	`mpz_set_str etc variants taking string lengths rather than`
Packit	5c3484	`null-terminators.`
Packit	5c3484	`mpz_andn, mpz_iorn, mpz_nand,`
Packit	5c3484	`mpz_nior, mpz_xnor might be useful additions,`
Packit	5c3484	`if they could share code with the current such functions (which should be`
Packit	5c3484	`possible).`
Packit	5c3484	`mpz_and_ui etc might be of use sometimes. Suggested by`
Packit	5c3484	`Niels Möller.`
Packit	5c3484	`mpf_set_str and mpf_inp_str could usefully`
Packit	5c3484	`accept 0x, 0b etc when base==0. Perhaps the exponent could default to`
Packit	5c3484	`decimal in this case, with a further 0x, 0b etc allowed there.`
Packit	5c3484	`Eg. 0xFFAA@0x5A. A leading "0" for octal would match the integers, but`
Packit	5c3484	`probably something like "0.123" ought not mean octal.`
Packit	5c3484	`GMP_LONG_LONG_LIMB or some such could become a documented`
Packit	5c3484	`feature of gmp.h, so applications could know whether to`
Packit	5c3484	`printf a limb using %lu or %Lu.`
Packit	5c3484	`GMP_PRIdMP_LIMB and similar defines following C99`
Packit	5c3484	`<inttypes.h> might be of use to applications printing limbs. But`
Packit	5c3484	`if GMP_LONG_LONG_LIMB or whatever is added then perhaps this`
Packit	5c3484	`can easily enough be left to applications.`
Packit	5c3484	`gmp_printf could accept %b for binary output.`
Packit	5c3484	`It'd be nice if it worked for plain int etc too, not just`
Packit	5c3484	`mpz_t etc.`
Packit	5c3484	`gmp_printf in fact could usefully accept an arbitrary base,`
Packit	5c3484	`for both integer and float conversions. A base either in the format`
Packit	5c3484	`string or as a parameter with * should be allowed. Maybe`
Packit	5c3484	`&13b (b for base) or something like that.`
Packit	5c3484	`gmp_printf could perhaps accept mpq_t for float`
Packit	5c3484	`conversions, eg. "%.4Qf". This would be merely for`
Packit	5c3484	`convenience, but still might be useful. Rounding would be the same as`
Packit	5c3484	`for an mpf_t (ie. currently round-to-nearest, but not`
Packit	5c3484	`actually documented). Alternately, perhaps a separate`
Packit	5c3484	`mpq_get_str_point or some such might be more use. Suggested`
Packit	5c3484	`by Pedro Gimeno.`
Packit	5c3484	`mpz_rscan0 or mpz_revscan0 or some such`
Packit	5c3484	`searching towards the low end of an integer might match`
Packit	5c3484	`mpz_scan0 nicely. Likewise for scan1.`
Packit	5c3484	`Suggested by Roberto Bagnara.`
Packit	5c3484	`mpz_bit_subset or some such to test whether one integer is a`
Packit	5c3484	`bitwise subset of another might be of use. Some sort of return value`
Packit	5c3484	`indicating whether it's a proper or non-proper subset would be good and`
Packit	5c3484	`wouldn't cost anything in the implementation. Suggested by Roberto`
Packit	5c3484	`Bagnara.`
Packit	5c3484	`mpf_get_ld, mpf_set_ld: Conversions between`
Packit	5c3484	`mpf_t and long double, suggested by Dan`
Packit	5c3484	`Christensen. Other long double routines might be desirable`
Packit	5c3484	`too, but mpf would be a start.`
Packit	5c3484
Packit	5c3484	`long double is an ANSI-ism, so everything involving it would`
Packit	5c3484	`need to be suppressed on a K&R compiler.`
Packit	5c3484
Packit	5c3484	`There'd be some work to be done by configure to recognise`
Packit	5c3484	`the format in use, MPFR has a start on this. Often long`
Packit	5c3484	`double is the same as double, which is easy but`
Packit	5c3484	`pretty pointless. A single float format detector macro could look at`
Packit	5c3484	`double then long double`
Packit	5c3484
Packit	5c3484	`Sometimes there's a compiler option for the size of a long`
Packit	5c3484	`double, eg. xlc on AIX can use either 64-bit or 128-bit. It's`
Packit	5c3484	`probably simplest to regard this as a compiler compatibility issue, and`
Packit	5c3484	`leave it to users or sysadmins to ensure application and library code is`
Packit	5c3484	`built the same.`
Packit	5c3484	`mpz_sqrt_if_perfect_square: When`
Packit	5c3484	`mpz_perfect_square_p does its tests it calculates a square`
Packit	5c3484	`root and then discards it. For some applications it might be useful to`
Packit	5c3484	`return that root. Suggested by Jason Moxham.`
Packit	5c3484	`mpz_get_ull, mpz_set_ull,`
Packit	5c3484	`mpz_get_sll, mpz_get_sll: Conversions for`
Packit	5c3484	`long long. These would aid interoperability, though a`
Packit	5c3484	`mixture of GMP and long long would probably not be too`
Packit	5c3484	`common. Since long long is not always available (it's in`
Packit	5c3484	`C99 and GCC though), disadvantages of using long long in`
Packit	5c3484	`libgmp.a would be`
Packit	5c3484
Packit	5c3484	`Library contents vary according to the build compiler.`
Packit	5c3484	`gmp.h would need an ugly #ifdef block to decide if the`
Packit	5c3484	`application compiler could take the long long`
Packit	5c3484	`prototypes.`
Packit	5c3484	`Some sort of LIBGMP_HAS_LONGLONG might be wanted to`
Packit	5c3484	`indicate whether the functions are available. (Applications using`
Packit	5c3484	`autoconf could probe the library too.)`
Packit	5c3484
Packit	5c3484	`It'd be possible to defer the need for long long to`
Packit	5c3484	`application compile time, by having something like`
Packit	5c3484	`mpz_set_2ui called with two halves of a long`
Packit	5c3484	`long. Disadvantages of this would be,`
Packit	5c3484
Packit	5c3484	`Bigger code in the application, though perhaps not if a long`
Packit	5c3484	`long is normally passed as two halves anyway.`
Packit	5c3484	`mpz_get_ull would be a rather big inline, or would have`
Packit	5c3484	`to be two function calls.`
Packit	5c3484	`mpz_get_sll would be a worse inline, and would put the`
Packit	5c3484	`treatment of -0x10..00 into applications (see`
Packit	5c3484	`mpz_get_si correctness above).`
Packit	5c3484	`Although having libgmp.a independent of the build compiler is nice,`
Packit	5c3484	`it sort of sacrifices the capabilities of a good compiler to`
Packit	5c3484	`uniformity with inferior ones.`
Packit	5c3484
Packit	5c3484	`Plain use of long long is probably the lesser evil, if only`
Packit	5c3484	`because it makes best use of gcc. In fact perhaps it would suffice to`
Packit	5c3484	`guarantee long long conversions only when using GCC for both`
Packit	5c3484	`application and library. That would cover free software, and we can`
Packit	5c3484	`worry about selected vendor compilers later.`
Packit	5c3484
Packit	5c3484	`In C++ the situation is probably clearer, we demand fairly recent C++ so`
Packit	5c3484	`long long should be available always. We'd probably prefer`
Packit	5c3484	`to have the C and C++ the same in respect of long long`
Packit	5c3484	`support, but it would be possible to have it unconditionally in gmpxx.h,`
Packit	5c3484	`by some means or another.`
Packit	5c3484	`mpz_strtoz parsing the same as strtol.`
Packit	5c3484	`Suggested by Alexander Kruppa.`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Configuration`
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Alpha ev7, ev79: Add code to config.guess to detect these. Believe ev7`
Packit	5c3484	`will be "3-1307" in the current switch, but need to verify that. (On`
Packit	5c3484	`OSF, current configfsf.guess identifies ev7 using psrinfo, we need to do`
Packit	5c3484	`it ourselves for other systems.)`
Packit	5c3484	`Alpha OSF: Libtool (version 1.5) doesn't seem to recognise this system is`
Packit	5c3484	`"pic always" and ends up running gcc twice with the same options. This`
Packit	5c3484	`is wasteful, but harmless. Perhaps a newer libtool will be better.`
Packit	5c3484	`ARM: umul_ppmm in longlong.h always uses umull,`
Packit	5c3484	`but is that available only for M series chips or some such? Perhaps it`
Packit	5c3484	`should be configured in some way.`
Packit	5c3484	`HPPA: config.guess should recognize 7000, 7100, 7200, and 8x00.`
Packit	5c3484	`HPPA: gcc 3.2 introduces a -mschedule=7200 etc parameter,`
Packit	5c3484	`which could be driven by an exact hppa cpu type.`
Packit	5c3484	`Mips: config.guess should say mipsr3000, mipsr4000, mipsr10000, etc.`
Packit	5c3484	`"hinv -c processor" gives lots of information on Irix. Standard`
Packit	5c3484	`config.guess appends "el" to indicate endianness, but`
Packit	5c3484	`AC_C_BIGENDIAN seems the best way to handle that for GMP.`
Packit	5c3484	`PowerPC: The function descriptor nonsense for AIX is currently driven by`
Packit	5c3484	`--aix*. It might be more reliable to do some sort of`
Packit	5c3484	`feature test, examining the compiler output perhaps. It might also be`
Packit	5c3484	`nice to merge the aix.m4 files into powerpc-defs.m4.`
Packit	5c3484	`config.m4 is generated only by the configure script, it won't be`
Packit	5c3484	`regenerated by config.status. Creating it as an AC_OUTPUT`
Packit	5c3484	`would work, but it might upset "make" to have things like L$`
Packit	5c3484	`get into the Makefiles through AC_SUBST.`
Packit	5c3484	`AC_CONFIG_COMMANDS would be the alternative. With some`
Packit	5c3484	`careful m4 quoting the changequote calls might not be`
Packit	5c3484	`needed, which might free up the order in which things had to be output.`
Packit	5c3484	`Automake: Latest automake has a CCAS, CCASFLAGS`
Packit	5c3484	`scheme. Though we probably wouldn't be using its assembler support we`
Packit	5c3484	`could try to use those variables in compatible ways.`
Packit	5c3484	`GMP_LDFLAGS could probably be done with plain`
Packit	5c3484	`LDFLAGS already used by automake for all linking. But with`
Packit	5c3484	`a bit of luck the next libtool will pass pretty much all`
Packit	5c3484	`CFLAGS through to the compiler when linking, making`
Packit	5c3484	`GMP_LDFLAGS unnecessary.`
Packit	5c3484	`mpn/Makeasm.am uses -c and -o together in the`
Packit	5c3484	`.S and .asm rules, but apparently that isn't completely portable (there's`
Packit	5c3484	`an autoconf AC_PROG_CC_C_O test for it). So far we've not`
Packit	5c3484	`had problems, but perhaps the rules could be rewritten to use "foo.s" as`
Packit	5c3484	`the temporary, or to do a suitable "mv" of the result. The only danger`
Packit	5c3484	`from using foo.s would be if a compile failed and the temporary foo.s`
Packit	5c3484	`then looked like the primary source. Hopefully if the`
Packit	5c3484	`SUFFIXES are ordered to have .S and .asm ahead of .s that`
Packit	5c3484	`wouldn't happen. Might need to check.`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Random Numbers`
Packit	5c3484
Packit	5c3484	`_gmp_rand is not particularly fast on the linear`
Packit	5c3484	`congruential algorithm and could stand various improvements.`
Packit	5c3484
Packit	5c3484	`Make a second seed area within gmp_randstate_t (or`
Packit	5c3484	`_mp_algdata rather) to save some copying.`
Packit	5c3484	`Make a special case for a single limb 2exp modulus, to`
Packit	5c3484	`avoid mpn_mul calls. Perhaps the same for two limbs.`
Packit	5c3484	`Inline the lc code, to avoid a function call and`
Packit	5c3484	`TMP_ALLOC for every chunk.`
Packit	5c3484	`Perhaps the 2exp and general LC cases should be split,`
Packit	5c3484	`for clarity (if the general case is retained).`
Packit	5c3484
Packit	5c3484	`gmp_randstate_t used for parameters perhaps should become`
Packit	5c3484	`gmp_randstate_ptr the same as other types.`
Packit	5c3484	`Some of the empirical randomness tests could be included in a "make`
Packit	5c3484	`check". They ought to work everywhere, for a given seed at least.`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`C++`
Packit	5c3484
Packit	5c3484	`mpz_class(string), etc: Use the C++ global locale to`
Packit	5c3484	`identify whitespace.`
Packit	5c3484
Packit	5c3484	`mpf_class(string): Use the C++ global locale decimal point,`
Packit	5c3484	`rather than the C one.`
Packit	5c3484
Packit	5c3484	`Consider making these variant mpz_set_str etc forms`
Packit	5c3484	`available for mpz_t too, not just mpz_class`
Packit	5c3484	`etc.`
Packit	5c3484	`mpq_class operator+=: Don't emit an unnecessary`
Packit	5c3484	`mpq_set(q,q) before mpz_addmul etc.`
Packit	5c3484	`Put various bits of gmpxx.h into libgmpxx, to avoid excessive inlining.`
Packit	5c3484	`Candidates for this would be,`
Packit	5c3484
Packit	5c3484	`mpz_class(const char *), etc: since they're normally`
Packit	5c3484	`not fast anyway, and we can hide the exception throw.`
Packit	5c3484	`mpz_class(string), etc: to hide the cstr`
Packit	5c3484	`needed to get to the C conversion function.`
Packit	5c3484	`mpz_class string, char* etc constructors: likewise to`
Packit	5c3484	`hide the throws and conversions.`
Packit	5c3484	`mpz_class::get_str, etc: to hide the char*`
Packit	5c3484	`to string conversion and free. Perhaps`
Packit	5c3484	`mpz_get_str can write directly into a`
Packit	5c3484	`string, to avoid copying.`
Packit	5c3484
Packit	5c3484	`Consider making such string returning variants`
Packit	5c3484	`available for use with plain mpz_t etc too.`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Miscellaneous`
Packit	5c3484
Packit	5c3484	`mpz_gcdext and mpn_gcdext ought to document`
Packit	5c3484	`what range of values the generated cofactors can take, and preferably`
Packit	5c3484	`ensure the definition uniquely specifies the cofactors for given inputs.`
Packit	5c3484	`A basic extended Euclidean algorithm or multi-step variant leads to`
Packit	5c3484	`\|x\|<\|b\| and \|y\|<\|a\| or something like that, but there's probably`
Packit	5c3484	`two solutions under just those restrictions.`
Packit	5c3484	`demos/factorize.c: use mpz_divisible_ui_p rather than`
Packit	5c3484	`mpz_tdiv_qr_ui. (Of course dividing multiple primes at a`
Packit	5c3484	`time would be better still.)`
Packit	5c3484	`The various test programs use quite a bit of the main`
Packit	5c3484	`libgmp. This establishes good cross-checks, but it might be`
Packit	5c3484	`better to use simple reference routines where possible. Where it's not`
Packit	5c3484	`possible some attention could be paid to the order of the tests, so a`
Packit	5c3484	`libgmp routine is only used for tests once it seems to be`
Packit	5c3484	`good.`
Packit	5c3484	`MUL_FFT_THRESHOLD etc: the FFT thresholds should allow a`
Packit	5c3484	`return to a previous k at certain sizes. This arises basically due to`
Packit	5c3484	`the step effect caused by size multiples effectively used for each k.`
Packit	5c3484	`Looking at a graph makes it fairly clear.`
Packit	5c3484	`__gmp_doprnt_mpf does a rather unattractive round-to-nearest`
Packit	5c3484	`on the string returned by mpf_get_str. Perhaps some variant`
Packit	5c3484	`of mpf_get_str could be made which would better suit.`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Aids to Development`
Packit	5c3484
Packit	5c3484	`Add ASSERTs at the start of each user-visible mpz/mpq/mpf`
Packit	5c3484	`function to check the validity of each mp?_t parameter, in`
Packit	5c3484	`particular to check they've been mp?_inited. This might`
Packit	5c3484	`catch elementary mistakes in user programs. Care would need to be taken`
Packit	5c3484	`over MPZ_TMP_INITed variables used internally. If nothing`
Packit	5c3484	`else then consistency checks like size<=alloc, ptr not`
Packit	5c3484	`NULL and ptr+size not wrapping around the address space,`
Packit	5c3484	`would be possible. A more sophisticated scheme could track`
Packit	5c3484	`_mp_d pointers and ensure only a valid one is used. Such a`
Packit	5c3484	`scheme probably wouldn't be reentrant, not without some help from the`
Packit	5c3484	`system.`
Packit	5c3484	`tune/time.c could try to determine at runtime whether`
Packit	5c3484	`getrusage and gettimeofday are reliable.`
Packit	5c3484	`Currently we pretend in configure that the dodgy m68k netbsd 1.4.1`
Packit	5c3484	`getrusage doesn't exist. If a test might take a long time`
Packit	5c3484	`to run then perhaps cache the result in a file somewhere.`
Packit	5c3484	`tune/time.c could choose the default precision based on the`
Packit	5c3484	`speed_unittime determined, independent of the method in use.`
Packit	5c3484	`Cray vector systems: CPU frequency could be determined from`
Packit	5c3484	`sysconf(_SC_CLK_TCK), since it seems to be clock cycle`
Packit	5c3484	`based. Is this true for all Cray systems? Would like some documentation`
Packit	5c3484	`or something to confirm.`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Documentation`
Packit	5c3484
Packit	5c3484	`mpz_inp_str (etc) doesn't say when it stops reading digits.`
Packit	5c3484	`mpn_get_str isn't terribly clear about how many digits it`
Packit	5c3484	`produces. It'd probably be possible to say at most one leading zero,`
Packit	5c3484	`which is what both it and mpz_get_str currently do. But`
Packit	5c3484	`want to be careful not to bind ourselves to something that might not suit`
Packit	5c3484	`another implementation.`
Packit	5c3484	`va_arg doesn't do the right thing with mpz_t`
Packit	5c3484	`etc directly, but instead needs a pointer type like MP_INT*.`
Packit	5c3484	`It'd be good to show how to do this, but we'd either need to document`
Packit	5c3484	`mpz_ptr and friends, or perhaps fallback on something`
Packit	5c3484	`slightly nasty with void*.`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Bright Ideas`
Packit	5c3484
Packit	5c3484	`The following may or may not be feasible, and aren't likely to get done in the`
Packit	5c3484	`near future, but are at least worth thinking about.`
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Reorganize longlong.h so that we can inline the operations even for the`
Packit	5c3484	`system compiler. When there is no such compiler feature, make calls to`
Packit	5c3484	`stub functions. Write such stub functions for as many machines as`
Packit	5c3484	`possible.`
Packit	5c3484	`longlong.h could declare when it's using, or would like to use,`
Packit	5c3484	`mpn_umul_ppmm, and the corresponding umul.asm file could be`
Packit	5c3484	`included in libgmp only in that case, the same as is effectively done for`
Packit	5c3484	`__clz_tab. Likewise udiv.asm and perhaps cntlz.asm. This`
Packit	5c3484	`would only be a very small space saving, so perhaps not worth the`
Packit	5c3484	`complexity.`
Packit	5c3484	`longlong.h could be built at configure time by concatenating or`
Packit	5c3484	`#including fragments from each directory in the mpn path. This would`
Packit	5c3484	`select CPU specific macros the same way as CPU specific assembler code.`
Packit	5c3484	`Code used would no longer depend on cpp predefines, and the current`
Packit	5c3484	`nested conditionals could be flattened out.`
Packit	5c3484	`mpz_get_si returns 0x80000000 for -0x100000000, whereas it's`
Packit	5c3484	`sort of supposed to return the low 31 (or 63) bits. But this is`
Packit	5c3484	`undocumented, and perhaps not too important.`
Packit	5c3484	`mpz_init_set* and mpz_realloc could allocate`
Packit	5c3484	`say an extra 16 limbs over what's needed, so as to reduce the chance of`
Packit	5c3484	`having to do a reallocate if the mpz_t grows a bit more.`
Packit	5c3484	`This could only be an option, since it'd badly bloat memory usage in`
Packit	5c3484	`applications using many small values.`
Packit	5c3484	`mpq functions could perhaps check for numerator or`
Packit	5c3484	`denominator equal to 1, on the assumption that integers or`
Packit	5c3484	`denominator-only values might be expected to occur reasonably often.`
Packit	5c3484	`count_trailing_zeros is used on more or less uniformly`
Packit	5c3484	`distributed numbers in a couple of places. For some CPUs`
Packit	5c3484	`count_trailing_zeros is slow and it's probably worth handling`
Packit	5c3484	`the frequently occurring 0 to 2 trailing zeros cases specially.`
Packit	5c3484	`mpf_t might like to let the exponent be undefined when`
Packit	5c3484	`size==0, instead of requiring it 0 as now. It should be possible to do`
Packit	5c3484	`size==0 tests before paying attention to the exponent. The advantage is`
Packit	5c3484	`not needing to set exp in the various places a zero result can arise,`
Packit	5c3484	`which avoids some tedium but is otherwise perhaps not too important.`
Packit	5c3484	`Currently mpz_set_f and mpf_cmp_ui depend on`
Packit	5c3484	`exp==0, maybe elsewhere too.`
Packit	5c3484	`__gmp_allocate_func: Could use GCC __attribute__`
Packit	5c3484	`((malloc)) on this, though don't know if it'd do much. GCC 3.0`
Packit	5c3484	`allows that attribute on functions, but not function pointers (see info`
Packit	5c3484	`node "Attribute Syntax"), so would need a new autoconf test. This can`
Packit	5c3484	`wait until there's a GCC that supports it.`
Packit	5c3484	`mpz_add_ui contains two __GMPN_COPYs, one from`
Packit	5c3484	`mpn_add_1 and one from mpn_sub_1. If those two`
Packit	5c3484	`routines were opened up a bit maybe that code could be shared. When a`
Packit	5c3484	`copy needs to be done there's no carry to append for the add, and if the`
Packit	5c3484	`copy is non-empty no high zero for the sub.`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Old and Obsolete Stuff`
Packit	5c3484
Packit	5c3484	`The following tasks apply to chips or systems that are old and/or obsolete.`
Packit	5c3484	`It's unlikely anything will be done about them unless anyone is actively using`
Packit	5c3484	`them.`
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Sparc32: The integer based udiv_nfp.asm used to be selected by`
Packit	5c3484	`configure --nfp but that option is gone now that autoconf is`
Packit	5c3484	`used. The file could go somewhere suitable in the mpn search if any`
Packit	5c3484	`chips might benefit from it, though it's possible we don't currently`
Packit	5c3484	`differentiate enough exact cpu types to do this properly.`
Packit	5c3484	`VAX D and G format double floats are straightforward and`
Packit	5c3484	`could perhaps be handled directly in __gmp_extract_double`
Packit	5c3484	`and maybe in mpn_get_d, rather than falling back on the`
Packit	5c3484	`generic code. (Both formats are detected by configure.)`
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484
Packit	5c3484	`</body>`
Packit	5c3484	`</html>`
Packit	5c3484
Packit	5c3484
Packit	5c3484	`Local variables:`
Packit	5c3484	`eval: (add-hook 'write-file-hooks 'time-stamp)`
Packit	5c3484	`time-stamp-start: "This file current as of "`
Packit	5c3484	`time-stamp-format: "%:d %3b %:y"`
Packit	5c3484	`time-stamp-end: "\\."`
Packit	5c3484	`time-stamp-line-limit: 50`
Packit	5c3484	`End:`
Packit	5c3484	`-->`

source-git / gmp

Source Code

Blame doc/tasks.html

Correctness and Completeness

Machine Independent Optimization

Machine Dependent Optimization

New Functionality

Configuration

Random Numbers

C++

Miscellaneous

Aids to Development

Documentation

Bright Ideas

Old and Obsolete Stuff