|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
<html>
|
|
Packit |
5c3484 |
<head>
|
|
Packit |
5c3484 |
<title>GMP Itemized Development Tasks</title>
|
|
Packit |
5c3484 |
<link rel="shortcut icon" href="favicon.ico">
|
|
Packit |
5c3484 |
<link rel="stylesheet" href="gmp.css">
|
|
Packit |
5c3484 |
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
|
Packit |
5c3484 |
</head>
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
<center>
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
GMP Itemized Development Tasks
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
</center>
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
<font size=-1>
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Copyright 2000-2004, 2006, 2008, 2009 Free Software Foundation, Inc.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
This file is part of the GNU MP Library.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
The GNU MP Library is free software; you can redistribute it and/or modify
|
|
Packit |
5c3484 |
it under the terms of either:
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
* the GNU Lesser General Public License as published by the Free
|
|
Packit |
5c3484 |
Software Foundation; either version 3 of the License, or (at your
|
|
Packit |
5c3484 |
option) any later version.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
or
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
* the GNU General Public License as published by the Free Software
|
|
Packit |
5c3484 |
Foundation; either version 2 of the License, or (at your option) any
|
|
Packit |
5c3484 |
later version.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
or both in parallel, as here.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
The GNU MP Library is distributed in the hope that it will be useful, but
|
|
Packit |
5c3484 |
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
|
|
Packit |
5c3484 |
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
|
|
Packit |
5c3484 |
for more details.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
You should have received copies of the GNU General Public License and the
|
|
Packit |
5c3484 |
GNU Lesser General Public License along with the GNU MP Library. If not,
|
|
Packit |
5c3484 |
see https://www.gnu.org/licenses/.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
</font>
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
This file current as of 29 Jan 2014. An up-to-date version is available at
|
|
Packit |
5c3484 |
https://gmplib.org/tasks.html.
|
|
Packit |
5c3484 |
Please send comments about this page to gmp-devel<font>@</font>gmplib.org.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
These are itemized GMP development tasks. Not all the tasks
|
|
Packit |
5c3484 |
listed here are suitable for volunteers, but many of them are.
|
|
Packit |
5c3484 |
Please see the projects file for more
|
|
Packit |
5c3484 |
sizeable projects.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
CAUTION: This file needs updating. Many of the tasks here have
|
|
Packit |
5c3484 |
either already been taken care of, or have become irrelevant.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Correctness and Completeness
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
_LONG_LONG_LIMB in gmp.h is not namespace clean. Reported
|
|
Packit |
5c3484 |
by Patrick Pelissier.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
We sort of mentioned _LONG_LONG_LIMB in past releases, so
|
|
Packit |
5c3484 |
need to be careful about changing it. It used to be a define
|
|
Packit |
5c3484 |
applications had to set for long long limb systems, but that in
|
|
Packit |
5c3484 |
particular is no longer relevant now that it's established automatically.
|
|
Packit |
5c3484 |
The various reuse.c tests need to force reallocation by calling
|
|
Packit |
5c3484 |
_mpz_realloc with a small (1 limb) size.
|
|
Packit |
5c3484 |
One reuse case is missing from mpX/tests/reuse.c:
|
|
Packit |
5c3484 |
mpz_XXX(a,a,a) .
|
|
Packit |
5c3484 |
Make the string reading functions allow the `0x' prefix when the base is
|
|
Packit |
5c3484 |
explicitly 16. They currently only allow that prefix when the base is
|
|
Packit |
5c3484 |
unspecified (zero).
|
|
Packit |
5c3484 |
mpf_eq is not always correct, when one operand is
|
|
Packit |
5c3484 |
1000000000... and the other operand is 0111111111..., i.e., extremely
|
|
Packit |
5c3484 |
close. There is a special case in mpf_sub for this
|
|
Packit |
5c3484 |
situation; put similar code in mpf_eq . [In progress.]
|
|
Packit |
5c3484 |
mpf_eq doesn't implement what gmp.texi specifies. It should
|
|
Packit |
5c3484 |
not use just whole limbs, but partial limbs. [In progress.]
|
|
Packit |
5c3484 |
mpf_set_str doesn't validate it's exponent, for instance
|
|
Packit |
5c3484 |
garbage 123.456eX789X is accepted (and an exponent 0 used), and overflow
|
|
Packit |
5c3484 |
of a long is not detected.
|
|
Packit |
5c3484 |
mpf_add doesn't check for a carry from truncated portions of
|
|
Packit |
5c3484 |
the inputs, and in that respect doesn't implement the "infinite precision
|
|
Packit |
5c3484 |
followed by truncate" specified in the manual.
|
|
Packit |
5c3484 |
Windows DLLs: tests/mpz/reuse.c and tests/mpf/reuse.c initialize global
|
|
Packit |
5c3484 |
variables with pointers to mpz_add etc, which doesn't work
|
|
Packit |
5c3484 |
when those routines are coming from a DLL (because they're effectively
|
|
Packit |
5c3484 |
function pointer global variables themselves). Need to rearrange perhaps
|
|
Packit |
5c3484 |
to a set of calls to a test function rather than iterating over an array.
|
|
Packit |
5c3484 |
mpz_pow_ui : Detect when the result would be more memory than
|
|
Packit |
5c3484 |
a size_t can represent and raise some suitable exception,
|
|
Packit |
5c3484 |
probably an alloc call asking for SIZE_T_MAX , and if that
|
|
Packit |
5c3484 |
somehow succeeds then an abort . Various size overflows of
|
|
Packit |
5c3484 |
this kind are not handled gracefully, probably resulting in segvs.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
In mpz_n_pow_ui , detect when the count of low zero bits
|
|
Packit |
5c3484 |
exceeds an unsigned long . There's a (small) chance of this
|
|
Packit |
5c3484 |
happening but still having enough memory to represent the value.
|
|
Packit |
5c3484 |
Reported by Winfried Dreckmann in for instance mpz_ui_pow_ui (x,
|
|
Packit |
5c3484 |
4UL, 1431655766UL).
|
|
Packit |
5c3484 |
mpf : Detect exponent overflow and raise some exception.
|
|
Packit |
5c3484 |
It'd be nice to allow the full mp_exp_t range since that's
|
|
Packit |
5c3484 |
how it's been in the past, but maybe dropping one bit would make it
|
|
Packit |
5c3484 |
easier to test if e1+e2 goes out of bounds.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Machine Independent Optimization
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
mpf_cmp : For better cache locality, don't test for low zero
|
|
Packit |
5c3484 |
limbs until the high limbs fail to give an ordering. Reduce code size by
|
|
Packit |
5c3484 |
turning the three mpn_cmp 's into a single loop stopping when
|
|
Packit |
5c3484 |
the end of one operand is reached (and then looking for a non-zero in the
|
|
Packit |
5c3484 |
rest of the other).
|
|
Packit |
5c3484 |
mpf_mul_2exp , mpf_div_2exp : The use of
|
|
Packit |
5c3484 |
mpn_lshift for any size<=prec means repeated
|
|
Packit |
5c3484 |
mul_2exp and div_2exp calls accumulate low zero
|
|
Packit |
5c3484 |
limbs until size==prec+1 is reached. Those zeros will slow down
|
|
Packit |
5c3484 |
subsequent operations, especially if the value is otherwise only small.
|
|
Packit |
5c3484 |
If low bits of the low limb are zero, use mpn_rshift so as
|
|
Packit |
5c3484 |
to not increase the size.
|
|
Packit |
5c3484 |
mpn_dc_sqrtrem , mpn_sqrtrem2 : Don't use
|
|
Packit |
5c3484 |
mpn_add_1 and mpn_sub_1 for 1 limb operations,
|
|
Packit |
5c3484 |
instead ADDC_LIMB and SUBC_LIMB .
|
|
Packit |
5c3484 |
mpn_sqrtrem2 : Use plain variables for sp[0] and
|
|
Packit |
5c3484 |
rp[0] calculations, so the compiler needn't worry about
|
|
Packit |
5c3484 |
aliasing between sp and rp .
|
|
Packit |
5c3484 |
mpn_sqrtrem : Some work can be saved in the last step when
|
|
Packit |
5c3484 |
the remainder is not required, as noted in Paul's paper.
|
|
Packit |
5c3484 |
mpq_add , mpq_sub : The gcd fits a single limb
|
|
Packit |
5c3484 |
with high probability and in this case binvert_limb could
|
|
Packit |
5c3484 |
be used to calculate the inverse just once for the two exact divisions
|
|
Packit |
5c3484 |
"op1.den / gcd" and "op2.den / gcd", rather than letting
|
|
Packit |
5c3484 |
mpn_bdiv_q_1 do it each time. This would require calling
|
|
Packit |
5c3484 |
mpn_pi1_bdiv_q_1 .
|
|
Packit |
5c3484 |
mpn_gcdext : Don't test count_leading_zeros for
|
|
Packit |
5c3484 |
zero, instead check the high bit of the operand and avoid invoking
|
|
Packit |
5c3484 |
count_leading_zeros . This is an optimization on all
|
|
Packit |
5c3484 |
machines, and significant on machines with slow
|
|
Packit |
5c3484 |
count_leading_zeros , though it's possible an already
|
|
Packit |
5c3484 |
normalized operand might not be encountered very often.
|
|
Packit |
5c3484 |
Rewrite umul_ppmm to use floating-point for generating the
|
|
Packit |
5c3484 |
most significant limb (if GMP_LIMB_BITS <= 52 bits).
|
|
Packit |
5c3484 |
(Peter Montgomery has some ideas on this subject.)
|
|
Packit |
5c3484 |
Improve the default umul_ppmm code in longlong.h: Add partial
|
|
Packit |
5c3484 |
products with fewer operations.
|
|
Packit |
5c3484 |
Consider inlining mpz_set_ui . This would be both small and
|
|
Packit |
5c3484 |
fast, especially for compile-time constants, but would make application
|
|
Packit |
5c3484 |
binaries depend on having 1 limb allocated to an mpz_t ,
|
|
Packit |
5c3484 |
preventing the "lazy" allocation scheme below.
|
|
Packit |
5c3484 |
Consider inlining mpz_[cft]div_ui and maybe
|
|
Packit |
5c3484 |
mpz_[cft]div_r_ui . A __gmp_divide_by_zero
|
|
Packit |
5c3484 |
would be needed for the divide by zero test, unless that could be left to
|
|
Packit |
5c3484 |
mpn_mod_1 (not sure currently whether all the risc chips
|
|
Packit |
5c3484 |
provoke the right exception there if using mul-by-inverse).
|
|
Packit |
5c3484 |
Consider inlining: mpz_fits_s*_p . The setups for
|
|
Packit |
5c3484 |
LONG_MAX etc would need to go into gmp.h, and on Cray it
|
|
Packit |
5c3484 |
might, unfortunately, be necessary to forcibly include <limits.h>
|
|
Packit |
5c3484 |
since there's no apparent way to get SHRT_MAX with an
|
|
Packit |
5c3484 |
expression (since short and unsigned short can
|
|
Packit |
5c3484 |
be different sizes).
|
|
Packit |
5c3484 |
mpz_powm and mpz_powm_ui aren't very fast on one
|
|
Packit |
5c3484 |
or two limb moduli, due to a lot of function call overheads. These could
|
|
Packit |
5c3484 |
perhaps be handled as special cases.
|
|
Packit |
5c3484 |
Make sure mpz_powm_ui is never slower than the corresponding
|
|
Packit |
5c3484 |
computation using mpz_powm .
|
|
Packit |
5c3484 |
mpz_powm REDC should do multiplications by g[]
|
|
Packit |
5c3484 |
using the division method when they're small, since the REDC form of a
|
|
Packit |
5c3484 |
small multiplier is normally a full size product. Probably would need a
|
|
Packit |
5c3484 |
new tuned parameter to say what size multiplier is "small", as a function
|
|
Packit |
5c3484 |
of the size of the modulus.
|
|
Packit |
5c3484 |
mpn_gcd might be able to be sped up on small to moderate
|
|
Packit |
5c3484 |
sizes by improving find_a , possibly just by providing an
|
|
Packit |
5c3484 |
alternate implementation for CPUs with slowish
|
|
Packit |
5c3484 |
count_leading_zeros .
|
|
Packit |
5c3484 |
mpf_set_str produces low zero limbs when a string has a
|
|
Packit |
5c3484 |
fraction but is exactly representable, eg. 0.5 in decimal. These could be
|
|
Packit |
5c3484 |
stripped to save work in later operations.
|
|
Packit |
5c3484 |
mpz_and , mpz_ior and mpz_xor should
|
|
Packit |
5c3484 |
use mpn_and_n etc for the benefit of the small number of
|
|
Packit |
5c3484 |
targets with native versions of those routines. Need to be careful not to
|
|
Packit |
5c3484 |
pass size==0. Is some code sharing possible between the mpz
|
|
Packit |
5c3484 |
routines?
|
|
Packit |
5c3484 |
mpf_add : Don't do a copy to avoid overlapping operands
|
|
Packit |
5c3484 |
unless it's really necessary (currently only sizes are tested, not
|
|
Packit |
5c3484 |
whether r really is u or v).
|
|
Packit |
5c3484 |
mpf_add : Under the check for v having no effect on the
|
|
Packit |
5c3484 |
result, perhaps test for r==u and do nothing in that case, rather than
|
|
Packit |
5c3484 |
currently it looks like an MPN_COPY_INCR will be done to
|
|
Packit |
5c3484 |
reduce prec+1 limbs to prec.
|
|
Packit |
5c3484 |
mpf_div_ui : Instead of padding with low zeros, call
|
|
Packit |
5c3484 |
mpn_divrem_1 asking for fractional quotient limbs.
|
|
Packit |
5c3484 |
mpf_div_ui : Eliminate TMP_ALLOC . When r!=u
|
|
Packit |
5c3484 |
there's no overlap and the division can be called on those operands.
|
|
Packit |
5c3484 |
When r==u and is prec+1 limbs, then it's an in-place division. If r==u
|
|
Packit |
5c3484 |
and not prec+1 limbs, then move the available limbs up to prec+1 and do
|
|
Packit |
5c3484 |
an in-place there.
|
|
Packit |
5c3484 |
mpf_div_ui : Whether the high quotient limb is zero can be
|
|
Packit |
5c3484 |
determined by testing the dividend for high<divisor. When non-zero, the
|
|
Packit |
5c3484 |
division can be done on prec dividend limbs instead of prec+1. The result
|
|
Packit |
5c3484 |
size is also known before the division, so that can be a tail call (once
|
|
Packit |
5c3484 |
the TMP_ALLOC is eliminated).
|
|
Packit |
5c3484 |
mpn_divrem_2 could usefully accept unnormalized divisors and
|
|
Packit |
5c3484 |
shift the dividend on-the-fly, since this should cost nothing on
|
|
Packit |
5c3484 |
superscalar processors and avoid the need for temporary copying in
|
|
Packit |
5c3484 |
mpn_tdiv_qr .
|
|
Packit |
5c3484 |
mpf_sqrt : If r!=u, and if u doesn't need to be padded with
|
|
Packit |
5c3484 |
zeros, then there's no need for the tp temporary.
|
|
Packit |
5c3484 |
mpq_cmp_ui could form the num1*den2 and
|
|
Packit |
5c3484 |
num2*den1 products limb-by-limb from high to low and look at
|
|
Packit |
5c3484 |
each step for values differing by more than the possible carry bit from
|
|
Packit |
5c3484 |
the uncalculated portion.
|
|
Packit |
5c3484 |
mpq_cmp could do the same high-to-low progressive multiply
|
|
Packit |
5c3484 |
and compare. The benefits of karatsuba and higher multiplication
|
|
Packit |
5c3484 |
algorithms are lost, but if it's assumed only a few high limbs will be
|
|
Packit |
5c3484 |
needed to determine an order then that's fine.
|
|
Packit |
5c3484 |
mpn_add_1 , mpn_sub_1 , mpn_add ,
|
|
Packit |
5c3484 |
mpn_sub : Internally use __GMPN_ADD_1 etc
|
|
Packit |
5c3484 |
instead of the functions, so they get inlined on all compilers, not just
|
|
Packit |
5c3484 |
gcc and others with inline recognised in gmp.h.
|
|
Packit |
5c3484 |
__GMPN_ADD_1 etc are meant mostly to support application
|
|
Packit |
5c3484 |
inline mpn_add_1 etc and if they don't come out good for
|
|
Packit |
5c3484 |
internal uses then special forms can be introduced, for instance many
|
|
Packit |
5c3484 |
internal uses are in-place. Sometimes a block of code is executed based
|
|
Packit |
5c3484 |
on the carry-out, rather than using it arithmetically, and those places
|
|
Packit |
5c3484 |
might want to do their own loops entirely.
|
|
Packit |
5c3484 |
__gmp_extract_double on 64-bit systems could use just one
|
|
Packit |
5c3484 |
bitfield for the mantissa extraction, not two, when endianness permits.
|
|
Packit |
5c3484 |
Might depend on the compiler allowing long long bit fields
|
|
Packit |
5c3484 |
when that's the only actual 64-bit type.
|
|
Packit |
5c3484 |
tal-notreent.c could keep a block of memory permanently allocated.
|
|
Packit |
5c3484 |
Currently the last nested TMP_FREE releases all memory, so
|
|
Packit |
5c3484 |
there's an allocate and free every time a top-level function using
|
|
Packit |
5c3484 |
TMP is called. Would need
|
|
Packit |
5c3484 |
mp_set_memory_functions to tell tal-notreent.c to release
|
|
Packit |
5c3484 |
any cached memory when changing allocation functions though.
|
|
Packit |
5c3484 |
__gmp_tmp_alloc from tal-notreent.c could be partially
|
|
Packit |
5c3484 |
inlined. If the current chunk has enough room then a couple of pointers
|
|
Packit |
5c3484 |
can be updated. Only if more space is required then a call to some sort
|
|
Packit |
5c3484 |
of __gmp_tmp_increase would be needed. The requirement that
|
|
Packit |
5c3484 |
TMP_ALLOC is an expression might make the implementation a
|
|
Packit |
5c3484 |
bit ugly and/or a bit sub-optimal.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
#define TMP_ALLOC(n)
|
|
Packit |
5c3484 |
((ROUND_UP(n) > current->end - current->point ?
|
|
Packit |
5c3484 |
__gmp_tmp_increase (ROUND_UP (n)) : 0),
|
|
Packit |
5c3484 |
current->point += ROUND_UP (n),
|
|
Packit |
5c3484 |
current->point - ROUND_UP (n))
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
__mp_bases has a lot of data for bases which are pretty much
|
|
Packit |
5c3484 |
never used. Perhaps the table should just go up to base 16, and have
|
|
Packit |
5c3484 |
code to generate data above that, if and when required. Naturally this
|
|
Packit |
5c3484 |
assumes the code would be smaller than the data saved.
|
|
Packit |
5c3484 |
__mp_bases field big_base_inverted is only used
|
|
Packit |
5c3484 |
if USE_PREINV_DIVREM_1 is true, and could be omitted
|
|
Packit |
5c3484 |
otherwise, to save space.
|
|
Packit |
5c3484 |
mpz_get_str , mtox : For power-of-2 bases, which
|
|
Packit |
5c3484 |
are of course fast, it seems a little silly to make a second pass over
|
|
Packit |
5c3484 |
the mpn_get_str output to convert to ASCII. Perhaps combine
|
|
Packit |
5c3484 |
that with the bit extractions.
|
|
Packit |
5c3484 |
mpz_gcdext : If the caller requests only the S cofactor (of
|
|
Packit |
5c3484 |
A), and A<B, then the code ends up generating the cofactor T (of B) and
|
|
Packit |
5c3484 |
deriving S from that. Perhaps it'd be possible to arrange to get S in
|
|
Packit |
5c3484 |
the first place by calling mpn_gcdext with A+B,B. This
|
|
Packit |
5c3484 |
might only be an advantage if A and B are about the same size.
|
|
Packit |
5c3484 |
mpz_n_pow_ui does a good job with small bases and stripping
|
|
Packit |
5c3484 |
powers of 2, but it's perhaps a bit too complicated for what it gains.
|
|
Packit |
5c3484 |
The simpler mpn_pow_1 is a little faster on small exponents.
|
|
Packit |
5c3484 |
(Note some of the ugliness in mpz_n_pow_ui is due to
|
|
Packit |
5c3484 |
supporting mpn_mul_2 .)
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Perhaps the stripping of 2s in mpz_n_pow_ui should be
|
|
Packit |
5c3484 |
confined to single limb operands for simplicity and since that's where
|
|
Packit |
5c3484 |
the greatest gain would be.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Ideally mpn_pow_1 and mpz_n_pow_ui would be
|
|
Packit |
5c3484 |
merged. The reason mpz_n_pow_ui writes to an
|
|
Packit |
5c3484 |
mpz_t is that its callers leave it to make a good estimate
|
|
Packit |
5c3484 |
of the result size. Callers of mpn_pow_1 already know the
|
|
Packit |
5c3484 |
size by separate means (mp_bases ).
|
|
Packit |
5c3484 |
mpz_invert should call mpn_gcdext directly.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Machine Dependent Optimization
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
invert_limb on various processors might benefit from the
|
|
Packit |
5c3484 |
little Newton iteration done for alpha and ia64.
|
|
Packit |
5c3484 |
Alpha 21264: mpn_addlsh1_n could be implemented with
|
|
Packit |
5c3484 |
mpn_addmul_1 , since that code at 3.5 is a touch faster than
|
|
Packit |
5c3484 |
a separate lshift and add_n at
|
|
Packit |
5c3484 |
1.75+2.125=3.875. Or very likely some specific addlsh1_n
|
|
Packit |
5c3484 |
code could beat both.
|
|
Packit |
5c3484 |
Alpha 21264: Improve feed-in code for mpn_mul_1 ,
|
|
Packit |
5c3484 |
mpn_addmul_1 , and mpn_submul_1 .
|
|
Packit |
5c3484 |
Alpha 21164: Rewrite mpn_mul_1 , mpn_addmul_1 ,
|
|
Packit |
5c3484 |
and mpn_submul_1 for the 21164. This should use both integer
|
|
Packit |
5c3484 |
multiplies and floating-point multiplies. For the floating-point
|
|
Packit |
5c3484 |
operations, the single-limb multiplier should be split into three 21-bit
|
|
Packit |
5c3484 |
chunks, or perhaps even better in four 16-bit chunks. Probably possible
|
|
Packit |
5c3484 |
to reach 9 cycles/limb.
|
|
Packit |
5c3484 |
Alpha: GCC 3.4 will introduce __builtin_ctzl ,
|
|
Packit |
5c3484 |
__builtin_clzl and __builtin_popcountl using
|
|
Packit |
5c3484 |
the corresponding CIX ct instructions, and
|
|
Packit |
5c3484 |
__builtin_alpha_cmpbge . These should give GCC more
|
|
Packit |
5c3484 |
information about scheduling etc than the asm blocks
|
|
Packit |
5c3484 |
currently used in longlong.h and gmp-impl.h.
|
|
Packit |
5c3484 |
Alpha Unicos: Apparently there's no alloca on this system,
|
|
Packit |
5c3484 |
making configure choose the slower
|
|
Packit |
5c3484 |
malloc-reentrant allocation method. Is there a better way?
|
|
Packit |
5c3484 |
Maybe variable-length arrays per notes below.
|
|
Packit |
5c3484 |
Alpha Unicos 21164, 21264: .align is not used since it pads
|
|
Packit |
5c3484 |
with garbage. Does the code get the intended slotting required for the
|
|
Packit |
5c3484 |
claimed speeds? .align at the start of a function would
|
|
Packit |
5c3484 |
presumably be safe no matter how it pads.
|
|
Packit |
5c3484 |
ARM V5: count_leading_zeros can use the clz
|
|
Packit |
5c3484 |
instruction. For GCC 3.4 and up, do this via __builtin_clzl
|
|
Packit |
5c3484 |
since then gcc knows it's "predicable".
|
|
Packit |
5c3484 |
Itanium: GCC 3.4 introduces __builtin_popcount which can be
|
|
Packit |
5c3484 |
used instead of an asm block. The builtin should give gcc
|
|
Packit |
5c3484 |
more opportunities for scheduling, bundling and predication.
|
|
Packit |
5c3484 |
__builtin_ctz similarly (it just uses popcount as per
|
|
Packit |
5c3484 |
current longlong.h).
|
|
Packit |
5c3484 |
UltraSPARC/64: Optimize mpn_mul_1 , mpn_addmul_1 ,
|
|
Packit |
5c3484 |
for s2 < 2^32 (or perhaps for any zero 16-bit s2 chunk). Not sure how
|
|
Packit |
5c3484 |
much this can improve the speed, though, since the symmetry that we rely
|
|
Packit |
5c3484 |
on is lost. Perhaps we can just gain cycles when s2 < 2^16, or more
|
|
Packit |
5c3484 |
accurately, when two 16-bit s2 chunks which are 16 bits apart are zero.
|
|
Packit |
5c3484 |
UltraSPARC/64: Write native mpn_submul_1 , analogous to
|
|
Packit |
5c3484 |
mpn_addmul_1 .
|
|
Packit |
5c3484 |
UltraSPARC/64: Write umul_ppmm . Using four
|
|
Packit |
5c3484 |
"mulx "s either with an asm block or via the generic C code is
|
|
Packit |
5c3484 |
about 90 cycles. Try using fp operations, and also try using karatsuba
|
|
Packit |
5c3484 |
for just three "mulx "s.
|
|
Packit |
5c3484 |
UltraSPARC/32: Rewrite mpn_lshift , mpn_rshift .
|
|
Packit |
5c3484 |
Will give 2 cycles/limb. Trivial modifications of mpn/sparc64 should do.
|
|
Packit |
5c3484 |
UltraSPARC/32: Write special mpn_Xmul_1 loops for s2 < 2^16.
|
|
Packit |
5c3484 |
UltraSPARC/32: Use mulx for umul_ppmm if
|
|
Packit |
5c3484 |
possible (see commented out code in longlong.h). This is unlikely to
|
|
Packit |
5c3484 |
save more than a couple of cycles, so perhaps isn't worth bothering with.
|
|
Packit |
5c3484 |
UltraSPARC/32: On Solaris gcc doesn't give us __sparc_v9__
|
|
Packit |
5c3484 |
or anything to indicate V9 support when -mcpu=v9 is selected. See
|
|
Packit |
5c3484 |
gcc/config/sol2-sld-64.h. Will need to pass something through from
|
|
Packit |
5c3484 |
./configure to select the right code in longlong.h. (Currently nothing
|
|
Packit |
5c3484 |
is lost because mulx for multiplying is commented out.)
|
|
Packit |
5c3484 |
UltraSPARC/32: mpn_divexact_1 and
|
|
Packit |
5c3484 |
mpn_modexact_1c_odd can use a 64-bit inverse and take
|
|
Packit |
5c3484 |
64-bits at a time from the dividend, as per the 32-bit divisor case in
|
|
Packit |
5c3484 |
mpn/sparc64/mode1o.c. This must be done in assembler, since the full
|
|
Packit |
5c3484 |
64-bit registers (%gN ) are not available from C.
|
|
Packit |
5c3484 |
UltraSPARC/32: mpn_divexact_by3c can work 64-bits at a time
|
|
Packit |
5c3484 |
using mulx , in assembler. This would be the same as for
|
|
Packit |
5c3484 |
sparc64.
|
|
Packit |
5c3484 |
UltraSPARC: binvert_limb might save a few cycles from
|
|
Packit |
5c3484 |
masking down to just the useful bits at each point in the calculation,
|
|
Packit |
5c3484 |
since mulx speed depends on the highest bit set. Either
|
|
Packit |
5c3484 |
explicit masks or small types like short and
|
|
Packit |
5c3484 |
int ought to work.
|
|
Packit |
5c3484 |
Sparc64 HAL R1 popc : This chip reputedly implements
|
|
Packit |
5c3484 |
popc properly (see gcc sparc.md). Would need to recognise
|
|
Packit |
5c3484 |
it as sparchalr1 or something in configure / config.sub /
|
|
Packit |
5c3484 |
config.guess. popc_limb in gmp-impl.h could use this (per
|
|
Packit |
5c3484 |
commented out code). count_trailing_zeros could use it too.
|
|
Packit |
5c3484 |
PA64: Improve mpn_addmul_1 , mpn_submul_1 , and
|
|
Packit |
5c3484 |
mpn_mul_1 . The current code runs at 11 cycles/limb. It
|
|
Packit |
5c3484 |
should be possible to saturate the cache, which will happen at 8
|
|
Packit |
5c3484 |
cycles/limb (7.5 for mpn_mul_1). Write special loops for s2 < 2^32;
|
|
Packit |
5c3484 |
it should be possible to make them run at about 5 cycles/limb.
|
|
Packit |
5c3484 |
PPC601: See which of the power or powerpc32 code runs better. Currently
|
|
Packit |
5c3484 |
the powerpc32 is used, but only because it's the default for
|
|
Packit |
5c3484 |
powerpc* .
|
|
Packit |
5c3484 |
PPC630: Rewrite mpn_addmul_1 , mpn_submul_1 , and
|
|
Packit |
5c3484 |
mpn_mul_1 . Use both integer and floating-point operations,
|
|
Packit |
5c3484 |
possibly two floating-point and one integer limb per loop. Split operands
|
|
Packit |
5c3484 |
into four 16-bit chunks for fast fp operations. Should easily reach 9
|
|
Packit |
5c3484 |
cycles/limb (using one int + one fp), but perhaps even 7 cycles/limb
|
|
Packit |
5c3484 |
(using one int + two fp).
|
|
Packit |
5c3484 |
PPC630: mpn_rshift could do the same sort of unrolled loop
|
|
Packit |
5c3484 |
as mpn_lshift . Some judicious use of m4 might let the two
|
|
Packit |
5c3484 |
share source code, or with a register to control the loop direction
|
|
Packit |
5c3484 |
perhaps even share object code.
|
|
Packit |
5c3484 |
Implement mpn_mul_basecase and mpn_sqr_basecase
|
|
Packit |
5c3484 |
for important machines. Helping the generic sqr_basecase.c with an
|
|
Packit |
5c3484 |
mpn_sqr_diagonal might be enough for some of the RISCs.
|
|
Packit |
5c3484 |
POWER2/POWER2SC: Schedule mpn_lshift /mpn_rshift .
|
|
Packit |
5c3484 |
Will bring time from 1.75 to 1.25 cycles/limb.
|
|
Packit |
5c3484 |
X86: Optimize non-MMX mpn_lshift for shifts by 1. (See
|
|
Packit |
5c3484 |
Pentium code.)
|
|
Packit |
5c3484 |
X86: Good authority has it that in the past an inline rep
|
|
Packit |
5c3484 |
movs would upset GCC register allocation for the whole function.
|
|
Packit |
5c3484 |
Is this still true in GCC 3? It uses rep movs itself for
|
|
Packit |
5c3484 |
__builtin_memcpy . Examine the code for some simple and
|
|
Packit |
5c3484 |
complex functions to find out. Inlining rep movs would be
|
|
Packit |
5c3484 |
desirable, it'd be both smaller and faster.
|
|
Packit |
5c3484 |
Pentium P54: mpn_lshift and mpn_rshift can come
|
|
Packit |
5c3484 |
down from 6.0 c/l to 5.5 or 5.375 by paying attention to pairing after
|
|
Packit |
5c3484 |
shrdl and shldl , see mpn/x86/pentium/README.
|
|
Packit |
5c3484 |
Pentium P55 MMX: mpn_lshift and mpn_rshift
|
|
Packit |
5c3484 |
might benefit from some destination prefetching.
|
|
Packit |
5c3484 |
PentiumPro: mpn_divrem_1 might be able to use a
|
|
Packit |
5c3484 |
mul-by-inverse, hoping for maybe 30 c/l.
|
|
Packit |
5c3484 |
K7: mpn_lshift and mpn_rshift might be able to
|
|
Packit |
5c3484 |
do something branch-free for unaligned startups, and shaving one insn
|
|
Packit |
5c3484 |
from the loop with alternative indexing might save a cycle.
|
|
Packit |
5c3484 |
PPC32: Try using fewer registers in the current mpn_lshift .
|
|
Packit |
5c3484 |
The pipeline is now extremely deep, perhaps unnecessarily deep.
|
|
Packit |
5c3484 |
Fujitsu VPP: Vectorize main functions, perhaps in assembly language.
|
|
Packit |
5c3484 |
Fujitsu VPP: Write mpn_mul_basecase and
|
|
Packit |
5c3484 |
mpn_sqr_basecase . This should use a "vertical multiplication
|
|
Packit |
5c3484 |
method", to avoid carry propagation. splitting one of the operands in
|
|
Packit |
5c3484 |
11-bit chunks.
|
|
Packit |
5c3484 |
Pentium: mpn_lshift by 31 should use the special rshift
|
|
Packit |
5c3484 |
by 1 code, and vice versa mpn_rshift by 31 should use the
|
|
Packit |
5c3484 |
special lshift by 1. This would be best as a jump across to the other
|
|
Packit |
5c3484 |
routine, could let both live in lshift.asm and omit rshift.asm on finding
|
|
Packit |
5c3484 |
mpn_rshift already provided.
|
|
Packit |
5c3484 |
Cray T3E: Experiment with optimization options. In particular,
|
|
Packit |
5c3484 |
-hpipeline3 seems promising. We should at least up -O to -O2 or -O3.
|
|
Packit |
5c3484 |
Cray: mpn_com and mpn_and_n etc very probably
|
|
Packit |
5c3484 |
wants a pragma like MPN_COPY_INCR .
|
|
Packit |
5c3484 |
Cray vector systems: mpn_lshift , mpn_rshift ,
|
|
Packit |
5c3484 |
mpn_popcount and mpn_hamdist are nice and small
|
|
Packit |
5c3484 |
and could be inlined to avoid function calls.
|
|
Packit |
5c3484 |
Cray: Variable length arrays seem to be faster than the tal-notreent.c
|
|
Packit |
5c3484 |
scheme. Not sure why, maybe they merely give the compiler more
|
|
Packit |
5c3484 |
information about aliasing (or the lack thereof). Would like to modify
|
|
Packit |
5c3484 |
TMP_ALLOC to use them, or introduce a new scheme. Memory
|
|
Packit |
5c3484 |
blocks wanted unconditionally are easy enough, those wanted only
|
|
Packit |
5c3484 |
sometimes are a problem. Perhaps a special size calculation to ask for a
|
|
Packit |
5c3484 |
dummy length 1 when unwanted, or perhaps an inlined subroutine
|
|
Packit |
5c3484 |
duplicating code under each conditional. Don't really want to turn
|
|
Packit |
5c3484 |
everything into a dog's dinner just because Cray don't offer an
|
|
Packit |
5c3484 |
alloca .
|
|
Packit |
5c3484 |
Cray: mpn_get_str on power-of-2 bases ought to vectorize.
|
|
Packit |
5c3484 |
Does it? bits_per_digit and the inner loop over bits in a
|
|
Packit |
5c3484 |
limb might prevent it. Perhaps special cases for binary, octal and hex
|
|
Packit |
5c3484 |
would be worthwhile (very possibly for all processors too).
|
|
Packit |
5c3484 |
S390: BSWAP_LIMB_FETCH looks like it could be done with
|
|
Packit |
5c3484 |
lrvg , as per glibc sysdeps/s390/s390-64/bits/byteswap.h.
|
|
Packit |
5c3484 |
This is only for 64-bit mode or something is it, since 32-bit mode has
|
|
Packit |
5c3484 |
other code? Also, is it worth using for BSWAP_LIMB too, or
|
|
Packit |
5c3484 |
would that mean a store and re-fetch? Presumably that's what comes out
|
|
Packit |
5c3484 |
in glibc.
|
|
Packit |
5c3484 |
Improve count_leading_zeros for 64-bit machines:
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
if ((x >> 32) == 0) { x <<= 32; cnt += 32; }
|
|
Packit |
5c3484 |
if ((x >> 48) == 0) { x <<= 16; cnt += 16; }
|
|
Packit |
5c3484 |
...
|
|
Packit |
5c3484 |
IRIX 6 MIPSpro compiler has an __inline which could perhaps
|
|
Packit |
5c3484 |
be used in __GMP_EXTERN_INLINE . What would be the right way
|
|
Packit |
5c3484 |
to identify suitable versions of that compiler?
|
|
Packit |
5c3484 |
IRIX cc is rumoured to have an _int_mult_upper
|
|
Packit |
5c3484 |
(in <intrinsics.h> like Cray), but it didn't seem to
|
|
Packit |
5c3484 |
exist on some IRIX 6.5 systems tried. If it does actually exist
|
|
Packit |
5c3484 |
somewhere it would very likely be an improvement over a function call to
|
|
Packit |
5c3484 |
umul.asm.
|
|
Packit |
5c3484 |
mpn_get_str final divisions by the base with
|
|
Packit |
5c3484 |
udiv_qrnd_unnorm could use some sort of multiply-by-inverse
|
|
Packit |
5c3484 |
on suitable machines. This ends up happening for decimal by presenting
|
|
Packit |
5c3484 |
the compiler with a run-time constant, but the same for other bases would
|
|
Packit |
5c3484 |
be good. Perhaps use could be made of the fact base<256.
|
|
Packit |
5c3484 |
mpn_umul_ppmm , mpn_udiv_qrnnd : Return a
|
|
Packit |
5c3484 |
structure like div_t to avoid going through memory, in
|
|
Packit |
5c3484 |
particular helping RISCs that don't do store-to-load forwarding. Clearly
|
|
Packit |
5c3484 |
this is only possible if the ABI returns a structure of two
|
|
Packit |
5c3484 |
mp_limb_t s in registers.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
On PowerPC, structures are returned in memory on AIX and Darwin. In SVR4
|
|
Packit |
5c3484 |
they're returned in registers, except that draft SVR4 had said memory, so
|
|
Packit |
5c3484 |
it'd be prudent to check which is done. We can jam the compiler into the
|
|
Packit |
5c3484 |
right mode if we know how, since all this is purely internal to libgmp.
|
|
Packit |
5c3484 |
(gcc has an option, though of course gcc doesn't matter since we use
|
|
Packit |
5c3484 |
inline asm there.)
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
New Functionality
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Maybe add mpz_crr (Chinese Remainder Reconstruction).
|
|
Packit |
5c3484 |
Let `0b' and `0B' mean binary input everywhere.
|
|
Packit |
5c3484 |
mpz_init and mpq_init could do lazy allocation.
|
|
Packit |
5c3484 |
Set ALLOC(var) to 0 to indicate nothing allocated, and let
|
|
Packit |
5c3484 |
_mpz_realloc do the initial alloc. Set
|
|
Packit |
5c3484 |
z->_mp_d to a dummy that mpz_get_ui and
|
|
Packit |
5c3484 |
similar can unconditionally fetch from. Niels Möller has had a go at
|
|
Packit |
5c3484 |
this.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
The advantages of the lazy scheme would be:
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Initial allocate would be the size required for the first value
|
|
Packit |
5c3484 |
stored, rather than getting 1 limb in mpz_init and then
|
|
Packit |
5c3484 |
more or less immediately reallocating.
|
|
Packit |
5c3484 |
mpz_init would only store magic values in the
|
|
Packit |
5c3484 |
mpz_t fields, and could be inlined.
|
|
Packit |
5c3484 |
A fixed initializer could even be used by applications, like
|
|
Packit |
5c3484 |
mpz_t z = MPZ_INITIALIZER; , which might be convenient
|
|
Packit |
5c3484 |
for globals.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
The advantages of the current scheme are:
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
mpz_set_ui and other similar routines needn't check the
|
|
Packit |
5c3484 |
size allocated and can just store unconditionally.
|
|
Packit |
5c3484 |
mpz_set_ui and perhaps others like
|
|
Packit |
5c3484 |
mpz_tdiv_r_ui and a prospective
|
|
Packit |
5c3484 |
mpz_set_ull could be inlined.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Add mpf_out_raw and mpf_inp_raw . Make sure
|
|
Packit |
5c3484 |
format is portable between 32-bit and 64-bit machines, and between
|
|
Packit |
5c3484 |
little-endian and big-endian machines. A format which MPFR can use too
|
|
Packit |
5c3484 |
would be good.
|
|
Packit |
5c3484 |
mpn_and_n ... mpn_copyd : Perhaps make the mpn
|
|
Packit |
5c3484 |
logops and copys available in gmp.h, either as library functions or
|
|
Packit |
5c3484 |
inlines, with the availability of library functions instantiated in the
|
|
Packit |
5c3484 |
generated gmp.h at build time.
|
|
Packit |
5c3484 |
mpz_set_str etc variants taking string lengths rather than
|
|
Packit |
5c3484 |
null-terminators.
|
|
Packit |
5c3484 |
mpz_andn , mpz_iorn , mpz_nand ,
|
|
Packit |
5c3484 |
mpz_nior , mpz_xnor might be useful additions,
|
|
Packit |
5c3484 |
if they could share code with the current such functions (which should be
|
|
Packit |
5c3484 |
possible).
|
|
Packit |
5c3484 |
mpz_and_ui etc might be of use sometimes. Suggested by
|
|
Packit |
5c3484 |
Niels Möller.
|
|
Packit |
5c3484 |
mpf_set_str and mpf_inp_str could usefully
|
|
Packit |
5c3484 |
accept 0x, 0b etc when base==0. Perhaps the exponent could default to
|
|
Packit |
5c3484 |
decimal in this case, with a further 0x, 0b etc allowed there.
|
|
Packit |
5c3484 |
Eg. 0xFFAA@0x5A. A leading "0" for octal would match the integers, but
|
|
Packit |
5c3484 |
probably something like "0.123" ought not mean octal.
|
|
Packit |
5c3484 |
GMP_LONG_LONG_LIMB or some such could become a documented
|
|
Packit |
5c3484 |
feature of gmp.h, so applications could know whether to
|
|
Packit |
5c3484 |
printf a limb using %lu or %Lu .
|
|
Packit |
5c3484 |
GMP_PRIdMP_LIMB and similar defines following C99
|
|
Packit |
5c3484 |
<inttypes.h> might be of use to applications printing limbs. But
|
|
Packit |
5c3484 |
if GMP_LONG_LONG_LIMB or whatever is added then perhaps this
|
|
Packit |
5c3484 |
can easily enough be left to applications.
|
|
Packit |
5c3484 |
gmp_printf could accept %b for binary output.
|
|
Packit |
5c3484 |
It'd be nice if it worked for plain int etc too, not just
|
|
Packit |
5c3484 |
mpz_t etc.
|
|
Packit |
5c3484 |
gmp_printf in fact could usefully accept an arbitrary base,
|
|
Packit |
5c3484 |
for both integer and float conversions. A base either in the format
|
|
Packit |
5c3484 |
string or as a parameter with * should be allowed. Maybe
|
|
Packit |
5c3484 |
&13b (b for base) or something like that.
|
|
Packit |
5c3484 |
gmp_printf could perhaps accept mpq_t for float
|
|
Packit |
5c3484 |
conversions, eg. "%.4Qf" . This would be merely for
|
|
Packit |
5c3484 |
convenience, but still might be useful. Rounding would be the same as
|
|
Packit |
5c3484 |
for an mpf_t (ie. currently round-to-nearest, but not
|
|
Packit |
5c3484 |
actually documented). Alternately, perhaps a separate
|
|
Packit |
5c3484 |
mpq_get_str_point or some such might be more use. Suggested
|
|
Packit |
5c3484 |
by Pedro Gimeno.
|
|
Packit |
5c3484 |
mpz_rscan0 or mpz_revscan0 or some such
|
|
Packit |
5c3484 |
searching towards the low end of an integer might match
|
|
Packit |
5c3484 |
mpz_scan0 nicely. Likewise for scan1 .
|
|
Packit |
5c3484 |
Suggested by Roberto Bagnara.
|
|
Packit |
5c3484 |
mpz_bit_subset or some such to test whether one integer is a
|
|
Packit |
5c3484 |
bitwise subset of another might be of use. Some sort of return value
|
|
Packit |
5c3484 |
indicating whether it's a proper or non-proper subset would be good and
|
|
Packit |
5c3484 |
wouldn't cost anything in the implementation. Suggested by Roberto
|
|
Packit |
5c3484 |
Bagnara.
|
|
Packit |
5c3484 |
mpf_get_ld , mpf_set_ld : Conversions between
|
|
Packit |
5c3484 |
mpf_t and long double , suggested by Dan
|
|
Packit |
5c3484 |
Christensen. Other long double routines might be desirable
|
|
Packit |
5c3484 |
too, but mpf would be a start.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
long double is an ANSI-ism, so everything involving it would
|
|
Packit |
5c3484 |
need to be suppressed on a K&R compiler.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
There'd be some work to be done by configure to recognise
|
|
Packit |
5c3484 |
the format in use, MPFR has a start on this. Often long
|
|
Packit |
5c3484 |
double is the same as double , which is easy but
|
|
Packit |
5c3484 |
pretty pointless. A single float format detector macro could look at
|
|
Packit |
5c3484 |
double then long double
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Sometimes there's a compiler option for the size of a long
|
|
Packit |
5c3484 |
double, eg. xlc on AIX can use either 64-bit or 128-bit. It's
|
|
Packit |
5c3484 |
probably simplest to regard this as a compiler compatibility issue, and
|
|
Packit |
5c3484 |
leave it to users or sysadmins to ensure application and library code is
|
|
Packit |
5c3484 |
built the same.
|
|
Packit |
5c3484 |
mpz_sqrt_if_perfect_square : When
|
|
Packit |
5c3484 |
mpz_perfect_square_p does its tests it calculates a square
|
|
Packit |
5c3484 |
root and then discards it. For some applications it might be useful to
|
|
Packit |
5c3484 |
return that root. Suggested by Jason Moxham.
|
|
Packit |
5c3484 |
mpz_get_ull , mpz_set_ull ,
|
|
Packit |
5c3484 |
mpz_get_sll , mpz_get_sll : Conversions for
|
|
Packit |
5c3484 |
long long . These would aid interoperability, though a
|
|
Packit |
5c3484 |
mixture of GMP and long long would probably not be too
|
|
Packit |
5c3484 |
common. Since long long is not always available (it's in
|
|
Packit |
5c3484 |
C99 and GCC though), disadvantages of using long long in
|
|
Packit |
5c3484 |
libgmp.a would be
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Library contents vary according to the build compiler.
|
|
Packit |
5c3484 |
gmp.h would need an ugly #ifdef block to decide if the
|
|
Packit |
5c3484 |
application compiler could take the long long
|
|
Packit |
5c3484 |
prototypes.
|
|
Packit |
5c3484 |
Some sort of LIBGMP_HAS_LONGLONG might be wanted to
|
|
Packit |
5c3484 |
indicate whether the functions are available. (Applications using
|
|
Packit |
5c3484 |
autoconf could probe the library too.)
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
It'd be possible to defer the need for long long to
|
|
Packit |
5c3484 |
application compile time, by having something like
|
|
Packit |
5c3484 |
mpz_set_2ui called with two halves of a long
|
|
Packit |
5c3484 |
long. Disadvantages of this would be,
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Bigger code in the application, though perhaps not if a long
|
|
Packit |
5c3484 |
long is normally passed as two halves anyway.
|
|
Packit |
5c3484 |
mpz_get_ull would be a rather big inline, or would have
|
|
Packit |
5c3484 |
to be two function calls.
|
|
Packit |
5c3484 |
mpz_get_sll would be a worse inline, and would put the
|
|
Packit |
5c3484 |
treatment of -0x10..00 into applications (see
|
|
Packit |
5c3484 |
mpz_get_si correctness above).
|
|
Packit |
5c3484 |
Although having libgmp.a independent of the build compiler is nice,
|
|
Packit |
5c3484 |
it sort of sacrifices the capabilities of a good compiler to
|
|
Packit |
5c3484 |
uniformity with inferior ones.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Plain use of long long is probably the lesser evil, if only
|
|
Packit |
5c3484 |
because it makes best use of gcc. In fact perhaps it would suffice to
|
|
Packit |
5c3484 |
guarantee long long conversions only when using GCC for both
|
|
Packit |
5c3484 |
application and library. That would cover free software, and we can
|
|
Packit |
5c3484 |
worry about selected vendor compilers later.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
In C++ the situation is probably clearer, we demand fairly recent C++ so
|
|
Packit |
5c3484 |
long long should be available always. We'd probably prefer
|
|
Packit |
5c3484 |
to have the C and C++ the same in respect of long long
|
|
Packit |
5c3484 |
support, but it would be possible to have it unconditionally in gmpxx.h,
|
|
Packit |
5c3484 |
by some means or another.
|
|
Packit |
5c3484 |
mpz_strtoz parsing the same as strtol .
|
|
Packit |
5c3484 |
Suggested by Alexander Kruppa.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Configuration
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Alpha ev7, ev79: Add code to config.guess to detect these. Believe ev7
|
|
Packit |
5c3484 |
will be "3-1307" in the current switch, but need to verify that. (On
|
|
Packit |
5c3484 |
OSF, current configfsf.guess identifies ev7 using psrinfo, we need to do
|
|
Packit |
5c3484 |
it ourselves for other systems.)
|
|
Packit |
5c3484 |
Alpha OSF: Libtool (version 1.5) doesn't seem to recognise this system is
|
|
Packit |
5c3484 |
"pic always" and ends up running gcc twice with the same options. This
|
|
Packit |
5c3484 |
is wasteful, but harmless. Perhaps a newer libtool will be better.
|
|
Packit |
5c3484 |
ARM: umul_ppmm in longlong.h always uses umull ,
|
|
Packit |
5c3484 |
but is that available only for M series chips or some such? Perhaps it
|
|
Packit |
5c3484 |
should be configured in some way.
|
|
Packit |
5c3484 |
HPPA: config.guess should recognize 7000, 7100, 7200, and 8x00.
|
|
Packit |
5c3484 |
HPPA: gcc 3.2 introduces a -mschedule=7200 etc parameter,
|
|
Packit |
5c3484 |
which could be driven by an exact hppa cpu type.
|
|
Packit |
5c3484 |
Mips: config.guess should say mipsr3000, mipsr4000, mipsr10000, etc.
|
|
Packit |
5c3484 |
"hinv -c processor" gives lots of information on Irix. Standard
|
|
Packit |
5c3484 |
config.guess appends "el" to indicate endianness, but
|
|
Packit |
5c3484 |
AC_C_BIGENDIAN seems the best way to handle that for GMP.
|
|
Packit |
5c3484 |
PowerPC: The function descriptor nonsense for AIX is currently driven by
|
|
Packit |
5c3484 |
*-*-aix* . It might be more reliable to do some sort of
|
|
Packit |
5c3484 |
feature test, examining the compiler output perhaps. It might also be
|
|
Packit |
5c3484 |
nice to merge the aix.m4 files into powerpc-defs.m4.
|
|
Packit |
5c3484 |
config.m4 is generated only by the configure script, it won't be
|
|
Packit |
5c3484 |
regenerated by config.status. Creating it as an AC_OUTPUT
|
|
Packit |
5c3484 |
would work, but it might upset "make" to have things like L$
|
|
Packit |
5c3484 |
get into the Makefiles through AC_SUBST .
|
|
Packit |
5c3484 |
AC_CONFIG_COMMANDS would be the alternative. With some
|
|
Packit |
5c3484 |
careful m4 quoting the changequote calls might not be
|
|
Packit |
5c3484 |
needed, which might free up the order in which things had to be output.
|
|
Packit |
5c3484 |
Automake: Latest automake has a CCAS , CCASFLAGS
|
|
Packit |
5c3484 |
scheme. Though we probably wouldn't be using its assembler support we
|
|
Packit |
5c3484 |
could try to use those variables in compatible ways.
|
|
Packit |
5c3484 |
GMP_LDFLAGS could probably be done with plain
|
|
Packit |
5c3484 |
LDFLAGS already used by automake for all linking. But with
|
|
Packit |
5c3484 |
a bit of luck the next libtool will pass pretty much all
|
|
Packit |
5c3484 |
CFLAGS through to the compiler when linking, making
|
|
Packit |
5c3484 |
GMP_LDFLAGS unnecessary.
|
|
Packit |
5c3484 |
mpn/Makeasm.am uses -c and -o together in the
|
|
Packit |
5c3484 |
.S and .asm rules, but apparently that isn't completely portable (there's
|
|
Packit |
5c3484 |
an autoconf AC_PROG_CC_C_O test for it). So far we've not
|
|
Packit |
5c3484 |
had problems, but perhaps the rules could be rewritten to use "foo.s" as
|
|
Packit |
5c3484 |
the temporary, or to do a suitable "mv" of the result. The only danger
|
|
Packit |
5c3484 |
from using foo.s would be if a compile failed and the temporary foo.s
|
|
Packit |
5c3484 |
then looked like the primary source. Hopefully if the
|
|
Packit |
5c3484 |
SUFFIXES are ordered to have .S and .asm ahead of .s that
|
|
Packit |
5c3484 |
wouldn't happen. Might need to check.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Random Numbers
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
_gmp_rand is not particularly fast on the linear
|
|
Packit |
5c3484 |
congruential algorithm and could stand various improvements.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Make a second seed area within gmp_randstate_t (or
|
|
Packit |
5c3484 |
_mp_algdata rather) to save some copying.
|
|
Packit |
5c3484 |
Make a special case for a single limb 2exp modulus, to
|
|
Packit |
5c3484 |
avoid mpn_mul calls. Perhaps the same for two limbs.
|
|
Packit |
5c3484 |
Inline the lc code, to avoid a function call and
|
|
Packit |
5c3484 |
TMP_ALLOC for every chunk.
|
|
Packit |
5c3484 |
Perhaps the 2exp and general LC cases should be split,
|
|
Packit |
5c3484 |
for clarity (if the general case is retained).
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
gmp_randstate_t used for parameters perhaps should become
|
|
Packit |
5c3484 |
gmp_randstate_ptr the same as other types.
|
|
Packit |
5c3484 |
Some of the empirical randomness tests could be included in a "make
|
|
Packit |
5c3484 |
check". They ought to work everywhere, for a given seed at least.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
C++
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
mpz_class(string) , etc: Use the C++ global locale to
|
|
Packit |
5c3484 |
identify whitespace.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
mpf_class(string) : Use the C++ global locale decimal point,
|
|
Packit |
5c3484 |
rather than the C one.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Consider making these variant mpz_set_str etc forms
|
|
Packit |
5c3484 |
available for mpz_t too, not just mpz_class
|
|
Packit |
5c3484 |
etc.
|
|
Packit |
5c3484 |
mpq_class operator+= : Don't emit an unnecessary
|
|
Packit |
5c3484 |
mpq_set(q,q) before mpz_addmul etc.
|
|
Packit |
5c3484 |
Put various bits of gmpxx.h into libgmpxx, to avoid excessive inlining.
|
|
Packit |
5c3484 |
Candidates for this would be,
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
mpz_class(const char *) , etc: since they're normally
|
|
Packit |
5c3484 |
not fast anyway, and we can hide the exception throw .
|
|
Packit |
5c3484 |
mpz_class(string) , etc: to hide the cstr
|
|
Packit |
5c3484 |
needed to get to the C conversion function.
|
|
Packit |
5c3484 |
mpz_class string, char* etc constructors: likewise to
|
|
Packit |
5c3484 |
hide the throws and conversions.
|
|
Packit |
5c3484 |
mpz_class::get_str , etc: to hide the char*
|
|
Packit |
5c3484 |
to string conversion and free. Perhaps
|
|
Packit |
5c3484 |
mpz_get_str can write directly into a
|
|
Packit |
5c3484 |
string , to avoid copying.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Consider making such string returning variants
|
|
Packit |
5c3484 |
available for use with plain mpz_t etc too.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Miscellaneous
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
mpz_gcdext and mpn_gcdext ought to document
|
|
Packit |
5c3484 |
what range of values the generated cofactors can take, and preferably
|
|
Packit |
5c3484 |
ensure the definition uniquely specifies the cofactors for given inputs.
|
|
Packit |
5c3484 |
A basic extended Euclidean algorithm or multi-step variant leads to
|
|
Packit |
5c3484 |
|x|<|b| and |y|<|a| or something like that, but there's probably
|
|
Packit |
5c3484 |
two solutions under just those restrictions.
|
|
Packit |
5c3484 |
demos/factorize.c: use mpz_divisible_ui_p rather than
|
|
Packit |
5c3484 |
mpz_tdiv_qr_ui . (Of course dividing multiple primes at a
|
|
Packit |
5c3484 |
time would be better still.)
|
|
Packit |
5c3484 |
The various test programs use quite a bit of the main
|
|
Packit |
5c3484 |
libgmp . This establishes good cross-checks, but it might be
|
|
Packit |
5c3484 |
better to use simple reference routines where possible. Where it's not
|
|
Packit |
5c3484 |
possible some attention could be paid to the order of the tests, so a
|
|
Packit |
5c3484 |
libgmp routine is only used for tests once it seems to be
|
|
Packit |
5c3484 |
good.
|
|
Packit |
5c3484 |
MUL_FFT_THRESHOLD etc: the FFT thresholds should allow a
|
|
Packit |
5c3484 |
return to a previous k at certain sizes. This arises basically due to
|
|
Packit |
5c3484 |
the step effect caused by size multiples effectively used for each k.
|
|
Packit |
5c3484 |
Looking at a graph makes it fairly clear.
|
|
Packit |
5c3484 |
__gmp_doprnt_mpf does a rather unattractive round-to-nearest
|
|
Packit |
5c3484 |
on the string returned by mpf_get_str . Perhaps some variant
|
|
Packit |
5c3484 |
of mpf_get_str could be made which would better suit.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Aids to Development
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Add ASSERT s at the start of each user-visible mpz/mpq/mpf
|
|
Packit |
5c3484 |
function to check the validity of each mp?_t parameter, in
|
|
Packit |
5c3484 |
particular to check they've been mp?_init ed. This might
|
|
Packit |
5c3484 |
catch elementary mistakes in user programs. Care would need to be taken
|
|
Packit |
5c3484 |
over MPZ_TMP_INIT ed variables used internally. If nothing
|
|
Packit |
5c3484 |
else then consistency checks like size<=alloc, ptr not
|
|
Packit |
5c3484 |
NULL and ptr+size not wrapping around the address space,
|
|
Packit |
5c3484 |
would be possible. A more sophisticated scheme could track
|
|
Packit |
5c3484 |
_mp_d pointers and ensure only a valid one is used. Such a
|
|
Packit |
5c3484 |
scheme probably wouldn't be reentrant, not without some help from the
|
|
Packit |
5c3484 |
system.
|
|
Packit |
5c3484 |
tune/time.c could try to determine at runtime whether
|
|
Packit |
5c3484 |
getrusage and gettimeofday are reliable.
|
|
Packit |
5c3484 |
Currently we pretend in configure that the dodgy m68k netbsd 1.4.1
|
|
Packit |
5c3484 |
getrusage doesn't exist. If a test might take a long time
|
|
Packit |
5c3484 |
to run then perhaps cache the result in a file somewhere.
|
|
Packit |
5c3484 |
tune/time.c could choose the default precision based on the
|
|
Packit |
5c3484 |
speed_unittime determined, independent of the method in use.
|
|
Packit |
5c3484 |
Cray vector systems: CPU frequency could be determined from
|
|
Packit |
5c3484 |
sysconf(_SC_CLK_TCK) , since it seems to be clock cycle
|
|
Packit |
5c3484 |
based. Is this true for all Cray systems? Would like some documentation
|
|
Packit |
5c3484 |
or something to confirm.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Documentation
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
mpz_inp_str (etc) doesn't say when it stops reading digits.
|
|
Packit |
5c3484 |
mpn_get_str isn't terribly clear about how many digits it
|
|
Packit |
5c3484 |
produces. It'd probably be possible to say at most one leading zero,
|
|
Packit |
5c3484 |
which is what both it and mpz_get_str currently do. But
|
|
Packit |
5c3484 |
want to be careful not to bind ourselves to something that might not suit
|
|
Packit |
5c3484 |
another implementation.
|
|
Packit |
5c3484 |
va_arg doesn't do the right thing with mpz_t
|
|
Packit |
5c3484 |
etc directly, but instead needs a pointer type like MP_INT* .
|
|
Packit |
5c3484 |
It'd be good to show how to do this, but we'd either need to document
|
|
Packit |
5c3484 |
mpz_ptr and friends, or perhaps fallback on something
|
|
Packit |
5c3484 |
slightly nasty with void* .
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Bright Ideas
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
The following may or may not be feasible, and aren't likely to get done in the
|
|
Packit |
5c3484 |
near future, but are at least worth thinking about.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Reorganize longlong.h so that we can inline the operations even for the
|
|
Packit |
5c3484 |
system compiler. When there is no such compiler feature, make calls to
|
|
Packit |
5c3484 |
stub functions. Write such stub functions for as many machines as
|
|
Packit |
5c3484 |
possible.
|
|
Packit |
5c3484 |
longlong.h could declare when it's using, or would like to use,
|
|
Packit |
5c3484 |
mpn_umul_ppmm , and the corresponding umul.asm file could be
|
|
Packit |
5c3484 |
included in libgmp only in that case, the same as is effectively done for
|
|
Packit |
5c3484 |
__clz_tab . Likewise udiv.asm and perhaps cntlz.asm. This
|
|
Packit |
5c3484 |
would only be a very small space saving, so perhaps not worth the
|
|
Packit |
5c3484 |
complexity.
|
|
Packit |
5c3484 |
longlong.h could be built at configure time by concatenating or
|
|
Packit |
5c3484 |
#including fragments from each directory in the mpn path. This would
|
|
Packit |
5c3484 |
select CPU specific macros the same way as CPU specific assembler code.
|
|
Packit |
5c3484 |
Code used would no longer depend on cpp predefines, and the current
|
|
Packit |
5c3484 |
nested conditionals could be flattened out.
|
|
Packit |
5c3484 |
mpz_get_si returns 0x80000000 for -0x100000000, whereas it's
|
|
Packit |
5c3484 |
sort of supposed to return the low 31 (or 63) bits. But this is
|
|
Packit |
5c3484 |
undocumented, and perhaps not too important.
|
|
Packit |
5c3484 |
mpz_init_set* and mpz_realloc could allocate
|
|
Packit |
5c3484 |
say an extra 16 limbs over what's needed, so as to reduce the chance of
|
|
Packit |
5c3484 |
having to do a reallocate if the mpz_t grows a bit more.
|
|
Packit |
5c3484 |
This could only be an option, since it'd badly bloat memory usage in
|
|
Packit |
5c3484 |
applications using many small values.
|
|
Packit |
5c3484 |
mpq functions could perhaps check for numerator or
|
|
Packit |
5c3484 |
denominator equal to 1, on the assumption that integers or
|
|
Packit |
5c3484 |
denominator-only values might be expected to occur reasonably often.
|
|
Packit |
5c3484 |
count_trailing_zeros is used on more or less uniformly
|
|
Packit |
5c3484 |
distributed numbers in a couple of places. For some CPUs
|
|
Packit |
5c3484 |
count_trailing_zeros is slow and it's probably worth handling
|
|
Packit |
5c3484 |
the frequently occurring 0 to 2 trailing zeros cases specially.
|
|
Packit |
5c3484 |
mpf_t might like to let the exponent be undefined when
|
|
Packit |
5c3484 |
size==0, instead of requiring it 0 as now. It should be possible to do
|
|
Packit |
5c3484 |
size==0 tests before paying attention to the exponent. The advantage is
|
|
Packit |
5c3484 |
not needing to set exp in the various places a zero result can arise,
|
|
Packit |
5c3484 |
which avoids some tedium but is otherwise perhaps not too important.
|
|
Packit |
5c3484 |
Currently mpz_set_f and mpf_cmp_ui depend on
|
|
Packit |
5c3484 |
exp==0, maybe elsewhere too.
|
|
Packit |
5c3484 |
__gmp_allocate_func : Could use GCC __attribute__
|
|
Packit |
5c3484 |
((malloc)) on this, though don't know if it'd do much. GCC 3.0
|
|
Packit |
5c3484 |
allows that attribute on functions, but not function pointers (see info
|
|
Packit |
5c3484 |
node "Attribute Syntax"), so would need a new autoconf test. This can
|
|
Packit |
5c3484 |
wait until there's a GCC that supports it.
|
|
Packit |
5c3484 |
mpz_add_ui contains two __GMPN_COPY s, one from
|
|
Packit |
5c3484 |
mpn_add_1 and one from mpn_sub_1 . If those two
|
|
Packit |
5c3484 |
routines were opened up a bit maybe that code could be shared. When a
|
|
Packit |
5c3484 |
copy needs to be done there's no carry to append for the add, and if the
|
|
Packit |
5c3484 |
copy is non-empty no high zero for the sub.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Old and Obsolete Stuff
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
The following tasks apply to chips or systems that are old and/or obsolete.
|
|
Packit |
5c3484 |
It's unlikely anything will be done about them unless anyone is actively using
|
|
Packit |
5c3484 |
them.
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Sparc32: The integer based udiv_nfp.asm used to be selected by
|
|
Packit |
5c3484 |
configure --nfp but that option is gone now that autoconf is
|
|
Packit |
5c3484 |
used. The file could go somewhere suitable in the mpn search if any
|
|
Packit |
5c3484 |
chips might benefit from it, though it's possible we don't currently
|
|
Packit |
5c3484 |
differentiate enough exact cpu types to do this properly.
|
|
Packit |
5c3484 |
VAX D and G format double floats are straightforward and
|
|
Packit |
5c3484 |
could perhaps be handled directly in __gmp_extract_double
|
|
Packit |
5c3484 |
and maybe in mpn_get_d , rather than falling back on the
|
|
Packit |
5c3484 |
generic code. (Both formats are detected by configure .)
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
</body>
|
|
Packit |
5c3484 |
</html>
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
|
|
Packit |
5c3484 |
Local variables:
|
|
Packit |
5c3484 |
eval: (add-hook 'write-file-hooks 'time-stamp)
|
|
Packit |
5c3484 |
time-stamp-start: "This file current as of "
|
|
Packit |
5c3484 |
time-stamp-format: "%:d %3b %:y"
|
|
Packit |
5c3484 |
time-stamp-end: "\\."
|
|
Packit |
5c3484 |
time-stamp-line-limit: 50
|
|
Packit |
5c3484 |
End:
|
|
Packit |
5c3484 |
-->
|