Blame doc/tasks.html

Packit 5c3484
Packit 5c3484
<html>
Packit 5c3484
<head>
Packit 5c3484
  <title>GMP Itemized Development Tasks</title>
Packit 5c3484
  <link rel="shortcut icon" href="favicon.ico">
Packit 5c3484
  <link rel="stylesheet" href="gmp.css">
Packit 5c3484
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Packit 5c3484
</head>
Packit 5c3484
Packit 5c3484
<center>
Packit 5c3484
  

Packit 5c3484
    GMP Itemized Development Tasks
Packit 5c3484
  
Packit 5c3484
</center>
Packit 5c3484
Packit 5c3484
<font size=-1>
Packit 5c3484
Packit 5c3484
Copyright 2000-2004, 2006, 2008, 2009 Free Software Foundation, Inc.
Packit 5c3484
Packit 5c3484
This file is part of the GNU MP Library.
Packit 5c3484
Packit 5c3484
The GNU MP Library is free software; you can redistribute it and/or modify
Packit 5c3484
it under the terms of either:
Packit 5c3484
Packit 5c3484
  * the GNU Lesser General Public License as published by the Free
Packit 5c3484
    Software Foundation; either version 3 of the License, or (at your
Packit 5c3484
    option) any later version.
Packit 5c3484
Packit 5c3484
or
Packit 5c3484
Packit 5c3484
  * the GNU General Public License as published by the Free Software
Packit 5c3484
    Foundation; either version 2 of the License, or (at your option) any
Packit 5c3484
    later version.
Packit 5c3484
Packit 5c3484
or both in parallel, as here.
Packit 5c3484
Packit 5c3484
The GNU MP Library is distributed in the hope that it will be useful, but
Packit 5c3484
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
Packit 5c3484
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
Packit 5c3484
for more details.
Packit 5c3484
Packit 5c3484
You should have received copies of the GNU General Public License and the
Packit 5c3484
GNU Lesser General Public License along with the GNU MP Library.  If not,
Packit 5c3484
see https://www.gnu.org/licenses/.
Packit 5c3484
Packit 5c3484
</font>
Packit 5c3484
Packit 5c3484

Packit 5c3484
Packit 5c3484
  This file current as of 29 Jan 2014.  An up-to-date version is available at
Packit 5c3484
  https://gmplib.org/tasks.html.
Packit 5c3484
  Please send comments about this page to gmp-devel<font>@</font>gmplib.org.
Packit 5c3484
Packit 5c3484

These are itemized GMP development tasks. Not all the tasks

Packit 5c3484
    listed here are suitable for volunteers, but many of them are.
Packit 5c3484
    Please see the projects file for more
Packit 5c3484
    sizeable projects.
Packit 5c3484
Packit 5c3484

CAUTION: This file needs updating. Many of the tasks here have

Packit 5c3484
either already been taken care of, or have become irrelevant.
Packit 5c3484
Packit 5c3484

Correctness and Completeness

Packit 5c3484
    Packit 5c3484
  • _LONG_LONG_LIMB in gmp.h is not namespace clean. Reported
  • Packit 5c3484
         by Patrick Pelissier.
    Packit 5c3484
         
    Packit 5c3484
         We sort of mentioned _LONG_LONG_LIMB in past releases, so
    Packit 5c3484
         need to be careful about changing it.  It used to be a define
    Packit 5c3484
         applications had to set for long long limb systems, but that in
    Packit 5c3484
         particular is no longer relevant now that it's established automatically.
    Packit 5c3484
  • The various reuse.c tests need to force reallocation by calling
  • Packit 5c3484
         _mpz_realloc with a small (1 limb) size.
    Packit 5c3484
  • One reuse case is missing from mpX/tests/reuse.c:
  • Packit 5c3484
         mpz_XXX(a,a,a).
    Packit 5c3484
  • Make the string reading functions allow the `0x' prefix when the base is
  • Packit 5c3484
         explicitly 16.  They currently only allow that prefix when the base is
    Packit 5c3484
         unspecified (zero).
    Packit 5c3484
  • mpf_eq is not always correct, when one operand is
  • Packit 5c3484
         1000000000... and the other operand is 0111111111..., i.e., extremely
    Packit 5c3484
         close.  There is a special case in mpf_sub for this
    Packit 5c3484
         situation; put similar code in mpf_eq.  [In progress.]
    Packit 5c3484
  • mpf_eq doesn't implement what gmp.texi specifies. It should
  • Packit 5c3484
         not use just whole limbs, but partial limbs.  [In progress.]
    Packit 5c3484
  • mpf_set_str doesn't validate it's exponent, for instance
  • Packit 5c3484
         garbage 123.456eX789X is accepted (and an exponent 0 used), and overflow
    Packit 5c3484
         of a long is not detected.
    Packit 5c3484
  • mpf_add doesn't check for a carry from truncated portions of
  • Packit 5c3484
         the inputs, and in that respect doesn't implement the "infinite precision
    Packit 5c3484
         followed by truncate" specified in the manual.
    Packit 5c3484
  • Windows DLLs: tests/mpz/reuse.c and tests/mpf/reuse.c initialize global
  • Packit 5c3484
         variables with pointers to mpz_add etc, which doesn't work
    Packit 5c3484
         when those routines are coming from a DLL (because they're effectively
    Packit 5c3484
         function pointer global variables themselves).  Need to rearrange perhaps
    Packit 5c3484
         to a set of calls to a test function rather than iterating over an array.
    Packit 5c3484
  • mpz_pow_ui: Detect when the result would be more memory than
  • Packit 5c3484
         a size_t can represent and raise some suitable exception,
    Packit 5c3484
         probably an alloc call asking for SIZE_T_MAX, and if that
    Packit 5c3484
         somehow succeeds then an abort.  Various size overflows of
    Packit 5c3484
         this kind are not handled gracefully, probably resulting in segvs.
    Packit 5c3484
         
    Packit 5c3484
         In mpz_n_pow_ui, detect when the count of low zero bits
    Packit 5c3484
         exceeds an unsigned long.  There's a (small) chance of this
    Packit 5c3484
         happening but still having enough memory to represent the value.
    Packit 5c3484
         Reported by Winfried Dreckmann in for instance mpz_ui_pow_ui (x,
    Packit 5c3484
         4UL, 1431655766UL).
    Packit 5c3484
  • mpf: Detect exponent overflow and raise some exception.
  • Packit 5c3484
         It'd be nice to allow the full mp_exp_t range since that's
    Packit 5c3484
         how it's been in the past, but maybe dropping one bit would make it
    Packit 5c3484
         easier to test if e1+e2 goes out of bounds.
    Packit 5c3484
    Packit 5c3484
    Packit 5c3484
    Packit 5c3484
    Packit 5c3484

    Machine Independent Optimization

    Packit 5c3484
      Packit 5c3484
    • mpf_cmp: For better cache locality, don't test for low zero
    • Packit 5c3484
           limbs until the high limbs fail to give an ordering.  Reduce code size by
      Packit 5c3484
           turning the three mpn_cmp's into a single loop stopping when
      Packit 5c3484
           the end of one operand is reached (and then looking for a non-zero in the
      Packit 5c3484
           rest of the other).
      Packit 5c3484
    • mpf_mul_2exp, mpf_div_2exp: The use of
    • Packit 5c3484
           mpn_lshift for any size<=prec means repeated
      Packit 5c3484
           mul_2exp and div_2exp calls accumulate low zero
      Packit 5c3484
           limbs until size==prec+1 is reached.  Those zeros will slow down
      Packit 5c3484
           subsequent operations, especially if the value is otherwise only small.
      Packit 5c3484
           If low bits of the low limb are zero, use mpn_rshift so as
      Packit 5c3484
           to not increase the size.
      Packit 5c3484
    • mpn_dc_sqrtrem, mpn_sqrtrem2: Don't use
    • Packit 5c3484
           mpn_add_1 and mpn_sub_1 for 1 limb operations,
      Packit 5c3484
           instead ADDC_LIMB and SUBC_LIMB.
      Packit 5c3484
    • mpn_sqrtrem2: Use plain variables for sp[0] and
    • Packit 5c3484
           rp[0] calculations, so the compiler needn't worry about
      Packit 5c3484
           aliasing between sp and rp.
      Packit 5c3484
    • mpn_sqrtrem: Some work can be saved in the last step when
    • Packit 5c3484
           the remainder is not required, as noted in Paul's paper.
      Packit 5c3484
    • mpq_add, mpq_sub: The gcd fits a single limb
    • Packit 5c3484
           with high probability and in this case binvert_limb could
      Packit 5c3484
           be used to calculate the inverse just once for the two exact divisions
      Packit 5c3484
           "op1.den / gcd" and "op2.den / gcd", rather than letting
      Packit 5c3484
           mpn_bdiv_q_1 do it each time.  This would require calling
      Packit 5c3484
           mpn_pi1_bdiv_q_1.
      Packit 5c3484
    • mpn_gcdext: Don't test count_leading_zeros for
    • Packit 5c3484
           zero, instead check the high bit of the operand and avoid invoking
      Packit 5c3484
           count_leading_zeros.  This is an optimization on all
      Packit 5c3484
           machines, and significant on machines with slow
      Packit 5c3484
           count_leading_zeros, though it's possible an already
      Packit 5c3484
           normalized operand might not be encountered very often.
      Packit 5c3484
    • Rewrite umul_ppmm to use floating-point for generating the
    • Packit 5c3484
           most significant limb (if GMP_LIMB_BITS &lt= 52 bits).
      Packit 5c3484
           (Peter Montgomery has some ideas on this subject.)
      Packit 5c3484
    • Improve the default umul_ppmm code in longlong.h: Add partial
    • Packit 5c3484
           products with fewer operations.
      Packit 5c3484
    • Consider inlining mpz_set_ui. This would be both small and
    • Packit 5c3484
           fast, especially for compile-time constants, but would make application
      Packit 5c3484
           binaries depend on having 1 limb allocated to an mpz_t,
      Packit 5c3484
           preventing the "lazy" allocation scheme below.
      Packit 5c3484
    • Consider inlining mpz_[cft]div_ui and maybe
    • Packit 5c3484
           mpz_[cft]div_r_ui.  A __gmp_divide_by_zero
      Packit 5c3484
           would be needed for the divide by zero test, unless that could be left to
      Packit 5c3484
           mpn_mod_1 (not sure currently whether all the risc chips
      Packit 5c3484
           provoke the right exception there if using mul-by-inverse).
      Packit 5c3484
    • Consider inlining: mpz_fits_s*_p. The setups for
    • Packit 5c3484
           LONG_MAX etc would need to go into gmp.h, and on Cray it
      Packit 5c3484
           might, unfortunately, be necessary to forcibly include <limits.h>
      Packit 5c3484
           since there's no apparent way to get SHRT_MAX with an
      Packit 5c3484
           expression (since short and unsigned short can
      Packit 5c3484
           be different sizes).
      Packit 5c3484
    • mpz_powm and mpz_powm_ui aren't very fast on one
    • Packit 5c3484
           or two limb moduli, due to a lot of function call overheads.  These could
      Packit 5c3484
           perhaps be handled as special cases.
      Packit 5c3484
    • Make sure mpz_powm_ui is never slower than the corresponding
    • Packit 5c3484
           computation using mpz_powm.
      Packit 5c3484
    • mpz_powm REDC should do multiplications by g[]
    • Packit 5c3484
           using the division method when they're small, since the REDC form of a
      Packit 5c3484
           small multiplier is normally a full size product.  Probably would need a
      Packit 5c3484
           new tuned parameter to say what size multiplier is "small", as a function
      Packit 5c3484
           of the size of the modulus.
      Packit 5c3484
    • mpn_gcd might be able to be sped up on small to moderate
    • Packit 5c3484
           sizes by improving find_a, possibly just by providing an
      Packit 5c3484
           alternate implementation for CPUs with slowish
      Packit 5c3484
           count_leading_zeros.
      Packit 5c3484
    • mpf_set_str produces low zero limbs when a string has a
    • Packit 5c3484
           fraction but is exactly representable, eg. 0.5 in decimal.  These could be
      Packit 5c3484
           stripped to save work in later operations.
      Packit 5c3484
    • mpz_and, mpz_ior and mpz_xor should
    • Packit 5c3484
           use mpn_and_n etc for the benefit of the small number of
      Packit 5c3484
           targets with native versions of those routines.  Need to be careful not to
      Packit 5c3484
           pass size==0.  Is some code sharing possible between the mpz
      Packit 5c3484
           routines?
      Packit 5c3484
    • mpf_add: Don't do a copy to avoid overlapping operands
    • Packit 5c3484
           unless it's really necessary (currently only sizes are tested, not
      Packit 5c3484
           whether r really is u or v).
      Packit 5c3484
    • mpf_add: Under the check for v having no effect on the
    • Packit 5c3484
           result, perhaps test for r==u and do nothing in that case, rather than
      Packit 5c3484
           currently it looks like an MPN_COPY_INCR will be done to
      Packit 5c3484
           reduce prec+1 limbs to prec.
      Packit 5c3484
    • mpf_div_ui: Instead of padding with low zeros, call
    • Packit 5c3484
           mpn_divrem_1 asking for fractional quotient limbs.
      Packit 5c3484
    • mpf_div_ui: Eliminate TMP_ALLOC. When r!=u
    • Packit 5c3484
           there's no overlap and the division can be called on those operands.
      Packit 5c3484
           When r==u and is prec+1 limbs, then it's an in-place division.  If r==u
      Packit 5c3484
           and not prec+1 limbs, then move the available limbs up to prec+1 and do
      Packit 5c3484
           an in-place there.
      Packit 5c3484
    • mpf_div_ui: Whether the high quotient limb is zero can be
    • Packit 5c3484
           determined by testing the dividend for high<divisor.  When non-zero, the
      Packit 5c3484
           division can be done on prec dividend limbs instead of prec+1.  The result
      Packit 5c3484
           size is also known before the division, so that can be a tail call (once
      Packit 5c3484
           the TMP_ALLOC is eliminated).
      Packit 5c3484
    • mpn_divrem_2 could usefully accept unnormalized divisors and
    • Packit 5c3484
           shift the dividend on-the-fly, since this should cost nothing on
      Packit 5c3484
           superscalar processors and avoid the need for temporary copying in
      Packit 5c3484
           mpn_tdiv_qr.
      Packit 5c3484
    • mpf_sqrt: If r!=u, and if u doesn't need to be padded with
    • Packit 5c3484
           zeros, then there's no need for the tp temporary.
      Packit 5c3484
    • mpq_cmp_ui could form the num1*den2 and
    • Packit 5c3484
           num2*den1 products limb-by-limb from high to low and look at
      Packit 5c3484
           each step for values differing by more than the possible carry bit from
      Packit 5c3484
           the uncalculated portion.
      Packit 5c3484
    • mpq_cmp could do the same high-to-low progressive multiply
    • Packit 5c3484
           and compare.  The benefits of karatsuba and higher multiplication
      Packit 5c3484
           algorithms are lost, but if it's assumed only a few high limbs will be
      Packit 5c3484
           needed to determine an order then that's fine.
      Packit 5c3484
    • mpn_add_1, mpn_sub_1, mpn_add,
    • Packit 5c3484
           mpn_sub: Internally use __GMPN_ADD_1 etc
      Packit 5c3484
           instead of the functions, so they get inlined on all compilers, not just
      Packit 5c3484
           gcc and others with inline recognised in gmp.h.
      Packit 5c3484
           __GMPN_ADD_1 etc are meant mostly to support application
      Packit 5c3484
           inline mpn_add_1 etc and if they don't come out good for
      Packit 5c3484
           internal uses then special forms can be introduced, for instance many
      Packit 5c3484
           internal uses are in-place.  Sometimes a block of code is executed based
      Packit 5c3484
           on the carry-out, rather than using it arithmetically, and those places
      Packit 5c3484
           might want to do their own loops entirely.
      Packit 5c3484
    • __gmp_extract_double on 64-bit systems could use just one
    • Packit 5c3484
           bitfield for the mantissa extraction, not two, when endianness permits.
      Packit 5c3484
           Might depend on the compiler allowing long long bit fields
      Packit 5c3484
           when that's the only actual 64-bit type.
      Packit 5c3484
    • tal-notreent.c could keep a block of memory permanently allocated.
    • Packit 5c3484
           Currently the last nested TMP_FREE releases all memory, so
      Packit 5c3484
           there's an allocate and free every time a top-level function using
      Packit 5c3484
           TMP is called.  Would need
      Packit 5c3484
           mp_set_memory_functions to tell tal-notreent.c to release
      Packit 5c3484
           any cached memory when changing allocation functions though.
      Packit 5c3484
    • __gmp_tmp_alloc from tal-notreent.c could be partially
    • Packit 5c3484
           inlined.  If the current chunk has enough room then a couple of pointers
      Packit 5c3484
           can be updated.  Only if more space is required then a call to some sort
      Packit 5c3484
           of __gmp_tmp_increase would be needed.  The requirement that
      Packit 5c3484
           TMP_ALLOC is an expression might make the implementation a
      Packit 5c3484
           bit ugly and/or a bit sub-optimal.
      Packit 5c3484
      Packit 5c3484
      #define TMP_ALLOC(n)
      Packit 5c3484
        ((ROUND_UP(n) > current->end - current->point ?
      Packit 5c3484
           __gmp_tmp_increase (ROUND_UP (n)) : 0),
      Packit 5c3484
           current->point += ROUND_UP (n),
      Packit 5c3484
           current->point - ROUND_UP (n))
      Packit 5c3484
      Packit 5c3484
    • __mp_bases has a lot of data for bases which are pretty much
    • Packit 5c3484
           never used.  Perhaps the table should just go up to base 16, and have
      Packit 5c3484
           code to generate data above that, if and when required.  Naturally this
      Packit 5c3484
           assumes the code would be smaller than the data saved.
      Packit 5c3484
    • __mp_bases field big_base_inverted is only used
    • Packit 5c3484
           if USE_PREINV_DIVREM_1 is true, and could be omitted
      Packit 5c3484
           otherwise, to save space.
      Packit 5c3484
    • mpz_get_str, mtox: For power-of-2 bases, which
    • Packit 5c3484
           are of course fast, it seems a little silly to make a second pass over
      Packit 5c3484
           the mpn_get_str output to convert to ASCII.  Perhaps combine
      Packit 5c3484
           that with the bit extractions.
      Packit 5c3484
    • mpz_gcdext: If the caller requests only the S cofactor (of
    • Packit 5c3484
           A), and A<B, then the code ends up generating the cofactor T (of B) and
      Packit 5c3484
           deriving S from that.  Perhaps it'd be possible to arrange to get S in
      Packit 5c3484
           the first place by calling mpn_gcdext with A+B,B.  This
      Packit 5c3484
           might only be an advantage if A and B are about the same size.
      Packit 5c3484
    • mpz_n_pow_ui does a good job with small bases and stripping
    • Packit 5c3484
           powers of 2, but it's perhaps a bit too complicated for what it gains.
      Packit 5c3484
           The simpler mpn_pow_1 is a little faster on small exponents.
      Packit 5c3484
           (Note some of the ugliness in mpz_n_pow_ui is due to
      Packit 5c3484
           supporting mpn_mul_2.)
      Packit 5c3484
           
      Packit 5c3484
           Perhaps the stripping of 2s in mpz_n_pow_ui should be
      Packit 5c3484
           confined to single limb operands for simplicity and since that's where
      Packit 5c3484
           the greatest gain would be.
      Packit 5c3484
           
      Packit 5c3484
           Ideally mpn_pow_1 and mpz_n_pow_ui would be
      Packit 5c3484
           merged.  The reason mpz_n_pow_ui writes to an
      Packit 5c3484
           mpz_t is that its callers leave it to make a good estimate
      Packit 5c3484
           of the result size.  Callers of mpn_pow_1 already know the
      Packit 5c3484
           size by separate means (mp_bases).
      Packit 5c3484
    • mpz_invert should call mpn_gcdext directly.
    • Packit 5c3484
      Packit 5c3484
      Packit 5c3484
      Packit 5c3484

      Machine Dependent Optimization

      Packit 5c3484
        Packit 5c3484
      • invert_limb on various processors might benefit from the
      • Packit 5c3484
             little Newton iteration done for alpha and ia64.
        Packit 5c3484
      • Alpha 21264: mpn_addlsh1_n could be implemented with
      • Packit 5c3484
             mpn_addmul_1, since that code at 3.5 is a touch faster than
        Packit 5c3484
             a separate lshift and add_n at
        Packit 5c3484
             1.75+2.125=3.875.  Or very likely some specific addlsh1_n
        Packit 5c3484
             code could beat both.
        Packit 5c3484
      • Alpha 21264: Improve feed-in code for mpn_mul_1,
      • Packit 5c3484
             mpn_addmul_1, and mpn_submul_1.
        Packit 5c3484
      • Alpha 21164: Rewrite mpn_mul_1, mpn_addmul_1,
      • Packit 5c3484
             and mpn_submul_1 for the 21164.  This should use both integer
        Packit 5c3484
             multiplies and floating-point multiplies.  For the floating-point
        Packit 5c3484
             operations, the single-limb multiplier should be split into three 21-bit
        Packit 5c3484
             chunks, or perhaps even better in four 16-bit chunks.  Probably possible
        Packit 5c3484
             to reach 9 cycles/limb.
        Packit 5c3484
      • Alpha: GCC 3.4 will introduce __builtin_ctzl,
      • Packit 5c3484
             __builtin_clzl and __builtin_popcountl using
        Packit 5c3484
             the corresponding CIX ct instructions, and
        Packit 5c3484
             __builtin_alpha_cmpbge.  These should give GCC more
        Packit 5c3484
             information about scheduling etc than the asm blocks
        Packit 5c3484
             currently used in longlong.h and gmp-impl.h.
        Packit 5c3484
      • Alpha Unicos: Apparently there's no alloca on this system,
      • Packit 5c3484
             making configure choose the slower
        Packit 5c3484
             malloc-reentrant allocation method.  Is there a better way?
        Packit 5c3484
             Maybe variable-length arrays per notes below.
        Packit 5c3484
      • Alpha Unicos 21164, 21264: .align is not used since it pads
      • Packit 5c3484
             with garbage.  Does the code get the intended slotting required for the
        Packit 5c3484
             claimed speeds?  .align at the start of a function would
        Packit 5c3484
             presumably be safe no matter how it pads.
        Packit 5c3484
      • ARM V5: count_leading_zeros can use the clz
      • Packit 5c3484
             instruction.  For GCC 3.4 and up, do this via __builtin_clzl
        Packit 5c3484
             since then gcc knows it's "predicable".
        Packit 5c3484
      • Itanium: GCC 3.4 introduces __builtin_popcount which can be
      • Packit 5c3484
             used instead of an asm block.  The builtin should give gcc
        Packit 5c3484
             more opportunities for scheduling, bundling and predication.
        Packit 5c3484
             __builtin_ctz similarly (it just uses popcount as per
        Packit 5c3484
             current longlong.h).
        Packit 5c3484
      • UltraSPARC/64: Optimize mpn_mul_1, mpn_addmul_1,
      • Packit 5c3484
             for s2 < 2^32 (or perhaps for any zero 16-bit s2 chunk).  Not sure how
        Packit 5c3484
             much this can improve the speed, though, since the symmetry that we rely
        Packit 5c3484
             on is lost.  Perhaps we can just gain cycles when s2 < 2^16, or more
        Packit 5c3484
             accurately, when two 16-bit s2 chunks which are 16 bits apart are zero.
        Packit 5c3484
      • UltraSPARC/64: Write native mpn_submul_1, analogous to
      • Packit 5c3484
             mpn_addmul_1.
        Packit 5c3484
      • UltraSPARC/64: Write umul_ppmm. Using four
      • Packit 5c3484
             "mulx"s either with an asm block or via the generic C code is
        Packit 5c3484
             about 90 cycles.  Try using fp operations, and also try using karatsuba
        Packit 5c3484
             for just three "mulx"s.
        Packit 5c3484
      • UltraSPARC/32: Rewrite mpn_lshift, mpn_rshift.
      • Packit 5c3484
             Will give 2 cycles/limb.  Trivial modifications of mpn/sparc64 should do.
        Packit 5c3484
      • UltraSPARC/32: Write special mpn_Xmul_1 loops for s2 < 2^16.
      • Packit 5c3484
      • UltraSPARC/32: Use mulx for umul_ppmm if
      • Packit 5c3484
             possible (see commented out code in longlong.h).  This is unlikely to
        Packit 5c3484
             save more than a couple of cycles, so perhaps isn't worth bothering with.
        Packit 5c3484
      • UltraSPARC/32: On Solaris gcc doesn't give us __sparc_v9__
      • Packit 5c3484
             or anything to indicate V9 support when -mcpu=v9 is selected.  See
        Packit 5c3484
             gcc/config/sol2-sld-64.h.  Will need to pass something through from
        Packit 5c3484
             ./configure to select the right code in longlong.h.  (Currently nothing
        Packit 5c3484
             is lost because mulx for multiplying is commented out.)
        Packit 5c3484
      • UltraSPARC/32: mpn_divexact_1 and
      • Packit 5c3484
             mpn_modexact_1c_odd can use a 64-bit inverse and take
        Packit 5c3484
             64-bits at a time from the dividend, as per the 32-bit divisor case in
        Packit 5c3484
             mpn/sparc64/mode1o.c.  This must be done in assembler, since the full
        Packit 5c3484
             64-bit registers (%gN) are not available from C.
        Packit 5c3484
      • UltraSPARC/32: mpn_divexact_by3c can work 64-bits at a time
      • Packit 5c3484
             using mulx, in assembler.  This would be the same as for
        Packit 5c3484
             sparc64.
        Packit 5c3484
      • UltraSPARC: binvert_limb might save a few cycles from
      • Packit 5c3484
             masking down to just the useful bits at each point in the calculation,
        Packit 5c3484
             since mulx speed depends on the highest bit set.  Either
        Packit 5c3484
             explicit masks or small types like short and
        Packit 5c3484
             int ought to work.
        Packit 5c3484
      • Sparc64 HAL R1 popc: This chip reputedly implements
      • Packit 5c3484
             popc properly (see gcc sparc.md).  Would need to recognise
        Packit 5c3484
             it as sparchalr1 or something in configure / config.sub /
        Packit 5c3484
             config.guess.  popc_limb in gmp-impl.h could use this (per
        Packit 5c3484
             commented out code).  count_trailing_zeros could use it too.
        Packit 5c3484
      • PA64: Improve mpn_addmul_1, mpn_submul_1, and
      • Packit 5c3484
             mpn_mul_1.  The current code runs at 11 cycles/limb.  It
        Packit 5c3484
             should be possible to saturate the cache, which will happen at 8
        Packit 5c3484
             cycles/limb (7.5 for mpn_mul_1).  Write special loops for s2 < 2^32;
        Packit 5c3484
             it should be possible to make them run at about 5 cycles/limb.
        Packit 5c3484
      • PPC601: See which of the power or powerpc32 code runs better. Currently
      • Packit 5c3484
             the powerpc32 is used, but only because it's the default for
        Packit 5c3484
             powerpc*.
        Packit 5c3484
      • PPC630: Rewrite mpn_addmul_1, mpn_submul_1, and
      • Packit 5c3484
             mpn_mul_1.  Use both integer and floating-point operations,
        Packit 5c3484
             possibly two floating-point and one integer limb per loop.  Split operands
        Packit 5c3484
             into four 16-bit chunks for fast fp operations.  Should easily reach 9
        Packit 5c3484
             cycles/limb (using one int + one fp), but perhaps even 7 cycles/limb
        Packit 5c3484
             (using one int + two fp).
        Packit 5c3484
      • PPC630: mpn_rshift could do the same sort of unrolled loop
      • Packit 5c3484
             as mpn_lshift.  Some judicious use of m4 might let the two
        Packit 5c3484
             share source code, or with a register to control the loop direction
        Packit 5c3484
             perhaps even share object code.
        Packit 5c3484
      • Implement mpn_mul_basecase and mpn_sqr_basecase
      • Packit 5c3484
             for important machines.  Helping the generic sqr_basecase.c with an
        Packit 5c3484
             mpn_sqr_diagonal might be enough for some of the RISCs.
        Packit 5c3484
      • POWER2/POWER2SC: Schedule mpn_lshift/mpn_rshift.
      • Packit 5c3484
             Will bring time from 1.75 to 1.25 cycles/limb.
        Packit 5c3484
      • X86: Optimize non-MMX mpn_lshift for shifts by 1. (See
      • Packit 5c3484
             Pentium code.)
        Packit 5c3484
      • X86: Good authority has it that in the past an inline rep
      • Packit 5c3484
             movs would upset GCC register allocation for the whole function.
        Packit 5c3484
             Is this still true in GCC 3?  It uses rep movs itself for
        Packit 5c3484
             __builtin_memcpy.  Examine the code for some simple and
        Packit 5c3484
             complex functions to find out.  Inlining rep movs would be
        Packit 5c3484
             desirable, it'd be both smaller and faster.
        Packit 5c3484
      • Pentium P54: mpn_lshift and mpn_rshift can come
      • Packit 5c3484
             down from 6.0 c/l to 5.5 or 5.375 by paying attention to pairing after
        Packit 5c3484
             shrdl and shldl, see mpn/x86/pentium/README.
        Packit 5c3484
      • Pentium P55 MMX: mpn_lshift and mpn_rshift
      • Packit 5c3484
             might benefit from some destination prefetching.
        Packit 5c3484
      • PentiumPro: mpn_divrem_1 might be able to use a
      • Packit 5c3484
             mul-by-inverse, hoping for maybe 30 c/l.
        Packit 5c3484
      • K7: mpn_lshift and mpn_rshift might be able to
      • Packit 5c3484
             do something branch-free for unaligned startups, and shaving one insn
        Packit 5c3484
             from the loop with alternative indexing might save a cycle.
        Packit 5c3484
      • PPC32: Try using fewer registers in the current mpn_lshift.
      • Packit 5c3484
             The pipeline is now extremely deep, perhaps unnecessarily deep.
        Packit 5c3484
      • Fujitsu VPP: Vectorize main functions, perhaps in assembly language.
      • Packit 5c3484
      • Fujitsu VPP: Write mpn_mul_basecase and
      • Packit 5c3484
             mpn_sqr_basecase.  This should use a "vertical multiplication
        Packit 5c3484
             method", to avoid carry propagation.  splitting one of the operands in
        Packit 5c3484
             11-bit chunks.
        Packit 5c3484
      • Pentium: mpn_lshift by 31 should use the special rshift
      • Packit 5c3484
             by 1 code, and vice versa mpn_rshift by 31 should use the
        Packit 5c3484
             special lshift by 1.  This would be best as a jump across to the other
        Packit 5c3484
             routine, could let both live in lshift.asm and omit rshift.asm on finding
        Packit 5c3484
             mpn_rshift already provided.
        Packit 5c3484
      • Cray T3E: Experiment with optimization options. In particular,
      • Packit 5c3484
             -hpipeline3 seems promising.  We should at least up -O to -O2 or -O3.
        Packit 5c3484
      • Cray: mpn_com and mpn_and_n etc very probably
      • Packit 5c3484
             wants a pragma like MPN_COPY_INCR.
        Packit 5c3484
      • Cray vector systems: mpn_lshift, mpn_rshift,
      • Packit 5c3484
             mpn_popcount and mpn_hamdist are nice and small
        Packit 5c3484
             and could be inlined to avoid function calls.
        Packit 5c3484
      • Cray: Variable length arrays seem to be faster than the tal-notreent.c
      • Packit 5c3484
             scheme.  Not sure why, maybe they merely give the compiler more
        Packit 5c3484
             information about aliasing (or the lack thereof).  Would like to modify
        Packit 5c3484
             TMP_ALLOC to use them, or introduce a new scheme.  Memory
        Packit 5c3484
             blocks wanted unconditionally are easy enough, those wanted only
        Packit 5c3484
             sometimes are a problem.  Perhaps a special size calculation to ask for a
        Packit 5c3484
             dummy length 1 when unwanted, or perhaps an inlined subroutine
        Packit 5c3484
             duplicating code under each conditional.  Don't really want to turn
        Packit 5c3484
             everything into a dog's dinner just because Cray don't offer an
        Packit 5c3484
             alloca.
        Packit 5c3484
      • Cray: mpn_get_str on power-of-2 bases ought to vectorize.
      • Packit 5c3484
             Does it?  bits_per_digit and the inner loop over bits in a
        Packit 5c3484
             limb might prevent it.  Perhaps special cases for binary, octal and hex
        Packit 5c3484
             would be worthwhile (very possibly for all processors too).
        Packit 5c3484
      • S390: BSWAP_LIMB_FETCH looks like it could be done with
      • Packit 5c3484
             lrvg, as per glibc sysdeps/s390/s390-64/bits/byteswap.h.
        Packit 5c3484
             This is only for 64-bit mode or something is it, since 32-bit mode has
        Packit 5c3484
             other code?  Also, is it worth using for BSWAP_LIMB too, or
        Packit 5c3484
             would that mean a store and re-fetch?  Presumably that's what comes out
        Packit 5c3484
             in glibc.
        Packit 5c3484
      • Improve count_leading_zeros for 64-bit machines:
      • Packit 5c3484
          
        Packit 5c3484
        	   if ((x &gt&gt 32) == 0) { x &lt&lt= 32; cnt += 32; }
        Packit 5c3484
        	   if ((x &gt&gt 48) == 0) { x &lt&lt= 16; cnt += 16; }
        Packit 5c3484
        	   ... 
        Packit 5c3484
      • IRIX 6 MIPSpro compiler has an __inline which could perhaps
      • Packit 5c3484
             be used in __GMP_EXTERN_INLINE.  What would be the right way
        Packit 5c3484
             to identify suitable versions of that compiler?
        Packit 5c3484
      • IRIX cc is rumoured to have an _int_mult_upper
      • Packit 5c3484
             (in <intrinsics.h> like Cray), but it didn't seem to
        Packit 5c3484
             exist on some IRIX 6.5 systems tried.  If it does actually exist
        Packit 5c3484
             somewhere it would very likely be an improvement over a function call to
        Packit 5c3484
             umul.asm.
        Packit 5c3484
      • mpn_get_str final divisions by the base with
      • Packit 5c3484
             udiv_qrnd_unnorm could use some sort of multiply-by-inverse
        Packit 5c3484
             on suitable machines.  This ends up happening for decimal by presenting
        Packit 5c3484
             the compiler with a run-time constant, but the same for other bases would
        Packit 5c3484
             be good.  Perhaps use could be made of the fact base<256.
        Packit 5c3484
      • mpn_umul_ppmm, mpn_udiv_qrnnd: Return a
      • Packit 5c3484
             structure like div_t to avoid going through memory, in
        Packit 5c3484
             particular helping RISCs that don't do store-to-load forwarding.  Clearly
        Packit 5c3484
             this is only possible if the ABI returns a structure of two
        Packit 5c3484
             mp_limb_ts in registers.
        Packit 5c3484
             
        Packit 5c3484
             On PowerPC, structures are returned in memory on AIX and Darwin.  In SVR4
        Packit 5c3484
             they're returned in registers, except that draft SVR4 had said memory, so
        Packit 5c3484
             it'd be prudent to check which is done.  We can jam the compiler into the
        Packit 5c3484
             right mode if we know how, since all this is purely internal to libgmp.
        Packit 5c3484
             (gcc has an option, though of course gcc doesn't matter since we use
        Packit 5c3484
             inline asm there.)
        Packit 5c3484
        Packit 5c3484
        Packit 5c3484

        New Functionality

        Packit 5c3484
          Packit 5c3484
        • Maybe add mpz_crr (Chinese Remainder Reconstruction).
        • Packit 5c3484
        • Let `0b' and `0B' mean binary input everywhere.
        • Packit 5c3484
        • mpz_init and mpq_init could do lazy allocation.
        • Packit 5c3484
               Set ALLOC(var) to 0 to indicate nothing allocated, and let
          Packit 5c3484
               _mpz_realloc do the initial alloc.  Set
          Packit 5c3484
               z->_mp_d to a dummy that mpz_get_ui and
          Packit 5c3484
               similar can unconditionally fetch from.  Niels Möller has had a go at
          Packit 5c3484
               this.
          Packit 5c3484
               
          Packit 5c3484
               The advantages of the lazy scheme would be:
          Packit 5c3484
               
            Packit 5c3484
                 
          • Initial allocate would be the size required for the first value
          • Packit 5c3484
                      stored, rather than getting 1 limb in mpz_init and then
            Packit 5c3484
                      more or less immediately reallocating.
            Packit 5c3484
                 
          • mpz_init would only store magic values in the
          • Packit 5c3484
                      mpz_t fields, and could be inlined.
            Packit 5c3484
                 
          • A fixed initializer could even be used by applications, like
          • Packit 5c3484
                      mpz_t z = MPZ_INITIALIZER;, which might be convenient
            Packit 5c3484
                      for globals.
            Packit 5c3484
                 
            Packit 5c3484
                 The advantages of the current scheme are:
            Packit 5c3484
                 
              Packit 5c3484
                   
            • mpz_set_ui and other similar routines needn't check the
            • Packit 5c3484
                        size allocated and can just store unconditionally.
              Packit 5c3484
                   
            • mpz_set_ui and perhaps others like
            • Packit 5c3484
                        mpz_tdiv_r_ui and a prospective
              Packit 5c3484
                        mpz_set_ull could be inlined.
              Packit 5c3484
                   
              Packit 5c3484
            • Add mpf_out_raw and mpf_inp_raw. Make sure
            • Packit 5c3484
                   format is portable between 32-bit and 64-bit machines, and between
              Packit 5c3484
                   little-endian and big-endian machines.  A format which MPFR can use too
              Packit 5c3484
                   would be good.
              Packit 5c3484
            • mpn_and_n ... mpn_copyd: Perhaps make the mpn
            • Packit 5c3484
                   logops and copys available in gmp.h, either as library functions or
              Packit 5c3484
                   inlines, with the availability of library functions instantiated in the
              Packit 5c3484
                   generated gmp.h at build time.
              Packit 5c3484
            • mpz_set_str etc variants taking string lengths rather than
            • Packit 5c3484
                   null-terminators.
              Packit 5c3484
            • mpz_andn, mpz_iorn, mpz_nand,
            • Packit 5c3484
                   mpz_nior, mpz_xnor might be useful additions,
              Packit 5c3484
                   if they could share code with the current such functions (which should be
              Packit 5c3484
                   possible).
              Packit 5c3484
            • mpz_and_ui etc might be of use sometimes. Suggested by
            • Packit 5c3484
                   Niels Möller.
              Packit 5c3484
            • mpf_set_str and mpf_inp_str could usefully
            • Packit 5c3484
                   accept 0x, 0b etc when base==0.  Perhaps the exponent could default to
              Packit 5c3484
                   decimal in this case, with a further 0x, 0b etc allowed there.
              Packit 5c3484
                   Eg. 0xFFAA@0x5A.  A leading "0" for octal would match the integers, but
              Packit 5c3484
                   probably something like "0.123" ought not mean octal.
              Packit 5c3484
            • GMP_LONG_LONG_LIMB or some such could become a documented
            • Packit 5c3484
                   feature of gmp.h, so applications could know whether to
              Packit 5c3484
                   printf a limb using %lu or %Lu.
              Packit 5c3484
            • GMP_PRIdMP_LIMB and similar defines following C99
            • Packit 5c3484
                   <inttypes.h> might be of use to applications printing limbs.  But
              Packit 5c3484
                   if GMP_LONG_LONG_LIMB or whatever is added then perhaps this
              Packit 5c3484
                   can easily enough be left to applications.
              Packit 5c3484
            • gmp_printf could accept %b for binary output.
            • Packit 5c3484
                   It'd be nice if it worked for plain int etc too, not just
              Packit 5c3484
                   mpz_t etc.
              Packit 5c3484
            • gmp_printf in fact could usefully accept an arbitrary base,
            • Packit 5c3484
                   for both integer and float conversions.  A base either in the format
              Packit 5c3484
                   string or as a parameter with * should be allowed.  Maybe
              Packit 5c3484
                   &13b (b for base) or something like that.
              Packit 5c3484
            • gmp_printf could perhaps accept mpq_t for float
            • Packit 5c3484
                   conversions, eg. "%.4Qf".  This would be merely for
              Packit 5c3484
                   convenience, but still might be useful.  Rounding would be the same as
              Packit 5c3484
                   for an mpf_t (ie. currently round-to-nearest, but not
              Packit 5c3484
                   actually documented).  Alternately, perhaps a separate
              Packit 5c3484
                   mpq_get_str_point or some such might be more use.  Suggested
              Packit 5c3484
                   by Pedro Gimeno.
              Packit 5c3484
            • mpz_rscan0 or mpz_revscan0 or some such
            • Packit 5c3484
                   searching towards the low end of an integer might match
              Packit 5c3484
                   mpz_scan0 nicely.  Likewise for scan1.
              Packit 5c3484
                   Suggested by Roberto Bagnara.
              Packit 5c3484
            • mpz_bit_subset or some such to test whether one integer is a
            • Packit 5c3484
                   bitwise subset of another might be of use.  Some sort of return value
              Packit 5c3484
                   indicating whether it's a proper or non-proper subset would be good and
              Packit 5c3484
                   wouldn't cost anything in the implementation.  Suggested by Roberto
              Packit 5c3484
                   Bagnara.
              Packit 5c3484
            • mpf_get_ld, mpf_set_ld: Conversions between
            • Packit 5c3484
                   mpf_t and long double, suggested by Dan
              Packit 5c3484
                   Christensen.  Other long double routines might be desirable
              Packit 5c3484
                   too, but mpf would be a start.
              Packit 5c3484
                   
              Packit 5c3484
                   long double is an ANSI-ism, so everything involving it would
              Packit 5c3484
                   need to be suppressed on a K&R compiler.
              Packit 5c3484
                   
              Packit 5c3484
                   There'd be some work to be done by configure to recognise
              Packit 5c3484
                   the format in use, MPFR has a start on this.  Often long
              Packit 5c3484
                   double is the same as double, which is easy but
              Packit 5c3484
                   pretty pointless.  A single float format detector macro could look at
              Packit 5c3484
                   double then long double
              Packit 5c3484
                   
              Packit 5c3484
                   Sometimes there's a compiler option for the size of a long
              Packit 5c3484
                   double, eg. xlc on AIX can use either 64-bit or 128-bit.  It's
              Packit 5c3484
                   probably simplest to regard this as a compiler compatibility issue, and
              Packit 5c3484
                   leave it to users or sysadmins to ensure application and library code is
              Packit 5c3484
                   built the same.
              Packit 5c3484
            • mpz_sqrt_if_perfect_square: When
            • Packit 5c3484
                   mpz_perfect_square_p does its tests it calculates a square
              Packit 5c3484
                   root and then discards it.  For some applications it might be useful to
              Packit 5c3484
                   return that root.  Suggested by Jason Moxham.
              Packit 5c3484
            • mpz_get_ull, mpz_set_ull,
            • Packit 5c3484
                   mpz_get_sll, mpz_get_sll: Conversions for
              Packit 5c3484
                   long long.  These would aid interoperability, though a
              Packit 5c3484
                   mixture of GMP and long long would probably not be too
              Packit 5c3484
                   common.  Since long long is not always available (it's in
              Packit 5c3484
                   C99 and GCC though), disadvantages of using long long in
              Packit 5c3484
                   libgmp.a would be
              Packit 5c3484
                   
                Packit 5c3484
                     
              • Library contents vary according to the build compiler.
              • Packit 5c3484
                     
              • gmp.h would need an ugly #ifdef block to decide if the
              • Packit 5c3484
                          application compiler could take the long long
                Packit 5c3484
                          prototypes.
                Packit 5c3484
                     
              • Some sort of LIBGMP_HAS_LONGLONG might be wanted to
              • Packit 5c3484
                          indicate whether the functions are available.  (Applications using
                Packit 5c3484
                          autoconf could probe the library too.)
                Packit 5c3484
                     
                Packit 5c3484
                     It'd be possible to defer the need for long long to
                Packit 5c3484
                     application compile time, by having something like
                Packit 5c3484
                     mpz_set_2ui called with two halves of a long
                Packit 5c3484
                     long.  Disadvantages of this would be,
                Packit 5c3484
                     
                  Packit 5c3484
                       
                • Bigger code in the application, though perhaps not if a long
                • Packit 5c3484
                            long is normally passed as two halves anyway.
                  Packit 5c3484
                       
                • mpz_get_ull would be a rather big inline, or would have
                • Packit 5c3484
                            to be two function calls.
                  Packit 5c3484
                       
                • mpz_get_sll would be a worse inline, and would put the
                • Packit 5c3484
                            treatment of -0x10..00 into applications (see
                  Packit 5c3484
                            mpz_get_si correctness above).
                  Packit 5c3484
                       
                • Although having libgmp.a independent of the build compiler is nice,
                • Packit 5c3484
                            it sort of sacrifices the capabilities of a good compiler to
                  Packit 5c3484
                            uniformity with inferior ones.
                  Packit 5c3484
                       
                  Packit 5c3484
                       Plain use of long long is probably the lesser evil, if only
                  Packit 5c3484
                       because it makes best use of gcc.  In fact perhaps it would suffice to
                  Packit 5c3484
                       guarantee long long conversions only when using GCC for both
                  Packit 5c3484
                       application and library.  That would cover free software, and we can
                  Packit 5c3484
                       worry about selected vendor compilers later.
                  Packit 5c3484
                       
                  Packit 5c3484
                       In C++ the situation is probably clearer, we demand fairly recent C++ so
                  Packit 5c3484
                       long long should be available always.  We'd probably prefer
                  Packit 5c3484
                       to have the C and C++ the same in respect of long long
                  Packit 5c3484
                       support, but it would be possible to have it unconditionally in gmpxx.h,
                  Packit 5c3484
                       by some means or another.
                  Packit 5c3484
                • mpz_strtoz parsing the same as strtol.
                • Packit 5c3484
                       Suggested by Alexander Kruppa.
                  Packit 5c3484
                  Packit 5c3484
                  Packit 5c3484
                  Packit 5c3484

                  Configuration

                  Packit 5c3484
                  Packit 5c3484
                    Packit 5c3484
                  • Alpha ev7, ev79: Add code to config.guess to detect these. Believe ev7
                  • Packit 5c3484
                         will be "3-1307" in the current switch, but need to verify that.  (On
                    Packit 5c3484
                         OSF, current configfsf.guess identifies ev7 using psrinfo, we need to do
                    Packit 5c3484
                         it ourselves for other systems.)
                    Packit 5c3484
                  • Alpha OSF: Libtool (version 1.5) doesn't seem to recognise this system is
                  • Packit 5c3484
                         "pic always" and ends up running gcc twice with the same options.  This
                    Packit 5c3484
                         is wasteful, but harmless.  Perhaps a newer libtool will be better.
                    Packit 5c3484
                  • ARM: umul_ppmm in longlong.h always uses umull,
                  • Packit 5c3484
                         but is that available only for M series chips or some such?  Perhaps it
                    Packit 5c3484
                         should be configured in some way.
                    Packit 5c3484
                  • HPPA: config.guess should recognize 7000, 7100, 7200, and 8x00.
                  • Packit 5c3484
                  • HPPA: gcc 3.2 introduces a -mschedule=7200 etc parameter,
                  • Packit 5c3484
                         which could be driven by an exact hppa cpu type.
                    Packit 5c3484
                  • Mips: config.guess should say mipsr3000, mipsr4000, mipsr10000, etc.
                  • Packit 5c3484
                         "hinv -c processor" gives lots of information on Irix.  Standard
                    Packit 5c3484
                         config.guess appends "el" to indicate endianness, but
                    Packit 5c3484
                         AC_C_BIGENDIAN seems the best way to handle that for GMP.
                    Packit 5c3484
                  • PowerPC: The function descriptor nonsense for AIX is currently driven by
                  • Packit 5c3484
                         *-*-aix*.  It might be more reliable to do some sort of
                    Packit 5c3484
                         feature test, examining the compiler output perhaps.  It might also be
                    Packit 5c3484
                         nice to merge the aix.m4 files into powerpc-defs.m4.
                    Packit 5c3484
                  • config.m4 is generated only by the configure script, it won't be
                  • Packit 5c3484
                         regenerated by config.status.  Creating it as an AC_OUTPUT
                    Packit 5c3484
                         would work, but it might upset "make" to have things like L$
                    Packit 5c3484
                         get into the Makefiles through AC_SUBST.
                    Packit 5c3484
                         AC_CONFIG_COMMANDS would be the alternative.  With some
                    Packit 5c3484
                         careful m4 quoting the changequote calls might not be
                    Packit 5c3484
                         needed, which might free up the order in which things had to be output.
                    Packit 5c3484
                  • Automake: Latest automake has a CCAS, CCASFLAGS
                  • Packit 5c3484
                         scheme.  Though we probably wouldn't be using its assembler support we
                    Packit 5c3484
                         could try to use those variables in compatible ways.
                    Packit 5c3484
                  • GMP_LDFLAGS could probably be done with plain
                  • Packit 5c3484
                         LDFLAGS already used by automake for all linking.  But with
                    Packit 5c3484
                         a bit of luck the next libtool will pass pretty much all
                    Packit 5c3484
                         CFLAGS through to the compiler when linking, making
                    Packit 5c3484
                         GMP_LDFLAGS unnecessary.
                    Packit 5c3484
                  • mpn/Makeasm.am uses -c and -o together in the
                  • Packit 5c3484
                         .S and .asm rules, but apparently that isn't completely portable (there's
                    Packit 5c3484
                         an autoconf AC_PROG_CC_C_O test for it).  So far we've not
                    Packit 5c3484
                         had problems, but perhaps the rules could be rewritten to use "foo.s" as
                    Packit 5c3484
                         the temporary, or to do a suitable "mv" of the result.  The only danger
                    Packit 5c3484
                         from using foo.s would be if a compile failed and the temporary foo.s
                    Packit 5c3484
                         then looked like the primary source.  Hopefully if the
                    Packit 5c3484
                         SUFFIXES are ordered to have .S and .asm ahead of .s that
                    Packit 5c3484
                         wouldn't happen.  Might need to check.
                    Packit 5c3484
                    Packit 5c3484
                    Packit 5c3484
                    Packit 5c3484

                    Random Numbers

                    Packit 5c3484
                      Packit 5c3484
                    • _gmp_rand is not particularly fast on the linear
                    • Packit 5c3484
                           congruential algorithm and could stand various improvements.
                      Packit 5c3484
                           
                        Packit 5c3484
                             
                      • Make a second seed area within gmp_randstate_t (or
                      • Packit 5c3484
                                  _mp_algdata rather) to save some copying.
                        Packit 5c3484
                             
                      • Make a special case for a single limb 2exp modulus, to
                      • Packit 5c3484
                                  avoid mpn_mul calls.  Perhaps the same for two limbs.
                        Packit 5c3484
                             
                      • Inline the lc code, to avoid a function call and
                      • Packit 5c3484
                                  TMP_ALLOC for every chunk.
                        Packit 5c3484
                             
                      • Perhaps the 2exp and general LC cases should be split,
                      • Packit 5c3484
                                  for clarity (if the general case is retained).
                        Packit 5c3484
                             
                        Packit 5c3484
                      • gmp_randstate_t used for parameters perhaps should become
                      • Packit 5c3484
                             gmp_randstate_ptr the same as other types.
                        Packit 5c3484
                      • Some of the empirical randomness tests could be included in a "make
                      • Packit 5c3484
                             check".  They ought to work everywhere, for a given seed at least.
                        Packit 5c3484
                        Packit 5c3484
                        Packit 5c3484
                        Packit 5c3484

                        C++

                        Packit 5c3484
                          Packit 5c3484
                        • mpz_class(string), etc: Use the C++ global locale to
                        • Packit 5c3484
                               identify whitespace.
                          Packit 5c3484
                               
                          Packit 5c3484
                               mpf_class(string): Use the C++ global locale decimal point,
                          Packit 5c3484
                               rather than the C one.
                          Packit 5c3484
                               
                          Packit 5c3484
                               Consider making these variant mpz_set_str etc forms
                          Packit 5c3484
                               available for mpz_t too, not just mpz_class
                          Packit 5c3484
                               etc.
                          Packit 5c3484
                        • mpq_class operator+=: Don't emit an unnecessary
                        • Packit 5c3484
                               mpq_set(q,q) before mpz_addmul etc.
                          Packit 5c3484
                        • Put various bits of gmpxx.h into libgmpxx, to avoid excessive inlining.
                        • Packit 5c3484
                               Candidates for this would be,
                          Packit 5c3484
                               
                            Packit 5c3484
                                 
                          • mpz_class(const char *), etc: since they're normally
                          • Packit 5c3484
                                      not fast anyway, and we can hide the exception throw.
                            Packit 5c3484
                                 
                          • mpz_class(string), etc: to hide the cstr
                          • Packit 5c3484
                                      needed to get to the C conversion function.
                            Packit 5c3484
                                 
                          • mpz_class string, char* etc constructors: likewise to
                          • Packit 5c3484
                                      hide the throws and conversions.
                            Packit 5c3484
                                 
                          • mpz_class::get_str, etc: to hide the char*
                          • Packit 5c3484
                                      to string conversion and free.  Perhaps
                            Packit 5c3484
                                      mpz_get_str can write directly into a
                            Packit 5c3484
                                      string, to avoid copying.
                            Packit 5c3484
                                      
                            Packit 5c3484
                                      Consider making such string returning variants
                            Packit 5c3484
                                      available for use with plain mpz_t etc too.
                            Packit 5c3484
                                 
                            Packit 5c3484
                            Packit 5c3484
                            Packit 5c3484

                            Miscellaneous

                            Packit 5c3484
                              Packit 5c3484
                            • mpz_gcdext and mpn_gcdext ought to document
                            • Packit 5c3484
                                   what range of values the generated cofactors can take, and preferably
                              Packit 5c3484
                                   ensure the definition uniquely specifies the cofactors for given inputs.
                              Packit 5c3484
                                   A basic extended Euclidean algorithm or multi-step variant leads to
                              Packit 5c3484
                                   |x|<|b| and |y|<|a| or something like that, but there's probably
                              Packit 5c3484
                                   two solutions under just those restrictions.
                              Packit 5c3484
                            • demos/factorize.c: use mpz_divisible_ui_p rather than
                            • Packit 5c3484
                                   mpz_tdiv_qr_ui.  (Of course dividing multiple primes at a
                              Packit 5c3484
                                   time would be better still.)
                              Packit 5c3484
                            • The various test programs use quite a bit of the main
                            • Packit 5c3484
                                   libgmp.  This establishes good cross-checks, but it might be
                              Packit 5c3484
                                   better to use simple reference routines where possible.  Where it's not
                              Packit 5c3484
                                   possible some attention could be paid to the order of the tests, so a
                              Packit 5c3484
                                   libgmp routine is only used for tests once it seems to be
                              Packit 5c3484
                                   good.
                              Packit 5c3484
                            • MUL_FFT_THRESHOLD etc: the FFT thresholds should allow a
                            • Packit 5c3484
                                   return to a previous k at certain sizes.  This arises basically due to
                              Packit 5c3484
                                   the step effect caused by size multiples effectively used for each k.
                              Packit 5c3484
                                   Looking at a graph makes it fairly clear.
                              Packit 5c3484
                            • __gmp_doprnt_mpf does a rather unattractive round-to-nearest
                            • Packit 5c3484
                                   on the string returned by mpf_get_str.  Perhaps some variant
                              Packit 5c3484
                                   of mpf_get_str could be made which would better suit.
                              Packit 5c3484
                              Packit 5c3484
                              Packit 5c3484
                              Packit 5c3484

                              Aids to Development

                              Packit 5c3484
                                Packit 5c3484
                              • Add ASSERTs at the start of each user-visible mpz/mpq/mpf
                              • Packit 5c3484
                                     function to check the validity of each mp?_t parameter, in
                                Packit 5c3484
                                     particular to check they've been mp?_inited.  This might
                                Packit 5c3484
                                     catch elementary mistakes in user programs.  Care would need to be taken
                                Packit 5c3484
                                     over MPZ_TMP_INITed variables used internally.  If nothing
                                Packit 5c3484
                                     else then consistency checks like size<=alloc, ptr not
                                Packit 5c3484
                                     NULL and ptr+size not wrapping around the address space,
                                Packit 5c3484
                                     would be possible.  A more sophisticated scheme could track
                                Packit 5c3484
                                     _mp_d pointers and ensure only a valid one is used.  Such a
                                Packit 5c3484
                                     scheme probably wouldn't be reentrant, not without some help from the
                                Packit 5c3484
                                     system.
                                Packit 5c3484
                              • tune/time.c could try to determine at runtime whether
                              • Packit 5c3484
                                     getrusage and gettimeofday are reliable.
                                Packit 5c3484
                                     Currently we pretend in configure that the dodgy m68k netbsd 1.4.1
                                Packit 5c3484
                                     getrusage doesn't exist.  If a test might take a long time
                                Packit 5c3484
                                     to run then perhaps cache the result in a file somewhere.
                                Packit 5c3484
                              • tune/time.c could choose the default precision based on the
                              • Packit 5c3484
                                     speed_unittime determined, independent of the method in use.
                                Packit 5c3484
                              • Cray vector systems: CPU frequency could be determined from
                              • Packit 5c3484
                                     sysconf(_SC_CLK_TCK), since it seems to be clock cycle
                                Packit 5c3484
                                     based.  Is this true for all Cray systems?  Would like some documentation
                                Packit 5c3484
                                     or something to confirm.
                                Packit 5c3484
                                Packit 5c3484
                                Packit 5c3484
                                Packit 5c3484

                                Documentation

                                Packit 5c3484
                                  Packit 5c3484
                                • mpz_inp_str (etc) doesn't say when it stops reading digits.
                                • Packit 5c3484
                                • mpn_get_str isn't terribly clear about how many digits it
                                • Packit 5c3484
                                       produces.  It'd probably be possible to say at most one leading zero,
                                  Packit 5c3484
                                       which is what both it and mpz_get_str currently do.  But
                                  Packit 5c3484
                                       want to be careful not to bind ourselves to something that might not suit
                                  Packit 5c3484
                                       another implementation.
                                  Packit 5c3484
                                • va_arg doesn't do the right thing with mpz_t
                                • Packit 5c3484
                                       etc directly, but instead needs a pointer type like MP_INT*.
                                  Packit 5c3484
                                       It'd be good to show how to do this, but we'd either need to document
                                  Packit 5c3484
                                       mpz_ptr and friends, or perhaps fallback on something
                                  Packit 5c3484
                                       slightly nasty with void*.
                                  Packit 5c3484
                                  Packit 5c3484
                                  Packit 5c3484
                                  Packit 5c3484

                                  Bright Ideas

                                  Packit 5c3484
                                  Packit 5c3484

                                  The following may or may not be feasible, and aren't likely to get done in the

                                  Packit 5c3484
                                  near future, but are at least worth thinking about.
                                  Packit 5c3484
                                  Packit 5c3484
                                    Packit 5c3484
                                  • Reorganize longlong.h so that we can inline the operations even for the
                                  • Packit 5c3484
                                         system compiler.  When there is no such compiler feature, make calls to
                                    Packit 5c3484
                                         stub functions.  Write such stub functions for as many machines as
                                    Packit 5c3484
                                         possible.
                                    Packit 5c3484
                                  • longlong.h could declare when it's using, or would like to use,
                                  • Packit 5c3484
                                         mpn_umul_ppmm, and the corresponding umul.asm file could be
                                    Packit 5c3484
                                         included in libgmp only in that case, the same as is effectively done for
                                    Packit 5c3484
                                         __clz_tab.  Likewise udiv.asm and perhaps cntlz.asm.  This
                                    Packit 5c3484
                                         would only be a very small space saving, so perhaps not worth the
                                    Packit 5c3484
                                         complexity.
                                    Packit 5c3484
                                  • longlong.h could be built at configure time by concatenating or
                                  • Packit 5c3484
                                         #including fragments from each directory in the mpn path.  This would
                                    Packit 5c3484
                                         select CPU specific macros the same way as CPU specific assembler code.
                                    Packit 5c3484
                                         Code used would no longer depend on cpp predefines, and the current
                                    Packit 5c3484
                                         nested conditionals could be flattened out.
                                    Packit 5c3484
                                  • mpz_get_si returns 0x80000000 for -0x100000000, whereas it's
                                  • Packit 5c3484
                                         sort of supposed to return the low 31 (or 63) bits.  But this is
                                    Packit 5c3484
                                         undocumented, and perhaps not too important.
                                    Packit 5c3484
                                  • mpz_init_set* and mpz_realloc could allocate
                                  • Packit 5c3484
                                         say an extra 16 limbs over what's needed, so as to reduce the chance of
                                    Packit 5c3484
                                         having to do a reallocate if the mpz_t grows a bit more.
                                    Packit 5c3484
                                         This could only be an option, since it'd badly bloat memory usage in
                                    Packit 5c3484
                                         applications using many small values.
                                    Packit 5c3484
                                  • mpq functions could perhaps check for numerator or
                                  • Packit 5c3484
                                         denominator equal to 1, on the assumption that integers or
                                    Packit 5c3484
                                         denominator-only values might be expected to occur reasonably often.
                                    Packit 5c3484
                                  • count_trailing_zeros is used on more or less uniformly
                                  • Packit 5c3484
                                         distributed numbers in a couple of places.  For some CPUs
                                    Packit 5c3484
                                         count_trailing_zeros is slow and it's probably worth handling
                                    Packit 5c3484
                                         the frequently occurring 0 to 2 trailing zeros cases specially.
                                    Packit 5c3484
                                  • mpf_t might like to let the exponent be undefined when
                                  • Packit 5c3484
                                         size==0, instead of requiring it 0 as now.  It should be possible to do
                                    Packit 5c3484
                                         size==0 tests before paying attention to the exponent.  The advantage is
                                    Packit 5c3484
                                         not needing to set exp in the various places a zero result can arise,
                                    Packit 5c3484
                                         which avoids some tedium but is otherwise perhaps not too important.
                                    Packit 5c3484
                                         Currently mpz_set_f and mpf_cmp_ui depend on
                                    Packit 5c3484
                                         exp==0, maybe elsewhere too.
                                    Packit 5c3484
                                  • __gmp_allocate_func: Could use GCC __attribute__
                                  • Packit 5c3484
                                         ((malloc)) on this, though don't know if it'd do much.  GCC 3.0
                                    Packit 5c3484
                                         allows that attribute on functions, but not function pointers (see info
                                    Packit 5c3484
                                         node "Attribute Syntax"), so would need a new autoconf test.  This can
                                    Packit 5c3484
                                         wait until there's a GCC that supports it.
                                    Packit 5c3484
                                  • mpz_add_ui contains two __GMPN_COPYs, one from
                                  • Packit 5c3484
                                         mpn_add_1 and one from mpn_sub_1.  If those two
                                    Packit 5c3484
                                         routines were opened up a bit maybe that code could be shared.  When a
                                    Packit 5c3484
                                         copy needs to be done there's no carry to append for the add, and if the
                                    Packit 5c3484
                                         copy is non-empty no high zero for the sub.
                                    Packit 5c3484
                                    Packit 5c3484
                                    Packit 5c3484
                                    Packit 5c3484

                                    Old and Obsolete Stuff

                                    Packit 5c3484
                                    Packit 5c3484

                                    The following tasks apply to chips or systems that are old and/or obsolete.

                                    Packit 5c3484
                                    It's unlikely anything will be done about them unless anyone is actively using
                                    Packit 5c3484
                                    them.
                                    Packit 5c3484
                                    Packit 5c3484
                                      Packit 5c3484
                                    • Sparc32: The integer based udiv_nfp.asm used to be selected by
                                    • Packit 5c3484
                                           configure --nfp but that option is gone now that autoconf is
                                      Packit 5c3484
                                           used.  The file could go somewhere suitable in the mpn search if any
                                      Packit 5c3484
                                           chips might benefit from it, though it's possible we don't currently
                                      Packit 5c3484
                                           differentiate enough exact cpu types to do this properly.
                                      Packit 5c3484
                                    • VAX D and G format double floats are straightforward and
                                    • Packit 5c3484
                                           could perhaps be handled directly in __gmp_extract_double
                                      Packit 5c3484
                                           and maybe in mpn_get_d, rather than falling back on the
                                      Packit 5c3484
                                           generic code.  (Both formats are detected by configure.)
                                      Packit 5c3484
                                      Packit 5c3484
                                      Packit 5c3484
                                      Packit 5c3484

                                      Packit 5c3484
                                      Packit 5c3484
                                      </body>
                                      Packit 5c3484
                                      </html>
                                      Packit 5c3484
                                      Packit 5c3484
                                      Packit 5c3484
                                      Local variables:
                                      Packit 5c3484
                                      eval: (add-hook 'write-file-hooks 'time-stamp)
                                      Packit 5c3484
                                      time-stamp-start: "This file current as of "
                                      Packit 5c3484
                                      time-stamp-format: "%:d %3b %:y"
                                      Packit 5c3484
                                      time-stamp-end: "\\."
                                      Packit 5c3484
                                      time-stamp-line-limit: 50
                                      Packit 5c3484
                                      End:
                                      Packit 5c3484
                                      -->