All current (2001) S/390 and z/Architecture machines are single-issue,
but some newer machines have a deep pipeline.  Software-pipelining is
therefore beneficial.

* mpn_add_n, mpn_sub_n: Use code along the lines below.  Two-way unrolling
  would be adequate.

  mp_limb_t
  mpn_add_n (mp_ptr rp, mp_srcptr up, mp_srcptr vp, mp_size_t n)
  {
    mp_limb_t a, b, r, cy;
    mp_size_t i;
    mp_limb_t mm = -1;		/* all ones, so r ^ mm is ~r */

    cy = 0;
    /* Point past the end of each operand and count i up from -n to 0.  */
    up += n;
    vp += n;
    rp += n;
    i = -n;
    do
      {
	a = up[i];
	b = vp[i];
	r = a + b + cy;
	rp[i] = r;
	/* Carry out of the top bit.  The shift count 31 assumes 32-bit
	   limbs; it is the limb width minus one.  */
	cy = (((a & b) | ((a | b) & (r ^ mm)))) >> 31;
	i++;
      }
    while (i < 0);
    return cy;
  }

* mpn_lshift, mpn_rshift: Use SLDL/SRDL, and two-way unrolling.

* mpn_mul_1, mpn_addmul_1, mpn_submul_1: For machines with just signed
  multiply (MR), use two loops, similar to the corresponding VAX or POWER
  functions.  Handle carry as in mpn_add_n.
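
The two-loop idea for mpn_mul_1 on a signed-only multiplier can be sketched
in C as follows.  This is an illustration under stated assumptions, not the
actual s390 code: limbs are taken to be 32 bits, smul() is a hypothetical
stand-in for the MR instruction, and the names limb_t, to_signed, smul and
sketch_mul_1 are invented here.  The unsigned high word equals the signed
high word plus u when v has its top bit set, plus v when u has its top bit
set.  Since v is fixed for the whole call, its test can be hoisted out of
the loop, which is one plausible reading of "two loops".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit mp_limb_t.  */
typedef uint32_t limb_t;

/* Reinterpret a 32-bit limb as a signed value without relying on
   implementation-defined narrowing conversions.  */
static int64_t
to_signed (limb_t x)
{
  return (int64_t) x - ((int64_t) (x >> 31) << 32);
}

/* Signed 32x32->64 multiply, playing the role of MR (assumption).  */
static void
smul (limb_t *hi, limb_t *lo, limb_t u, limb_t v)
{
  uint64_t p = (uint64_t) (to_signed (u) * to_signed (v));
  *hi = (limb_t) (p >> 32);
  *lo = (limb_t) p;
}

/* {rp,n} = {up,n} * v, returning the high limb.  Requires n >= 1.  */
limb_t
sketch_mul_1 (limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
  limb_t u, hi, lo, cy = 0;
  size_t i;

  assert (n >= 1);
  if ((v >> 31) == 0)
    {
      /* Loop 1: v is non-negative when read as signed, so only a
	 "negative" u needs compensation.  */
      for (i = 0; i < n; i++)
	{
	  u = up[i];
	  smul (&hi, &lo, u, v);
	  if (u >> 31)
	    hi += v;
	  lo += cy;
	  hi += lo < cy;	/* carry out of the low-word addition */
	  rp[i] = lo;
	  cy = hi;
	}
    }
  else
    {
      /* Loop 2: v is "negative", so u is always added to the high
	 word, and v is added as well when u is "negative".  */
      for (i = 0; i < n; i++)
	{
	  u = up[i];
	  smul (&hi, &lo, u, v);
	  hi += u;
	  if (u >> 31)
	    hi += v;
	  lo += cy;
	  hi += lo < cy;
	  rp[i] = lo;
	  cy = hi;
	}
    }
  return cy;
}
```

The carry here is derived with an unsigned comparison for brevity.  An
assembly version would presumably use the logical formula from mpn_add_n,
as the note above suggests.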