public inbox for gcc-cvs@sourceware.org
* [gcc r14-8835] wide-int: Fix mul_internal overflow handling [PR113753]
@ 2024-02-07  8:45 Jakub Jelinek
  0 siblings, 0 replies; only message in thread
From: Jakub Jelinek @ 2024-02-07  8:45 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:d755a82079d61783eb0a573b0c74024d29359e4c

commit r14-8835-gd755a82079d61783eb0a573b0c74024d29359e4c
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Feb 7 09:44:33 2024 +0100

    wide-int: Fix mul_internal overflow handling [PR113753]
    
    As the following testcases show, the overflow (needs_overflow) and high
    handling in wi::mul_internal only work correctly either for small
    precisions (less than or equal to 32, which are handled by the earlier,
    simpler code rather than the full Knuth multiplication algorithm) or for
    precisions which are a multiple of HOST_BITS_PER_WIDE_INT, so it happened
    to work fine for most pre-_BitInt era types (and for large bitfields we
    probably didn't have good coverage, or were lucky and nothing asked
    whether there was an overflow; I think high multiplication is typically
    used only when we have an optab in the corresponding mode).
    E.g. on the gcc.dg/bitint-87.c testcase, overflow warnings were
    emitted only in the / 2wb * 3wb _BitInt(128) cases, but not in the
    other precisions.
    
    I found 3 issues when prec > HOST_BITS_PER_HALF_WIDE_INT and
    (prec % HOST_BITS_PER_WIDE_INT) != 0:
    1) the checking for overflow was simply checking the values of the
       r array members from half_blocks_needed to 2 * half_blocks_needed - 1,
       for UNSIGNED overflow checking if any of them is non-zero, for
       SIGNED comparing them if any is different from top where top is computed
       from the sign bit of the result (see below); similarly, the highpart
       multiplication simply picks the half limbs at r + half_blocks_needed
       offset; and furthermore, for SIGNED there is an adjustment if either
       operand was negative which also just walks r half-limbs from
       half_blocks_needed onwards;
       this works fine for precisions like 64 or 128, but not for precisions
       like 129, 159, 160 or 161; there it would need to walk the bits in the
       half-limbs starting right above the most significant bit of the base
       precision, which can mean up to a whole half-limb plus all but one
       bit of the half-limb below it
    2) as the comment says, the multiplication is originally done as unsigned
       multiplication, with adjustment of the high bits which subtracts the
       other operand once:
          if (wi::neg_p (op1))
            {
              b = 0;
              for (i = 0; i < half_blocks_needed; i++)
                {
                  t = (unsigned HOST_WIDE_INT)r[i + half_blocks_needed]
                    - (unsigned HOST_WIDE_INT)v[i] - b;
                  r[i + half_blocks_needed] = t & HALF_INT_MASK;
                  b = t >> (HOST_BITS_PER_WIDE_INT - 1);
                }
            }
       and similarly for the other one.  Now, this also only works nicely if
       a negative operand has just a single sign bit set in the given precision;
       but we were unpacking the operands with wi_unpack (..., SIGNED);, so
       say for the negative operand in 129-bit precision, that means the least
       significant bit of u[half_blocks_needed - 2] (or v instead of u depending
       on which argument it is) is the set sign bit, but then there are 31
       further copies of the sign bit in u[half_blocks_needed - 2] and
       further 32 copies in u[half_blocks_needed - 1]; the above adjustment
       for signed operands doesn't do the right job in such cases, as it
       would need to subtract the other operand many more times
    3) the computation of top for SIGNED
              top = r[(half_blocks_needed) - 1];
              top = SIGN_MASK (top << (HOST_BITS_PER_WIDE_INT / 2));
              top &= mask;
       also uses the most significant bit of the result within prec only
       if prec is a multiple of HOST_BITS_PER_WIDE_INT; otherwise we need
       to look at a different bit, and sometimes that bit is even in
       r[half_blocks_needed - 2]
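    The unsigned-multiply-then-correct scheme that 2) relies on can be
    illustrated at a small scale.  The following is a hypothetical model
    (prec = 8 with a 16-bit full product, not the wide-int code itself);
    the identity (A - 2^8)(B - 2^8) = A*B - 2^8*A - 2^8*B + 2^16 shows
    that each negative operand costs exactly one subtraction of the other
    operand from the high half, but only when the negative operand is
    zero-extended into the working width:

```c
#include <stdint.h>

/* Model of wi::mul_internal's signed adjustment at prec = 8: compute
   the 16-bit product of two signed 8-bit values via an unsigned
   multiplication, then subtract the other operand (shifted into the
   high half) once per negative operand.  */
uint16_t
signed_mul_via_unsigned (int8_t a, int8_t b)
{
  uint16_t A = (uint8_t) a;        /* zero-extend, like wi_unpack UNSIGNED */
  uint16_t B = (uint8_t) b;
  uint32_t r = (uint32_t) A * B;   /* unsigned full product */
  if (a < 0)
    r -= (uint32_t) B << 8;        /* A really was a + 2^8 */
  if (b < 0)
    r -= (uint32_t) A << 8;        /* B really was b + 2^8 */
  return (uint16_t) r;
}
```

    A sign-extended representation would carry extra copies of the sign
    bit above the precision, and a single subtraction would no longer
    suffice; the patch's switch of wi_unpack from SIGNED to UNSIGNED
    restores this single-representation precondition for precisions
    that are not a multiple of the limb size.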
    
    For 1), while for UNSIGNED overflow it would be fairly easy to check
    the bits above prec in the r half-limbs for being non-zero, doing all
    the shifts also in the SIGNED adjustment code in two further locations
    and finally in the high handling (unless we want to assert that nobody
    does a highpart multiply for such precisions) would be quite ugly and
    hard to maintain.  So I instead chose (implemented in the second hunk)
    to shift the beyond-precision bits up such that the expectations of the
    rest of the code are met, i.e. the LSB of r[half_blocks_needed] after
    the adjustment is the bit immediately above the precision, etc.  We
    don't need to care about the bits shifted out, because the
    multiplication yields at most 2 * prec bits.
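    The effect of that hunk can be modelled at a small scale.  In the
    hypothetical sketch below, half-limbs are 8 bits instead of 32, and
    align_above_prec is an invented name; it rewrites only the high
    half-limbs so that the bit immediately above prec becomes the LSB of
    r[hbn], leaving the low product in r[0..hbn-1] untouched:

```c
#include <stdint.h>

#define HALF_BITS 8  /* small-scale stand-in for HOST_BITS_PER_HALF_WIDE_INT */

/* Given the 2*hbn half-limb product r of two prec-bit operands, with
   prec not a multiple of 2*HALF_BITS, shift the bits above prec upwards
   so that bit prec lands on the LSB of r[hbn].  Bits shifted out of the
   top don't matter: the product has at most 2*prec bits.  */
static void
align_above_prec (uint8_t *r, unsigned int hbn, unsigned int prec)
{
  unsigned int lshift = hbn * HALF_BITS - prec;  /* bits to move up */
  unsigned int skip = lshift / HALF_BITS;        /* whole half-limbs */
  unsigned int s = lshift % HALF_BITS;           /* remaining bit shift */
  for (unsigned int i = 2 * hbn - 1; i >= hbn; i--)
    {
      uint8_t hi = r[i - skip];
      uint8_t lo = i - skip > 0 ? r[i - skip - 1] : 0;
      r[i] = s ? (uint8_t) ((hi << s) | (lo >> (HALF_BITS - s))) : hi;
    }
}
```

    E.g. for prec = 9 and the product 511 * 511 = 0x3FC01 stored as
    little-endian half-limbs {0x01, 0xFC, 0x03, 0x00}, the bits above
    prec are 0x3FC01 >> 9 = 0x1FE, and after the adjustment r[2] and
    r[3] hold exactly 0xFE and 0x01.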
    
    For 2), the patch changes the wi_unpack argument from SIGNED to UNSIGNED,
    so that we get all zero bits above the precision.
    
    And finally for 3), it does shifts and perhaps picks a lower r
    half-limb so that it uses the actual MSB of the result within prec.
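    The intent of the fixed top computation reduces to reading bit
    prec - 1 of the product.  A hypothetical small-scale model (8-bit
    half-limbs, invented helper name) of what the index and shift
    arithmetic in the third hunk selects:

```c
#include <stdint.h>

#define HALF_BITS 8  /* small-scale stand-in for HOST_BITS_PER_HALF_WIDE_INT */

/* Model of the fixed `top` computation for SIGNED overflow checking:
   the sign of the truncated result is bit prec - 1 of the product r,
   which for prec not a multiple of the limb size may live one
   half-limb below r[half_blocks_needed - 1].  Returns the all-ones
   half-limb mask if the result is negative, 0 otherwise.  */
static uint8_t
top_mask (const uint8_t *r, unsigned int prec)
{
  unsigned int bit = prec - 1;  /* position of the MSB within prec */
  return (r[bit / HALF_BITS] >> (bit % HALF_BITS)) & 1 ? 0xFF : 0;
}
```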
    
    2024-02-07  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/113753
            * wide-int.cc (wi::mul_internal): Unpack op1val and op2val with
            UNSIGNED rather than SIGNED.  If high or needs_overflow and prec is
            not a multiple of HOST_BITS_PER_WIDE_INT, shift left bits above prec
            so that they start with r[half_blocks_needed] lowest bit.  Fix up
            computation of top mask for SIGNED.
    
            * gcc.dg/torture/bitint-56.c: New test.
            * gcc.dg/bitint-87.c: New test.

Diff:
---
 gcc/testsuite/gcc.dg/bitint-87.c         | 24 ++++++++++++++++++++++
 gcc/testsuite/gcc.dg/torture/bitint-56.c | 25 +++++++++++++++++++++++
 gcc/wide-int.cc                          | 34 ++++++++++++++++++++++++++++----
 3 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/bitint-87.c b/gcc/testsuite/gcc.dg/bitint-87.c
new file mode 100644
index 000000000000..4dbd7eeb33e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/bitint-87.c
@@ -0,0 +1,24 @@
+/* PR tree-optimization/113753 */
+/* { dg-do compile { target bitint575 } } */
+/* { dg-options "-std=gnu23" } */
+
+_BitInt(161) a = 1461501637330902918203684832716283019655932542975wb / 2wb * 2wb;
+_BitInt(160) b = 730750818665451459101842416358141509827966271487wb / 2wb * 2wb;
+_BitInt(159) c = 365375409332725729550921208179070754913983135743wb / 2wb * 2wb;
+_BitInt(129) d = 340282366920938463463374607431768211455wb / 2wb * 2wb;
+_BitInt(128) e = 170141183460469231731687303715884105727wb / 2wb * 2wb;
+_BitInt(161) f = (-1461501637330902918203684832716283019655932542975wb - 1wb) / 2wb * 2wb;
+_BitInt(160) g = (-730750818665451459101842416358141509827966271487wb - 1wb) / 2wb * 2wb;
+_BitInt(159) h = (-365375409332725729550921208179070754913983135743wb - 1wb) / 2wb * 2wb;
+_BitInt(129) i = (-340282366920938463463374607431768211455wb - 1wb) / 2wb * 2wb;
+_BitInt(128) j = (-170141183460469231731687303715884105727wb - 1wb) / 2wb * 2wb;
+_BitInt(161) k = 1461501637330902918203684832716283019655932542975wb / 2wb * 3wb;		/* { dg-warning "integer overflow in expression of type '_BitInt\\\(161\\\)' results in" } */
+_BitInt(160) l = 730750818665451459101842416358141509827966271487wb / 2wb * 3wb;		/* { dg-warning "integer overflow in expression of type '_BitInt\\\(160\\\)' results in" } */
+_BitInt(159) m = 365375409332725729550921208179070754913983135743wb / 2wb * 3wb;		/* { dg-warning "integer overflow in expression of type '_BitInt\\\(159\\\)' results in" } */
+_BitInt(129) n = 340282366920938463463374607431768211455wb / 2wb * 3wb;				/* { dg-warning "integer overflow in expression of type '_BitInt\\\(129\\\)' results in" } */
+_BitInt(128) o = 170141183460469231731687303715884105727wb / 2wb * 3wb;				/* { dg-warning "integer overflow in expression of type '_BitInt\\\(128\\\)' results in" } */
+_BitInt(161) p = (-1461501637330902918203684832716283019655932542975wb - 1wb) / 2wb * 3wb;	/* { dg-warning "integer overflow in expression of type '_BitInt\\\(161\\\)' results in" } */
+_BitInt(160) q = (-730750818665451459101842416358141509827966271487wb - 1wb) / 2wb * 3wb;	/* { dg-warning "integer overflow in expression of type '_BitInt\\\(160\\\)' results in" } */
+_BitInt(159) r = (-365375409332725729550921208179070754913983135743wb - 1wb) / 2wb * 3wb;	/* { dg-warning "integer overflow in expression of type '_BitInt\\\(159\\\)' results in" } */
+_BitInt(129) s = (-340282366920938463463374607431768211455wb - 1wb) / 2wb * 3wb;		/* { dg-warning "integer overflow in expression of type '_BitInt\\\(129\\\)' results in" } */
+_BitInt(128) t = (-170141183460469231731687303715884105727wb - 1wb) / 2wb * 3wb;		/* { dg-warning "integer overflow in expression of type '_BitInt\\\(128\\\)' results in" } */
diff --git a/gcc/testsuite/gcc.dg/torture/bitint-56.c b/gcc/testsuite/gcc.dg/torture/bitint-56.c
new file mode 100644
index 000000000000..6a76a812a0b8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/bitint-56.c
@@ -0,0 +1,25 @@
+/* PR tree-optimization/113753 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 129
+unsigned _BitInt(128)
+foo (unsigned u, unsigned _BitInt(128) a, unsigned _BitInt(128) b)
+{
+  unsigned _BitInt(129) m = a % b;
+  return u * m / u;
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 129
+  if (foo (0xfa637c33, 0x37af7fe8b0000000000000000wb,
+	   0xfffffffff0000000000000000wb)
+      != 0x16f7e93f6d726b38b38d0b753wb)
+    __builtin_abort ();
+#endif
+}
diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc
index 5067493d5084..a8a6443ff7f9 100644
--- a/gcc/wide-int.cc
+++ b/gcc/wide-int.cc
@@ -1483,8 +1483,8 @@ wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1val,
     }
 
   /* We do unsigned mul and then correct it.  */
-  wi_unpack (u, op1val, op1len, half_blocks_needed, prec, SIGNED);
-  wi_unpack (v, op2val, op2len, half_blocks_needed, prec, SIGNED);
+  wi_unpack (u, op1val, op1len, half_blocks_needed, prec, UNSIGNED);
+  wi_unpack (v, op2val, op2len, half_blocks_needed, prec, UNSIGNED);
 
   /* The 2 is for a full mult.  */
   memset (r, 0, half_blocks_needed * 2
@@ -1503,6 +1503,28 @@ wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1val,
       r[j + half_blocks_needed] = k;
     }
 
+  unsigned int shift;
+  if ((high || needs_overflow) && (shift = prec % HOST_BITS_PER_WIDE_INT) != 0)
+    {
+      /* The high or needs_overflow code assumes that the high bits
+	 only appear from r[half_blocks_needed] up to
+	 r[half_blocks_needed * 2 - 1].  If prec is not a multiple
+	 of HOST_BITS_PER_WIDE_INT, shift the bits above prec up
+	 to make that code simple.  */
+      if (shift == HOST_BITS_PER_HALF_WIDE_INT)
+	memmove (&r[half_blocks_needed], &r[half_blocks_needed - 1],
+		 sizeof (r[0]) * half_blocks_needed);
+      else
+	{
+	  unsigned int skip = shift < HOST_BITS_PER_HALF_WIDE_INT;
+	  if (!skip)
+	    shift -= HOST_BITS_PER_HALF_WIDE_INT;
+	  for (i = 2 * half_blocks_needed - 1; i >= half_blocks_needed; i--)
+	    r[i] = ((r[i - skip] << (-shift % HOST_BITS_PER_HALF_WIDE_INT))
+		    | (r[i - skip - 1] >> shift));
+	}
+    }
+
   /* We did unsigned math above.  For signed we must adjust the
      product (assuming we need to see that).  */
   if (sgn == SIGNED && (high || needs_overflow))
@@ -1543,8 +1565,12 @@ wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1val,
 	top = 0;
       else
 	{
-	  top = r[(half_blocks_needed) - 1];
-	  top = SIGN_MASK (top << (HOST_BITS_PER_WIDE_INT / 2));
+	  top = r[half_blocks_needed - 1
+		  - ((-prec % HOST_BITS_PER_WIDE_INT)
+		     >= HOST_BITS_PER_HALF_WIDE_INT)];
+	  top = SIGN_MASK (((unsigned HOST_WIDE_INT) top)
+			   << (HOST_BITS_PER_WIDE_INT / 2
+			       + (-prec % HOST_BITS_PER_HALF_WIDE_INT)));
 	  top &= mask;
 	}
