public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Simplify X * C1 == C2 with undefined overflow
@ 2020-08-01  7:28 Marc Glisse
  2020-08-03  8:51 ` Richard Biener
  2020-08-09 21:24 ` Simplify X * C1 == C2 with wrapping overflow Marc Glisse
  0 siblings, 2 replies; 13+ messages in thread
From: Marc Glisse @ 2020-08-01  7:28 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: TEXT/PLAIN, Size: 593 bytes --]

Hello,

this transformation is quite straightforward, without overflow, 3*X==15 is 
the same as X==5 and 3*X==5 cannot happen. Adding a single_use restriction 
for the first case didn't seem necessary, although of course it can 
slightly increase register pressure in some cases.

Bootstrap+regtest on x86_64-pc-linux-gnu.

2020-08-03  Marc Glisse  <marc.glisse@inria.fr>

  	PR tree-optimization/95433
  	* match.pd (X * C1 == C2): New transformation.

  	* gcc.c-torture/execute/pr23135.c: Add -fwrapv to avoid
 	undefined behavior.
  	* gcc.dg/tree-ssa/pr95433.c: New file.

-- 
Marc Glisse

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: TEXT/x-diff; name=equal.patch, Size: 2026 bytes --]

diff --git a/gcc/match.pd b/gcc/match.pd
index a052c9e3dbc..78fd8cf5d9e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1578,6 +1578,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	&& wi::neg_p (wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@1))))
     (cmp @2 @0))))))
 
+/* For integral types with undefined overflow fold
+   x * C1 == C2 into x == C2 / C1 or false.  */
+(for cmp (eq ne)
+ (simplify
+  (cmp (mult @0 INTEGER_CST@1) INTEGER_CST@2)
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+       && wi::to_wide (@1) != 0)
+   (with { widest_int quot; }
+    (if (wi::multiple_of_p (wi::to_widest (@2), wi::to_widest (@1),
+			    TYPE_SIGN (TREE_TYPE (@0)), &quot))
+     (cmp @0 { wide_int_to_tree (TREE_TYPE (@0), quot); })
+     { build_int_cst (type, cmp == NE_EXPR); })))))
+
 /* (X - 1U) <= INT_MAX-1U into (int) X > 0.  */
 (for cmp (le gt)
      icmp (gt le)
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr23135.c b/gcc/testsuite/gcc.c-torture/execute/pr23135.c
index e740ff52874..ef9b7efc9c4 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr23135.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr23135.c
@@ -1,7 +1,7 @@
 /* Based on execute/simd-1.c, modified by joern.rennecke@st.com to
    trigger a reload bug.  Verified for gcc mainline from 20050722 13:00 UTC
    for sh-elf -m4 -O2.  */
-/* { dg-options "-Wno-psabi" } */
+/* { dg-options "-Wno-psabi -fwrapv" } */
 /* { dg-add-options stack_size } */
 
 #ifndef STACK_SIZE
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95433.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95433.c
new file mode 100644
index 00000000000..4e161ee26cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr95433.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+int f(int x){return x*7==17;}
+int g(int x){return x*3==15;}
+
+/* { dg-final { scan-tree-dump "return 0;" "optimized" } } */
+/* { dg-final { scan-tree-dump "== 5;" "optimized" } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Simplify X * C1 == C2 with undefined overflow
  2020-08-01  7:28 Simplify X * C1 == C2 with undefined overflow Marc Glisse
@ 2020-08-03  8:51 ` Richard Biener
  2020-08-04 15:38   ` Marc Glisse
  2020-08-09 21:24 ` Simplify X * C1 == C2 with wrapping overflow Marc Glisse
  1 sibling, 1 reply; 13+ messages in thread
From: Richard Biener @ 2020-08-03  8:51 UTC (permalink / raw)
  To: Marc Glisse; +Cc: GCC Patches

On Sat, Aug 1, 2020 at 9:29 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> Hello,
>
> this transformation is quite straightforward, without overflow, 3*X==15 is
> the same as X==5 and 3*X==5 cannot happen. Adding a single_use restriction
> for the first case didn't seem necessary, although of course it can
> slightly increase register pressure in some cases.
>
> Bootstrap+regtest on x86_64-pc-linux-gnu.

OK with using constant_boolean_node (cmp == NE_EXPR, type).

ISTR we had the x * 0 == CST simplification somewhere
but maybe it was x * y == 0 ...  ah, yes:

/* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the
   signed arithmetic case.  That form is created by the compiler
   often enough for folding it to be of value.  One example is in
   computing loop trip counts after Operator Strength Reduction.  */
(for cmp (simple_comparison)
     scmp (swapped_simple_comparison)
 (simplify
  (cmp (mult@3 @0 INTEGER_CST@1) integer_zerop@2)

As it is placed after your pattern it will be never matched I think
(but we don't warn because of INTEGER_CST vs. integer_zerop).

But I think your pattern subsumes it besides of the X * 0 == 0
compare - oh, and the other pattern also handles relational compares
(those will still trigger).

Maybe place the patterns next to each other?  Also see whether
moving yours after the above will cease the testcases to be handled
because it's no longer matched - if not this might be the better
order.

Richard.

> 2020-08-03  Marc Glisse  <marc.glisse@inria.fr>
>
>         PR tree-optimization/95433
>         * match.pd (X * C1 == C2): New transformation.
>
>         * gcc.c-torture/execute/pr23135.c: Add -fwrapv to avoid
>         undefined behavior.
>         * gcc.dg/tree-ssa/pr95433.c: New file.
>
> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Simplify X * C1 == C2 with undefined overflow
  2020-08-03  8:51 ` Richard Biener
@ 2020-08-04 15:38   ` Marc Glisse
  0 siblings, 0 replies; 13+ messages in thread
From: Marc Glisse @ 2020-08-04 15:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On Mon, 3 Aug 2020, Richard Biener wrote:

> On Sat, Aug 1, 2020 at 9:29 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>>
>> Hello,
>>
>> this transformation is quite straightforward, without overflow, 3*X==15 is
>> the same as X==5 and 3*X==5 cannot happen. Adding a single_use restriction
>> for the first case didn't seem necessary, although of course it can
>> slightly increase register pressure in some cases.
>>
>> Bootstrap+regtest on x86_64-pc-linux-gnu.
>
> OK with using constant_boolean_node (cmp == NE_EXPR, type).
>
> ISTR we had the x * 0 == CST simplification somewhere
> but maybe it was x * y == 0 ...  ah, yes:
>
> /* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the
>   signed arithmetic case.  That form is created by the compiler
>   often enough for folding it to be of value.  One example is in
>   computing loop trip counts after Operator Strength Reduction.  */
> (for cmp (simple_comparison)
>     scmp (swapped_simple_comparison)
> (simplify
>  (cmp (mult@3 @0 INTEGER_CST@1) integer_zerop@2)
>
> As it is placed after your pattern it will be never matched I think
> (but we don't warn because of INTEGER_CST vs. integer_zerop).
>
> But I think your pattern subsumes it besides of the X * 0 == 0
> compare - oh, and the other pattern also handles relational compares
> (those will still trigger).
>
> Maybe place the patterns next to each other?  Also see whether
> moving yours after the above will cease the testcases to be handled
> because it's no longer matched - if not this might be the better
> order.

I moved it after, it still works, so I pushed the patch. Note that the 
other transformation has a single_use restriction, while this one doesn't, 
that's not very consistent, but also hopefully not so important...

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Simplify X * C1 == C2 with wrapping overflow
  2020-08-01  7:28 Simplify X * C1 == C2 with undefined overflow Marc Glisse
  2020-08-03  8:51 ` Richard Biener
@ 2020-08-09 21:24 ` Marc Glisse
  2020-08-10  8:39   ` Jakub Jelinek
  1 sibling, 1 reply; 13+ messages in thread
From: Marc Glisse @ 2020-08-09 21:24 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: TEXT/PLAIN, Size: 624 bytes --]

Odd numbers are invertible in Z / 2^n Z, so X * C1 == C2 can be rewritten 
as X == C2 * inv(C1) when overflow wraps.

mod_inv should probably be updated to better match the other wide_int 
functions, but that's a separate issue.

Bootstrap+regtest on x86_64-pc-linux-gnu.

2020-08-10  Marc Glisse  <marc.glisse@inria.fr>

 	PR tree-optimization/95433
 	* match.pd (X * C1 == C2): Handle wrapping overflow.
 	* expr.c (maybe_optimize_mod_cmp): Qualify call to mod_inv.
 	(mod_inv): Move...
 	* wide-int.cc (mod_inv): ... here.
 	* wide-int.h (mod_inv): Declare it.

 	* gcc.dg/tree-ssa/pr95433-2.c: New file.

-- 
Marc Glisse

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: TEXT/x-diff; name=equal2.patch, Size: 5965 bytes --]

diff --git a/gcc/expr.c b/gcc/expr.c
index a150fa0d3b5..ebf0c9e4797 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11859,38 +11859,6 @@ string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
   return init;
 }
 \f
-/* Compute the modular multiplicative inverse of A modulo M
-   using extended Euclid's algorithm.  Assumes A and M are coprime.  */
-static wide_int
-mod_inv (const wide_int &a, const wide_int &b)
-{
-  /* Verify the assumption.  */
-  gcc_checking_assert (wi::eq_p (wi::gcd (a, b), 1));
-
-  unsigned int p = a.get_precision () + 1;
-  gcc_checking_assert (b.get_precision () + 1 == p);
-  wide_int c = wide_int::from (a, p, UNSIGNED);
-  wide_int d = wide_int::from (b, p, UNSIGNED);
-  wide_int x0 = wide_int::from (0, p, UNSIGNED);
-  wide_int x1 = wide_int::from (1, p, UNSIGNED);
-
-  if (wi::eq_p (b, 1))
-    return wide_int::from (1, p, UNSIGNED);
-
-  while (wi::gt_p (c, 1, UNSIGNED))
-    {
-      wide_int t = d;
-      wide_int q = wi::divmod_trunc (c, d, UNSIGNED, &d);
-      c = t;
-      wide_int s = x0;
-      x0 = wi::sub (x1, wi::mul (q, x0));
-      x1 = s;
-    }
-  if (wi::lt_p (x1, 0, SIGNED))
-    x1 += d;
-  return x1;
-}
-
 /* Optimize x % C1 == C2 for signed modulo if C1 is a power of two and C2
    is non-zero and C3 ((1<<(prec-1)) | (C1 - 1)):
    for C2 > 0 to x & C3 == C2
@@ -12101,7 +12069,7 @@ maybe_optimize_mod_cmp (enum tree_code code, tree *arg0, tree *arg1)
   w = wi::lrshift (w, shift);
   wide_int a = wide_int::from (w, prec + 1, UNSIGNED);
   wide_int b = wi::shifted_mask (prec, 1, false, prec + 1);
-  wide_int m = wide_int::from (mod_inv (a, b), prec, UNSIGNED);
+  wide_int m = wide_int::from (wi::mod_inv (a, b), prec, UNSIGNED);
   tree c3 = wide_int_to_tree (type, m);
   tree c5 = NULL_TREE;
   wide_int d, e;
diff --git a/gcc/match.pd b/gcc/match.pd
index 7e5c5a6eae6..c3b88168ac4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3828,7 +3828,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
      (cmp @0 @2))))))
 
 /* For integral types with undefined overflow fold
-   x * C1 == C2 into x == C2 / C1 or false.  */
+   x * C1 == C2 into x == C2 / C1 or false.
+   If overflow wraps and C1 is odd, simplify to x == C2 / C1 in the ring
+   Z / 2^n Z.  */
 (for cmp (eq ne)
  (simplify
   (cmp (mult @0 INTEGER_CST@1) INTEGER_CST@2)
@@ -3839,7 +3841,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     (if (wi::multiple_of_p (wi::to_widest (@2), wi::to_widest (@1),
 			    TYPE_SIGN (TREE_TYPE (@0)), &quot))
      (cmp @0 { wide_int_to_tree (TREE_TYPE (@0), quot); })
-     { constant_boolean_node (cmp == NE_EXPR, type); })))))
+     { constant_boolean_node (cmp == NE_EXPR, type); }))
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+	&& TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
+	&& (wi::bit_and (wi::to_wide (@1), 1) == 1))
+    (cmp @0
+     {
+       tree itype = TREE_TYPE (@0);
+       int p = TYPE_PRECISION (itype);
+       wide_int m = wi::one (p + 1) << p;
+       wide_int a = wide_int::from (wi::to_wide (@1), p + 1, UNSIGNED);
+       wide_int i = wide_int::from (wi::mod_inv (a, m),
+				    p, TYPE_SIGN (itype));
+       wide_int_to_tree (itype, wi::mul (i, wi::to_wide (@2)));
+     })))))
 
 /* Simplify comparison of something with itself.  For IEEE
    floating-point, we can only do some of these simplifications.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95433-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95433-2.c
new file mode 100644
index 00000000000..c830a3d195f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr95433-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fwrapv -fdump-tree-gimple" } */
+
+typedef __INT32_TYPE__ int32_t;
+typedef unsigned __INT32_TYPE__ uint32_t;
+
+int e(int32_t x){return 3*x==5;}
+int f(int32_t x){return 3*x==-5;}
+int g(int32_t x){return -3*x==5;}
+int h(int32_t x){return 7*x==3;}
+int i(uint32_t x){return 7*x==3;}
+
+/* { dg-final { scan-tree-dump-times "== 1431655767" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "== -1431655767" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "== 613566757" 2 "gimple" } } */
diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc
index 03be0074816..f4d949c38a0 100644
--- a/gcc/wide-int.cc
+++ b/gcc/wide-int.cc
@@ -2223,6 +2223,39 @@ wi::round_up_for_mask (const wide_int &val, const wide_int &mask)
   return (val | tmp) & -tmp;
 }
 
+/* Compute the modular multiplicative inverse of A modulo B
+   using extended Euclid's algorithm.  Assumes A and B are coprime,
+   and that A and B have the same precision.  */
+wide_int
+wi::mod_inv (const wide_int &a, const wide_int &b)
+{
+  /* Verify the assumption.  */
+  gcc_checking_assert (wi::eq_p (wi::gcd (a, b), 1));
+
+  unsigned int p = a.get_precision () + 1;
+  gcc_checking_assert (b.get_precision () + 1 == p);
+  wide_int c = wide_int::from (a, p, UNSIGNED);
+  wide_int d = wide_int::from (b, p, UNSIGNED);
+  wide_int x0 = wide_int::from (0, p, UNSIGNED);
+  wide_int x1 = wide_int::from (1, p, UNSIGNED);
+
+  if (wi::eq_p (b, 1))
+    return wide_int::from (1, p, UNSIGNED);
+
+  while (wi::gt_p (c, 1, UNSIGNED))
+    {
+      wide_int t = d;
+      wide_int q = wi::divmod_trunc (c, d, UNSIGNED, &d);
+      c = t;
+      wide_int s = x0;
+      x0 = wi::sub (x1, wi::mul (q, x0));
+      x1 = s;
+    }
+  if (wi::lt_p (x1, 0, SIGNED))
+    x1 += d;
+  return x1;
+}
+
 /*
  * Private utilities.
  */
diff --git a/gcc/wide-int.h b/gcc/wide-int.h
index bb0d16b99b1..39cd5b9bd17 100644
--- a/gcc/wide-int.h
+++ b/gcc/wide-int.h
@@ -3389,6 +3389,8 @@ namespace wi
   wide_int round_down_for_mask (const wide_int &, const wide_int &);
   wide_int round_up_for_mask (const wide_int &, const wide_int &);
 
+  wide_int mod_inv (const wide_int &a, const wide_int &b);
+
   template <typename T>
   T mask (unsigned int, bool);
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Simplify X * C1 == C2 with wrapping overflow
  2020-08-09 21:24 ` Simplify X * C1 == C2 with wrapping overflow Marc Glisse
@ 2020-08-10  8:39   ` Jakub Jelinek
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Jelinek @ 2020-08-10  8:39 UTC (permalink / raw)
  To: Marc Glisse; +Cc: gcc-patches

On Sun, Aug 09, 2020 at 11:24:54PM +0200, Marc Glisse wrote:
> Odd numbers are invertible in Z / 2^n Z, so X * C1 == C2 can be rewritten as
> X == C2 * inv(C1) when overflow wraps.
> 
> mod_inv should probably be updated to better match the other wide_int
> functions, but that's a separate issue.
> 
> Bootstrap+regtest on x86_64-pc-linux-gnu.
> 
> 2020-08-10  Marc Glisse  <marc.glisse@inria.fr>
> 
> 	PR tree-optimization/95433
> 	* match.pd (X * C1 == C2): Handle wrapping overflow.
> 	* expr.c (maybe_optimize_mod_cmp): Qualify call to mod_inv.
> 	(mod_inv): Move...
> 	* wide-int.cc (mod_inv): ... here.
> 	* wide-int.h (mod_inv): Declare it.
> 
> 	* gcc.dg/tree-ssa/pr95433-2.c: New file.

LGTM, thanks.

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Simplify X * C1 == C2 with undefined overflow
  2020-08-07 21:36         ` Marc Glisse
@ 2020-08-08  6:55           ` Jakub Jelinek
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Jelinek @ 2020-08-08  6:55 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Joern Wolfgang Rennecke, GCC Patches

On Fri, Aug 07, 2020 at 11:36:59PM +0200, Marc Glisse wrote:
> > > https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most
> > > common way to compute the modular multiplicative inverse of a number. For 3
> > > and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32
> > > 2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5,
> > > i.e. X==1431655767.
> > 
> > wi::gcd ?
> 
> That's the first place I looked, but this is only the regular Euclid
> algorithm, not the extended one. It tells you what the gcd is, but does not
> give you the coefficients of the Bézout identity. For 3 and 2^32, it would
> tell me the gcd is 1, while I want the number 2863311531 (inverse of 3).
> 
> Ah, found it, there is mod_inv hidden in expr.c!

It can be certainly moved to wide-int.{h,cc} if you need it elsewhere.

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Simplify X * C1 == C2 with undefined overflow
  2020-08-07 20:57     ` Marc Glisse
  2020-08-07 21:02       ` Jakub Jelinek
@ 2020-08-07 21:58       ` Joern Wolfgang Rennecke
  1 sibling, 0 replies; 13+ messages in thread
From: Joern Wolfgang Rennecke @ 2020-08-07 21:58 UTC (permalink / raw)
  To: Marc Glisse; +Cc: GCC Patches


On 07/08/20 21:57, Marc Glisse wrote:
> On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote:
>
>
>> However, there are cases were the second division will not be 
>> possible without rest.
>> Consider : 7*X == 3
>> 3/7 is 0 rest 3.
>> 0x100000000 / 7 is 0x24924924 rest 4
>> Since 3 cannot be represented as an integer multiple of -4, we can 
>> conclude that the predicate
>> is always false.
>
> 613566757*7-2^32==3
Ah, right.  I thought to decompose the variable factor into a 
non-overflowing and an overflowing part.
But now I see that the former will not always be found with a standard 
division.  Here we'd have to
consider:
3 by 7 is 1 rest -4 .. and then I get your result above.  But of course 
it's no longer an efficient algorithm
if I have to test lots of non-canonical division results.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Simplify X * C1 == C2 with undefined overflow
  2020-08-07 21:02       ` Jakub Jelinek
@ 2020-08-07 21:36         ` Marc Glisse
  2020-08-08  6:55           ` Jakub Jelinek
  0 siblings, 1 reply; 13+ messages in thread
From: Marc Glisse @ 2020-08-07 21:36 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Joern Wolfgang Rennecke, GCC Patches

On Fri, 7 Aug 2020, Jakub Jelinek wrote:

> On Fri, Aug 07, 2020 at 10:57:54PM +0200, Marc Glisse wrote:
>> On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote:
>>
>>>
>>> On 07/08/20 19:21, Marc Glisse wrote:
>>>>
>>>> If we are going to handle the wrapping case, we shouldn't limit to
>>>> the non-wrapping meaning of multiplicity. 3*X==5 should become
>>>> X==1431655767 (for a 32 bit type), etc.
>>>>
>>>> Do we have an extended gcd somewhere in gcc, to help compute 1431655767?
>>> I don't quite see yet how this relates to gcd,
>>
>> https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most
>> common way to compute the modular multiplicative inverse of a number. For 3
>> and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32
>> 2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5,
>> i.e. X==1431655767.
>
> wi::gcd ?

That's the first place I looked, but this is only the regular Euclid 
algorithm, not the extended one. It tells you what the gcd is, but does 
not give you the coefficients of the Bézout identity. For 3 and 2^32, it 
would tell me the gcd is 1, while I want the number 2863311531 (inverse of 
3).

Ah, found it, there is mod_inv hidden in expr.c!

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Simplify X * C1 == C2 with undefined overflow
  2020-08-07 20:57     ` Marc Glisse
@ 2020-08-07 21:02       ` Jakub Jelinek
  2020-08-07 21:36         ` Marc Glisse
  2020-08-07 21:58       ` Joern Wolfgang Rennecke
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2020-08-07 21:02 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Joern Wolfgang Rennecke, GCC Patches

On Fri, Aug 07, 2020 at 10:57:54PM +0200, Marc Glisse wrote:
> On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote:
> 
> > 
> > On 07/08/20 19:21, Marc Glisse wrote:
> > > 
> > > If we are going to handle the wrapping case, we shouldn't limit to
> > > the non-wrapping meaning of multiplicity. 3*X==5 should become
> > > X==1431655767 (for a 32 bit type), etc.
> > > 
> > > Do we have an extended gcd somewhere in gcc, to help compute 1431655767?
> > I don't quite see yet how this relates to gcd,
> 
> https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most
> common way to compute the modular multiplicative inverse of a number. For 3
> and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32
> 2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5,
> i.e. X==1431655767.

wi::gcd ?

	Jakub


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Simplify X * C1 == C2 with undefined overflow
  2020-08-07 20:30   ` Joern Wolfgang Rennecke
@ 2020-08-07 20:57     ` Marc Glisse
  2020-08-07 21:02       ` Jakub Jelinek
  2020-08-07 21:58       ` Joern Wolfgang Rennecke
  0 siblings, 2 replies; 13+ messages in thread
From: Marc Glisse @ 2020-08-07 20:57 UTC (permalink / raw)
  To: Joern Wolfgang Rennecke; +Cc: GCC Patches

On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote:

>
> On 07/08/20 19:21, Marc Glisse wrote:
>> 
>> If we are going to handle the wrapping case, we shouldn't limit to the 
>> non-wrapping meaning of multiplicity. 3*X==5 should become X==1431655767 
>> (for a 32 bit type), etc.
>> 
>> Do we have an extended gcd somewhere in gcc, to help compute 1431655767?
> I don't quite see yet how this relates to gcd,

https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most 
common way to compute the modular multiplicative inverse of a number. For 
3 and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32 
2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5, 
i.e. X==1431655767.

> However, there are cases were the second division will not be possible 
> without rest.
> Consider : 7*X == 3
> 3/7 is 0 rest 3.
> 0x100000000 / 7 is 0x24924924 rest 4
> Since 3 cannot be represented as an integer multiple of -4, we can conclude 
> that the predicate
> is always false.

613566757*7-2^32==3

> Also: 14*X == 28
> has a non-zero power of two in the constant factor, so we have to factor out 
> the power of two
> (if the right hand side has a lower power of two, the predicate is always 
> false) and consider
> in modulo arithmetic with a number of bits less by the factored out power of 
> two, i.e. 31 bit here:
> 7*X == 14
> which is solved as above - but in 32 bit arithmetic - to
> X == 2
> and to convert back to 32 bit arithmetic, we get:
> X & 0x7fffffff == 2
> or
> 2*x == 4

Yes, we can reduce the size of the numbers a bit, but the gains won't be 
as nice for even numbers. I think the always-false case is already handled 
by CCP, tracking which bits can be 0 (I didn't check).

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Simplify X * C1 == C2 with undefined overflow
  2020-08-07 18:21 ` Marc Glisse
@ 2020-08-07 20:30   ` Joern Wolfgang Rennecke
  2020-08-07 20:57     ` Marc Glisse
  0 siblings, 1 reply; 13+ messages in thread
From: Joern Wolfgang Rennecke @ 2020-08-07 20:30 UTC (permalink / raw)
  To: Marc Glisse; +Cc: GCC Patches


On 07/08/20 19:21, Marc Glisse wrote:
>
> If we are going to handle the wrapping case, we shouldn't limit to the 
> non-wrapping meaning of multiplicity. 3*X==5 should become 
> X==1431655767 (for a 32 bit type), etc.
>
> Do we have an extended gcd somewhere in gcc, to help compute 1431655767?
I don't quite see yet how this relates to gcd, but how how about this:

First, we divide the right hand side of the comparison by the constant 
factor.  5 divided by 3
is one, with rest 2.  Now we want to express this rest two by wrapping 
overflow.
0x100000000 divided by 3 is 0x55555555 rest 1.  When we multiply 
0x55555555 again by three,
the rest is missing, so we get -1 in 32 bit.  So we try to express 2 as 
a multiple of -1:
2/-1 = -2

Thus, the quotient is 1 + 0x55555555 * -2 , which, using 32 bit wrapping 
arithmetic,
is 0x55555557, i.e. 1431655767 .
Again, because the constant factor (3) is odd, there can be no other 
solution.

However, there are cases were the second division will not be possible 
without rest.
Consider : 7*X == 3
3/7 is 0 rest 3.
0x100000000 / 7 is 0x24924924 rest 4
Since 3 cannot be represented as an integer multiple of -4, we can 
conclude that the predicate
is always false.

Also: 14*X == 28
has a non-zero power of two in the constant factor, so we have to factor 
out the power of two
(if the right hand side has a lower power of two, the predicate is 
always false) and consider
in modulo arithmetic with a number of bits less by the factored out 
power of two, i.e. 31 bit here:
7*X == 14
which is solved as above - but in 32 bit arithmetic - to
X == 2
and to convert back to 32 bit arithmetic, we get:
X & 0x7fffffff == 2
or
2*x == 4


Likewise, 14*X == 30 would be reduced to 7*X == 15 in 31 bit arithmetic, 
then we find that
we can't express the rest of 1 as an integer multiple of -2, so the 
predicate is always false.
  .
>
>> (Even if the variable factor is wider than equality comparison, that 
>> is not a problem as long as the comparison is not widened by the 
>> transformation.)
>>
>> On the other hand, the following generalizations would work only 
>> without overflow:
>> - handling of inequality-comparisons - merely have to account for the 
>> sign of the factor reversing the sense of
>>  the inequality, e.g. -3*X >= 15 ---> X <= 5
>
> And again for this one, we have to be careful how we round the 
> division, but we don't have to limit to the case where 15 is a 
> multiple of 3. 3*X>7 can be replaced with X>2.
>
>
> Those are two nice suggestions. Do you intend to write a patch?
No, I just couldn't resist applying a smidge of number theory to a 
compiler problem.
I still have to herd other patches.
> Otherwise I'll try to do it eventually (no promise).
Would be nice.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Simplify X * C1 == C2 with undefined overflow
  2020-08-07 13:05 Simplify X * C1 == C2 with undefined overflow Joern Wolfgang Rennecke
@ 2020-08-07 18:21 ` Marc Glisse
  2020-08-07 20:30   ` Joern Wolfgang Rennecke
  0 siblings, 1 reply; 13+ messages in thread
From: Marc Glisse @ 2020-08-07 18:21 UTC (permalink / raw)
  To: Joern Wolfgang Rennecke; +Cc: GCC Patches

On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote:

>> this transformation is quite straightforward, without overflow, 3*X==15 is
>> the same as X==5 and 3*X==5 cannot happen.
>
> Actually, with binary and decimal computers, this transformation (with 
> these specific numbers) is also valid for wrapping overflow.  More 
> generally, it is valid for wrapping overflow if the right hand side of 
> the comparison is divisible without rest by the constant factor, and the 
> constant factor has no sub-factor that is a zero divisor for the ring 
> defined by the wrapping operation. For binary computers, the latter 
> condition can be more simply be restated as: The constant factor has to 
> be odd. (For decimal computers, it's: must not be divisible by two 
> and/or five.)

If we are going to handle the wrapping case, we shouldn't limit to the 
non-wrapping meaning of multiplicity. 3*X==5 should become X==1431655767 
(for a 32 bit type), etc.

Do we have an extended gcd somewhere in gcc, to help compute 1431655767?

> (Even if the variable factor is wider than equality comparison, that is 
> not a problem as long as the comparison is not widened by the 
> transformation.)
>
> On the other hand, the following generalizations would work only without 
> overflow:
> - handling of inequality-comparisons - merely have to account for the sign of 
> the factor reversing the sense of
>  the inequality, e.g. -3*X >= 15 ---> X <= 5

And again for this one, we have to be careful how we round the division, 
but we don't have to limit to the case where 15 is a multiple of 3. 3*X>7 
can be replaced with X>2.


Those are two nice suggestions. Do you intend to write a patch? Otherwise 
I'll try to do it eventually (no promise).

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Simplify X * C1 == C2 with undefined overflow
@ 2020-08-07 13:05 Joern Wolfgang Rennecke
  2020-08-07 18:21 ` Marc Glisse
  0 siblings, 1 reply; 13+ messages in thread
From: Joern Wolfgang Rennecke @ 2020-08-07 13:05 UTC (permalink / raw)
  To: GCC Patches, Marc Glisse

> this transformation is quite straightforward, without overflow, 3*X==15 is
> the same as X==5 and 3*X==5 cannot happen.

Actually, with binary and decimal computers, this transformation (with these specific numbers)
is also valid for wrapping overflow.  More generally, it is valid for wrapping overflow
if the right hand side of the comparison is divisible without rest by the constant factor,
and the constant factor has no sub-factor that is a zero divisor for the ring defined by the wrapping operation.
For binary computers, the latter condition can be more simply be restated as: The constant factor has to be odd.
(For decimal computers, it's: must not be divisible by two and/or five.)

(Even if the variable factor is wider than equality comparison, that is not a problem
  as long as the comparison is not widened by the transformation.)

On the other hand, the following generalizations would work only without overflow:
- handling of inequality-comparisons - merely have to account for the sign of the factor reversing the sense of
   the inequality, e.g. -3*X >= 15 ---> X <= 5
- If the right-hand-side constant is not a multiple of the constant factor, the product is always unequal, i.e.
   an EQ test would be always false, NE would be always true.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-08-10  8:39 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-01  7:28 Simplify X * C1 == C2 with undefined overflow Marc Glisse
2020-08-03  8:51 ` Richard Biener
2020-08-04 15:38   ` Marc Glisse
2020-08-09 21:24 ` Simplify X * C1 == C2 with wrapping overflow Marc Glisse
2020-08-10  8:39   ` Jakub Jelinek
2020-08-07 13:05 Simplify X * C1 == C2 with undefined overflow Joern Wolfgang Rennecke
2020-08-07 18:21 ` Marc Glisse
2020-08-07 20:30   ` Joern Wolfgang Rennecke
2020-08-07 20:57     ` Marc Glisse
2020-08-07 21:02       ` Jakub Jelinek
2020-08-07 21:36         ` Marc Glisse
2020-08-08  6:55           ` Jakub Jelinek
2020-08-07 21:58       ` Joern Wolfgang Rennecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).