public inbox for gcc-patches@gcc.gnu.org
* [PATCH] PR middle-end/98865: Optimize (a>>63)*b as -(a>>63)&b in match.pd.
@ 2022-04-20 18:35 Roger Sayle
  2022-05-02 11:48 ` Richard Biener
  0 siblings, 1 reply; 2+ messages in thread
From: Roger Sayle @ 2022-04-20 18:35 UTC (permalink / raw)
  To: gcc-patches



This patch implements the constant folding optimization(s) described in
PR middle-end/98865, which should help address the serious performance
regression of Botan AES-128/XTS mentioned in PR tree-optimization/98856.
This combines aspects of both Jakub Jelinek's patch in comment #2 and
Andrew Pinski's patch in comment #4, so both are listed as co-authors.

Alas truth_valued_p is not quite what we want (and tweaking its
definition has undesirable side-effects), so instead this patch
introduces a new zero_one_valued_p predicate based on tree_nonzero_bits
that extends truth_valued_p (which only covers Boolean types with
single-bit precision).  This is then used to simplify X*Y into X&Y when
both X and Y are zero_one_valued_p, and to simplify X*Y into (-X)&Y when
X is zero_one_valued_p, in both cases replacing an integer multiplication
with a cheaper bit-wise AND.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}, with
no new failures, except for a tweak required to tree-ssa/vrp116.c.
The recently proposed cmove patch ensures the i386 backend continues
to generate identical code for vrp116.c as before.

Ok, either for mainline or when stage 1 reopens?


2022-04-20  Roger Sayle  <roger@nextmovesoftware.com>
            Andrew Pinski  <apinski@marvell.com>
            Jakub Jelinek  <jakub@redhat.com>

gcc/ChangeLog
	PR middle-end/98865
	* match.pd (match zero_one_valued_p): New predicate.
	(mult @0 @1): Use zero_one_valued_p for transforming into
	(and @0 @1).
	(mult zero_one_valued_p@0 @1): Convert integer multiplication into
	a negation and a bit-wise AND, if it can't be cheaply implemented by
	a single left shift.

gcc/testsuite/ChangeLog
	PR middle-end/98865
	* gcc.dg/pr98865.c: New test case.
	* gcc.dg/tree-ssa/vrp116.c: Tweak test to confirm the integer
	multiplication has been eliminated, not to check the actual
	replacement implementation.

Thanks,
Roger
--


[-- Attachment #2: patchjj3.txt --]

diff --git a/gcc/match.pd b/gcc/match.pd
index 6d691d3..16a1203 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -285,14 +285,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
            || !COMPLEX_FLOAT_TYPE_P (type)))
    (negate @0)))
 
-/* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 } */
-(simplify
- (mult SSA_NAME@1 SSA_NAME@2)
-  (if (INTEGRAL_TYPE_P (type)
-       && get_nonzero_bits (@1) == 1
-       && get_nonzero_bits (@2) == 1)
-   (bit_and @1 @2)))
-
 /* Transform x * { 0 or 1, 0 or 1, ... } into x & { 0 or -1, 0 or -1, ...},
    unless the target has native support for the former but not the latter.  */
 (simplify
@@ -1789,6 +1781,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bit_not (bit_not @0))
   @0)
 
+(match zero_one_valued_p
+ @0
+ (if (INTEGRAL_TYPE_P (type) && tree_nonzero_bits (@0) == 1)))
+(match zero_one_valued_p
+ truth_valued_p@0)
+
+/* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 } */
+(simplify
+ (mult zero_one_valued_p@0 zero_one_valued_p@1)
+ (if (INTEGRAL_TYPE_P (type))
+  (bit_and @0 @1)))
+
+/* Transform x * { 0 or 1 } into x & { 0 or -1 }, i.e. an integer
+   multiplication into negate/bitwise-and.  Don't do this if the
+   multiplication is cheap, i.e. may be implemented by a single shift.  */
+(simplify
+ (mult:c zero_one_valued_p@0 @1)
+ (if (INTEGRAL_TYPE_P (type)
+      && (TREE_CODE (@1) != INTEGER_CST
+          || wi::popcount (wi::to_wide (@1)) > 1))
+  (bit_and (negate @0) @1)))
+
 /* Convert ~ (-A) to A - 1.  */
 (simplify
  (bit_not (convert? (negate @0)))
diff --git a/gcc/testsuite/gcc.dg/pr98865.c b/gcc/testsuite/gcc.dg/pr98865.c
new file mode 100644
index 0000000..e7599d3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98865.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#if __SIZEOF_INT__ == 4
+unsigned int foo(unsigned int a, unsigned int b)
+{
+  return (a >> 31) * b;
+}
+
+int bar(int a, int b)
+{
+  return -(a >> 31) * b;
+}
+
+int baz(int a, int b)
+{
+  int c = a >> 31;
+  int d = -c;
+  return d * b;
+}
+#endif
+
+#if __SIZEOF_LONG_LONG__ == 8
+unsigned long long fool (unsigned long long a, unsigned long long b)
+{
+  return (a >> 63) * b;
+}
+
+long long barl (long long a, long long b)
+{
+  return -(a >> 63) * b;
+}
+
+long long bazl (long long a, long long b)
+{
+  long long c = a >> 63;
+  long long d = -c;
+  return d * b;
+}
+#endif
+
+unsigned int pin (int a, unsigned int b)
+{
+  unsigned int t =  a & 1;
+  return t * b;
+}
+
+unsigned long pinl (long a, unsigned long b)
+{
+  unsigned long t =  a & 1;
+  return t * b;
+}
+
+unsigned long long pinll (long long a, unsigned long long b)
+{
+  unsigned long long t =  a & 1;
+  return t * b;
+}
+
+/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
index 9e68a77..469b232 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
@@ -9,4 +9,5 @@ f (int m1, int m2, int c)
   return e ? m1 : m2;
 }
 
-/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "optimized" } } */
+/* The MULT_EXPR should become BIT_AND_EXPR or COND_EXPR.  */
+/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */


* Re: [PATCH] PR middle-end/98865: Optimize (a>>63)*b as -(a>>63)&b in match.pd.
  2022-04-20 18:35 [PATCH] PR middle-end/98865: Optimize (a>>63)*b as -(a>>63)&b in match.pd Roger Sayle
@ 2022-05-02 11:48 ` Richard Biener
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2022-05-02 11:48 UTC (permalink / raw)
  To: Roger Sayle; +Cc: GCC Patches

On Wed, Apr 20, 2022 at 8:35 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch implements the constant folding optimization(s) described in
> PR middle-end/98865, which should help address the serious performance
> regression of Botan AES-128/XTS mentioned in PR tree-optimization/98856.
> This combines aspects of both Jakub Jelinek's patch in comment #2 and
> Andrew Pinski's patch in comment #4, so both are listed as co-authors.
>
> Alas truth_valued_p is not quite what we want (and tweaking its
> definition has undesirable side-effects), so instead this patch
> introduces a new zero_one_valued_p predicate based on tree_nonzero_bits
> that extends truth_valued_p (which only covers Boolean types with
> single-bit precision).  This is then used to simplify X*Y into X&Y when
> both X and Y are zero_one_valued_p, and to simplify X*Y into (-X)&Y when
> X is zero_one_valued_p, in both cases replacing an integer multiplication
> with a cheaper bit-wise AND.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}, with
> no new failures, except for a tweak required to tree-ssa/vrp116.c.
> The recently proposed cmove patch ensures the i386 backend continues
> to generate identical code for vrp116.c as before.
>
> Ok, either for mainline or when stage 1 reopens?

One issue is that on GIMPLE we consider (bit_and (negate @0) @1) to be
more costly than (mult @0 @1) because it is two operations rather than one.
So can we at least do (bit_and (negate! @0) @1) and thus require the negate
to be simplified?

Also the

+      && (TREE_CODE (@1) != INTEGER_CST
+          || wi::popcount (wi::to_wide (@1)) > 1))

exception is odd without providing the desired canonicalization to a shift?

In the end all this looks more like something for RTL (expansion?) where we
can query costs rather than canonicalization (to simpler expressions) which
is what we should do on GIMPLE.

Richard.

>
>
> 2022-04-20  Roger Sayle  <roger@nextmovesoftware.com>
>             Andrew Pinski  <apinski@marvell.com>
>             Jakub Jelinek  <jakub@redhat.com>
>
> gcc/ChangeLog
>         PR middle-end/98865
>         * match.pd (match zero_one_valued_p): New predicate.
>         (mult @0 @1): Use zero_one_valued_p for transforming into
>         (and @0 @1).
>         (mult zero_one_valued_p@0 @1): Convert integer multiplication into
>         a negation and a bit-wise AND, if it can't be cheaply implemented by
>         a single left shift.
>
> gcc/testsuite/ChangeLog
>         PR middle-end/98865
>         * gcc.dg/pr98865.c: New test case.
>         * gcc.dg/tree-ssa/vrp116.c: Tweak test to confirm the integer
>         multiplication has been eliminated, not to check the actual
>         replacement implementation.
>
> Thanks,
> Roger
> --
>

