public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
@ 2023-07-25 19:42 David Edelsohn
  2023-07-25 19:44 ` Jakub Jelinek
  0 siblings, 1 reply; 11+ messages in thread
From: David Edelsohn @ 2023-07-25 19:42 UTC (permalink / raw)
  To: Drew Ross; +Cc: GCC Patches

Hi, Drew

Thanks for addressing this missed optimization.

The testcase incorrectly assumes that plain char is signed, which
causes it to fail on PowerPC, where plain char is unsigned by default.

Should the testcase be updated to use signed char in the function
signatures, or should -fsigned-char be added to the command-line
options?

Thanks, David
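
For illustration, a minimal sketch of why the signedness of plain char
matters for this test. It reuses the expression from t3 with the
signedness spelled out; the helper names t3_signed and t3_unsigned are
hypothetical and are not part of the testcase.

```c
#include <assert.h>

/* Same expression as t3 in the testcase, with the signedness made
   explicit.  The names t3_signed and t3_unsigned are illustrative only.  */
signed char
t3_signed (signed char a, signed char b)
{
  return (b | ~a) ^ a;
}

unsigned char
t3_unsigned (unsigned char a, unsigned char b)
{
  return (b | ~a) ^ a;
}
```

Called with (127, 99), the signed variant yields -100, the value the
testcase compares against, while the unsigned variant yields 156, so the
same comparison fails when plain char is unsigned.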


* Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
  2023-07-25 19:42 [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986] David Edelsohn
@ 2023-07-25 19:44 ` Jakub Jelinek
  2023-07-25 20:54   ` Andrew Pinski
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Jelinek @ 2023-07-25 19:44 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Drew Ross, GCC Patches

On Tue, Jul 25, 2023 at 03:42:21PM -0400, David Edelsohn via Gcc-patches wrote:
> Hi, Drew
> 
> Thanks for addressing this missed optimization.
> 
> The testcase includes an incorrect assumption: signed char, which
> causes the testcase to fail on PowerPC.
> 
> Should the testcase be updated to specify signed char in the function
> signatures or should -fsigned-char be added to the command line
> options?

I think we should use signed char instead of char in the testcase.

	Jakub



* Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
  2023-07-25 19:44 ` Jakub Jelinek
@ 2023-07-25 20:54   ` Andrew Pinski
  2023-07-25 21:58     ` Andrew Pinski
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Pinski @ 2023-07-25 20:54 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: David Edelsohn, Drew Ross, GCC Patches

On Tue, Jul 25, 2023 at 12:45 PM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Tue, Jul 25, 2023 at 03:42:21PM -0400, David Edelsohn via Gcc-patches wrote:
> > Hi, Drew
> >
> > Thanks for addressing this missed optimization.
> >
> > The testcase includes an incorrect assumption: signed char, which
> > causes the testcase to fail on PowerPC.
> >
> > Should the testcase be updated to specify signed char in the function
> > signatures or should -fsigned-char be added to the command line
> > options?
>
> I think we should use signed char instead of char in the testcase.

I also think it should be `signed char` instead as I mentioned in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110803 .

Thanks,
Andrew

>
>         Jakub
>


* Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
  2023-07-25 20:54   ` Andrew Pinski
@ 2023-07-25 21:58     ` Andrew Pinski
  2023-07-26 13:37       ` Drew Ross
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Pinski @ 2023-07-25 21:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: David Edelsohn, Drew Ross, GCC Patches

On Tue, Jul 25, 2023 at 1:54 PM Andrew Pinski <pinskia@gmail.com> wrote:
>
> On Tue, Jul 25, 2023 at 12:45 PM Jakub Jelinek via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Tue, Jul 25, 2023 at 03:42:21PM -0400, David Edelsohn via Gcc-patches wrote:
> > > Hi, Drew
> > >
> > > Thanks for addressing this missed optimization.
> > >
> > > The testcase includes an incorrect assumption: signed char, which
> > > causes the testcase to fail on PowerPC.
> > >
> > > Should the testcase be updated to specify signed char in the function
> > > signatures or should -fsigned-char be added to the command line
> > > options?
> >
> > I think we should use signed char instead of char in the testcase.
>
> I also think it should be `signed char` instead as I mentioned in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110803 .

Committed the testsuite fix as r14-2767-g67357270772b91 .

Thanks,
Andrew

>
> Thanks,
> Andrew
>
> >
> >         Jakub
> >


* Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
  2023-07-25 21:58     ` Andrew Pinski
@ 2023-07-26 13:37       ` Drew Ross
  0 siblings, 0 replies; 11+ messages in thread
From: Drew Ross @ 2023-07-26 13:37 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Jakub Jelinek, David Edelsohn, GCC Patches


Thanks for catching and fixing this, David and Andrew.

Drew



* Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
  2023-07-11 13:58     ` Richard Biener
@ 2023-07-19 13:17       ` Drew Ross
  0 siblings, 0 replies; 11+ messages in thread
From: Drew Ross @ 2023-07-19 13:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, gcc-patches


Trying to push the conversions down onto the operands with, for example,
(for op (bit_ior bit_and bit_xor)
 (for rop (bit_xor bit_ior bit_and)
  (simplify
   (op:c (nop_convert (rop @0 @1)) @3)
   (op (rop (convert:type @0) (convert:type @1)) @3))))

(simplify
 (convert (bit_not @0))
 (bit_not (convert:type @0)))

runs into infinite oscillation with the existing
/* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
   when profitable.
...
  (bitop (convert@2 @0) (convert?@3 @1))
...
   (convert (bitop @0 (convert @1)))))

simplification when integer constants are involved, e.g.
unsigned int main (int x, unsigned int y)
{
  unsigned int a = x | 4213678;
  unsigned int b = a ^ y;
  return b;
}

I think using Jakub's bitwise_equal_p predicate to get it down to 16 cases
might be our best option.

Drew



* Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
  2023-07-11 13:08   ` Jakub Jelinek
@ 2023-07-11 13:58     ` Richard Biener
  2023-07-19 13:17       ` Drew Ross
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2023-07-11 13:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Drew Ross, gcc-patches

On Tue, Jul 11, 2023 at 3:08 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches wrote:
> > On Wed, Jul 5, 2023 at 3:42 PM Drew Ross via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > >     Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
> > >     Tested successfully on x86_64 and x86 targets.
> > >
> > >             PR middle-end/109986
> > >
> > >     gcc/ChangeLog:
> > >
> > >             * match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.
> > >
> > >     gcc/testsuite/ChangeLog:
> > >
> > >             * gcc.c-torture/execute/pr109986.c: New test.
> > >             * gcc.dg/tree-ssa/pr109986.c: New test.
> > > ---
> > >  gcc/match.pd                                  |  11 ++
> > >  .../gcc.c-torture/execute/pr109986.c          |  41 ++++
> > >  gcc/testsuite/gcc.dg/tree-ssa/pr109986.c      | 177 ++++++++++++++++++
> > >  3 files changed, 229 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index a17d6838c14..d9d7d932881 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> > >    (convert (bit_and @1 (bit_not @0)))))
> > >
> > > +/* (~X | Y) ^ X -> ~(X & Y).  */
> > > +(simplify
> > > + (bit_xor:c (nop_convert1?
> > > +             (bit_ior:c (nop_convert2? (bit_not (nop_convert3? @0)))
> > > +                        @1)) (nop_convert4? @0))
> >
> > you want to reduce the number of nop_convert? - for example
> > I wonder if we can canonicalize
> >
> >  (T)~X and ~(T)X
> >
> > for nop-conversions.  The same might apply to binary bitwise operations
> > where we should push those to a direction where they are likely eliminated.
> > Usually we'd push them outwards.
> >
> > The issue with the above pattern is that nop_convertN? expands to 2^N
> > separate patterns.  Together with the two :c you get 64 out of this.
> >
> > I do not see that all of the combinations can happen when X has to
> > match unless we fail to contract some of them like if we have
> > (unsigned)(~(signed)X | Y) ^ X which we could rewrite like
> > -> (unsigned)((signed)~X | Y) ^ X -> (~X | (unsigned) Y) ^ X
> > with the last step being somewhat difficult unless we do
> > (signed)~X | Y -> (signed)(~X | (unsigned)Y).  It feels like a
> > propagation problem and less of a direct pattern matching one.
>
> The nop_convert1? in the pattern might seem to be unnecessary
> for cases like:
> int i, j, k, l;
> unsigned u, v, w, x;
>
> void
> foo (void)
> {
>   int t0 = i;
>   int t1 = (~t0) | j;
>   x = t1 ^ (unsigned) t0;
>   unsigned t2 = u;
>   unsigned t3 = (~t2) | v;
>   i = ((int) t3) ^ (int) t2;
> }
> we actually optimize it with or without the nop_convert1? in place,
> because we have the
> /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
>    when profitable.
> ...
>   (bitop (convert@2 @0) (convert?@3 @1))
> ...
>    (convert (bitop @0 (convert @1)))))
> simplification.
> Except that on
> void
> bar (void)
> {
>   unsigned t0 = u;
>   int t1 = (~(int) t0) | j;
>   x = t1 ^ t0;
>   int t2 = i;
>   unsigned t3 = (~(unsigned) t2) | v;
>   i = ((int) t3) ^ t2;
> }
> the optimization doesn't trigger without the nop_convert1? and does
> with it.
>
> Perhaps we could get rid of nop_convert3? and nop_convert4?
> by introducing a macro/inline function predicate like:
> bitwise_equal_p (expr1, expr2) and instead of using
> (nop_convert3? @0) and (nop_convert4? @0) in the pattern
> use @0 and @2 and then add
> if (bitwise_equal_p (@0, @2))
> to the condition.
> For GENERIC (i.e. in generic-match-head.cc) it could be something like:
> static inline bool
> bitwise_equal_p (tree expr1, tree expr2)
> {
>   STRIP_NOPS (expr1);
>   STRIP_NOPS (expr2);
>   if (expr1 == expr2)
>     return true;
>   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
>     return false;
>   if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
>     return wi::to_wide (expr1) == wi::to_wide (expr2);
>   return operand_equal_p (expr1, expr2, 0);
> }
> (the INTEGER_CST special case because operand_equal_p compares wi::to_widest
> which could be different if one constant is signed and the other unsigned).
> For GIMPLE, I wonder if it shouldn't be a macro that takes valueize into
> account, and do something like:
> #define bitwise_equal_p(expr1, expr2) gimple_bitwise_equal_p (expr1, expr2, valueize)
>
> bool gimple_nop_convert (tree, tree *, tree (*)(tree));
>
> static inline bool
> gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
> {
>   if (expr1 == expr2)
>     return true;
>   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
>     return false;
>   if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
>     return wi::to_wide (expr1) == wi::to_wide (expr2);
>   if (operand_equal_p (expr1, expr2, 0))
>     return true;
>   tree expr3, expr4;
>   if (!gimple_nop_convert (expr1, &expr3, valueize))
>     expr3 = expr1;
>   if (!gimple_nop_convert (expr2, &expr4, valueize))
>     expr4 = expr2;
>   if (expr1 != expr3)
>     {
>       if (operand_equal_p (expr3, expr2, 0))
>         return true;
>       if (expr2 != expr4 && operand_equal_p (expr3, expr4, 0))
>         return true;
>     }
>   if (expr2 != expr4 && operand_equal_p (expr1, expr4, 0))
>     return true;
>   return false;
> }
>
> Completely untested.  What do you think?
> Though, that brings us only still to 16 cases of this.

I guess we can also not worry and hope for a better code generator ...

The obvious improvement there is to delay pattern expansion (with for and ?)
until we get two patterns on the same sub-tree so patterns that are the
only ones at some point during the sub-tree matching can then be expanded
with code generation optimized for code size (:c is the only difficult
case there).

Matching the shortest paths to leaf first might then improve things further.

But this is a complete rewrite of the decision tree builder, so ...

Richard.

>
>         Jakub
>


* Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
  2023-07-06 13:00 ` Richard Biener
  2023-07-06 19:51   ` Jakub Jelinek
@ 2023-07-11 13:08   ` Jakub Jelinek
  2023-07-11 13:58     ` Richard Biener
  1 sibling, 1 reply; 11+ messages in thread
From: Jakub Jelinek @ 2023-07-11 13:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: Drew Ross, gcc-patches

On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches wrote:
> On Wed, Jul 5, 2023 at 3:42 PM Drew Ross via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >     Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
> >     Tested successfully on x86_64 and x86 targets.
> >
> >             PR middle-end/109986
> >
> >     gcc/ChangeLog:
> >
> >             * match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.
> >
> >     gcc/testsuite/ChangeLog:
> >
> >             * gcc.c-torture/execute/pr109986.c: New test.
> >             * gcc.dg/tree-ssa/pr109986.c: New test.
> > ---
> >  gcc/match.pd                                  |  11 ++
> >  .../gcc.c-torture/execute/pr109986.c          |  41 ++++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr109986.c      | 177 ++++++++++++++++++
> >  3 files changed, 229 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index a17d6838c14..d9d7d932881 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> >    (convert (bit_and @1 (bit_not @0)))))
> >
> > +/* (~X | Y) ^ X -> ~(X & Y).  */
> > +(simplify
> > + (bit_xor:c (nop_convert1?
> > +             (bit_ior:c (nop_convert2? (bit_not (nop_convert3? @0)))
> > +                        @1)) (nop_convert4? @0))
> 
> you want to reduce the number of nop_convert? - for example
> I wonder if we can canonicalize
> 
>  (T)~X and ~(T)X
> 
> for nop-conversions.  The same might apply to binary bitwise operations
> where we should push those to a direction where they are likely eliminated.
> Usually we'd push them outwards.
> 
> The issue with the above pattern is that nop_convertN? expands to 2^N
> separate patterns.  Together with the two :c you get 64 out of this.
> 
> I do not see that all of the combinations can happen when X has to
> match unless we fail to contract some of them like if we have
> (unsigned)(~(signed)X | Y) ^ X which we could rewrite like
> -> (unsigned)((signed)~X | Y) ^ X -> (~X | (unsigned) Y) ^ X
> with the last step being somewhat difficult unless we do
> (signed)~X | Y -> (signed)(~X | (unsigned)Y).  It feels like a
> propagation problem and less of a direct pattern matching one.

The nop_convert1? in the pattern might seem to be unnecessary
for cases like:
int i, j, k, l;
unsigned u, v, w, x;

void
foo (void)
{
  int t0 = i;
  int t1 = (~t0) | j;
  x = t1 ^ (unsigned) t0;
  unsigned t2 = u;
  unsigned t3 = (~t2) | v;
  i = ((int) t3) ^ (int) t2;
}
we actually optimize it with or without the nop_convert1? in place,
because we have the
/* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
   when profitable.
...
  (bitop (convert@2 @0) (convert?@3 @1))
...
   (convert (bitop @0 (convert @1)))))
simplification.
Except that on
void
bar (void)
{
  unsigned t0 = u;
  int t1 = (~(int) t0) | j;
  x = t1 ^ t0;
  int t2 = i;
  unsigned t3 = (~(unsigned) t2) | v;
  i = ((int) t3) ^ t2;
}
the optimization doesn't trigger without the nop_convert1? and does
with it.

Perhaps we could get rid of nop_convert3? and nop_convert4?
by introducing a macro/inline function predicate like:
bitwise_equal_p (expr1, expr2) and instead of using
(nop_convert3? @0) and (nop_convert4? @0) in the pattern
use @0 and @2 and then add
if (bitwise_equal_p (@0, @2))
to the condition.
For GENERIC (i.e. in generic-match-head.cc) it could be something like:
static inline bool
bitwise_equal_p (tree expr1, tree expr2)
{
  STRIP_NOPS (expr1);
  STRIP_NOPS (expr2);
  if (expr1 == expr2)
    return true;
  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
    return false;
  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
    return wi::to_wide (expr1) == wi::to_wide (expr2);
  return operand_equal_p (expr1, expr2, 0);
}
(The INTEGER_CST special case is there because operand_equal_p compares
wi::to_widest values, which can differ if one constant is signed and the
other unsigned.)
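
To make the constant case concrete, here is a rough analogy in plain C.
It does not use GCC's wide-int API: int32_t, uint32_t and int64_t merely
stand in for a signed constant, an unsigned constant and a wider
comparison precision, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Analogy only: fixed-width C integers stand in for tree constants.  */

/* Compare after extending each value to 64 bits according to its own
   signedness, loosely like comparing to_widest-style values.  */
int
equal_when_widened (int32_t s, uint32_t u)
{
  return (int64_t) s == (int64_t) u;
}

/* Compare only the 32 bits that both values actually have, loosely like
   comparing at the types' own precision.  */
int
equal_at_own_precision (int32_t s, uint32_t u)
{
  return (uint32_t) s == u;
}
```

With s = -1 and u = UINT32_MAX the two values share the same 32-bit
pattern, so the second helper reports them equal, but the first does not
because sign extension and zero extension produce different 64-bit values.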
For GIMPLE, I wonder if it shouldn't be a macro that takes valueize into
account, and do something like:
#define bitwise_equal_p(expr1, expr2) gimple_bitwise_equal_p (expr1, expr2, valueize)

bool gimple_nop_convert (tree, tree *, tree (*)(tree));

static inline bool
gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
{
  if (expr1 == expr2)
    return true;
  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
    return false;
  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
    return wi::to_wide (expr1) == wi::to_wide (expr2);
  if (operand_equal_p (expr1, expr2, 0))
    return true;
  tree expr3, expr4;
  if (!gimple_nop_convert (expr1, &expr3, valueize))
    expr3 = expr1;
  if (!gimple_nop_convert (expr2, &expr4, valueize))
    expr4 = expr2;
  if (expr1 != expr3)
    {
      if (operand_equal_p (expr3, expr2, 0))
	return true;
      if (expr2 != expr4 && operand_equal_p (expr3, expr4, 0))
	return true;
    }
  if (expr2 != expr4 && operand_equal_p (expr1, expr4, 0))
    return true;
  return false;
}

Completely untested.  What do you think?
Though that still only brings us down to 16 cases.

	Jakub



* Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
  2023-07-06 13:00 ` Richard Biener
@ 2023-07-06 19:51   ` Jakub Jelinek
  2023-07-11 13:08   ` Jakub Jelinek
  1 sibling, 0 replies; 11+ messages in thread
From: Jakub Jelinek @ 2023-07-06 19:51 UTC (permalink / raw)
  To: Richard Biener; +Cc: Drew Ross, gcc-patches

On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches wrote:
> > +  (if (types_match (type, @1))
> > +   (bit_not (bit_and @1 (convert @0)))
> > +   (if (types_match (type, @0))
> > +    (bit_not (bit_and (convert @1) @0))
> > +    (convert (bit_not (bit_and @0 (convert @1)))))))
> 
> You can elide the types_match checks and instead always emit
> 
>   (convert (bit_not (bit_and @0 (convert @1))))
> 
> the conversions are elided when the types match.

If all types match, sure, any of the variants will be good.
But if say @1 matches type and doesn't match @0, then
(convert (bit_not (bit_and @0 (convert @1))))
will result in 2 conversions instead of just 1.
Of course, it could be alternatively solved by some other simplify
that would reduce the number of conversions.

	Jakub



* Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
  2023-07-05 13:41 Drew Ross
@ 2023-07-06 13:00 ` Richard Biener
  2023-07-06 19:51   ` Jakub Jelinek
  2023-07-11 13:08   ` Jakub Jelinek
  0 siblings, 2 replies; 11+ messages in thread
From: Richard Biener @ 2023-07-06 13:00 UTC (permalink / raw)
  To: Drew Ross; +Cc: gcc-patches

On Wed, Jul 5, 2023 at 3:42 PM Drew Ross via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>     Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
>     Tested successfully on x86_64 and x86 targets.
>
>             PR middle-end/109986
>
>     gcc/ChangeLog:
>
>             * match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.
>
>     gcc/testsuite/ChangeLog:
>
>             * gcc.c-torture/execute/pr109986.c: New test.
>             * gcc.dg/tree-ssa/pr109986.c: New test.
> ---
>  gcc/match.pd                                  |  11 ++
>  .../gcc.c-torture/execute/pr109986.c          |  41 ++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr109986.c      | 177 ++++++++++++++++++
>  3 files changed, 229 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a17d6838c14..d9d7d932881 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>    (convert (bit_and @1 (bit_not @0)))))
>
> +/* (~X | Y) ^ X -> ~(X & Y).  */
> +(simplify
> + (bit_xor:c (nop_convert1?
> +             (bit_ior:c (nop_convert2? (bit_not (nop_convert3? @0)))
> +                        @1)) (nop_convert4? @0))

you want to reduce the number of nop_convert? - for example
I wonder if we can canonicalize

 (T)~X and ~(T)X

for nop-conversions.  The same might apply to binary bitwise operations
where we should push those to a direction where they are likely eliminated.
Usually we'd push them outwards.

The issue with the above pattern is that nop_convertN? expands to 2^N
separate patterns.  Together with the two :c you get 64 out of this.
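
As a quick check of that arithmetic, a small sketch that assumes, as
described above, that every optional nop_convertN? and every :c doubles
the number of generated patterns. pattern_variants is a hypothetical
helper, not something genmatch provides.

```c
#include <assert.h>

/* Number of concrete patterns produced from one match.pd pattern, under
   the assumption that each optional conversion and each :c doubles the
   count.  Hypothetical helper for illustration only.  */
unsigned
pattern_variants (unsigned optional_converts, unsigned commutative_ops)
{
  return 1u << (optional_converts + commutative_ops);
}
```

Four optional conversions and two :c give pattern_variants (4, 2) == 64;
with only two optional conversions left it would be
pattern_variants (2, 2) == 16.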

I do not see that all of the combinations can happen when X has to
match unless we fail to contract some of them like if we have
(unsigned)(~(signed)X | Y) ^ X which we could rewrite like
-> (unsigned)((signed)~X | Y) ^ X -> (~X | (unsigned) Y) ^ X
with the last step being somewhat difficult unless we do
(signed)~X | Y -> (signed)(~X | (unsigned)Y).  It feels like a
propagation problem and less of a direct pattern matching one.
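
For reference, a minimal sketch showing that the identity itself does not
depend on where the sign-changing conversions sit. It checks a range of
int inputs, including negative ones, and only converts from int to
unsigned, which is well defined in C. The function name is hypothetical.

```c
#include <assert.h>

/* Returns 1 if (~X | Y) ^ X equals ~(X & Y) bit for bit for all X, Y in
   [-128, 127], both when the conversion to unsigned is applied to the
   final result and when it is applied to the operands of the xor.  */
int
identity_survives_sign_casts (void)
{
  for (int x = -128; x <= 127; x++)
    for (int y = -128; y <= 127; y++)
      {
        unsigned expected = ~((unsigned) x & (unsigned) y);
        /* Computed entirely in int, converted at the end.  */
        unsigned in_int = (unsigned) ((~x | y) ^ x);
        /* Conversions applied to both operands of the xor.  */
        unsigned mixed = (unsigned) (~x | y) ^ (unsigned) x;
        if (in_int != expected || mixed != expected)
          return 0;
      }
  return 1;
}
```

This only demonstrates that the rewritten forms are value-equivalent; it
says nothing about which form the pattern matcher actually sees.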

> +  (if (types_match (type, @1))
> +   (bit_not (bit_and @1 (convert @0)))
> +   (if (types_match (type, @0))
> +    (bit_not (bit_and (convert @1) @0))
> +    (convert (bit_not (bit_and @0 (convert @1)))))))

You can elide the types_match checks and instead always emit

  (convert (bit_not (bit_and @0 (convert @1))))

the conversions are elided when the types match.

Richard.

> +
>  /* Convert ~X ^ ~Y to X ^ Y.  */
>  (simplify
>   (bit_xor (convert1? (bit_not @0)) (convert2? (bit_not @1)))
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr109986.c b/gcc/testsuite/gcc.c-torture/execute/pr109986.c
> new file mode 100644
> index 00000000000..00ee9888539
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr109986.c
> @@ -0,0 +1,41 @@
> +/* PR middle-end/109986 */
> +
> +#include "../../gcc.dg/tree-ssa/pr109986.c"
> +
> +int
> +main ()
> +{
> +  if (t1 (29789, 29477) != -28678) __builtin_abort ();
> +  if (t2 (20196, -18743) != 4294965567) __builtin_abort ();
> +  if (t3 (127, 99) != -100) __builtin_abort ();
> +  if (t4 (100, 53) != 219) __builtin_abort ();
> +  if (t5 (20100, 1283) != -1025) __builtin_abort ();
> +  if (t6 (20100, 10283) != 63487) __builtin_abort ();
> +  if (t7 (2136614690L, 1136698390L) != -1128276995L) __builtin_abort ();
> +  if (t8 (1136698390L, 2136614690L) != -1128276995UL) __builtin_abort ();
> +  if (t9 (9176690219839792930LL, 3176690219839721234LL) != -3175044472123688707LL)
> +    __builtin_abort ();
> +  if (t10 (9176690219839792930LL, 3176690219839721234LL) != 15271699601585862909ULL)
> +    __builtin_abort ();
> +  if (t11 (29789, 29477) != -28678) __builtin_abort ();
> +  if (t12 (20196, -18743) != 4294965567) __builtin_abort ();
> +  if (t13 (127, 99) != -100) __builtin_abort ();
> +  if (t14 (100, 53) != 219) __builtin_abort ();
> +  if (t15 (20100, 1283) != -1025) __builtin_abort ();
> +  if (t16 (20100, 10283) != 63487) __builtin_abort ();
> +  if (t17 (2136614690, 1136698390) != -1128276995) __builtin_abort ();
> +  if (t18 (1136698390L, 2136614690L) != -1128276995UL) __builtin_abort ();
> +  if (t19 (9176690219839792930LL, 3176690219839721234LL) != -3175044472123688707LL)
> +    __builtin_abort ();
> +  if (t20 (9176690219839792930LL, 3176690219839721234LL) != 15271699601585862909ULL)
> +    __builtin_abort ();
> +  v4si a1 = {1, 2, 3, 4};
> +  v4si a2 = {6, 7, 8, 9};
> +  v4si r1 = {-1, -3, -1, -1};
> +  v4si b1 = t21 (a1, a2);
> +  v4si b2 = t22 (a1, a2);
> +  if (__builtin_memcmp (&b1, &r1, sizeof (b1)) != 0) __builtin_abort ();
> +  if (__builtin_memcmp (&b2, &r1, sizeof (b2)) != 0) __builtin_abort ();
> +  return 0;
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c b/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> new file mode 100644
> index 00000000000..45f099b5656
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> @@ -0,0 +1,177 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
> +
> +typedef int v4si __attribute__((vector_size(16)));
> +
> +/* Generic */
> +__attribute__((noipa)) int
> +t1 (int a, int b)
> +{
> +  return (~a | b) ^ a;
> +}
> +
> +__attribute__((noipa)) unsigned int
> +t2 (int a, int b)
> +{
> +  return a ^ (~a | (unsigned int) b);
> +}
> +
> +__attribute__((noipa)) char
> +t3 (char a, char b)
> +{
> +  return (b | ~a) ^ a;
> +}
> +
> +__attribute__((noipa)) unsigned char
> +t4 (char a, char b)
> +{
> +  return ((unsigned char) a) ^ (b | ~a);
> +}
> +
> +__attribute__((noipa)) short
> +t5 (short a, short b)
> +{
> +  return a ^ (b | ~a);
> +}
> +
> +__attribute__((noipa)) unsigned short
> +t6 (short a, short b)
> +{
> +  return ((unsigned short) a) ^ (b | ~a);
> +}
> +
> +__attribute__((noipa)) long
> +t7 (long a, long b)
> +{
> +  return a ^ (b | ~a);
> +}
> +
> +__attribute__((noipa)) unsigned long
> +t8 (long a, long b)
> +{
> +  return ((unsigned long) a) ^ (b | ~a);
> +}
> +
> +__attribute__((noipa)) long long
> +t9 (long long a, long long b)
> +{
> +  return a ^ (b | ~a);
> +}
> +
> +__attribute__((noipa)) unsigned long long
> +t10 (long long a, long long b)
> +{
> +  return ((unsigned long long) a) ^ (b | ~a);
> +}
> +
> +__attribute__((noipa)) v4si
> +t21 (v4si a, v4si b)
> +{
> +  return a ^ (b | ~a);
> +}
> +
> +/* Gimple */
> +__attribute__((noipa)) int
> +t11 (int a, int b)
> +{
> +  int t1 = ~a;
> +  int t2 = t1 | b;
> +  int t3 = t2 ^ a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) unsigned int
> +t12 (int a, unsigned int b)
> +{
> +  int t1 = ~a;
> +  unsigned int t2 = t1 | b;
> +  unsigned int t3 = a ^ t2;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) char
> +t13 (char a, char b)
> +{
> +  char t1 = ~a;
> +  char t2 = b | t1;
> +  char t3 = t2 ^ a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) unsigned char
> +t14 (unsigned char a, char b)
> +{
> +  unsigned char t1 = ~a;
> +  char t2 = b | t1;
> +  unsigned char t3 = a ^ t2;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) short
> +t15 (short a, short b)
> +{
> +  short t1 = ~a;
> +  short t2 = t1 | b;
> +  short t3 = t2 ^ a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) unsigned short
> +t16 (unsigned short a, short b)
> +{
> +  short t1 = ~a;
> +  short t2 = t1 | b;
> +  unsigned short t3 = t2 ^ a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) long
> +t17 (long a, long b)
> +{
> +  long t1 = ~a;
> +  long t2 = t1 | b;
> +  long t3 = t2 ^ a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) unsigned long
> +t18 (long a, unsigned long b)
> +{
> +  long t1 = ~a;
> +  unsigned long t2 = t1 | b;
> +  unsigned long t3 = t2 ^ a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) long long
> +t19 (long long a, long long b)
> +{
> +  long long t1 = ~a;
> +  long long t2 = t1 | b;
> +  long long t3 = t2 ^ a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) unsigned long long
> +t20 (long long a, long long b)
> +{
> +  long long t1 = ~a;
> +  long long t2 = t1 | b;
> +  unsigned long long t3 = a ^ t2;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) v4si
> +t22 (v4si a, v4si b)
> +{
> +  v4si t1 = ~a;
> +  v4si t2 = t1 | b;
> +  v4si t3 = a ^ t2;
> +  return t3;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " \\\| " "dse1" } } */
> +/* { dg-final { scan-tree-dump-not " \\\^ " "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " ~" 22 "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " & " 22 "dse1" } } */
> +
> --
> 2.39.3
>


* [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
@ 2023-07-05 13:41 Drew Ross
  2023-07-06 13:00 ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Drew Ross @ 2023-07-05 13:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Drew Ross

    Add a simplification that folds (~X | Y) ^ X into ~(X & Y).
    Tested successfully on x86_64 and x86 targets.

            PR middle-end/109986

    gcc/ChangeLog:

            * match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.

    gcc/testsuite/ChangeLog:

            * gcc.c-torture/execute/pr109986.c: New test.
            * gcc.dg/tree-ssa/pr109986.c: New test.
---
 gcc/match.pd                                  |  11 ++
 .../gcc.c-torture/execute/pr109986.c          |  41 ++++
 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c      | 177 ++++++++++++++++++
 3 files changed, 229 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c

diff --git a/gcc/match.pd b/gcc/match.pd
index a17d6838c14..d9d7d932881 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (convert (bit_and @1 (bit_not @0)))))
 
+/* (~X | Y) ^ X -> ~(X & Y).  */
+(simplify
+ (bit_xor:c (nop_convert1?
+             (bit_ior:c (nop_convert2? (bit_not (nop_convert3? @0)))
+                        @1)) (nop_convert4? @0))
+  (if (types_match (type, @1))
+   (bit_not (bit_and @1 (convert @0)))
+   (if (types_match (type, @0))
+    (bit_not (bit_and (convert @1) @0))
+    (convert (bit_not (bit_and @0 (convert @1)))))))
+
 /* Convert ~X ^ ~Y to X ^ Y.  */
 (simplify
  (bit_xor (convert1? (bit_not @0)) (convert2? (bit_not @1)))
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr109986.c b/gcc/testsuite/gcc.c-torture/execute/pr109986.c
new file mode 100644
index 00000000000..00ee9888539
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr109986.c
@@ -0,0 +1,41 @@
+/* PR middle-end/109986 */
+
+#include "../../gcc.dg/tree-ssa/pr109986.c"
+
+int 
+main ()
+{
+  if (t1 (29789, 29477) != -28678) __builtin_abort ();
+  if (t2 (20196, -18743) != 4294965567) __builtin_abort ();
+  if (t3 (127, 99) != -100) __builtin_abort ();
+  if (t4 (100, 53) != 219) __builtin_abort ();
+  if (t5 (20100, 1283) != -1025) __builtin_abort ();
+  if (t6 (20100, 10283) != 63487) __builtin_abort ();
+  if (t7 (2136614690L, 1136698390L) != -1128276995L) __builtin_abort ();
+  if (t8 (1136698390L, 2136614690L) != -1128276995UL) __builtin_abort ();
+  if (t9 (9176690219839792930LL, 3176690219839721234LL) != -3175044472123688707LL)
+    __builtin_abort ();
+  if (t10 (9176690219839792930LL, 3176690219839721234LL) != 15271699601585862909ULL)
+    __builtin_abort ();
+  if (t11 (29789, 29477) != -28678) __builtin_abort ();
+  if (t12 (20196, -18743) != 4294965567) __builtin_abort ();
+  if (t13 (127, 99) != -100) __builtin_abort ();
+  if (t14 (100, 53) != 219) __builtin_abort ();
+  if (t15 (20100, 1283) != -1025) __builtin_abort ();
+  if (t16 (20100, 10283) != 63487) __builtin_abort ();
+  if (t17 (2136614690, 1136698390) != -1128276995) __builtin_abort ();
+  if (t18 (1136698390L, 2136614690L) != -1128276995UL) __builtin_abort ();
+  if (t19 (9176690219839792930LL, 3176690219839721234LL) != -3175044472123688707LL)
+    __builtin_abort ();
+  if (t20 (9176690219839792930LL, 3176690219839721234LL) != 15271699601585862909ULL)
+    __builtin_abort ();
+  v4si a1 = {1, 2, 3, 4};
+  v4si a2 = {6, 7, 8, 9}; 
+  v4si r1 = {-1, -3, -1, -1}; 
+  v4si b1 = t21 (a1, a2);
+  v4si b2 = t22 (a1, a2);
+  if (__builtin_memcmp (&b1, &r1, sizeof (b1)) != 0) __builtin_abort ();
+  if (__builtin_memcmp (&b2, &r1, sizeof (b2)) != 0) __builtin_abort ();
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c b/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
new file mode 100644
index 00000000000..45f099b5656
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
@@ -0,0 +1,177 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
+
+typedef int v4si __attribute__((vector_size(16)));
+
+/* Generic */
+__attribute__((noipa)) int 
+t1 (int a, int b)
+{
+  return (~a | b) ^ a;
+}
+
+__attribute__((noipa)) unsigned int 
+t2 (int a, int b)
+{
+  return a ^ (~a | (unsigned int) b);
+}
+
+__attribute__((noipa)) char
+t3 (char a, char b)
+{
+  return (b | ~a) ^ a;
+}
+
+__attribute__((noipa)) unsigned char
+t4 (char a, char b)
+{
+  return ((unsigned char) a) ^ (b | ~a);
+}
+
+__attribute__((noipa)) short
+t5 (short a, short b)
+{
+  return a ^ (b | ~a);
+}
+
+__attribute__((noipa)) unsigned short
+t6 (short a, short b)
+{
+  return ((unsigned short) a) ^ (b | ~a);
+}
+
+__attribute__((noipa)) long
+t7 (long a, long b)
+{
+  return a ^ (b | ~a);
+}
+
+__attribute__((noipa)) unsigned long
+t8 (long a, long b)
+{
+  return ((unsigned long) a) ^ (b | ~a);
+}
+
+__attribute__((noipa)) long long
+t9 (long long a, long long b)
+{
+  return a ^ (b | ~a);
+}
+
+__attribute__((noipa)) unsigned long long
+t10 (long long a, long long b)
+{
+  return ((unsigned long long) a) ^ (b | ~a);
+}
+
+__attribute__((noipa)) v4si
+t21 (v4si a, v4si b)
+{
+  return a ^ (b | ~a);
+}
+
+/* Gimple */
+__attribute__((noipa)) int 
+t11 (int a, int b)
+{
+  int t1 = ~a;
+  int t2 = t1 | b;
+  int t3 = t2 ^ a;
+  return t3;
+}
+
+__attribute__((noipa)) unsigned int
+t12 (int a, unsigned int b)
+{
+  int t1 = ~a;
+  unsigned int t2 = t1 | b;
+  unsigned int t3 = a ^ t2;
+  return t3;
+}
+
+__attribute__((noipa)) char
+t13 (char a, char b)
+{
+  char t1 = ~a;
+  char t2 = b | t1;
+  char t3 = t2 ^ a;
+  return t3;
+}
+
+__attribute__((noipa)) unsigned char
+t14 (unsigned char a, char b)
+{
+  unsigned char t1 = ~a;
+  char t2 = b | t1;
+  unsigned char t3 = a ^ t2;
+  return t3;
+}
+
+__attribute__((noipa)) short 
+t15 (short a, short b)
+{
+  short t1 = ~a;
+  short t2 = t1 | b;
+  short t3 = t2 ^ a;
+  return t3;
+}
+
+__attribute__((noipa)) unsigned short 
+t16 (unsigned short a, short b)
+{
+  short t1 = ~a;
+  short t2 = t1 | b;
+  unsigned short t3 = t2 ^ a;
+  return t3;
+}
+
+__attribute__((noipa)) long 
+t17 (long a, long b)
+{
+  long t1 = ~a;
+  long t2 = t1 | b;
+  long t3 = t2 ^ a;
+  return t3;
+}
+
+__attribute__((noipa)) unsigned long 
+t18 (long a, unsigned long b)
+{
+  long t1 = ~a;
+  unsigned long t2 = t1 | b;
+  unsigned long t3 = t2 ^ a;
+  return t3;
+}
+
+__attribute__((noipa)) long long 
+t19 (long long a, long long b)
+{
+  long long t1 = ~a;
+  long long t2 = t1 | b;
+  long long t3 = t2 ^ a;
+  return t3;
+}
+
+__attribute__((noipa)) unsigned long long 
+t20 (long long a, long long b)
+{
+  long long t1 = ~a;
+  long long t2 = t1 | b;
+  unsigned long long t3 = a ^ t2;
+  return t3;
+}
+
+__attribute__((noipa)) v4si
+t22 (v4si a, v4si b)
+{
+  v4si t1 = ~a;
+  v4si t2 = t1 | b;
+  v4si t3 = a ^ t2;
+  return t3;
+}
+
+/* { dg-final { scan-tree-dump-not " \\\| " "dse1" } } */
+/* { dg-final { scan-tree-dump-not " \\\^ " "dse1" } } */
+/* { dg-final { scan-tree-dump-times " ~" 22 "dse1" } } */
+/* { dg-final { scan-tree-dump-times " & " 22 "dse1" } } */
+
-- 
2.39.3



end of thread, other threads:[~2023-07-26 13:37 UTC | newest]

Thread overview: 11+ messages
2023-07-25 19:42 [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986] David Edelsohn
2023-07-25 19:44 ` Jakub Jelinek
2023-07-25 20:54   ` Andrew Pinski
2023-07-25 21:58     ` Andrew Pinski
2023-07-26 13:37       ` Drew Ross
  -- strict thread matches above, loose matches on Subject: below --
2023-07-05 13:41 Drew Ross
2023-07-06 13:00 ` Richard Biener
2023-07-06 19:51   ` Jakub Jelinek
2023-07-11 13:08   ` Jakub Jelinek
2023-07-11 13:58     ` Richard Biener
2023-07-19 13:17       ` Drew Ross
