Trying to push the conversions down onto the operands with, for example,
(for op (bit_ior bit_and bit_xor)
 (for rop (bit_xor bit_ior bit_and)
  (simplify
   (op:c (nop_convert (rop @0 @1)) @3)
   (op (rop (convert:type @0) (convert:type @1)) @3))))

(simplify
 (convert (bit_not @0))
 (bit_not (convert:type @0)))

runs into an infinite oscillation with the existing simplification
/* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
   when profitable.
...
  (bitop (convert@2 @0) (convert?@3 @1))
...
   (convert (bitop @0 (convert @1)))))

when integer constants are involved, e.g.
unsigned int main (int x, unsigned int y)
{
  unsigned int a = x | 4213678;
  unsigned int b = a ^ y;
  return b;
}
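
A minimal sketch of why the two rewrites can undo each other on this
example, assuming two's complement ints: both placements of the
conversion around `x | 4213678` compute the same value, so each
simplification is valid on its own and neither form is a fixed point
for both.  The function names below are made up for illustration and
are not part of GCC.

```c
#include <assert.h>
#include <limits.h>

/* Form produced by the "fold (type) X op CST" simplification:
   the OR is done in int and the result is converted once.  */
unsigned int
or_then_convert (int x)
{
  return (unsigned int) (x | 4213678);
}

/* Form produced by pushing the conversion onto the operands:
   both operands are converted first, then ORed as unsigned.  */
unsigned int
convert_then_or (int x)
{
  return (unsigned int) x | 4213678u;
}

/* Asserts that both forms agree on a handful of sample inputs.  */
void
check_forms_agree (void)
{
  static const int samples[] = { 0, 1, -1, 4213678, INT_MIN, INT_MAX };
  for (unsigned int i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (or_then_convert (samples[i]) == convert_then_or (samples[i]));
}
```

Since the values are identical, only a canonicalization rule (one
preferred direction for the conversion) can break the cycle.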

I think using Jakub's bitwise_equal_p macro to get the pattern down to
16 cases might be our best option.
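
For reference, a brute-force check of the identity itself and of why
comparing the two occurrences of X only up to a nop conversion should
be enough.  This is a sketch under assumptions: 8-bit operands, two's
complement, and hypothetical helper names that do not exist in GCC.
As I read the thread, dropping nop_convert3? and nop_convert4? leaves
two optional converts and two :c, i.e. 2^2 * 2^2 = 16 expansions
instead of 2^4 * 2^2 = 64.

```c
#include <assert.h>
#include <stdint.h>

/* (~X | Y) ^ X with every operand in the same unsigned type.  */
uint8_t
lhs_plain (uint8_t x, uint8_t y)
{
  return (uint8_t) ((~x | y) ^ x);
}

/* Same expression, but the inner X is viewed as signed (a nop
   conversion) while the outer X stays unsigned.  */
uint8_t
lhs_mixed (uint8_t x, uint8_t y)
{
  int8_t sx = (int8_t) x;                  /* nop conversion of X */
  int8_t t = (int8_t) (~sx | (int8_t) y);  /* (~(signed) X) | Y */
  return (uint8_t) ((uint8_t) t ^ x);
}

/* The simplified form ~(X & Y).  */
uint8_t
rhs (uint8_t x, uint8_t y)
{
  return (uint8_t) ~(x & y);
}

/* Counts the (x, y) pairs for which either left-hand form differs
   from ~(X & Y); zero means the identity holds for all inputs.  */
int
count_mismatches (void)
{
  int bad = 0;
  for (int x = 0; x < 256; x++)
    for (int y = 0; y < 256; y++)
      if (lhs_plain (x, y) != rhs (x, y) || lhs_mixed (x, y) != rhs (x, y))
        bad++;
  return bad;
}

void
check_identity (void)
{
  assert (count_mismatches () == 0);
}
```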

Drew

On Tue, Jul 11, 2023 at 9:58 AM Richard Biener <richard.guenther@gmail.com>
wrote:

> On Tue, Jul 11, 2023 at 3:08 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches
> > wrote:
> > > On Wed, Jul 5, 2023 at 3:42 PM Drew Ross via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > >     Adds a simplification for (~X | Y) ^ X to be folded into
> > > >     ~(X & Y).
> > > >     Tested successfully on x86_64 and x86 targets.
> > > >
> > > >             PR middle-end/109986
> > > >
> > > >     gcc/ChangeLog:
> > > >
> > > >             * match.pd ((~X | Y) ^ X -> ~(X & Y)): New
> > > >             simplification.
> > > >
> > > >     gcc/testsuite/ChangeLog:
> > > >
> > > >             * gcc.c-torture/execute/pr109986.c: New test.
> > > >             * gcc.dg/tree-ssa/pr109986.c: New test.
> > > > ---
> > > >  gcc/match.pd                                  |  11 ++
> > > >  .../gcc.c-torture/execute/pr109986.c          |  41 ++++
> > > >  gcc/testsuite/gcc.dg/tree-ssa/pr109986.c      | 177 ++++++++++++++++++
> > > >  3 files changed, 229 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index a17d6838c14..d9d7d932881 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> > > >    (convert (bit_and @1 (bit_not @0)))))
> > > >
> > > > +/* (~X | Y) ^ X -> ~(X & Y).  */
> > > > +(simplify
> > > > + (bit_xor:c (nop_convert1?
> > > > +             (bit_ior:c (nop_convert2? (bit_not (nop_convert3? @0)))
> > > > +                        @1)) (nop_convert4? @0))
> > >
> > > you want to reduce the number of nop_convert? - for example
> > > I wonder if we can canonicalize
> > >
> > >  (T)~X and ~(T)X
> > >
> > > for nop-conversions.  The same might apply to binary bitwise operations
> > > where we should push those to a direction where they are likely
> > > eliminated.
> > > Usually we'd push them outwards.
> > >
> > > The issue with the above pattern is that nop_convertN? expands to 2^N
> > > separate patterns.  Together with the two :c you get 64 out of this.
> > >
> > > I do not see that all of the combinations can happen when X has to
> > > match unless we fail to contract some of them like if we have
> > > (unsigned)(~(signed)X | Y) ^ X which we could rewrite like
> > > -> (unsigned)((signed)~X | Y) ^ X -> (~X | (unsigned) Y) ^ X
> > > with the last step being somewhat difficult unless we do
> > > (signed)~X | Y -> (signed)(~X | (unsigned)Y).  It feels like a
> > > propagation problem and less of a direct pattern matching one.
> >
> > The nop_convert1? in the pattern might seem to be unnecessary
> > for cases like:
> > int i, j, k, l;
> > unsigned u, v, w, x;
> >
> > void
> > foo (void)
> > {
> >   int t0 = i;
> >   int t1 = (~t0) | j;
> >   x = t1 ^ (unsigned) t0;
> >   unsigned t2 = u;
> >   unsigned t3 = (~t2) | v;
> >   i = ((int) t3) ^ (int) t2;
> > }
> > we actually optimize it with or without the nop_convert1? in place,
> > because we have the
> > /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
> >    when profitable.
> > ...
> >   (bitop (convert@2 @0) (convert?@3 @1))
> > ...
> >    (convert (bitop @0 (convert @1)))))
> > simplification.
> > Except that on
> > void
> > bar (void)
> > {
> >   unsigned t0 = u;
> >   int t1 = (~(int) t0) | j;
> >   x = t1 ^ t0;
> >   int t2 = i;
> >   unsigned t3 = (~(unsigned) t2) | v;
> >   i = ((int) t3) ^ t2;
> > }
> > the optimization doesn't trigger without the nop_convert1? and does
> > with it.
> >
> > Perhaps we could get rid of nop_convert3? and nop_convert4?
> > by introducing a macro/inline function predicate like:
> > bitwise_equal_p (expr1, expr2) and instead of using
> > (nop_convert3? @0) and (nop_convert4? @0) in the pattern
> > use @0 and @2 and then add
> > if (bitwise_equal_p (@0, @2))
> > to the condition.
> > For GENERIC (i.e. in generic-match-head.cc) it could be something like:
> > static inline bool
> > bitwise_equal_p (tree expr1, tree expr2)
> > {
> >   STRIP_NOPS (expr1);
> >   STRIP_NOPS (expr2);
> >   if (expr1 == expr2)
> >     return true;
> >   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
> >     return false;
> >   if (TREE_CODE (expr1) == INTEGER_CST
> >       && TREE_CODE (expr2) == INTEGER_CST)
> >     return wi::to_wide (expr1) == wi::to_wide (expr2);
> >   return operand_equal_p (expr1, expr2, 0);
> > }
> > (the INTEGER_CST special case because operand_equal_p compares
> > wi::to_widest, which could be different if one constant is signed and
> > the other unsigned).
> > For GIMPLE, I wonder if it shouldn't be a macro that takes valueize into
> > account, and do something like:
> > #define bitwise_equal_p(expr1, expr2) \
> >   gimple_bitwise_equal_p (expr1, expr2, valueize)
> >
> > bool gimple_nop_convert (tree, tree *, tree (*)(tree));
> >
> > static inline bool
> > gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
> > {
> >   if (expr1 == expr2)
> >     return true;
> >   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
> >     return false;
> >   if (TREE_CODE (expr1) == INTEGER_CST
> >       && TREE_CODE (expr2) == INTEGER_CST)
> >     return wi::to_wide (expr1) == wi::to_wide (expr2);
> >   if (operand_equal_p (expr1, expr2, 0))
> >     return true;
> >   tree expr3, expr4;
> >   if (!gimple_nop_convert (expr1, &expr3, valueize))
> >     expr3 = expr1;
> >   if (!gimple_nop_convert (expr2, &expr4, valueize))
> >     expr4 = expr2;
> >   if (expr1 != expr3)
> >     {
> >       if (operand_equal_p (expr3, expr2, 0))
> >         return true;
> >       if (expr2 != expr4 && operand_equal_p (expr3, expr4, 0))
> >         return true;
> >     }
> >   if (expr2 != expr4 && operand_equal_p (expr1, expr4, 0))
> >     return true;
> >   return false;
> > }
> >
> > Completely untested.  What do you think?
> > Though, that brings us only still to 16 cases of this.
>
> I guess we can also not worry and hope for a better code generator ...
>
> The obvious improvement there is to delay pattern expansion (with for
> and ?)
> until we get two patterns on the same sub-tree so patterns that are the
> only ones at some point during the sub-tree matching can then be expanded
> with code generation optimized for code size (:c is the only difficult
> case there).
>
> Matching the shortest paths to leaf first might then improve things
> further.
>
> But this is a complete rewrite of the decision tree builder, so ...
>
> Richard.
>
> >
> >         Jakub
> >
>
>