From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20230705134147.13325-1-drross@redhat.com> <CAFiYyc0J95EveoqKahTKj+o-yzSQoVeXSAM2=QxkG7uhw8WU_g@mail.gmail.com>
 <ZK1UPBbBhwZ+RwdO@tucnak>
In-Reply-To: <ZK1UPBbBhwZ+RwdO@tucnak>
From: Richard Biener <richard.guenther@gmail.com>
Date: Tue, 11 Jul 2023 15:58:15 +0200
Message-ID: <CAFiYyc38n4mwJF5bBXV9gRssLKma9cehC5rwjn-a3C4U3OAB4w@mail.gmail.com>
Subject: Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X ->
 ~(X & Y) [PR109986]
To: Jakub Jelinek <jakub@redhat.com>
Cc: Drew Ross <drross@redhat.com>, gcc-patches@gcc.gnu.org
Content-Type: text/plain; charset="UTF-8"
List-Id: <gcc-patches.gcc.gnu.org>

On Tue, Jul 11, 2023 at 3:08 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches wrote:
> > On Wed, Jul 5, 2023 at 3:42 PM Drew Ross via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > >     Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
> > >     Tested successfully on x86_64 and x86 targets.
> > >
> > >             PR middle-end/109986
> > >
> > >     gcc/ChangeLog:
> > >
> > >             * match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.
> > >
> > >     gcc/testsuite/ChangeLog:
> > >
> > >             * gcc.c-torture/execute/pr109986.c: New test.
> > >             * gcc.dg/tree-ssa/pr109986.c: New test.
> > > ---
> > >  gcc/match.pd                                  |  11 ++
> > >  .../gcc.c-torture/execute/pr109986.c          |  41 ++++
> > >  gcc/testsuite/gcc.dg/tree-ssa/pr109986.c      | 177 ++++++++++++++++++
> > >  3 files changed, 229 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index a17d6838c14..d9d7d932881 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> > >    (convert (bit_and @1 (bit_not @0)))))
> > >
> > > +/* (~X | Y) ^ X -> ~(X & Y).  */
> > > +(simplify
> > > + (bit_xor:c (nop_convert1?
> > > +             (bit_ior:c (nop_convert2? (bit_not (nop_convert3? @0)))
> > > +                        @1)) (nop_convert4? @0))
> >
> > you want to reduce the number of nop_convert? - for example
> > I wonder if we can canonicalize
> >
> >  (T)~X and ~(T)X
> >
> > for nop-conversions.  The same might apply to binary bitwise operations
> > where we should push those to a direction where they are likely eliminated.
> > Usually we'd push them outwards.
> >
> > The issue with the above pattern is that nop_convertN? expands to 2^N
> > separate patterns.  Together with the two :c you get 64 out of this.
> >
> > I do not see that all of the combinations can happen when X has to
> > match unless we fail to contract some of them like if we have
> > (unsigned)(~(signed)X | Y) ^ X which we could rewrite like
> > -> (unsigned)((signed)~X | Y) ^ X -> (~X | (unsigned) Y) ^ X
> > with the last step being somewhat difficult unless we do
> > (signed)~X | Y -> (signed)(~X | (unsigned)Y).  It feels like a
> > propagation problem and less of a direct pattern matching one.
>
> The nop_convert1? in the pattern might seem to be unnecessary
> for cases like:
> int i, j, k, l;
> unsigned u, v, w, x;
>
> void
> foo (void)
> {
>   int t0 = i;
>   int t1 = (~t0) | j;
>   x = t1 ^ (unsigned) t0;
>   unsigned t2 = u;
>   unsigned t3 = (~t2) | v;
>   i = ((int) t3) ^ (int) t2;
> }
> we actually optimize it with or without the nop_convert1? in place,
> because we have the
> /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
>    when profitable.
> ...
>   (bitop (convert@2 @0) (convert?@3 @1))
> ...
>    (convert (bitop @0 (convert @1)))))
> simplification.
> Except that on
> void
> bar (void)
> {
>   unsigned t0 = u;
>   int t1 = (~(int) t0) | j;
>   x = t1 ^ t0;
>   int t2 = i;
>   unsigned t3 = (~(unsigned) t2) | v;
>   i = ((int) t3) ^ t2;
> }
> the optimization doesn't trigger without the nop_convert1? and does
> with it.
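
For anyone following along, here is a minimal brute-force sketch of the
identity itself, including a sign-changing variant shaped like bar ()
above. It uses 8-bit types so the whole input space can be enumerated.
The function names are made up for illustration, and the narrowing casts
to int8_t rely on GCC's modulo behaviour for out-of-range conversions,
which is an assumption about the compiler rather than something ISO C
guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Left-hand side of the pattern: (~X | Y) ^ X.  */
static uint8_t
lhs (uint8_t x, uint8_t y)
{
  return (uint8_t) ((uint8_t) (~x | y) ^ x);
}

/* Right-hand side of the pattern: ~(X & Y).  */
static uint8_t
rhs (uint8_t x, uint8_t y)
{
  return (uint8_t) ~(x & y);
}

/* Same as lhs, but the inner ~X | Y is computed in the signed type,
   similar to the first half of bar ().  Assumes GCC's wrapping
   conversion to int8_t.  */
static uint8_t
lhs_mixed (uint8_t x, uint8_t y)
{
  int8_t t1 = (int8_t) (~(int8_t) x | (int8_t) y);
  return (uint8_t) ((uint8_t) t1 ^ x);
}

/* Count the (x, y) pairs on which any of the forms disagree.  */
static unsigned
count_mismatches (void)
{
  unsigned bad = 0;
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      if (lhs ((uint8_t) x, (uint8_t) y) != rhs ((uint8_t) x, (uint8_t) y)
	  || lhs_mixed ((uint8_t) x, (uint8_t) y)
	     != rhs ((uint8_t) x, (uint8_t) y))
	bad++;
  return bad;
}

/* Aborts if the identity fails for any 8-bit input pair.  */
static void
check_identity (void)
{
  assert (count_mismatches () == 0);
}
```

Calling check_identity () from a test driver is enough; it only says the
fold is valid bit for bit, not whether match.pd actually sees through
the conversions.
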
>
> Perhaps we could get rid of nop_convert3? and nop_convert4?
> by introducing a macro/inline function predicate like:
> bitwise_equal_p (expr1, expr2) and instead of using
> (nop_convert3? @0) and (nop_convert4? @0) in the pattern
> use @0 and @2 and then add
> if (bitwise_equal_p (@0, @2))
> to the condition.
> For GENERIC (i.e. in generic-match-head.cc) it could be something like:
> static inline bool
> bitwise_equal_p (tree expr1, tree expr2)
> {
>   STRIP_NOPS (expr1);
>   STRIP_NOPS (expr2);
>   if (expr1 == expr2)
>     return true;
>   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
>     return false;
>   if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
>     return wi::to_wide (expr1) == wi::to_wide (expr2);
>   return operand_equal_p (expr1, expr2, 0);
> }
> (the INTEGER_CST special case because operand_equal_p compares wi::to_widest
> which could be different if one constant is signed and the other unsigned).
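
A plain C analogy of that INTEGER_CST special case, as a rough sketch
only: the helpers below are hypothetical and do not use GCC's wide-int
API. Comparing at the shared 32-bit precision stands in for the
wi::to_wide comparison, and extending each value according to its own
signedness first stands in for wi::to_widest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compare the two constants at their common 32-bit precision,
   ignoring signedness (stand-in for the wi::to_wide comparison).  */
static bool
same_bits_32 (int32_t s, uint32_t u)
{
  return (uint32_t) s == u;
}

/* Extend each constant to 64 bits according to its own sign and then
   compare (stand-in for the wi::to_widest comparison).  */
static bool
same_value_widened (int32_t s, uint32_t u)
{
  return (int64_t) s == (int64_t) u;
}

/* -1 and 0xffffffff share a bit pattern but differ once widened.  */
static void
check_constant_compare (void)
{
  assert (same_bits_32 (-1, 0xFFFFFFFFu));
  assert (!same_value_widened (-1, 0xFFFFFFFFu));
}
```

So a signed -1 and an unsigned 0xffffffff are bitwise equal for the
purposes of the pattern, yet a widened comparison would reject them,
which is why the constant case is handled before operand_equal_p.
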
> For GIMPLE, I wonder if it shouldn't be a macro that takes valueize into
> account, and do something like:
> #define bitwise_equal_p(expr1, expr2) gimple_bitwise_equal_p (expr1, expr2, valueize)
>
> bool gimple_nop_convert (tree, tree *, tree (*)(tree));
>
> static inline bool
> gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
> {
>   if (expr1 == expr2)
>     return true;
>   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
>     return false;
>   if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
>     return wi::to_wide (expr1) == wi::to_wide (expr2);
>   if (operand_equal_p (expr1, expr2, 0))
>     return true;
>   tree expr3, expr4;
>   if (!gimple_nop_convert (expr1, &expr3, valueize))
>     expr3 = expr1;
>   if (!gimple_nop_convert (expr2, &expr4, valueize))
>     expr4 = expr2;
>   if (expr1 != expr3)
>     {
>       if (operand_equal_p (expr3, expr2, 0))
>         return true;
>       if (expr2 != expr4 && operand_equal_p (expr3, expr4, 0))
>         return true;
>     }
>   if (expr2 != expr4 && operand_equal_p (expr1, expr4, 0))
>     return true;
>   return false;
>     return true;
>   return false;
> }
>
> Completely untested.  What do you think?
> Though, that brings us only still to 16 cases of this.
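
To make the counts in this thread concrete, here is a small sketch of
the arithmetic, assuming each optional conversion (?) and each :c
independently doubles the number of expanded patterns. The helper name
is hypothetical and this is only the naive count; it says nothing about
what genmatch may later merge or drop.

```c
/* Naive number of expanded patterns for a simplify with N_OPTIONAL
   optional operands (?) and N_COMMUTATIVE commutative markers (:c),
   assuming every one of them doubles the count.  */
static unsigned
expanded_patterns (unsigned n_optional, unsigned n_commutative)
{
  return 1u << (n_optional + n_commutative);
}
```

With the original pattern, expanded_patterns (4, 2) gives the 64
mentioned earlier; dropping nop_convert3? and nop_convert4? in favour of
bitwise_equal_p leaves expanded_patterns (2, 2), the 16 cases above.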

I guess we can also not worry and hope for a better code generator ...

The obvious improvement there is to delay pattern expansion (with for and ?)
until we get two patterns on the same sub-tree so patterns that are the
only ones at some point during the sub-tree matching can then be expanded
with code generation optimized for code size (:c is the only difficult
case there).

Matching the shortest paths to leaf first might then improve things further.

But this is a complete rewrite of the decision tree builder, so ...

Richard.

>
>         Jakub
>