From: Drew Ross
Date: Wed, 19 Jul 2023 09:17:10 -0400
Subject: Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]
To: Richard Biener
Cc: Jakub Jelinek, gcc-patches@gcc.gnu.org
References: <20230705134147.13325-1-drross@redhat.com>

Trying to lower converts to operands through, for example,

(for op (bit_ior bit_and bit_xor)
 (for rop (bit_xor bit_ior bit_and)
  (simplify
   (op:c (nop_convert (rop @0 @1)) @3)
   (op (rop (convert:type @0) (convert:type @1)) @3))))

(simplify
 (convert (bit_not @0))
 (bit_not (convert:type @0)))

Runs into infinite oscillations with
/* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
   when profitable.
...
  (bitop (convert@2 @0) (convert?@3 @1))
...
   (convert (bitop @0 (convert @1)))))

when integer constants are involved ex.

unsigned int
main (int x, unsigned int y)
{
  unsigned int a = x | 4213678;
  unsigned int b = a ^ y;
  return b;
}

I think using Jakub's bitwise equal macro to get it down to 16 cases might
be our best option.

Drew

On Tue, Jul 11, 2023 at 9:58 AM Richard Biener wrote:

> On Tue, Jul 11, 2023 at 3:08 PM Jakub Jelinek wrote:
> >
> > On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches wrote:
> > > On Wed, Jul 5, 2023 at 3:42 PM Drew Ross via Gcc-patches wrote:
> > > >
> > > > Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
> > > > Tested successfully on x86_64 and x86 targets.
> > > >
> > > >         PR middle-end/109986
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         * gcc.c-torture/execute/pr109986.c: New test.
> > > >         * gcc.dg/tree-ssa/pr109986.c: New test.
> > > > ---
> > > >  gcc/match.pd                                  |  11 ++
> > > >  .../gcc.c-torture/execute/pr109986.c          |  41 ++++
> > > >  gcc/testsuite/gcc.dg/tree-ssa/pr109986.c      | 177 ++++++++++++++++++
> > > >  3 files changed, 229 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index a17d6838c14..d9d7d932881 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> > > >    (convert (bit_and @1 (bit_not @0)))))
> > > >
> > > > +/* (~X | Y) ^ X -> ~(X & Y).  */
> > > > +(simplify
> > > > + (bit_xor:c (nop_convert1?
> > > > +             (bit_ior:c (nop_convert2?
> > > > +                         (bit_not (nop_convert3? @0)))
> > > > +                        @1)) (nop_convert4? @0))
> > >
> > > you want to reduce the number of nop_convert?  - for example
> > > I wonder if we can canonicalize
> > >
> > >  (T)~X  and ~(T)X
> > >
> > > for nop-conversions.  The same might apply to binary bitwise operations
> > > where we should push those to a direction where they are likely eliminated.
> > > Usually we'd push them outwards.
> > >
> > > The issue with the above pattern is that nop_convertN? expands to 2^N
> > > separate patterns.  Together with the two :c you get 64 out of this.
> > >
> > > I do not see that all of the combinations can happen when X has to
> > > match unless we fail to contract some of them like if we have
> > > (unsigned)(~(signed)X | Y) ^ X which we could rewrite like
> > > -> (unsigned)((signed)~X | Y) ^ X -> (~X | (unsigned) Y) ^ X
> > > with the last step being somewhat difficult unless we do
> > > (signed)~X | Y -> (signed)(~X | (unsigned)Y).  It feels like a
> > > propagation problem and less of a direct pattern matching one.
> >
> > The nop_convert1? in the pattern might seem to be unnecessary
> > for cases like:
> > int i, j, k, l;
> > unsigned u, v, w, x;
> >
> > void
> > foo (void)
> > {
> >   int t0 = i;
> >   int t1 = (~t0) | j;
> >   x = t1 ^ (unsigned) t0;
> >   unsigned t2 = u;
> >   unsigned t3 = (~t2) | v;
> >   i = ((int) t3) ^ (int) t2;
> > }
> > we actually optimize it with or without the nop_convert1? in place,
> > because we have the
> > /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
> >    when profitable.
> > ...
> >   (bitop (convert@2 @0) (convert?@3 @1))
> > ...
> >    (convert (bitop @0 (convert @1)))))
> > simplification.
> > Except that on
> > void
> > bar (void)
> > {
> >   unsigned t0 = u;
> >   int t1 = (~(int) t0) | j;
> >   x = t1 ^ t0;
> >   int t2 = i;
> >   unsigned t3 = (~(unsigned) t2) | v;
> >   i = ((int) t3) ^ t2;
> > }
> > the optimization doesn't trigger without the nop_convert1? and does
> > with it.
> >
> > Perhaps we could get rid of nop_convert3? and nop_convert4?
> > by introducing a macro/inline function predicate like:
> > bitwise_equal_p (expr1, expr2) and instead of using
> > (nop_convert3? @0) and (nop_convert4? @0) in the pattern
> > use @0 and @2 and then add
> > if (bitwise_equal_p (@0, @2))
> > to the condition.
> > For GENERIC (i.e. in generic-match-head.cc) it could be something like:
> > static inline bool
> > bitwise_equal_p (tree expr1, tree expr2)
> > {
> >   STRIP_NOPS (expr1);
> >   STRIP_NOPS (expr2);
> >   if (expr1 == expr2)
> >     return true;
> >   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
> >     return false;
> >   if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
> >     return wi::to_wide (expr1) == wi::to_wide (expr2);
> >   return operand_equal_p (expr1, expr2, 0);
> > }
> > (the INTEGER_CST special case because operand_equal_p compares wi::to_widest
> > which could be different if one constant is signed and the other unsigned).
> > For GIMPLE, I wonder if it shouldn't be a macro that takes valueize into
> > account, and do something like:
> > #define bitwise_equal_p(expr1, expr2) gimple_bitwise_equal_p (expr1, expr2, valueize)
> >
> > bool gimple_nop_convert (tree, tree *, tree (*)(tree));
> >
> > static inline bool
> > gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
> > {
> >   if (expr1 == expr2)
> >     return true;
> >   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
> >     return false;
> >   if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
> >     return wi::to_wide (expr1) == wi::to_wide (expr2);
> >   if (operand_equal_p (expr1, expr2, 0))
> >     return true;
> >   tree expr3, expr4;
> >   if (!gimple_nop_convert (expr1, &expr3, valueize))
> >     expr3 = expr1;
> >   if (!gimple_nop_convert (expr2, &expr4, valueize))
> >     expr4 = expr2;
> >   if (expr1 != expr3)
> >     {
> >       if (operand_equal_p (expr3, expr2, 0))
> >         return true;
> >       if (expr2 != expr4 && operand_equal_p (expr3, expr4, 0))
> >         return true;
> >     }
> >   if (expr2 != expr4 && operand_equal_p (expr1, expr4, 0))
> >     return true;
> >   return false;
> > }
> >
> > Completely untested.  What do you think?
>
> Though, that brings us only still to 16 cases of this.
>
> I guess we can also not worry and hope for a better code generator ...
>
> The obvious improvement there is to delay pattern expansion (with for and ?)
> until we get two patterns on the same sub-tree so patterns that are the
> only ones at some point during the sub-tree matching can then be expanded
> with code generation optimized for code size (:c is the only difficult
> case there).
>
> Matching the shortest paths to leaf first might then improve things
> further.
>
> But this is a complete rewrite of the decision tree builder, so ...
>
> Richard.
>
> >
> >         Jakub
> >