From: Andrew Waterman
Date: Fri, 17 Mar 2023 13:43:20 -0700
Subject: Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.
To: Philipp Tomsich
Cc: Richard Biener, Manolis Tsamis, Andrew MacLeod, gcc-patches@gcc.gnu.org

On Fri, Mar 17, 2023 at 6:16 AM Philipp Tomsich wrote:
>
> On Fri, 17 Mar 2023 at 09:31, Richard Biener wrote:
> >
> > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis wrote:
> > >
> > > For this C testcase:
> > >
> > > void g();
> > > void f(unsigned int *a)
> > > {
> > >   if (++*a == 1)
> > >     g();
> > > }
> > >
> > > GCC will currently emit a comparison with 1 by using the value
> > > of *a after the increment. This can be improved by comparing
> > > against 0 and using the value before the increment. As a result
> > > there is a potentially shorter dependency chain (no need to wait
> > > for the result of +1), and on targets with compare-with-zero
> > > instructions the generated code is one instruction shorter.
> >
> > The downside is we now need two registers and their lifetime overlaps.
> >
> > Your patch mixes changing / inverting a parameter (which seems unneeded
> > for the actual change) with preferring compares against zero.
> >
> > What's the reason to specifically prefer compares against zero?  On x86
> > we have add that sets flags, so ++*a == 0 would be preferred, but
> > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > not save any instruction.
>
> AArch64, RISC-V and MIPS support a branch-on-(not-)equals-zero, while
> comparing against any other constant requires loading that constant into
> a register first.

Equality comparisons against 0 are also slightly cheaper than equality
comparisons against 1 on x86, though it's a code-size difference, not an
instruction-count difference.  Not sure this changes the story--just
pointing out that this optimization might be slightly more generally
applicable than it initially seems.

> This feels a bit like we need to call into the backend to check
> whether comparisons against 0 are cheaper.
>
> Obviously, the underlying issue becomes worse if the immediate cannot
> be built up in a single instruction.
> Using RISC-V as an example (primarily because RISC-V makes it particularly
> easy to run into multi-instruction sequences for constants), we can
> construct the following case:
>
> void f(unsigned int *a)
> {
>   if ((*a += 0x900) == 0x900)
>     g();
> }
>
> which GCC 12.2.0 (trunk may already be smart enough to reuse the
> constant once loaded into a register, but I did not check…) with -O3
> turns into:
>
> f:
>         lw      a4,0(a0)
>         li      a5,4096
>         addiw   a5,a5,-1792
>         addw    a4,a5,a4
>         li      a5,4096
>         sw      a4,0(a0)
>         addi    a5,a5,-1792
>         beq     a4,a5,.L4
>         ret
> .L4:
>         tail    g
>
> Thanks,
> Philipp.
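As a side note on the constant-materialization cost, here is a minimal C
sketch of the shape the rewrite aims for (illustrative only, not taken from
the patch or the thread; the names "old" and "f" are placeholders).  With
unsigned wraparound, (*a += 0x900) == 0x900 holds exactly when the old
value of *a was 0, so the branch can test the pre-update value against zero
and the 0x900 constant is only needed once, for the addition itself:

void g(void);

void f(unsigned int *a)
{
  unsigned int old = *a;      /* value before the update */

  *a = old + 0x900;           /* the only place 0x900 is needed */
  if (old == 0)               /* compare against zero instead of 0x900 */
    g();
}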
>
> On Fri, 17 Mar 2023 at 09:31, Richard Biener wrote:
> >
> > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis wrote:
> > >
> > > For this C testcase:
> > >
> > > void g();
> > > void f(unsigned int *a)
> > > {
> > >   if (++*a == 1)
> > >     g();
> > > }
> > >
> > > GCC will currently emit a comparison with 1 by using the value
> > > of *a after the increment. This can be improved by comparing
> > > against 0 and using the value before the increment. As a result
> > > there is a potentially shorter dependency chain (no need to wait
> > > for the result of +1), and on targets with compare-with-zero
> > > instructions the generated code is one instruction shorter.
> >
> > The downside is we now need two registers and their lifetime overlaps.
> >
> > Your patch mixes changing / inverting a parameter (which seems unneeded
> > for the actual change) with preferring compares against zero.
> >
> > What's the reason to specifically prefer compares against zero?  On x86
> > we have add that sets flags, so ++*a == 0 would be preferred, but
> > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > not save any instruction.
> >
> > We do have quite some number of bug reports with regards to making VRP's
> > life harder when splitting things this way.  It's easier for VRP to handle
> >
> >   _1 = _2 + 1;
> >   if (_1 == 1)
> >
> > than it is
> >
> >   _1 = _2 + 1;
> >   if (_2 == 0)
> >
> > where VRP fails to derive a range for _1 on the _2 == 0 branch.  So besides
> > the life-range issue there are other side effects as well.  Maybe ranger
> > meanwhile can handle the above case?
> >
> > What's the overall effect of the change on a larger code base?
> >
> > Thanks,
> > Richard.
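To make the VRP point above concrete, here is a small C illustration (an
editorial sketch, not from the thread; names are placeholders) of the kind
of follow-on folding that relies on the range of the incremented value on
the taken branch.  With "if (v == 1)" the range of v inside the branch is
[1, 1] and the second test folds away; after rewriting the branch to test
the old value against 0, the same fact has to be rederived through the
addition:

void g(void);

void f(unsigned int *a)
{
  unsigned int v = *a + 1;   /* _1 = _2 + 1 */

  *a = v;
  if (v == 1)                /* range of v on this path: [1, 1] */
    {
      if (v < 4)             /* provably true given that range */
        g();
    }
}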
> >
> > >
> > > Example from Aarch64:
> > >
> > > Before
> > >         ldr     w1, [x0]
> > >         add     w1, w1, 1
> > >         str     w1, [x0]
> > >         cmp     w1, 1
> > >         beq     .L4
> > >         ret
> > >
> > > After
> > >         ldr     w1, [x0]
> > >         add     w2, w1, 1
> > >         str     w2, [x0]
> > >         cbz     w1, .L4
> > >         ret
> > >
> > > gcc/ChangeLog:
> > >
> > >         * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> > >         (forward_propagate_into_comparison_1): Optimize
> > >         for zero comparisons.
> > >
> > > Signed-off-by: Manolis Tsamis
> > > ---
> > >
> > >  gcc/tree-ssa-forwprop.cc | 41 ++++++++++++++++++++++++++++-------------
> > >  1 file changed, 28 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> > > index e34f0888954..93d5043821b 100644
> > > --- a/gcc/tree-ssa-forwprop.cc
> > > +++ b/gcc/tree-ssa-forwprop.cc
> > > @@ -373,12 +373,13 @@ rhs_to_tree (tree type, gimple *stmt)
> > >  /* Combine OP0 CODE OP1 in the context of a COND_EXPR.  Returns
> > >     the folded result in a form suitable for COND_EXPR_COND or
> > >     NULL_TREE, if there is no suitable simplified form.  If
> > > -   INVARIANT_ONLY is true only gimple_min_invariant results are
> > > -   considered simplified.  */
> > > +   ALWAYS_COMBINE is false then only combine it the resulting
> > > +   expression is gimple_min_invariant or considered simplified
> > > +   compared to the original.  */
> > >
> > >  static tree
> > >  combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
> > > -                       tree op0, tree op1, bool invariant_only)
> > > +                       tree op0, tree op1, bool always_combine)
> > >  {
> > >    tree t;
> > >
> > > @@ -398,17 +399,31 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
> > >    /* Canonicalize the combined condition for use in a COND_EXPR.  */
> > >    t = canonicalize_cond_expr_cond (t);
> > >
> > > -  /* Bail out if we required an invariant but didn't get one.  */
> > > -  if (!t || (invariant_only && !is_gimple_min_invariant (t)))
> > > +  if (!t)
> > >      {
> > >        fold_undefer_overflow_warnings (false, NULL, 0);
> > >        return NULL_TREE;
> > >      }
> > >
> > > -  bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > > -  fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > > +  if (always_combine || is_gimple_min_invariant (t))
> > > +    {
> > > +      bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > > +      fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > > +      return t;
> > > +    }
> > >
> > > -  return t;
> > > +  /* If the result of folding is a zero comparison treat it preferentially.  */
> > > +  if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison
> > > +      && integer_zerop (TREE_OPERAND (t, 1))
> > > +      && !integer_zerop (op1))
> > > +    {
> > > +      bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > > +      fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > > +      return t;
> > > +    }
> > > +
> > > +  fold_undefer_overflow_warnings (false, NULL, 0);
> > > +  return NULL_TREE;
> > >  }
> > >
> > >  /* Combine the comparison OP0 CODE OP1 at LOC with the defining statements
> > > @@ -432,7 +447,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > >        if (def_stmt && can_propagate_from (def_stmt))
> > >          {
> > >            enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
> > > -          bool invariant_only_p = !single_use0_p;
> > > +          bool always_combine = single_use0_p;
> > >
> > >            rhs0 = rhs_to_tree (TREE_TYPE (op1), def_stmt);
> > >
> > > @@ -442,10 +457,10 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > >                 && TREE_CODE (TREE_TYPE (TREE_OPERAND (rhs0, 0)))
> > >                    == BOOLEAN_TYPE)
> > >                || TREE_CODE_CLASS (def_code) == tcc_comparison))
> > > -            invariant_only_p = false;
> > > +            always_combine = true;
> > >
> > >            tmp = combine_cond_expr_cond (stmt, code, type,
> > > -                                        rhs0, op1, invariant_only_p);
> > > +                                        rhs0, op1, always_combine);
> > >            if (tmp)
> > >              return tmp;
> > >          }
> > > @@ -459,7 +474,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > >      {
> > >        rhs1 = rhs_to_tree (TREE_TYPE (op0), def_stmt);
> > >        tmp = combine_cond_expr_cond (stmt, code, type,
> > > -                                    op0, rhs1, !single_use1_p);
> > > +                                    op0, rhs1, single_use1_p);
> > >        if (tmp)
> > >          return tmp;
> > >      }
> > > @@ -470,7 +485,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > >        && rhs1 != NULL_TREE)
> > >      tmp = combine_cond_expr_cond (stmt, code, type,
> > >                                    rhs0, rhs1,
> > > -                                  !(single_use0_p && single_use1_p));
> > > +                                  single_use0_p && single_use1_p);
> > >
> > >    return tmp;
> > >  }
> > > --
> > > 2.34.1
> > >
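For completeness, here is the kind of testsuite-style reproducer that could
accompany a change like this.  The dg- directives follow the usual GCC
dejagnu conventions, but the dump name and scan pattern below are
assumptions of mine, not taken from the submitted patch:

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-forwprop1" } */

void g(void);

void f(unsigned int *a)
{
  if (++*a == 1)
    g();
}

/* After the change the comparison should be folded to a test against 0
   of the value before the increment (the scan pattern is hypothetical).  */
/* { dg-final { scan-tree-dump-not "== 1" "forwprop1" } } */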