From: Richard Biener
Date: Mon, 24 Apr 2023 10:06:41 +0200
Subject: Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.
To: Philipp Tomsich
Cc: Manolis Tsamis, Andrew MacLeod, gcc-patches@gcc.gnu.org
References: <20230316152706.2214124-1-manolis.tsamis@vrull.eu>

On Fri, Apr 21, 2023 at 11:01 PM Philipp Tomsich wrote:
>
> Any guidance on the next steps for this patch?

I think we want to perform this transform later; in particular, when the
test is a loop exit test we do not want to do it, as it prevents
coalescing of the IV on the backedge at out-of-SSA time.

That means doing the transform in folding and/or before inlining (the
test could become a loop exit test) would be a no-go.  In fact, for SSA
coalescing we'd want the reverse transform in some cases; see PRs 86270
and 70359.

If we can reliably undo it for the loop case, I suppose we can do the
canonicalization to compare against zero.  In any case, please split up
the patch (note I've also hoped we could eventually get rid of that part
of tree-ssa-forwprop.cc in favor of match.pd patterns, since it uses
GENERIC folding :/).

Richard.
> I believe that we answered all open questions, but may have missed something.
>
> With trunk open for new development, we would like to revise and land this…
>
> Thanks,
> Philipp.
>
> On Mon, 20 Mar 2023 at 15:02, Manolis Tsamis wrote:
> >
> > On Fri, Mar 17, 2023 at 10:31 AM Richard Biener wrote:
> > >
> > > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis wrote:
> > > >
> > > > For this C testcase:
> > > >
> > > > void g();
> > > > void f(unsigned int *a)
> > > > {
> > > >   if (++*a == 1)
> > > >     g();
> > > > }
> > > >
> > > > GCC will currently emit a comparison with 1 by using the value
> > > > of *a after the increment.  This can be improved by comparing
> > > > against 0 and using the value before the increment.  As a result
> > > > there is a potentially shorter dependency chain (no need to wait
> > > > for the result of the +1) and on targets with compare-with-zero
> > > > instructions the generated code is one instruction shorter.
> > >
> > > The downside is we now need two registers and their lifetimes overlap.
> > >
> > > Your patch mixes changing / inverting a parameter (which seems unneeded
> > > for the actual change) with preferring compares against zero.
> > >
> >
> > Indeed.  I thought that without that change the original name wouldn't
> > properly describe what the parameter actually does, and that's why I
> > changed it.  I can undo that in the next revision.
> >
> > > What's the reason to specifically prefer compares against zero?  On x86
> > > we have add that sets flags, so ++*a == 0 would be preferred, but
> > > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > > not save any instruction.
> > >
> >
> > My reasoning is that zero is treated preferentially in most if not
> > all architectures.  Some specifically have zero/non-zero comparisons, so
> > we get one less instruction.
> > x86 doesn't explicitly have that, but I
> > think that the test reg, reg may not always be needed depending on the
> > rest of the code.  By what Andrew mentions below there may even be
> > optimizations for zero at the microarchitecture level.
> >
> > Because this is still an arch-specific thing I initially tried to make
> > it arch-dependent by invoking the target's cost functions (e.g., if I
> > recall correctly, aarch64 will return a lower cost for zero
> > comparisons).  But the code turned out complicated and messy, so I came
> > up with this alternative that just treats zero preferentially.
> >
> > If you have in mind a way that this can be done better, I
> > could try to implement it.
> >
> > > We do have quite some number of bug reports with regards to making VRP's
> > > life harder when splitting things this way.  It's easier for VRP to handle
> > >
> > >   _1 = _2 + 1;
> > >   if (_1 == 1)
> > >
> > > than it is
> > >
> > >   _1 = _2 + 1;
> > >   if (_2 == 0)
> > >
> > > where VRP fails to derive a range for _1 on the _2 == 0 branch.  So besides
> > > the life-range issue there are other side effects as well.  Maybe ranger
> > > meanwhile can handle the above case?
> > >
> >
> > Answered by Andrew MacLeod.
> >
> > > What's the overall effect of the change on a larger code base?
> > >
> >
> > I made some quick runs of SPEC2017 and got the following results (# of
> > folds of zero comparisons):
> >
> >   gcc        2586
> >   xalancbmk  1456
> >   perlbench   375
> >   x264        307
> >   omnetpp     137
> >   leela        24
> >   deepsjeng    15
> >   exchange2     4
> >   xz            4
> >
> > My test runs on aarch64 do not show any significant change in runtime.
> > In some cases (e.g. gcc) the binary is smaller in size, but that can
> > depend on a number of other things.
> >
> > Thanks,
> > Manolis
> >
> > > Thanks,
> > > Richard.
> > > >
> > > > Example from aarch64:
> > > >
> > > >   Before
> > > >         ldr     w1, [x0]
> > > >         add     w1, w1, 1
> > > >         str     w1, [x0]
> > > >         cmp     w1, 1
> > > >         beq     .L4
> > > >         ret
> > > >
> > > >   After
> > > >         ldr     w1, [x0]
> > > >         add     w2, w1, 1
> > > >         str     w2, [x0]
> > > >         cbz     w1, .L4
> > > >         ret
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> > > >         (forward_propagate_into_comparison_1): Optimize
> > > >         for zero comparisons.
> > > >
> > > > Signed-off-by: Manolis Tsamis
> > > > ---
> > > >
> > > >  gcc/tree-ssa-forwprop.cc | 41 +++++++++++++++++++++++++++-------------
> > > >  1 file changed, 28 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> > > > index e34f0888954..93d5043821b 100644
> > > > --- a/gcc/tree-ssa-forwprop.cc
> > > > +++ b/gcc/tree-ssa-forwprop.cc
> > > > @@ -373,12 +373,13 @@ rhs_to_tree (tree type, gimple *stmt)
> > > >  /* Combine OP0 CODE OP1 in the context of a COND_EXPR.  Returns
> > > >     the folded result in a form suitable for COND_EXPR_COND or
> > > >     NULL_TREE, if there is no suitable simplified form.  If
> > > > -   INVARIANT_ONLY is true only gimple_min_invariant results are
> > > > -   considered simplified.  */
> > > > +   ALWAYS_COMBINE is false then only combine if the resulting
> > > > +   expression is gimple_min_invariant or considered simplified
> > > > +   compared to the original.  */
> > > >
> > > >  static tree
> > > >  combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
> > > > -                       tree op0, tree op1, bool invariant_only)
> > > > +                       tree op0, tree op1, bool always_combine)
> > > >  {
> > > >    tree t;
> > > >
> > > > @@ -398,17 +399,31 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
> > > >    /* Canonicalize the combined condition for use in a COND_EXPR.  */
> > > >    t = canonicalize_cond_expr_cond (t);
> > > >
> > > > -  /* Bail out if we required an invariant but didn't get one.  */
> > > > -  if (!t || (invariant_only && !is_gimple_min_invariant (t)))
> > > > +  if (!t)
> > > >      {
> > > >        fold_undefer_overflow_warnings (false, NULL, 0);
> > > >        return NULL_TREE;
> > > >      }
> > > >
> > > > -  bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > > > -  fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > > > +  if (always_combine || is_gimple_min_invariant (t))
> > > > +    {
> > > > +      bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > > > +      fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > > > +      return t;
> > > > +    }
> > > >
> > > > -  return t;
> > > > +  /* If the result of folding is a zero comparison treat it preferentially.  */
> > > > +  if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison
> > > > +      && integer_zerop (TREE_OPERAND (t, 1))
> > > > +      && !integer_zerop (op1))
> > > > +    {
> > > > +      bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > > > +      fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > > > +      return t;
> > > > +    }
> > > > +
> > > > +  fold_undefer_overflow_warnings (false, NULL, 0);
> > > > +  return NULL_TREE;
> > > >  }
> > > >
> > > >  /* Combine the comparison OP0 CODE OP1 at LOC with the defining statements
> > > > @@ -432,7 +447,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > > >        if (def_stmt && can_propagate_from (def_stmt))
> > > >         {
> > > >           enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
> > > > -         bool invariant_only_p = !single_use0_p;
> > > > +         bool always_combine = single_use0_p;
> > > >
> > > >           rhs0 = rhs_to_tree (TREE_TYPE (op1), def_stmt);
> > > >
> > > > @@ -442,10 +457,10 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > > >                    && TREE_CODE (TREE_TYPE (TREE_OPERAND (rhs0, 0)))
> > > >                       == BOOLEAN_TYPE)
> > > >                   || TREE_CODE_CLASS (def_code) == tcc_comparison))
> > > > -           invariant_only_p = false;
> > > > +           always_combine = true;
> > > >
> > > >           tmp = combine_cond_expr_cond (stmt, code, type,
> > > > -                                       rhs0, op1, invariant_only_p);
> > > > +                                       rhs0, op1, always_combine);
> > > >           if (tmp)
> > > >             return tmp;
> > > >         }
> > > > @@ -459,7 +474,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > > >         {
> > > >           rhs1 = rhs_to_tree (TREE_TYPE (op0), def_stmt);
> > > >           tmp = combine_cond_expr_cond (stmt, code, type,
> > > > -                                       op0, rhs1, !single_use1_p);
> > > > +                                       op0, rhs1, single_use1_p);
> > > >           if (tmp)
> > > >             return tmp;
> > > >         }
> > > > @@ -470,7 +485,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > > >        && rhs1 != NULL_TREE)
> > > >      tmp = combine_cond_expr_cond (stmt, code, type,
> > > >                                    rhs0, rhs1,
> > > > -                                  !(single_use0_p && single_use1_p));
> > > > +                                  single_use0_p && single_use1_p);
> > > >
> > > >    return tmp;
> > > >  }
> > > > --
> > > > 2.34.1
> > > >