From: Manolis Tsamis
Date: Mon, 20 Mar 2023 16:01:48 +0200
Subject: Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.
To: Richard Biener
Cc: Andrew MacLeod, gcc-patches@gcc.gnu.org, Philipp Tomsich

On Fri, Mar 17, 2023 at 10:31 AM Richard Biener wrote:
>
> On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis wrote:
> >
> > For this C testcase:
> >
> > void g();
> > void f(unsigned int *a)
> > {
> >   if (++*a == 1)
> >     g();
> > }
> >
> > GCC will currently emit a comparison with 1 by using the value
> > of *a after the increment.  This can be improved by comparing
> > against 0 and using the value before the increment.  As a result
> > there is a potentially shorter dependency chain (no need to wait
> > for the result of +1) and on targets with compare-zero instructions
> > the generated code is one instruction shorter.
>
> The downside is we now need two registers and their lifetime overlaps.
>
> Your patch mixes changing / inverting a parameter (which seems unneeded
> for the actual change) with preferring compares against zero.
>

Indeed.
I thought that without that change the original names wouldn't properly
describe what the parameter actually does, and that's why I changed it.
I can undo that in the next revision.

> What's the reason to specifically prefer compares against zero?  On x86
> we have add that sets flags, so ++*a == 0 would be preferred, but
> for your sequence we'd need a test reg, reg; branch on zero, so we do
> not save any instruction.
>

My reasoning is that zero is treated preferentially in most, if not all,
architectures.  Some specifically have zero/non-zero comparisons, so we
get one less instruction.  x86 doesn't explicitly have that, but I think
that test reg, reg may not always be needed, depending on the rest of the
code.  From what Andrew mentions below, there may even be optimizations
for zero at the microarchitecture level.

Because this is still an arch-specific thing, I initially tried to make
it arch-dependent by invoking the target's cost functions (e.g., if I
recall correctly, aarch64 will return a lower cost for zero comparisons).
But the code turned out complicated and messy, so I came up with this
alternative that just treats zero preferentially.  If you have in mind a
better way this could be done, I could try to implement it.

> We do have quite some number of bugreports with regards to making VRP's
> life harder when splitting things this way.  It's easier for VRP to handle
>
>   _1 = _2 + 1;
>   if (_1 == 1)
>
> than it is
>
>   _1 = _2 + 1;
>   if (_2 == 0)
>
> where VRP fails to derive a range for _1 on the _2 == 0 branch.  So
> besides the life-range issue there are other side effects as well.
> Maybe ranger meanwhile can handle the above case?
>

Answered by Andrew MacLeod.

> What's the overall effect of the change on a larger code base?
>

I made some quick runs of SPEC2017 and got the following results
(# of folds of zero comparisons):

  gcc        2586
  xalancbmk  1456
  perlbench   375
  x264        307
  omnetpp     137
  leela        24
  deepsjeng    15
  exchange2     4
  xz            4

My test runs on AArch64 do not show any significant change in runtime.
In some cases (e.g. gcc) the binary is smaller in size, but that can
depend on a number of other things.

Thanks,
Manolis

> Thanks,
> Richard.
>
> >
> > Example from AArch64:
> >
> > Before
> >         ldr     w1, [x0]
> >         add     w1, w1, 1
> >         str     w1, [x0]
> >         cmp     w1, 1
> >         beq     .L4
> >         ret
> >
> > After
> >         ldr     w1, [x0]
> >         add     w2, w1, 1
> >         str     w2, [x0]
> >         cbz     w1, .L4
> >         ret
> >
> > gcc/ChangeLog:
> >
> >         * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> >         (forward_propagate_into_comparison_1): Optimize
> >         for zero comparisons.
> >
> > Signed-off-by: Manolis Tsamis
> > ---
> >
> >  gcc/tree-ssa-forwprop.cc | 41 +++++++++++++++++++++++++++-------------
> >  1 file changed, 28 insertions(+), 13 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> > index e34f0888954..93d5043821b 100644
> > --- a/gcc/tree-ssa-forwprop.cc
> > +++ b/gcc/tree-ssa-forwprop.cc
> > @@ -373,12 +373,13 @@ rhs_to_tree (tree type, gimple *stmt)
> >  /* Combine OP0 CODE OP1 in the context of a COND_EXPR.  Returns
> >     the folded result in a form suitable for COND_EXPR_COND or
> >     NULL_TREE, if there is no suitable simplified form.  If
> > -   INVARIANT_ONLY is true only gimple_min_invariant results are
> > -   considered simplified.  */
> > +   ALWAYS_COMBINE is false then only combine if the resulting
> > +   expression is gimple_min_invariant or considered simplified
> > +   compared to the original.  */
> >
> >  static tree
> >  combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
> > -                       tree op0, tree op1, bool invariant_only)
> > +                       tree op0, tree op1, bool always_combine)
> >  {
> >    tree t;
> >
> > @@ -398,17 +399,31 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
> >    /* Canonicalize the combined condition for use in a COND_EXPR.  */
> >    t = canonicalize_cond_expr_cond (t);
> >
> > -  /* Bail out if we required an invariant but didn't get one.  */
> > -  if (!t || (invariant_only && !is_gimple_min_invariant (t)))
> > +  if (!t)
> >      {
> >        fold_undefer_overflow_warnings (false, NULL, 0);
> >        return NULL_TREE;
> >      }
> >
> > -  bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > -  fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > +  if (always_combine || is_gimple_min_invariant (t))
> > +    {
> > +      bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > +      fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > +      return t;
> > +    }
> >
> > -  return t;
> > +  /* If the result of folding is a zero comparison treat it preferentially.  */
> > +  if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison
> > +      && integer_zerop (TREE_OPERAND (t, 1))
> > +      && !integer_zerop (op1))
> > +    {
> > +      bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > +      fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > +      return t;
> > +    }
> > +
> > +  fold_undefer_overflow_warnings (false, NULL, 0);
> > +  return NULL_TREE;
> >  }
> >
> >  /* Combine the comparison OP0 CODE OP1 at LOC with the defining statements
> > @@ -432,7 +447,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> >    if (def_stmt && can_propagate_from (def_stmt))
> >      {
> >        enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
> > -      bool invariant_only_p = !single_use0_p;
> > +      bool always_combine = single_use0_p;
> >
> >        rhs0 = rhs_to_tree (TREE_TYPE (op1), def_stmt);
> >
> > @@ -442,10 +457,10 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> >                && TREE_CODE (TREE_TYPE (TREE_OPERAND (rhs0, 0)))
> >                     == BOOLEAN_TYPE)
> >               || TREE_CODE_CLASS (def_code) == tcc_comparison))
> > -        invariant_only_p = false;
> > +        always_combine = true;
> >
> >        tmp = combine_cond_expr_cond (stmt, code, type,
> > -                                    rhs0, op1, invariant_only_p);
> > +                                    rhs0, op1, always_combine);
> >        if (tmp)
> >          return tmp;
> >      }
> >
> > @@ -459,7 +474,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> >      {
> >        rhs1 = rhs_to_tree (TREE_TYPE (op0), def_stmt);
> >        tmp = combine_cond_expr_cond (stmt, code, type,
> > -                                    op0, rhs1, !single_use1_p);
> > +                                    op0, rhs1, single_use1_p);
> >        if (tmp)
> >          return tmp;
> >      }
> >
> > @@ -470,7 +485,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> >        && rhs1 != NULL_TREE)
> >      tmp = combine_cond_expr_cond (stmt, code, type,
> >                                    rhs0, rhs1,
> > -                                  !(single_use0_p && single_use1_p));
> > +                                  single_use0_p && single_use1_p);
> >
> >    return tmp;
> >  }
> > --
> > 2.34.1
> >