public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: Philipp Tomsich <philipp.tomsich@vrull.eu>
Cc: Manolis Tsamis <manolis.tsamis@vrull.eu>,
	Andrew MacLeod <amacleod@redhat.com>,
	 gcc-patches@gcc.gnu.org
Subject: Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.
Date: Fri, 17 Mar 2023 15:03:20 +0100	[thread overview]
Message-ID: <CAFiYyc1t5scks4oo78ECe_2s3_Rvi2Z1RMsZtKH66EZT4ZGwyw@mail.gmail.com> (raw)
In-Reply-To: <CAAeLtUBuuV7SPde0gxr-OfyrJZcfhBu1Q6oBaLEvN9XYeCdBTQ@mail.gmail.com>

On Fri, Mar 17, 2023 at 2:15 PM Philipp Tomsich
<philipp.tomsich@vrull.eu> wrote:
>
> On Fri, 17 Mar 2023 at 09:31, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > >
> > > For this C testcase:
> > >
> > > void g();
> > > void f(unsigned int *a)
> > > {
> > >   if (++*a == 1)
> > >     g();
> > > }
> > >
> > > GCC will currently emit a comparison with 1 by using the value
> > > of *a after the increment. This can be improved by comparing
> > > against 0 and using the value before the increment. As a result
> > > there is a potentially shorter dependancy chain (no need to wait
> > > for the result of +1) and on targets with compare zero instructions
> > > the generated code is one instruction shorter.
> >
> > The downside is we now need two registers and their lifetime overlaps.
> >
> > Your patch mixes changing / inverting a parameter (which seems unneeded
> > for the actual change) with preferring compares against zero.
> >
> > What's the reason to specifically prefer compares against zero?  On x86
> > we have add that sets flags, so ++*a == 0 would be preferred, but
> > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > not save any instruction.
>
> AArch64, RISC-V and MIPS support a branch-on-(not-)equals-zero, while
> comparing against a constant requires to load any non-zero value into
> a register first.
> This feels a bit like we need to call onto the backend to check
> whether comparisons against 0 are cheaper.

Hmm - we should try hard to not introduce too many target dependent IL
differences early.  I do wonder if it's possible to handle this later
(and also the reverse transform).  Ideally we'd know register pressure
and the cost of the constants involved as well.  We do some related
"fixups" when rewriting out-of-SSA, mainly to improve coalescing
around backedges and avoiding overlapping life ranges there for loop
exit compares with live before/after IVs after the loop.

> Obviously, the underlying issue become worse if the immediate can not
> be built up in a single instruction.
> Using RISC-V as an example (primarily, as RISC-V makes it particularly
> easy to run into multi-instruction sequences for constants), we can
> construct the following case:
>
>   void f(unsigned int *a)
>   {
>     if ((*a += 0x900) == 0x900)
>        g();
>   }
>
> which GCC 12.2.0 (trunk may already be small enough to reuse the
> constant once loaded into register, but I did not check…) with -O3
> turns into:
>
> f:
>   lw a4,0(a0)
>   li a5,4096
>   addiw a5,a5,-1792
>   addw a4,a5,a4
>   li a5,4096
>   sw a4,0(a0)
>   addi a5,a5,-1792
>   beq a4,a5,.L4
>   ret
> .L4:
>   tail g
>
> Thanks,
> Philipp.
>
>
> On Fri, 17 Mar 2023 at 09:31, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > >
> > > For this C testcase:
> > >
> > > void g();
> > > void f(unsigned int *a)
> > > {
> > >   if (++*a == 1)
> > >     g();
> > > }
> > >
> > > GCC will currently emit a comparison with 1 by using the value
> > > of *a after the increment. This can be improved by comparing
> > > against 0 and using the value before the increment. As a result
> > > there is a potentially shorter dependancy chain (no need to wait
> > > for the result of +1) and on targets with compare zero instructions
> > > the generated code is one instruction shorter.
> >
> > The downside is we now need two registers and their lifetime overlaps.
> >
> > Your patch mixes changing / inverting a parameter (which seems unneeded
> > for the actual change) with preferring compares against zero.
> >
> > What's the reason to specifically prefer compares against zero?  On x86
> > we have add that sets flags, so ++*a == 0 would be preferred, but
> > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > not save any instruction.
> >
> > We do have quite some number of bugreports with regards to making VRPs
> > life harder when splitting things this way.  It's easier for VRP to handle
> >
> >   _1 = _2 + 1;
> >   if (_1 == 1)
> >
> > than it is
> >
> >   _1 = _2 + 1;
> >   if (_2 == 0)
> >
> > where VRP fails to derive a range for _1 on the _2 == 0 branch.  So besides
> > the life-range issue there's other side-effects as well.  Maybe ranger meanwhile
> > can handle the above case?
> >
> > What's the overall effect of the change on a larger code base?
> >
> > Thanks,
> > Richard.
> >
> > >
> > > Example from Aarch64:
> > >
> > > Before
> > >         ldr     w1, [x0]
> > >         add     w1, w1, 1
> > >         str     w1, [x0]
> > >         cmp     w1, 1
> > >         beq     .L4
> > >         ret
> > >
> > > After
> > >         ldr     w1, [x0]
> > >         add     w2, w1, 1
> > >         str     w2, [x0]
> > >         cbz     w1, .L4
> > >         ret
> > >
> > > gcc/ChangeLog:
> > >
> > >         * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> > >         (forward_propagate_into_comparison_1): Optimize
> > >         for zero comparisons.
> > >
> > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > ---
> > >
> > >  gcc/tree-ssa-forwprop.cc | 41 +++++++++++++++++++++++++++-------------
> > >  1 file changed, 28 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> > > index e34f0888954..93d5043821b 100644
> > > --- a/gcc/tree-ssa-forwprop.cc
> > > +++ b/gcc/tree-ssa-forwprop.cc
> > > @@ -373,12 +373,13 @@ rhs_to_tree (tree type, gimple *stmt)
> > >  /* Combine OP0 CODE OP1 in the context of a COND_EXPR.  Returns
> > >     the folded result in a form suitable for COND_EXPR_COND or
> > >     NULL_TREE, if there is no suitable simplified form.  If
> > > -   INVARIANT_ONLY is true only gimple_min_invariant results are
> > > -   considered simplified.  */
> > > +   ALWAYS_COMBINE is false then only combine it the resulting
> > > +   expression is gimple_min_invariant or considered simplified
> > > +   compared to the original.  */
> > >
> > >  static tree
> > >  combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
> > > -                       tree op0, tree op1, bool invariant_only)
> > > +                       tree op0, tree op1, bool always_combine)
> > >  {
> > >    tree t;
> > >
> > > @@ -398,17 +399,31 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
> > >    /* Canonicalize the combined condition for use in a COND_EXPR.  */
> > >    t = canonicalize_cond_expr_cond (t);
> > >
> > > -  /* Bail out if we required an invariant but didn't get one.  */
> > > -  if (!t || (invariant_only && !is_gimple_min_invariant (t)))
> > > +  if (!t)
> > >      {
> > >        fold_undefer_overflow_warnings (false, NULL, 0);
> > >        return NULL_TREE;
> > >      }
> > >
> > > -  bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > > -  fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > > +  if (always_combine || is_gimple_min_invariant (t))
> > > +    {
> > > +      bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > > +      fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > > +      return t;
> > > +    }
> > >
> > > -  return t;
> > > +  /* If the result of folding is a zero comparison treat it preferentially.  */
> > > +  if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison
> > > +      && integer_zerop (TREE_OPERAND (t, 1))
> > > +      && !integer_zerop (op1))
> > > +    {
> > > +      bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> > > +      fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> > > +      return t;
> > > +    }
> > > +
> > > +  fold_undefer_overflow_warnings (false, NULL, 0);
> > > +  return NULL_TREE;
> > >  }
> > >
> > >  /* Combine the comparison OP0 CODE OP1 at LOC with the defining statements
> > > @@ -432,7 +447,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > >        if (def_stmt && can_propagate_from (def_stmt))
> > >         {
> > >           enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
> > > -         bool invariant_only_p = !single_use0_p;
> > > +         bool always_combine = single_use0_p;
> > >
> > >           rhs0 = rhs_to_tree (TREE_TYPE (op1), def_stmt);
> > >
> > > @@ -442,10 +457,10 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > >                    && TREE_CODE (TREE_TYPE (TREE_OPERAND (rhs0, 0)))
> > >                       == BOOLEAN_TYPE)
> > >                   || TREE_CODE_CLASS (def_code) == tcc_comparison))
> > > -           invariant_only_p = false;
> > > +           always_combine = true;
> > >
> > >           tmp = combine_cond_expr_cond (stmt, code, type,
> > > -                                       rhs0, op1, invariant_only_p);
> > > +                                       rhs0, op1, always_combine);
> > >           if (tmp)
> > >             return tmp;
> > >         }
> > > @@ -459,7 +474,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > >         {
> > >           rhs1 = rhs_to_tree (TREE_TYPE (op0), def_stmt);
> > >           tmp = combine_cond_expr_cond (stmt, code, type,
> > > -                                       op0, rhs1, !single_use1_p);
> > > +                                       op0, rhs1, single_use1_p);
> > >           if (tmp)
> > >             return tmp;
> > >         }
> > > @@ -470,7 +485,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
> > >        && rhs1 != NULL_TREE)
> > >      tmp = combine_cond_expr_cond (stmt, code, type,
> > >                                   rhs0, rhs1,
> > > -                                 !(single_use0_p && single_use1_p));
> > > +                                 single_use0_p && single_use1_p);
> > >
> > >    return tmp;
> > >  }
> > > --
> > > 2.34.1
> > >

  reply	other threads:[~2023-03-17 14:03 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-16 15:27 Manolis Tsamis
2023-03-16 16:41 ` Jeff Law
2023-03-16 20:32   ` Philipp Tomsich
2023-03-17  8:31 ` Richard Biener
2023-03-17 13:15   ` Philipp Tomsich
2023-03-17 14:03     ` Richard Biener [this message]
2023-03-17 20:43     ` Andrew Waterman
2023-03-17 14:12   ` Andrew MacLeod
2023-03-20 14:01   ` Manolis Tsamis
2023-03-23 23:27     ` Jeff Law
2023-04-21 21:01     ` Philipp Tomsich
2023-04-24  8:06       ` Richard Biener
2023-04-24 23:05         ` Jeff Law
2023-04-25  7:21           ` Richard Biener
2023-04-26  2:30             ` Jeff Law
2023-04-26  6:41               ` Richard Biener
2023-08-02 14:07                 ` Manolis Tsamis
2023-08-03  7:04                   ` Richard Biener
2023-08-03 15:21                     ` Jeff Law
2023-08-04  6:37                       ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFiYyc1t5scks4oo78ECe_2s3_Rvi2Z1RMsZtKH66EZT4ZGwyw@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=amacleod@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=manolis.tsamis@vrull.eu \
    --cc=philipp.tomsich@vrull.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).