public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: Andrew Pinski <quic_apinski@quicinc.com>,
	gcc-patches@gcc.gnu.org,  richard.sandiford@arm.com
Subject: Re: Ping: [PATCHv3] aarch64/expr: Use ccmp when the outer expression is used twice [PR100942]
Date: Mon, 22 Jan 2024 15:06:36 +0100	[thread overview]
Message-ID: <CAFiYyc2odxuboOAwVKuJ_JJQgFKkXNf9SimgXTM=-bB-aDb6dA@mail.gmail.com> (raw)
In-Reply-To: <mptle8hqz66.fsf_-_@arm.com>

On Mon, Jan 22, 2024 at 2:24 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Ping for the expr/cfgexpand bits
>
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > Andrew Pinski <quic_apinski@quicinc.com> writes:
> >> Ccmp is not used if the result of the and/ior is used by both
> >> a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
> >> here by using ccmp in this case.
> >> Two changes is required, first we need to allow the outer statement's
> >> result be used more than once.
> >> The second change is that during the expansion of the gimple, we need
> >> to try using ccmp. This is needed because we don't use expand the ssa
> >> name of the lhs but rather expand directly from the gimple.
> >>
> >> A small note on the ccmp_4.c testcase, we should be able to get slightly
> >> better than with this patch but it is one extra instruction compared to
> >> before.
> >>
> >> Diff from v1:
> >> * v2: Split out expand_gimple_assign_ssa so the we only need to handle
> >> promotion once. Add ccmp_5.c testcase which was suggested. Change comment
> >> on ccmp_candidate_p.
> >
> > I meant more that we should split out the gassign handling in
> > expand_expr_real_1, since we're effectively making cfgexpand follow
> > it more closely.  What do you think about the attached version?
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.
> >
> > OK for the expr/cfgexpand bits?

OK.

Richard.

> > Thanks,
> > Richard
> >
> > ----
> >
> > Ccmp is not used if the result of the and/ior is used by both
> > a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
> > here by using ccmp in this case.
> > Two changes is required, first we need to allow the outer statement's
> > result be used more than once.
> > The second change is that during the expansion of the gimple, we need
> > to try using ccmp. This is needed because we don't use expand the ssa
> > name of the lhs but rather expand directly from the gimple.
> >
> > A small note on the ccmp_4.c testcase, we should be able to get slightly
> > better than with this patch but it is one extra instruction compared to
> > before.
> >
> >       PR target/100942
> >
> > gcc/ChangeLog:
> >
> >       * ccmp.cc (ccmp_candidate_p): Add outer argument.
> >       Allow if the outer is true and the lhs is used more
> >       than once.
> >       (expand_ccmp_expr): Update call to ccmp_candidate_p.
> >       * expr.h (expand_expr_real_gassign): Declare.
> >       * expr.cc (expand_expr_real_gassign): New function, split out from...
> >       (expand_expr_real_1): ...here.
> >       * cfgexpand.cc (expand_gimple_stmt_1): Use expand_expr_real_gassign.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/aarch64/ccmp_3.c: New test.
> >       * gcc.target/aarch64/ccmp_4.c: New test.
> >       * gcc.target/aarch64/ccmp_5.c: New test.
> >
> > Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
> > Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
> > ---
> >  gcc/ccmp.cc                               |  12 +--
> >  gcc/cfgexpand.cc                          |  31 ++-----
> >  gcc/expr.cc                               | 103 ++++++++++++----------
> >  gcc/expr.h                                |   3 +
> >  gcc/testsuite/gcc.target/aarch64/ccmp_3.c |  20 +++++
> >  gcc/testsuite/gcc.target/aarch64/ccmp_4.c |  35 ++++++++
> >  gcc/testsuite/gcc.target/aarch64/ccmp_5.c |  20 +++++
> >  7 files changed, 149 insertions(+), 75 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_3.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_4.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_5.c
> >
> > diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> > index 09d6b5595a4..7cb525addf4 100644
> > --- a/gcc/ccmp.cc
> > +++ b/gcc/ccmp.cc
> > @@ -90,9 +90,10 @@ ccmp_tree_comparison_p (tree t, basic_block bb)
> >     If all checks OK in expand_ccmp_expr, it emits insns in prep_seq, then
> >     insns in gen_seq.  */
> >
> > -/* Check whether G is a potential conditional compare candidate.  */
> > +/* Check whether G is a potential conditional compare candidate; OUTER is true if
> > +   G is the outer most AND/IOR.  */
> >  static bool
> > -ccmp_candidate_p (gimple *g)
> > +ccmp_candidate_p (gimple *g, bool outer = false)
> >  {
> >    tree lhs, op0, op1;
> >    gimple *gs0, *gs1;
> > @@ -109,8 +110,9 @@ ccmp_candidate_p (gimple *g)
> >    lhs = gimple_assign_lhs (g);
> >    op0 = gimple_assign_rhs1 (g);
> >    op1 = gimple_assign_rhs2 (g);
> > -  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME)
> > -      || !has_single_use (lhs))
> > +  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME))
> > +    return false;
> > +  if (!outer && !has_single_use (lhs))
> >      return false;
> >
> >    bb = gimple_bb (g);
> > @@ -284,7 +286,7 @@ expand_ccmp_expr (gimple *g, machine_mode mode)
> >    rtx_insn *last;
> >    rtx tmp;
> >
> > -  if (!ccmp_candidate_p (g))
> > +  if (!ccmp_candidate_p (g, true))
> >      return NULL_RTX;
> >
> >    last = get_last_insn ();
> > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> > index 1db22f0a1a3..381ed2c82d7 100644
> > --- a/gcc/cfgexpand.cc
> > +++ b/gcc/cfgexpand.cc
> > @@ -3971,37 +3971,18 @@ expand_gimple_stmt_1 (gimple *stmt)
> >         {
> >           rtx target, temp;
> >           bool nontemporal = gimple_assign_nontemporal_move_p (assign_stmt);
> > -         struct separate_ops ops;
> >           bool promoted = false;
> >
> >           target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >           if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
> >             promoted = true;
> >
> > -         ops.code = gimple_assign_rhs_code (assign_stmt);
> > -         ops.type = TREE_TYPE (lhs);
> > -         switch (get_gimple_rhs_class (ops.code))
> > -           {
> > -             case GIMPLE_TERNARY_RHS:
> > -               ops.op2 = gimple_assign_rhs3 (assign_stmt);
> > -               /* Fallthru */
> > -             case GIMPLE_BINARY_RHS:
> > -               ops.op1 = gimple_assign_rhs2 (assign_stmt);
> > -               /* Fallthru */
> > -             case GIMPLE_UNARY_RHS:
> > -               ops.op0 = gimple_assign_rhs1 (assign_stmt);
> > -               break;
> > -             default:
> > -               gcc_unreachable ();
> > -           }
> > -         ops.location = gimple_location (stmt);
> > -
> > -         /* If we want to use a nontemporal store, force the value to
> > -            register first.  If we store into a promoted register,
> > -            don't directly expand to target.  */
> > +        /* If we want to use a nontemporal store, force the value to
> > +           register first.  If we store into a promoted register,
> > +           don't directly expand to target.  */
> >           temp = nontemporal || promoted ? NULL_RTX : target;
> > -         temp = expand_expr_real_2 (&ops, temp, GET_MODE (target),
> > -                                    EXPAND_NORMAL);
> > +         temp = expand_expr_real_gassign (assign_stmt, temp,
> > +                                          GET_MODE (target), EXPAND_NORMAL);
> >
> >           if (temp == target)
> >             ;
> > @@ -4013,7 +3994,7 @@ expand_gimple_stmt_1 (gimple *stmt)
> >               if (CONSTANT_P (temp) && GET_MODE (temp) == VOIDmode)
> >                 {
> >                   temp = convert_modes (GET_MODE (target),
> > -                                       TYPE_MODE (ops.type),
> > +                                       TYPE_MODE (TREE_TYPE (lhs)),
> >                                         temp, unsignedp);
> >                   temp = convert_modes (GET_MODE (SUBREG_REG (target)),
> >                                         GET_MODE (target), temp, unsignedp);
> > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > index dc816bc20fa..f9052a6ff5f 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -11035,6 +11035,62 @@ stmt_is_replaceable_p (gimple *stmt)
> >    return false;
> >  }
> >
> > +/* A subroutine of expand_expr_real_1.  Expand gimple assignment G,
> > +   which is known to set an SSA_NAME result.  The other arguments are
> > +   as for expand_expr_real_1.  */
> > +
> > +rtx
> > +expand_expr_real_gassign (gassign *g, rtx target, machine_mode tmode,
> > +                       enum expand_modifier modifier, rtx *alt_rtl,
> > +                       bool inner_reference_p)
> > +{
> > +  separate_ops ops;
> > +  rtx r;
> > +  location_t saved_loc = curr_insn_location ();
> > +  auto loc = gimple_location (g);
> > +  if (loc != UNKNOWN_LOCATION)
> > +    set_curr_insn_location (loc);
> > +  tree lhs = gimple_assign_lhs (g);
> > +  ops.code = gimple_assign_rhs_code (g);
> > +  ops.type = TREE_TYPE (lhs);
> > +  switch (get_gimple_rhs_class (ops.code))
> > +    {
> > +    case GIMPLE_TERNARY_RHS:
> > +      ops.op2 = gimple_assign_rhs3 (g);
> > +      /* Fallthru */
> > +    case GIMPLE_BINARY_RHS:
> > +      ops.op1 = gimple_assign_rhs2 (g);
> > +
> > +      /* Try to expand conditonal compare.  */
> > +      if (targetm.gen_ccmp_first)
> > +     {
> > +       gcc_checking_assert (targetm.gen_ccmp_next != NULL);
> > +       r = expand_ccmp_expr (g, TYPE_MODE (ops.type));
> > +       if (r)
> > +         break;
> > +     }
> > +      /* Fallthru */
> > +    case GIMPLE_UNARY_RHS:
> > +      ops.op0 = gimple_assign_rhs1 (g);
> > +      ops.location = loc;
> > +      r = expand_expr_real_2 (&ops, target, tmode, modifier);
> > +      break;
> > +    case GIMPLE_SINGLE_RHS:
> > +      {
> > +     r = expand_expr_real (gimple_assign_rhs1 (g), target,
> > +                           tmode, modifier, alt_rtl,
> > +                           inner_reference_p);
> > +     break;
> > +      }
> > +    default:
> > +      gcc_unreachable ();
> > +    }
> > +  set_curr_insn_location (saved_loc);
> > +  if (REG_P (r) && !REG_EXPR (r))
> > +    set_reg_attrs_for_decl_rtl (lhs, r);
> > +  return r;
> > +}
> > +
> >  rtx
> >  expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
> >                   enum expand_modifier modifier, rtx *alt_rtl,
> > @@ -11198,51 +11254,8 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
> >         && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp)))
> >       g = SSA_NAME_DEF_STMT (exp);
> >        if (g)
> > -     {
> > -       rtx r;
> > -       location_t saved_loc = curr_insn_location ();
> > -       loc = gimple_location (g);
> > -       if (loc != UNKNOWN_LOCATION)
> > -         set_curr_insn_location (loc);
> > -       ops.code = gimple_assign_rhs_code (g);
> > -          switch (get_gimple_rhs_class (ops.code))
> > -         {
> > -         case GIMPLE_TERNARY_RHS:
> > -           ops.op2 = gimple_assign_rhs3 (g);
> > -           /* Fallthru */
> > -         case GIMPLE_BINARY_RHS:
> > -           ops.op1 = gimple_assign_rhs2 (g);
> > -
> > -           /* Try to expand conditonal compare.  */
> > -           if (targetm.gen_ccmp_first)
> > -             {
> > -               gcc_checking_assert (targetm.gen_ccmp_next != NULL);
> > -               r = expand_ccmp_expr (g, mode);
> > -               if (r)
> > -                 break;
> > -             }
> > -           /* Fallthru */
> > -         case GIMPLE_UNARY_RHS:
> > -           ops.op0 = gimple_assign_rhs1 (g);
> > -           ops.type = TREE_TYPE (gimple_assign_lhs (g));
> > -           ops.location = loc;
> > -           r = expand_expr_real_2 (&ops, target, tmode, modifier);
> > -           break;
> > -         case GIMPLE_SINGLE_RHS:
> > -           {
> > -             r = expand_expr_real (gimple_assign_rhs1 (g), target,
> > -                                   tmode, modifier, alt_rtl,
> > -                                   inner_reference_p);
> > -             break;
> > -           }
> > -         default:
> > -           gcc_unreachable ();
> > -         }
> > -       set_curr_insn_location (saved_loc);
> > -       if (REG_P (r) && !REG_EXPR (r))
> > -         set_reg_attrs_for_decl_rtl (SSA_NAME_VAR (exp), r);
> > -       return r;
> > -     }
> > +     return expand_expr_real_gassign (as_a<gassign *> (g), target, tmode,
> > +                                      modifier, alt_rtl, inner_reference_p);
> >
> >        ssa_name = exp;
> >        decl_rtl = get_rtx_for_ssa_name (ssa_name);
> > diff --git a/gcc/expr.h b/gcc/expr.h
> > index f04f40da6ab..64956f63029 100644
> > --- a/gcc/expr.h
> > +++ b/gcc/expr.h
> > @@ -302,6 +302,9 @@ extern rtx expand_expr_real_1 (tree, rtx, machine_mode,
> >                              enum expand_modifier, rtx *, bool);
> >  extern rtx expand_expr_real_2 (sepops, rtx, machine_mode,
> >                              enum expand_modifier);
> > +extern rtx expand_expr_real_gassign (gassign *, rtx, machine_mode,
> > +                                  enum expand_modifier modifier,
> > +                                  rtx * = nullptr, bool = false);
> >
> >  /* Generate code for computing expression EXP.
> >     An rtx for the computed value is returned.  The value is never null.
> > diff --git a/gcc/testsuite/gcc.target/aarch64/ccmp_3.c b/gcc/testsuite/gcc.target/aarch64/ccmp_3.c
> > new file mode 100644
> > index 00000000000..a2b47fbee14
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/ccmp_3.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-options "-O2" } */
> > +/* PR target/100942 */
> > +
> > +void foo(void);
> > +int f1(int a, int b)
> > +{
> > +  int c = a == 0 || b == 0;
> > +  if (c) foo();
> > +  return c;
> > +}
> > +
> > +/* We should get one cmp followed by ccmp and one cset. */
> > +/* { dg-final { scan-assembler "\tccmp\t" } } */
> > +/* { dg-final { scan-assembler-times "\tcset\t" 1 } } */
> > +/* { dg-final { scan-assembler-times "\tcmp\t" 1 } } */
> > +/* And not get 2 cmps and 2 (or more cset) and orr and a cbnz. */
> > +/* { dg-final { scan-assembler-not "\torr\t" } } */
> > +/* { dg-final { scan-assembler-not "\tcbnz\t" } } */
> > +/* { dg-final { scan-assembler-not "\tcbz\t" } } */
> > +
> > diff --git a/gcc/testsuite/gcc.target/aarch64/ccmp_4.c b/gcc/testsuite/gcc.target/aarch64/ccmp_4.c
> > new file mode 100644
> > index 00000000000..bc0f57a7c59
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/ccmp_4.c
> > @@ -0,0 +1,35 @@
> > +/* { dg-options "-O2" } */
> > +/* PR target/100942 */
> > +
> > +void foo(void);
> > +int f1(int a, int b, int d)
> > +{
> > +  int c = a < 8 || b < 9;
> > +  int e = d < 11 || c;
> > +  if (e) foo();
> > +  return c;
> > +}
> > +
> > +/*
> > +  We really should get:
> > +        cmp     w0, 7
> > +        ccmp    w1, 8, 4, gt
> > +        cset    w0, le
> > +        ccmp    w2, 10, 4, gt
> > +        ble     .L11
> > +
> > +  But we currently get:
> > +        cmp     w0, 7
> > +        ccmp    w1, 8, 4, gt
> > +        cset    w0, le
> > +        cmp     w0, 0
> > +        ccmp    w2, 10, 4, eq
> > +        ble     .L11
> > +  The middle cmp is not needed.
> > + */
> > +
> > +/* We should end up with only one cmp and 2 ccmp and 1 cset but currently we get 2 cmp
> > +   though. */
> > +/* { dg-final { scan-assembler-times "\tccmp\t" 2 } } */
> > +/* { dg-final { scan-assembler-times "\tcset\t" 1 } } */
> > +/* { dg-final { scan-assembler-times "\tcmp\t" 1 { xfail *-*-* } } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/ccmp_5.c b/gcc/testsuite/gcc.target/aarch64/ccmp_5.c
> > new file mode 100644
> > index 00000000000..7e52ae4f322
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/ccmp_5.c
> > @@ -0,0 +1,20 @@
> > +
> > +/* { dg-options "-O2" } */
> > +/* PR target/100942 */
> > +void f1(int a, int b, _Bool *x)
> > +{
> > +  x[0] = x[1] = a == 0 || b == 0;
> > +}
> > +
> > +void f2(int a, int b, int *x)
> > +{
> > +  x[0] = x[1] = a == 0 || b == 0;
> > +}
> > +
> > +
> > +/* Both functions should be using ccmp rather than 2 cset/orr.  */
> > +/* { dg-final { scan-assembler-times "\tccmp\t" 2 } } */
> > +/* { dg-final { scan-assembler-times "\tcset\t" 2 } } */
> > +/* { dg-final { scan-assembler-times "\tcmp\t" 2 } } */
> > +/* { dg-final { scan-assembler-not "\torr\t" } } */
> > +

      reply	other threads:[~2024-01-22 14:07 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-05  7:43 [PATCHv2] " Andrew Pinski
2024-01-12 12:26 ` [PATCHv3] " Richard Sandiford
2024-01-12 21:28   ` Andrew Pinski (QUIC)
2024-01-22 13:24   ` Ping: " Richard Sandiford
2024-01-22 14:06     ` Richard Biener [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFiYyc2odxuboOAwVKuJ_JJQgFKkXNf9SimgXTM=-bB-aDb6dA@mail.gmail.com' \
    --to=richard.guenther@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=quic_apinski@quicinc.com \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).