Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

From: Hongtao Liu <crazylht@gmail.com>
To: Vladimir Makarov <vmakarov@redhat.com>
Cc: liuhongt <hongtao.liu@intel.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.
Date: Mon, 30 May 2022 11:05:00 +0800
Message-ID: <CAMZc-bxUshdxtfp-wgboN+=Uz0Y-8K4=wj0ab3T43d_AjaRGYA@mail.gmail.com>
In-Reply-To: <8b505a07-64bb-f483-63cc-cef6e8e4642c@redhat.com>

On Fri, May 27, 2022 at 5:12 AM Vladimir Makarov via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
> On 2022-05-24 23:39, liuhongt wrote:
> > Right now, mem_cost for the separate mem alternative is 1 * frequency,
> > which is pretty small and causes the unnecessary SSE spill in the PR.
> > I've tried to rework the backend cost model, but RA is still not happy
> > with that (it regresses somewhere else). I think the root cause is that
> > the cost of the separate 'm' alternative is too small, especially
> > considering that the mov cost of a GPR is 2 (the default for
> > REGISTER_MOVE_COST). So this patch increases mem_cost to 2 * frequency
> > and also adds 1 to the reg_class cost for the 'm' alternative.
> >
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
>
> Thank you for addressing this problem. And sorry, I cannot approve this
> patch, at least not without additional work from you on benchmarking
> this change.
>
> This code is very old.  It comes from the older RA (former file
> regclass.c) and has existed practically since GCC day 1.  People have
> tried many times to improve this code.  The code also affects many targets.
Yes, that's why I increased it by as little as possible, so that it won't
regress #c6 in the PR.
>
> I can approve this patch if you show that there is no regression at
> least on x86-64 on some credible benchmark, e.g. SPEC2006 or SPEC2017.
>
I've tested the patch on SPEC2017 with both -march=cascadelake -Ofast -flto
and -O2 -mtune=generic, and observed no obvious regression. The binaries are
all different from before, so I looked at 2 of them; the difference mainly
comes from different register choices (xmm13 -> xmm12).
Ok for trunk then?
> I know it is a lot of work, but when I make such changes myself I check
> SPEC2017.  Several times I have rejected my own changes like this one
> after benchmarking them on SPEC2017, although at first glance they
> looked reasonable.
>
> > gcc/ChangeLog:
> >
> >       PR target/105513
> >       * ira-costs.cc (record_reg_classes): Increase both mem_cost
> >       and reg class cost by 1 for separate mem alternative when
> >       REG_P (op).
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/i386/pr105513-1.c: New test.
> > ---
> >   gcc/ira-costs.cc                           | 26 +++++++++++++---------
> >   gcc/testsuite/gcc.target/i386/pr105513-1.c | 16 +++++++++++++
> >   2 files changed, 31 insertions(+), 11 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/i386/pr105513-1.c
> >
> > diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
> > index 964c94a06ef..f7b8325e195 100644
> > --- a/gcc/ira-costs.cc
> > +++ b/gcc/ira-costs.cc
> > @@ -625,7 +625,8 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
> >                         for (k = cost_classes_ptr->num - 1; k >= 0; k--)
> >                           {
> >                             rclass = cost_classes[k];
> > -                           pp_costs[k] = mem_cost[rclass][0] * frequency;
> > +                           pp_costs[k] = (mem_cost[rclass][0]
> > +                                          + 1) * frequency;
> >                           }
> >                       }
> >                     else
> > @@ -648,7 +649,8 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
> >                         for (k = cost_classes_ptr->num - 1; k >= 0; k--)
> >                           {
> >                             rclass = cost_classes[k];
> > -                           pp_costs[k] = mem_cost[rclass][1] * frequency;
> > +                           pp_costs[k] = (mem_cost[rclass][1]
> > +                                          + 1) * frequency;
> >                           }
> >                       }
> >                     else
> > @@ -670,9 +672,9 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
> >                         for (k = cost_classes_ptr->num - 1; k >= 0; k--)
> >                           {
> >                             rclass = cost_classes[k];
> > -                           pp_costs[k] = ((mem_cost[rclass][0]
> > -                                           + mem_cost[rclass][1])
> > -                                          * frequency);
> > +                           pp_costs[k] = (mem_cost[rclass][0]
> > +                                          + mem_cost[rclass][1]
> > +                                          + 2) * frequency;
> >                           }
> >                       }
> >                     else
> > @@ -861,7 +863,8 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
> >                         for (k = cost_classes_ptr->num - 1; k >= 0; k--)
> >                           {
> >                             rclass = cost_classes[k];
> > -                           pp_costs[k] = mem_cost[rclass][0] * frequency;
> > +                           pp_costs[k] = (mem_cost[rclass][0]
> > +                                          + 1) * frequency;
> >                           }
> >                       }
> >                     else
> > @@ -884,7 +887,8 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
> >                         for (k = cost_classes_ptr->num - 1; k >= 0; k--)
> >                           {
> >                             rclass = cost_classes[k];
> > -                           pp_costs[k] = mem_cost[rclass][1] * frequency;
> > +                           pp_costs[k] = (mem_cost[rclass][1]
> > +                                          + 1) * frequency;
> >                           }
> >                       }
> >                     else
> > @@ -906,9 +910,9 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
> >                         for (k = cost_classes_ptr->num - 1; k >= 0; k--)
> >                           {
> >                             rclass = cost_classes[k];
> > -                           pp_costs[k] = ((mem_cost[rclass][0]
> > -                                           + mem_cost[rclass][1])
> > -                                          * frequency);
> > +                           pp_costs[k] = (mem_cost[rclass][0]
> > +                                          + mem_cost[rclass][1]
> > +                                          + 2) * frequency;
> >                           }
> >                       }
> >                     else
> > @@ -929,7 +933,7 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
> >                   /* Although we don't need insn to reload from
> >                      memory, still accessing memory is usually more
> >                      expensive than a register.  */
> > -                 pp->mem_cost = frequency;
> > +                 pp->mem_cost = 2 * frequency;
> >                 else
> >                   /* If the alternative actually allows memory, make
> >                      things a bit cheaper since we won't need an
> > diff --git a/gcc/testsuite/gcc.target/i386/pr105513-1.c b/gcc/testsuite/gcc.target/i386/pr105513-1.c
> > new file mode 100644
> > index 00000000000..530f5292252
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr105513-1.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -msse2 -mtune=skylake -mfpmath=sse" } */
> > +/* { dg-final { scan-assembler-not "\\(%rsp\\)" } } */
> > +
> > +static int as_int(float x)
> > +{
> > +    return (union{float x; int i;}){x}.i;
> > +}
> > +
> > +float f(double y, float x)
> > +{
> > +    int i = as_int(x);
> > +    if (__builtin_expect(i > 99, 0)) return 0;
> > +    if (i*2u < 77) if (i==2) return 0;
> > +    return y*x;
> > +}
>


-- 
BR,
Hongtao

Thread overview: 14+ messages
2022-05-25  3:39 liuhongt
2022-05-25  5:17 ` Hongtao Liu
2022-05-26 21:12 ` Vladimir Makarov
2022-05-30  3:05   ` Hongtao Liu [this message]
2022-05-31 16:28     ` Vladimir Makarov
2022-05-31 16:40       ` Richard Sandiford
2022-05-31 23:51         ` Hongtao Liu
2022-05-27  9:39 ` Alexander Monakov
2022-05-30  2:52   ` Liu, Hongtao
2022-05-30  6:22     ` Alexander Monakov
2022-05-30  7:14       ` Hongtao Liu
2022-05-30  7:44         ` Alexander Monakov
2022-05-30  8:34           ` Hongtao Liu
2022-05-30  9:41             ` Alexander Monakov
