Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Hongtao Liu <crazylht@gmail.com>
To: Uros Bizjak <ubizjak@gmail.com>
Cc: Richard Biener <richard.guenther@gmail.com>,
	Jakub Jelinek <jakub@redhat.com>, 	Alan Modra <amodra@gmail.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.
Date: Tue, 03 Sep 2019 07:57:00 -0000	[thread overview]
Message-ID: <CAMZc-bwwsGRHgUzYjZLju+ZrPx6X1bfK31nqR2RaBgK1vcFmAg@mail.gmail.com> (raw)
In-Reply-To: <CAFULd4be1Du4hZb+YwS9meEk2G7wwf7KmFtigruo-ZA36hd0rg@mail.gmail.com>

On Mon, Sep 2, 2019 at 4:41 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > > which is not the case with core_cost (and similar with skylake_cost):
> > >
> > >   2, 2, 4,                /* cost of moving XMM,YMM,ZMM register */
> > >   {6, 6, 6, 6, 12},            /* cost of loading SSE registers
> > >                        in 32,64,128,256 and 512-bit */
> > >   {6, 6, 6, 6, 12},            /* cost of storing SSE registers
> > >                        in 32,64,128,256 and 512-bit */
> > >   2, 2,                    /* SSE->integer and integer->SSE moves */
> > >
> > > We have the same cost of moving between integer registers (by default
> > > set to 2), between SSE registers and between integer and SSE register
> > > sets. I think that at least the cost of moves between regsets should
> > > be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> > > that would translate to the value of 6. The value should be low enough
> > > to keep the cost below the value that forces move through the memory.
> > > Changing core register allocation cost of SSE <-> integer to:
> > >
> > > --cut here--
> > > Index: config/i386/x86-tune-costs.h
> > > ===================================================================
> > > --- config/i386/x86-tune-costs.h        (revision 275281)
> > > +++ config/i386/x86-tune-costs.h        (working copy)
> > > @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
> > >                                            in 32,64,128,256 and 512-bit */
> > >    {6, 6, 6, 6, 12},                    /* cost of storing SSE registers
> > >                                            in 32,64,128,256 and 512-bit */
> > > -  2, 2,                                        /* SSE->integer and
> > > integer->SSE moves */
> > > +  6, 6,                                        /* SSE->integer and
> > > integer->SSE moves */
> > >    /* End of register allocator costs.  */
> > >    },
> > >
> > > --cut here--
> > >
> > > still produces direct move in gcc.target/i386/minmax-6.c
> > >
> > > I think that in addition to attached patch, values between 2 and 6
> > > should be considered in benchmarking. Unfortunately, without access to
> > > regressed SPEC tests, I can't analyse these changes by myself.
> > >
> > > Uros.
> >
> > Apply similar change to skylake_cost, on skylake workstation we got
> > performance like:
> > ---------------------------
> > version                                                            |
> > 548_exchange_r score
> > gcc10_20180822:                                           |   10
> > apply remove_max8                                       |   8.9
> > also apply increase integer_tofrom_sse cost |   9.69
> > -----------------------------
> > Still 3% regression which is related to _gfortran_mminloc0_4_i4 in
> > libgfortran.so.5.0.0.
> >
> > I found suspicious code as bellow, does it affect?
>
> Hard to say without access to the test, but I'm glad that changing the
> knob has noticeable effect. I think that (as said by Alan) a fine-tune
> of register pressure calculation will be needed to push this forward.
>
> Uros.
>
> > ------------------
> > modified   gcc/config/i386/i386-features.c
> > @@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain ()
> >    if (dump_file)
> >      fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
> >
> > -  /* ???  What about integer to SSE?  */
> > +  /* ???  What about integer to SSE?  */???
> >    EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
> >      cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
> > ------------------
> > --
> > BR,
> > Hongtao

Note:
Removing limit of cost would introduce lots of regressions in SPEC2017 as follow
--------------------------------
531.deepsjeng_r  -7.18%
548.exchange_r  -6.70%
557.xz_r -6.74%
508.namd_r -2.81%
527.cam4_r -6.48%
544.nab_r -3.99%

Tested on skylake server.
-------------------------------------
How about  changing cost from 2 to 8 until we figure out a better number.

-- 
BR,
Hongtao

next prev parent reply	other threads:[~2019-09-03  7:57 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-28 16:12 [PATCH, i386]: Improve STV conversion of shifts Uros Bizjak
2019-08-29  8:49 ` Uros Bizjak
2019-08-29 10:11   ` [PATCH, i386]: Fix secondary_reload_needed (was: Re: [PATCH, i386]: Improve STV conversion of shifts) Uros Bizjak
2019-08-29 10:17     ` Uros Bizjak
2019-08-30  7:40   ` [PATCH, i386]: Improve STV conversion of shifts Richard Biener
2019-08-30  8:29     ` [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8 Uros Bizjak
2019-08-31  3:01       ` Alan Modra
2019-08-31 16:12         ` Richard Biener
2019-09-01 19:49           ` Uros Bizjak
2019-09-02  7:16             ` Alan Modra
2019-09-02  8:13             ` Hongtao Liu
2019-09-02  8:41               ` Uros Bizjak
2019-09-03  7:57                 ` Hongtao Liu [this message]
2019-09-03 11:24                   ` Richard Biener
2019-09-03 11:33                     ` Richard Biener
2019-09-03 16:50                       ` Uros Bizjak
2019-09-04  1:42                         ` Hongtao Liu
2019-09-05  5:47                           ` Hongtao Liu
2019-09-05  8:54                             ` Uros Bizjak
2019-09-06 19:32                               ` [PATCH, i386]: Fix PR 91654: Runtime SPEC regression on Haswell Uros Bizjak
2019-09-02 10:23               ` [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8 Richard Biener
2019-09-03  1:37                 ` Hongtao Liu
2019-08-29 20:01 Uros Bizjak
2019-08-30  7:12 ` Hongtao Liu
2019-08-30  7:16   ` Hongtao Liu
2019-08-30  7:22   ` Uros Bizjak
2019-08-30  7:30     ` Hongtao Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMZc-bwwsGRHgUzYjZLju+ZrPx6X1bfK31nqR2RaBgK1vcFmAg@mail.gmail.com \
    --to=crazylht@gmail.com \
    --cc=amodra@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=richard.guenther@gmail.com \
    --cc=ubizjak@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).