public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Hongtao Liu <crazylht@gmail.com>
To: Uros Bizjak <ubizjak@gmail.com>
Cc: Richard Biener <richard.guenther@gmail.com>,
	Jakub Jelinek <jakub@redhat.com>, 	Alan Modra <amodra@gmail.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.
Date: Thu, 05 Sep 2019 05:47:00 -0000	[thread overview]
Message-ID: <CAMZc-bz1Mb-w45c-SQQfHXFe-AVptuk2juXzFdNNQH_4dGSv7w@mail.gmail.com> (raw)
In-Reply-To: <CAMZc-byKUUgD6GUcGkHiKdNk-wXijG97A7jeNpYeEuCeK-LXaw@mail.gmail.com>

On Wed, Sep 4, 2019 at 9:44 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Sep 4, 2019 at 12:50 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Tue, Sep 3, 2019 at 1:33 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> >
> > > > > Note:
> > > > > Removing limit of cost would introduce lots of regressions in SPEC2017 as follow
> > > > > --------------------------------
> > > > > 531.deepsjeng_r  -7.18%
> > > > > 548.exchange_r  -6.70%
> > > > > 557.xz_r -6.74%
> > > > > 508.namd_r -2.81%
> > > > > 527.cam4_r -6.48%
> > > > > 544.nab_r -3.99%
> > > > >
> > > > > Tested on skylake server.
> > > > > -------------------------------------
> > > > > How about  changing cost from 2 to 8 until we figure out a better number.
> > > >
> > > > Certainly works for me.  Note the STV code uses the "other" sse_to_integer
> > > > number and the numbers in question here are those for the RA.  There's
> > > > a multitude of values used in the tables here, including some a lot larger.
> > > > So the overall bumping to 8 certainly was the wrong thing to do and instead
> > > > individual numbers should have been adjusted (didn't look at the history
> > > > of that bumping).
> > >
> > > For reference:
> > >
> > > r125951 | uros | 2007-06-22 19:51:06 +0200 (Fri, 22 Jun 2007) | 6 lines
> > >
> > >     PR target/32413
> > >     * config/i386/i386.c (ix86_register_move_cost): Rise the cost of
> > >     moves between MMX/SSE registers to at least 8 units to prevent
> > >     ICE caused by non-tieable SI/HI/QImodes in SSE registers.
> > >
> > > should probably have been "twice the cost of X" or something like that
> > > instead where X be some reg-reg move cost.
> >
> > Thanks for the reference. It looks that the patch fixes the issue in
> > the wrong place, this should be solved in
> > inline_secondary_memory_needed:
> >
> >       /* Between SSE and general, we have moves no larger than word size.  */
> >       if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
> >            || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
> >            || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> >         return true;
> >
> > as an alternative to implement QI and HImode moves as a SImode move
> > between SSE and int<->SSE registers. We have
> > ix86_secondary_memory_needed_mode that extends QI and HImode secondary
> > memory to SImode, so this should solve PR32413.
> >
> > Other than that, what to do with the bizzare property of direct moves
> > that benchmark far worse than indirect moves? I was expecting that
> > keeping the cost of direct inter-regset moves just a bit below the
> > cost of int<->mem<->xmm, but (much ?) higher than itra-regset moves
> > would prevent unwanted wandering of values between register sets,
> > while still generating the direct move when needed. While this almost
>
> I've not tested it yet.
> So i'll start a test about this patch(change cost from 2-->6) with
> Richard's change.
> I'll keep you informed when finishing test.
>
> > fixes the runtime regression, it is not clear to me from Hongtao Liu's
> > message if  Richard's 2019-08-27 fixes the remaining regression or
> > not). Liu, can you please clarify?
> >
> --------------------------------
> 531.deepsjeng_r  -7.18%
> 548.exchange_r  -6.70%
> 557.xz_r -6.74%
> 508.namd_r -2.81%
> 527.cam4_r -6.48%
> 544.nab_r -3.99%
>
> Tested on skylake server.
> -------------------------------------
> Those regressions are comparing gcc10_20190830 to gcc10_20190824 which
> are mainly caused by removing limit of 8.
>
> > > >  For example Pentium4 has quite high bases for move
> > > > costs, like xmm <-> xmm move costing 12 and SSE->integer costing 20
> > > > while the opposite 12.
> > > >
> > > > So yes, we want to revert the patch by applying its effect to the
> > > > individual cost tables so we can revisit this for the still interesting
> > > > micro-architectures.
> >
> > Uros.
>
>
>
> --
> BR,
> Hongtao

Change cost from 2->6 got
-------------
531.deepsjeng_r  9.64%
548.exchange_r  10.24%
557.xc_r              7.99%
508.namd_r         1.08%
527.cam4_r          6.91%
553.nab_r            3.06%
------------

for 531,548,557,527, even better comparing to version before regression.
for 508,533, still little regressions comparing to version before regression.
-- 
BR,
Hongtao

  reply	other threads:[~2019-09-05  5:47 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-28 16:12 [PATCH, i386]: Improve STV conversion of shifts Uros Bizjak
2019-08-29  8:49 ` Uros Bizjak
2019-08-29 10:11   ` [PATCH, i386]: Fix secondary_reload_needed (was: Re: [PATCH, i386]: Improve STV conversion of shifts) Uros Bizjak
2019-08-29 10:17     ` Uros Bizjak
2019-08-30  7:40   ` [PATCH, i386]: Improve STV conversion of shifts Richard Biener
2019-08-30  8:29     ` [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8 Uros Bizjak
2019-08-31  3:01       ` Alan Modra
2019-08-31 16:12         ` Richard Biener
2019-09-01 19:49           ` Uros Bizjak
2019-09-02  7:16             ` Alan Modra
2019-09-02  8:13             ` Hongtao Liu
2019-09-02  8:41               ` Uros Bizjak
2019-09-03  7:57                 ` Hongtao Liu
2019-09-03 11:24                   ` Richard Biener
2019-09-03 11:33                     ` Richard Biener
2019-09-03 16:50                       ` Uros Bizjak
2019-09-04  1:42                         ` Hongtao Liu
2019-09-05  5:47                           ` Hongtao Liu [this message]
2019-09-05  8:54                             ` Uros Bizjak
2019-09-06 19:32                               ` [PATCH, i386]: Fix PR 91654: Runtime SPEC regression on Haswell Uros Bizjak
2019-09-02 10:23               ` [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8 Richard Biener
2019-09-03  1:37                 ` Hongtao Liu
2019-08-29 20:01 Uros Bizjak
2019-08-30  7:12 ` Hongtao Liu
2019-08-30  7:16   ` Hongtao Liu
2019-08-30  7:22   ` Uros Bizjak
2019-08-30  7:30     ` Hongtao Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMZc-bz1Mb-w45c-SQQfHXFe-AVptuk2juXzFdNNQH_4dGSv7w@mail.gmail.com \
    --to=crazylht@gmail.com \
    --cc=amodra@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=richard.guenther@gmail.com \
    --cc=ubizjak@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).