From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
MIME-Version: 1.0
References: <20190831005151.GD9227@bubble.grove.modra.org>
From: Hongtao Liu
Date: Wed, 04 Sep 2019 01:42:00 -0000
Subject: Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.
To: Uros Bizjak
Cc: Richard Biener, Jakub Jelinek, Alan Modra, gcc-patches@gcc.gnu.org

On Wed, Sep 4, 2019 at 12:50 AM Uros Bizjak wrote:
>
> On Tue, Sep 3, 2019 at 1:33 PM Richard Biener wrote:
>
> > > > Note:
> > > > Removing the limit on the cost would introduce lots of
> > > > regressions in SPEC2017, as follows:
> > > > --------------------------------
> > > > 531.deepsjeng_r  -7.18%
> > > > 548.exchange_r   -6.70%
> > > > 557.xz_r         -6.74%
> > > > 508.namd_r       -2.81%
> > > > 527.cam4_r       -6.48%
> > > > 544.nab_r        -3.99%
> > > >
> > > > Tested on a Skylake server.
> > > > -------------------------------------
> > > > How about changing the cost from 2 to 8 until we figure out a
> > > > better number?
> >
> > Certainly works for me.  Note the STV code uses the "other"
> > sse_to_integer number, and the numbers in question here are those
> > for the RA.  There is a multitude of values used in the tables here,
> > including some a lot larger.  So the overall bump to 8 certainly was
> > the wrong thing to do; instead, the individual numbers should have
> > been adjusted (I didn't look at the history of that bump).
> >
> > For reference:
> >
> > r125951 | uros | 2007-06-22 19:51:06 +0200 (Fri, 22 Jun 2007) | 6 lines
> >
> >     PR target/32413
> >     * config/i386/i386.c (ix86_register_move_cost): Rise the cost of
> >     moves between MMX/SSE registers to at least 8 units to prevent
> >     ICE caused by non-tieable SI/HI/QImodes in SSE registers.
> >
> > should probably have been "twice the cost of X" or something like
> > that instead, where X is some reg-reg move cost.
>
> Thanks for the reference.  It looks like the patch fixes the issue in
> the wrong place; this should be solved in
> inline_secondary_memory_needed:
>
>   /* Between SSE and general, we have moves no larger than word size.
>      */
>   if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
>       || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
>       || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
>     return true;
>
> as an alternative to implementing QImode and HImode moves as SImode
> moves between SSE and int<->SSE registers.  We have
> ix86_secondary_memory_needed_mode, which extends QImode and HImode
> secondary memory to SImode, so this should solve PR32413.
>
> Other than that, what to do with the bizarre property of direct moves
> that benchmark far worse than indirect moves?  I was expecting that
> keeping the cost of direct inter-regset moves just a bit below the
> cost of int<->mem<->xmm, but (much?) higher than intra-regset moves,
> would prevent unwanted wandering of values between register sets,
> while still generating the direct move when needed.  While this almost

I have not tested it yet, so I'll start a test of this patch (changing
the cost from 2 to 6) together with Richard's change, and I'll let you
know when the test finishes.

> fixes the runtime regression, it is not clear to me from Hongtao Liu's
> message whether Richard's 2019-08-27 patch fixes the remaining
> regression or not.  Liu, can you please clarify?

--------------------------------
531.deepsjeng_r  -7.18%
548.exchange_r   -6.70%
557.xz_r         -6.74%
508.namd_r       -2.81%
527.cam4_r       -6.48%
544.nab_r        -3.99%

Tested on a Skylake server.
-------------------------------------
Those regressions compare gcc10_20190830 against gcc10_20190824 and are
mainly caused by removing the limit of 8.

> > > For example, Pentium4 has quite high base move costs: an
> > > xmm <-> xmm move costs 12 and an SSE->integer move costs 20, while
> > > the opposite direction costs 12.
> > >
> > > So yes, we want to revert the patch by applying its effect to the
> > > individual cost tables so we can revisit this for the still
> > > interesting micro-architectures.
>
> Uros.

--
BR,
Hongtao