From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-508216-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 99382 invoked by alias); 3 Sep 2019 11:33:27 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 99374 invoked by uid 89); 3 Sep 2019 11:33:27 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-6.9 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_2,GIT_PATCH_3,KAM_ASCII_DIVIDERS,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy=rise, H*i:sk:RZCMHz3, H*f:sk:RZCMHz3
X-HELO: mail-lj1-f193.google.com
Received: from mail-lj1-f193.google.com (HELO mail-lj1-f193.google.com) (209.85.208.193) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 03 Sep 2019 11:33:25 +0000
Received: by mail-lj1-f193.google.com with SMTP id z17so15683162ljz.0        for <gcc-patches@gcc.gnu.org>; Tue, 03 Sep 2019 04:33:24 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=gmail.com; s=20161025;        h=mime-version:references:in-reply-to:from:date:message-id:subject:to         :cc;        bh=1k22jZrlKGpRoP119XDBmdjm+9aXn1cQyOpSdB4f05Q=;        b=a+SWyZfylIa/0xb3VBq6Zs/QXt1ajtaB5etbgSDP/t60JQaRivFkZRzSc4ZJ8ob2aN         awBaCutCgqvcKsDLbgAv3y9Wh1eA82wbzePBwjdvePqyt+f9K5FWTaODZ86pYPJNXH9n         S4Uw6dD297mxqkvhyNhE0IMi1v7HfEPfP+Oa1g+iGzhxJ4Hti2usIii5w5Vps6O61bfW         X9e6FoaS7r50OM7p9LDWSjOMwg1NfUVE2/GYl0hneJM4rBpTTT5D+mg/NRWCT9/OllyE         rnseKZBGkqYcm0QRGmJGmQe9uUZVFgWJ60nlY5An2NnBzFStzsblX4cvMI53JxWPOjhr         /30Q==
MIME-Version: 1.0
References: <CAFULd4Z88+aey62UENVeSQCzCx+ev7-AYbCgW-ox63qa7R6TtA@mail.gmail.com> <CAFULd4Yaoa3h4vtd=x0yposto8hsLouLAwSdF5P2thG9CuVC=A@mail.gmail.com> <CAFiYyc1HOHw0RTXKP32OpsgNqAHXyZoHOp7PWFV8Zc_LYpLb4Q@mail.gmail.com> <CAFULd4aoX-JqbkFECYSMHgCEx2zL=WkDFGR0ZrE4a5sywYW3Zw@mail.gmail.com> <20190831005151.GD9227@bubble.grove.modra.org> <B11C5072-1F6C-4BCA-B3DF-FB2490740858@gmail.com> <CAFULd4Z0PbsE4eMC035c4Tv1YFSg1JsWp8--ZWYS=gVBH1oR-g@mail.gmail.com> <CAMZc-bzzSMHcJGweXx0SyDBhjWjUHt1khcY+kYcc=89bEwH9eA@mail.gmail.com> <CAFULd4be1Du4hZb+YwS9meEk2G7wwf7KmFtigruo-ZA36hd0rg@mail.gmail.com> <CAMZc-bwwsGRHgUzYjZLju+ZrPx6X1bfK31nqR2RaBgK1vcFmAg@mail.gmail.com> <CAFiYyc1_aVbNA-GrL=RZCMHz3gB0GO5H7Ac_SW6ko_1vu_pzHw@mail.gmail.com>
In-Reply-To: <CAFiYyc1_aVbNA-GrL=RZCMHz3gB0GO5H7Ac_SW6ko_1vu_pzHw@mail.gmail.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Tue, 03 Sep 2019 11:33:00 -0000
Message-ID: <CAFiYyc3mj=SC91tyrRF-4Jsy=F3B5z3A08bZTpGttp4WTuS1Yg@mail.gmail.com>
Subject: Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.
To: Hongtao Liu <crazylht@gmail.com>
Cc: Uros Bizjak <ubizjak@gmail.com>, Jakub Jelinek <jakub@redhat.com>, Alan Modra <amodra@gmail.com>, 	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
X-IsSubscribed: yes
X-SW-Source: 2019-09/txt/msg00107.txt.bz2

On Tue, Sep 3, 2019 at 1:24 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Sep 3, 2019 at 9:57 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Sep 2, 2019 at 4:41 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > > which is not the case with core_cost (and similar with skylake_cost):
> > > > >
> > > > >   2, 2, 4,                /* cost of moving XMM,YMM,ZMM register */
> > > > >   {6, 6, 6, 6, 12},            /* cost of loading SSE registers
> > > > >                        in 32,64,128,256 and 512-bit */
> > > > >   {6, 6, 6, 6, 12},            /* cost of storing SSE registers
> > > > >                        in 32,64,128,256 and 512-bit */
> > > > >   2, 2,                    /* SSE->integer and integer->SSE moves */
> > > > >
> > > > > We have the same cost of moving between integer registers (by default
> > > > > set to 2), between SSE registers and between integer and SSE register
> > > > > sets. I think that at least the cost of moves between regsets should
> > > > > be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> > > > > that would translate to the value of 6. The value should be low enough
> > > > > to keep the cost below the value that forces move through the memory.
> > > > > Changing core register allocation cost of SSE <-> integer to:
> > > > >
> > > > > --cut here--
> > > > > Index: config/i386/x86-tune-costs.h
> > > > > ===================================================================
> > > > > --- config/i386/x86-tune-costs.h        (revision 275281)
> > > > > +++ config/i386/x86-tune-costs.h        (working copy)
> > > > > @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
> > > > >                                            in 32,64,128,256 and 512-bit */
> > > > >    {6, 6, 6, 6, 12},                    /* cost of storing SSE registers
> > > > >                                            in 32,64,128,256 and 512-bit */
> > > > > -  2, 2,                                        /* SSE->integer and
> > > > > integer->SSE moves */
> > > > > +  6, 6,                                        /* SSE->integer and
> > > > > integer->SSE moves */
> > > > >    /* End of register allocator costs.  */
> > > > >    },
> > > > >
> > > > > --cut here--
> > > > >
> > > > > still produces direct move in gcc.target/i386/minmax-6.c
> > > > >
> > > > > I think that in addition to attached patch, values between 2 and 6
> > > > > should be considered in benchmarking. Unfortunately, without access to
> > > > > regressed SPEC tests, I can't analyse these changes by myself.
> > > > >
> > > > > Uros.
> > > >
> > > > After applying a similar change to skylake_cost, on a Skylake
> > > > workstation we got the following performance:
> > > > ---------------------------
> > > > version                                     | 548_exchange_r score
> > > > gcc10_20180822                              | 10
> > > > apply remove_max8                           | 8.9
> > > > also apply increase integer_tofrom_sse cost | 9.69
> > > > -----------------------------
> > > > There is still a 3% regression, which is related to
> > > > _gfortran_mminloc0_4_i4 in libgfortran.so.5.0.0.
> > > >
> > > > I found the suspicious code below; could it be affecting this?
> > >
> > > Hard to say without access to the test, but I'm glad that changing the
> > > knob has a noticeable effect. I think that (as Alan said) fine-tuning
> > > of the register pressure calculation will be needed to push this forward.
> > >
> > > Uros.
> > >
> > > > ------------------
> > > > modified   gcc/config/i386/i386-features.c
> > > > @@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain ()
> > > >    if (dump_file)
> > > >      fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
> > > >
> > > > -  /* ???  What about integer to SSE?  */
> > > > +  /* ???  What about integer to SSE?  */???
> > > >    EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
> > > >      cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
> > > > ------------------
> > > > --
> > > > BR,
> > > > Hongtao
> >
> > Note:
> > Removing the cost limit would introduce many regressions in SPEC2017, as follows:
> > --------------------------------
> > 531.deepsjeng_r  -7.18%
> > 548.exchange_r  -6.70%
> > 557.xz_r -6.74%
> > 508.namd_r -2.81%
> > 527.cam4_r -6.48%
> > 544.nab_r -3.99%
> >
> > Tested on skylake server.
> > -------------------------------------
> > How about changing the cost from 2 to 8 until we figure out a better number?
>
> Certainly works for me.  Note the STV code uses the "other" sse_to_integer
> number and the numbers in question here are those for the RA.  There's
> a multitude of values used in the tables here, including some a lot larger.
> So the overall bumping to 8 certainly was the wrong thing to do and instead
> individual numbers should have been adjusted (didn't look at the history
> of that bumping).

For reference:

r125951 | uros | 2007-06-22 19:51:06 +0200 (Fri, 22 Jun 2007) | 6 lines

    PR target/32413
    * config/i386/i386.c (ix86_register_move_cost): Rise the cost of
    moves between MMX/SSE registers to at least 8 units to prevent
    ICE caused by non-tieable SI/HI/QImodes in SSE registers.

That floor should probably have been "twice the cost of X" or something
like that instead, where X would be some reg-reg move cost.
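
To illustrate the difference between an absolute floor and a relative
one, here is a minimal sketch.  It is not the code from
ix86_register_move_cost: the struct, its field names and both helper
functions are hypothetical, and the factor of two is only the
"something like that" guess from above.

```c
/* Hypothetical, simplified stand-in for a per-CPU cost table; the real
   struct processor_costs has many more fields.  */
struct move_costs
{
  int reg_reg_move;     /* cost of a move within one register set */
  int inter_unit_move;  /* cost of a move between register sets */
};

static int
max_int (int a, int b)
{
  return a > b ? a : b;
}

/* Absolute floor: never report less than 8 units, whatever scale the
   rest of the table uses.  */
static int
cost_with_fixed_floor (const struct move_costs *c)
{
  return max_int (c->inter_unit_move, 8);
}

/* Relative floor (assumed factor of 2): never report less than twice
   the table's own reg-reg move cost, so the floor follows the table's
   scale.  */
static int
cost_with_relative_floor (const struct move_costs *c)
{
  return max_int (c->inter_unit_move, 2 * c->reg_reg_move);
}
```

With the core_cost numbers quoted in this thread (reg-reg move 2,
SSE<->integer 2), the fixed floor reports 8 while the relative floor
reports 4.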

> For example, Pentium4 has quite high bases for move
> costs, like xmm <-> xmm move costing 12 and SSE->integer costing 20
> while the opposite costs 12.
>
> So yes, we want to revert the patch by applying its effect to the
> individual cost tables so we can revisit this for the still interesting
> micro-architectures.
>
> Richard.
>
> > --
> > BR,
> > Hongtao