From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <CAFULd4Z88+aey62UENVeSQCzCx+ev7-AYbCgW-ox63qa7R6TtA@mail.gmail.com> <CAFULd4Yaoa3h4vtd=x0yposto8hsLouLAwSdF5P2thG9CuVC=A@mail.gmail.com> <CAFiYyc1HOHw0RTXKP32OpsgNqAHXyZoHOp7PWFV8Zc_LYpLb4Q@mail.gmail.com> <CAFULd4aoX-JqbkFECYSMHgCEx2zL=WkDFGR0ZrE4a5sywYW3Zw@mail.gmail.com> <20190831005151.GD9227@bubble.grove.modra.org> <B11C5072-1F6C-4BCA-B3DF-FB2490740858@gmail.com> <CAFULd4Z0PbsE4eMC035c4Tv1YFSg1JsWp8--ZWYS=gVBH1oR-g@mail.gmail.com> <CAMZc-bzzSMHcJGweXx0SyDBhjWjUHt1khcY+kYcc=89bEwH9eA@mail.gmail.com>
In-Reply-To: <CAMZc-bzzSMHcJGweXx0SyDBhjWjUHt1khcY+kYcc=89bEwH9eA@mail.gmail.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Mon, 02 Sep 2019 10:23:00 -0000
Message-ID: <CAFiYyc3sQE0m4CRNMsjPp9EsTsJ20VMefSdhoEEoD8-4WXLyRg@mail.gmail.com>
Subject: Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.
To: Hongtao Liu <crazylht@gmail.com>
Cc: Uros Bizjak <ubizjak@gmail.com>, Jakub Jelinek <jakub@redhat.com>, Alan Modra <amodra@gmail.com>, 	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"

On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> > which is not the case with core_cost (and similar with skylake_cost):
> >
> >   2, 2, 4,                /* cost of moving XMM,YMM,ZMM register */
> >   {6, 6, 6, 6, 12},            /* cost of loading SSE registers
> >                        in 32,64,128,256 and 512-bit */
> >   {6, 6, 6, 6, 12},            /* cost of storing SSE registers
> >                        in 32,64,128,256 and 512-bit */
> >   2, 2,                    /* SSE->integer and integer->SSE moves */
> >
> > We have the same cost of moving between integer registers (by default
> > set to 2), between SSE registers and between integer and SSE register
> > sets. I think that at least the cost of moves between regsets should
> > be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> > that would translate to the value of 6. The value should be low enough
> > to keep the cost below the value that forces move through the memory.
> > Changing core register allocation cost of SSE <-> integer to:
> >
> > --cut here--
> > Index: config/i386/x86-tune-costs.h
> > ===================================================================
> > --- config/i386/x86-tune-costs.h        (revision 275281)
> > +++ config/i386/x86-tune-costs.h        (working copy)
> > @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
> >                                            in 32,64,128,256 and 512-bit */
> >    {6, 6, 6, 6, 12},                    /* cost of storing SSE registers
> >                                            in 32,64,128,256 and 512-bit */
> > -  2, 2,                                        /* SSE->integer and integer->SSE moves */
> > +  6, 6,                                        /* SSE->integer and integer->SSE moves */
> >    /* End of register allocator costs.  */
> >    },
> >
> > --cut here--
> >
> > still produces direct move in gcc.target/i386/minmax-6.c
> >
> > I think that in addition to attached patch, values between 2 and 6
> > should be considered in benchmarking. Unfortunately, without access to
> > regressed SPEC tests, I can't analyse these changes by myself.
> >
> > Uros.
>
> Applying a similar change to skylake_cost, on a Skylake workstation we
> got the following performance:
> ------------------------------------------------------------------
> version                                     | 548_exchange_r score
> gcc10_20180822                              | 10
> apply remove_max8                           | 8.9
> also apply increase integer_tofrom_sse cost | 9.69
> ------------------------------------------------------------------
> There is still a 3% regression, which is related to
> _gfortran_mminloc0_4_i4 in libgfortran.so.5.0.0.
>
> I found the suspicious code below; does it affect this?

This should be fixed after

2019-08-27  Richard Biener  <rguenther@suse.de>

        * config/i386/i386-features.h
        (general_scalar_chain::~general_scalar_chain): Add.
        (general_scalar_chain::insns_conv): New bitmap.
        (general_scalar_chain::n_sse_to_integer): New.
        (general_scalar_chain::n_integer_to_sse): Likewise.
        (general_scalar_chain::make_vector_copies): Adjust signature.
        * config/i386/i386-features.c
        (general_scalar_chain::general_scalar_chain): Outline,
        initialize new members.
        (general_scalar_chain::~general_scalar_chain): New.
        (general_scalar_chain::mark_dual_mode_def): Record insns
        we need to insert conversions at and count them.
        (general_scalar_chain::compute_convert_gain): Account
        for conversion instructions at chain boundary.
        (general_scalar_chain::make_vector_copies): Generate a single
        copy for a def by a specific insn.
        (general_scalar_chain::convert_registers): First populate
        defs_map, then make copies at out-of chain insns.

where the only open question (???) is that we have

  const int sse_to_integer;     /* cost of moving SSE register to integer.  */

but not integer_to_sse.  In the hard_register sub-struct of processor_costs
we have both:

      const int sse_to_integer; /* cost of moving SSE register to integer.  */
      const int integer_to_sse; /* cost of moving integer register to SSE. */

IMHO it is odd that we have mostly the same kind of costs twice.
And the compute_convert_gain function adds up apples and oranges.
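
To make the point about the two directions concrete, here is a minimal
sketch (not the actual i386-features.c code) of what counting each
direction with its own cost could look like.  The struct move_costs and
the function boundary_conversion_cost are hypothetical names made up for
illustration; only the field names sse_to_integer/integer_to_sse and the
counters n_sse_to_integer/n_integer_to_sse are taken from the text above.

```cpp
// Hypothetical stand-in for the two hard_register cost fields quoted above.
struct move_costs
{
  int sse_to_integer;  /* cost of moving SSE register to integer.  */
  int integer_to_sse;  /* cost of moving integer register to SSE.  */
};

// Hypothetical helper: total cost of the conversions at a chain boundary,
// with each direction multiplied by its own cost instead of reusing
// sse_to_integer for both.
static int
boundary_conversion_cost (const move_costs &costs,
                          int n_sse_to_integer, int n_integer_to_sse)
{
  return n_sse_to_integer * costs.sse_to_integer
         + n_integer_to_sse * costs.integer_to_sse;
}
```

With the 6, 6 values from the proposed core_cost change, two SSE->integer
and one integer->SSE conversion would cost 2 * 6 + 1 * 6 = 18 in this
sketch.  Whether the gain this is compared against is in the same units
is exactly the apples-and-oranges question.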

> ------------------
> modified   gcc/config/i386/i386-features.c
> @@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain ()
>    if (dump_file)
>      fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
>
> -  /* ???  What about integer to SSE?  */
> +  /* ???  What about integer to SSE?  */???
>    EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
>      cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
> ------------------
> --
> BR,
> Hongtao