From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
References: <20190831005151.GD9227@bubble.grove.modra.org>
From: Richard Biener
Date: Mon, 02 Sep 2019 10:23:00 -0000
Subject: Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.
To: Hongtao Liu
Cc: Uros Bizjak, Jakub Jelinek, Alan Modra, "gcc-patches@gcc.gnu.org"
Content-Type: text/plain; charset="UTF-8"

On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu wrote:
>
> > which is not the case with core_cost (and similar with skylake_cost):
> >
> >   2, 2, 4,            /* cost of moving XMM,YMM,ZMM register */
> >   {6, 6, 6, 6, 12},   /* cost of loading SSE registers
> >                          in 32,64,128,256 and 512-bit */
> >   {6, 6, 6, 6, 12},   /* cost of storing SSE registers
> >                          in 32,64,128,256 and 512-bit */
> >   2, 2,               /* SSE->integer and integer->SSE moves */
> >
> > We have the same cost of moving between integer registers (by default
> > set to 2), between SSE registers and between integer and SSE register
> > sets. I think that at least the cost of moves between regsets should
> > be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> > that would translate to the value of 6. The value should be low enough
> > to keep the cost below the value that forces move through the memory.
> > Changing core register allocation cost of SSE <-> integer to:
> >
> > --cut here--
> > Index: config/i386/x86-tune-costs.h
> > ===================================================================
> > --- config/i386/x86-tune-costs.h (revision 275281)
> > +++ config/i386/x86-tune-costs.h (working copy)
> > @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
> >                          in 32,64,128,256 and 512-bit */
> >    {6, 6, 6, 6, 12},   /* cost of storing SSE registers
> >                          in 32,64,128,256 and 512-bit */
> > -  2, 2,               /* SSE->integer and integer->SSE moves */
> > +  6, 6,               /* SSE->integer and integer->SSE moves */
> >    /* End of register allocator costs. */
> >    },
> >
> > --cut here--
> >
> > still produces direct move in gcc.target/i386/minmax-6.c
> >
> > I think that in addition to attached patch, values between 2 and 6
> > should be considered in benchmarking.
> > Unfortunately, without access to regressed SPEC tests, I can't
> > analyse these changes by myself.
> >
> > Uros.
>
> Apply similar change to skylake_cost, on skylake workstation we got
> performance like:
> ---------------------------
> version                                     | 548_exchange_r score
> gcc10_20180822:                             | 10
> apply remove_max8                           | 8.9
> also apply increase integer_tofrom_sse cost | 9.69
> -----------------------------
> Still 3% regression which is related to _gfortran_mminloc0_4_i4 in
> libgfortran.so.5.0.0.
>
> I found suspicious code as below, does it affect?

This should be fixed after

2019-08-27  Richard Biener

        * config/i386/i386-features.h
        (general_scalar_chain::~general_scalar_chain): Add.
        (general_scalar_chain::insns_conv): New bitmap.
        (general_scalar_chain::n_sse_to_integer): New.
        (general_scalar_chain::n_integer_to_sse): Likewise.
        (general_scalar_chain::make_vector_copies): Adjust signature.
        * config/i386/i386-features.c
        (general_scalar_chain::general_scalar_chain): Outline,
        initialize new members.
        (general_scalar_chain::~general_scalar_chain): New.
        (general_scalar_chain::mark_dual_mode_def): Record insns
        we need to insert conversions at and count them.
        (general_scalar_chain::compute_convert_gain): Account
        for conversion instructions at chain boundary.
        (general_scalar_chain::make_vector_copies): Generate a single
        copy for a def by a specific insn.
        (general_scalar_chain::convert_registers): First populate
        defs_map, then make copies at out-of chain insns.

where the only ??? is that we have

  const int sse_to_integer;     /* cost of moving SSE register to integer.  */

but not integer_to_sse.  In the hard_register sub-struct of
processor_costs we have both:

      const int sse_to_integer; /* cost of moving SSE register to integer.  */
      const int integer_to_sse; /* cost of moving integer register to SSE.  */

IMHO it is odd that we have mostly the same kind of costs twice.
And the compute_convert_gain function adds up apples and oranges.
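
To make the point about the two directions concrete, here is a minimal
sketch of charging each kind of boundary conversion with its own cost.
It is not the code in i386-features.c: struct move_costs and
boundary_conversion_cost are hypothetical names made up for this
illustration, and only the two field names and the two counters are
taken from the discussion above.

```c
#include <assert.h>

/* Hypothetical, cut-down stand-in for the two move costs discussed
   above; this is NOT the real struct processor_costs.  */
struct move_costs
{
  int sse_to_integer;   /* cost of moving SSE register to integer.  */
  int integer_to_sse;   /* cost of moving integer register to SSE.  */
};

/* Hypothetical helper: cost of the conversions at a chain boundary,
   charging each direction with its own per-move cost instead of
   using sse_to_integer for both.  */
int
boundary_conversion_cost (const struct move_costs *costs,
                          int n_sse_to_integer, int n_integer_to_sse)
{
  assert (n_sse_to_integer >= 0 && n_integer_to_sse >= 0);
  return (n_sse_to_integer * costs->sse_to_integer
          + n_integer_to_sse * costs->integer_to_sse);
}
```

With both costs set to the same value the split makes no difference;
it only matters once a tuning gives the two directions different
numbers.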
> ------------------
> modified   gcc/config/i386/i386-features.c
> @@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain ()
>    if (dump_file)
>      fprintf (dump_file, " Instruction conversion gain: %d\n", gain);
>
> -  /* ??? What about integer to SSE? */
> +  /* ??? What about integer to SSE? */???
>    EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
>      cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
> ------------------
> --
> BR,
> Hongtao