From: Richard Biener
Date: Tue, 03 Sep 2019 11:33:00 -0000
References: <20190831005151.GD9227@bubble.grove.modra.org>
Subject: Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.
To: Hongtao Liu
Cc: Uros Bizjak, Jakub Jelinek, Alan Modra, "gcc-patches@gcc.gnu.org"

On Tue, Sep 3, 2019 at 1:24 PM Richard Biener wrote:
>
> On Tue, Sep 3, 2019 at 9:57 AM Hongtao Liu wrote:
> >
> > On Mon, Sep 2, 2019 at 4:41 PM Uros Bizjak wrote:
> > >
> > > On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu wrote:
> > > >
> > > > > which is not the case with core_cost (and similar with skylake_cost):
> > > > >
> > > > >   2, 2, 4,           /* cost of moving XMM,YMM,ZMM register */
> > > > >   {6, 6, 6, 6, 12},  /* cost of loading SSE registers
> > > > >                         in 32,64,128,256 and 512-bit */
> > > > >   {6, 6, 6, 6, 12},  /* cost of storing SSE registers
> > > > >                         in 32,64,128,256 and 512-bit */
> > > > >   2, 2,              /* SSE->integer and integer->SSE moves */
> > > > >
> > > > > We have the same cost of moving between integer registers (by default
> > > > > set to 2), between SSE registers and between integer and SSE register
> > > > > sets. I think that at least the cost of moves between regsets should
> > > > > be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> > > > > that would translate to the value of 6. The value should be low enough
> > > > > to keep the cost below the value that forces move through the memory.
> > > > > Changing core register allocation cost of SSE <-> integer to:
> > > > >
> > > > > --cut here--
> > > > > Index: config/i386/x86-tune-costs.h
> > > > > ===================================================================
> > > > > --- config/i386/x86-tune-costs.h (revision 275281)
> > > > > +++ config/i386/x86-tune-costs.h (working copy)
> > > > > @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
> > > > >                          in 32,64,128,256 and 512-bit */
> > > > >    {6, 6, 6, 6, 12},  /* cost of storing SSE registers
> > > > >                          in 32,64,128,256 and 512-bit */
> > > > > -  2, 2,              /* SSE->integer and integer->SSE moves */
> > > > > +  6, 6,              /* SSE->integer and integer->SSE moves */
> > > > >    /* End of register allocator costs. */
> > > > >  },
> > > > >
> > > > > --cut here--
> > > > >
> > > > > still produces direct move in gcc.target/i386/minmax-6.c
> > > > >
> > > > > I think that in addition to attached patch, values between 2 and 6
> > > > > should be considered in benchmarking. Unfortunately, without access to
> > > > > regressed SPEC tests, I can't analyse these changes by myself.
> > > > >
> > > > > Uros.
> > > >
> > > > Apply similar change to skylake_cost, on skylake workstation we got
> > > > performance like:
> > > > ---------------------------
> > > > version                                     | 548_exchange_r score
> > > > gcc10_20180822:                             | 10
> > > > apply remove_max8                           | 8.9
> > > > also apply increase integer_tofrom_sse cost | 9.69
> > > > -----------------------------
> > > > Still 3% regression which is related to _gfortran_mminloc0_4_i4 in
> > > > libgfortran.so.5.0.0.
> > > >
> > > > I found suspicious code as below; does it have an effect?
> > >
> > > Hard to say without access to the test, but I'm glad that changing the
> > > knob has noticeable effect. I think that (as said by Alan) a fine-tune
> > > of register pressure calculation will be needed to push this forward.
> > >
> > > Uros.
> > > > > > > ------------------ > > > > modified gcc/config/i386/i386-features.c > > > > @@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain () > > > > if (dump_file) > > > > fprintf (dump_file, " Instruction conversion gain: %d\n", gain); > > > > > > > > - /* ??? What about integer to SSE? */ > > > > + /* ??? What about integer to SSE? */??? > > > > EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi) > > > > cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer; > > > > ------------------ > > > > -- > > > > BR, > > > > Hongtao > > > > Note: > > Removing limit of cost would introduce lots of regressions in SPEC2017 as follow > > -------------------------------- > > 531.deepsjeng_r -7.18% > > 548.exchange_r -6.70% > > 557.xz_r -6.74% > > 508.namd_r -2.81% > > 527.cam4_r -6.48% > > 544.nab_r -3.99% > > > > Tested on skylake server. > > ------------------------------------- > > How about changing cost from 2 to 8 until we figure out a better number. > > Certainly works for me. Note the STV code uses the "other" sse_to_integer > number and the numbers in question here are those for the RA. There's > a multitude of values used in the tables here, including some a lot larger. > So the overall bumping to 8 certainly was the wrong thing to do and instead > individual numbers should have been adjusted (didn't look at the history > of that bumping). For reference: r125951 | uros | 2007-06-22 19:51:06 +0200 (Fri, 22 Jun 2007) | 6 lines PR target/32413 * config/i386/i386.c (ix86_register_move_cost): Rise the cost of moves between MMX/SSE registers to at least 8 units to prevent ICE caused by non-tieable SI/HI/QImodes in SSE registers. should probably have been "twice the cost of X" or something like that instead where X be some reg-reg move cost. > For example Pentium4 has quite high bases for move > costs, like xmm <-> xmm move costing 12 and SSE->integer costing 20 > while the opposite 12. 
>
> So yes, we want to revert the patch by applying its effect to the
> individual cost tables so we can revisit this for the still interesting
> micro-architectures.
>
> Richard.
>
> > --
> > BR,
> > Hongtao