From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
MIME-Version: 1.0
References: <20190831005151.GD9227@bubble.grove.modra.org>
From: Hongtao Liu
Date: Wed, 04 Sep 2019 01:42:00 -0000
Subject: Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.
To: Uros Bizjak
Cc: Richard Biener, Jakub Jelinek, Alan Modra, gcc-patches@gcc.gnu.org

On Wed, Sep 4, 2019 at 12:50 AM Uros Bizjak wrote:
>
> On Tue, Sep 3, 2019 at 1:33 PM Richard Biener wrote:
>
> > > > Note:
> > > > Removing the limit on the cost would introduce lots of
> > > > regressions in SPEC2017, as follows:
> > > > --------------------------------
> > > > 531.deepsjeng_r  -7.18%
> > > > 548.exchange_r   -6.70%
> > > > 557.xz_r         -6.74%
> > > > 508.namd_r       -2.81%
> > > > 527.cam4_r       -6.48%
> > > > 544.nab_r        -3.99%
> > > >
> > > > Tested on a Skylake server.
> > > > -------------------------------------
> > > > How about changing the cost from 2 to 8 until we figure out a
> > > > better number?
> >
> > Certainly works for me.  Note the STV code uses the "other"
> > sse_to_integer number, and the numbers in question here are those
> > for the RA.  There is a multitude of values used in the tables here,
> > including some a lot larger.  So the overall bump to 8 certainly was
> > the wrong thing to do; instead, the individual numbers should have
> > been adjusted (I didn't look at the history of that bump).
> >
> > For reference:
> >
> > r125951 | uros | 2007-06-22 19:51:06 +0200 (Fri, 22 Jun 2007) | 6 lines
> >
> >     PR target/32413
> >     * config/i386/i386.c (ix86_register_move_cost): Rise the cost of
> >     moves between MMX/SSE registers to at least 8 units to prevent
> >     ICE caused by non-tieable SI/HI/QImodes in SSE registers.
> >
> > should probably have been "twice the cost of X" or something like
> > that instead, where X is some reg-reg move cost.
>
> Thanks for the reference.  It looks like the patch fixes the issue in
> the wrong place; this should be solved in
> inline_secondary_memory_needed:
>
>   /* Between SSE and general, we have moves no larger than word size.
>      */
>   if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
>       || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
>       || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
>     return true;
>
> as an alternative to implementing QImode and HImode moves as SImode
> moves between SSE and int<->SSE registers.  We have
> ix86_secondary_memory_needed_mode, which extends QImode and HImode
> secondary memory to SImode, so this should solve PR32413.
>
> Other than that, what to do with the bizarre property of direct moves
> that benchmark far worse than indirect moves?  I was expecting that
> keeping the cost of direct inter-regset moves just a bit below the
> cost of int<->mem<->xmm, but (much?) higher than intra-regset moves,
> would prevent unwanted wandering of values between register sets,
> while still generating the direct move when needed.  While this almost

I have not tested it yet, so I'll start a test of this patch (changing
the cost from 2 to 6) together with Richard's change, and I'll let you
know when the test finishes.

> fixes the runtime regression, it is not clear to me from Hongtao Liu's
> message whether Richard's 2019-08-27 patch fixes the remaining
> regression or not.  Liu, can you please clarify?

--------------------------------
531.deepsjeng_r  -7.18%
548.exchange_r   -6.70%
557.xz_r         -6.74%
508.namd_r       -2.81%
527.cam4_r       -6.48%
544.nab_r        -3.99%

Tested on a Skylake server.
-------------------------------------
Those regressions compare gcc10_20190830 against gcc10_20190824 and are
mainly caused by removing the limit of 8.

> > > For example, Pentium4 has quite high base move costs: an
> > > xmm <-> xmm move costs 12 and an SSE->integer move costs 20, while
> > > the opposite direction costs 12.
> > >
> > > So yes, we want to revert the patch by applying its effect to the
> > > individual cost tables so we can revisit this for the still
> > > interesting micro-architectures.
>
> Uros.

--
BR,
Hongtao