Date: Mon, 20 Jan 2020 09:37:00 -0000
From: Alexander Monakov
To: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] optimize costly division in rtx_cost

Ping.

On Sun, 5 Jan 2020, Alexander Monakov wrote:

> Hi,
>
> I noticed there's a costly signed 64-bit division in rtx_cost on x86 as well as
> any other target where UNITS_PER_WORD is implemented like TARGET_64BIT ? 8 : 4.
> It's also evident that rtx_cost does redundant work for a SET rtx argument.
>
> Obviously the variable named 'factor' rarely exceeds 1, so in the majority of
> cases it can be computed with a well-predictable branch rather than a division.
>
> This patch makes rtx_cost do the division only in case mode is wider than
> UNITS_PER_WORD, and also moves a test for a SET up front to avoid redundancy.
> No functional change.
>
> Bootstrapped on x86_64, ok for trunk?
>
> To illustrate the improvement this buys, for tramp3d -O2 compilation, I got
>
> before:
>     73887675319      instructions:u
>
>     72438432200      cycles:u
>       924298569      idq.ms_uops:u
>    102603799255      uops_dispatched.thread:u
>
> after:
>     73888371724      instructions:u
>
>     72386986612      cycles:u
>       802744775      idq.ms_uops:u
>    102096987220      uops_dispatched.thread:u
>
> (this is on Sandybridge, idq.ms_uops are uops going via the microcode sequencer,
> so the unneeded division is responsible for a good fraction of them)
>
> 	* rtlanal.c (rtx_cost): Handle a SET up front. Avoid division if the
> 	mode is not wider than UNITS_PER_WORD.
>
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 9a7afccefb8..c7ab86e228b 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -4207,18 +4207,23 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
>    const char *fmt;
>    int total;
>    int factor;
> +  unsigned mode_size;
>  
>    if (x == 0)
>      return 0;
>  
> -  if (GET_MODE (x) != VOIDmode)
> +  if (GET_CODE (x) == SET)
> +    /* A SET doesn't have a mode, so let's look at the SET_DEST to get
> +       the mode for the factor.  */
> +    mode = GET_MODE (SET_DEST (x));
> +  else if (GET_MODE (x) != VOIDmode)
>      mode = GET_MODE (x);
>  
> +  mode_size = estimated_poly_value (GET_MODE_SIZE (mode));
> +
>    /* A size N times larger than UNITS_PER_WORD likely needs N times as
>       many insns, taking N times as long.  */
> -  factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
> -  if (factor == 0)
> -    factor = 1;
> +  factor = mode_size > UNITS_PER_WORD ? mode_size / UNITS_PER_WORD : 1;
>  
>    /* Compute the default costs of certain things.
>       Note that targetm.rtx_costs can override the defaults.  */
> @@ -4243,14 +4248,6 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
>        /* Used in combine.c as a marker.  */
>        total = 0;
>        break;
> -    case SET:
> -      /* A SET doesn't have a mode, so let's look at the SET_DEST to get
> -         the mode for the factor.  */
> -      mode = GET_MODE (SET_DEST (x));
> -      factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
> -      if (factor == 0)
> -        factor = 1;
> -      /* FALLTHRU */
>      default:
>        total = factor * COSTS_N_INSNS (1);
>      }
>
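
For anyone who wants to convince themselves of the "no functional change"
claim for the factor computation, here is a minimal standalone sketch. It is
not GCC code: factor_old, factor_new and check_factors are hypothetical names,
and units_per_word is passed as a plain int parameter standing in for the
UNITS_PER_WORD target macro (assumed positive). It only compares the
divide-then-clamp form with the branch-guarded form over a range of sizes.

```c
#include <assert.h>

/* Old form: always divide, then clamp a zero result up to one.
   UNITS_PER_WORD is modelled as an ordinary int argument (assumption).  */
static int
factor_old (int mode_size, int units_per_word)
{
  int factor = mode_size / units_per_word;
  if (factor == 0)
    factor = 1;
  return factor;
}

/* New form: divide only when the mode is wider than a word, so the
   common case is a compare and a well-predicted branch.  */
static int
factor_new (unsigned mode_size, int units_per_word)
{
  unsigned upw = (unsigned) units_per_word;
  return mode_size > upw ? (int) (mode_size / upw) : 1;
}

/* Assert that both forms agree for every size in [0, limit].  */
static void
check_factors (unsigned limit, int units_per_word)
{
  for (unsigned size = 0; size <= limit; size++)
    assert (factor_old ((int) size, units_per_word)
	    == factor_new (size, units_per_word));
}
```

Calling check_factors (256, 8) and check_factors (256, 4) covers both values
of the TARGET_64BIT ? 8 : 4 case mentioned above. The sketch says nothing
about the SET handling, which depends on the real rtx accessors.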