From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from gate.crashing.org (gate.crashing.org [63.228.1.57]) by sourceware.org (Postfix) with ESMTP id 57E42398B899 for ; Thu, 17 Sep 2020 17:53:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 57E42398B899 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=segher@kernel.crashing.org Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 08HHpQL7005121; Thu, 17 Sep 2020 12:51:26 -0500 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id 08HHpPKK005120; Thu, 17 Sep 2020 12:51:25 -0500 X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f Date: Thu, 17 Sep 2020 12:51:25 -0500 From: Segher Boessenkool To: Alan Modra Cc: gcc-patches@sourceware.org Subject: Re: [RS6000] rs6000_rtx_costs reduce cost for SETs Message-ID: <20200917175125.GJ28786@gate.crashing.org> References: <20200915011946.3395-1-amodra@gmail.com> <20200915011946.3395-8-amodra@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200915011946.3395-8-amodra@gmail.com> User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-7.3 required=5.0 tests=BAYES_00, JMQ_SPF_NEUTRAL, KAM_DMARC_STATUS, TXREP, T_SPF_HELO_PERMERROR, T_SPF_PERMERROR autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Sep 2020 17:53:30 -0000 Hi! On Tue, Sep 15, 2020 at 10:49:45AM +0930, Alan Modra wrote: > Also use rs6000_cost only for speed. More directly: use something completely different for !speed, namely, code size. > - if (CONST_INT_P (XEXP (x, 1)) > - && satisfies_constraint_I (XEXP (x, 1))) > + if (!speed) > + /* A little more than one insn so that nothing is tempted to > + turn a shift left into a multiply. */ > + *total = COSTS_N_INSNS (1) + 1; Please don't. We have a lot of code elsewhere to handle this directly, already. Also, this is just wrong for size. Five shifts is *not* better than four muls. If that is the only way to get good results, than unfortunately we probably have to; but do not do this without any proof. > case FMA: > - if (mode == SFmode) > + if (!speed) > + *total = COSTS_N_INSNS (1) + 1; Not here, either. > case DIV: > case MOD: > if (FLOAT_MODE_P (mode)) > { > - *total = mode == DFmode ? rs6000_cost->ddiv > - : rs6000_cost->sdiv; > + if (!speed) > + *total = COSTS_N_INSNS (1) + 2; And why + 2 even? > - if (GET_MODE (XEXP (x, 1)) == DImode) > + if (!speed) > + *total = COSTS_N_INSNS (1) + 2; > + else if (GET_MODE (XEXP (x, 1)) == DImode) > *total = rs6000_cost->divdi; > else > *total = rs6000_cost->divsi; (more) > @@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, > return false; > > case AND: > + *total = COSTS_N_INSNS (1); > right = XEXP (x, 1); > if (CONST_INT_P (right)) > { > @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, > || left_code == LSHIFTRT) > && rs6000_is_valid_shift_mask (right, left, mode)) > { > - *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed); > - if (!CONST_INT_P (XEXP (left, 1))) > - *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed); > - *total += COSTS_N_INSNS (1); > + rtx reg_op = XEXP (left, 0); > + if (!REG_P (reg_op)) > + *total += rtx_cost (reg_op, mode, left_code, 0, speed); > + reg_op = XEXP (left, 1); > + if (!REG_P (reg_op) && !CONST_INT_P (reg_op)) > + *total += rtx_cost (reg_op, mode, left_code, 1, speed); > return true; > } > } > - > - *total = COSTS_N_INSNS (1); > return false; This doesn't improve anything? It just makes it different from all surrounding code? > @@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, > if (outer_code == TRUNCATE > && GET_CODE (XEXP (x, 0)) == MULT) > { > - if (mode == DImode) > + if (!speed) > + *total = COSTS_N_INSNS (1) + 1; (more) > + case SET: > + /* The default cost of a SET is the number of general purpose > + regs being set multiplied by COSTS_N_INSNS (1). That only > + works where the incremental cost of the operation and > + operands is zero, when the operation performed can be done in > + one instruction. For other cases where we add COSTS_N_INSNS > + for some operation (see point 5 above), COSTS_N_INSNS (1) > + should be subtracted from the total cost. */ What does "incremental cost" mean there? If what increases? > + { > + rtx_code src_code = GET_CODE (SET_SRC (x)); > + if (src_code == CONST_INT > + || src_code == CONST_DOUBLE > + || src_code == CONST_WIDE_INT) > + return false; > + int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed) > + + rtx_cost (SET_DEST (x), mode, SET, 0, speed)); This should use set_src_cost, if anything. But that will recurse then, currently? Ugh. Using rtx_cost for SET_DEST is problematic, too. What (if anything!) calls this for a SET? Oh, set_rtx_cost still does that, hrm. rtx_cost costs RTL *expressions*. Not instructions. Except where it is (ab)used for that, sigh. Many targets have something for it already, but all quite different from this. > + if (set_cost >= COSTS_N_INSNS (1)) > + *total += set_cost - COSTS_N_INSNS (1); I don't understand this part at all, for example. Why not just *total += set_cost - COSTS_N_INSNS (1); ? If set_cost is lower than one insn's cost, don't we have a problem already? Generic things. Please split this patch up when sending it again, it does too many different things, and many of those are not obvious. All such changes that aren't completely obvious (like the previous ones were) should have some measurement. We are in stage1, and we will notice (non-trivial) degradations, but if we can expect degradations (like for this patch), it needs benchmarking. Since you add !speed all over the place, maybe we should just have a separate function that does !speed? It looks like quite a few things will simplify. Segher