From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 89952 invoked by alias); 19 Jun 2017 14:28:23 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 89895 invoked by uid 89); 19 Jun 2017 14:28:22 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=yz, Hx-languages-length:2539, deciding, sum X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 19 Jun 2017 14:28:21 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id AC839344; Mon, 19 Jun 2017 07:28:24 -0700 (PDT) Received: from e105689-lin.cambridge.arm.com (e105689-lin.cambridge.arm.com [10.2.207.32]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B5A6B3F587; Mon, 19 Jun 2017 07:28:21 -0700 (PDT) Subject: Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations. To: Segher Boessenkool Cc: gcc-patches References: <66275bc9-7d97-b990-4c86-2de1f4a6a2fa@arm.com> <20170619140802.GK16550@gate.crashing.org> From: "Richard Earnshaw (lists)" Message-ID: <09527796-d45b-593e-5f18-7ba15a25c9df@arm.com> Date: Mon, 19 Jun 2017 14:28:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1 MIME-Version: 1.0 In-Reply-To: <20170619140802.GK16550@gate.crashing.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-SW-Source: 2017-06/txt/msg01314.txt.bz2 On 19/06/17 15:08, Segher Boessenkool wrote: > Hi! > > On Mon, Jun 19, 2017 at 02:46:59PM +0100, Richard Earnshaw (lists) wrote: >> Many parallel set insns are of the form of a single set that also sets >> the condition code flags. In this case the cost of such an insn is >> normally the cost of the part that doesn't set the flags, since updating >> the condition flags is simply a side effect. >> >> At present all such insns are treated as having unknown cost (ie 0) and >> combine assumes that such insns are infinitely more expensive than any >> other insn sequence with a non-zero cost. > > That's not what combine does: it optimistically assumes any combination > with unknown costs is an improvement. So try this testcase on ARM. unsigned long x, y, z; int b; void test() { b = __builtin_sub_overflow (y,z, &x); } Without the patch, combine rips apart a compare and subtract insn because it sees it as having cost zero and substitutes it with separate compare and subtract insns. ie before: ldr r3, .L5 ldr r2, .L5+4 ldr r3, [r3] ldr r2, [r2] cmp r3, r2 <===== movcs r0, #0 movcc r0, #1 ldr ip, .L5+8 ldr r1, .L5+12 sub r3, r3, r2 <===== str r3, [ip] str r0, [r1] bx lr after: ldr r3, .L5 ldr r2, .L5+4 ldr r3, [r3] ldr r2, [r2] subs r3, r3, r2 <==== movcc r1, #1 movcs r1, #0 ldr r0, .L5+8 ldr r2, .L5+12 str r3, [r0] str r1, [r2] bx lr The combine log before the patch shows: allowing combination of insns 10 and 51 original costs 0 + 8 = 0 replacement costs 4 + 12 = 16 So it is clearly deciding that the original costs are greater than the replacement costs. > >> This patch addresses this problem by allowing insn_rtx_cost to ignore >> the condition setting part of a PARALLEL iff there is exactly one >> comparison set and one non-comparison set. If the only set operation is >> a comparison we still use that as the basis of the insn cost. > > I'll test this on a zillion archs, see what the effect is. > > Have you considered costing general parallels as well? > > I thought about it but concluded that there's no generically correct answer. It might be the max of all the individual sets or it might be the sum, or it might be somewhere in between. For example on ARM the load/store multiple operations are expressed as parallels, but their cost will depend on how many loads/stores happen in parallel within the hardware. I think we'd need a new back-end hook to handle the other cases sensibly. R. > Segher >