From: Richard Sandiford
To: Tamar Christina via Gcc-patches
Cc: Tamar Christina, Richard Earnshaw, nd, Marcus Shawcroft
Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector constants and operations
Date: Sat, 23 Oct 2021 11:39:46 +0100
In-Reply-To: (Tamar Christina via Gcc-patches's message of "Wed, 8 Sep 2021 12:58:15 +0000")

Tamar Christina via Gcc-patches writes:
>> I'm still a bit sceptical about treating the high-part cost as lower.
>> ISTM that the subreg cases are the ones that are truly “free” and any others
>> should have a normal cost.
>> So if CSE handled the subreg case itself (to model
>> how the rtx would actually be generated) then aarch64 code would have to
>> do less work.  I imagine that will be true for other targets as well.
>
> I guess the main problem is that CSE lacks context because it's not until after
> combine that the high part becomes truly "free" when pushed into a high operation.

Yeah.  And the aarch64 code is just being asked to cost the operation
it's given, which could for example come from an existing
aarch64_simd_mov_from_high.  I think we should try to ensure that
an aarch64_simd_mov_from_high followed by some arithmetic on the result
is more expensive than the fused operation (when fusing is possible).

An analogy might be: if the cost code is given:

  (add (reg X) (reg Y))

then, at some later point, the (reg X) might be replaced with a
multiplication, in which case we'd have a MADD operation and the
addition is effectively free.  Something similar would happen if
(reg X) became a shift by a small amount on newer cores, although
I guess then you could argue either that the cost of the add
disappears or that the cost of the shift disappears.

But we shouldn't count ADD as free on the basis that it could be
combined with a multiplication or shift in future.  We have to cost
what we're given.  I think the same thing applies to the high part.

Here we're trying to prevent cse1 from replacing a DUP (lane) with
a MOVI by saying that the DUP is strictly cheaper than the MOVI.
I don't think that's really true though, and the cost tables in the
patch say that DUP is more expensive (rather than less expensive)
than MOVI.

Also, if I've understood correctly, it looks like we'd be relying
on the vget_high of a constant remaining unfolded until RTL cse1.
I think it's likely in future that we'd try to fold vget_high
at the gimple level instead, since that could expose more
optimisations of a different kind.
The gimple optimisers would then fold vget_high(constant) in a
similar way to how cse1 does now.

So perhaps we should continue to allow the vget_high(constant)
to be folded in cse1 and come up with some way of coping with
the folded form.

Thanks,
Richard
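
A toy sketch of the ADD/MADD analogy above, in case it helps.  This is
not GCC code: the expression type, the helper names (reg, add, mult,
toy_cost) and the unit costs are all hypothetical and chosen only for
illustration.  The point it tries to show is that the cost function
prices exactly the expression it is handed, so a plain ADD of two
registers keeps its normal cost, while the fused form comes out cheaper
than the multiplication and the addition costed separately.

```cpp
#include <memory>

// Toy expression tree; NOT GCC's rtx representation.
enum class Code { Reg, Add, Mult };

struct Expr {
  Code code;
  std::shared_ptr<Expr> lhs, rhs;
};

using ExprPtr = std::shared_ptr<Expr>;

ExprPtr reg() { return std::make_shared<Expr>(Expr{Code::Reg, nullptr, nullptr}); }
ExprPtr add(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{Code::Add, a, b}); }
ExprPtr mult(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{Code::Mult, a, b}); }

// Hypothetical unit costs, for illustration only.
constexpr int kAddCost = 1;
constexpr int kMultCost = 3;
constexpr int kMaddCost = 3;  // assume a fused multiply-add costs the same as a multiply

// Cost the expression as given, without guessing what a later pass
// might substitute for a register operand.
int toy_cost(const ExprPtr &e) {
  switch (e->code) {
  case Code::Reg:
    return 0;
  case Code::Mult:
    return kMultCost + toy_cost(e->lhs) + toy_cost(e->rhs);
  case Code::Add:
    // The addition is only "free" when a multiplication is already
    // present as the first operand, i.e. the fused form is what we
    // were actually given.
    if (e->lhs->code == Code::Mult)
      return kMaddCost + toy_cost(e->lhs->lhs) + toy_cost(e->lhs->rhs)
             + toy_cost(e->rhs);
    return kAddCost + toy_cost(e->lhs) + toy_cost(e->rhs);
  }
  return 0;
}
```

With these made-up numbers, toy_cost (add (reg (), reg ())) is the
ordinary ADD cost, and the fused add-of-mult form costs less than a
separate mult plus a separate add, which is the ordering argued for
above in the mov_from_high case.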