From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 475A33858D39 for ; Mon, 8 Apr 2024 12:18:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 475A33858D39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 475A33858D39 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712578763; cv=none; b=xg5uYb+JC/xPjI2yR1hFCOzy4s6aiiRPflWo6iNchHcQJ3kW8Ps7DR7ff6Bb23soSwXW3OlwIU3RH/3Jt+vuUUVp2DAjDLFTWgJZgg+taGBq6gS1G/zB8aXGAFApaphLai+UZhaYgpQS3IFP1bzh0TdG4SyIJTxGv3xMD6Qg3VM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712578763; c=relaxed/simple; bh=Sw2XUVHlSzAfeLYQxgToAUEKZy8Zq9nsrdyrCDQ7b30=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=GSX07ZFWXBh9G15+VHV/S+cQSVrD4xUIuMeiMK8ZLkLSnmlZFEvEqdfZkHXZEvuvDZgEnDcdvoUZTfhb4pNrmUyeIjArUL/4dK3K5tmR4DGoQqDbmvhyFRK4O6H+g6VwifKlJ0WipYqZudV0ixV4po06e6/SUo3lOzD9AzBQLCg= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 63B82DA7; Mon, 8 Apr 2024 05:19:26 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 219363F64C; Mon, 8 Apr 2024 05:18:55 -0700 (PDT) From: Richard Sandiford To: Segher Boessenkool Mail-Followup-To: Segher Boessenkool ,Richard Biener , gcc-patches@gcc.gnu.org, Jakub Jelinek , jeffreyalaw@gmail.com, richard.sandiford@arm.com Cc: Richard Biener , gcc-patches@gcc.gnu.org, Jakub Jelinek , jeffreyalaw@gmail.com Subject: Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination References: <202404031107.433B7hCA019240@gate.crashing.org> <20240405212754.GL19790@gate.crashing.org> Date: Mon, 08 Apr 2024 13:18:53 +0100 In-Reply-To: <20240405212754.GL19790@gate.crashing.org> (Segher Boessenkool's message of "Fri, 5 Apr 2024 16:27:54 -0500") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-14.6 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Segher Boessenkool writes: > Hi! > > On Wed, Apr 03, 2024 at 01:07:41PM +0200, Richard Biener wrote: >> The following avoids re-walking and re-combining the instructions >> between i2 and i3 when the pattern of i2 doesn't change. >> >> Bootstrap and regtest running ontop of a reversal of >> r14-9692-g839bc42772ba7a. > > Please include that in the patch (or series, preferably). > >> It brings down memory use frmo 9GB to 400MB and compile-time from >> 80s to 3.5s. r14-9692-g839bc42772ba7a does better in both metrics >> but has shown code generation regressions across acrchitectures. >> >> OK to revert r14-9692-g839bc42772ba7a? > > No. > > The patch solved a very real problem. How does your replacement handle > that? You don't say. It looks like it only battles symptoms a bit, > instead :-( > > We had this before: 3->2 combinations that leave an instruction > identical to what was there before. This was just a combination with > context as well. The only reason this wasn't a huge problem then > already was because this is a 3->2 combination, even if it really is a > 2->1 one it still is beneficial in all the same cases. But in the new > case it can iterate indefinitely -- well not quite, but some polynomial > number of times, for a polynomial at least of degree three, possibly > more :-( > > With this patch you need to show combine still is linear. I don't think > it is, but some deeper analysis might show it still is. > > ~ - ~ - ~ > > What should *really* be done is something that has been on the wish list > for decades: an uncse pass. > > The things that combine no longer works on after my patch are actually > 1->1 combinations (which we never do currently, although we probably > should); or alternatively, an un-CSE followed by a 2->1 combination. > > We can do the latter of course, but we need to do an actual uncse first! > Somewhere before combine, and then redo a CSE after it. An actual CSE, > not doing ten gazillion other things. Can you give a specific example of a 2->2 combination that we still want to apply after r14-9692-g839bc42772ba7a? 2->2 combinations as I understand them were added by c4c5ad1d6d1e1e1fe7a1c2b3bb097cc269dc7306: Author: Segher Boessenkool Date: Mon Jul 30 15:18:17 2018 +0200 combine: Allow combining two insns to two insns This patch allows combine to combine two insns into two. This helps in many cases, by reducing instruction path length, and also allowing further combinations to happen. PR85160 is a typical example of code that it can improve. This patch does not allow such combinations if either of the original instructions was a simple move instruction. In those cases combining the two instructions increases register pressure without improving the code. With this move test register pressure does no longer increase noticably as far as I can tell. (At first I also didn't allow either of the resulting insns to be a move instruction. But that is actually a very good thing to have, as should have been obvious). PR rtl-optimization/85160 * combine.c (is_just_move): New function. (try_combine): Allow combining two instructions into two if neither of the original instructions was a move. That patch didn't have a testcase, but one was added later in 81bdfc1e2940fc93bcd0bba4416daff47f04f3b3: testcase for 2-2 combine gcc/testsuite/ PR rtl-optimization/85160 * gcc.target/powerpc/combine-2-2.c: New testcase. But this is the powerpc test that regresses with the recent patch (PR114518). The patches reference aarch64 bug PR85160. If I check out and build c4c5ad1d6d1e above, I can see that it does indeed remove two mvns from the PR85160 testcase. The diff from c4c5ad1d6d1e~ is: @@ -10,12 +10,10 @@ .cfi_startproc ldr w3, [x2, w3, sxtw 2] ldr w2, [x2, w4, sxtw 2] - mvn w3, w3 - mvn w2, w2 - and w4, w3, w1 - and w1, w2, w1 - and w3, w3, w0 - and w2, w2, w0 + bic w4, w1, w3 + bic w3, w0, w3 + bic w1, w1, w2 + bic w2, w0, w2 asr w4, w4, 9 asr w1, w1, 7 orr w3, w4, w3, asr 7 (which is great). But if I apply 839bc42772ba on top of c4c5ad1d6d1e then the optimisation is undone. Is that the intention? I.e. are we effectively removing the kind of 2->2 combinations added in c4c5ad1d6d1e1e1fe? If so, why not simply revert c4c5ad1d6d1e1e1fe itself? Or is there a specific testcase that is still optimised with the combination of c4c5ad1d6d1e1e1fe and 839bc42772ba7a that would not be optimised without c4c5ad1d6d1e1e1fe? If so, can you say what it is? Thanks, Richard