From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 68DCE3858D20 for ; Tue, 29 Aug 2023 11:40:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 68DCE3858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7B9D12F4; Tue, 29 Aug 2023 04:41:33 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7755C3F738; Tue, 29 Aug 2023 04:40:53 -0700 (PDT) From: Richard Sandiford To: Jeff Law Mail-Followup-To: Jeff Law ,Robin Dapp , gcc-patches , richard.sandiford@arm.com Cc: Robin Dapp , gcc-patches Subject: Re: [PATCH] fwprop: Allow UNARY_P and check register pressure. References: <5a90c8a9-1570-5af4-bfdc-19d097bfee6e@gmail.com> <9acc1a24-5d01-40ad-b4b2-5948585d3e8c@gmail.com> <48bed106-190e-ab5f-4099-fdfd4f5a193f@gmail.com> Date: Tue, 29 Aug 2023 12:40:52 +0100 In-Reply-To: <48bed106-190e-ab5f-4099-fdfd4f5a193f@gmail.com> (Jeff Law's message of "Mon, 28 Aug 2023 17:33:39 -0600") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-19.4 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Jeff Law writes: > On 8/24/23 08:06, Robin Dapp via Gcc-patches wrote: >> Ping. I refined the code and some comments a bit and added a test >> case. >> >> My question in general would still be: Is this something we want >> given that we potentially move some of combine's work a bit towards >> the front of the RTL pipeline? >> >> Regards >> Robin >> >> Subject: [PATCH] fwprop: Allow UNARY_P and check register pressure. >> >> This patch enables the forwarding of UNARY_P sources. As this >> involves potentially replacing a vector register with a scalar register >> the ira_hoist_pressure machinery is used to calculate the change in >> register pressure. If the propagation would increase the pressure >> beyond the number of hard regs, we don't perform it. >> >> gcc/ChangeLog: >> >> * fwprop.cc (fwprop_propagation::profitable_p): Add unary >> handling. >> (fwprop_propagation::update_register_pressure): New function. >> (fwprop_propagation::register_pressure_high_p): New function >> (reg_single_def_for_src_p): Look through unary expressions. >> (try_fwprop_subst_pattern): Check register pressure. >> (forward_propagate_into): Call new function. >> (fwprop_init): Init register pressure. >> (fwprop_done): Clean up register pressure. >> (fwprop_insn): Add comment. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c: New test. > So I was hoping that Richard S. would chime in here as he knows this > code better than anyone. Heh, I'm not sure about that. I rewrote the code to use rtl-ssa, so in that sense I'm OK with the framework side. But I tried to preserve the decisions that the old pass made as closely as possible. I don't know why most of those decisions were made (which is why I just kept them). So I don't think I have a good feel for the advantages and disadvantages of doing this. Robin's analysis of the aarch64 changes was nice and detailed though. I think the one that worries me most is the addressing mode one. fwprop is probably the first chance we get to propagate adds into addresses, and virtual register elimination means that some of those opportunities won't show up in gimple. There again, virtual register elimination wouldn't be the reason for the ld4_s8.c failure. Perhaps there's something missing in expand. Other than that, I think my main question is: why just unary operations? Is the underlying assumption that we only want to propagate a maximum of one register? If so, then I think we should check for that directly, by iterating over subrtxes. That way we can handle things like binary operations involving a register and a constant, and unspecs with a single non-constant operand. I imagine the check would be something like: unsigned int nregs = 0; for (each subrtx x) { if (MEM_P (x)) return false; if (SUBREG_P (x) && .../*current conditions */...) return false; if (REG_P (x)) { nregs += 1; if (nregs > 1) return false; } } return true; Perhaps we should allow the optimisation without register-pressure information if (a) the source register and destination register are in the same pressure class and (b) all uses of the destination are being replaced. (FWIW, rtl-ssa should make it easier to try to replace all definitions at once, with an all-or-nothing choice, if we ever wanted to do that.) Thanks, Richard > > This looks like a much better implementation of something I've done > before :-) Basically imagine a target where a sign/zero extension can > be folded into arithmetic for free. We put in various hacks to this > code to encourage more propagations of extensions. > > I still think this is valuable. As we lower from gimple->RTL we're > going to still have artifacts in the RTL that we're going to want to > optimize away. fwprop has certain advantages over combine, including > the fact that it runs earlier, pre-loop. > > > It looks generally sensible to me. But give Richard S. another week to > chime in. He seems to be around, but may be slammed with stuff right now. > > jeff