From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id C040F38618BB for ; Tue, 11 May 2021 10:59:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org C040F38618BB Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 57D88169E; Tue, 11 May 2021 03:59:26 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6EF5C3F719; Tue, 11 May 2021 03:59:25 -0700 (PDT) From: Richard Sandiford To: Jakub Jelinek Mail-Followup-To: Jakub Jelinek , Hongtao Liu , Eric Botcazou , Jeff Law , gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Cc: Hongtao Liu , Eric Botcazou , Jeff Law , gcc-patches@gcc.gnu.org Subject: Re: [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342] References: <20210119144514.GA4020736@tucnak> <20210505174446.GU1179226@tucnak> Date: Tue, 11 May 2021 11:59:24 +0100 In-Reply-To: <20210505174446.GU1179226@tucnak> (Jakub Jelinek's message of "Wed, 5 May 2021 19:44:46 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.6 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 May 2021 10:59:29 -0000 Jakub Jelinek writes: > On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patch= es wrote: >> Ah, ok, thanks for the extra context. >>=20 >> So AIUI the problem when recording xmm2<-di isn't just: >>=20 >> [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src)) >>=20 >> but also that: >>=20 >> [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mod= e) >>=20 >> For example, all registers in this sequence can be part of the same chai= n: >>=20 >> (set (reg:HI R1) (reg:HI R0)) >> (set (reg:SI R2) (reg:SI R1)) // [A] >> (set (reg:DI R3) (reg:DI R2)) // [A] >> (set (reg:SI R4) (reg:SI R[0-3])) >> (set (reg:HI R5) (reg:HI R[0-4])) >>=20 >> But: >>=20 >> (set (reg:SI R1) (reg:SI R0)) >> (set (reg:HI R2) (reg:HI R1)) >> (set (reg:SI R3) (reg:SI R2)) // [A] && [B] >>=20 >> is problematic because it dips below the precision of the oldest regno >> and then increases again. >>=20 >> When this happens, I guess we have two choices: >>=20 >> (1) what the patch does: treat R3 as the start of a new chain. >> (2) pretend that the copy occured in vd->e[sr].mode instead >> (i.e. copy vd->e[sr].mode to vd->e[dr].mode) >>=20 >> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P. >> Maybe the optimisation provided by (2) compared to (1) isn't common >> enough to be worth the complication. >>=20 >> I think we should test [B] as well as [A] though. The pass is set >> up to do some quite elaborate mode changes and I think rejecting >> [A] on its own would make some of the other code redundant. >> It also feels like it should be a seperate =E2=80=9Cif=E2=80=9D or =E2= =80=9Celse if=E2=80=9D, >> with its own comment. > > Unfortunately, we now have a testcase that shows that testing also [B] > is a problem (unfortunately now latent on the trunk, only reproduces > on 10 and 11 branches). This whole area feels way more complicated than it ought to be :-/ > The comment in the patch tries to list just the interesting instructions, > we have a 64-bit value, copy low 8 bit of those to another register, > copy full 64 bits to another register and then clobber the original regis= ter. > Before that (set (reg:DI r14) (const_int ...)) we have a chain > DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain,= so > we have QI si, DI bp , si being the oldest_regno. > Next DI si is copied into DI dx. Only the low 8 bits of that are defined, > the rest is unspecified, but we would add DI dx into that same chain at t= he > end, so QI si, DI bp, DI dx [*]. Next si is overwritten, so the chain is > DI bp, DI dx. And then we see (set (reg:DI dx) (reg:DI bp)) and remove it > as redundant, because we think bp and dx are already equivalent, when in > reality that is true only for the lowpart 8 bits. > I believe the [*] marked step above is where the bug is. > > The committed regcprop.c (copy_value) change (but only committed to > trunk/11, not to 10) added > else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)) > && partial_subreg_p (vd->e[sr].mode, > vd->e[vd->e[sr].oldest_regno].mode)) > return; > and while the first partial_subreg_p call returns true, the second one > doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be > true and we'd return, but as that reg got clobbered, si became the oldest > regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode > and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false. > But as the testcase shows, what is the oldest_regno in the chain is > something that changes over time, so relying on it for anything is > problematic, something could have a different oldest_regno and later > on get a different oldest_regno (perhaps with different mode) because > the oldest_regno got overwritten and it can change both ways. > > I wrote the following patch (originally against 10 branch because that is > where Uros has been debugging it) and bootstrapped/regtested it on 11 > branch successfully. > It effectively implements your (2) above; I'm not sure if > REG_CAN_CHANGE_MODE_P is needed there, because it is already tested in > find_oldest_value_reg -> maybe_mode_change -> mode_change_ok. The REG_CAN_CHANGE_MODE_P test would in this case be for vd->e[dr].mode =E2=86=92 vd->e[sr].mode, rather than oldest_regno's mode. I'm just worried that: (set (reg:HI R1) (reg:HI R0)) (set (reg:SI R2) (reg:SI R1)) isn't equivalent to: (set (reg:HI R1) (reg:HI R0)) (set (reg:HI R2) (reg:HI R1)) if REG_CAN_CHANGE_MODE_P is false for either the R2 or R1 change. If we pretend that it is when building the chain then there's a risk of GIGO when using it in find_oldest_value_reg. (Although in this case SI and HI are both valid for R1, REG_CAN_CHANGE_MODE_P might still be false if the HI bits are not in the low 16 bits of the SI. That's unlikely in this case, but a similar thing can happen for vector modes or multi-register modes.) I'm not saying the patch is wrong. I just wanted to clarify why I thought the check might be needed. Thanks, Richard