From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id D10A63857805; Wed, 20 Jan 2021 14:14:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org D10A63857805 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 303B0D6E; Wed, 20 Jan 2021 06:14:38 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5B8F73F66E; Wed, 20 Jan 2021 06:14:37 -0800 (PST) From: Richard Sandiford To: Hongtao Liu Mail-Followup-To: Hongtao Liu , Jakub Jelinek via Gcc-patches , ebotcazou@libertysurf.fr, steven@gcc.gnu.org, Jakub Jelinek , richard.sandiford@arm.com Cc: Jakub Jelinek via Gcc-patches , ebotcazou@libertysurf.fr, steven@gcc.gnu.org, Jakub Jelinek Subject: Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg. References: <20210119144514.GA4020736@tucnak> Date: Wed, 20 Jan 2021 14:14:35 +0000 In-Reply-To: (Hongtao Liu's message of "Wed, 20 Jan 2021 12:35:15 +0800") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Jan 2021 14:14:40 -0000 Hongtao Liu writes: > On Wed, Jan 20, 2021 at 12:10 AM Richard Sandiford > wrote: >> >> Jakub Jelinek via Gcc-patches writes: >> > On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-pa= tches wrote: >> >> > actually only the lower 16bits are needed, the original insn is like >> >> > >> >> > .294.r.ira >> >> > (insn 69 68 70 13 (set (reg:HI 96 [ _52 ]) >> >> > (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76 >> >> > {*movhi_internal} >> >> > (nil)) >> >> > (insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ]) >> >> > (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _52 >> >> > ]) 0)))) 1412 {*vec_dupv4hi} >> >> > (nil)) >> >> > >> >> > .295r.reload >> >> > (insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96]) >> >> > (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76 >> >> > {*movhi_internal} >> >> > (nil)) >> >> > (insn 489 75 78 13 (set (reg:SI 22 xmm2 [297]) >> >> > (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal} >> >> > (nil)) >> >> > (insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140]) >> >> > (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) >> >> > 1412 {*vec_dupv4hi} >> >> > (nil)) >> >> > >> >> > and insn 489 is created by lra/reload which seems ok for the sequen= ce, >> >> > but problemistic with considering the logic of hardreg_cprop. >> >> >> >> It looks OK even with the regcprop behaviour though: >> >> >> >> - insn 69 defines only the low 16 bits of di, >> >> - insn 489 defines only the low 16 bits of xmm2, but copies bits 16-31 >> >> too (with unknown contents) >> >> - insn 78 uses only the low 16 bits of xmm2 (the unknown contents >> >> introduced by insn 489 are truncated away) >> >> >> >> So where do bits 16-31 become significant? What goes wrong if they're >> >> not zero? >> > >> > The k0 register is initialized I believe with >> > (insn 20 2 21 2 (set (reg:DI 68 k0 [orig:82 var_6.0_1 ] [82]) >> > (mem/c:DI (symbol_ref:DI ("var_6") [flags 0x40] ) [3 var_6+0 S8 A64])) "pr98694.C":21:10 74 {*movdi_inter= nal} >> > (nil)) >> > and so it contains all 64-bits, and then the code sometimes uses all t= he >> > bits, sometimes just the low 16-bits and sometimes low 32-bits of that >> > value. >> > (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96]) >> > (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":27:23 76= {*movhi_internal} >> > (nil)) >> > (insn 74 73 75 12 (set (reg:SI 36 r8 [orig:149 _52 ] [149]) >> > (zero_extend:SI (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82]))) 144= {*zero_extendhisi2} >> > (nil)) >> > (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297]) >> > (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal} >> > (nil)) >> > (insn 78 489 490 12 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140]) >> > (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) 141= 2 {*vec_dupv4hi} >> > (expr_list:REG_DEAD (reg:SI 22 xmm2 [297]) >> > (nil))) >> > are examples when it uses only the low 16 bits from that, and >> > (insn 487 72 73 12 (set (reg:SI 1 dx [148]) >> > (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal} >> > (nil)) >> > >> > (insn 85 84 491 13 (set (reg:SI 37 r9 [orig:86 _11 ] [86]) >> > (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":28:14 75= {*movsi_internal} >> > (nil)) >> > >> > (insn 491 85 88 13 (set (reg:SI 3 bx [299]) >> > (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal} >> > (nil)) >> > (insn 88 491 89 13 (set (reg:CCNO 17 flags) >> > (compare:CCNO (reg:SI 3 bx [299]) >> > (const_int 0 [0]))) 7 {*cmpsi_ccno_1} >> > (expr_list:REG_DEAD (reg:SI 3 bx [299]) >> > (nil))) >> > >> > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86]) >> > (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*mov= si_internal} >> > (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86]) >> > (nil))) >> > are examples where it uses low 32-bits from k0. >> > So the >> > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86]) >> > - (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*mo= vsi_internal} >> > - (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86]) >> > + (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*= movsi_internal} >> > + (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86]) >> > (nil))) >> > cprop_hardreg change indeed looks bogus, while xmm2 has SImode, it hol= ds >> > only the low 16-bits of the value and has the upper bits undefined, wh= ile r9 >> > it is replacing had all of the low 32-bits well defined. >> >> Ah, ok, thanks for the extra context. >> >> So AIUI the problem when recording xmm2<-di isn't just: >> >> [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src)) >> >> but also that: >> >> [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mod= e) >> >> For example, all registers in this sequence can be part of the same chai= n: >> >> (set (reg:HI R1) (reg:HI R0)) >> (set (reg:SI R2) (reg:SI R1)) // [A] >> (set (reg:DI R3) (reg:DI R2)) // [A] >> (set (reg:SI R4) (reg:SI R[0-3])) >> (set (reg:HI R5) (reg:HI R[0-4])) >> >> But: >> >> (set (reg:SI R1) (reg:SI R0)) >> (set (reg:HI R2) (reg:HI R1)) >> (set (reg:SI R3) (reg:SI R2)) // [A] && [B] >> >> is problematic because it dips below the precision of the oldest regno >> and then increases again. >> >> When this happens, I guess we have two choices: >> >> (1) what the patch does: treat R3 as the start of a new chain. >> (2) pretend that the copy occured in vd->e[sr].mode instead >> (i.e. copy vd->e[sr].mode to vd->e[dr].mode) >> >> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P. >> Maybe the optimisation provided by (2) compared to (1) isn't common >> enough to be worth the complication. >> >> I think we should test [B] as well as [A] though. The pass is set >> up to do some quite elaborate mode changes and I think rejecting >> [A] on its own would make some of the other code redundant. >> It also feels like it should be a seperate =E2=80=9Cif=E2=80=9D or =E2= =80=9Celse if=E2=80=9D, >> with its own comment. >> > Update patch. >> Thanks, >> Richard > > > > --=20 > BR, > Hongtao > > From a52b3c8a90a0bf6cbda8ce86d99c82c6182863a7 Mon Sep 17 00:00:00 2001 > From: liuhongt > Date: Mon, 18 Jan 2021 16:55:32 +0800 > Subject: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by > cprop_hardreg. > > If SRC had been assigned a mode narrower than the copy, we can't link > DEST into the chain even they have same > hard_regno_nregs(i.e. HImode/SImode in i386 backend). This is a bit out of date now. Maybe just say =E2=80=9Ccan't always link= =E2=80=9D instead of just =E2=80=9Ccan't link=E2=80=9D. > i.e > kmovw %k0, %edi > vmovd %edi, %xmm2 > vpshuflw $0, %xmm2, %xmm0 > kmovw %k0, %r8d > kmovd %k0, %r9d > ... > - movl %r9d, %r11d > + vmovd %xmm2, %r11d > > gcc/ChangeLog: > > PR rtl-optimization/98694 > * regcprop.c (copy_value): If SRC had been assigned a mode > narrower than the copy, we can't link DEST into the chain even > they have same hard_regno_nregs(i.e. HImode/SImode in i386 > backend). > > gcc/testsuite/ChangeLog: > > PR rtl-optimization/98694 > * gcc.target/i386/pr98694.c: New test. > --- > gcc/regcprop.c | 33 +++++++++++++++++++++ > gcc/testsuite/gcc.target/i386/pr98694.c | 38 +++++++++++++++++++++++++ > 2 files changed, 71 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/pr98694.c > > diff --git a/gcc/regcprop.c b/gcc/regcprop.c > index dd62cb36013..908298beaea 100644 > --- a/gcc/regcprop.c > +++ b/gcc/regcprop.c > @@ -358,6 +358,39 @@ copy_value (rtx dest, rtx src, struct value_data *vd) > else if (sn > hard_regno_nregs (sr, vd->e[sr].mode)) > return; >=20=20 > + /* If SRC had been assigned a mode narrower than the copy, Although > + they have same hard_regno_nregs, it's not safe to link DEST into the > + chain. .i.e. How about: It is not safe to link DEST into the chain if SRC was defined in some narrower mode M and if M is also narrower than the mode of the first register in the chain. For example: > + (set (reg:DI r1) (reg:DI r0)) > + (set (reg:HI r2) (reg:HI r1)) > + (set (reg:SI r3) (reg:SI r2)) //Should be a new chain start at r3 > + (set (reg:SI r4) (reg:SI r1)) > + (set (reg:SI r5) (reg:SI r4)) > + the upper part of r3 is undefined, if adding it to the chain, it ma= y be > + prop to r5 which has defined upper bits, .i.e. pr98694. And for this: (set (reg:DI r1) (reg:DI r0)) (set (reg:HI r2) (reg:HI r1)) (set (reg:SI r3) (reg:SI r2)) // Should be a new chain starting at r3 (set (reg:SI r4) (reg:SI r1)) (set (reg:SI r5) (reg:SI r4)) the upper part of r3 is undefined. If we added it to the chain, it may be used to replace r5, which has defined upper bits. See PR98694 for details. > + > + [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src)) > + [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno]= .mode) > + Condition B is added to to catch optimization opportunities of > + > + (set (reg:HI R1) (reg:HI R0)) > + (set (reg:SI R2) (reg:SI R1)) // [A] > + (set (reg:DI R3) (reg:DI R2)) // [A] > + (set (reg:SI R4) (reg:SI R[0-3])) > + (set (reg:HI R5) (reg:HI R[0-4])) > + Maybe add here: in which all registers have only 16 defined bits. > + but problematic for > + > + (set (reg:SI R1) (reg:SI R0)) > + (set (reg:HI R2) (reg:HI R1)) > + (set (reg:SI R3) (reg:SI R2)) // [A] && [B] > + > + to be fixed???? */ I think we should drop this part. Your example above covers it in more detail. OK with those changes and the one that HJ asked for. Thanks, Richard