From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id BF9953858C62; Tue, 4 Apr 2023 13:41:59 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org BF9953858C62
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1680615719;
	bh=axwpW/VgBwDTAjg0/M9kB3vN2/TBuLTuHHfToKtGM3M=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=wcLvmv8kutvQB0qYOGoVfTtoW7vzHzJC+4U3Ujv6D3B2dcWaObS4O7DxOgefyqsvp
	 zNcjqM79bjADDhUkMx7UJ8pOWS91pC2Vnd1ZJvrQQ9lJoZdDgMy2Hitg+uRTRrCqpC
	 M7XRkiRmPWf75z9vZwPYGkOpFCQQr5WeSMwmnRy8=
From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/109040] [13 Regression] wrong code with v16hi compare & mask on riscv64 at -O2 since r13-4907-g2e886eef7f2b5a
Date: Tue, 04 Apr 2023 13:41:59 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jakub at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109040

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
That said, I've tried to reproduce this using -O1 -fno-tree-forwprop

unsigned int a;
unsigned short b;

__attribute__((noipa)) unsigned int
foo (void)
{
  unsigned int c = a & 0x8084c;
  unsigned short d = c;
  return d + b;
}

but while the
RTL is quite similar in that case, this one works.

The problem with the #c0 testcase is during combine.
We have before combine:

(insn 32 23 34 2 (set (reg:SI 167 [ m ])
        (mem/c:SI (reg/v/f:SI 147) [1 m+0 S4 A128])) "pr109040.c":9:30 180 {*movsi_internal}
     (expr_list:REG_DEAD (reg/v/f:SI 147)
        (nil)))
(insn 34 32 35 2 (set (reg:SI 168)
        (const_int 526412 [0x8084c])) "pr109040.c":9:30 176 {*mvconst_internal}
     (nil))
(insn 35 34 47 2 (set (reg:SI 166)
        (and:SI (reg:SI 167 [ m ])
            (reg:SI 168))) "pr109040.c":9:30 95 {andsi3}
     (expr_list:REG_DEAD (reg:SI 168)
        (expr_list:REG_DEAD (reg:SI 167 [ m ])
            (expr_list:REG_EQUAL (and:SI (reg:SI 167 [ m ])
                    (const_int 526412 [0x8084c]))
                (nil)))))
(insn 47 35 39 2 (set (reg:HI 175)
        (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 181 {*movhi_internal}
     (expr_list:REG_DEAD (reg:SI 166)
        (nil)))
(insn 39 47 40 2 (set (reg:SI 171)
        (zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
     (expr_list:REG_DEAD (reg:HI 175)
        (nil)))
(insn 40 39 43 2 (set (reg:SI 172)
        (leu:SI (reg:SI 171)
            (const_int 5 [0x5]))) "pr109040.c":9:11 291 {*sleu_sisi}
     (expr_list:REG_DEAD (reg:SI 171)
        (nil)))

Now, the zero extension from HImode to SImode of m & 0x8084c would be best
combined as m & 0x84c, but 0x84c doesn't fit into signed 12-bit immediate
for ANDI instruction.

On the above shorter testcase the major difference before combine is that the
zero_extend is combined with the subreg, so

(insn 10 9 11 2 (set (reg:SI 148 [ c ])
        (zero_extend:SI (subreg:HI (reg:SI 144 [ c ]) 0))) "pr109040-2.c":10:12 111 {*zero_extendhisi2}
     (expr_list:REG_DEAD (reg:SI 144 [ c ])
        (nil)))

in there.
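The two arithmetic claims about the mask above (that the zero-extended HImode value of m & 0x8084c equals m & 0x84c, and that 0x84c is outside ANDI's signed 12-bit immediate range) can be checked with a small C sketch. The helper names fits_simm12, zext16_of_masked and folded_mask are made up for illustration and are not GCC functions; the code only models the arithmetic, not what combine does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if v is representable as a 12-bit
   sign-extended immediate, the range RISC-V I-type instructions
   such as ANDI accept (-2048 .. 2047).  */
bool
fits_simm12 (int64_t v)
{
  return v >= -2048 && v <= 2047;
}

/* Models insns 35, 47 and 39: AND with 0x8084c in SImode, take the
   low 16 bits, zero-extend back to 32 bits.  */
uint32_t
zext16_of_masked (uint32_t m)
{
  return (uint16_t) (m & 0x8084cu);
}

/* The folded form mentioned in the comment: bit 19 of the mask is
   dropped by the truncation, leaving 0x84c.  */
uint32_t
folded_mask (uint32_t m)
{
  return m & 0x84cu;
}
```

For any m, zext16_of_masked (m) and folded_mask (m) agree, while fits_simm12 (0x84c) is false because 0x84c is 2124 and the largest positive ANDI immediate is 2047. That is why the constant still has to be loaded into a register first.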
On the short testcase, the first successful combine is trying to combine the
*mvconst_internal, and and zero_extend:

Failed to match this instruction:
(set (reg:SI 148 [ c ])
    (and:SI (reg:SI 145 [ a ])
        (const_int 2124 [0x84c])))
Successfully matched this instruction:
(set (reg:SI 144 [ c ])
    (const_int 2124 [0x84c]))
Successfully matched this instruction:
(set (reg:SI 148 [ c ])
    (and:SI (reg:SI 145 [ a ])
        (reg:SI 144 [ c ])))

and everything is fine.

On the #c0 testcase, the first successful combine from the above ones is trying
to combine the and and insn 47 (subreg) into:

Successfully matched this instruction:
(set (subreg:SI (reg:HI 175) 0)
    (and:SI (reg:SI 167 [ m ])
        (reg:SI 168)))

Now, not really sure if that's valid given that riscv is
WORD_REGISTER_OPERATIONS 1 target. But maybe even the insn 47 before combine
is wrong for such a target.
As pseudo 168 is 0x8084c, the upper half contains one randomish bit.
And later this new insn is combined with the leu into

Successfully matched this instruction:
(set (reg:SI 172)
    (leu:SI (subreg:SI (reg:HI 175) 0)
        (const_int 5 [0x5])))

which is definitely wrong, because the zero extension disappeared.
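The effect of losing the zero extension can be modelled in plain C. This is only a sketch of the semantics of the RTL above, not the #c0 testcase (which is not quoted in this comment); the function names leu5_with_zext and leu5_without_zext are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Intended semantics (insns 35, 47, 39, 40): mask in SImode, truncate
   to HImode, zero-extend back to SImode, then compare <= 5 unsigned.  */
int
leu5_with_zext (uint32_t m)
{
  uint16_t low = (uint16_t) (m & 0x8084cu);
  return (uint32_t) low <= 5u;
}

/* Model of the final combined insn: the full SImode result of the AND
   is compared directly, so bit 19 (0x80000) from the mask can survive
   in the upper half.  */
int
leu5_without_zext (uint32_t m)
{
  return (m & 0x8084cu) <= 5u;
}
```

With m = 0x80004 the masked value is 0x80004: its low 16 bits are 4, so leu5_with_zext returns 1, while leu5_without_zext compares 0x80004 against 5 and returns 0. Inputs that do not have bit 19 set, such as m = 4, give the same answer in both versions, which matches the description of the upper half containing one stray bit.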