From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 228B73858C52; Mon, 27 Feb 2023 19:24:07 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 228B73858C52 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1677525848; bh=Gbe2uLTaNMnEEXadpD/iteMthaB7UyJS02vf1xWqW80=; h=From:To:Subject:Date:In-Reply-To:References:From; b=ph7flC5IXtYW8juBzHUEQBh6Xk5NrA/9ww1l9MyGVxU6N6pLJ5bbQR3DDkiJSHWEH TYPFuEfmJ3RJM+RpKGEEKDzBaUGn/LkleSlnV3HQIVK6CbAEeqzJVH4JFEmLtYa2yc lIoq/RdUsyUfBzdAVrfwqBjuf2wcDa66SlOSAE1s= From: "jakub at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og Date: Mon, 27 Feb 2023 19:24:06 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Version: 13.0 X-Bugzilla-Keywords: wrong-code X-Bugzilla-Severity: normal X-Bugzilla-Who: jakub at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: 10.5 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D108803 Jakub Jelinek changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |segher at gcc dot gnu.org --- Comment #7 from Jakub Jelinek --- Besides this question whether it is ok to emit possibly out-of-bounds shifts into code executed at runtime where source code didn't have any UB, provided that their destination is only used in a conditional move that will not pick that result up, I think we have a combiner bug here. Copying what I wrote in https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612909.html: I've added debug_bb_n after every successful combine attempt, --- gcc/combine.cc.jj 2023-01-02 09:32:43.720977011 +0100 +++ gcc/combine.cc 2023-02-27 19:27:28.151289055 +0100 @@ -4755,6 +4755,16 @@ try_combine (rtx_insn *i3, rtx_insn *i2, if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID (ret)) ret =3D added_notes_insn; +{ +static int cnt =3D 0; +char buf[64]; +sprintf (buf, "/tmp/combine.%d", cnt++); +FILE *f =3D fopen (buf, "a"); +fprintf (f, "%d\n", combine_successes); +basic_block bb =3D BASIC_BLOCK_FOR_FN (cfun, 2); +dump_bb (f, bb, 0, dump_flags_t ()); +fclose (f); +} return ret; } and I see (insn 52 48 53 2 (set (reg:CC 66 cc) (compare:CC (reg:SI 130) (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi} (expr_list:REG_DEAD (reg:SI 130) (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0= ]) (const_int 0 [0])) (nil)))) (insn 53 52 57 2 (set (reg:DI 152 [ _6+8 ]) (if_then_else:DI (ge (reg:CC 66 cc) (const_int 0 [0])) (reg:DI 132) (const_int 0 [0]))) "pr108803.c":12:25 490 {*cmovdi_insn} (expr_list:REG_DEAD (reg:DI 132) (nil))) (insn 57 53 59 2 (set (reg:DI 151 [ _6 ]) (if_then_else:DI (ge (reg:CC 66 cc) (const_int 0 [0])) (const_int 0 [0]) (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn} (expr_list:REG_DEAD (reg:CC 66 cc) (nil))) ... (insn 71 68 72 2 (set (reg:CC 66 cc) (compare:CC (reg:SI 137) (const_int 0 [0]))) "pr108803.c":12:42 437 {cmpsi} (expr_list:REG_DEAD (reg:SI 137) (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0= ]) (const_int 0 [0])) (nil)))) (insn 72 71 76 2 (set (reg:DI 153 [ _8 ]) (if_then_else:DI (ge (reg:CC 66 cc) (const_int 0 [0])) (reg:DI 139) (reg:DI 153 [ _8 ]))) "pr108803.c":12:42 490 {*cmovdi_insn} (expr_list:REG_DEAD (reg:DI 139) (nil))) (insn 76 72 77 2 (set (reg:DI 154 [ _8+8 ]) (if_then_else:DI (ge (reg:CC 66 cc) (const_int 0 [0])) (reg:DI 138) (reg:DI 127))) "pr108803.c":12:42 490 {*cmovdi_insn} (expr_list:REG_DEAD (reg:DI 138) (expr_list:REG_DEAD (reg:DI 127) (expr_list:REG_DEAD (reg:CC 66 cc) (nil))))) (insn 77 76 78 2 (set (reg:DI 159 [ b ]) (ior:DI (reg:DI 151 [ _6 ]) (reg:DI 126))) "pr108803.c":12:12 537 {iordi3} (expr_list:REG_DEAD (reg:DI 126) (expr_list:REG_DEAD (reg:DI 151 [ _6 ]) (nil)))) (insn 78 77 80 2 (set (reg:DI 160 [ b+8 ]) (reg:DI 152 [ _6+8 ])) "pr108803.c":12:12 65 {*movdi_aarch64} (expr_list:REG_DEAD (reg:DI 152 [ _6+8 ]) (nil))) in one of the dumps which still looks correct from brief skimming, r130 here is k - 64 aka -64 and it is the pair of conditional moves of the first TImode shift, effectively (a + 63) << 0 but 0 is obfuscated at expansion time, while the second pair of conditional moves are for the second TImode shift, effectively (a + 63) >> (0 & 63) and then both ored together. It makes sense that the second set of conditional moves are optimized earlier because the shift count in that case is more constrained, it is mas= ked with 63, so one can see (0 & 63) - 64 < 0, while r130 comparison isn't optimized so early because r130 has 2 users and one of them couldn't be merged. Note, this is the exact spot why I think we want the other patch, we can't merge (insn 40 37 41 2 (set (reg:SI 130) (const_int -64 [0xffffffffffffffc0])) "pr108803.c":12:25 64 {*movsi_aarch64} (nil)) (insn 41 40 43 2 (set (reg:DI 132) (ashift:DI (reg:DI 126) (subreg:QI (reg:SI 130) 0))) "pr108803.c":12:25 781 {*aarch64_ashl_sisd_or_int_di3} (expr_list:REG_EQUAL (ashift:DI (reg:DI 126) (const_int -64 [0xffffffffffffffc0])) (nil))) because the backend doesn't like out of bounds shift count immediates in the instruction directly. When doing Trying 53 -> 78: 53: r152:DI=3D{(cc:CC>=3D0)?r132:DI:0} REG_DEAD r132:DI 78: r160:DI=3Dr152:DI REG_DEAD r152:DI Successfully matched this instruction: (set (reg:DI 160 [ b+8 ]) (if_then_else:DI (ge (reg:CC 66 cc) (const_int 0 [0])) (reg:DI 132) (const_int 0 [0]))) allowing combination of insns 53 and 78 original costs 4 + 4 =3D 8 replacement cost 4 deferring deletion of insn with uid =3D 53. modifying insn i3 78: r160:DI=3D{(cc:CC>=3D0)?r132:DI:0} REG_DEAD cc:CC REG_DEAD r132:DI deferring rescan insn with uid =3D 78. combination, the IL changes: @@ -38,21 +38,15 @@ (insn 52 48 53 2 (set (reg:CC 66 cc) (compare:CC (reg:SI 130) (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi} (expr_list:REG_DEAD (reg:SI 130) (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc= 0]) (const_int 0 [0])) (nil)))) -(insn 53 52 57 2 (set (reg:DI 152 [ _6+8 ]) - (if_then_else:DI (ge (reg:CC 66 cc) - (const_int 0 [0])) - (reg:DI 132) - (const_int 0 [0]))) "pr108803.c":12:25 490 {*cmovdi_insn} - (expr_list:REG_DEAD (reg:DI 132) - (nil))) +(note 53 52 57 2 NOTE_INSN_DELETED) (insn 57 53 59 2 (set (reg:DI 151 [ _6 ]) (if_then_else:DI (ge (reg:CC 66 cc) (const_int 0 [0])) (const_int 0 [0]) (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn} (expr_list:REG_DEAD (reg:CC 66 cc) (nil))) @@ -72,17 +66,21 @@ (insn 77 76 78 2 (set (reg:DI 159 [ b ]) (ior:DI (reg:DI 151 [ _6 ]) (reg:DI 126))) "pr108803.c":12:12 537 {iordi3} (expr_list:REG_DEAD (reg:DI 126) (expr_list:REG_DEAD (reg:DI 151 [ _6 ]) (nil)))) (insn 78 77 80 2 (set (reg:DI 160 [ b+8 ]) - (reg:DI 152 [ _6+8 ])) "pr108803.c":12:12 65 {*movdi_aarch64} - (expr_list:REG_DEAD (reg:DI 152 [ _6+8 ]) - (nil))) + (if_then_else:DI (ge (reg:CC 66 cc) + (const_int 0 [0])) + (reg:DI 132) + (const_int 0 [0]))) "pr108803.c":12:12 490 {*cmovdi_insn} + (expr_list:REG_DEAD (reg:CC 66 cc) + (expr_list:REG_DEAD (reg:DI 132) + (nil)))) (insn 80 78 82 2 (parallel [ (set (reg:CC_C 66 cc) (compare:CC_C (plus:DI (reg:DI 155 [ a ]) (reg:DI 159 [ b ])) (reg:DI 155 [ a ]))) (set (reg:DI 144) (plus:DI (reg:DI 155 [ a ]) but as you can see, because cc reg has been REG_DEAD before on insn 57 rather than on insn 53, nothing really removed REG_DEAD note from there and just adds it on insn 78 (note, besides this REG_DEAD issue the IL is otherwise still sane, the previous cc setter 71 and its previous uses 72 and 76 in between the move have been optimized away already in an earlier successful combination). And things go wild with the next successful combination: Trying 52 -> 57: 52: cc:CC=3Dcmp(0xffffffffffffffc0,0) REG_DEAD r130:SI REG_EQUAL cmp(0xffffffffffffffc0,0) 57: r151:DI=3D{(cc:CC>=3D0)?0:r126:DI} REG_DEAD cc:CC Successfully matched this instruction: (set (reg:DI 151 [ _6 ]) (reg:DI 126)) allowing combination of insns 52 and 57 original costs 4 + 4 =3D 8 replacement cost 4 deferring deletion of insn with uid =3D 52. modifying insn i3 57: r151:DI=3Dr126:DI deferring rescan insn with uid =3D 57. where the IL changes: @@ -29,27 +29,18 @@ (insn 41 40 43 2 (set (reg:DI 132) (ashift:DI (reg:DI 126) (subreg:QI (reg:SI 130) 0))) "pr108803.c":12:25 781 {*aarch64_ashl_sisd_or_int_di3} - (expr_list:REG_EQUAL (ashift:DI (reg:DI 126) - (const_int -64 [0xffffffffffffffc0])) - (nil))) + (expr_list:REG_DEAD (reg:SI 130) + (expr_list:REG_EQUAL (ashift:DI (reg:DI 126) + (const_int -64 [0xffffffffffffffc0])) + (nil)))) (note 43 41 46 2 NOTE_INSN_DELETED) (note 46 43 48 2 NOTE_INSN_DELETED) (note 48 46 52 2 NOTE_INSN_DELETED) -(insn 52 48 53 2 (set (reg:CC 66 cc) - (compare:CC (reg:SI 130) - (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi} - (expr_list:REG_DEAD (reg:SI 130) - (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc= 0]) - (const_int 0 [0])) - (nil)))) +(note 52 48 53 2 NOTE_INSN_DELETED) (note 53 52 57 2 NOTE_INSN_DELETED) (insn 57 53 59 2 (set (reg:DI 151 [ _6 ]) - (if_then_else:DI (ge (reg:CC 66 cc) - (const_int 0 [0])) - (const_int 0 [0]) - (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn} - (expr_list:REG_DEAD (reg:CC 66 cc) - (nil))) + (reg:DI 126)) "pr108803.c":12:25 65 {*movdi_aarch64} + (nil)) (note 59 57 60 2 NOTE_INSN_DELETED) (insn 60 59 61 2 (set (reg:DI 139) (const_int 0 [0])) "pr108803.c":12:42 65 {*movdi_aarch64} This removes the cc setter that was used by insn 57 (which has been updated) but is also needed for the former insn 53 which has been merged into insn 78.=