From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id EEF653858D32 for ; Mon, 27 Feb 2023 19:51:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org EEF653858D32 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7A364C14; Mon, 27 Feb 2023 11:52:07 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.99.50]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A21413F67D; Mon, 27 Feb 2023 11:51:23 -0800 (PST) From: Richard Sandiford To: Jakub Jelinek Mail-Followup-To: Jakub Jelinek ,Richard Biener , gcc-patches@gcc.gnu.org, Segher Boessenkool , richard.sandiford@arm.com Cc: Richard Biener , gcc-patches@gcc.gnu.org, Segher Boessenkool Subject: Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803] References: Date: Mon, 27 Feb 2023 19:51:21 +0000 In-Reply-To: (Jakub Jelinek's message of "Mon, 27 Feb 2023 20:11:09 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-28.7 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Jakub Jelinek writes: > On Mon, Feb 27, 2023 at 03:34:11PM +0000, Richard Sandiford wrote: >> > The following testcase is miscompiled on aarch64. The problem is that >> > aarch64 with TARGET_SIMD is !SHIFT_COUNT_TRUNCATED target with >> > targetm.shift_truncation_mask (DImode) == 0 which has HAVE_conditional_move >> > true. If a doubleword shift (in this case TImode) doesn't have its own >> > expander (as is the case of e.g. x86) and is handled in generic code, >> > there are 3 possible expansions. One is when the shift count is constant, >> > the code computes in that case superword_op1 as op1 - BITS_PER_WORD, >> > and chooses at compile time which of expand_superword_shift or >> > expand_subword_shift to call, which ensures that whatever is used >> > actually has its shift count (superword_op1 or op1) in [0, BITS_PER_WORD - 1] >> > range. If !HAVE_conditional_move or that expansion fails, the function >> > handles non-constant op1 similarly (there are some special cases for >> > shift_mask >= BITS_PER_WORD - 1 but let's talk here just about >> > shift_mask < BITS_PER_WORD - 1), except that the selection is done at >> > runtime, with branches around the stuff. While superword_op1 can be >> > [-BITS_PER_WORD, BITS_PER_WORD - 1] and op1 [0, 2 * BITS_PER_WORD - 1], >> > the runtime selection ensures that the instructions executed at runtime >> > have their corresponding shift count again in the right range of >> > [0, BITS_PER_WORD - 1]. >> > The problematic is the HAVE_conditional_move case, which emits both >> > sequences into the actually executed code, so necessarily one of them >> > has an out of range shift count and then using 2 conditional moves >> > picks up a result. >> > Now, in the testcase because -Og doesn't perform VRP/EVRP the shift >> > count isn't optimized to constant during GIMPLE passes, but is determined >> > to be constant during/after expansion into RTL. The shift count is >> > actually const0_rtx later, so superword_op1 is (const_int -64) and we end >> > up with miscompilation during combine because of that. >> >> I haven't worked through the testcase yet, but how does that happen? >> Having shifts with out-of-range shift counts shouldn't be a problem >> in itself, provided that the conditional move selects the one with >> the in-range shift. > > It depends on if we (hw or the compiler) treats out-of-range shifts in RTL > as undefined behavior or just some kind of constrained unspecified behavior > (destination register can be anything, but no traps/exceptions/program > termination etc.). Of course SHIFT_COUNT_TRUNCATED makes it well defined, > but that is not the case here. > I've always thoughts we really treat it as undefined behavior, you shouldn't > reach this spot at runtime; we certainly treat it that way in GIMPLE. > If that is the case, then invoking UB and then having a conditional move > not to select its result is not well defined. > And the patch as posted fixes that problem while not really resulting in > worse code e.g. on aarch64. I think RTL and gimple are different in that respect. SHIFT_COUNT_TRUNCATED's effect on shifts is IMO a bit like CTZ_DEFINED_VALUE_AT_ZERO's effect on CTZ: it enumerates common target-specific behaviour, but doesn't turn invalid/should-not-be-evaluated values into valid values. Not defining SHIFT_COUNT_TRUNCATED is like defining CTZ_DEFINED_VALUE_AT_ZERO to 0. The docs say: Note that regardless of this macro the ``definedness'' of @code{clz} and @code{ctz} at zero do @emph{not} extend to the builtin functions visible to the user. Thus one may be free to adjust the value at will to match the target expansion of these operations without fear of breaking the API@. So for CTZ this really is an RTL thing, which can leak into gimple through ifns. I'd argue that the same is true for SHIFT_COUNT_TRUNCATED and conditional shifts like COND_SHL: normal gimple shifts aren't guaranteed to honour SHIFT_COUNT_TRUNCATED, but COND_SHL should be. The arm part relies on being able to generate shifts that take advantage of port-specific knowledge, see arm_emit_coreregs_64bit_shift. Haven't had time to work through the combine thing yet, but yeah, it does look suspiciously like something is going wrong in the cc handling or death note updates. Thanks, Richard > Anyway, back to the particular testcase rather than the general idea, > it goes wrong during combine, CCing Segher. I've added debug_bb_n after > every successful combine attempt, > --- gcc/combine.cc.jj 2023-01-02 09:32:43.720977011 +0100 > +++ gcc/combine.cc 2023-02-27 19:27:28.151289055 +0100 > @@ -4755,6 +4755,16 @@ try_combine (rtx_insn *i3, rtx_insn *i2, > if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID (ret)) > ret = added_notes_insn; > > +{ > +static int cnt = 0; > +char buf[64]; > +sprintf (buf, "/tmp/combine.%d", cnt++); > +FILE *f = fopen (buf, "a"); > +fprintf (f, "%d\n", combine_successes); > +basic_block bb = BASIC_BLOCK_FOR_FN (cfun, 2); > +dump_bb (f, bb, 0, dump_flags_t ()); > +fclose (f); > +} > return ret; > } > > and I see > (insn 52 48 53 2 (set (reg:CC 66 cc) > (compare:CC (reg:SI 130) > (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi} > (expr_list:REG_DEAD (reg:SI 130) > (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0]) > (const_int 0 [0])) > (nil)))) > (insn 53 52 57 2 (set (reg:DI 152 [ _6+8 ]) > (if_then_else:DI (ge (reg:CC 66 cc) > (const_int 0 [0])) > (reg:DI 132) > (const_int 0 [0]))) "pr108803.c":12:25 490 {*cmovdi_insn} > (expr_list:REG_DEAD (reg:DI 132) > (nil))) > (insn 57 53 59 2 (set (reg:DI 151 [ _6 ]) > (if_then_else:DI (ge (reg:CC 66 cc) > (const_int 0 [0])) > (const_int 0 [0]) > (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn} > (expr_list:REG_DEAD (reg:CC 66 cc) > (nil))) > ... > (insn 71 68 72 2 (set (reg:CC 66 cc) > (compare:CC (reg:SI 137) > (const_int 0 [0]))) "pr108803.c":12:42 437 {cmpsi} > (expr_list:REG_DEAD (reg:SI 137) > (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0]) > (const_int 0 [0])) > (nil)))) > (insn 72 71 76 2 (set (reg:DI 153 [ _8 ]) > (if_then_else:DI (ge (reg:CC 66 cc) > (const_int 0 [0])) > (reg:DI 139) > (reg:DI 153 [ _8 ]))) "pr108803.c":12:42 490 {*cmovdi_insn} > (expr_list:REG_DEAD (reg:DI 139) > (nil))) > (insn 76 72 77 2 (set (reg:DI 154 [ _8+8 ]) > (if_then_else:DI (ge (reg:CC 66 cc) > (const_int 0 [0])) > (reg:DI 138) > (reg:DI 127))) "pr108803.c":12:42 490 {*cmovdi_insn} > (expr_list:REG_DEAD (reg:DI 138) > (expr_list:REG_DEAD (reg:DI 127) > (expr_list:REG_DEAD (reg:CC 66 cc) > (nil))))) > (insn 77 76 78 2 (set (reg:DI 159 [ b ]) > (ior:DI (reg:DI 151 [ _6 ]) > (reg:DI 126))) "pr108803.c":12:12 537 {iordi3} > (expr_list:REG_DEAD (reg:DI 126) > (expr_list:REG_DEAD (reg:DI 151 [ _6 ]) > (nil)))) > (insn 78 77 80 2 (set (reg:DI 160 [ b+8 ]) > (reg:DI 152 [ _6+8 ])) "pr108803.c":12:12 65 {*movdi_aarch64} > (expr_list:REG_DEAD (reg:DI 152 [ _6+8 ]) > (nil))) > in one of the dumps which still looks correct from brief skimming, > r130 here is k - 64 aka -64 and it is the pair of conditional moves of the > first TImode shift, effectively (a + 63) << 0 but 0 is obfuscated at > expansion time, while the second pair of conditional moves are for the > second TImode shift, effectively (a + 63) >> (0 & 63) and then both ored > together. > When doing > Trying 53 -> 78: > 53: r152:DI={(cc:CC>=0)?r132:DI:0} > REG_DEAD r132:DI > 78: r160:DI=r152:DI > REG_DEAD r152:DI > Successfully matched this instruction: > (set (reg:DI 160 [ b+8 ]) > (if_then_else:DI (ge (reg:CC 66 cc) > (const_int 0 [0])) > (reg:DI 132) > (const_int 0 [0]))) > allowing combination of insns 53 and 78 > original costs 4 + 4 = 8 > replacement cost 4 > deferring deletion of insn with uid = 53. > modifying insn i3 78: r160:DI={(cc:CC>=0)?r132:DI:0} > REG_DEAD cc:CC > REG_DEAD r132:DI > deferring rescan insn with uid = 78. > combination, the IL changes: > @@ -38,21 +38,15 @@ > (insn 52 48 53 2 (set (reg:CC 66 cc) > (compare:CC (reg:SI 130) > (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi} > (expr_list:REG_DEAD (reg:SI 130) > (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0]) > (const_int 0 [0])) > (nil)))) > -(insn 53 52 57 2 (set (reg:DI 152 [ _6+8 ]) > - (if_then_else:DI (ge (reg:CC 66 cc) > - (const_int 0 [0])) > - (reg:DI 132) > - (const_int 0 [0]))) "pr108803.c":12:25 490 {*cmovdi_insn} > - (expr_list:REG_DEAD (reg:DI 132) > - (nil))) > +(note 53 52 57 2 NOTE_INSN_DELETED) > (insn 57 53 59 2 (set (reg:DI 151 [ _6 ]) > (if_then_else:DI (ge (reg:CC 66 cc) > (const_int 0 [0])) > (const_int 0 [0]) > (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn} > (expr_list:REG_DEAD (reg:CC 66 cc) > (nil))) > @@ -72,17 +66,21 @@ > (insn 77 76 78 2 (set (reg:DI 159 [ b ]) > (ior:DI (reg:DI 151 [ _6 ]) > (reg:DI 126))) "pr108803.c":12:12 537 {iordi3} > (expr_list:REG_DEAD (reg:DI 126) > (expr_list:REG_DEAD (reg:DI 151 [ _6 ]) > (nil)))) > (insn 78 77 80 2 (set (reg:DI 160 [ b+8 ]) > - (reg:DI 152 [ _6+8 ])) "pr108803.c":12:12 65 {*movdi_aarch64} > - (expr_list:REG_DEAD (reg:DI 152 [ _6+8 ]) > - (nil))) > + (if_then_else:DI (ge (reg:CC 66 cc) > + (const_int 0 [0])) > + (reg:DI 132) > + (const_int 0 [0]))) "pr108803.c":12:12 490 {*cmovdi_insn} > + (expr_list:REG_DEAD (reg:CC 66 cc) > + (expr_list:REG_DEAD (reg:DI 132) > + (nil)))) > (insn 80 78 82 2 (parallel [ > (set (reg:CC_C 66 cc) > (compare:CC_C (plus:DI (reg:DI 155 [ a ]) > (reg:DI 159 [ b ])) > (reg:DI 155 [ a ]))) > (set (reg:DI 144) > (plus:DI (reg:DI 155 [ a ]) > but as you can see, because cc reg has been REG_DEAD before on insn 57 > rather than on insn 53, nothing really removed REG_DEAD note from there > and just adds it on insn 78 (note, besides this REG_DEAD issue the > IL is otherwise still sane, the previous cc setter 71 and its previous > uses 72 and 76 in between the move have been optimized away already in > an earlier successful combination). > And things go wild with the next successful combination: > Trying 52 -> 57: > 52: cc:CC=cmp(0xffffffffffffffc0,0) > REG_DEAD r130:SI > REG_EQUAL cmp(0xffffffffffffffc0,0) > 57: r151:DI={(cc:CC>=0)?0:r126:DI} > REG_DEAD cc:CC > Successfully matched this instruction: > (set (reg:DI 151 [ _6 ]) > (reg:DI 126)) > allowing combination of insns 52 and 57 > original costs 4 + 4 = 8 > replacement cost 4 > deferring deletion of insn with uid = 52. > modifying insn i3 57: r151:DI=r126:DI > deferring rescan insn with uid = 57. > where the IL changes: > @@ -29,27 +29,18 @@ > (insn 41 40 43 2 (set (reg:DI 132) > (ashift:DI (reg:DI 126) > (subreg:QI (reg:SI 130) 0))) "pr108803.c":12:25 781 {*aarch64_ashl_sisd_or_int_di3} > - (expr_list:REG_EQUAL (ashift:DI (reg:DI 126) > - (const_int -64 [0xffffffffffffffc0])) > - (nil))) > + (expr_list:REG_DEAD (reg:SI 130) > + (expr_list:REG_EQUAL (ashift:DI (reg:DI 126) > + (const_int -64 [0xffffffffffffffc0])) > + (nil)))) > (note 43 41 46 2 NOTE_INSN_DELETED) > (note 46 43 48 2 NOTE_INSN_DELETED) > (note 48 46 52 2 NOTE_INSN_DELETED) > -(insn 52 48 53 2 (set (reg:CC 66 cc) > - (compare:CC (reg:SI 130) > - (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi} > - (expr_list:REG_DEAD (reg:SI 130) > - (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0]) > - (const_int 0 [0])) > - (nil)))) > +(note 52 48 53 2 NOTE_INSN_DELETED) > (note 53 52 57 2 NOTE_INSN_DELETED) > (insn 57 53 59 2 (set (reg:DI 151 [ _6 ]) > - (if_then_else:DI (ge (reg:CC 66 cc) > - (const_int 0 [0])) > - (const_int 0 [0]) > - (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn} > - (expr_list:REG_DEAD (reg:CC 66 cc) > - (nil))) > + (reg:DI 126)) "pr108803.c":12:25 65 {*movdi_aarch64} > + (nil)) > (note 59 57 60 2 NOTE_INSN_DELETED) > (insn 60 59 61 2 (set (reg:DI 139) > (const_int 0 [0])) "pr108803.c":12:42 65 {*movdi_aarch64} > This removes the cc setter that was used by insn 57 (which has been > updated) but is also needed for the former insn 53 which has been > merged into insn 78. > > Jakub