Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jakub Jelinek <jakub@redhat.com>
To: Richard Biener <rguenther@suse.de>,
	gcc-patches@gcc.gnu.org, richard.sandiford@arm.com,
	Segher Boessenkool <segher@kernel.crashing.org>
Subject: Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]
Date: Mon, 27 Feb 2023 20:11:09 +0100	[thread overview]
Message-ID: <Y/0ATZRIRTt76W7B@tucnak> (raw)
In-Reply-To: <mpth6v7ruks.fsf@arm.com>

On Mon, Feb 27, 2023 at 03:34:11PM +0000, Richard Sandiford wrote:
> > The following testcase is miscompiled on aarch64.  The problem is that
> > aarch64 with TARGET_SIMD is !SHIFT_COUNT_TRUNCATED target with
> > targetm.shift_truncation_mask (DImode) == 0 which has HAVE_conditional_move
> > true.  If a doubleword shift (in this case TImode) doesn't have its own
> > expander (as is the case of e.g. x86) and is handled in generic code,
> > there are 3 possible expansions.  One is when the shift count is constant,
> > the code computes in that case superword_op1 as op1 - BITS_PER_WORD,
> > and chooses at compile time which of expand_superword_shift or
> > expand_subword_shift to call, which ensures that whatever is used
> > actually has its shift count (superword_op1 or op1) in [0, BITS_PER_WORD - 1]
> > range.  If !HAVE_conditional_move or that expansion fails, the function
> > handles non-constant op1 similarly (there are some special cases for
> > shift_mask >= BITS_PER_WORD - 1 but let's talk here just about
> > shift_mask < BITS_PER_WORD - 1), except that the selection is done at
> > runtime, with branches around the stuff.  While superword_op1 can be
> > [-BITS_PER_WORD, BITS_PER_WORD - 1] and op1 [0, 2 * BITS_PER_WORD - 1],
> > the runtime selection ensures that the instructions executed at runtime
> > have their corresponding shift count again in the right range of
> > [0, BITS_PER_WORD - 1].
> > The problematic is the HAVE_conditional_move case, which emits both
> > sequences into the actually executed code, so necessarily one of them
> > has an out of range shift count and then using 2 conditional moves
> > picks up a result.
> > Now, in the testcase because -Og doesn't perform VRP/EVRP the shift
> > count isn't optimized to constant during GIMPLE passes, but is determined
> > to be constant during/after expansion into RTL.  The shift count is
> > actually const0_rtx later, so superword_op1 is (const_int -64) and we end
> > up with miscompilation during combine because of that.
> 
> I haven't worked through the testcase yet, but how does that happen?
> Having shifts with out-of-range shift counts shouldn't be a problem
> in itself, provided that the conditional move selects the one with
> the in-range shift.

It depends on if we (hw or the compiler) treats out-of-range shifts in RTL
as undefined behavior or just some kind of constrained unspecified behavior
(destination register can be anything, but no traps/exceptions/program
termination etc.).  Of course SHIFT_COUNT_TRUNCATED makes it well defined,
but that is not the case here.
I've always thoughts we really treat it as undefined behavior, you shouldn't
reach this spot at runtime; we certainly treat it that way in GIMPLE.
If that is the case, then invoking UB and then having a conditional move
not to select its result is not well defined.
And the patch as posted fixes that problem while not really resulting in
worse code e.g. on aarch64.

Anyway, back to the particular testcase rather than the general idea,
it goes wrong during combine, CCing Segher.  I've added debug_bb_n after
every successful combine attempt,
--- gcc/combine.cc.jj	2023-01-02 09:32:43.720977011 +0100
+++ gcc/combine.cc	2023-02-27 19:27:28.151289055 +0100
@@ -4755,6 +4755,16 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
   if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID (ret))
     ret = added_notes_insn;
 
+{
+static int cnt = 0;
+char buf[64];
+sprintf (buf, "/tmp/combine.%d", cnt++);
+FILE *f = fopen (buf, "a");
+fprintf (f, "%d\n", combine_successes);
+basic_block bb = BASIC_BLOCK_FOR_FN (cfun, 2);
+dump_bb (f, bb, 0, dump_flags_t ());
+fclose (f);
+}
   return ret;
 }
 
and I see
(insn 52 48 53 2 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 130)
            (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi}
     (expr_list:REG_DEAD (reg:SI 130)
        (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0])
                (const_int 0 [0]))
            (nil))))
(insn 53 52 57 2 (set (reg:DI 152 [ _6+8 ])
        (if_then_else:DI (ge (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:DI 132)
            (const_int 0 [0]))) "pr108803.c":12:25 490 {*cmovdi_insn}
     (expr_list:REG_DEAD (reg:DI 132)
        (nil)))
(insn 57 53 59 2 (set (reg:DI 151 [ _6 ])
        (if_then_else:DI (ge (reg:CC 66 cc)
                (const_int 0 [0]))
            (const_int 0 [0])
            (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn}
     (expr_list:REG_DEAD (reg:CC 66 cc)
        (nil)))
...
(insn 71 68 72 2 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 137)
            (const_int 0 [0]))) "pr108803.c":12:42 437 {cmpsi}
     (expr_list:REG_DEAD (reg:SI 137)
        (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0])
                (const_int 0 [0]))
            (nil))))
(insn 72 71 76 2 (set (reg:DI 153 [ _8 ])
        (if_then_else:DI (ge (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:DI 139)
            (reg:DI 153 [ _8 ]))) "pr108803.c":12:42 490 {*cmovdi_insn}
     (expr_list:REG_DEAD (reg:DI 139)
        (nil)))
(insn 76 72 77 2 (set (reg:DI 154 [ _8+8 ])
        (if_then_else:DI (ge (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:DI 138)
            (reg:DI 127))) "pr108803.c":12:42 490 {*cmovdi_insn}
     (expr_list:REG_DEAD (reg:DI 138)
        (expr_list:REG_DEAD (reg:DI 127)
            (expr_list:REG_DEAD (reg:CC 66 cc)
                (nil)))))
(insn 77 76 78 2 (set (reg:DI 159 [ b ])
        (ior:DI (reg:DI 151 [ _6 ])
            (reg:DI 126))) "pr108803.c":12:12 537 {iordi3}
     (expr_list:REG_DEAD (reg:DI 126)
        (expr_list:REG_DEAD (reg:DI 151 [ _6 ])
            (nil))))
(insn 78 77 80 2 (set (reg:DI 160 [ b+8 ])
        (reg:DI 152 [ _6+8 ])) "pr108803.c":12:12 65 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 152 [ _6+8 ])
        (nil)))
in one of the dumps which still looks correct from brief skimming,
r130 here is k - 64 aka -64 and it is the pair of conditional moves of the
first TImode shift, effectively (a + 63) << 0 but 0 is obfuscated at
expansion time, while the second pair of conditional moves are for the
second TImode shift, effectively (a + 63) >> (0 & 63) and then both ored
together.
When doing
Trying 53 -> 78:
   53: r152:DI={(cc:CC>=0)?r132:DI:0}
      REG_DEAD r132:DI
   78: r160:DI=r152:DI
      REG_DEAD r152:DI
Successfully matched this instruction:
(set (reg:DI 160 [ b+8 ])
    (if_then_else:DI (ge (reg:CC 66 cc)
            (const_int 0 [0]))
        (reg:DI 132)
        (const_int 0 [0])))
allowing combination of insns 53 and 78
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 53.
modifying insn i3    78: r160:DI={(cc:CC>=0)?r132:DI:0}
      REG_DEAD cc:CC
      REG_DEAD r132:DI
deferring rescan insn with uid = 78.
combination, the IL changes:
@@ -38,21 +38,15 @@
 (insn 52 48 53 2 (set (reg:CC 66 cc)
         (compare:CC (reg:SI 130)
             (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi}
      (expr_list:REG_DEAD (reg:SI 130)
         (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0])
                 (const_int 0 [0]))
             (nil))))
-(insn 53 52 57 2 (set (reg:DI 152 [ _6+8 ])
-        (if_then_else:DI (ge (reg:CC 66 cc)
-                (const_int 0 [0]))
-            (reg:DI 132)
-            (const_int 0 [0]))) "pr108803.c":12:25 490 {*cmovdi_insn}
-     (expr_list:REG_DEAD (reg:DI 132)
-        (nil)))
+(note 53 52 57 2 NOTE_INSN_DELETED)
 (insn 57 53 59 2 (set (reg:DI 151 [ _6 ])
         (if_then_else:DI (ge (reg:CC 66 cc)
                 (const_int 0 [0]))
             (const_int 0 [0])
             (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn}
      (expr_list:REG_DEAD (reg:CC 66 cc)
         (nil)))
@@ -72,17 +66,21 @@
 (insn 77 76 78 2 (set (reg:DI 159 [ b ])
         (ior:DI (reg:DI 151 [ _6 ])
             (reg:DI 126))) "pr108803.c":12:12 537 {iordi3}
      (expr_list:REG_DEAD (reg:DI 126)
         (expr_list:REG_DEAD (reg:DI 151 [ _6 ])
             (nil))))
 (insn 78 77 80 2 (set (reg:DI 160 [ b+8 ])
-        (reg:DI 152 [ _6+8 ])) "pr108803.c":12:12 65 {*movdi_aarch64}
-     (expr_list:REG_DEAD (reg:DI 152 [ _6+8 ])
-        (nil)))
+        (if_then_else:DI (ge (reg:CC 66 cc)
+                (const_int 0 [0]))
+            (reg:DI 132)
+            (const_int 0 [0]))) "pr108803.c":12:12 490 {*cmovdi_insn}
+     (expr_list:REG_DEAD (reg:CC 66 cc)
+        (expr_list:REG_DEAD (reg:DI 132)
+            (nil))))
 (insn 80 78 82 2 (parallel [
             (set (reg:CC_C 66 cc)
                 (compare:CC_C (plus:DI (reg:DI 155 [ a ])
                         (reg:DI 159 [ b ]))
                     (reg:DI 155 [ a ])))
             (set (reg:DI 144)
                 (plus:DI (reg:DI 155 [ a ])
but as you can see, because cc reg has been REG_DEAD before on insn 57
rather than on insn 53, nothing really removed REG_DEAD note from there
and just adds it on insn 78 (note, besides this REG_DEAD issue the
IL is otherwise still sane, the previous cc setter 71 and its previous
uses 72 and 76 in between the move have been optimized away already in
an earlier successful combination).
And things go wild with the next successful combination:
Trying 52 -> 57:
   52: cc:CC=cmp(0xffffffffffffffc0,0)
      REG_DEAD r130:SI
      REG_EQUAL cmp(0xffffffffffffffc0,0)
   57: r151:DI={(cc:CC>=0)?0:r126:DI}
      REG_DEAD cc:CC
Successfully matched this instruction:
(set (reg:DI 151 [ _6 ])
    (reg:DI 126))
allowing combination of insns 52 and 57
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 52.
modifying insn i3    57: r151:DI=r126:DI
deferring rescan insn with uid = 57.
where the IL changes:
@@ -29,27 +29,18 @@
 (insn 41 40 43 2 (set (reg:DI 132)
         (ashift:DI (reg:DI 126)
             (subreg:QI (reg:SI 130) 0))) "pr108803.c":12:25 781 {*aarch64_ashl_sisd_or_int_di3}
-     (expr_list:REG_EQUAL (ashift:DI (reg:DI 126)
-            (const_int -64 [0xffffffffffffffc0]))
-        (nil)))
+     (expr_list:REG_DEAD (reg:SI 130)
+        (expr_list:REG_EQUAL (ashift:DI (reg:DI 126)
+                (const_int -64 [0xffffffffffffffc0]))
+            (nil))))
 (note 43 41 46 2 NOTE_INSN_DELETED)
 (note 46 43 48 2 NOTE_INSN_DELETED)
 (note 48 46 52 2 NOTE_INSN_DELETED)
-(insn 52 48 53 2 (set (reg:CC 66 cc)
-        (compare:CC (reg:SI 130)
-            (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi}
-     (expr_list:REG_DEAD (reg:SI 130)
-        (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0])
-                (const_int 0 [0]))
-            (nil))))
+(note 52 48 53 2 NOTE_INSN_DELETED)
 (note 53 52 57 2 NOTE_INSN_DELETED)
 (insn 57 53 59 2 (set (reg:DI 151 [ _6 ])
-        (if_then_else:DI (ge (reg:CC 66 cc)
-                (const_int 0 [0]))
-            (const_int 0 [0])
-            (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn}
-     (expr_list:REG_DEAD (reg:CC 66 cc)
-        (nil)))
+        (reg:DI 126)) "pr108803.c":12:25 65 {*movdi_aarch64}
+     (nil))
 (note 59 57 60 2 NOTE_INSN_DELETED)
 (insn 60 59 61 2 (set (reg:DI 139)
         (const_int 0 [0])) "pr108803.c":12:42 65 {*movdi_aarch64}
This removes the cc setter that was used by insn 57 (which has been
updated) but is also needed for the former insn 53 which has been
merged into insn 78.

	Jakub

next prev parent reply	other threads:[~2023-02-27 19:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-17 10:14 Jakub Jelinek
2023-02-27 15:34 ` Richard Sandiford
2023-02-27 19:11   ` Jakub Jelinek [this message]
2023-02-27 19:51     ` Richard Sandiford
2023-02-27 20:02       ` Jakub Jelinek
2023-02-27 20:43         ` Richard Sandiford
2023-02-27 20:54           ` Jakub Jelinek
2023-02-27 21:01             ` Richard Sandiford
2023-02-27 21:15               ` Jakub Jelinek
2023-02-27 22:15             ` Segher Boessenkool
2023-02-27 22:21     ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y/0ATZRIRTt76W7B@tucnak \
    --to=jakub@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    --cc=richard.sandiford@arm.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).