[Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
@ 2023-02-15 14:06 zsojka at seznam dot cz
  2023-02-16  8:00 ` [Bug target/108803] " rguenth at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: zsojka at seznam dot cz @ 2023-02-15 14:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

            Bug ID: 108803
           Summary: [10/11/12/13 Regression] wrong code for 128bit rotate
                    on aarch64-unknown-linux-gnu with -Og
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zsojka at seznam dot cz
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu
            Target: aarch64-unknown-linux-gnu

Created attachment 54468
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54468&action=edit
reduced testcase

Output:
$ aarch64-unknown-linux-gnu-gcc -Og testcase.c -static
$ qemu-aarch64 -- ./a.out 
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted

The value is "63 << 64 | 63" instead of just "63".

$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r13-5999-20230215101300-g3f71b82596e-checking-yes-rtl-df-extra-nobootstrap-aarch64/bin/../libexec/gcc/aarch64-unknown-linux-gnu/13.0.1/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--disable-bootstrap --with-cloog --with-ppl --with-isl
--with-sysroot=/usr/aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --target=aarch64-unknown-linux-gnu
--with-ld=/usr/bin/aarch64-unknown-linux-gnu-ld
--with-as=/usr/bin/aarch64-unknown-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r13-5999-20230215101300-g3f71b82596e-checking-yes-rtl-df-extra-nobootstrap-aarch64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.1 20230215 (experimental) (GCC)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
  2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
@ 2023-02-16  8:00 ` rguenth at gcc dot gnu.org
  2023-02-16 16:34 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-02-16  8:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
   Target Milestone|---                         |10.5

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
  2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
  2023-02-16  8:00 ` [Bug target/108803] " rguenth at gcc dot gnu.org
@ 2023-02-16 16:34 ` jakub at gcc dot gnu.org
  2023-02-16 17:50 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-16 16:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I'd say this is a bug in expand_doubleword_shift_condmove or so.
aarch64 when !TARGET_SIMD is a !SHIFT_COUNT_TRUNCATED target and shift_mask is
0.
And HAVE_conditional_move is non-zero.
Now, if the shift count is constant at expansion time, we just select one of
the
expand_superword_shift or expand_subword_shift depending on the exact value and
the shift count ought to be in both cases in the [0, BITS_PER_WORD - 1] range.
Similarly, if !HAVE_conditional_move and shift count is non-constant, we do the
same except that we select one at runtime, so again at runtime the chosen shift
count should be [0, BITS_PER_WORD - 1].
But for expand_doubleword_shift_condmove with shift_mask 0, op1 is [0, 2 *
BITS_PER_WORD - 1] and we pass op1, superword_op1 where the latter is op1 -
BITS_PER_WORD.
So, in expand_doubleword_shift_condmove subword_op1 is in [0, 2 * BITS_PER_WORD
- 1] range and superword_op1 is in [-BITS_PER_WORD, BITS_PER_WORD - 1] range. 
And the
routine just emits expand_superword_shift and expand_subword_shift and selects
using conditional move one of those.  But that means one of the two shifts is
necessarily with out of range count, either subword_op1 is [BITS_PER_WORD, 2 *
BITS_PER_WORD - 1] i.e. too large, or superword_op1 is in [-BITS_PER_WORD, -1]
range (i.e. negative).
Don't we need to mask those counts in that case (both)?
Now, in the testcase __builtin_add_overflow_p is actually evaluated to constant
0 only during the expansion (or later?) - in this particular case I wonder why
we haven't optimized it earlier because for any unsigned addends the addition
is in [0, 2 * UINT_MAX - 2] range and so fits well into signed __int128 range,
something to be looked at for GCC 14.
But I fear it is exactly the only during RTL discovered constant that we later
on propagate into the op1 - BITS_PER_WORD and thus do one of the shifts with
count -64.

CCing Richard as the author of that code from 2004.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
  2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
  2023-02-16  8:00 ` [Bug target/108803] " rguenth at gcc dot gnu.org
  2023-02-16 16:34 ` jakub at gcc dot gnu.org
@ 2023-02-16 17:50 ` jakub at gcc dot gnu.org
  2023-02-16 17:50 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-16 17:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
--- gcc/optabs.cc.jj    2023-01-02 09:32:53.309838465 +0100
+++ gcc/optabs.cc       2023-02-16 18:04:54.794871019 +0100
@@ -596,6 +596,16 @@ expand_doubleword_shift_condmove (scalar
 {
   rtx outof_superword, into_superword;

+  if (shift_mask < BITS_PER_WORD - 1)
+    {
+      rtx tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1,
+                                               GET_MODE (superword_op1)),
+                                     GET_MODE (superword_op1));
+      superword_op1
+       = simplify_expand_binop (op1_mode, and_optab, superword_op1, tmp,
+                                0, true, methods);
+    }
+
   /* Put the superword version of the output into OUTOF_SUPERWORD and
      INTO_SUPERWORD.  */
   outof_superword = outof_target != 0 ? gen_reg_rtx (word_mode) : 0;
@@ -617,6 +627,16 @@ expand_doubleword_shift_condmove (scalar
        return false;
     }

+  if (shift_mask < BITS_PER_WORD - 1)
+    {
+      rtx tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1,
+                                               GET_MODE (subword_op1)),
+                                     GET_MODE (subword_op1));
+      subword_op1
+       = simplify_expand_binop (op1_mode, and_optab, subword_op1, tmp,
+                                0, true, methods);
+    }
+
   /* Put the subword version directly in OUTOF_TARGET and INTO_TARGET.  */
   if (!expand_subword_shift (op1_mode, binoptab,
                             outof_input, into_input, subword_op1,
indeed fixes the miscompilation, but unfortunately with e.g.
__attribute__((noipa)) __int128
foo (__int128 a, unsigned k)
{
  return a << k;
}

__attribute__((noipa)) __int128
bar (__int128 a, unsigned k)
{
  return a >> k;
}
results in one extra insn in each of the functions.  While the superword_op1
case
is fine because aarch64 (among other arches) has a pattern to catch shift with
masked count, in the subword_op1 case that doesn't work, because
expand_subword_shift actually emits 3 shifts instead of just one, one with
(BIT_PER_WORD - 1) - op1 as shift count
and two with op1.  If the op1 &= (BITS_PER_WORD - 1) masking is done in the
caller, then
it can't be easily merged with the shifts.
We could do that also separately in expand_subword_shift under some new bool
and in that
case instead of using op1 &= (BITS_PER_WORD - 1); shift1 by ((BITS_PER_WORD -
1) - op1); shift2 by op1; shift3 by op1 use tmp = (63 - op1) & (BITS_PER_WORD -
1); shift1 by tmp; op1 &= (BITS_PER_WORD - 1); shift2 by op1; shift3 by op1,
but that would be larger code if the target doesn't have those shift with
masking patterns that trigger on it.  Perhaps have some target hook?  Or try to
recog the combined instruction?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
  2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
                   ` (2 preceding siblings ...)
  2023-02-16 17:50 ` jakub at gcc dot gnu.org
@ 2023-02-16 17:50 ` jakub at gcc dot gnu.org
  2023-02-16 18:00 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-16 17:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-02-16
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
  2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
                   ` (3 preceding siblings ...)
  2023-02-16 17:50 ` jakub at gcc dot gnu.org
@ 2023-02-16 18:00 ` jakub at gcc dot gnu.org
  2023-02-16 18:14 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-16 18:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I take back the "I wonder why we haven't optimized it earlier", the reason is
-Og, we do optimize that in evrp/vrp*, but those aren't done with -Og.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
  2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
                   ` (4 preceding siblings ...)
  2023-02-16 18:00 ` jakub at gcc dot gnu.org
@ 2023-02-16 18:14 ` jakub at gcc dot gnu.org
  2023-02-16 18:44 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-16 18:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
On the other side, if we knew that the backend would use something like the
shifts with masking, we could then avoid the extra reverse unsigned shift by 1
+ reverse unsigned shift by (63 - op1) & 63 plus two shifts by op1 & 63 and
could do instead a single shift by -op1 & 63 (plus as before two shifts by op1
& 63).
So replace the current problematic code for foo in #c2 with:
        subs    w5, w2, #64
        lsl     x6, x0, x5
-       lsr     x3, x0, 1
-       mov     w4, 63
-       sub     w4, w4, w2
-       lsr     x3, x3, x4
+       neg     w4, w2
+       lsr     x3, x0, x4
        lsl     x1, x1, x2
        orr     x1, x3, x1
        lsl     x0, x0, x2
        csel    x0, xzr, x0, pl
        csel    x1, x6, x1, pl
        ret

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
  2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
                   ` (5 preceding siblings ...)
  2023-02-16 18:14 ` jakub at gcc dot gnu.org
@ 2023-02-16 18:44 ` jakub at gcc dot gnu.org
  2023-02-16 19:59 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-16 18:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The change then would be
--- gcc/optabs.cc.jj    2023-01-02 09:32:53.309838465 +0100
+++ gcc/optabs.cc       2023-02-16 19:33:14.583883584 +0100
@@ -507,7 +507,7 @@ expand_subword_shift (scalar_int_mode op
                      rtx outof_input, rtx into_input, rtx op1,
                      rtx outof_target, rtx into_target,
                      int unsignedp, enum optab_methods methods,
-                     unsigned HOST_WIDE_INT shift_mask)
+                     unsigned HOST_WIDE_INT shift_mask, bool mask_count)
 {
   optab reverse_unsigned_shift, unsigned_shift;
   rtx tmp, carries;
@@ -526,6 +526,23 @@ expand_subword_shift (scalar_int_mode op
       tmp = simplify_expand_binop (op1_mode, sub_optab, tmp, op1,
                                   0, true, methods);
     }
+  else if (mask_count)
+    {
+      /* When called from expand_doubleword_shift_condmove with shift_mask 0,
+        we need to mask the shift count (and on some targets have that later
+        be combined with shifts into a single instruction).  In that case
+        we can avoid the separate shift by 1 and another by
+        (BITS_PER_WORD - 1) - op1 and can just do one shift by
+        -op1 & (BITS_PER_WORD - 1).  */
+      carries = outof_input;
+      tmp = expand_unop (op1_mode, neg_optab, op1, 0, false);
+      rtx tmp2 = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1,
+                                      op1_mode), op1_mode);
+      tmp = simplify_expand_binop (op1_mode, and_optab, tmp, tmp2, 0, true,
+                                  methods);
+      op1 = simplify_expand_binop (op1_mode, and_optab, op1, tmp2, 0, true,
+                                  methods);
+    }
   else
     {
       /* We must avoid shifting by BITS_PER_WORD bits since that is either
@@ -596,6 +613,15 @@ expand_doubleword_shift_condmove (scalar
 {
   rtx outof_superword, into_superword;

+  if (shift_mask < BITS_PER_WORD - 1)
+    {
+      rtx tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1, op1_mode),
+                                     op1_mode);
+      superword_op1
+       = simplify_expand_binop (op1_mode, and_optab, superword_op1, tmp,
+                                0, true, methods);
+    }
+
   /* Put the superword version of the output into OUTOF_SUPERWORD and
      INTO_SUPERWORD.  */
   outof_superword = outof_target != 0 ? gen_reg_rtx (word_mode) : 0;
@@ -621,7 +647,8 @@ expand_doubleword_shift_condmove (scalar
   if (!expand_subword_shift (op1_mode, binoptab,
                             outof_input, into_input, subword_op1,
                             outof_target, into_target,
-                            unsignedp, methods, shift_mask))
+                            unsignedp, methods, shift_mask,
+                            shift_mask < BITS_PER_WORD - 1))
     return false;

   /* Select between them.  Do the INTO half first because INTO_SUPERWORD
@@ -742,7 +769,7 @@ expand_doubleword_shift (scalar_int_mode
        return expand_subword_shift (op1_mode, binoptab,
                                     outof_input, into_input, op1,
                                     outof_target, into_target,
-                                    unsignedp, methods, shift_mask);
+                                    unsignedp, methods, shift_mask, false);
     }

   /* Try using conditional moves to generate straight-line code.  */
@@ -781,7 +808,7 @@ expand_doubleword_shift (scalar_int_mode
   if (!expand_subword_shift (op1_mode, binoptab,
                             outof_input, into_input, op1,
                             outof_target, into_target,
-                            unsignedp, methods, shift_mask))
+                            unsignedp, methods, shift_mask, false))
     return false;

   emit_label (done_label);

or so and emits one fewer instruction for foo and bar as before.  But somehow
the #c0 testcase with it aborts again, so something is not right...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
  2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
                   ` (6 preceding siblings ...)
  2023-02-16 18:44 ` jakub at gcc dot gnu.org
@ 2023-02-16 19:59 ` jakub at gcc dot gnu.org
  2023-02-27 19:24 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-16 19:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 54476
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54476&action=edit
gcc13-pr108803.patch

Actually, the above patch isn't correct because for op1 equal to 0 we really
need the reverse_unsigned_shift to give 0 aka (outof_input >> 1) >> 63 rather
than outof_input >> 0 aka outof_input.
Anyway, with this patch we get same number of instructions as before on the #c2
functions, the and unfortunately isn't optimized away because it has 2 uses
rather than just one, but on the other side 63 - count needs 2 instructions
while ~count only one.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
  2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
                   ` (7 preceding siblings ...)
  2023-02-16 19:59 ` jakub at gcc dot gnu.org
@ 2023-02-27 19:24 ` jakub at gcc dot gnu.org
  2023-07-07 10:44 ` [Bug target/108803] [11/12/13/14 " rguenth at gcc dot gnu.org
  2024-02-21  6:24 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-02-27 19:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Besides this question whether it is ok to emit possibly out-of-bounds shifts
into code executed at runtime where source code didn't have any UB, provided
that their destination is only used in a conditional move that will not pick
that result up, I think we have a combiner bug here.

Copying what I wrote in
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612909.html:
I've added debug_bb_n after
every successful combine attempt,
--- gcc/combine.cc.jj   2023-01-02 09:32:43.720977011 +0100
+++ gcc/combine.cc      2023-02-27 19:27:28.151289055 +0100
@@ -4755,6 +4755,16 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
   if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID
(ret))
     ret = added_notes_insn;

+{
+static int cnt = 0;
+char buf[64];
+sprintf (buf, "/tmp/combine.%d", cnt++);
+FILE *f = fopen (buf, "a");
+fprintf (f, "%d\n", combine_successes);
+basic_block bb = BASIC_BLOCK_FOR_FN (cfun, 2);
+dump_bb (f, bb, 0, dump_flags_t ());
+fclose (f);
+}
   return ret;
 }

and I see
(insn 52 48 53 2 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 130)
            (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi}
     (expr_list:REG_DEAD (reg:SI 130)
        (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0])
                (const_int 0 [0]))
            (nil))))
(insn 53 52 57 2 (set (reg:DI 152 [ _6+8 ])
        (if_then_else:DI (ge (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:DI 132)
            (const_int 0 [0]))) "pr108803.c":12:25 490 {*cmovdi_insn}
     (expr_list:REG_DEAD (reg:DI 132)
        (nil)))
(insn 57 53 59 2 (set (reg:DI 151 [ _6 ])
        (if_then_else:DI (ge (reg:CC 66 cc)
                (const_int 0 [0]))
            (const_int 0 [0])
            (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn}
     (expr_list:REG_DEAD (reg:CC 66 cc)
        (nil)))
...
(insn 71 68 72 2 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 137)
            (const_int 0 [0]))) "pr108803.c":12:42 437 {cmpsi}
     (expr_list:REG_DEAD (reg:SI 137)
        (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0])
                (const_int 0 [0]))
            (nil))))
(insn 72 71 76 2 (set (reg:DI 153 [ _8 ])
        (if_then_else:DI (ge (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:DI 139)
            (reg:DI 153 [ _8 ]))) "pr108803.c":12:42 490 {*cmovdi_insn}
     (expr_list:REG_DEAD (reg:DI 139)
        (nil)))
(insn 76 72 77 2 (set (reg:DI 154 [ _8+8 ])
        (if_then_else:DI (ge (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:DI 138)
            (reg:DI 127))) "pr108803.c":12:42 490 {*cmovdi_insn}
     (expr_list:REG_DEAD (reg:DI 138)
        (expr_list:REG_DEAD (reg:DI 127)
            (expr_list:REG_DEAD (reg:CC 66 cc)
                (nil)))))
(insn 77 76 78 2 (set (reg:DI 159 [ b ])
        (ior:DI (reg:DI 151 [ _6 ])
            (reg:DI 126))) "pr108803.c":12:12 537 {iordi3}
     (expr_list:REG_DEAD (reg:DI 126)
        (expr_list:REG_DEAD (reg:DI 151 [ _6 ])
            (nil))))
(insn 78 77 80 2 (set (reg:DI 160 [ b+8 ])
        (reg:DI 152 [ _6+8 ])) "pr108803.c":12:12 65 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 152 [ _6+8 ])
        (nil)))
in one of the dumps which still looks correct from brief skimming,
r130 here is k - 64 aka -64 and it is the pair of conditional moves of the
first TImode shift, effectively (a + 63) << 0 but 0 is obfuscated at
expansion time, while the second pair of conditional moves are for the
second TImode shift, effectively (a + 63) >> (0 & 63) and then both ored
together.  It makes sense that the second set of conditional moves are
optimized
earlier because the shift count in that case is more constrained, it is masked
with 63,
so one can see (0 & 63) - 64 < 0, while r130 comparison isn't optimized so
early
because r130 has 2 users and one of them couldn't be merged.
Note, this is the exact spot why I think we want the other patch, we can't
merge
(insn 40 37 41 2 (set (reg:SI 130)
        (const_int -64 [0xffffffffffffffc0])) "pr108803.c":12:25 64
{*movsi_aarch64}
     (nil))
(insn 41 40 43 2 (set (reg:DI 132)
        (ashift:DI (reg:DI 126)
            (subreg:QI (reg:SI 130) 0))) "pr108803.c":12:25 781
{*aarch64_ashl_sisd_or_int_di3}
     (expr_list:REG_EQUAL (ashift:DI (reg:DI 126)
            (const_int -64 [0xffffffffffffffc0]))
        (nil)))
because the backend doesn't like out of bounds shift count immediates in the
instruction
directly.
When doing
Trying 53 -> 78:
   53: r152:DI={(cc:CC>=0)?r132:DI:0}
      REG_DEAD r132:DI
   78: r160:DI=r152:DI
      REG_DEAD r152:DI
Successfully matched this instruction:
(set (reg:DI 160 [ b+8 ])
    (if_then_else:DI (ge (reg:CC 66 cc)
            (const_int 0 [0]))
        (reg:DI 132)
        (const_int 0 [0])))
allowing combination of insns 53 and 78
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 53.
modifying insn i3    78: r160:DI={(cc:CC>=0)?r132:DI:0}
      REG_DEAD cc:CC
      REG_DEAD r132:DI
deferring rescan insn with uid = 78.
combination, the IL changes:
@@ -38,21 +38,15 @@
 (insn 52 48 53 2 (set (reg:CC 66 cc)
         (compare:CC (reg:SI 130)
             (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi}
      (expr_list:REG_DEAD (reg:SI 130)
         (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0])
                 (const_int 0 [0]))
             (nil))))
-(insn 53 52 57 2 (set (reg:DI 152 [ _6+8 ])
-        (if_then_else:DI (ge (reg:CC 66 cc)
-                (const_int 0 [0]))
-            (reg:DI 132)
-            (const_int 0 [0]))) "pr108803.c":12:25 490 {*cmovdi_insn}
-     (expr_list:REG_DEAD (reg:DI 132)
-        (nil)))
+(note 53 52 57 2 NOTE_INSN_DELETED)
 (insn 57 53 59 2 (set (reg:DI 151 [ _6 ])
         (if_then_else:DI (ge (reg:CC 66 cc)
                 (const_int 0 [0]))
             (const_int 0 [0])
             (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn}
      (expr_list:REG_DEAD (reg:CC 66 cc)
         (nil)))
@@ -72,17 +66,21 @@
 (insn 77 76 78 2 (set (reg:DI 159 [ b ])
         (ior:DI (reg:DI 151 [ _6 ])
             (reg:DI 126))) "pr108803.c":12:12 537 {iordi3}
      (expr_list:REG_DEAD (reg:DI 126)
         (expr_list:REG_DEAD (reg:DI 151 [ _6 ])
             (nil))))
 (insn 78 77 80 2 (set (reg:DI 160 [ b+8 ])
-        (reg:DI 152 [ _6+8 ])) "pr108803.c":12:12 65 {*movdi_aarch64}
-     (expr_list:REG_DEAD (reg:DI 152 [ _6+8 ])
-        (nil)))
+        (if_then_else:DI (ge (reg:CC 66 cc)
+                (const_int 0 [0]))
+            (reg:DI 132)
+            (const_int 0 [0]))) "pr108803.c":12:12 490 {*cmovdi_insn}
+     (expr_list:REG_DEAD (reg:CC 66 cc)
+        (expr_list:REG_DEAD (reg:DI 132)
+            (nil))))
 (insn 80 78 82 2 (parallel [
             (set (reg:CC_C 66 cc)
                 (compare:CC_C (plus:DI (reg:DI 155 [ a ])
                         (reg:DI 159 [ b ]))
                     (reg:DI 155 [ a ])))
             (set (reg:DI 144)
                 (plus:DI (reg:DI 155 [ a ])
but as you can see, because cc reg has been REG_DEAD before on insn 57
rather than on insn 53, nothing really removed REG_DEAD note from there
and just adds it on insn 78 (note, besides this REG_DEAD issue the
IL is otherwise still sane, the previous cc setter 71 and its previous
uses 72 and 76 in between the move have been optimized away already in
an earlier successful combination).
And things go wild with the next successful combination:
Trying 52 -> 57:
   52: cc:CC=cmp(0xffffffffffffffc0,0)
      REG_DEAD r130:SI
      REG_EQUAL cmp(0xffffffffffffffc0,0)
   57: r151:DI={(cc:CC>=0)?0:r126:DI}
      REG_DEAD cc:CC
Successfully matched this instruction:
(set (reg:DI 151 [ _6 ])
    (reg:DI 126))
allowing combination of insns 52 and 57
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 52.
modifying insn i3    57: r151:DI=r126:DI
deferring rescan insn with uid = 57.
where the IL changes:
@@ -29,27 +29,18 @@
 (insn 41 40 43 2 (set (reg:DI 132)
         (ashift:DI (reg:DI 126)
             (subreg:QI (reg:SI 130) 0))) "pr108803.c":12:25 781
{*aarch64_ashl_sisd_or_int_di3}
-     (expr_list:REG_EQUAL (ashift:DI (reg:DI 126)
-            (const_int -64 [0xffffffffffffffc0]))
-        (nil)))
+     (expr_list:REG_DEAD (reg:SI 130)
+        (expr_list:REG_EQUAL (ashift:DI (reg:DI 126)
+                (const_int -64 [0xffffffffffffffc0]))
+            (nil))))
 (note 43 41 46 2 NOTE_INSN_DELETED)
 (note 46 43 48 2 NOTE_INSN_DELETED)
 (note 48 46 52 2 NOTE_INSN_DELETED)
-(insn 52 48 53 2 (set (reg:CC 66 cc)
-        (compare:CC (reg:SI 130)
-            (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi}
-     (expr_list:REG_DEAD (reg:SI 130)
-        (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffffffffffffffc0])
-                (const_int 0 [0]))
-            (nil))))
+(note 52 48 53 2 NOTE_INSN_DELETED)
 (note 53 52 57 2 NOTE_INSN_DELETED)
 (insn 57 53 59 2 (set (reg:DI 151 [ _6 ])
-        (if_then_else:DI (ge (reg:CC 66 cc)
-                (const_int 0 [0]))
-            (const_int 0 [0])
-            (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn}
-     (expr_list:REG_DEAD (reg:CC 66 cc)
-        (nil)))
+        (reg:DI 126)) "pr108803.c":12:25 65 {*movdi_aarch64}
+     (nil))
 (note 59 57 60 2 NOTE_INSN_DELETED)
 (insn 60 59 61 2 (set (reg:DI 139)
         (const_int 0 [0])) "pr108803.c":12:42 65 {*movdi_aarch64}
This removes the cc setter that was used by insn 57 (which has been
updated) but is also needed for the former insn 53 which has been
merged into insn 78.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/108803] [11/12/13/14 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
  2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
                   ` (8 preceding siblings ...)
  2023-02-27 19:24 ` jakub at gcc dot gnu.org
@ 2023-07-07 10:44 ` rguenth at gcc dot gnu.org
  2024-02-21  6:24 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |11.5

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 10 branch is being closed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/108803] [11/12/13/14 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
  2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
                   ` (9 preceding siblings ...)
  2023-07-07 10:44 ` [Bug target/108803] [11/12/13/14 " rguenth at gcc dot gnu.org
@ 2024-02-21  6:24 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-21  6:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This looks to be to fixed on the trunk. Maybe r14-4647-gaccccbf5ae4594 fixed
it?

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-02-21  6:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-15 14:06 [Bug target/108803] New: [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og zsojka at seznam dot cz
2023-02-16  8:00 ` [Bug target/108803] " rguenth at gcc dot gnu.org
2023-02-16 16:34 ` jakub at gcc dot gnu.org
2023-02-16 17:50 ` jakub at gcc dot gnu.org
2023-02-16 17:50 ` jakub at gcc dot gnu.org
2023-02-16 18:00 ` jakub at gcc dot gnu.org
2023-02-16 18:14 ` jakub at gcc dot gnu.org
2023-02-16 18:44 ` jakub at gcc dot gnu.org
2023-02-16 19:59 ` jakub at gcc dot gnu.org
2023-02-27 19:24 ` jakub at gcc dot gnu.org
2023-07-07 10:44 ` [Bug target/108803] [11/12/13/14 " rguenth at gcc dot gnu.org
2024-02-21  6:24 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).