public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
@ 2023-07-09  8:52 Uros Bizjak
  2023-07-10  9:16 ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2023-07-09  8:52 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1888 bytes --]

As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:

(gdb) list
469           if (code == SUBREG)
470             {
471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
old_rtx, fn, data);
472               if (op0 == SUBREG_REG (x))
473                 return x;
474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
475                                          GET_MODE (SUBREG_REG (x)),
476                                          SUBREG_BYTE (x));
477               return op0 ? op0 : x;
478             }

simplifies with following arguments:

(gdb) p debug_rtx (op0)
(const_vector:V4QI [
        (const_int -52 [0xffffffffffffffcc]) repeated x4
    ])
(gdb) p debug_rtx (x)
(subreg:V16QI (reg:V4QI 98) 0)

to:

(gdb) p debug_rtx (op0)
(const_vector:V16QI [
        (const_int -52 [0xffffffffffffffcc]) repeated x16
    ])

This simplification is invalid, it is not possible to get V16QImode vector
from V4QImode vector, even when all elements are duplicates.

The simplification happens in simplify_context::simplify_subreg:

(gdb) list
7558          if (VECTOR_MODE_P (outermode)
7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
7560              && vec_duplicate_p (op, &elt))
7561            return gen_vec_duplicate (outermode, elt);

but the above simplification is valid only for non-paradoxical registers,
where outermode <= innermode.  We should not assume that elements outside
the original register are valid, let alone all duplicates.

    PR target/110206

gcc/ChangeLog:

    * simplify-rtx.cc (simplify_context::simplify_subreg):
    Avoid returning a vector with duplicated value
    outside the original register.

gcc/testsuite/ChangeLog:

    * gcc.dg/torture/pr110206.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for master and release branches?

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 1381 bytes --]

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index d7315d82aa3..87ca25086dc 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7557,6 +7557,7 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op,
 
       if (VECTOR_MODE_P (outermode)
 	  && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
+	  && !paradoxical_subreg_p (outermode, innermode)
 	  && vec_duplicate_p (op, &elt))
 	return gen_vec_duplicate (outermode, elt);
 
diff --git a/gcc/testsuite/gcc.dg/torture/pr110206.c b/gcc/testsuite/gcc.dg/torture/pr110206.c
new file mode 100644
index 00000000000..3a4f221ef47
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110206.c
@@ -0,0 +1,30 @@
+/* PR target/110206 */
+/* { dg-do run { target x86_64-*-* i?86-*-* } } */
+
+typedef unsigned char __attribute__((__vector_size__ (4))) U;
+typedef unsigned char __attribute__((__vector_size__ (8))) V;
+typedef unsigned short u16;
+
+V g;
+
+void
+__attribute__((noinline))
+foo (U u, u16 c, V *r)
+{
+  if (!c)
+    __builtin_abort ();
+  V x = __builtin_shufflevector (u, (204 >> u), 7, 0, 5, 1, 3, 5, 0, 2);
+  V y = __builtin_shufflevector (g, (V) { }, 7, 6, 6, 7, 2, 6, 3, 5);
+  V z = __builtin_shufflevector (y, 204 * x, 3, 9, 8, 1, 4, 6, 14, 5);
+  *r = z;
+}
+
+int
+main (void)
+{
+  V r;
+  foo ((U){4}, 5, &r);
+  if (r[6] != 0x30)
+    __builtin_abort();
+  return 0;
+}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
  2023-07-09  8:52 [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206] Uros Bizjak
@ 2023-07-10  9:16 ` Richard Biener
  2023-07-10  9:26   ` Uros Bizjak
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2023-07-10  9:16 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
>
> (gdb) list
> 469           if (code == SUBREG)
> 470             {
> 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
> old_rtx, fn, data);
> 472               if (op0 == SUBREG_REG (x))
> 473                 return x;
> 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
> 475                                          GET_MODE (SUBREG_REG (x)),
> 476                                          SUBREG_BYTE (x));
> 477               return op0 ? op0 : x;
> 478             }
>
> simplifies with following arguments:
>
> (gdb) p debug_rtx (op0)
> (const_vector:V4QI [
>         (const_int -52 [0xffffffffffffffcc]) repeated x4
>     ])
> (gdb) p debug_rtx (x)
> (subreg:V16QI (reg:V4QI 98) 0)
>
> to:
>
> (gdb) p debug_rtx (op0)
> (const_vector:V16QI [
>         (const_int -52 [0xffffffffffffffcc]) repeated x16
>     ])
>
> This simplification is invalid, it is not possible to get V16QImode vector
> from V4QImode vector, even when all elements are duplicates.
>
> The simplification happens in simplify_context::simplify_subreg:
>
> (gdb) list
> 7558          if (VECTOR_MODE_P (outermode)
> 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
> 7560              && vec_duplicate_p (op, &elt))
> 7561            return gen_vec_duplicate (outermode, elt);
>
> but the above simplification is valid only for non-paradoxical registers,
> where outermode <= innermode.  We should not assume that elements outside
> the original register are valid, let alone all duplicates.

Hmm, but looking at the audit trail the x86 backend expects them to be zero?
Isn't that wrong as well?

That is, I think putting any random value into the upper lanes when
constant folding
a paradoxical subreg sounds OK to me, no?

Of course we might choose to not do such constant propagation for
efficiency reason - at least
when the resulting CONST_* would require a larger constant pool entry
or more costly
construction.

Thanks,
Richard.

>     PR target/110206
>
> gcc/ChangeLog:
>
>     * simplify-rtx.cc (simplify_context::simplify_subreg):
>     Avoid returning a vector with duplicated value
>     outside the original register.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.dg/torture/pr110206.c: New test.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> OK for master and release branches?
>
> Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
  2023-07-10  9:16 ` Richard Biener
@ 2023-07-10  9:26   ` Uros Bizjak
  2023-07-10  9:47     ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2023-07-10  9:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
> >
> > (gdb) list
> > 469           if (code == SUBREG)
> > 470             {
> > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
> > old_rtx, fn, data);
> > 472               if (op0 == SUBREG_REG (x))
> > 473                 return x;
> > 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
> > 475                                          GET_MODE (SUBREG_REG (x)),
> > 476                                          SUBREG_BYTE (x));
> > 477               return op0 ? op0 : x;
> > 478             }
> >
> > simplifies with following arguments:
> >
> > (gdb) p debug_rtx (op0)
> > (const_vector:V4QI [
> >         (const_int -52 [0xffffffffffffffcc]) repeated x4
> >     ])
> > (gdb) p debug_rtx (x)
> > (subreg:V16QI (reg:V4QI 98) 0)
> >
> > to:
> >
> > (gdb) p debug_rtx (op0)
> > (const_vector:V16QI [
> >         (const_int -52 [0xffffffffffffffcc]) repeated x16
> >     ])
> >
> > This simplification is invalid, it is not possible to get V16QImode vector
> > from V4QImode vector, even when all elements are duplicates.
> >
> > The simplification happens in simplify_context::simplify_subreg:
> >
> > (gdb) list
> > 7558          if (VECTOR_MODE_P (outermode)
> > 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
> > 7560              && vec_duplicate_p (op, &elt))
> > 7561            return gen_vec_duplicate (outermode, elt);
> >
> > but the above simplification is valid only for non-paradoxical registers,
> > where outermode <= innermode.  We should not assume that elements outside
> > the original register are valid, let alone all duplicates.
>
> Hmm, but looking at the audit trail the x86 backend expects them to be zero?
> Isn't that wrong as well?

If you mean Comment #10, it is just an observation that
simplify_replace_rtx simplifies arguments from Comment #9 to:

(gdb) p debug_rtx (src)
(const_vector:V8HI [
        (const_int 204 [0xcc]) repeated x4
        (const_int 0 [0]) repeated x4
    ])

instead of:

(gdb) p debug_rtx (src)
(const_vector:V8HI [
        (const_int 204 [0xcc]) repeated x8
    ])

which is in line with the statement below.
>
> That is, I think putting any random value into the upper lanes when
> constant folding
> a paradoxical subreg sounds OK to me, no?

The compiler is putting zero there as can be seen from the above new RTX.

> Of course we might choose to not do such constant propagation for
> efficiency reason - at least
> when the resulting CONST_* would require a larger constant pool entry
> or more costly
> construction.

This is probably a follow-up improvement, where this patch tries to
fix a specific invalid simplification of simplify_replace_rtx that is
invalid universally.

Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
  2023-07-10  9:26   ` Uros Bizjak
@ 2023-07-10  9:47     ` Richard Biener
  2023-07-10 11:01       ` Uros Bizjak
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2023-07-10  9:47 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
> > >
> > > (gdb) list
> > > 469           if (code == SUBREG)
> > > 470             {
> > > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
> > > old_rtx, fn, data);
> > > 472               if (op0 == SUBREG_REG (x))
> > > 473                 return x;
> > > 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
> > > 475                                          GET_MODE (SUBREG_REG (x)),
> > > 476                                          SUBREG_BYTE (x));
> > > 477               return op0 ? op0 : x;
> > > 478             }
> > >
> > > simplifies with following arguments:
> > >
> > > (gdb) p debug_rtx (op0)
> > > (const_vector:V4QI [
> > >         (const_int -52 [0xffffffffffffffcc]) repeated x4
> > >     ])
> > > (gdb) p debug_rtx (x)
> > > (subreg:V16QI (reg:V4QI 98) 0)
> > >
> > > to:
> > >
> > > (gdb) p debug_rtx (op0)
> > > (const_vector:V16QI [
> > >         (const_int -52 [0xffffffffffffffcc]) repeated x16
> > >     ])
> > >
> > > This simplification is invalid, it is not possible to get V16QImode vector
> > > from V4QImode vector, even when all elements are duplicates.

^^^

I think this simplification is valid.  A simplification to

(const_vector:V16QI [
         (const_int -52 [0xffffffffffffffcc]) repeated x4
         (const_int 0 [0]) repeated x12
     ])

would be valid as well.

> > > The simplification happens in simplify_context::simplify_subreg:
> > >
> > > (gdb) list
> > > 7558          if (VECTOR_MODE_P (outermode)
> > > 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
> > > 7560              && vec_duplicate_p (op, &elt))
> > > 7561            return gen_vec_duplicate (outermode, elt);
> > >
> > > but the above simplification is valid only for non-paradoxical registers,
> > > where outermode <= innermode.  We should not assume that elements outside
> > > the original register are valid, let alone all duplicates.
> >
> > Hmm, but looking at the audit trail the x86 backend expects them to be zero?
> > Isn't that wrong as well?
>
> If you mean Comment #10, it is just an observation that
> simplify_replace_rtx simplifies arguments from Comment #9 to:
>
> (gdb) p debug_rtx (src)
> (const_vector:V8HI [
>         (const_int 204 [0xcc]) repeated x4
>         (const_int 0 [0]) repeated x4
>     ])
>
> instead of:
>
> (gdb) p debug_rtx (src)
> (const_vector:V8HI [
>         (const_int 204 [0xcc]) repeated x8
>     ])
>
> which is in line with the statement below.
> >
> > That is, I think putting any random value into the upper lanes when
> > constant folding
> > a paradoxical subreg sounds OK to me, no?
>
> The compiler is putting zero there as can be seen from the above new RTX.
>
> > Of course we might choose to not do such constant propagation for
> > efficiency reason - at least
> > when the resulting CONST_* would require a larger constant pool entry
> > or more costly
> > construction.
>
> This is probably a follow-up improvement, where this patch tries to
> fix a specific invalid simplification of simplify_replace_rtx that is
> invalid universally.

How so?  What specifies the values of the paradoxical subreg for the
bytes not covered by the subreg operand?

>
> Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
  2023-07-10  9:47     ` Richard Biener
@ 2023-07-10 11:01       ` Uros Bizjak
  2023-07-10 11:57         ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2023-07-10 11:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
> > > >
> > > > (gdb) list
> > > > 469           if (code == SUBREG)
> > > > 470             {
> > > > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
> > > > old_rtx, fn, data);
> > > > 472               if (op0 == SUBREG_REG (x))
> > > > 473                 return x;
> > > > 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
> > > > 475                                          GET_MODE (SUBREG_REG (x)),
> > > > 476                                          SUBREG_BYTE (x));
> > > > 477               return op0 ? op0 : x;
> > > > 478             }
> > > >
> > > > simplifies with following arguments:
> > > >
> > > > (gdb) p debug_rtx (op0)
> > > > (const_vector:V4QI [
> > > >         (const_int -52 [0xffffffffffffffcc]) repeated x4
> > > >     ])
> > > > (gdb) p debug_rtx (x)
> > > > (subreg:V16QI (reg:V4QI 98) 0)
> > > >
> > > > to:
> > > >
> > > > (gdb) p debug_rtx (op0)
> > > > (const_vector:V16QI [
> > > >         (const_int -52 [0xffffffffffffffcc]) repeated x16
> > > >     ])
> > > >
> > > > This simplification is invalid, it is not possible to get V16QImode vector
> > > > from V4QImode vector, even when all elements are duplicates.
>
> ^^^
>
> I think this simplification is valid.  A simplification to
>
> (const_vector:V16QI [
>          (const_int -52 [0xffffffffffffffcc]) repeated x4
>          (const_int 0 [0]) repeated x12
>      ])
>
> would be valid as well.
>
> > > > The simplification happens in simplify_context::simplify_subreg:
> > > >
> > > > (gdb) list
> > > > 7558          if (VECTOR_MODE_P (outermode)
> > > > 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
> > > > 7560              && vec_duplicate_p (op, &elt))
> > > > 7561            return gen_vec_duplicate (outermode, elt);
> > > >
> > > > but the above simplification is valid only for non-paradoxical registers,
> > > > where outermode <= innermode.  We should not assume that elements outside
> > > > the original register are valid, let alone all duplicates.
> > >
> > > Hmm, but looking at the audit trail the x86 backend expects them to be zero?
> > > Isn't that wrong as well?
> >
> > If you mean Comment #10, it is just an observation that
> > simplify_replace_rtx simplifies arguments from Comment #9 to:
> >
> > (gdb) p debug_rtx (src)
> > (const_vector:V8HI [
> >         (const_int 204 [0xcc]) repeated x4
> >         (const_int 0 [0]) repeated x4
> >     ])
> >
> > instead of:
> >
> > (gdb) p debug_rtx (src)
> > (const_vector:V8HI [
> >         (const_int 204 [0xcc]) repeated x8
> >     ])
> >
> > which is in line with the statement below.
> > >
> > > That is, I think putting any random value into the upper lanes when
> > > constant folding
> > > a paradoxical subreg sounds OK to me, no?
> >
> > The compiler is putting zero there as can be seen from the above new RTX.
> >
> > > Of course we might choose to not do such constant propagation for
> > > efficiency reason - at least
> > > when the resulting CONST_* would require a larger constant pool entry
> > > or more costly
> > > construction.
> >
> > This is probably a follow-up improvement, where this patch tries to
> > fix a specific invalid simplification of simplify_replace_rtx that is
> > invalid universally.
>
> How so?  What specifies the values of the paradoxical subreg for the
> bytes not covered by the subreg operand?

I don't know why 0 is generated here (and if it is valid) for
paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
the wrong constant and triggers unwanted propagation later on.

Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
  2023-07-10 11:01       ` Uros Bizjak
@ 2023-07-10 11:57         ` Richard Biener
  2023-07-12 10:23           ` Richard Sandiford
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2023-07-10 11:57 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
> > > > >
> > > > > (gdb) list
> > > > > 469           if (code == SUBREG)
> > > > > 470             {
> > > > > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
> > > > > old_rtx, fn, data);
> > > > > 472               if (op0 == SUBREG_REG (x))
> > > > > 473                 return x;
> > > > > 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
> > > > > 475                                          GET_MODE (SUBREG_REG (x)),
> > > > > 476                                          SUBREG_BYTE (x));
> > > > > 477               return op0 ? op0 : x;
> > > > > 478             }
> > > > >
> > > > > simplifies with following arguments:
> > > > >
> > > > > (gdb) p debug_rtx (op0)
> > > > > (const_vector:V4QI [
> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x4
> > > > >     ])
> > > > > (gdb) p debug_rtx (x)
> > > > > (subreg:V16QI (reg:V4QI 98) 0)
> > > > >
> > > > > to:
> > > > >
> > > > > (gdb) p debug_rtx (op0)
> > > > > (const_vector:V16QI [
> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x16
> > > > >     ])
> > > > >
> > > > > This simplification is invalid, it is not possible to get V16QImode vector
> > > > > from V4QImode vector, even when all elements are duplicates.
> >
> > ^^^
> >
> > I think this simplification is valid.  A simplification to
> >
> > (const_vector:V16QI [
> >          (const_int -52 [0xffffffffffffffcc]) repeated x4
> >          (const_int 0 [0]) repeated x12
> >      ])
> >
> > would be valid as well.
> >
> > > > > The simplification happens in simplify_context::simplify_subreg:
> > > > >
> > > > > (gdb) list
> > > > > 7558          if (VECTOR_MODE_P (outermode)
> > > > > 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
> > > > > 7560              && vec_duplicate_p (op, &elt))
> > > > > 7561            return gen_vec_duplicate (outermode, elt);
> > > > >
> > > > > but the above simplification is valid only for non-paradoxical registers,
> > > > > where outermode <= innermode.  We should not assume that elements outside
> > > > > the original register are valid, let alone all duplicates.
> > > >
> > > > Hmm, but looking at the audit trail the x86 backend expects them to be zero?
> > > > Isn't that wrong as well?
> > >
> > > If you mean Comment #10, it is just an observation that
> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
> > >
> > > (gdb) p debug_rtx (src)
> > > (const_vector:V8HI [
> > >         (const_int 204 [0xcc]) repeated x4
> > >         (const_int 0 [0]) repeated x4
> > >     ])
> > >
> > > instead of:
> > >
> > > (gdb) p debug_rtx (src)
> > > (const_vector:V8HI [
> > >         (const_int 204 [0xcc]) repeated x8
> > >     ])
> > >
> > > which is in line with the statement below.
> > > >
> > > > That is, I think putting any random value into the upper lanes when
> > > > constant folding
> > > > a paradoxical subreg sounds OK to me, no?
> > >
> > > The compiler is putting zero there as can be seen from the above new RTX.
> > >
> > > > Of course we might choose to not do such constant propagation for
> > > > efficiency reason - at least
> > > > when the resulting CONST_* would require a larger constant pool entry
> > > > or more costly
> > > > construction.
> > >
> > > This is probably a follow-up improvement, where this patch tries to
> > > fix a specific invalid simplification of simplify_replace_rtx that is
> > > invalid universally.
> >
> > How so?  What specifies the values of the paradoxical subreg for the
> > bytes not covered by the subreg operand?
>
> I don't know why 0 is generated here (and if it is valid) for
> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
> the wrong constant and triggers unwanted propagation later on.

Quoting what I wrote in the PR below.  I think pragmatically the fix is
good - we might miss some opportunistic folding this way but we for
sure may not optimistically register an equality via REG_EQUAL without
enforcing it (removing the producer and replacing it with the optimistic
constant).

So consider the patch approved if no other RTL maintainer chimes in
within 48h.

Thanks,
Richard.


I can see cprop1 adds the REG_EQUAL note:

(insn 22 21 23 4 (set (reg:V8HI 100)
        (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
                (parallel [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                        (const_int 4 [0x4])
                        (const_int 5 [0x5])
                         (const_int 6 [0x6])
                         (const_int 7 [0x7])
                     ])))) "t.c":12:42 7557 {sse4_1_zero_extendv8qiv8hi2}
-     (expr_list:REG_DEAD (reg:V4QI 98)
-        (nil)))
+     (expr_list:REG_EQUAL (const_vector:V8HI [
+                (const_int 204 [0xcc]) repeated x8
+            ])
+        (expr_list:REG_DEAD (reg:V4QI 98)
+            (nil))))

but I don't see yet what the actual wrong transform based on this
REG_EQUAL note is?

It looks like we CSE the above with

-   46: r122:V8QI=[`*.LC3']
-      REG_EQUAL const_vector
-   48: r125:V8HI=zero_extend(vec_select(r122:V8QI#0,parallel))
-      REG_EQUAL const_vector
-      REG_DEAD r122:V8QI
-   49: r126:V8HI=r124:V8HI*r125:V8HI
-      REG_DEAD r125:V8HI
+   49: r126:V8HI=r124:V8HI*r100:V8HI

but otherwise do nothing.  So the issue is that we rely on the "undefined"
vals to have a specific value (from the earlier REG_EQUAL note) but actual
code generation doesn't ensure this (it doesn't need to).  That said,
the issue isn't the constant folding per-se but that we do not actually
constant fold but register an equality that doesn't hold.



> Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
  2023-07-10 11:57         ` Richard Biener
@ 2023-07-12 10:23           ` Richard Sandiford
  2023-07-12 10:58             ` Uros Bizjak
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Sandiford @ 2023-07-12 10:23 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches; +Cc: Uros Bizjak, Richard Biener

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
>> <richard.guenther@gmail.com> wrote:
>> >
>> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>> > >
>> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
>> > > <richard.guenther@gmail.com> wrote:
>> > > >
>> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
>> > > > <gcc-patches@gcc.gnu.org> wrote:
>> > > > >
>> > > > > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
>> > > > >
>> > > > > (gdb) list
>> > > > > 469           if (code == SUBREG)
>> > > > > 470             {
>> > > > > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
>> > > > > old_rtx, fn, data);
>> > > > > 472               if (op0 == SUBREG_REG (x))
>> > > > > 473                 return x;
>> > > > > 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
>> > > > > 475                                          GET_MODE (SUBREG_REG (x)),
>> > > > > 476                                          SUBREG_BYTE (x));
>> > > > > 477               return op0 ? op0 : x;
>> > > > > 478             }
>> > > > >
>> > > > > simplifies with following arguments:
>> > > > >
>> > > > > (gdb) p debug_rtx (op0)
>> > > > > (const_vector:V4QI [
>> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x4
>> > > > >     ])
>> > > > > (gdb) p debug_rtx (x)
>> > > > > (subreg:V16QI (reg:V4QI 98) 0)
>> > > > >
>> > > > > to:
>> > > > >
>> > > > > (gdb) p debug_rtx (op0)
>> > > > > (const_vector:V16QI [
>> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x16
>> > > > >     ])
>> > > > >
>> > > > > This simplification is invalid, it is not possible to get V16QImode vector
>> > > > > from V4QImode vector, even when all elements are duplicates.
>> >
>> > ^^^
>> >
>> > I think this simplification is valid.  A simplification to
>> >
>> > (const_vector:V16QI [
>> >          (const_int -52 [0xffffffffffffffcc]) repeated x4
>> >          (const_int 0 [0]) repeated x12
>> >      ])
>> >
>> > would be valid as well.
>> >
>> > > > > The simplification happens in simplify_context::simplify_subreg:
>> > > > >
>> > > > > (gdb) list
>> > > > > 7558          if (VECTOR_MODE_P (outermode)
>> > > > > 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
>> > > > > 7560              && vec_duplicate_p (op, &elt))
>> > > > > 7561            return gen_vec_duplicate (outermode, elt);
>> > > > >
>> > > > > but the above simplification is valid only for non-paradoxical registers,
>> > > > > where outermode <= innermode.  We should not assume that elements outside
>> > > > > the original register are valid, let alone all duplicates.
>> > > >
>> > > > Hmm, but looking at the audit trail the x86 backend expects them to be zero?
>> > > > Isn't that wrong as well?
>> > >
>> > > If you mean Comment #10, it is just an observation that
>> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
>> > >
>> > > (gdb) p debug_rtx (src)
>> > > (const_vector:V8HI [
>> > >         (const_int 204 [0xcc]) repeated x4
>> > >         (const_int 0 [0]) repeated x4
>> > >     ])
>> > >
>> > > instead of:
>> > >
>> > > (gdb) p debug_rtx (src)
>> > > (const_vector:V8HI [
>> > >         (const_int 204 [0xcc]) repeated x8
>> > >     ])
>> > >
>> > > which is in line with the statement below.
>> > > >
>> > > > That is, I think putting any random value into the upper lanes when
>> > > > constant folding
>> > > > a paradoxical subreg sounds OK to me, no?
>> > >
>> > > The compiler is putting zero there as can be seen from the above new RTX.
>> > >
>> > > > Of course we might choose to not do such constant propagation for
>> > > > efficiency reason - at least
>> > > > when the resulting CONST_* would require a larger constant pool entry
>> > > > or more costly
>> > > > construction.
>> > >
>> > > This is probably a follow-up improvement, where this patch tries to
>> > > fix a specific invalid simplification of simplify_replace_rtx that is
>> > > invalid universally.
>> >
>> > How so?  What specifies the values of the paradoxical subreg for the
>> > bytes not covered by the subreg operand?
>>
>> I don't know why 0 is generated here (and if it is valid) for
>> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
>> the wrong constant and triggers unwanted propagation later on.
>
> Quoting what I wrote in the PR below.  I think pragmatically the fix is
> good - we might miss some opportunistic folding this way but we for
> sure may not optimistically register an equality via REG_EQUAL without
> enforcing it (removing the producer and replacing it with the optimistic
> constant).
>
> So consider the patch approved if no other RTL maintainer chimes in
> within 48h.

Sorry, can you hold off a bit longer?  Wanted to have a look but the
deadline is about to expire.

I think at least a comment is needed, since like Richard says,
the transformation seems correct for paradoxical subregs, even if it
isn't wanted for other reasons.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>
> I can see cprop1 adds the REG_EQUAL note:
>
> (insn 22 21 23 4 (set (reg:V8HI 100)
>         (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
>                 (parallel [
>                         (const_int 0 [0])
>                         (const_int 1 [0x1])
>                         (const_int 2 [0x2])
>                         (const_int 3 [0x3])
>                         (const_int 4 [0x4])
>                         (const_int 5 [0x5])
>                          (const_int 6 [0x6])
>                          (const_int 7 [0x7])
>                      ])))) "t.c":12:42 7557 {sse4_1_zero_extendv8qiv8hi2}
> -     (expr_list:REG_DEAD (reg:V4QI 98)
> -        (nil)))
> +     (expr_list:REG_EQUAL (const_vector:V8HI [
> +                (const_int 204 [0xcc]) repeated x8
> +            ])
> +        (expr_list:REG_DEAD (reg:V4QI 98)
> +            (nil))))
>
> but I don't see yet what the actual wrong transform based on this
> REG_EQUAL note is?
>
> It looks like we CSE the above with
>
> -   46: r122:V8QI=[`*.LC3']
> -      REG_EQUAL const_vector
> -   48: r125:V8HI=zero_extend(vec_select(r122:V8QI#0,parallel))
> -      REG_EQUAL const_vector
> -      REG_DEAD r122:V8QI
> -   49: r126:V8HI=r124:V8HI*r125:V8HI
> -      REG_DEAD r125:V8HI
> +   49: r126:V8HI=r124:V8HI*r100:V8HI
>
> but otherwise do nothing.  So the issue is that we rely on the "undefined"
> vals to have a specific value (from the earlier REG_EQUAL note) but actual
> code generation doesn't ensure this (it doesn't need to).  That said,
> the issue isn't the constant folding per-se but that we do not actually
> constant fold but register an equality that doesn't hold.
>
>
>
>> Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
  2023-07-12 10:23           ` Richard Sandiford
@ 2023-07-12 10:58             ` Uros Bizjak
  2023-07-12 11:04               ` Uros Bizjak
  0 siblings, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2023-07-12 10:58 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches, Uros Bizjak, Richard Biener,
	richard.sandiford

On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >>
> >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
> >> <richard.guenther@gmail.com> wrote:
> >> >
> >> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >> > >
> >> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
> >> > > <richard.guenther@gmail.com> wrote:
> >> > > >
> >> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
> >> > > > <gcc-patches@gcc.gnu.org> wrote:
> >> > > > >
> >> > > > > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
> >> > > > >
> >> > > > > (gdb) list
> >> > > > > 469           if (code == SUBREG)
> >> > > > > 470             {
> >> > > > > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
> >> > > > > old_rtx, fn, data);
> >> > > > > 472               if (op0 == SUBREG_REG (x))
> >> > > > > 473                 return x;
> >> > > > > 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
> >> > > > > 475                                          GET_MODE (SUBREG_REG (x)),
> >> > > > > 476                                          SUBREG_BYTE (x));
> >> > > > > 477               return op0 ? op0 : x;
> >> > > > > 478             }
> >> > > > >
> >> > > > > simplifies with following arguments:
> >> > > > >
> >> > > > > (gdb) p debug_rtx (op0)
> >> > > > > (const_vector:V4QI [
> >> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x4
> >> > > > >     ])
> >> > > > > (gdb) p debug_rtx (x)
> >> > > > > (subreg:V16QI (reg:V4QI 98) 0)
> >> > > > >
> >> > > > > to:
> >> > > > >
> >> > > > > (gdb) p debug_rtx (op0)
> >> > > > > (const_vector:V16QI [
> >> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x16
> >> > > > >     ])
> >> > > > >
> >> > > > > This simplification is invalid, it is not possible to get V16QImode vector
> >> > > > > from V4QImode vector, even when all elements are duplicates.
> >> >
> >> > ^^^
> >> >
> >> > I think this simplification is valid.  A simplification to
> >> >
> >> > (const_vector:V16QI [
> >> >          (const_int -52 [0xffffffffffffffcc]) repeated x4
> >> >          (const_int 0 [0]) repeated x12
> >> >      ])
> >> >
> >> > would be valid as well.
> >> >
> >> > > > > The simplification happens in simplify_context::simplify_subreg:
> >> > > > >
> >> > > > > (gdb) list
> >> > > > > 7558          if (VECTOR_MODE_P (outermode)
> >> > > > > 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
> >> > > > > 7560              && vec_duplicate_p (op, &elt))
> >> > > > > 7561            return gen_vec_duplicate (outermode, elt);
> >> > > > >
> >> > > > > but the above simplification is valid only for non-paradoxical registers,
> >> > > > > where outermode <= innermode.  We should not assume that elements outside
> >> > > > > the original register are valid, let alone all duplicates.
> >> > > >
> >> > > > Hmm, but looking at the audit trail the x86 backend expects them to be zero?
> >> > > > Isn't that wrong as well?
> >> > >
> >> > > If you mean Comment #10, it is just an observation that
> >> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
> >> > >
> >> > > (gdb) p debug_rtx (src)
> >> > > (const_vector:V8HI [
> >> > >         (const_int 204 [0xcc]) repeated x4
> >> > >         (const_int 0 [0]) repeated x4
> >> > >     ])
> >> > >
> >> > > instead of:
> >> > >
> >> > > (gdb) p debug_rtx (src)
> >> > > (const_vector:V8HI [
> >> > >         (const_int 204 [0xcc]) repeated x8
> >> > >     ])
> >> > >
> >> > > which is in line with the statement below.
> >> > > >
> >> > > > That is, I think putting any random value into the upper lanes when
> >> > > > constant folding
> >> > > > a paradoxical subreg sounds OK to me, no?
> >> > >
> >> > > The compiler is putting zero there as can be seen from the above new RTX.
> >> > >
> >> > > > Of course we might choose to not do such constant propagation for
> >> > > > efficiency reason - at least
> >> > > > when the resulting CONST_* would require a larger constant pool entry
> >> > > > or more costly
> >> > > > construction.
> >> > >
> >> > > This is probably a follow-up improvement, where this patch tries to
> >> > > fix a specific invalid simplification of simplify_replace_rtx that is
> >> > > invalid universally.
> >> >
> >> > How so?  What specifies the values of the paradoxical subreg for the
> >> > bytes not covered by the subreg operand?
> >>
> >> I don't know why 0 is generated here (and if it is valid) for
> >> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
> >> the wrong constant and triggers unwanted propagation later on.
> >
> > Quoting what I wrote in the PR below.  I think pragmatically the fix is
> > good - we might miss some opportunistic folding this way but we for
> > sure may not optimistically register an equality via REG_EQUAL without
> > enforcing it (removing the producer and replacing it with the optimistic
> > constant).
> >
> > So consider the patch approved if no other RTL maintainer chimes in
> > within 48h.
>
> Sorry, can you hold off a bit longer?  Wanted to have a look but the
> deadline is about to expire.

No problem, I will wait for you.

Thanks,
Uros.

>
> I think at least a comment is needed, since like Richard says,
> the transformation seems correct for paradoxical subregs, even if it
> isn't wanted for other reasons.
>
> Thanks,
> Richard
>
> >
> > Thanks,
> > Richard.
> >
> >
> > I can see cprop1 adds the REG_EQUAL note:
> >
> > (insn 22 21 23 4 (set (reg:V8HI 100)
> >         (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
> >                 (parallel [
> >                         (const_int 0 [0])
> >                         (const_int 1 [0x1])
> >                         (const_int 2 [0x2])
> >                         (const_int 3 [0x3])
> >                         (const_int 4 [0x4])
> >                         (const_int 5 [0x5])
> >                          (const_int 6 [0x6])
> >                          (const_int 7 [0x7])
> >                      ])))) "t.c":12:42 7557 {sse4_1_zero_extendv8qiv8hi2}
> > -     (expr_list:REG_DEAD (reg:V4QI 98)
> > -        (nil)))
> > +     (expr_list:REG_EQUAL (const_vector:V8HI [
> > +                (const_int 204 [0xcc]) repeated x8
> > +            ])
> > +        (expr_list:REG_DEAD (reg:V4QI 98)
> > +            (nil))))
> >
> > but I don't see yet what the actual wrong transform based on this
> > REG_EQUAL note is?
> >
> > It looks like we CSE the above with
> >
> > -   46: r122:V8QI=[`*.LC3']
> > -      REG_EQUAL const_vector
> > -   48: r125:V8HI=zero_extend(vec_select(r122:V8QI#0,parallel))
> > -      REG_EQUAL const_vector
> > -      REG_DEAD r122:V8QI
> > -   49: r126:V8HI=r124:V8HI*r125:V8HI
> > -      REG_DEAD r125:V8HI
> > +   49: r126:V8HI=r124:V8HI*r100:V8HI
> >
> > but otherwise do nothing.  So the issue is that we rely on the "undefined"
> > vals to have a specific value (from the earlier REG_EQUAL note) but actual
> > code generation doesn't ensure this (it doesn't need to).  That said,
> > the issue isn't the constant folding per-se but that we do not actually
> > constant fold but register an equality that doesn't hold.
> >
> >
> >
> >> Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
  2023-07-12 10:58             ` Uros Bizjak
@ 2023-07-12 11:04               ` Uros Bizjak
  2023-07-12 11:13                 ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2023-07-12 11:04 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches, Richard Biener, richard.sandiford

On Wed, Jul 12, 2023 at 12:58 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >>
> > >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
> > >> <richard.guenther@gmail.com> wrote:
> > >> >
> > >> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >> > >
> > >> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
> > >> > > <richard.guenther@gmail.com> wrote:
> > >> > > >
> > >> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
> > >> > > > <gcc-patches@gcc.gnu.org> wrote:
> > >> > > > >
> > >> > > > > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
> > >> > > > >
> > >> > > > > (gdb) list
> > >> > > > > 469           if (code == SUBREG)
> > >> > > > > 470             {
> > >> > > > > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
> > >> > > > > old_rtx, fn, data);
> > >> > > > > 472               if (op0 == SUBREG_REG (x))
> > >> > > > > 473                 return x;
> > >> > > > > 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
> > >> > > > > 475                                          GET_MODE (SUBREG_REG (x)),
> > >> > > > > 476                                          SUBREG_BYTE (x));
> > >> > > > > 477               return op0 ? op0 : x;
> > >> > > > > 478             }
> > >> > > > >
> > >> > > > > simplifies with following arguments:
> > >> > > > >
> > >> > > > > (gdb) p debug_rtx (op0)
> > >> > > > > (const_vector:V4QI [
> > >> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x4
> > >> > > > >     ])
> > >> > > > > (gdb) p debug_rtx (x)
> > >> > > > > (subreg:V16QI (reg:V4QI 98) 0)
> > >> > > > >
> > >> > > > > to:
> > >> > > > >
> > >> > > > > (gdb) p debug_rtx (op0)
> > >> > > > > (const_vector:V16QI [
> > >> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x16
> > >> > > > >     ])
> > >> > > > >
> > >> > > > > This simplification is invalid, it is not possible to get V16QImode vector
> > >> > > > > from V4QImode vector, even when all elements are duplicates.
> > >> >
> > >> > ^^^
> > >> >
> > >> > I think this simplification is valid.  A simplification to
> > >> >
> > >> > (const_vector:V16QI [
> > >> >          (const_int -52 [0xffffffffffffffcc]) repeated x4
> > >> >          (const_int 0 [0]) repeated x12
> > >> >      ])
> > >> >
> > >> > would be valid as well.
> > >> >
> > >> > > > > The simplification happens in simplify_context::simplify_subreg:
> > >> > > > >
> > >> > > > > (gdb) list
> > >> > > > > 7558          if (VECTOR_MODE_P (outermode)
> > >> > > > > 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
> > >> > > > > 7560              && vec_duplicate_p (op, &elt))
> > >> > > > > 7561            return gen_vec_duplicate (outermode, elt);
> > >> > > > >
> > >> > > > > but the above simplification is valid only for non-paradoxical registers,
> > >> > > > > where outermode <= innermode.  We should not assume that elements outside
> > >> > > > > the original register are valid, let alone all duplicates.
> > >> > > >
> > >> > > > Hmm, but looking at the audit trail the x86 backend expects them to be zero?
> > >> > > > Isn't that wrong as well?
> > >> > >
> > >> > > If you mean Comment #10, it is just an observation that
> > >> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
> > >> > >
> > >> > > (gdb) p debug_rtx (src)
> > >> > > (const_vector:V8HI [
> > >> > >         (const_int 204 [0xcc]) repeated x4
> > >> > >         (const_int 0 [0]) repeated x4
> > >> > >     ])
> > >> > >
> > >> > > instead of:
> > >> > >
> > >> > > (gdb) p debug_rtx (src)
> > >> > > (const_vector:V8HI [
> > >> > >         (const_int 204 [0xcc]) repeated x8
> > >> > >     ])
> > >> > >
> > >> > > which is in line with the statement below.
> > >> > > >
> > >> > > > That is, I think putting any random value into the upper lanes when
> > >> > > > constant folding
> > >> > > > a paradoxical subreg sounds OK to me, no?
> > >> > >
> > >> > > The compiler is putting zero there as can be seen from the above new RTX.
> > >> > >
> > >> > > > Of course we might choose to not do such constant propagation for
> > >> > > > efficiency reason - at least
> > >> > > > when the resulting CONST_* would require a larger constant pool entry
> > >> > > > or more costly
> > >> > > > construction.
> > >> > >
> > >> > > This is probably a follow-up improvement, where this patch tries to
> > >> > > fix a specific invalid simplification of simplify_replace_rtx that is
> > >> > > invalid universally.
> > >> >
> > >> > How so?  What specifies the values of the paradoxical subreg for the
> > >> > bytes not covered by the subreg operand?
> > >>
> > >> I don't know why 0 is generated here (and if it is valid) for
> > >> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
> > >> the wrong constant and triggers unwanted propagation later on.
> > >
> > > Quoting what I wrote in the PR below.  I think pragmatically the fix is
> > > good - we might miss some opportunistic folding this way but we for
> > > sure may not optimistically register an equality via REG_EQUAL without
> > > enforcing it (removing the producer and replacing it with the optimistic
> > > constant).
> > >
> > > So consider the patch approved if no other RTL maintainer chimes in
> > > within 48h.
> >
> > Sorry, can you hold off a bit longer?  Wanted to have a look but the
> > deadline is about to expire.
>
> No problem, I will wait for you.

Please also note Comment #14 in the PR. With the patch, the compiler sets

 (const_vector:V8HI [
         (const_int 204 [0xcc]) repeated x4
         (const_int 0 [0]) repeated x4
     ])

as a new REG_EQUAL note. This also does not look OK to me. IMO, the
compiler should not emit REG_EQUAL note when some of the elements are
derived from undefined values outside the original register.

Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
  2023-07-12 11:04               ` Uros Bizjak
@ 2023-07-12 11:13                 ` Richard Biener
  2023-07-12 11:47                   ` Richard Sandiford
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2023-07-12 11:13 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Richard Biener via Gcc-patches, richard.sandiford

On Wed, Jul 12, 2023 at 1:05 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Jul 12, 2023 at 12:58 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> > >
> > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >>
> > > >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
> > > >> <richard.guenther@gmail.com> wrote:
> > > >> >
> > > >> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >> > >
> > > >> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
> > > >> > > <richard.guenther@gmail.com> wrote:
> > > >> > > >
> > > >> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
> > > >> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > >> > > > >
> > > >> > > > > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
> > > >> > > > >
> > > >> > > > > (gdb) list
> > > >> > > > > 469           if (code == SUBREG)
> > > >> > > > > 470             {
> > > >> > > > > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
> > > >> > > > > old_rtx, fn, data);
> > > >> > > > > 472               if (op0 == SUBREG_REG (x))
> > > >> > > > > 473                 return x;
> > > >> > > > > 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
> > > >> > > > > 475                                          GET_MODE (SUBREG_REG (x)),
> > > >> > > > > 476                                          SUBREG_BYTE (x));
> > > >> > > > > 477               return op0 ? op0 : x;
> > > >> > > > > 478             }
> > > >> > > > >
> > > >> > > > > simplifies with following arguments:
> > > >> > > > >
> > > >> > > > > (gdb) p debug_rtx (op0)
> > > >> > > > > (const_vector:V4QI [
> > > >> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x4
> > > >> > > > >     ])
> > > >> > > > > (gdb) p debug_rtx (x)
> > > >> > > > > (subreg:V16QI (reg:V4QI 98) 0)
> > > >> > > > >
> > > >> > > > > to:
> > > >> > > > >
> > > >> > > > > (gdb) p debug_rtx (op0)
> > > >> > > > > (const_vector:V16QI [
> > > >> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x16
> > > >> > > > >     ])
> > > >> > > > >
> > > >> > > > > This simplification is invalid, it is not possible to get V16QImode vector
> > > >> > > > > from V4QImode vector, even when all elements are duplicates.
> > > >> >
> > > >> > ^^^
> > > >> >
> > > >> > I think this simplification is valid.  A simplification to
> > > >> >
> > > >> > (const_vector:V16QI [
> > > >> >          (const_int -52 [0xffffffffffffffcc]) repeated x4
> > > >> >          (const_int 0 [0]) repeated x12
> > > >> >      ])
> > > >> >
> > > >> > would be valid as well.
> > > >> >
> > > >> > > > > The simplification happens in simplify_context::simplify_subreg:
> > > >> > > > >
> > > >> > > > > (gdb) list
> > > >> > > > > 7558          if (VECTOR_MODE_P (outermode)
> > > >> > > > > 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
> > > >> > > > > 7560              && vec_duplicate_p (op, &elt))
> > > >> > > > > 7561            return gen_vec_duplicate (outermode, elt);
> > > >> > > > >
> > > >> > > > > but the above simplification is valid only for non-paradoxical registers,
> > > >> > > > > where outermode <= innermode.  We should not assume that elements outside
> > > >> > > > > the original register are valid, let alone all duplicates.
> > > >> > > >
> > > >> > > > Hmm, but looking at the audit trail the x86 backend expects them to be zero?
> > > >> > > > Isn't that wrong as well?
> > > >> > >
> > > >> > > If you mean Comment #10, it is just an observation that
> > > >> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
> > > >> > >
> > > >> > > (gdb) p debug_rtx (src)
> > > >> > > (const_vector:V8HI [
> > > >> > >         (const_int 204 [0xcc]) repeated x4
> > > >> > >         (const_int 0 [0]) repeated x4
> > > >> > >     ])
> > > >> > >
> > > >> > > instead of:
> > > >> > >
> > > >> > > (gdb) p debug_rtx (src)
> > > >> > > (const_vector:V8HI [
> > > >> > >         (const_int 204 [0xcc]) repeated x8
> > > >> > >     ])
> > > >> > >
> > > >> > > which is in line with the statement below.
> > > >> > > >
> > > >> > > > That is, I think putting any random value into the upper lanes when
> > > >> > > > constant folding
> > > >> > > > a paradoxical subreg sounds OK to me, no?
> > > >> > >
> > > >> > > The compiler is putting zero there as can be seen from the above new RTX.
> > > >> > >
> > > >> > > > Of course we might choose to not do such constant propagation for
> > > >> > > > efficiency reason - at least
> > > >> > > > when the resulting CONST_* would require a larger constant pool entry
> > > >> > > > or more costly
> > > >> > > > construction.
> > > >> > >
> > > >> > > This is probably a follow-up improvement, where this patch tries to
> > > >> > > fix a specific invalid simplification of simplify_replace_rtx that is
> > > >> > > invalid universally.
> > > >> >
> > > >> > How so?  What specifies the values of the paradoxical subreg for the
> > > >> > bytes not covered by the subreg operand?
> > > >>
> > > >> I don't know why 0 is generated here (and if it is valid) for
> > > >> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
> > > >> the wrong constant and triggers unwanted propagation later on.
> > > >
> > > > Quoting what I wrote in the PR below.  I think pragmatically the fix is
> > > > good - we might miss some opportunistic folding this way but we for
> > > > sure may not optimistically register an equality via REG_EQUAL without
> > > > enforcing it (removing the producer and replacing it with the optimistic
> > > > constant).
> > > >
> > > > So consider the patch approved if no other RTL maintainer chimes in
> > > > within 48h.
> > >
> > > Sorry, can you hold off a bit longer?  Wanted to have a look but the
> > > deadline is about to expire.
> >
> > No problem, I will wait for you.
>
> Please also note Comment #14 in the PR. With the patch, the compiler sets
>
>  (const_vector:V8HI [
>          (const_int 204 [0xcc]) repeated x4
>          (const_int 0 [0]) repeated x4
>      ])
>
> as a new REG_EQUAL note. This also does not look OK to me. IMO, the
> compiler should not emit REG_EQUAL note when some of the elements are
> derived from undefined values outside the original register.

Yes.  The reverse would also be true, so when we'd actually constant propagate
(which would be OK) we also may not put an REG_EQUAL note with the original
expression which produced the undefined values.

Within the current framework it looks difficult to guarantee either so the patch
removing the possibility of simplifying alltogether looks most reasonable to me.

Richard.

>
> Uros.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
  2023-07-12 11:13                 ` Richard Biener
@ 2023-07-12 11:47                   ` Richard Sandiford
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Sandiford @ 2023-07-12 11:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uros Bizjak, Richard Biener via Gcc-patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Jul 12, 2023 at 1:05 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>> On Wed, Jul 12, 2023 at 12:58 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>> >
>> > On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> > >
>> > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > > > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>> > > >>
>> > > >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
>> > > >> <richard.guenther@gmail.com> wrote:
>> > > >> >
>> > > >> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>> > > >> > >
>> > > >> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
>> > > >> > > <richard.guenther@gmail.com> wrote:
>> > > >> > > >
>> > > >> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
>> > > >> > > > <gcc-patches@gcc.gnu.org> wrote:
>> > > >> > > > >
>> > > >> > > > > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
>> > > >> > > > >
>> > > >> > > > > (gdb) list
>> > > >> > > > > 469           if (code == SUBREG)
>> > > >> > > > > 470             {
>> > > >> > > > > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
>> > > >> > > > > old_rtx, fn, data);
>> > > >> > > > > 472               if (op0 == SUBREG_REG (x))
>> > > >> > > > > 473                 return x;
>> > > >> > > > > 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
>> > > >> > > > > 475                                          GET_MODE (SUBREG_REG (x)),
>> > > >> > > > > 476                                          SUBREG_BYTE (x));
>> > > >> > > > > 477               return op0 ? op0 : x;
>> > > >> > > > > 478             }
>> > > >> > > > >
>> > > >> > > > > simplifies with following arguments:
>> > > >> > > > >
>> > > >> > > > > (gdb) p debug_rtx (op0)
>> > > >> > > > > (const_vector:V4QI [
>> > > >> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x4
>> > > >> > > > >     ])
>> > > >> > > > > (gdb) p debug_rtx (x)
>> > > >> > > > > (subreg:V16QI (reg:V4QI 98) 0)
>> > > >> > > > >
>> > > >> > > > > to:
>> > > >> > > > >
>> > > >> > > > > (gdb) p debug_rtx (op0)
>> > > >> > > > > (const_vector:V16QI [
>> > > >> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x16
>> > > >> > > > >     ])
>> > > >> > > > >
>> > > >> > > > > This simplification is invalid, it is not possible to get V16QImode vector
>> > > >> > > > > from V4QImode vector, even when all elements are duplicates.
>> > > >> >
>> > > >> > ^^^
>> > > >> >
>> > > >> > I think this simplification is valid.  A simplification to
>> > > >> >
>> > > >> > (const_vector:V16QI [
>> > > >> >          (const_int -52 [0xffffffffffffffcc]) repeated x4
>> > > >> >          (const_int 0 [0]) repeated x12
>> > > >> >      ])
>> > > >> >
>> > > >> > would be valid as well.
>> > > >> >
>> > > >> > > > > The simplification happens in simplify_context::simplify_subreg:
>> > > >> > > > >
>> > > >> > > > > (gdb) list
>> > > >> > > > > 7558          if (VECTOR_MODE_P (outermode)
>> > > >> > > > > 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
>> > > >> > > > > 7560              && vec_duplicate_p (op, &elt))
>> > > >> > > > > 7561            return gen_vec_duplicate (outermode, elt);
>> > > >> > > > >
>> > > >> > > > > but the above simplification is valid only for non-paradoxical registers,
>> > > >> > > > > where outermode <= innermode.  We should not assume that elements outside
>> > > >> > > > > the original register are valid, let alone all duplicates.
>> > > >> > > >
>> > > >> > > > Hmm, but looking at the audit trail the x86 backend expects them to be zero?
>> > > >> > > > Isn't that wrong as well?
>> > > >> > >
>> > > >> > > If you mean Comment #10, it is just an observation that
>> > > >> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
>> > > >> > >
>> > > >> > > (gdb) p debug_rtx (src)
>> > > >> > > (const_vector:V8HI [
>> > > >> > >         (const_int 204 [0xcc]) repeated x4
>> > > >> > >         (const_int 0 [0]) repeated x4
>> > > >> > >     ])
>> > > >> > >
>> > > >> > > instead of:
>> > > >> > >
>> > > >> > > (gdb) p debug_rtx (src)
>> > > >> > > (const_vector:V8HI [
>> > > >> > >         (const_int 204 [0xcc]) repeated x8
>> > > >> > >     ])
>> > > >> > >
>> > > >> > > which is in line with the statement below.
>> > > >> > > >
>> > > >> > > > That is, I think putting any random value into the upper lanes when
>> > > >> > > > constant folding
>> > > >> > > > a paradoxical subreg sounds OK to me, no?
>> > > >> > >
>> > > >> > > The compiler is putting zero there as can be seen from the above new RTX.
>> > > >> > >
>> > > >> > > > Of course we might choose to not do such constant propagation for
>> > > >> > > > efficiency reason - at least
>> > > >> > > > when the resulting CONST_* would require a larger constant pool entry
>> > > >> > > > or more costly
>> > > >> > > > construction.
>> > > >> > >
>> > > >> > > This is probably a follow-up improvement, where this patch tries to
>> > > >> > > fix a specific invalid simplification of simplify_replace_rtx that is
>> > > >> > > invalid universally.
>> > > >> >
>> > > >> > How so?  What specifies the values of the paradoxical subreg for the
>> > > >> > bytes not covered by the subreg operand?
>> > > >>
>> > > >> I don't know why 0 is generated here (and if it is valid) for
>> > > >> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
>> > > >> the wrong constant and triggers unwanted propagation later on.
>> > > >
>> > > > Quoting what I wrote in the PR below.  I think pragmatically the fix is
>> > > > good - we might miss some opportunistic folding this way but we for
>> > > > sure may not optimistically register an equality via REG_EQUAL without
>> > > > enforcing it (removing the producer and replacing it with the optimistic
>> > > > constant).
>> > > >
>> > > > So consider the patch approved if no other RTL maintainer chimes in
>> > > > within 48h.
>> > >
>> > > Sorry, can you hold off a bit longer?  Wanted to have a look but the
>> > > deadline is about to expire.
>> >
>> > No problem, I will wait for you.
>>
>> Please also note Comment #14 in the PR. With the patch, the compiler sets
>>
>>  (const_vector:V8HI [
>>          (const_int 204 [0xcc]) repeated x4
>>          (const_int 0 [0]) repeated x4
>>      ])
>>
>> as a new REG_EQUAL note. This also does not look OK to me. IMO, the
>> compiler should not emit REG_EQUAL note when some of the elements are
>> derived from undefined values outside the original register.
>
> Yes.  The reverse would also be true, so when we'd actually constant propagate
> (which would be OK) we also may not put an REG_EQUAL note with the original
> expression which produced the undefined values.

Yeah.  Good, it sounds like we're all in agreement. :)  For:

(insn 22 21 23 4 (set (reg:V8HI 100)
        (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
                (parallel [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                        (const_int 4 [0x4])
                        (const_int 5 [0x5])
                        (const_int 6 [0x6])
                        (const_int 7 [0x7])
                    ])))) "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
     (expr_list:REG_EQUAL (const_vector:V8HI [
                (const_int 204 [0xcc]) repeated x8
            ])
        (expr_list:REG_DEAD (reg:V4QI 98)
            (nil))))

folding to:

  (set (reg:V8HI 100)
       (const_vector:V8HI [(const_int 204 [0xcc]) ...]))

would be OK, but adding the REG_EQUAL is not.

> Within the current framework it looks difficult to guarantee either so the patch
> removing the possibility of simplifying alltogether looks most reasonable to me.

But like Uros says, the alternative REG_EQUAL is equally wrong
in principle.  It just happens to work out OK for this testcase.

fwprop.c has:

      /* If there are any paradoxical SUBREGs, drop the REG_EQUAL note,
	 because the bits in there can be anything and so might not
	 match the REG_EQUAL note content.  See PR70574.  */
      if (contains_paradoxical_subreg_p (SET_SRC (set)))
	return false;

expand_mult_const has a similar guard.  IMO the bug is that cprop is
lacking the same test.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-07-12 11:47 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-09  8:52 [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206] Uros Bizjak
2023-07-10  9:16 ` Richard Biener
2023-07-10  9:26   ` Uros Bizjak
2023-07-10  9:47     ` Richard Biener
2023-07-10 11:01       ` Uros Bizjak
2023-07-10 11:57         ` Richard Biener
2023-07-12 10:23           ` Richard Sandiford
2023-07-12 10:58             ` Uros Bizjak
2023-07-12 11:04               ` Uros Bizjak
2023-07-12 11:13                 ` Richard Biener
2023-07-12 11:47                   ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).