public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
@ 2021-01-18  9:16 Hongtao Liu
  2021-01-18 10:18 ` Richard Sandiford
  0 siblings, 1 reply; 20+ messages in thread
From: Hongtao Liu @ 2021-01-18  9:16 UTC (permalink / raw)
  To: GCC Patches, ebotcazou, steven; +Cc: Jakub Jelinek, Richard Biener, H. J. Lu

Hi:
  If SRC had been assigned a mode narrower than the copy, we can't link
DEST into the chain even they have same
hard_regno_nregs(i.e. HImode/SImode in i386 backend).

i.e
        kmovw   %k0, %edi
        vmovd   %edi, %xmm2
        vpshuflw        $0, %xmm2, %xmm0
        kmovw   %k0, %r8d
        kmovd   %k0, %r9d
...
-        movl %r9d, %r11d
+        vmovd %xmm2, %r11d

  Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
  Ok for trunk?

gcc/ChangeLog:

        PR rtl-optimization/98694
        * regcprop.c (copy_value): If SRC had been assigned a mode
        narrower than the copy, we can't link DEST into the chain even
        they have same hard_regno_nregs(i.e. HImode/SImode in i386
        backend).

gcc/testsuite/ChangeLog:

        PR rtl-optimization/98694
        * gcc.target/i386/pr98694.c: New test.

  ---
 gcc/regcprop.c                          |  3 +-
 gcc/testsuite/gcc.target/i386/pr98694.c | 38 +++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr98694.c

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index dd62cb36013..997516eca07 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -355,7 +355,8 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
   /* If SRC had been assigned a mode narrower than the copy, we can't
      link DEST into the chain, because not all of the pieces of the
      copy came from oldest_regno.  */
-  else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
+  else if (sn > hard_regno_nregs (sr, vd->e[sr].mode)
+          || partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
     return;

   /* Link DR at the end of the value chain used by SR.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr98694.c
b/gcc/testsuite/gcc.target/i386/pr98694.c
new file mode 100644
index 00000000000..611f9e77627
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr98694.c
@@ -0,0 +1,38 @@
+/* PR rtl-optimization/98694 */
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include<immintrin.h>
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef int v2si __attribute__ ((vector_size (8)));
+v4hi b;
+
+__attribute__ ((noipa))
+v2si
+foo (__m512i src1, __m512i src2)
+{
+  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
+  short s = (short) m;
+  int i = (int)m;
+  b = __extension__ (v4hi) {s, s, s, s};
+  return __extension__ (v2si) {i, i};
+}
+
+int main ()
+{
+  __m512i src1 = _mm512_setzero_si512 ();
+  __m512i src2 = _mm512_set_epi8 (0, 1, 0, 1, 0, 1, 0, 1,
+                                 0, 1, 0, 1, 0, 1, 0, 1,
+                                 0, 1, 0, 1, 0, 1, 0, 1,
+                                 0, 1, 0, 1, 0, 1, 0, 1,
+                                 0, 1, 0, 1, 0, 1, 0, 1,
+                                 0, 1, 0, 1, 0, 1, 0, 1,
+                                 0, 1, 0, 1, 0, 1, 0, 1,
+                                 0, 1, 0, 1, 0, 1, 0, 1);
+  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
+  v2si a = foo (src1, src2);
+  if (a[0] != (int)m)
+    __builtin_abort ();
+  return 0;
+}
-- 


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-18  9:16 [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg Hongtao Liu
@ 2021-01-18 10:18 ` Richard Sandiford
  2021-01-18 10:43   ` Hongtao Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2021-01-18 10:18 UTC (permalink / raw)
  To: Hongtao Liu via Gcc-patches; +Cc: ebotcazou, steven, Hongtao Liu, Jakub Jelinek

Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi:
>   If SRC had been assigned a mode narrower than the copy, we can't link
> DEST into the chain even they have same
> hard_regno_nregs(i.e. HImode/SImode in i386 backend).

In general, changes between modes within the same hard register are OK.
Could you explain in more detail what's going wrong?

Thanks,
Richard


>
> i.e
>         kmovw   %k0, %edi
>         vmovd   %edi, %xmm2
>         vpshuflw        $0, %xmm2, %xmm0
>         kmovw   %k0, %r8d
>         kmovd   %k0, %r9d
> ...
> -        movl %r9d, %r11d
> +        vmovd %xmm2, %r11d
>
>   Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
>   Ok for trunk?
>
> gcc/ChangeLog:
>
>         PR rtl-optimization/98694
>         * regcprop.c (copy_value): If SRC had been assigned a mode
>         narrower than the copy, we can't link DEST into the chain even
>         they have same hard_regno_nregs(i.e. HImode/SImode in i386
>         backend).
>
> gcc/testsuite/ChangeLog:
>
>         PR rtl-optimization/98694
>         * gcc.target/i386/pr98694.c: New test.
>
>   ---
>  gcc/regcprop.c                          |  3 +-
>  gcc/testsuite/gcc.target/i386/pr98694.c | 38 +++++++++++++++++++++++++
>  2 files changed, 40 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr98694.c
>
> diff --git a/gcc/regcprop.c b/gcc/regcprop.c
> index dd62cb36013..997516eca07 100644
> --- a/gcc/regcprop.c
> +++ b/gcc/regcprop.c
> @@ -355,7 +355,8 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
>    /* If SRC had been assigned a mode narrower than the copy, we can't
>       link DEST into the chain, because not all of the pieces of the
>       copy came from oldest_regno.  */
> -  else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
> +  else if (sn > hard_regno_nregs (sr, vd->e[sr].mode)
> +          || partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
>      return;
>
>    /* Link DR at the end of the value chain used by SR.  */
> diff --git a/gcc/testsuite/gcc.target/i386/pr98694.c
> b/gcc/testsuite/gcc.target/i386/pr98694.c
> new file mode 100644
> index 00000000000..611f9e77627
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr98694.c
> @@ -0,0 +1,38 @@
> +/* PR rtl-optimization/98694 */
> +/* { dg-do run { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mavx512bw" } */
> +/* { dg-require-effective-target avx512bw } */
> +
> +#include<immintrin.h>
> +typedef short v4hi __attribute__ ((vector_size (8)));
> +typedef int v2si __attribute__ ((vector_size (8)));
> +v4hi b;
> +
> +__attribute__ ((noipa))
> +v2si
> +foo (__m512i src1, __m512i src2)
> +{
> +  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
> +  short s = (short) m;
> +  int i = (int)m;
> +  b = __extension__ (v4hi) {s, s, s, s};
> +  return __extension__ (v2si) {i, i};
> +}
> +
> +int main ()
> +{
> +  __m512i src1 = _mm512_setzero_si512 ();
> +  __m512i src2 = _mm512_set_epi8 (0, 1, 0, 1, 0, 1, 0, 1,
> +                                 0, 1, 0, 1, 0, 1, 0, 1,
> +                                 0, 1, 0, 1, 0, 1, 0, 1,
> +                                 0, 1, 0, 1, 0, 1, 0, 1,
> +                                 0, 1, 0, 1, 0, 1, 0, 1,
> +                                 0, 1, 0, 1, 0, 1, 0, 1,
> +                                 0, 1, 0, 1, 0, 1, 0, 1,
> +                                 0, 1, 0, 1, 0, 1, 0, 1);
> +  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
> +  v2si a = foo (src1, src2);
> +  if (a[0] != (int)m)
> +    __builtin_abort ();
> +  return 0;
> +}
> -- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-18 10:18 ` Richard Sandiford
@ 2021-01-18 10:43   ` Hongtao Liu
  2021-01-18 10:51     ` Hongtao Liu
  2021-01-18 11:10     ` Richard Sandiford
  0 siblings, 2 replies; 20+ messages in thread
From: Hongtao Liu @ 2021-01-18 10:43 UTC (permalink / raw)
  To: Hongtao Liu via Gcc-patches, ebotcazou, steven, Hongtao Liu,
	Jakub Jelinek, Richard Sandiford

On Mon, Jan 18, 2021 at 6:18 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > Hi:
> >   If SRC had been assigned a mode narrower than the copy, we can't link
> > DEST into the chain even they have same
> > hard_regno_nregs(i.e. HImode/SImode in i386 backend).
>
> In general, changes between modes within the same hard register are OK.
> Could you explain in more detail what's going wrong?
>

cprop hardreg change

(insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
        (reg:SI 37 r9 [orig:86 _11 ] [86])) "test.c":29:36 75 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
        (nil)))

to

(insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
        (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "test.c":29:36 75
{*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
        (nil)))

since (reg:SI 22 xmm2) and (reg:SI r9) are in the same value chain in
which the oldest regno is k0.

but with xmm2 defined as

kmovw %k0, %edi  # 69 [c=4 l=4] *movhi_internal/6----- kmovw move the
lower 16bits to %edi, and clear the upper 16 bits.
vmovd %edi, %xmm2 # 489 *movsi_internal  --- vmovd move 32bits from
%edi to %xmm2.

(insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
        (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
{*movhi_internal}
     (nil))

(insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
        (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
     (nil))
...
kmovd %k0, %r9d (movsi) ---- kmovd move 32bits from %k0 to %r9d.

for %edi, bit 16-31 is cleared by kmovw which means %r9d is not equal
to %xmm2 as a SImode value.

> Thanks,
> Richard
>
>
> >
> > i.e
> >         kmovw   %k0, %edi
> >         vmovd   %edi, %xmm2
> >         vpshuflw        $0, %xmm2, %xmm0
> >         kmovw   %k0, %r8d
> >         kmovd   %k0, %r9d
> > ...
> > -        movl %r9d, %r11d
> > +        vmovd %xmm2, %r11d
> >
> >   Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
> >   Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         PR rtl-optimization/98694
> >         * regcprop.c (copy_value): If SRC had been assigned a mode
> >         narrower than the copy, we can't link DEST into the chain even
> >         they have same hard_regno_nregs(i.e. HImode/SImode in i386
> >         backend).
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR rtl-optimization/98694
> >         * gcc.target/i386/pr98694.c: New test.
> >
> >   ---
> >  gcc/regcprop.c                          |  3 +-
> >  gcc/testsuite/gcc.target/i386/pr98694.c | 38 +++++++++++++++++++++++++
> >  2 files changed, 40 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr98694.c
> >
> > diff --git a/gcc/regcprop.c b/gcc/regcprop.c
> > index dd62cb36013..997516eca07 100644
> > --- a/gcc/regcprop.c
> > +++ b/gcc/regcprop.c
> > @@ -355,7 +355,8 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
> >    /* If SRC had been assigned a mode narrower than the copy, we can't
> >       link DEST into the chain, because not all of the pieces of the
> >       copy came from oldest_regno.  */
> > -  else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
> > +  else if (sn > hard_regno_nregs (sr, vd->e[sr].mode)
> > +          || partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
> >      return;
> >
> >    /* Link DR at the end of the value chain used by SR.  */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr98694.c
> > b/gcc/testsuite/gcc.target/i386/pr98694.c
> > new file mode 100644
> > index 00000000000..611f9e77627
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr98694.c
> > @@ -0,0 +1,38 @@
> > +/* PR rtl-optimization/98694 */
> > +/* { dg-do run { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -mavx512bw" } */
> > +/* { dg-require-effective-target avx512bw } */
> > +
> > +#include<immintrin.h>
> > +typedef short v4hi __attribute__ ((vector_size (8)));
> > +typedef int v2si __attribute__ ((vector_size (8)));
> > +v4hi b;
> > +
> > +__attribute__ ((noipa))
> > +v2si
> > +foo (__m512i src1, __m512i src2)
> > +{
> > +  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
> > +  short s = (short) m;
> > +  int i = (int)m;
> > +  b = __extension__ (v4hi) {s, s, s, s};
> > +  return __extension__ (v2si) {i, i};
> > +}
> > +
> > +int main ()
> > +{
> > +  __m512i src1 = _mm512_setzero_si512 ();
> > +  __m512i src2 = _mm512_set_epi8 (0, 1, 0, 1, 0, 1, 0, 1,
> > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > +                                 0, 1, 0, 1, 0, 1, 0, 1);
> > +  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
> > +  v2si a = foo (src1, src2);
> > +  if (a[0] != (int)m)
> > +    __builtin_abort ();
> > +  return 0;
> > +}
> > --



--
BR,
Hongtao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-18 10:43   ` Hongtao Liu
@ 2021-01-18 10:51     ` Hongtao Liu
  2021-01-18 11:10     ` Richard Sandiford
  1 sibling, 0 replies; 20+ messages in thread
From: Hongtao Liu @ 2021-01-18 10:51 UTC (permalink / raw)
  To: Hongtao Liu via Gcc-patches, ebotcazou, steven, Hongtao Liu,
	Jakub Jelinek, Richard Sandiford

On Mon, Jan 18, 2021 at 6:43 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Jan 18, 2021 at 6:18 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > Hi:
> > >   If SRC had been assigned a mode narrower than the copy, we can't link
> > > DEST into the chain even they have same
> > > hard_regno_nregs(i.e. HImode/SImode in i386 backend).
> >
> > In general, changes between modes within the same hard register are OK.
> > Could you explain in more detail what's going wrong?

For simplicity, If the copy of narrow mode has the side effect of
clearing the upper bits of the same hard register, But this behavior
is not described in the insn pattern, shouldn't it be wrong to add
different modes to the same value chain.

> >
>
> cprop hardreg change
>
> (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>         (reg:SI 37 r9 [orig:86 _11 ] [86])) "test.c":29:36 75 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
>         (nil)))
>
> to
>
> (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>         (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "test.c":29:36 75
> {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
>         (nil)))
>
> since (reg:SI 22 xmm2) and (reg:SI r9) are in the same value chain in
> which the oldest regno is k0.
>
> but with xmm2 defined as
>
> kmovw %k0, %edi  # 69 [c=4 l=4] *movhi_internal/6----- kmovw move the
> lower 16bits to %edi, and clear the upper 16 bits.
> vmovd %edi, %xmm2 # 489 *movsi_internal  --- vmovd move 32bits from
> %edi to %xmm2.
>
> (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
>         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
> {*movhi_internal}
>      (nil))
>
> (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
>         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>      (nil))
> ...
> kmovd %k0, %r9d (movsi) ---- kmovd move 32bits from %k0 to %r9d.
>
> for %edi, bit 16-31 is cleared by kmovw which means %r9d is not equal
> to %xmm2 as a SImode value.
>
> > Thanks,
> > Richard
> >
> >
> > >
> > > i.e
> > >         kmovw   %k0, %edi
> > >         vmovd   %edi, %xmm2
> > >         vpshuflw        $0, %xmm2, %xmm0
> > >         kmovw   %k0, %r8d
> > >         kmovd   %k0, %r9d
> > > ...
> > > -        movl %r9d, %r11d
> > > +        vmovd %xmm2, %r11d
> > >
> > >   Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
> > >   Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > >         PR rtl-optimization/98694
> > >         * regcprop.c (copy_value): If SRC had been assigned a mode
> > >         narrower than the copy, we can't link DEST into the chain even
> > >         they have same hard_regno_nregs(i.e. HImode/SImode in i386
> > >         backend).
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         PR rtl-optimization/98694
> > >         * gcc.target/i386/pr98694.c: New test.
> > >
> > >   ---
> > >  gcc/regcprop.c                          |  3 +-
> > >  gcc/testsuite/gcc.target/i386/pr98694.c | 38 +++++++++++++++++++++++++
> > >  2 files changed, 40 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr98694.c
> > >
> > > diff --git a/gcc/regcprop.c b/gcc/regcprop.c
> > > index dd62cb36013..997516eca07 100644
> > > --- a/gcc/regcprop.c
> > > +++ b/gcc/regcprop.c
> > > @@ -355,7 +355,8 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
> > >    /* If SRC had been assigned a mode narrower than the copy, we can't
> > >       link DEST into the chain, because not all of the pieces of the
> > >       copy came from oldest_regno.  */
> > > -  else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
> > > +  else if (sn > hard_regno_nregs (sr, vd->e[sr].mode)
> > > +          || partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
> > >      return;
> > >
> > >    /* Link DR at the end of the value chain used by SR.  */
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr98694.c
> > > b/gcc/testsuite/gcc.target/i386/pr98694.c
> > > new file mode 100644
> > > index 00000000000..611f9e77627
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr98694.c
> > > @@ -0,0 +1,38 @@
> > > +/* PR rtl-optimization/98694 */
> > > +/* { dg-do run { target { ! ia32 } } } */
> > > +/* { dg-options "-O2 -mavx512bw" } */
> > > +/* { dg-require-effective-target avx512bw } */
> > > +
> > > +#include<immintrin.h>
> > > +typedef short v4hi __attribute__ ((vector_size (8)));
> > > +typedef int v2si __attribute__ ((vector_size (8)));
> > > +v4hi b;
> > > +
> > > +__attribute__ ((noipa))
> > > +v2si
> > > +foo (__m512i src1, __m512i src2)
> > > +{
> > > +  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
> > > +  short s = (short) m;
> > > +  int i = (int)m;
> > > +  b = __extension__ (v4hi) {s, s, s, s};
> > > +  return __extension__ (v2si) {i, i};
> > > +}
> > > +
> > > +int main ()
> > > +{
> > > +  __m512i src1 = _mm512_setzero_si512 ();
> > > +  __m512i src2 = _mm512_set_epi8 (0, 1, 0, 1, 0, 1, 0, 1,
> > > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > > +                                 0, 1, 0, 1, 0, 1, 0, 1,
> > > +                                 0, 1, 0, 1, 0, 1, 0, 1);
> > > +  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
> > > +  v2si a = foo (src1, src2);
> > > +  if (a[0] != (int)m)
> > > +    __builtin_abort ();
> > > +  return 0;
> > > +}
> > > --
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-18 10:43   ` Hongtao Liu
  2021-01-18 10:51     ` Hongtao Liu
@ 2021-01-18 11:10     ` Richard Sandiford
  2021-01-19  0:59       ` Hongtao Liu
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2021-01-18 11:10 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Hongtao Liu via Gcc-patches, ebotcazou, steven, Jakub Jelinek

Hongtao Liu <crazylht@gmail.com> writes:
> On Mon, Jan 18, 2021 at 6:18 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > Hi:
>> >   If SRC had been assigned a mode narrower than the copy, we can't link
>> > DEST into the chain even they have same
>> > hard_regno_nregs(i.e. HImode/SImode in i386 backend).
>>
>> In general, changes between modes within the same hard register are OK.
>> Could you explain in more detail what's going wrong?
>>
>
> cprop hardreg change
>
> (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>         (reg:SI 37 r9 [orig:86 _11 ] [86])) "test.c":29:36 75 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
>         (nil)))
>
> to
>
> (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>         (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "test.c":29:36 75
> {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
>         (nil)))
>
> since (reg:SI 22 xmm2) and (reg:SI r9) are in the same value chain in
> which the oldest regno is k0.
>
> but with xmm2 defined as
>
> kmovw %k0, %edi  # 69 [c=4 l=4] *movhi_internal/6----- kmovw move the
> lower 16bits to %edi, and clear the upper 16 bits.
> vmovd %edi, %xmm2 # 489 *movsi_internal  --- vmovd move 32bits from
> %edi to %xmm2.
>
> (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
>         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
> {*movhi_internal}
>      (nil))
>
> (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
>         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>      (nil))

The sequence is OK in itself, but insn 489 can't make any assumptions
about what's in the upper 16 bits of %edi.  In other words, as far as
RTL semantics are concerned, insn 489 only leaves bits 0-15 of %xmm2
with defined values; the other bits are undefined.

If the target wants all 32 bits of %edi to be carried over to insn 489
then it needs to make insn 69 an SImode set instead of a HImode set.

So what cprop is doing is OK: it's changing the values of undefined
bits but not changing the definition of defined bits (from an RTL
point of view).

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-18 11:10     ` Richard Sandiford
@ 2021-01-19  0:59       ` Hongtao Liu
  2021-01-19 12:38         ` Richard Sandiford
  0 siblings, 1 reply; 20+ messages in thread
From: Hongtao Liu @ 2021-01-19  0:59 UTC (permalink / raw)
  To: Hongtao Liu, Hongtao Liu via Gcc-patches, ebotcazou, steven,
	Jakub Jelinek, Richard Sandiford

On Mon, Jan 18, 2021 at 7:10 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Hongtao Liu <crazylht@gmail.com> writes:
> > On Mon, Jan 18, 2021 at 6:18 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > Hi:
> >> >   If SRC had been assigned a mode narrower than the copy, we can't link
> >> > DEST into the chain even they have same
> >> > hard_regno_nregs(i.e. HImode/SImode in i386 backend).
> >>
> >> In general, changes between modes within the same hard register are OK.
> >> Could you explain in more detail what's going wrong?
> >>
> >
> > cprop hardreg change
> >
> > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
> >         (reg:SI 37 r9 [orig:86 _11 ] [86])) "test.c":29:36 75 {*movsi_internal}
> >      (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
> >         (nil)))
> >
> > to
> >
> > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
> >         (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "test.c":29:36 75
> > {*movsi_internal}
> >      (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
> >         (nil)))
> >
> > since (reg:SI 22 xmm2) and (reg:SI r9) are in the same value chain in
> > which the oldest regno is k0.
> >
> > but with xmm2 defined as
> >
> > kmovw %k0, %edi  # 69 [c=4 l=4] *movhi_internal/6----- kmovw move the
> > lower 16bits to %edi, and clear the upper 16 bits.
> > vmovd %edi, %xmm2 # 489 *movsi_internal  --- vmovd move 32bits from
> > %edi to %xmm2.
> >
> > (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
> > {*movhi_internal}
> >      (nil))
> >
> > (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
> >      (nil))
>
> The sequence is OK in itself, but insn 489 can't make any assumptions
> about what's in the upper 16 bits of %edi.  In other words, as far as
> RTL semantics are concerned, insn 489 only leaves bits 0-15 of %xmm2
> with defined values; the other bits are undefined.
>
> If the target wants all 32 bits of %edi to be carried over to insn 489
> then it needs to make insn 69 an SImode set instead of a HImode set.
>

actually only the lower 16bits are needed, the original insn is like

.294.r.ira
(insn 69 68 70 13 (set (reg:HI 96 [ _52 ])
        (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76
{*movhi_internal}
     (nil))
(insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ])
        (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _52
]) 0)))) 1412 {*vec_dupv4hi}
     (nil))

.295r.reload
(insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96])
        (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
{*movhi_internal}
     (nil))
(insn 489 75 78 13 (set (reg:SI 22 xmm2 [297])
        (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
     (nil))
(insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
        (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297]))))
1412 {*vec_dupv4hi}
     (nil))

and insn 489 is created by lra/reload which seems ok for the sequence,
but problemistic with considering the logic of hardreg_cprop.

> So what cprop is doing is OK: it's changing the values of undefined
> bits but not changing the definition of defined bits (from an RTL
> point of view).
>
> Thanks,
> Richard



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-19  0:59       ` Hongtao Liu
@ 2021-01-19 12:38         ` Richard Sandiford
  2021-01-19 14:45           ` Jakub Jelinek
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2021-01-19 12:38 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Hongtao Liu via Gcc-patches, ebotcazou, steven, Jakub Jelinek

Hongtao Liu <crazylht@gmail.com> writes:
> On Mon, Jan 18, 2021 at 7:10 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Hongtao Liu <crazylht@gmail.com> writes:
>> > On Mon, Jan 18, 2021 at 6:18 PM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> >> > Hi:
>> >> >   If SRC had been assigned a mode narrower than the copy, we can't link
>> >> > DEST into the chain even they have same
>> >> > hard_regno_nregs(i.e. HImode/SImode in i386 backend).
>> >>
>> >> In general, changes between modes within the same hard register are OK.
>> >> Could you explain in more detail what's going wrong?
>> >>
>> >
>> > cprop hardreg change
>> >
>> > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>> >         (reg:SI 37 r9 [orig:86 _11 ] [86])) "test.c":29:36 75 {*movsi_internal}
>> >      (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
>> >         (nil)))
>> >
>> > to
>> >
>> > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>> >         (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "test.c":29:36 75
>> > {*movsi_internal}
>> >      (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
>> >         (nil)))
>> >
>> > since (reg:SI 22 xmm2) and (reg:SI r9) are in the same value chain in
>> > which the oldest regno is k0.
>> >
>> > but with xmm2 defined as
>> >
>> > kmovw %k0, %edi  # 69 [c=4 l=4] *movhi_internal/6----- kmovw move the
>> > lower 16bits to %edi, and clear the upper 16 bits.
>> > vmovd %edi, %xmm2 # 489 *movsi_internal  --- vmovd move 32bits from
>> > %edi to %xmm2.
>> >
>> > (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
>> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
>> > {*movhi_internal}
>> >      (nil))
>> >
>> > (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
>> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>> >      (nil))
>>
>> The sequence is OK in itself, but insn 489 can't make any assumptions
>> about what's in the upper 16 bits of %edi.  In other words, as far as
>> RTL semantics are concerned, insn 489 only leaves bits 0-15 of %xmm2
>> with defined values; the other bits are undefined.
>>
>> If the target wants all 32 bits of %edi to be carried over to insn 489
>> then it needs to make insn 69 an SImode set instead of a HImode set.
>>
>
> actually only the lower 16bits are needed, the original insn is like
>
> .294.r.ira
> (insn 69 68 70 13 (set (reg:HI 96 [ _52 ])
>         (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76
> {*movhi_internal}
>      (nil))
> (insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ])
>         (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _52
> ]) 0)))) 1412 {*vec_dupv4hi}
>      (nil))
>
> .295r.reload
> (insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96])
>         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
> {*movhi_internal}
>      (nil))
> (insn 489 75 78 13 (set (reg:SI 22 xmm2 [297])
>         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>      (nil))
> (insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
>         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297]))))
> 1412 {*vec_dupv4hi}
>      (nil))
>
> and insn 489 is created by lra/reload which seems ok for the sequence,
> but problemistic with considering the logic of hardreg_cprop.

It looks OK even with the regcprop behaviour though:

- insn 69 defines only the low 16 bits of di,
- insn 489 defines only the low 16 bits of xmm2, but copies bits 16-31
  too (with unknown contents)
- insn 78 uses only the low 16 bits of xmm2 (the unknown contents
  introduced by insn 489 are truncated away)

So where do bits 16-31 become significant?  What goes wrong if they're
not zero?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-19 12:38         ` Richard Sandiford
@ 2021-01-19 14:45           ` Jakub Jelinek
  2021-01-19 16:10             ` Richard Sandiford
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Jelinek @ 2021-01-19 14:45 UTC (permalink / raw)
  To: Hongtao Liu, Hongtao Liu via Gcc-patches, ebotcazou, steven,
	richard.sandiford

On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-patches wrote:
> > actually only the lower 16bits are needed, the original insn is like
> >
> > .294.r.ira
> > (insn 69 68 70 13 (set (reg:HI 96 [ _52 ])
> >         (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76
> > {*movhi_internal}
> >      (nil))
> > (insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ])
> >         (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _52
> > ]) 0)))) 1412 {*vec_dupv4hi}
> >      (nil))
> >
> > .295r.reload
> > (insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96])
> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
> > {*movhi_internal}
> >      (nil))
> > (insn 489 75 78 13 (set (reg:SI 22 xmm2 [297])
> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
> >      (nil))
> > (insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
> >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297]))))
> > 1412 {*vec_dupv4hi}
> >      (nil))
> >
> > and insn 489 is created by lra/reload which seems ok for the sequence,
> > but problemistic with considering the logic of hardreg_cprop.
> 
> It looks OK even with the regcprop behaviour though:
> 
> - insn 69 defines only the low 16 bits of di,
> - insn 489 defines only the low 16 bits of xmm2, but copies bits 16-31
>   too (with unknown contents)
> - insn 78 uses only the low 16 bits of xmm2 (the unknown contents
>   introduced by insn 489 are truncated away)
> 
> So where do bits 16-31 become significant?  What goes wrong if they're
> not zero?

The k0 register is initialized I believe with
(insn 20 2 21 2 (set (reg:DI 68 k0 [orig:82 var_6.0_1 ] [82])
        (mem/c:DI (symbol_ref:DI ("var_6") [flags 0x40]  <var_decl 0x7f7babeaaf30 var_6>) [3 var_6+0 S8 A64])) "pr98694.C":21:10 74 {*movdi_internal}
     (nil))
and so it contains all 64-bits, and then the code sometimes uses all the
bits, sometimes just the low 16-bits and sometimes low 32-bits of that
value.
(insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
        (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":27:23 76 {*movhi_internal}
     (nil))
(insn 74 73 75 12 (set (reg:SI 36 r8 [orig:149 _52 ] [149])
        (zero_extend:SI (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82]))) 144 {*zero_extendhisi2}
     (nil))
(insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
        (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
     (nil))
(insn 78 489 490 12 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
        (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) 1412 {*vec_dupv4hi}
     (expr_list:REG_DEAD (reg:SI 22 xmm2 [297])
        (nil)))
are examples when it uses only the low 16 bits from that, and
(insn 487 72 73 12 (set (reg:SI 1 dx [148])
        (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
     (nil))

(insn 85 84 491 13 (set (reg:SI 37 r9 [orig:86 _11 ] [86])
        (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":28:14 75 {*movsi_internal}
     (nil))

(insn 491 85 88 13 (set (reg:SI 3 bx [299])
        (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
     (nil))
(insn 88 491 89 13 (set (reg:CCNO 17 flags)
        (compare:CCNO (reg:SI 3 bx [299])
            (const_int 0 [0]))) 7 {*cmpsi_ccno_1}
     (expr_list:REG_DEAD (reg:SI 3 bx [299])
        (nil)))

(insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
        (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
        (nil)))
are examples where it uses low 32-bits from k0.
So the
 (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
-        (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
-     (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
+        (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
+     (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
         (nil)))
cprop_hardreg change indeed looks bogus, while xmm2 has SImode, it holds
only the low 16-bits of the value and has the upper bits undefined, while r9
it is replacing had all of the low 32-bits well defined.

	Jakub


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-19 14:45           ` Jakub Jelinek
@ 2021-01-19 16:10             ` Richard Sandiford
  2021-01-20  4:35               ` Hongtao Liu
  2021-05-05 17:44               ` [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342] Jakub Jelinek
  0 siblings, 2 replies; 20+ messages in thread
From: Richard Sandiford @ 2021-01-19 16:10 UTC (permalink / raw)
  To: Jakub Jelinek via Gcc-patches
  Cc: Hongtao Liu, ebotcazou, steven, Jakub Jelinek

Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-patches wrote:
>> > actually only the lower 16bits are needed, the original insn is like
>> >
>> > .294.r.ira
>> > (insn 69 68 70 13 (set (reg:HI 96 [ _52 ])
>> >         (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76
>> > {*movhi_internal}
>> >      (nil))
>> > (insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ])
>> >         (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _52
>> > ]) 0)))) 1412 {*vec_dupv4hi}
>> >      (nil))
>> >
>> > .295r.reload
>> > (insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96])
>> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
>> > {*movhi_internal}
>> >      (nil))
>> > (insn 489 75 78 13 (set (reg:SI 22 xmm2 [297])
>> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>> >      (nil))
>> > (insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
>> >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297]))))
>> > 1412 {*vec_dupv4hi}
>> >      (nil))
>> >
>> > and insn 489 is created by lra/reload which seems ok for the sequence,
>> > but problemistic with considering the logic of hardreg_cprop.
>> 
>> It looks OK even with the regcprop behaviour though:
>> 
>> - insn 69 defines only the low 16 bits of di,
>> - insn 489 defines only the low 16 bits of xmm2, but copies bits 16-31
>>   too (with unknown contents)
>> - insn 78 uses only the low 16 bits of xmm2 (the unknown contents
>>   introduced by insn 489 are truncated away)
>> 
>> So where do bits 16-31 become significant?  What goes wrong if they're
>> not zero?
>
> The k0 register is initialized I believe with
> (insn 20 2 21 2 (set (reg:DI 68 k0 [orig:82 var_6.0_1 ] [82])
>         (mem/c:DI (symbol_ref:DI ("var_6") [flags 0x40]  <var_decl 0x7f7babeaaf30 var_6>) [3 var_6+0 S8 A64])) "pr98694.C":21:10 74 {*movdi_internal}
>      (nil))
> and so it contains all 64-bits, and then the code sometimes uses all the
> bits, sometimes just the low 16-bits and sometimes low 32-bits of that
> value.
> (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
>         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":27:23 76 {*movhi_internal}
>      (nil))
> (insn 74 73 75 12 (set (reg:SI 36 r8 [orig:149 _52 ] [149])
>         (zero_extend:SI (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82]))) 144 {*zero_extendhisi2}
>      (nil))
> (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
>         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>      (nil))
> (insn 78 489 490 12 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
>         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) 1412 {*vec_dupv4hi}
>      (expr_list:REG_DEAD (reg:SI 22 xmm2 [297])
>         (nil)))
> are examples when it uses only the low 16 bits from that, and
> (insn 487 72 73 12 (set (reg:SI 1 dx [148])
>         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
>      (nil))
>
> (insn 85 84 491 13 (set (reg:SI 37 r9 [orig:86 _11 ] [86])
>         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":28:14 75 {*movsi_internal}
>      (nil))
>
> (insn 491 85 88 13 (set (reg:SI 3 bx [299])
>         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
>      (nil))
> (insn 88 491 89 13 (set (reg:CCNO 17 flags)
>         (compare:CCNO (reg:SI 3 bx [299])
>             (const_int 0 [0]))) 7 {*cmpsi_ccno_1}
>      (expr_list:REG_DEAD (reg:SI 3 bx [299])
>         (nil)))
>
> (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>         (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
>         (nil)))
> are examples where it uses low 32-bits from k0.
> So the
>  (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
> -        (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
> -     (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
> +        (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
> +     (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
>          (nil)))
> cprop_hardreg change indeed looks bogus, while xmm2 has SImode, it holds
> only the low 16-bits of the value and has the upper bits undefined, while r9
> it is replacing had all of the low 32-bits well defined.

Ah, ok, thanks for the extra context.

So AIUI the problem when recording xmm2<-di isn't just:

 [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))

but also that:

 [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)

For example, all registers in this sequence can be part of the same chain:

    (set (reg:HI R1) (reg:HI R0))
    (set (reg:SI R2) (reg:SI R1)) // [A]
    (set (reg:DI R3) (reg:DI R2)) // [A]
    (set (reg:SI R4) (reg:SI R[0-3]))
    (set (reg:HI R5) (reg:HI R[0-4]))

But:

    (set (reg:SI R1) (reg:SI R0))
    (set (reg:HI R2) (reg:HI R1))
    (set (reg:SI R3) (reg:SI R2)) // [A] && [B]

is problematic because it dips below the precision of the oldest regno
and then increases again.

When this happens, I guess we have two choices:

(1) what the patch does: treat R3 as the start of a new chain.
(2) pretend that the copy occured in vd->e[sr].mode instead
    (i.e. copy vd->e[sr].mode to vd->e[dr].mode)

I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
Maybe the optimisation provided by (2) compared to (1) isn't common
enough to be worth the complication.

I think we should test [B] as well as [A] though.  The pass is set
up to do some quite elaborate mode changes and I think rejecting
[A] on its own would make some of the other code redundant.
It also feels like it should be a seperate “if” or “else if”,
with its own comment.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-19 16:10             ` Richard Sandiford
@ 2021-01-20  4:35               ` Hongtao Liu
  2021-01-20  4:40                 ` Hongtao Liu
                                   ` (2 more replies)
  2021-05-05 17:44               ` [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342] Jakub Jelinek
  1 sibling, 3 replies; 20+ messages in thread
From: Hongtao Liu @ 2021-01-20  4:35 UTC (permalink / raw)
  To: Jakub Jelinek via Gcc-patches, Hongtao Liu, ebotcazou, steven,
	Jakub Jelinek, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 6360 bytes --]

On Wed, Jan 20, 2021 at 12:10 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-patches wrote:
> >> > actually only the lower 16bits are needed, the original insn is like
> >> >
> >> > .294.r.ira
> >> > (insn 69 68 70 13 (set (reg:HI 96 [ _52 ])
> >> >         (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76
> >> > {*movhi_internal}
> >> >      (nil))
> >> > (insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ])
> >> >         (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _52
> >> > ]) 0)))) 1412 {*vec_dupv4hi}
> >> >      (nil))
> >> >
> >> > .295r.reload
> >> > (insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96])
> >> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
> >> > {*movhi_internal}
> >> >      (nil))
> >> > (insn 489 75 78 13 (set (reg:SI 22 xmm2 [297])
> >> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
> >> >      (nil))
> >> > (insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
> >> >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297]))))
> >> > 1412 {*vec_dupv4hi}
> >> >      (nil))
> >> >
> >> > and insn 489 is created by lra/reload which seems ok for the sequence,
> >> > but problemistic with considering the logic of hardreg_cprop.
> >>
> >> It looks OK even with the regcprop behaviour though:
> >>
> >> - insn 69 defines only the low 16 bits of di,
> >> - insn 489 defines only the low 16 bits of xmm2, but copies bits 16-31
> >>   too (with unknown contents)
> >> - insn 78 uses only the low 16 bits of xmm2 (the unknown contents
> >>   introduced by insn 489 are truncated away)
> >>
> >> So where do bits 16-31 become significant?  What goes wrong if they're
> >> not zero?
> >
> > The k0 register is initialized I believe with
> > (insn 20 2 21 2 (set (reg:DI 68 k0 [orig:82 var_6.0_1 ] [82])
> >         (mem/c:DI (symbol_ref:DI ("var_6") [flags 0x40]  <var_decl 0x7f7babeaaf30 var_6>) [3 var_6+0 S8 A64])) "pr98694.C":21:10 74 {*movdi_internal}
> >      (nil))
> > and so it contains all 64-bits, and then the code sometimes uses all the
> > bits, sometimes just the low 16-bits and sometimes low 32-bits of that
> > value.
> > (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":27:23 76 {*movhi_internal}
> >      (nil))
> > (insn 74 73 75 12 (set (reg:SI 36 r8 [orig:149 _52 ] [149])
> >         (zero_extend:SI (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82]))) 144 {*zero_extendhisi2}
> >      (nil))
> > (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
> >      (nil))
> > (insn 78 489 490 12 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
> >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) 1412 {*vec_dupv4hi}
> >      (expr_list:REG_DEAD (reg:SI 22 xmm2 [297])
> >         (nil)))
> > are examples when it uses only the low 16 bits from that, and
> > (insn 487 72 73 12 (set (reg:SI 1 dx [148])
> >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
> >      (nil))
> >
> > (insn 85 84 491 13 (set (reg:SI 37 r9 [orig:86 _11 ] [86])
> >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":28:14 75 {*movsi_internal}
> >      (nil))
> >
> > (insn 491 85 88 13 (set (reg:SI 3 bx [299])
> >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
> >      (nil))
> > (insn 88 491 89 13 (set (reg:CCNO 17 flags)
> >         (compare:CCNO (reg:SI 3 bx [299])
> >             (const_int 0 [0]))) 7 {*cmpsi_ccno_1}
> >      (expr_list:REG_DEAD (reg:SI 3 bx [299])
> >         (nil)))
> >
> > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
> >         (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
> >      (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
> >         (nil)))
> > are examples where it uses low 32-bits from k0.
> > So the
> >  (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
> > -        (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
> > -     (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
> > +        (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
> > +     (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
> >          (nil)))
> > cprop_hardreg change indeed looks bogus, while xmm2 has SImode, it holds
> > only the low 16-bits of the value and has the upper bits undefined, while r9
> > it is replacing had all of the low 32-bits well defined.
>
> Ah, ok, thanks for the extra context.
>
> So AIUI the problem when recording xmm2<-di isn't just:
>
>  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>
> but also that:
>
>  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
>
> For example, all registers in this sequence can be part of the same chain:
>
>     (set (reg:HI R1) (reg:HI R0))
>     (set (reg:SI R2) (reg:SI R1)) // [A]
>     (set (reg:DI R3) (reg:DI R2)) // [A]
>     (set (reg:SI R4) (reg:SI R[0-3]))
>     (set (reg:HI R5) (reg:HI R[0-4]))
>
> But:
>
>     (set (reg:SI R1) (reg:SI R0))
>     (set (reg:HI R2) (reg:HI R1))
>     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
>
> is problematic because it dips below the precision of the oldest regno
> and then increases again.
>
> When this happens, I guess we have two choices:
>
> (1) what the patch does: treat R3 as the start of a new chain.
> (2) pretend that the copy occured in vd->e[sr].mode instead
>     (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
>
> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
> Maybe the optimisation provided by (2) compared to (1) isn't common
> enough to be worth the complication.
>
> I think we should test [B] as well as [A] though.  The pass is set
> up to do some quite elaborate mode changes and I think rejecting
> [A] on its own would make some of the other code redundant.
> It also feels like it should be a seperate “if” or “else if”,
> with its own comment.
>
Update patch.
> Thanks,
> Richard



-- 
BR,
Hongtao

[-- Attachment #2: 0001-PR-rtl-optimization-98694-Fix-incorrect-optimization_V2.patch --]
[-- Type: text/x-patch, Size: 4109 bytes --]

From a52b3c8a90a0bf6cbda8ce86d99c82c6182863a7 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Mon, 18 Jan 2021 16:55:32 +0800
Subject: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by
 cprop_hardreg.

If SRC had been assigned a mode narrower than the copy, we can't link
DEST into the chain even they have same
hard_regno_nregs(i.e. HImode/SImode in i386 backend).

i.e
        kmovw   %k0, %edi
        vmovd   %edi, %xmm2
	vpshuflw        $0, %xmm2, %xmm0
        kmovw   %k0, %r8d
        kmovd   %k0, %r9d
...
-	 movl %r9d, %r11d
+	 vmovd %xmm2, %r11d

gcc/ChangeLog:

	PR rtl-optimization/98694
	* regcprop.c (copy_value): If SRC had been assigned a mode
	narrower than the copy, we can't link DEST into the chain even
	they have same hard_regno_nregs(i.e. HImode/SImode in i386
	backend).

gcc/testsuite/ChangeLog:

	PR rtl-optimization/98694
	* gcc.target/i386/pr98694.c: New test.
---
 gcc/regcprop.c                          | 33 +++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr98694.c | 38 +++++++++++++++++++++++++
 2 files changed, 71 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr98694.c

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index dd62cb36013..908298beaea 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -358,6 +358,39 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
   else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
     return;
 
+  /* If SRC had been assigned a mode narrower than the copy, Although
+     they have same hard_regno_nregs, it's not safe to link DEST into the
+     chain. .i.e.
+     (set (reg:DI r1) (reg:DI r0))
+     (set (reg:HI r2) (reg:HI r1))
+     (set (reg:SI r3) (reg:SI r2)) //Should be a new chain start at r3
+     (set (reg:SI r4) (reg:SI r1))
+     (set (reg:SI r5) (reg:SI r4))
+     the upper part of r3 is undefined, if adding it to the chain, it may be
+     prop to r5 which has defined upper bits, .i.e. pr98694.
+
+     [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
+     [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
+     Condition B is added to to catch optimization opportunities of
+
+     (set (reg:HI R1) (reg:HI R0))
+     (set (reg:SI R2) (reg:SI R1)) // [A]
+     (set (reg:DI R3) (reg:DI R2)) // [A]
+     (set (reg:SI R4) (reg:SI R[0-3]))
+     (set (reg:HI R5) (reg:HI R[0-4]))
+
+     but problematic for
+
+     (set (reg:SI R1) (reg:SI R0))
+     (set (reg:HI R2) (reg:HI R1))
+     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
+
+     to be fixed????   */
+  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
+	   && partial_subreg_p (vd->e[sr].mode,
+				vd->e[vd->e[sr].oldest_regno].mode))
+    return;
+
   /* Link DR at the end of the value chain used by SR.  */
 
   vd->e[dr].oldest_regno = vd->e[sr].oldest_regno;
diff --git a/gcc/testsuite/gcc.target/i386/pr98694.c b/gcc/testsuite/gcc.target/i386/pr98694.c
new file mode 100644
index 00000000000..611f9e77627
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr98694.c
@@ -0,0 +1,38 @@
+/* PR rtl-optimization/98694 */
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include<immintrin.h>
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef int v2si __attribute__ ((vector_size (8)));
+v4hi b;
+
+__attribute__ ((noipa))
+v2si
+foo (__m512i src1, __m512i src2)
+{
+  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
+  short s = (short) m;
+  int i = (int)m;
+  b = __extension__ (v4hi) {s, s, s, s};
+  return __extension__ (v2si) {i, i};
+}
+
+int main ()
+{
+  __m512i src1 = _mm512_setzero_si512 ();
+  __m512i src2 = _mm512_set_epi8 (0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1);
+  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
+  v2si a = foo (src1, src2);
+  if (a[0] != (int)m)
+    __builtin_abort ();
+  return 0;
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-20  4:35               ` Hongtao Liu
@ 2021-01-20  4:40                 ` Hongtao Liu
  2021-01-20 12:56                 ` H.J. Lu
  2021-01-20 14:14                 ` Richard Sandiford
  2 siblings, 0 replies; 20+ messages in thread
From: Hongtao Liu @ 2021-01-20  4:40 UTC (permalink / raw)
  To: Jakub Jelinek via Gcc-patches, Hongtao Liu, ebotcazou, steven,
	Jakub Jelinek, Richard Sandiford

On Wed, Jan 20, 2021 at 12:35 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Jan 20, 2021 at 12:10 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-patches wrote:
> > >> > actually only the lower 16bits are needed, the original insn is like
> > >> >
> > >> > .294.r.ira
> > >> > (insn 69 68 70 13 (set (reg:HI 96 [ _52 ])
> > >> >         (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76
> > >> > {*movhi_internal}
> > >> >      (nil))
> > >> > (insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ])
> > >> >         (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _52
> > >> > ]) 0)))) 1412 {*vec_dupv4hi}
> > >> >      (nil))
> > >> >
> > >> > .295r.reload
> > >> > (insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96])
> > >> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
> > >> > {*movhi_internal}
> > >> >      (nil))
> > >> > (insn 489 75 78 13 (set (reg:SI 22 xmm2 [297])
> > >> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
> > >> >      (nil))
> > >> > (insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
> > >> >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297]))))
> > >> > 1412 {*vec_dupv4hi}
> > >> >      (nil))
> > >> >
> > >> > and insn 489 is created by lra/reload which seems ok for the sequence,
> > >> > but problemistic with considering the logic of hardreg_cprop.
> > >>
> > >> It looks OK even with the regcprop behaviour though:
> > >>
> > >> - insn 69 defines only the low 16 bits of di,
> > >> - insn 489 defines only the low 16 bits of xmm2, but copies bits 16-31
> > >>   too (with unknown contents)
> > >> - insn 78 uses only the low 16 bits of xmm2 (the unknown contents
> > >>   introduced by insn 489 are truncated away)
> > >>
> > >> So where do bits 16-31 become significant?  What goes wrong if they're
> > >> not zero?
> > >
> > > The k0 register is initialized I believe with
> > > (insn 20 2 21 2 (set (reg:DI 68 k0 [orig:82 var_6.0_1 ] [82])
> > >         (mem/c:DI (symbol_ref:DI ("var_6") [flags 0x40]  <var_decl 0x7f7babeaaf30 var_6>) [3 var_6+0 S8 A64])) "pr98694.C":21:10 74 {*movdi_internal}
> > >      (nil))
> > > and so it contains all 64-bits, and then the code sometimes uses all the
> > > bits, sometimes just the low 16-bits and sometimes low 32-bits of that
> > > value.
> > > (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
> > >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":27:23 76 {*movhi_internal}
> > >      (nil))
> > > (insn 74 73 75 12 (set (reg:SI 36 r8 [orig:149 _52 ] [149])
> > >         (zero_extend:SI (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82]))) 144 {*zero_extendhisi2}
> > >      (nil))
> > > (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
> > >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
> > >      (nil))
> > > (insn 78 489 490 12 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
> > >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) 1412 {*vec_dupv4hi}
> > >      (expr_list:REG_DEAD (reg:SI 22 xmm2 [297])
> > >         (nil)))
> > > are examples when it uses only the low 16 bits from that, and
> > > (insn 487 72 73 12 (set (reg:SI 1 dx [148])
> > >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
> > >      (nil))
> > >
> > > (insn 85 84 491 13 (set (reg:SI 37 r9 [orig:86 _11 ] [86])
> > >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":28:14 75 {*movsi_internal}
> > >      (nil))
> > >
> > > (insn 491 85 88 13 (set (reg:SI 3 bx [299])
> > >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
> > >      (nil))
> > > (insn 88 491 89 13 (set (reg:CCNO 17 flags)
> > >         (compare:CCNO (reg:SI 3 bx [299])
> > >             (const_int 0 [0]))) 7 {*cmpsi_ccno_1}
> > >      (expr_list:REG_DEAD (reg:SI 3 bx [299])
> > >         (nil)))
> > >
> > > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
> > >         (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
> > >      (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
> > >         (nil)))
> > > are examples where it uses low 32-bits from k0.
> > > So the
> > >  (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
> > > -        (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
> > > -     (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
> > > +        (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
> > > +     (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
> > >          (nil)))
> > > cprop_hardreg change indeed looks bogus, while xmm2 has SImode, it holds
> > > only the low 16-bits of the value and has the upper bits undefined, while r9
> > > it is replacing had all of the low 32-bits well defined.
> >
> > Ah, ok, thanks for the extra context.
> >
> > So AIUI the problem when recording xmm2<-di isn't just:
> >
> >  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
> >
> > but also that:
> >
> >  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
> >
> > For example, all registers in this sequence can be part of the same chain:
> >
> >     (set (reg:HI R1) (reg:HI R0))
> >     (set (reg:SI R2) (reg:SI R1)) // [A]
> >     (set (reg:DI R3) (reg:DI R2)) // [A]
> >     (set (reg:SI R4) (reg:SI R[0-3]))
> >     (set (reg:HI R5) (reg:HI R[0-4]))
> >
> > But:
> >
> >     (set (reg:SI R1) (reg:SI R0))
> >     (set (reg:HI R2) (reg:HI R1))
> >     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
> >
> > is problematic because it dips below the precision of the oldest regno
> > and then increases again.
> >
> > When this happens, I guess we have two choices:
> >
> > (1) what the patch does: treat R3 as the start of a new chain.
> > (2) pretend that the copy occured in vd->e[sr].mode instead
> >     (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
> >
> > I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
> > Maybe the optimisation provided by (2) compared to (1) isn't common
> > enough to be worth the complication.
> >
> > I think we should test [B] as well as [A] though.  The pass is set
> > up to do some quite elaborate mode changes and I think rejecting
> > [A] on its own would make some of the other code redundant.
> > It also feels like it should be a seperate “if” or “else if”,
> > with its own comment.
> >
> Update patch.

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.

> > Thanks,
> > Richard
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-20  4:35               ` Hongtao Liu
  2021-01-20  4:40                 ` Hongtao Liu
@ 2021-01-20 12:56                 ` H.J. Lu
  2021-01-20 14:14                 ` Richard Sandiford
  2 siblings, 0 replies; 20+ messages in thread
From: H.J. Lu @ 2021-01-20 12:56 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Jakub Jelinek via Gcc-patches, Eric Botcazou, Steven Bosscher,
	Jakub Jelinek, Richard Sandiford

On Tue, Jan 19, 2021 at 8:32 PM Hongtao Liu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Wed, Jan 20, 2021 at 12:10 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-patches wrote:
> > >> > actually only the lower 16bits are needed, the original insn is like
> > >> >
> > >> > .294.r.ira
> > >> > (insn 69 68 70 13 (set (reg:HI 96 [ _52 ])
> > >> >         (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76
> > >> > {*movhi_internal}
> > >> >      (nil))
> > >> > (insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ])
> > >> >         (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _52
> > >> > ]) 0)))) 1412 {*vec_dupv4hi}
> > >> >      (nil))
> > >> >
> > >> > .295r.reload
> > >> > (insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96])
> > >> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
> > >> > {*movhi_internal}
> > >> >      (nil))
> > >> > (insn 489 75 78 13 (set (reg:SI 22 xmm2 [297])
> > >> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
> > >> >      (nil))
> > >> > (insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
> > >> >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297]))))
> > >> > 1412 {*vec_dupv4hi}
> > >> >      (nil))
> > >> >
> > >> > and insn 489 is created by lra/reload which seems ok for the sequence,
> > >> > but problemistic with considering the logic of hardreg_cprop.
> > >>
> > >> It looks OK even with the regcprop behaviour though:
> > >>
> > >> - insn 69 defines only the low 16 bits of di,
> > >> - insn 489 defines only the low 16 bits of xmm2, but copies bits 16-31
> > >>   too (with unknown contents)
> > >> - insn 78 uses only the low 16 bits of xmm2 (the unknown contents
> > >>   introduced by insn 489 are truncated away)
> > >>
> > >> So where do bits 16-31 become significant?  What goes wrong if they're
> > >> not zero?
> > >
> > > The k0 register is initialized I believe with
> > > (insn 20 2 21 2 (set (reg:DI 68 k0 [orig:82 var_6.0_1 ] [82])
> > >         (mem/c:DI (symbol_ref:DI ("var_6") [flags 0x40]  <var_decl 0x7f7babeaaf30 var_6>) [3 var_6+0 S8 A64])) "pr98694.C":21:10 74 {*movdi_internal}
> > >      (nil))
> > > and so it contains all 64-bits, and then the code sometimes uses all the
> > > bits, sometimes just the low 16-bits and sometimes low 32-bits of that
> > > value.
> > > (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
> > >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":27:23 76 {*movhi_internal}
> > >      (nil))
> > > (insn 74 73 75 12 (set (reg:SI 36 r8 [orig:149 _52 ] [149])
> > >         (zero_extend:SI (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82]))) 144 {*zero_extendhisi2}
> > >      (nil))
> > > (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
> > >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
> > >      (nil))
> > > (insn 78 489 490 12 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
> > >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) 1412 {*vec_dupv4hi}
> > >      (expr_list:REG_DEAD (reg:SI 22 xmm2 [297])
> > >         (nil)))
> > > are examples when it uses only the low 16 bits from that, and
> > > (insn 487 72 73 12 (set (reg:SI 1 dx [148])
> > >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
> > >      (nil))
> > >
> > > (insn 85 84 491 13 (set (reg:SI 37 r9 [orig:86 _11 ] [86])
> > >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":28:14 75 {*movsi_internal}
> > >      (nil))
> > >
> > > (insn 491 85 88 13 (set (reg:SI 3 bx [299])
> > >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
> > >      (nil))
> > > (insn 88 491 89 13 (set (reg:CCNO 17 flags)
> > >         (compare:CCNO (reg:SI 3 bx [299])
> > >             (const_int 0 [0]))) 7 {*cmpsi_ccno_1}
> > >      (expr_list:REG_DEAD (reg:SI 3 bx [299])
> > >         (nil)))
> > >
> > > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
> > >         (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
> > >      (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
> > >         (nil)))
> > > are examples where it uses low 32-bits from k0.
> > > So the
> > >  (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
> > > -        (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
> > > -     (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
> > > +        (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
> > > +     (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
> > >          (nil)))
> > > cprop_hardreg change indeed looks bogus, while xmm2 has SImode, it holds
> > > only the low 16-bits of the value and has the upper bits undefined, while r9
> > > it is replacing had all of the low 32-bits well defined.
> >
> > Ah, ok, thanks for the extra context.
> >
> > So AIUI the problem when recording xmm2<-di isn't just:
> >
> >  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
> >
> > but also that:
> >
> >  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
> >
> > For example, all registers in this sequence can be part of the same chain:
> >
> >     (set (reg:HI R1) (reg:HI R0))
> >     (set (reg:SI R2) (reg:SI R1)) // [A]
> >     (set (reg:DI R3) (reg:DI R2)) // [A]
> >     (set (reg:SI R4) (reg:SI R[0-3]))
> >     (set (reg:HI R5) (reg:HI R[0-4]))
> >
> > But:
> >
> >     (set (reg:SI R1) (reg:SI R0))
> >     (set (reg:HI R2) (reg:HI R1))
> >     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
> >
> > is problematic because it dips below the precision of the oldest regno
> > and then increases again.
> >
> > When this happens, I guess we have two choices:
> >
> > (1) what the patch does: treat R3 as the start of a new chain.
> > (2) pretend that the copy occured in vd->e[sr].mode instead
> >     (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
> >
> > I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
> > Maybe the optimisation provided by (2) compared to (1) isn't common
> > enough to be worth the complication.
> >
> > I think we should test [B] as well as [A] though.  The pass is set
> > up to do some quite elaborate mode changes and I think rejecting
> > [A] on its own would make some of the other code redundant.
> > It also feels like it should be a seperate “if” or “else if”,
> > with its own comment.
> >
> Update patch.
> > Thanks,
> > Richard

+int main ()
+{

Please add __builtin_cpu_supports ("avx512bw") check.

+  __m512i src1 = _mm512_setzero_si512 ();
+  __m512i src2 = _mm512_set_epi8 (0, 1, 0, 1, 0, 1, 0, 1,
+   0, 1, 0, 1, 0, 1, 0, 1,
+   0, 1, 0, 1, 0, 1, 0, 1,
+   0, 1, 0, 1, 0, 1, 0, 1,
+   0, 1, 0, 1, 0, 1, 0, 1,
+   0, 1, 0, 1, 0, 1, 0, 1,
+   0, 1, 0, 1, 0, 1, 0, 1,
+   0, 1, 0, 1, 0, 1, 0, 1);
+  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
+  v2si a = foo (src1, src2);
+  if (a[0] != (int)m)
+    __builtin_abort ();
+  return 0;
+}

-- 
H.J.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-20  4:35               ` Hongtao Liu
  2021-01-20  4:40                 ` Hongtao Liu
  2021-01-20 12:56                 ` H.J. Lu
@ 2021-01-20 14:14                 ` Richard Sandiford
  2021-01-21  5:25                   ` Hongtao Liu
  2 siblings, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2021-01-20 14:14 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Jakub Jelinek via Gcc-patches, ebotcazou, steven, Jakub Jelinek

Hongtao Liu <crazylht@gmail.com> writes:
> On Wed, Jan 20, 2021 at 12:10 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-patches wrote:
>> >> > actually only the lower 16bits are needed, the original insn is like
>> >> >
>> >> > .294.r.ira
>> >> > (insn 69 68 70 13 (set (reg:HI 96 [ _52 ])
>> >> >         (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76
>> >> > {*movhi_internal}
>> >> >      (nil))
>> >> > (insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ])
>> >> >         (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _52
>> >> > ]) 0)))) 1412 {*vec_dupv4hi}
>> >> >      (nil))
>> >> >
>> >> > .295r.reload
>> >> > (insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96])
>> >> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
>> >> > {*movhi_internal}
>> >> >      (nil))
>> >> > (insn 489 75 78 13 (set (reg:SI 22 xmm2 [297])
>> >> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>> >> >      (nil))
>> >> > (insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
>> >> >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297]))))
>> >> > 1412 {*vec_dupv4hi}
>> >> >      (nil))
>> >> >
>> >> > and insn 489 is created by lra/reload which seems ok for the sequence,
>> >> > but problemistic with considering the logic of hardreg_cprop.
>> >>
>> >> It looks OK even with the regcprop behaviour though:
>> >>
>> >> - insn 69 defines only the low 16 bits of di,
>> >> - insn 489 defines only the low 16 bits of xmm2, but copies bits 16-31
>> >>   too (with unknown contents)
>> >> - insn 78 uses only the low 16 bits of xmm2 (the unknown contents
>> >>   introduced by insn 489 are truncated away)
>> >>
>> >> So where do bits 16-31 become significant?  What goes wrong if they're
>> >> not zero?
>> >
>> > The k0 register is initialized I believe with
>> > (insn 20 2 21 2 (set (reg:DI 68 k0 [orig:82 var_6.0_1 ] [82])
>> >         (mem/c:DI (symbol_ref:DI ("var_6") [flags 0x40]  <var_decl 0x7f7babeaaf30 var_6>) [3 var_6+0 S8 A64])) "pr98694.C":21:10 74 {*movdi_internal}
>> >      (nil))
>> > and so it contains all 64-bits, and then the code sometimes uses all the
>> > bits, sometimes just the low 16-bits and sometimes low 32-bits of that
>> > value.
>> > (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
>> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":27:23 76 {*movhi_internal}
>> >      (nil))
>> > (insn 74 73 75 12 (set (reg:SI 36 r8 [orig:149 _52 ] [149])
>> >         (zero_extend:SI (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82]))) 144 {*zero_extendhisi2}
>> >      (nil))
>> > (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
>> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>> >      (nil))
>> > (insn 78 489 490 12 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
>> >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) 1412 {*vec_dupv4hi}
>> >      (expr_list:REG_DEAD (reg:SI 22 xmm2 [297])
>> >         (nil)))
>> > are examples when it uses only the low 16 bits from that, and
>> > (insn 487 72 73 12 (set (reg:SI 1 dx [148])
>> >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
>> >      (nil))
>> >
>> > (insn 85 84 491 13 (set (reg:SI 37 r9 [orig:86 _11 ] [86])
>> >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":28:14 75 {*movsi_internal}
>> >      (nil))
>> >
>> > (insn 491 85 88 13 (set (reg:SI 3 bx [299])
>> >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
>> >      (nil))
>> > (insn 88 491 89 13 (set (reg:CCNO 17 flags)
>> >         (compare:CCNO (reg:SI 3 bx [299])
>> >             (const_int 0 [0]))) 7 {*cmpsi_ccno_1}
>> >      (expr_list:REG_DEAD (reg:SI 3 bx [299])
>> >         (nil)))
>> >
>> > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>> >         (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
>> >      (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
>> >         (nil)))
>> > are examples where it uses low 32-bits from k0.
>> > So the
>> >  (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>> > -        (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
>> > -     (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
>> > +        (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
>> > +     (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
>> >          (nil)))
>> > cprop_hardreg change indeed looks bogus, while xmm2 has SImode, it holds
>> > only the low 16-bits of the value and has the upper bits undefined, while r9
>> > it is replacing had all of the low 32-bits well defined.
>>
>> Ah, ok, thanks for the extra context.
>>
>> So AIUI the problem when recording xmm2<-di isn't just:
>>
>>  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>>
>> but also that:
>>
>>  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
>>
>> For example, all registers in this sequence can be part of the same chain:
>>
>>     (set (reg:HI R1) (reg:HI R0))
>>     (set (reg:SI R2) (reg:SI R1)) // [A]
>>     (set (reg:DI R3) (reg:DI R2)) // [A]
>>     (set (reg:SI R4) (reg:SI R[0-3]))
>>     (set (reg:HI R5) (reg:HI R[0-4]))
>>
>> But:
>>
>>     (set (reg:SI R1) (reg:SI R0))
>>     (set (reg:HI R2) (reg:HI R1))
>>     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
>>
>> is problematic because it dips below the precision of the oldest regno
>> and then increases again.
>>
>> When this happens, I guess we have two choices:
>>
>> (1) what the patch does: treat R3 as the start of a new chain.
>> (2) pretend that the copy occured in vd->e[sr].mode instead
>>     (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
>>
>> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
>> Maybe the optimisation provided by (2) compared to (1) isn't common
>> enough to be worth the complication.
>>
>> I think we should test [B] as well as [A] though.  The pass is set
>> up to do some quite elaborate mode changes and I think rejecting
>> [A] on its own would make some of the other code redundant.
>> It also feels like it should be a seperate “if” or “else if”,
>> with its own comment.
>>
> Update patch.
>> Thanks,
>> Richard
>
>
>
> -- 
> BR,
> Hongtao
>
> From a52b3c8a90a0bf6cbda8ce86d99c82c6182863a7 Mon Sep 17 00:00:00 2001
> From: liuhongt <hongtao.liu@intel.com>
> Date: Mon, 18 Jan 2021 16:55:32 +0800
> Subject: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by
>  cprop_hardreg.
>
> If SRC had been assigned a mode narrower than the copy, we can't link
> DEST into the chain even they have same
> hard_regno_nregs(i.e. HImode/SImode in i386 backend).

This is a bit out of date now.  Maybe just say “can't always link”
instead of just “can't link”.

> i.e
>         kmovw   %k0, %edi
>         vmovd   %edi, %xmm2
> 	vpshuflw        $0, %xmm2, %xmm0
>         kmovw   %k0, %r8d
>         kmovd   %k0, %r9d
> ...
> -	 movl %r9d, %r11d
> +	 vmovd %xmm2, %r11d
>
> gcc/ChangeLog:
>
> 	PR rtl-optimization/98694
> 	* regcprop.c (copy_value): If SRC had been assigned a mode
> 	narrower than the copy, we can't link DEST into the chain even
> 	they have same hard_regno_nregs(i.e. HImode/SImode in i386
> 	backend).
>
> gcc/testsuite/ChangeLog:
>
> 	PR rtl-optimization/98694
> 	* gcc.target/i386/pr98694.c: New test.
> ---
>  gcc/regcprop.c                          | 33 +++++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr98694.c | 38 +++++++++++++++++++++++++
>  2 files changed, 71 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr98694.c
>
> diff --git a/gcc/regcprop.c b/gcc/regcprop.c
> index dd62cb36013..908298beaea 100644
> --- a/gcc/regcprop.c
> +++ b/gcc/regcprop.c
> @@ -358,6 +358,39 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
>    else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
>      return;
>  
> +  /* If SRC had been assigned a mode narrower than the copy, Although
> +     they have same hard_regno_nregs, it's not safe to link DEST into the
> +     chain. .i.e.

How about:

  It is not safe to link DEST into the chain if SRC was defined in some
  narrower mode M and if M is also narrower than the mode of the first
  register in the chain.  For example:

> +     (set (reg:DI r1) (reg:DI r0))
> +     (set (reg:HI r2) (reg:HI r1))
> +     (set (reg:SI r3) (reg:SI r2)) //Should be a new chain start at r3
> +     (set (reg:SI r4) (reg:SI r1))
> +     (set (reg:SI r5) (reg:SI r4))
> +     the upper part of r3 is undefined, if adding it to the chain, it may be
> +     prop to r5 which has defined upper bits, .i.e. pr98694.

And for this:

     (set (reg:DI r1) (reg:DI r0))
     (set (reg:HI r2) (reg:HI r1))
     (set (reg:SI r3) (reg:SI r2)) // Should be a new chain starting at r3
     (set (reg:SI r4) (reg:SI r1))
     (set (reg:SI r5) (reg:SI r4))

  the upper part of r3 is undefined.  If we added it to the chain,
  it may be used to replace r5, which has defined upper bits.
  See PR98694 for details.

> +
> +     [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
> +     [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
> +     Condition B is added to to catch optimization opportunities of
> +
> +     (set (reg:HI R1) (reg:HI R0))
> +     (set (reg:SI R2) (reg:SI R1)) // [A]
> +     (set (reg:DI R3) (reg:DI R2)) // [A]
> +     (set (reg:SI R4) (reg:SI R[0-3]))
> +     (set (reg:HI R5) (reg:HI R[0-4]))
> +

Maybe add here:

  in which all registers have only 16 defined bits.

> +     but problematic for
> +
> +     (set (reg:SI R1) (reg:SI R0))
> +     (set (reg:HI R2) (reg:HI R1))
> +     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
> +
> +     to be fixed????   */

I think we should drop this part.  Your example above covers it in
more detail.

OK with those changes and the one that HJ asked for.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
  2021-01-20 14:14                 ` Richard Sandiford
@ 2021-01-21  5:25                   ` Hongtao Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Hongtao Liu @ 2021-01-21  5:25 UTC (permalink / raw)
  To: Hongtao Liu, Jakub Jelinek via Gcc-patches, ebotcazou, steven,
	Jakub Jelinek, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 156 bytes --]

> OK with those changes and the one that HJ asked for.
>

This is the patch I'm checking in, thanks for the review.

> Thanks,
> Richard



-- 
BR,
Hongtao

[-- Attachment #2: 0001-PR-rtl-optimization-98694-Fix-incorrect-optimization_V3.patch --]
[-- Type: text/x-patch, Size: 4114 bytes --]

From b806e0cece62620e31a4a861208d37616059a212 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Mon, 18 Jan 2021 16:55:32 +0800
Subject: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by
 cprop_hardreg.

If SRC had been assigned a mode narrower than the copy, we can't
always link DEST into the chain even they have same
hard_regno_nregs(i.e. HImode/SImode in i386 backend).

i.e
        kmovw   %k0, %edi
        vmovd   %edi, %xmm2
	vpshuflw        $0, %xmm2, %xmm0
        kmovw   %k0, %r8d
        kmovd   %k0, %r9d
...
-	 movl %r9d, %r11d
+	 vmovd %xmm2, %r11d

gcc/ChangeLog:

	PR rtl-optimization/98694
	* regcprop.c (copy_value): If SRC had been assigned a mode
	narrower than the copy, we can't link DEST into the chain even
	they have same hard_regno_nregs(i.e. HImode/SImode in i386
	backend).

gcc/testsuite/ChangeLog:

	PR rtl-optimization/98694
	* gcc.target/i386/pr98694.c: New test.
---
 gcc/regcprop.c                          | 29 +++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr98694.c | 41 +++++++++++++++++++++++++
 2 files changed, 70 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr98694.c

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index dd62cb36013..e1342f56bd1 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -358,6 +358,35 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
   else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
     return;
 
+  /* It is not safe to link DEST into the chain if SRC was defined in some
+     narrower mode M and if M is also narrower than the mode of the first
+     register in the chain.  For example:
+     (set (reg:DI r1) (reg:DI r0))
+     (set (reg:HI r2) (reg:HI r1))
+     (set (reg:SI r3) (reg:SI r2)) //Should be a new chain start at r3
+     (set (reg:SI r4) (reg:SI r1))
+     (set (reg:SI r5) (reg:SI r4))
+
+     the upper part of r3 is undefined.  If we added it to the chain,
+     it may be used to replace r5, which has defined upper bits.
+     See PR98694 for details.
+
+     [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
+     [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
+     Condition B is added to to catch optimization opportunities of
+
+     (set (reg:HI R1) (reg:HI R0))
+     (set (reg:SI R2) (reg:SI R1)) // [A]
+     (set (reg:DI R3) (reg:DI R2)) // [A]
+     (set (reg:SI R4) (reg:SI R[0-3]))
+     (set (reg:HI R5) (reg:HI R[0-4]))
+
+     in which all registers have only 16 defined bits.  */
+  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
+	   && partial_subreg_p (vd->e[sr].mode,
+				vd->e[vd->e[sr].oldest_regno].mode))
+    return;
+
   /* Link DR at the end of the value chain used by SR.  */
 
   vd->e[dr].oldest_regno = vd->e[sr].oldest_regno;
diff --git a/gcc/testsuite/gcc.target/i386/pr98694.c b/gcc/testsuite/gcc.target/i386/pr98694.c
new file mode 100644
index 00000000000..45889d482c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr98694.c
@@ -0,0 +1,41 @@
+/* PR rtl-optimization/98694 */
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include<immintrin.h>
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef int v2si __attribute__ ((vector_size (8)));
+v4hi b;
+
+__attribute__ ((noipa))
+v2si
+foo (__m512i src1, __m512i src2)
+{
+  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
+  short s = (short) m;
+  int i = (int)m;
+  b = __extension__ (v4hi) {s, s, s, s};
+  return __extension__ (v2si) {i, i};
+}
+
+int main ()
+{
+  if (!__builtin_cpu_supports ("avx512bw"))
+    return 0;
+
+  __m512i src1 = _mm512_setzero_si512 ();
+  __m512i src2 = _mm512_set_epi8 (0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1,
+				  0, 1, 0, 1, 0, 1, 0, 1);
+  __mmask64 m = _mm512_cmpeq_epu8_mask (src1, src2);
+  v2si a = foo (src1, src2);
+  if (a[0] != (int)m)
+    __builtin_abort ();
+  return 0;
+}
-- 
2.18.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342]
  2021-01-19 16:10             ` Richard Sandiford
  2021-01-20  4:35               ` Hongtao Liu
@ 2021-05-05 17:44               ` Jakub Jelinek
  2021-05-06  8:50                 ` Jakub Jelinek
  2021-05-11 10:59                 ` Richard Sandiford
  1 sibling, 2 replies; 20+ messages in thread
From: Jakub Jelinek @ 2021-05-05 17:44 UTC (permalink / raw)
  To: Richard Sandiford, Hongtao Liu, Eric Botcazou, Jeff Law; +Cc: gcc-patches

On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches wrote:
> Ah, ok, thanks for the extra context.
> 
> So AIUI the problem when recording xmm2<-di isn't just:
> 
>  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
> 
> but also that:
> 
>  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
> 
> For example, all registers in this sequence can be part of the same chain:
> 
>     (set (reg:HI R1) (reg:HI R0))
>     (set (reg:SI R2) (reg:SI R1)) // [A]
>     (set (reg:DI R3) (reg:DI R2)) // [A]
>     (set (reg:SI R4) (reg:SI R[0-3]))
>     (set (reg:HI R5) (reg:HI R[0-4]))
> 
> But:
> 
>     (set (reg:SI R1) (reg:SI R0))
>     (set (reg:HI R2) (reg:HI R1))
>     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
> 
> is problematic because it dips below the precision of the oldest regno
> and then increases again.
> 
> When this happens, I guess we have two choices:
> 
> (1) what the patch does: treat R3 as the start of a new chain.
> (2) pretend that the copy occured in vd->e[sr].mode instead
>     (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
> 
> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
> Maybe the optimisation provided by (2) compared to (1) isn't common
> enough to be worth the complication.
> 
> I think we should test [B] as well as [A] though.  The pass is set
> up to do some quite elaborate mode changes and I think rejecting
> [A] on its own would make some of the other code redundant.
> It also feels like it should be a seperate “if” or “else if”,
> with its own comment.

Unfortunately, we now have a testcase that shows that testing also [B]
is a problem (unfortunately now latent on the trunk, only reproduces
on 10 and 11 branches).

The comment in the patch tries to list just the interesting instructions,
we have a 64-bit value, copy low 8 bit of those to another register,
copy full 64 bits to another register and then clobber the original register.
Before that (set (reg:DI r14) (const_int ...)) we have a chain
DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain, so
we have QI si, DI bp , si being the oldest_regno.
Next DI si is copied into DI dx.  Only the low 8 bits of that are defined,
the rest is unspecified, but we would add DI dx into that same chain at the
end, so QI si, DI bp, DI dx [*].  Next si is overwritten, so the chain is
DI bp, DI dx.  And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
as redundant, because we think bp and dx are already equivalent, when in
reality that is true only for the lowpart 8 bits.
I believe the [*] marked step above is where the bug is.

The committed regcprop.c (copy_value) change (but only committed to
trunk/11, not to 10) added
  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
           && partial_subreg_p (vd->e[sr].mode,
                                vd->e[vd->e[sr].oldest_regno].mode))
    return;
and while the first partial_subreg_p call returns true, the second one
doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
true and we'd return, but as that reg got clobbered, si became the oldest
regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
But as the testcase shows, what is the oldest_regno in the chain is
something that changes over time, so relying on it for anything is
problematic, something could have a different oldest_regno and later
on get a different oldest_regno (perhaps with different mode) because
the oldest_regno got overwritten and it can change both ways.

I wrote the following patch (originally against 10 branch because that is
where Uros has been debugging it) and bootstrapped/regtested it on 11
branch successfully.
It effectively implements your (2) above; I'm not sure if
REG_CAN_CHANGE_MODE_P is needed there, because it is already tested in
find_oldest_value_reg -> maybe_mode_change -> mode_change_ok.

So perhaps just the vd->e[dr].mode in there could change to
GET_MODE (src) and drop the previous PR98694 change?
If yes, what to do with the previously added comment?

2021-05-05  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/100342
	* regcprop.c (copy_value): When copying a source reg in a wider
	mode than it has recorded for the value, adjust recorded destination
	mode too.

	* gcc.target/i386/pr100342.c: New test.

--- gcc/regcprop.c.jj	2020-04-30 17:41:37.624675304 +0200
+++ gcc/regcprop.c	2021-05-05 16:24:01.667308941 +0200
@@ -358,6 +358,22 @@ copy_value (rtx dest, rtx src, struct va
   else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
     return;
 
+  /* If a narrower value is copied using wider mode, the upper bits
+     are undefined (could be e.g. a former paradoxical subreg).  Signal
+     in that case we've only copied value using the narrower mode.
+     Consider:
+     (set (reg:DI r14) (mem:DI ...))
+     (set (reg:QI si) (reg:QI r14))
+     (set (reg:DI bp) (reg:DI r14))
+     (set (reg:DI r14) (const_int ...))
+     (set (reg:DI dx) (reg:DI si))
+     (set (reg:DI si) (const_int ...))
+     (set (reg:DI dx) (reg:DI bp))
+     The last set is not redundant, while the low 8 bits of dx are already
+     equal to low 8 bits of bp, the other bits are undefined.  */
+  if (partial_subreg_p (vd->e[sr].mode, vd->e[dr].mode))
+    set_value_regno (dr, vd->e[sr].mode, vd);
+
   /* Link DR at the end of the value chain used by SR.  */
 
   vd->e[dr].oldest_regno = vd->e[sr].oldest_regno;
--- gcc/testsuite/gcc.target/i386/pr100342.c.jj	2021-05-05 17:01:29.139356719 +0200
+++ gcc/testsuite/gcc.target/i386/pr100342.c	2021-05-05 17:01:14.287521150 +0200
@@ -0,0 +1,70 @@
+/* PR rtl-optimization/100342 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2 -fno-dse -fno-forward-propagate -Wno-psabi -mno-sse2" } */
+
+#define SHL(x, y) ((x) << ((y) & (sizeof(x) * 8 - 1)))
+#define SHR(x, y) ((x) >> ((y) & (sizeof(x) * 8 - 1)))
+#define ROR(x, y) (SHR(x, y)) | (SHL(x, (sizeof(x) * 8 - (y))))
+#define SHLV(x, y) ((x) << ((y) & (sizeof((x)[0]) * 8 - 1)))
+#define SHLSV(x, y) ((x) << ((y) & (sizeof((y)[0]) * 8 - 1)))
+typedef unsigned char A;
+typedef unsigned char __attribute__((__vector_size__ (8))) B;
+typedef unsigned char __attribute__((__vector_size__ (16))) C;
+typedef unsigned char __attribute__((__vector_size__ (32))) D;
+typedef unsigned char __attribute__((__vector_size__ (64))) E;
+typedef unsigned short F;
+typedef unsigned short __attribute__((__vector_size__ (16))) G;
+typedef unsigned int H;
+typedef unsigned int __attribute__((__vector_size__ (32))) I;
+typedef unsigned long long J;
+typedef unsigned long long __attribute__((__vector_size__ (8))) K;
+typedef unsigned long long __attribute__((__vector_size__ (32))) L;
+typedef unsigned long long __attribute__((__vector_size__ (64))) M;
+typedef unsigned __int128 N;
+typedef unsigned __int128 __attribute__((__vector_size__ (16))) O;
+typedef unsigned __int128 __attribute__((__vector_size__ (32))) P;
+typedef unsigned __int128 __attribute__((__vector_size__ (64))) Q;
+B v1;
+D v2;
+L v3;
+K v4;
+I v5;
+O v6;
+
+B
+foo (A a, C b, E c, F d, G e, H f, J g, M h, N i, P j, Q k)
+{
+  b &= (A) f;
+  k += a;
+  G l = e;
+  D m = v2 >= (A) (J) v1;
+  J r = a + g;
+  L n = v3 <= f;
+  k -= i / f;
+  l -= (A) g;
+  c |= (A) d;
+  b -= (A) i;
+  J o = ROR (__builtin_clz (r), a);
+  K p = v4 | f, q = v4 <= f;
+  P s = SHLV (SHLSV (__builtin_bswap64 (i), (P) (0 < j)) <= 0, j);
+  n += a <= r;
+  M t = (M) (a / SHLV (c, 0)) != __builtin_bswap64 (i);
+  I u = f - v5;
+  E v = (E) h + (E) t + (E) k;
+  D w = (union { D b[2]; }) { }.b[0] + ((union { E b; }) v).b[1] + m + (D) u + (D) n + (D) s;
+  C x = ((union { D b; }) w).b[1] + b + (C) l + (C) v6;
+  B y = ((union { C a; B b; }) x).b + ((union { C a; B b[2]; }) x).b[1] + (B) p + (B) q;
+  J z = i + o;
+  F z2 = z;
+  A z3 = z2;
+  return y + z3;
+}
+
+int
+main ()
+{
+  B x = foo (0, (C) { }, (E) { }, 10, (G) { }, 4, 2, (M) { }, 123842323652213865LL, (P) { 1 }, (Q) { });
+  if ((J) x != 0x2e2c2e2c2e2c2e30ULL)
+    __builtin_abort();
+  return 0;
+}

	Jakub


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342]
  2021-05-05 17:44               ` [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342] Jakub Jelinek
@ 2021-05-06  8:50                 ` Jakub Jelinek
  2021-05-11 10:59                 ` Richard Sandiford
  1 sibling, 0 replies; 20+ messages in thread
From: Jakub Jelinek @ 2021-05-06  8:50 UTC (permalink / raw)
  To: Richard Sandiford, Hongtao Liu, Eric Botcazou, Jeff Law, gcc-patches

On Wed, May 05, 2021 at 07:44:46PM +0200, Jakub Jelinek via Gcc-patches wrote:
> So perhaps just the vd->e[dr].mode in there could change to
> GET_MODE (src) and drop the previous PR98694 change?

I've bootstrapped/regtested that successfully on the trunk
(on {x86_64,i686}-linux), though haven't attempted to merge the two comments:

2021-05-06  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/100342
	* regcprop.c (copy_value): When copying a source reg in a wider
	mode than it has recorded for the value, adjust recorded destination
	mode too.

	* gcc.target/i386/pr100342.c: New test.

--- gcc/regcprop.c.jj	2020-04-30 17:41:37.624675304 +0200
+++ gcc/regcprop.c	2021-05-05 16:24:01.667308941 +0200
@@ -382,10 +382,22 @@ copy_value (rtx dest, rtx src, struct va
      (set (reg:HI R5) (reg:HI R[0-4]))
 
      in which all registers have only 16 defined bits.  */
-  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
-	   && partial_subreg_p (vd->e[sr].mode,
-				vd->e[vd->e[sr].oldest_regno].mode))
-    return;
+
+  /* If a narrower value is copied using wider mode, the upper bits
+     are undefined (could be e.g. a former paradoxical subreg).  Signal
+     in that case we've only copied value using the narrower mode.
+     Consider:
+     (set (reg:DI r14) (mem:DI ...))
+     (set (reg:QI si) (reg:QI r14))
+     (set (reg:DI bp) (reg:DI r14))
+     (set (reg:DI r14) (const_int ...))
+     (set (reg:DI dx) (reg:DI si))
+     (set (reg:DI si) (const_int ...))
+     (set (reg:DI dx) (reg:DI bp))
+     The last set is not redundant, while the low 8 bits of dx are already
+     equal to low 8 bits of bp, the other bits are undefined.  */
+  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
+    set_value_regno (dr, vd->e[sr].mode, vd);
 
   /* Link DR at the end of the value chain used by SR.  */
 
--- gcc/testsuite/gcc.target/i386/pr100342.c.jj	2021-05-05 17:01:29.139356719 +0200
+++ gcc/testsuite/gcc.target/i386/pr100342.c	2021-05-05 17:01:14.287521150 +0200
@@ -0,0 +1,70 @@
+/* PR rtl-optimization/100342 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2 -fno-dse -fno-forward-propagate -Wno-psabi -mno-sse2" } */
+
+#define SHL(x, y) ((x) << ((y) & (sizeof(x) * 8 - 1)))
+#define SHR(x, y) ((x) >> ((y) & (sizeof(x) * 8 - 1)))
+#define ROR(x, y) (SHR(x, y)) | (SHL(x, (sizeof(x) * 8 - (y))))
+#define SHLV(x, y) ((x) << ((y) & (sizeof((x)[0]) * 8 - 1)))
+#define SHLSV(x, y) ((x) << ((y) & (sizeof((y)[0]) * 8 - 1)))
+typedef unsigned char A;
+typedef unsigned char __attribute__((__vector_size__ (8))) B;
+typedef unsigned char __attribute__((__vector_size__ (16))) C;
+typedef unsigned char __attribute__((__vector_size__ (32))) D;
+typedef unsigned char __attribute__((__vector_size__ (64))) E;
+typedef unsigned short F;
+typedef unsigned short __attribute__((__vector_size__ (16))) G;
+typedef unsigned int H;
+typedef unsigned int __attribute__((__vector_size__ (32))) I;
+typedef unsigned long long J;
+typedef unsigned long long __attribute__((__vector_size__ (8))) K;
+typedef unsigned long long __attribute__((__vector_size__ (32))) L;
+typedef unsigned long long __attribute__((__vector_size__ (64))) M;
+typedef unsigned __int128 N;
+typedef unsigned __int128 __attribute__((__vector_size__ (16))) O;
+typedef unsigned __int128 __attribute__((__vector_size__ (32))) P;
+typedef unsigned __int128 __attribute__((__vector_size__ (64))) Q;
+B v1;
+D v2;
+L v3;
+K v4;
+I v5;
+O v6;
+
+B
+foo (A a, C b, E c, F d, G e, H f, J g, M h, N i, P j, Q k)
+{
+  b &= (A) f;
+  k += a;
+  G l = e;
+  D m = v2 >= (A) (J) v1;
+  J r = a + g;
+  L n = v3 <= f;
+  k -= i / f;
+  l -= (A) g;
+  c |= (A) d;
+  b -= (A) i;
+  J o = ROR (__builtin_clz (r), a);
+  K p = v4 | f, q = v4 <= f;
+  P s = SHLV (SHLSV (__builtin_bswap64 (i), (P) (0 < j)) <= 0, j);
+  n += a <= r;
+  M t = (M) (a / SHLV (c, 0)) != __builtin_bswap64 (i);
+  I u = f - v5;
+  E v = (E) h + (E) t + (E) k;
+  D w = (union { D b[2]; }) { }.b[0] + ((union { E b; }) v).b[1] + m + (D) u + (D) n + (D) s;
+  C x = ((union { D b; }) w).b[1] + b + (C) l + (C) v6;
+  B y = ((union { C a; B b; }) x).b + ((union { C a; B b[2]; }) x).b[1] + (B) p + (B) q;
+  J z = i + o;
+  F z2 = z;
+  A z3 = z2;
+  return y + z3;
+}
+
+int
+main ()
+{
+  B x = foo (0, (C) { }, (E) { }, 10, (G) { }, 4, 2, (M) { }, 123842323652213865LL, (P) { 1 }, (Q) { });
+  if ((J) x != 0x2e2c2e2c2e2c2e30ULL)
+    __builtin_abort();
+  return 0;
+}


	Jakub


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342]
  2021-05-05 17:44               ` [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342] Jakub Jelinek
  2021-05-06  8:50                 ` Jakub Jelinek
@ 2021-05-11 10:59                 ` Richard Sandiford
  2021-05-13 15:37                   ` Jakub Jelinek
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2021-05-11 10:59 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Hongtao Liu, Eric Botcazou, Jeff Law, gcc-patches

Jakub Jelinek <jakub@redhat.com> writes:
> On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches wrote:
>> Ah, ok, thanks for the extra context.
>> 
>> So AIUI the problem when recording xmm2<-di isn't just:
>> 
>>  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>> 
>> but also that:
>> 
>>  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
>> 
>> For example, all registers in this sequence can be part of the same chain:
>> 
>>     (set (reg:HI R1) (reg:HI R0))
>>     (set (reg:SI R2) (reg:SI R1)) // [A]
>>     (set (reg:DI R3) (reg:DI R2)) // [A]
>>     (set (reg:SI R4) (reg:SI R[0-3]))
>>     (set (reg:HI R5) (reg:HI R[0-4]))
>> 
>> But:
>> 
>>     (set (reg:SI R1) (reg:SI R0))
>>     (set (reg:HI R2) (reg:HI R1))
>>     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
>> 
>> is problematic because it dips below the precision of the oldest regno
>> and then increases again.
>> 
>> When this happens, I guess we have two choices:
>> 
>> (1) what the patch does: treat R3 as the start of a new chain.
>> (2) pretend that the copy occured in vd->e[sr].mode instead
>>     (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
>> 
>> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
>> Maybe the optimisation provided by (2) compared to (1) isn't common
>> enough to be worth the complication.
>> 
>> I think we should test [B] as well as [A] though.  The pass is set
>> up to do some quite elaborate mode changes and I think rejecting
>> [A] on its own would make some of the other code redundant.
>> It also feels like it should be a seperate “if” or “else if”,
>> with its own comment.
>
> Unfortunately, we now have a testcase that shows that testing also [B]
> is a problem (unfortunately now latent on the trunk, only reproduces
> on 10 and 11 branches).

This whole area feels way more complicated than it ought to be :-/

> The comment in the patch tries to list just the interesting instructions,
> we have a 64-bit value, copy low 8 bit of those to another register,
> copy full 64 bits to another register and then clobber the original register.
> Before that (set (reg:DI r14) (const_int ...)) we have a chain
> DI r14, QI si, DI bp , that instruction drops the DI r14 from that chain, so
> we have QI si, DI bp , si being the oldest_regno.
> Next DI si is copied into DI dx.  Only the low 8 bits of that are defined,
> the rest is unspecified, but we would add DI dx into that same chain at the
> end, so QI si, DI bp, DI dx [*].  Next si is overwritten, so the chain is
> DI bp, DI dx.  And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
> as redundant, because we think bp and dx are already equivalent, when in
> reality that is true only for the lowpart 8 bits.
> I believe the [*] marked step above is where the bug is.
>
> The committed regcprop.c (copy_value) change (but only committed to
> trunk/11, not to 10) added
>   else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>            && partial_subreg_p (vd->e[sr].mode,
>                                 vd->e[vd->e[sr].oldest_regno].mode))
>     return;
> and while the first partial_subreg_p call returns true, the second one
> doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
> true and we'd return, but as that reg got clobbered, si became the oldest
> regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
> and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
> But as the testcase shows, what is the oldest_regno in the chain is
> something that changes over time, so relying on it for anything is
> problematic, something could have a different oldest_regno and later
> on get a different oldest_regno (perhaps with different mode) because
> the oldest_regno got overwritten and it can change both ways.
>
> I wrote the following patch (originally against 10 branch because that is
> where Uros has been debugging it) and bootstrapped/regtested it on 11
> branch successfully.
> It effectively implements your (2) above; I'm not sure if
> REG_CAN_CHANGE_MODE_P is needed there, because it is already tested in
> find_oldest_value_reg -> maybe_mode_change -> mode_change_ok.

The REG_CAN_CHANGE_MODE_P test would in this case be for
vd->e[dr].mode → vd->e[sr].mode, rather than oldest_regno's mode.
I'm just worried that:

   (set (reg:HI R1) (reg:HI R0))
   (set (reg:SI R2) (reg:SI R1))

isn't equivalent to:

   (set (reg:HI R1) (reg:HI R0))
   (set (reg:HI R2) (reg:HI R1))

if REG_CAN_CHANGE_MODE_P is false for either the R2 or R1 change.
If we pretend that it is when building the chain then there's a
risk of GIGO when using it in find_oldest_value_reg.

(Although in this case SI and HI are both valid for R1,
REG_CAN_CHANGE_MODE_P might still be false if the HI bits are
not in the low 16 bits of the SI.  That's unlikely in this case,
but a similar thing can happen for vector modes or multi-register modes.)

I'm not saying the patch is wrong.  I just wanted to clarify
why I thought the check might be needed.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342]
  2021-05-11 10:59                 ` Richard Sandiford
@ 2021-05-13 15:37                   ` Jakub Jelinek
  2021-05-13 17:01                     ` Jakub Jelinek
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Jelinek @ 2021-05-13 15:37 UTC (permalink / raw)
  To: Hongtao Liu, Eric Botcazou, Jeff Law, gcc-patches, richard.sandiford

On Tue, May 11, 2021 at 11:59:24AM +0100, Richard Sandiford via Gcc-patches wrote:
> > I wrote the following patch (originally against 10 branch because that is
> > where Uros has been debugging it) and bootstrapped/regtested it on 11
> > branch successfully.
> > It effectively implements your (2) above; I'm not sure if
> > REG_CAN_CHANGE_MODE_P is needed there, because it is already tested in
> > find_oldest_value_reg -> maybe_mode_change -> mode_change_ok.
> 
> The REG_CAN_CHANGE_MODE_P test would in this case be for
> vd->e[dr].mode → vd->e[sr].mode, rather than oldest_regno's mode.
> I'm just worried that:
> 
>    (set (reg:HI R1) (reg:HI R0))
>    (set (reg:SI R2) (reg:SI R1))
> 
> isn't equivalent to:
> 
>    (set (reg:HI R1) (reg:HI R0))
>    (set (reg:HI R2) (reg:HI R1))
> 
> if REG_CAN_CHANGE_MODE_P is false for either the R2 or R1 change.
> If we pretend that it is when building the chain then there's a
> risk of GIGO when using it in find_oldest_value_reg.
> 
> (Although in this case SI and HI are both valid for R1,
> REG_CAN_CHANGE_MODE_P might still be false if the HI bits are
> not in the low 16 bits of the SI.  That's unlikely in this case,
> but a similar thing can happen for vector modes or multi-register modes.)
> 
> I'm not saying the patch is wrong.  I just wanted to clarify
> why I thought the check might be needed.

So, do you want something like (I've deleted the old comment as I think
the new one is enough, but am open to keep both) the patch below, where
it REG_CAN_CHANGE_MODE_P is false, we punt (return), otherwise call
set_value_regno?
Am not sure if those REG_CAN_CHANGE_MODE_P arguments is what you want
though.

--- gcc/regcprop.c.jj	2021-03-23 10:21:07.176447920 +0100
+++ gcc/regcprop.c	2021-05-13 17:31:39.940519855 +0200
@@ -358,34 +358,25 @@ copy_value (rtx dest, rtx src, struct va
   else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
     return;
 
-  /* It is not safe to link DEST into the chain if SRC was defined in some
-     narrower mode M and if M is also narrower than the mode of the first
-     register in the chain.  For example:
-     (set (reg:DI r1) (reg:DI r0))
-     (set (reg:HI r2) (reg:HI r1))
-     (set (reg:SI r3) (reg:SI r2)) //Should be a new chain start at r3
-     (set (reg:SI r4) (reg:SI r1))
-     (set (reg:SI r5) (reg:SI r4))
-
-     the upper part of r3 is undefined.  If we added it to the chain,
-     it may be used to replace r5, which has defined upper bits.
-     See PR98694 for details.
-
-     [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
-     [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
-     Condition B is added to to catch optimization opportunities of
-
-     (set (reg:HI R1) (reg:HI R0))
-     (set (reg:SI R2) (reg:SI R1)) // [A]
-     (set (reg:DI R3) (reg:DI R2)) // [A]
-     (set (reg:SI R4) (reg:SI R[0-3]))
-     (set (reg:HI R5) (reg:HI R[0-4]))
-
-     in which all registers have only 16 defined bits.  */
-  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
-	   && partial_subreg_p (vd->e[sr].mode,
-				vd->e[vd->e[sr].oldest_regno].mode))
-    return;
+  /* If a narrower value is copied using wider mode, the upper bits
+     are undefined (could be e.g. a former paradoxical subreg).  Signal
+     in that case we've only copied value using the narrower mode.
+     Consider:
+     (set (reg:DI r14) (mem:DI ...))
+     (set (reg:QI si) (reg:QI r14))
+     (set (reg:DI bp) (reg:DI r14))
+     (set (reg:DI r14) (const_int ...))
+     (set (reg:DI dx) (reg:DI si))
+     (set (reg:DI si) (const_int ...))
+     (set (reg:DI dx) (reg:DI bp))
+     The last set is not redundant, while the low 8 bits of dx are already
+     equal to low 8 bits of bp, the other bits are undefined.  */
+  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
+    {
+      if (REG_CAN_CHANGE_MODE_P (sr, GET_MODE (src), vd->e[sr].mode))
+	return;
+      set_value_regno (dr, vd->e[sr].mode, vd);
+    }
 
   /* Link DR at the end of the value chain used by SR.  */
 


	Jakub


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342]
  2021-05-13 15:37                   ` Jakub Jelinek
@ 2021-05-13 17:01                     ` Jakub Jelinek
  2021-05-14  9:09                       ` Richard Sandiford
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Jelinek @ 2021-05-13 17:01 UTC (permalink / raw)
  To: Hongtao Liu, Eric Botcazou, Jeff Law, gcc-patches, richard.sandiford

On Thu, May 13, 2021 at 05:37:36PM +0200, Jakub Jelinek wrote:
> So, do you want something like (I've deleted the old comment as I think
> the new one is enough, but am open to keep both) the patch below, where
> it REG_CAN_CHANGE_MODE_P is false, we punt (return), otherwise call
> set_value_regno?
> Am not sure if those REG_CAN_CHANGE_MODE_P arguments is what you want
> though.

Oops, missing !, meant following which works on 11 branch for the testcase:

2021-05-13  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/100342
	* regcprop.c (copy_value): When copying a source reg in a wider
	mode than it has recorded for the value, adjust recorded destination
	mode too or punt if !REG_CAN_CHANGE_MODE_P.

	* gcc.target/i386/pr100342.c: New test.

--- gcc/regcprop.c.jj	2021-03-23 10:21:07.176447920 +0100
+++ gcc/regcprop.c	2021-05-13 17:36:46.443192451 +0200
@@ -358,34 +358,25 @@ copy_value (rtx dest, rtx src, struct va
   else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
     return;
 
-  /* It is not safe to link DEST into the chain if SRC was defined in some
-     narrower mode M and if M is also narrower than the mode of the first
-     register in the chain.  For example:
-     (set (reg:DI r1) (reg:DI r0))
-     (set (reg:HI r2) (reg:HI r1))
-     (set (reg:SI r3) (reg:SI r2)) //Should be a new chain start at r3
-     (set (reg:SI r4) (reg:SI r1))
-     (set (reg:SI r5) (reg:SI r4))
-
-     the upper part of r3 is undefined.  If we added it to the chain,
-     it may be used to replace r5, which has defined upper bits.
-     See PR98694 for details.
-
-     [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
-     [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
-     Condition B is added to to catch optimization opportunities of
-
-     (set (reg:HI R1) (reg:HI R0))
-     (set (reg:SI R2) (reg:SI R1)) // [A]
-     (set (reg:DI R3) (reg:DI R2)) // [A]
-     (set (reg:SI R4) (reg:SI R[0-3]))
-     (set (reg:HI R5) (reg:HI R[0-4]))
-
-     in which all registers have only 16 defined bits.  */
-  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
-	   && partial_subreg_p (vd->e[sr].mode,
-				vd->e[vd->e[sr].oldest_regno].mode))
-    return;
+  /* If a narrower value is copied using wider mode, the upper bits
+     are undefined (could be e.g. a former paradoxical subreg).  Signal
+     in that case we've only copied value using the narrower mode.
+     Consider:
+     (set (reg:DI r14) (mem:DI ...))
+     (set (reg:QI si) (reg:QI r14))
+     (set (reg:DI bp) (reg:DI r14))
+     (set (reg:DI r14) (const_int ...))
+     (set (reg:DI dx) (reg:DI si))
+     (set (reg:DI si) (const_int ...))
+     (set (reg:DI dx) (reg:DI bp))
+     The last set is not redundant, while the low 8 bits of dx are already
+     equal to low 8 bits of bp, the other bits are undefined.  */
+  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
+    {
+      if (!REG_CAN_CHANGE_MODE_P (sr, GET_MODE (src), vd->e[sr].mode))
+	return;
+      set_value_regno (dr, vd->e[sr].mode, vd);
+    }
 
   /* Link DR at the end of the value chain used by SR.  */
 
--- gcc/testsuite/gcc.target/i386/pr100342.c.jj	2021-05-13 17:28:41.181460465 +0200
+++ gcc/testsuite/gcc.target/i386/pr100342.c	2021-05-13 17:28:41.181460465 +0200
@@ -0,0 +1,70 @@
+/* PR rtl-optimization/100342 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2 -fno-dse -fno-forward-propagate -Wno-psabi -mno-sse2" } */
+
+#define SHL(x, y) ((x) << ((y) & (sizeof(x) * 8 - 1)))
+#define SHR(x, y) ((x) >> ((y) & (sizeof(x) * 8 - 1)))
+#define ROR(x, y) (SHR(x, y)) | (SHL(x, (sizeof(x) * 8 - (y))))
+#define SHLV(x, y) ((x) << ((y) & (sizeof((x)[0]) * 8 - 1)))
+#define SHLSV(x, y) ((x) << ((y) & (sizeof((y)[0]) * 8 - 1)))
+typedef unsigned char A;
+typedef unsigned char __attribute__((__vector_size__ (8))) B;
+typedef unsigned char __attribute__((__vector_size__ (16))) C;
+typedef unsigned char __attribute__((__vector_size__ (32))) D;
+typedef unsigned char __attribute__((__vector_size__ (64))) E;
+typedef unsigned short F;
+typedef unsigned short __attribute__((__vector_size__ (16))) G;
+typedef unsigned int H;
+typedef unsigned int __attribute__((__vector_size__ (32))) I;
+typedef unsigned long long J;
+typedef unsigned long long __attribute__((__vector_size__ (8))) K;
+typedef unsigned long long __attribute__((__vector_size__ (32))) L;
+typedef unsigned long long __attribute__((__vector_size__ (64))) M;
+typedef unsigned __int128 N;
+typedef unsigned __int128 __attribute__((__vector_size__ (16))) O;
+typedef unsigned __int128 __attribute__((__vector_size__ (32))) P;
+typedef unsigned __int128 __attribute__((__vector_size__ (64))) Q;
+B v1;
+D v2;
+L v3;
+K v4;
+I v5;
+O v6;
+
+B
+foo (A a, C b, E c, F d, G e, H f, J g, M h, N i, P j, Q k)
+{
+  b &= (A) f;
+  k += a;
+  G l = e;
+  D m = v2 >= (A) (J) v1;
+  J r = a + g;
+  L n = v3 <= f;
+  k -= i / f;
+  l -= (A) g;
+  c |= (A) d;
+  b -= (A) i;
+  J o = ROR (__builtin_clz (r), a);
+  K p = v4 | f, q = v4 <= f;
+  P s = SHLV (SHLSV (__builtin_bswap64 (i), (P) (0 < j)) <= 0, j);
+  n += a <= r;
+  M t = (M) (a / SHLV (c, 0)) != __builtin_bswap64 (i);
+  I u = f - v5;
+  E v = (E) h + (E) t + (E) k;
+  D w = (union { D b[2]; }) { }.b[0] + ((union { E b; }) v).b[1] + m + (D) u + (D) n + (D) s;
+  C x = ((union { D b; }) w).b[1] + b + (C) l + (C) v6;
+  B y = ((union { C a; B b; }) x).b + ((union { C a; B b[2]; }) x).b[1] + (B) p + (B) q;
+  J z = i + o;
+  F z2 = z;
+  A z3 = z2;
+  return y + z3;
+}
+
+int
+main ()
+{
+  B x = foo (0, (C) { }, (E) { }, 10, (G) { }, 4, 2, (M) { }, 123842323652213865LL, (P) { 1 }, (Q) { });
+  if ((J) x != 0x2e2c2e2c2e2c2e30ULL)
+    __builtin_abort();
+  return 0;
+}


	Jakub


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342]
  2021-05-13 17:01                     ` Jakub Jelinek
@ 2021-05-14  9:09                       ` Richard Sandiford
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Sandiford @ 2021-05-14  9:09 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Hongtao Liu, Eric Botcazou, Jeff Law, gcc-patches

Jakub Jelinek <jakub@redhat.com> writes:
> On Thu, May 13, 2021 at 05:37:36PM +0200, Jakub Jelinek wrote:
>> So, do you want something like (I've deleted the old comment as I think
>> the new one is enough, but am open to keep both) the patch below, where
>> it REG_CAN_CHANGE_MODE_P is false, we punt (return), otherwise call
>> set_value_regno?
>> Am not sure if those REG_CAN_CHANGE_MODE_P arguments is what you want
>> though.
>
> Oops, missing !, meant following which works on 11 branch for the testcase:
>
> 2021-05-13  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR rtl-optimization/100342
> 	* regcprop.c (copy_value): When copying a source reg in a wider
> 	mode than it has recorded for the value, adjust recorded destination
> 	mode too or punt if !REG_CAN_CHANGE_MODE_P.
>
> 	* gcc.target/i386/pr100342.c: New test.
>
> --- gcc/regcprop.c.jj	2021-03-23 10:21:07.176447920 +0100
> +++ gcc/regcprop.c	2021-05-13 17:36:46.443192451 +0200
> @@ -358,34 +358,25 @@ copy_value (rtx dest, rtx src, struct va
>    else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
>      return;
>  
> -  /* It is not safe to link DEST into the chain if SRC was defined in some
> -     narrower mode M and if M is also narrower than the mode of the first
> -     register in the chain.  For example:
> -     (set (reg:DI r1) (reg:DI r0))
> -     (set (reg:HI r2) (reg:HI r1))
> -     (set (reg:SI r3) (reg:SI r2)) //Should be a new chain start at r3
> -     (set (reg:SI r4) (reg:SI r1))
> -     (set (reg:SI r5) (reg:SI r4))
> -
> -     the upper part of r3 is undefined.  If we added it to the chain,
> -     it may be used to replace r5, which has defined upper bits.
> -     See PR98694 for details.
> -
> -     [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
> -     [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
> -     Condition B is added to to catch optimization opportunities of
> -
> -     (set (reg:HI R1) (reg:HI R0))
> -     (set (reg:SI R2) (reg:SI R1)) // [A]
> -     (set (reg:DI R3) (reg:DI R2)) // [A]
> -     (set (reg:SI R4) (reg:SI R[0-3]))
> -     (set (reg:HI R5) (reg:HI R[0-4]))
> -
> -     in which all registers have only 16 defined bits.  */
> -  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
> -	   && partial_subreg_p (vd->e[sr].mode,
> -				vd->e[vd->e[sr].oldest_regno].mode))
> -    return;
> +  /* If a narrower value is copied using wider mode, the upper bits
> +     are undefined (could be e.g. a former paradoxical subreg).  Signal
> +     in that case we've only copied value using the narrower mode.
> +     Consider:
> +     (set (reg:DI r14) (mem:DI ...))
> +     (set (reg:QI si) (reg:QI r14))
> +     (set (reg:DI bp) (reg:DI r14))
> +     (set (reg:DI r14) (const_int ...))
> +     (set (reg:DI dx) (reg:DI si))
> +     (set (reg:DI si) (const_int ...))
> +     (set (reg:DI dx) (reg:DI bp))
> +     The last set is not redundant, while the low 8 bits of dx are already
> +     equal to low 8 bits of bp, the other bits are undefined.  */
> +  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
> +    {
> +      if (!REG_CAN_CHANGE_MODE_P (sr, GET_MODE (src), vd->e[sr].mode))
> +	return;

LGTM, but for extra safety, I think we also want:

   || !REG_CAN_CHANGE_MODE_P (dr, vd->e[sr].mode, GET_MODE (dst))

i.e. we're effectively converting the source from the wider mode
to the narrower mode, doing a move, and then converting the narrow
mode back to the wider mode.

OK with that change, thanks.

Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-05-14  9:09 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-18  9:16 [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg Hongtao Liu
2021-01-18 10:18 ` Richard Sandiford
2021-01-18 10:43   ` Hongtao Liu
2021-01-18 10:51     ` Hongtao Liu
2021-01-18 11:10     ` Richard Sandiford
2021-01-19  0:59       ` Hongtao Liu
2021-01-19 12:38         ` Richard Sandiford
2021-01-19 14:45           ` Jakub Jelinek
2021-01-19 16:10             ` Richard Sandiford
2021-01-20  4:35               ` Hongtao Liu
2021-01-20  4:40                 ` Hongtao Liu
2021-01-20 12:56                 ` H.J. Lu
2021-01-20 14:14                 ` Richard Sandiford
2021-01-21  5:25                   ` Hongtao Liu
2021-05-05 17:44               ` [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342] Jakub Jelinek
2021-05-06  8:50                 ` Jakub Jelinek
2021-05-11 10:59                 ` Richard Sandiford
2021-05-13 15:37                   ` Jakub Jelinek
2021-05-13 17:01                     ` Jakub Jelinek
2021-05-14  9:09                       ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).