public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] i386: Fix a peephole2 for -mavx512vl -mno-avx512bw [PR99321]
@ 2021-03-02 23:29 Jakub Jelinek
  2021-03-03  7:13 ` Uros Bizjak
  0 siblings, 1 reply; 2+ messages in thread
From: Jakub Jelinek @ 2021-03-02 23:29 UTC (permalink / raw)
  To: Uros Bizjak, Jeff Law, Kirill Yukhin; +Cc: gcc-patches

Hi!

As the testcase shows, the
(define_peephole2
  [(set (match_operand 0 "sse_reg_operand")
        (match_operand 1 "sse_reg_operand"))
   (set (match_dup 0)
        (match_operator 3 "commutative_operator"
          [(match_dup 0)
           (match_operand 2 "memory_operand")]))]
peephole2 can for AVX512VL without AVX512BW (I guess it is a hyphothetical
CPU, but unfortunately they are separate CPUID bits and we have separate
options for them) turn something that is valid without that peephole2
into something that is invalid (and in this case ICEs).
The problem is that the vpadd[bw], vpmullw, vpmin[su][bw] and vpmax[su][bw]
instructions require both AVX512BW and AVX512VL when they have
16-byte or 32-byte operands.  If operands[0] is %[xy]mm0 .. %[xy]mm15
but operands[1] is %[xy]mm16 .. %[xy]mm31, then before we have
a vector move which uses vmovdqa{32,64} and doesn't need AVX512BW,
AVX512VL is I think implied from HARD_REGNO_MODE_OK only supporting
V{16Q,32Q,8H,16H}imode in EXT_REX_SSE_REGNO_P regs with AVX512VL, and then
we have a commutative operation with that %[xy]mm0 .. %[xy]mm15 destination
and one source and a memory operand, so VEX encoded operation.
And, the peephole2 wants to replace it with a load into the destination
register from memory (ok) and then the commutative arith instruction.
But that needs EVEX encoding because of the high register and so requires
AVX512BW which might not be enabled.
The exception is and/ior/xor, because the hw doesn't have
vp{and,or,xor}{b,w} instructions at all, it uses vp{and,or,xor}d instead
and that of course doesn't need AVX512BW.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

BTW, there are other bugs I need to look at, while the vp{min,max}ub with
16-byte operands instruction properly requires avx512bw for v constraints
and otherwise uses x, e.g. the vpadd[bw] etc. instructions don't.
I'll try to handle that incrementally later this week.

2021-03-02  Jakub Jelinek  <jakub@redhat.com>

	PR target/99321
	* config/i386/i386.md (mov + mem using comm arith peephole2):
	Punt if operands[1] is EXT_REX_SSE_REGNO_P, AVX512BW is not enabled
	and the inner mode is [QH]Imode.

	* gcc.target/i386/pr99321.c: New test.

--- gcc/config/i386/i386.md.jj	2021-02-18 09:16:21.026841364 +0100
+++ gcc/config/i386/i386.md	2021-03-02 17:57:55.878158963 +0100
@@ -19843,7 +19843,18 @@ (define_peephole2
 	(match_operator 3 "commutative_operator"
 	  [(match_dup 0)
 	   (match_operand 2 "memory_operand")]))]
-  "REGNO (operands[0]) != REGNO (operands[1])"
+  "REGNO (operands[0]) != REGNO (operands[1])
+   /* Punt if operands[1] is %[xy]mm16+ and AVX512BW is not enabled,
+      as EVEX encoded vpadd[bw], vpmullw, vpmin[su][bw] and vpmax[su][bw]
+      instructions require AVX512BW and AVX512VL, but with the original
+      instructions it might require just AVX512VL.
+      AVX512VL is implied from TARGET_HARD_REGNO_MODE_OK.  */
+   && (!EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
+       || TARGET_AVX512BW
+       || GET_MODE_SIZE (GET_MODE_INNER (GET_MODE (operands[0]))) > 2
+       || GET_CODE (operands[3]) == AND
+       || GET_CODE (operands[3]) == IOR
+       || GET_CODE (operands[3]) == XOR)"
   [(set (match_dup 0) (match_dup 2))
    (set (match_dup 0)
 	(match_op_dup 3 [(match_dup 0) (match_dup 1)]))])
--- gcc/testsuite/gcc.target/i386/pr99321.c.jj	2021-03-02 18:05:02.561528747 +0100
+++ gcc/testsuite/gcc.target/i386/pr99321.c	2021-03-02 18:05:51.121001799 +0100
@@ -0,0 +1,41 @@
+/* PR target/99321 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=btver2 -fno-tree-dce -mavx512vl -mno-avx512bw" } */
+
+typedef unsigned __attribute__((__vector_size__ (8))) A;
+typedef unsigned int __attribute__((__vector_size__ (8))) B;
+typedef unsigned char __attribute__((__vector_size__ (16))) C;
+typedef unsigned __attribute__((__vector_size__ (16))) D;
+typedef unsigned int __attribute__((__vector_size__ (16))) E;
+typedef unsigned __attribute__((__vector_size__ (16))) F;
+typedef unsigned __attribute__((__vector_size__ (32))) G;
+typedef int __attribute__((__vector_size__ (32))) H;
+typedef unsigned int __attribute__((__vector_size__ (32))) I;
+typedef char __attribute__((__vector_size__ (64))) J;
+typedef unsigned int __attribute__((__vector_size__ (64))) K;
+typedef unsigned long long __attribute__((__vector_size__ (64))) L;
+unsigned char a;
+unsigned b, c;
+H d;
+E e, f;
+D g;
+L h;
+
+A
+foo0 (A i, C j, G k, B l, K m, B n, I o)
+{
+  J p, q = a != p;
+  F r = b << f;
+  int s = a * 15;
+  C t = (1 << ((C) ((C) { 80 } >=j) & sizeof (0)) | (j ^ (C) { 5 }) << (j & sizeof (0))) != 0;
+  L u = h;
+  H v = d - 40;
+  u ^= -(long long) n;
+  D w = (char) s > g;
+  o ^= c / o;
+  J x = p + q + (J) m + (J) u + (J) u;
+  G y = ((union { J a; G b;}) x).b + ((union { J a; G b[2];}) x).b[1] + k + v + o;
+  C z = ((union { G a; C b;}) y).b + ((union { G a; C b;}) y).b + j + t + (C) g + (C) w + (C) e + (C) f + (C) r;
+  A zz = ((union { C a; A b;}) z).b + i + l + n;
+  return zz;
+}

	Jakub


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] i386: Fix a peephole2 for -mavx512vl -mno-avx512bw [PR99321]
  2021-03-02 23:29 [PATCH] i386: Fix a peephole2 for -mavx512vl -mno-avx512bw [PR99321] Jakub Jelinek
@ 2021-03-03  7:13 ` Uros Bizjak
  0 siblings, 0 replies; 2+ messages in thread
From: Uros Bizjak @ 2021-03-03  7:13 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, Kirill Yukhin, gcc-patches

On Wed, Mar 3, 2021 at 12:29 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> As the testcase shows, the
> (define_peephole2
>   [(set (match_operand 0 "sse_reg_operand")
>         (match_operand 1 "sse_reg_operand"))
>    (set (match_dup 0)
>         (match_operator 3 "commutative_operator"
>           [(match_dup 0)
>            (match_operand 2 "memory_operand")]))]
> peephole2 can for AVX512VL without AVX512BW (I guess it is a hyphothetical
> CPU, but unfortunately they are separate CPUID bits and we have separate
> options for them) turn something that is valid without that peephole2
> into something that is invalid (and in this case ICEs).
> The problem is that the vpadd[bw], vpmullw, vpmin[su][bw] and vpmax[su][bw]
> instructions require both AVX512BW and AVX512VL when they have
> 16-byte or 32-byte operands.  If operands[0] is %[xy]mm0 .. %[xy]mm15
> but operands[1] is %[xy]mm16 .. %[xy]mm31, then before we have
> a vector move which uses vmovdqa{32,64} and doesn't need AVX512BW,
> AVX512VL is I think implied from HARD_REGNO_MODE_OK only supporting
> V{16Q,32Q,8H,16H}imode in EXT_REX_SSE_REGNO_P regs with AVX512VL, and then
> we have a commutative operation with that %[xy]mm0 .. %[xy]mm15 destination
> and one source and a memory operand, so VEX encoded operation.
> And, the peephole2 wants to replace it with a load into the destination
> register from memory (ok) and then the commutative arith instruction.
> But that needs EVEX encoding because of the high register and so requires
> AVX512BW which might not be enabled.
> The exception is and/ior/xor, because the hw doesn't have
> vp{and,or,xor}{b,w} instructions at all, it uses vp{and,or,xor}d instead
> and that of course doesn't need AVX512BW.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> BTW, there are other bugs I need to look at, while the vp{min,max}ub with
> 16-byte operands instruction properly requires avx512bw for v constraints
> and otherwise uses x, e.g. the vpadd[bw] etc. instructions don't.
> I'll try to handle that incrementally later this week.
>
> 2021-03-02  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/99321
>         * config/i386/i386.md (mov + mem using comm arith peephole2):
>         Punt if operands[1] is EXT_REX_SSE_REGNO_P, AVX512BW is not enabled
>         and the inner mode is [QH]Imode.
>
>         * gcc.target/i386/pr99321.c: New test.

OK, a comment inline.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2021-02-18 09:16:21.026841364 +0100
> +++ gcc/config/i386/i386.md     2021-03-02 17:57:55.878158963 +0100
> @@ -19843,7 +19843,18 @@ (define_peephole2
>         (match_operator 3 "commutative_operator"
>           [(match_dup 0)
>            (match_operand 2 "memory_operand")]))]
> -  "REGNO (operands[0]) != REGNO (operands[1])"
> +  "REGNO (operands[0]) != REGNO (operands[1])
> +   /* Punt if operands[1] is %[xy]mm16+ and AVX512BW is not enabled,
> +      as EVEX encoded vpadd[bw], vpmullw, vpmin[su][bw] and vpmax[su][bw]
> +      instructions require AVX512BW and AVX512VL, but with the original
> +      instructions it might require just AVX512VL.
> +      AVX512VL is implied from TARGET_HARD_REGNO_MODE_OK.  */
> +   && (!EXT_REX_SSE_REGNO_P (REGNO (operands[1]))
> +       || TARGET_AVX512BW
> +       || GET_MODE_SIZE (GET_MODE_INNER (GET_MODE (operands[0]))) > 2
> +       || GET_CODE (operands[3]) == AND
> +       || GET_CODE (operands[3]) == IOR
> +       || GET_CODE (operands[3]) == XOR)"

Can you please introduce the "logic_operator" predicate, similar to
plusminuslogic_operator and use it here?

>    [(set (match_dup 0) (match_dup 2))
>     (set (match_dup 0)
>         (match_op_dup 3 [(match_dup 0) (match_dup 1)]))])
> --- gcc/testsuite/gcc.target/i386/pr99321.c.jj  2021-03-02 18:05:02.561528747 +0100
> +++ gcc/testsuite/gcc.target/i386/pr99321.c     2021-03-02 18:05:51.121001799 +0100
> @@ -0,0 +1,41 @@
> +/* PR target/99321 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=btver2 -fno-tree-dce -mavx512vl -mno-avx512bw" } */
> +
> +typedef unsigned __attribute__((__vector_size__ (8))) A;
> +typedef unsigned int __attribute__((__vector_size__ (8))) B;
> +typedef unsigned char __attribute__((__vector_size__ (16))) C;
> +typedef unsigned __attribute__((__vector_size__ (16))) D;
> +typedef unsigned int __attribute__((__vector_size__ (16))) E;
> +typedef unsigned __attribute__((__vector_size__ (16))) F;
> +typedef unsigned __attribute__((__vector_size__ (32))) G;
> +typedef int __attribute__((__vector_size__ (32))) H;
> +typedef unsigned int __attribute__((__vector_size__ (32))) I;
> +typedef char __attribute__((__vector_size__ (64))) J;
> +typedef unsigned int __attribute__((__vector_size__ (64))) K;
> +typedef unsigned long long __attribute__((__vector_size__ (64))) L;
> +unsigned char a;
> +unsigned b, c;
> +H d;
> +E e, f;
> +D g;
> +L h;
> +
> +A
> +foo0 (A i, C j, G k, B l, K m, B n, I o)
> +{
> +  J p, q = a != p;
> +  F r = b << f;
> +  int s = a * 15;
> +  C t = (1 << ((C) ((C) { 80 } >=j) & sizeof (0)) | (j ^ (C) { 5 }) << (j & sizeof (0))) != 0;
> +  L u = h;
> +  H v = d - 40;
> +  u ^= -(long long) n;
> +  D w = (char) s > g;
> +  o ^= c / o;
> +  J x = p + q + (J) m + (J) u + (J) u;
> +  G y = ((union { J a; G b;}) x).b + ((union { J a; G b[2];}) x).b[1] + k + v + o;
> +  C z = ((union { G a; C b;}) y).b + ((union { G a; C b;}) y).b + j + t + (C) g + (C) w + (C) e + (C) f + (C) r;
> +  A zz = ((union { C a; A b;}) z).b + i + l + n;
> +  return zz;
> +}
>
>         Jakub
>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-03-03  7:13 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-02 23:29 [PATCH] i386: Fix a peephole2 for -mavx512vl -mno-avx512bw [PR99321] Jakub Jelinek
2021-03-03  7:13 ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).