public inbox for gcc-patches@gcc.gnu.org
* [PATCH] i386: Allow some V32HImode and V64QImode permutations even without AVX512BW [PR80355]
@ 2021-08-10  8:54 Jakub Jelinek
  2021-08-10 10:14 ` Hongtao Liu
From: Jakub Jelinek @ 2021-08-10  8:54 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Uros Bizjak, gcc-patches

Hi!

When working on the PR, I've noticed we generate terrible code for
V32HImode or V64QImode permutations for -mavx512f -mno-avx512bw.
Generally we can't do much with such permutations, but since PR68655
we can handle at least those expressible as V16SImode or V8DImode
permutations.  That code wasn't reachable, though, because
ix86_vectorize_vec_perm_const didn't even try: without TARGET_AVX512BW it
said it can't do anything, and with it it said it can do everything, so
there were no d.testing_p attempts.

This patch makes it try it for TARGET_AVX512F && !TARGET_AVX512BW.

The first hunk is to avoid an ICE: expand_vec_perm_even_odd_1 asserts that
d->vmode isn't V32HImode, because with AVX512BW expand_vec_perm_1 already
handles all permutations.  But when we let V32HImode through with
!TARGET_AVX512BW, expand_vec_perm_1 doesn't handle it.

If we want, that hunk can be dropped if we implement in
expand_vec_perm_even_odd_1 and its helper the even permutation as
vpmovdw + vpmovdw + vinserti64x4 and odd permutation as
vpsrld $16 + vpsrld $16 + vpmovdw + vpmovdw + vinserti64x4.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-08-10  Jakub Jelinek  <jakub@redhat.com>

	PR target/80355
	* config/i386/i386-expand.c (expand_vec_perm_even_odd): Return false
	for V32HImode if !TARGET_AVX512BW.
	(ix86_vectorize_vec_perm_const) <case E_V32HImode, case E_V64QImode>:
	If !TARGET_AVX512BW and TARGET_AVX512F and d.testing_p, don't fail
	early, but actually check the permutation.

	* gcc.target/i386/avx512f-pr80355-2.c: New test.

--- gcc/config/i386/i386-expand.c.jj	2021-08-05 10:26:15.589555028 +0200
+++ gcc/config/i386/i386-expand.c	2021-08-09 14:14:35.466268680 +0200
@@ -20337,6 +20337,11 @@ expand_vec_perm_even_odd (struct expand_
     if (d->perm[i] != 2 * i + odd)
       return false;
 
+  if (d->vmode == E_V32HImode
+      && d->testing_p
+      && !TARGET_AVX512BW)
+    return false;
+
   return expand_vec_perm_even_odd_1 (d, odd);
 }
 
@@ -20877,16 +20882,16 @@ ix86_vectorize_vec_perm_const (machine_m
 	return true;
       break;
     case E_V32HImode:
-      if (!TARGET_AVX512BW)
+      if (!TARGET_AVX512F)
 	return false;
-      if (d.testing_p)
+      if (d.testing_p && TARGET_AVX512BW)
 	/* All implementable with a single vperm[it]2 insn.  */
 	return true;
       break;
     case E_V64QImode:
-      if (!TARGET_AVX512BW)
+      if (!TARGET_AVX512F)
 	return false;
-      if (d.testing_p)
+      if (d.testing_p && TARGET_AVX512BW)
 	/* Implementable with 2 vperm[it]2, 2 vpshufb and 1 or insn.  */
 	return true;
       break;
--- gcc/testsuite/gcc.target/i386/avx512f-pr80355-2.c.jj	2021-08-09 14:24:27.176142589 +0200
+++ gcc/testsuite/gcc.target/i386/avx512f-pr80355-2.c	2021-08-09 14:29:23.308074276 +0200
@@ -0,0 +1,23 @@
+/* PR target/80355 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mno-avx512vl -mno-avx512dq -mno-avx512bw" } */
+/* { dg-final { scan-assembler-times "\tvshufi(?:32x4|64x2)\t" 2 } } */
+
+typedef short V __attribute__((vector_size (64)));
+typedef char W __attribute__((vector_size (64)));
+
+W
+f0 (W x)
+{
+  return __builtin_shuffle (x, (W) { 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
+				     48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
+				     0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+				     17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 });
+}
+
+V
+f1 (V x)
+{
+  return __builtin_shuffle (x, (V) { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+				     0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 });
+}

	Jakub



* Re: [PATCH] i386: Allow some V32HImode and V64QImode permutations even without AVX512BW [PR80355]
  2021-08-10  8:54 [PATCH] i386: Allow some V32HImode and V64QImode permutations even without AVX512BW [PR80355] Jakub Jelinek
@ 2021-08-10 10:14 ` Hongtao Liu
From: Hongtao Liu @ 2021-08-10 10:14 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, GCC Patches

On Tue, Aug 10, 2021 at 4:54 PM Jakub Jelinek <jakub@redhat.com> wrote:
> This patch makes it try it for TARGET_AVX512F && !TARGET_AVX512BW.
TARGET_AVX512{F,BW,CD,DQ,VL} will be the baseline for all
AVX512-enabled processors from SKX onward.
But it's definitely good to have this; the patch LGTM.


-- 
BR,
Hongtao
