* [PATCH][AVX512] Fix ICE for vpexpand*.
@ 2021-04-30 4:42 Hongtao Liu
2021-04-30 4:49 ` [PATCH][AVX512] Optimize vpexpand* to mask mov when mask have all ones in it's lower part (including 0 and -1) Hongtao Liu
2021-05-12 7:29 ` [PATCH][AVX512] Fix ICE for vpexpand* Hongtao Liu
0 siblings, 2 replies; 6+ messages in thread
From: Hongtao Liu @ 2021-04-30 4:42 UTC (permalink / raw)
To: GCC Patches
[-- Attachment #1: Type: text/plain, Size: 2143 bytes --]
Hi:
This patch is to fix ice which was introduced by my
r11-5696-g35c4c67e6c534ef3d6ba7a7752ab7e0fbc91755b.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk and backport to GCC11?
gcc/ChangeLog
PR target/100310
* config/i386/i386-expand.c
(ix86_expand_special_args_builtin): Keep constm1_operand only
if it satisfies insn's operand predicate.
gcc/testsuite/ChangeLog
PR target/100310
* gcc.target/i386/pr100310.c: New test.
Add test
---
gcc/config/i386/i386-expand.c | 5 +++--
gcc/testsuite/gcc.target/i386/pr100310.c | 12 ++++++++++++
2 files changed, 15 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/pr100310.c
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 516440eb5c1..b2bb2b1e3a1 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10862,11 +10862,12 @@ ix86_expand_special_args_builtin (const
struct builtin_description *d,
op = fixup_modeless_constant (op, mode);
- /* NB: 3-operands load implied it's a mask load,
+ /* NB: 3-operands load implied it's a mask load or v{p}expand*,
and that mask operand shoud be at the end.
Keep all-ones mask which would be simplified by the expander. */
if (nargs == 3 && i == 2 && klass == load
- && constm1_operand (op, mode))
+ && constm1_operand (op, mode)
+ && insn_p->operand[i].predicate (op, mode))
;
else if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
op = copy_to_mode_reg (mode, op);
diff --git a/gcc/testsuite/gcc.target/i386/pr100310.c
b/gcc/testsuite/gcc.target/i386/pr100310.c
new file mode 100644
index 00000000000..54ace18531b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100310.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+#include <immintrin.h>
+
+double *p;
+volatile __m512d x;
+volatile __mmask8 m;
+
+void foo()
+{
+ x = _mm512_mask_expandloadu_pd (x, 255, p);
+}
--
2.18.1
--
BR,
Hongtao
[-- Attachment #2: 0001-i386-Fix-ICE-for-vpexpand.patch --]
[-- Type: text/x-patch, Size: 2006 bytes --]
From 1769a103d52370fd53508a7cccca81312fddb8b9 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 28 Apr 2021 14:52:59 +0800
Subject: [PATCH 1/2] [i386] Fix ICE for vpexpand*.
gcc/ChangeLog
PR target/100310
* config/i386/i386-expand.c
(ix86_expand_special_args_builtin): Keep constm1_operand only
if it satisfies insn's operand predicate.
gcc/testsuite/ChangeLog
PR target/100310
* gcc.target/i386/pr100310.c: New test.
Add test
---
gcc/config/i386/i386-expand.c | 5 +++--
gcc/testsuite/gcc.target/i386/pr100310.c | 12 ++++++++++++
2 files changed, 15 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/pr100310.c
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 516440eb5c1..b2bb2b1e3a1 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10862,11 +10862,12 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
op = fixup_modeless_constant (op, mode);
- /* NB: 3-operands load implied it's a mask load,
+ /* NB: 3-operands load implied it's a mask load or v{p}expand*,
and that mask operand shoud be at the end.
Keep all-ones mask which would be simplified by the expander. */
if (nargs == 3 && i == 2 && klass == load
- && constm1_operand (op, mode))
+ && constm1_operand (op, mode)
+ && insn_p->operand[i].predicate (op, mode))
;
else if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
op = copy_to_mode_reg (mode, op);
diff --git a/gcc/testsuite/gcc.target/i386/pr100310.c b/gcc/testsuite/gcc.target/i386/pr100310.c
new file mode 100644
index 00000000000..54ace18531b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100310.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+#include <immintrin.h>
+
+double *p;
+volatile __m512d x;
+volatile __mmask8 m;
+
+void foo()
+{
+ x = _mm512_mask_expandloadu_pd (x, 255, p);
+}
--
2.18.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH][AVX512] Optimize vpexpand* to mask mov when mask have all ones in it's lower part (including 0 and -1).
2021-04-30 4:42 [PATCH][AVX512] Fix ICE for vpexpand* Hongtao Liu
@ 2021-04-30 4:49 ` Hongtao Liu
2021-05-12 7:28 ` Hongtao Liu
2021-05-12 7:29 ` [PATCH][AVX512] Fix ICE for vpexpand* Hongtao Liu
1 sibling, 1 reply; 6+ messages in thread
From: Hongtao Liu @ 2021-04-30 4:49 UTC (permalink / raw)
To: GCC Patches
[-- Attachment #1: Type: text/plain, Size: 1673 bytes --]
Hi:
For v{,p}expand* When mask is 0, -1, or has all all one bits in its
lower part, it can be optimized to simple mov or mask mov.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,} and
x86_64-linux-gnu{m32\ -march=cascadelake,-m64\ -march=cascadelake},
gcc/ChangeLog:
* config/i386/i386-builtin.def (BDESC): Adjust builtin name.
* config/i386/sse.md (<avx512>_expand<mode>_mask): Rename to ..
(expand<mode>_mask): this ..
(*expand<mode>_mask): New pre_reload splitter to transform
* v{,p}expand* to vmov* when mask is zero, all ones, or has
all ones in its lower part, otherwise still generate v{,p}expand*.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bw-pr100267-1.c: New test.
* gcc.target/i386/avx512bw-pr100267-b-2.c: New test.
* gcc.target/i386/avx512bw-pr100267-d-2.c: New test.
* gcc.target/i386/avx512bw-pr100267-q-2.c: New test.
* gcc.target/i386/avx512bw-pr100267-w-2.c: New test.
* gcc.target/i386/avx512f-pr100267-1.c: New test.
* gcc.target/i386/avx512f-pr100267-pd-2.c: New test.
* gcc.target/i386/avx512f-pr100267-ps-2.c: New test.
* gcc.target/i386/avx512vl-pr100267-1.c: New test.
* gcc.target/i386/avx512vl-pr100267-pd-2.c: New test.
* gcc.target/i386/avx512vl-pr100267-ps-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-1.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-b-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-d-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-q-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-w-2.c: New test.
[-- Attachment #2: 0002-i386-Optimize-vpexpand-to-mask-mov-when-mask-have-al.patch --]
[-- Type: text/x-patch, Size: 56785 bytes --]
From 8134ed81eaf6793fa1717d5f1d1e04834b67f6ab Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Thu, 29 Apr 2021 18:27:09 +0800
Subject: [PATCH 2/2] [i386] Optimize vpexpand* to mask mov when mask have all
ones in it's lower part (including 0 and -1).
gcc/ChangeLog:
* config/i386/i386-builtin.def (BDESC): Adjust builtin name.
* config/i386/sse.md (<avx512>_expand<mode>_mask): Rename to ..
(expand<mode>_mask): this ..
(*expand<mode>_mask): New pre_reload splitter to transform
* v{,p}expand* to vmov* when mask is zero, all ones, or has
all ones in it's lower part, otherwise still generate v{,p}expand*.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bw-pr100267-1.c: New test.
* gcc.target/i386/avx512bw-pr100267-b-2.c: New test.
* gcc.target/i386/avx512bw-pr100267-d-2.c: New test.
* gcc.target/i386/avx512bw-pr100267-q-2.c: New test.
* gcc.target/i386/avx512bw-pr100267-w-2.c: New test.
* gcc.target/i386/avx512f-pr100267-1.c: New test.
* gcc.target/i386/avx512f-pr100267-pd-2.c: New test.
* gcc.target/i386/avx512f-pr100267-ps-2.c: New test.
* gcc.target/i386/avx512vl-pr100267-1.c: New test.
* gcc.target/i386/avx512vl-pr100267-pd-2.c: New test.
* gcc.target/i386/avx512vl-pr100267-ps-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-1.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-b-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-d-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-q-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-w-2.c: New test.
---
gcc/config/i386/i386-builtin.def | 48 +++----
gcc/config/i386/sse.md | 69 +++++++++-
.../gcc.target/i386/avx512bw-pr100267-1.c | 38 ++++++
.../gcc.target/i386/avx512bw-pr100267-b-2.c | 74 +++++++++++
.../gcc.target/i386/avx512bw-pr100267-d-2.c | 74 +++++++++++
.../gcc.target/i386/avx512bw-pr100267-q-2.c | 74 +++++++++++
.../gcc.target/i386/avx512bw-pr100267-w-2.c | 74 +++++++++++
.../gcc.target/i386/avx512f-pr100267-1.c | 66 ++++++++++
.../gcc.target/i386/avx512f-pr100267-pd-2.c | 76 +++++++++++
.../gcc.target/i386/avx512f-pr100267-ps-2.c | 72 +++++++++++
.../gcc.target/i386/avx512vl-pr100267-1.c | 122 ++++++++++++++++++
.../gcc.target/i386/avx512vl-pr100267-pd-2.c | 15 +++
.../gcc.target/i386/avx512vl-pr100267-ps-2.c | 15 +++
.../gcc.target/i386/avx512vlbw-pr100267-1.c | 66 ++++++++++
.../gcc.target/i386/avx512vlbw-pr100267-b-2.c | 16 +++
.../gcc.target/i386/avx512vlbw-pr100267-d-2.c | 15 +++
.../gcc.target/i386/avx512vlbw-pr100267-q-2.c | 15 +++
.../gcc.target/i386/avx512vlbw-pr100267-w-2.c | 16 +++
18 files changed, 920 insertions(+), 25 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-b-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-d-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-q-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-w-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr100267-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr100267-pd-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr100267-ps-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr100267-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr100267-pd-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr100267-ps-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-b-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-d-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-q-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-w-2.c
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 4dbd4f23647..90c5bd5d208 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -212,13 +212,13 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev16sf_mask, "__
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev16si_mask, "__builtin_ia32_compressstoresi512_mask", IX86_BUILTIN_PCOMPRESSDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev8df_mask, "__builtin_ia32_compressstoredf512_mask", IX86_BUILTIN_COMPRESSPDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev8di_mask, "__builtin_ia32_compressstoredi512_mask", IX86_BUILTIN_PCOMPRESSQSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_mask, "__builtin_ia32_expandloadsf512_mask", IX86_BUILTIN_EXPANDPSLOAD512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandloadsf512_mask", IX86_BUILTIN_EXPANDPSLOAD512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandloadsf512_maskz", IX86_BUILTIN_EXPANDPSLOAD512Z, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_mask, "__builtin_ia32_expandloadsi512_mask", IX86_BUILTIN_PEXPANDDLOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandloadsi512_mask", IX86_BUILTIN_PEXPANDDLOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandloadsi512_maskz", IX86_BUILTIN_PEXPANDDLOAD512Z, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_mask, "__builtin_ia32_expandloaddf512_mask", IX86_BUILTIN_EXPANDPDLOAD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8df_mask, "__builtin_ia32_expandloaddf512_mask", IX86_BUILTIN_EXPANDPDLOAD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expandloaddf512_maskz", IX86_BUILTIN_EXPANDPDLOAD512Z, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_mask, "__builtin_ia32_expandloaddi512_mask", IX86_BUILTIN_PEXPANDQLOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8di_mask, "__builtin_ia32_expandloaddi512_mask", IX86_BUILTIN_PEXPANDQLOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expandloaddi512_maskz", IX86_BUILTIN_PEXPANDQLOAD512Z, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_loaddqusi512_mask", IX86_BUILTIN_LOADDQUSI512, UNKNOWN, (int) V16SI_FTYPE_PCINT_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_loaddqudi512_mask", IX86_BUILTIN_LOADDQUDI512, UNKNOWN, (int) V8DI_FTYPE_PCINT64_V8DI_UQI)
@@ -345,14 +345,14 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressstorev4di_mask, "_
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressstorev2di_mask, "__builtin_ia32_compressstoredi128_mask", IX86_BUILTIN_PCOMPRESSQSTORE128, UNKNOWN, (int) VOID_FTYPE_PV2DI_V2DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressstorev8si_mask, "__builtin_ia32_compressstoresi256_mask", IX86_BUILTIN_PCOMPRESSDSTORE256, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressstorev4si_mask, "__builtin_ia32_compressstoresi128_mask", IX86_BUILTIN_PCOMPRESSDSTORE128, UNKNOWN, (int) VOID_FTYPE_PV4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4df_mask, "__builtin_ia32_expandloaddf256_mask", IX86_BUILTIN_EXPANDPDLOAD256, UNKNOWN, (int) V4DF_FTYPE_PCV4DF_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2df_mask, "__builtin_ia32_expandloaddf128_mask", IX86_BUILTIN_EXPANDPDLOAD128, UNKNOWN, (int) V2DF_FTYPE_PCV2DF_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8sf_mask, "__builtin_ia32_expandloadsf256_mask", IX86_BUILTIN_EXPANDPSLOAD256, UNKNOWN, (int) V8SF_FTYPE_PCV8SF_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4sf_mask, "__builtin_ia32_expandloadsf128_mask", IX86_BUILTIN_EXPANDPSLOAD128, UNKNOWN, (int) V4SF_FTYPE_PCV4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4di_mask, "__builtin_ia32_expandloaddi256_mask", IX86_BUILTIN_PEXPANDQLOAD256, UNKNOWN, (int) V4DI_FTYPE_PCV4DI_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2di_mask, "__builtin_ia32_expandloaddi128_mask", IX86_BUILTIN_PEXPANDQLOAD128, UNKNOWN, (int) V2DI_FTYPE_PCV2DI_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8si_mask, "__builtin_ia32_expandloadsi256_mask", IX86_BUILTIN_PEXPANDDLOAD256, UNKNOWN, (int) V8SI_FTYPE_PCV8SI_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4si_mask, "__builtin_ia32_expandloadsi128_mask", IX86_BUILTIN_PEXPANDDLOAD128, UNKNOWN, (int) V4SI_FTYPE_PCV4SI_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4df_mask, "__builtin_ia32_expandloaddf256_mask", IX86_BUILTIN_EXPANDPDLOAD256, UNKNOWN, (int) V4DF_FTYPE_PCV4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv2df_mask, "__builtin_ia32_expandloaddf128_mask", IX86_BUILTIN_EXPANDPDLOAD128, UNKNOWN, (int) V2DF_FTYPE_PCV2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8sf_mask, "__builtin_ia32_expandloadsf256_mask", IX86_BUILTIN_EXPANDPSLOAD256, UNKNOWN, (int) V8SF_FTYPE_PCV8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4sf_mask, "__builtin_ia32_expandloadsf128_mask", IX86_BUILTIN_EXPANDPSLOAD128, UNKNOWN, (int) V4SF_FTYPE_PCV4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4di_mask, "__builtin_ia32_expandloaddi256_mask", IX86_BUILTIN_PEXPANDQLOAD256, UNKNOWN, (int) V4DI_FTYPE_PCV4DI_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv2di_mask, "__builtin_ia32_expandloaddi128_mask", IX86_BUILTIN_PEXPANDQLOAD128, UNKNOWN, (int) V2DI_FTYPE_PCV2DI_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8si_mask, "__builtin_ia32_expandloadsi256_mask", IX86_BUILTIN_PEXPANDDLOAD256, UNKNOWN, (int) V8SI_FTYPE_PCV8SI_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4si_mask, "__builtin_ia32_expandloadsi128_mask", IX86_BUILTIN_PEXPANDDLOAD128, UNKNOWN, (int) V4SI_FTYPE_PCV4SI_V4SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4df_maskz, "__builtin_ia32_expandloaddf256_maskz", IX86_BUILTIN_EXPANDPDLOAD256Z, UNKNOWN, (int) V4DF_FTYPE_PCV4DF_V4DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2df_maskz, "__builtin_ia32_expandloaddf128_maskz", IX86_BUILTIN_EXPANDPDLOAD128Z, UNKNOWN, (int) V2DF_FTYPE_PCV2DF_V2DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8sf_maskz, "__builtin_ia32_expandloadsf256_maskz", IX86_BUILTIN_EXPANDPSLOAD256Z, UNKNOWN, (int) V8SF_FTYPE_PCV8SF_V8SF_UQI)
@@ -1335,9 +1335,9 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatv8siv8df2_mask, "__builtin_ia32
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vcvtps2ph512_mask, "__builtin_ia32_vcvtps2ph512_mask", IX86_BUILTIN_CVTPS2PH512, UNKNOWN, (int) V16HI_FTYPE_V16SF_INT_V16HI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ufloatv8siv8df2_mask, "__builtin_ia32_cvtudq2pd512_mask", IX86_BUILTIN_CVTUDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_cvtusi2sd32, "__builtin_ia32_cvtusi2sd32", IX86_BUILTIN_CVTUSI2SD32, UNKNOWN, (int) V2DF_FTYPE_V2DF_UINT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_mask, "__builtin_ia32_expanddf512_mask", IX86_BUILTIN_EXPANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8df_mask, "__builtin_ia32_expanddf512_mask", IX86_BUILTIN_EXPANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expanddf512_maskz", IX86_BUILTIN_EXPANDPD512Z, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_mask, "__builtin_ia32_expandsf512_mask", IX86_BUILTIN_EXPANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandsf512_mask", IX86_BUILTIN_EXPANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandsf512_maskz", IX86_BUILTIN_EXPANDPS512Z, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextractf32x4_mask, "__builtin_ia32_extractf32x4_mask", IX86_BUILTIN_EXTRACTF32X4, UNKNOWN, (int) V4SF_FTYPE_V16SF_INT_V4SF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextractf64x4_mask, "__builtin_ia32_extractf64x4_mask", IX86_BUILTIN_EXTRACTF64X4, UNKNOWN, (int) V4DF_FTYPE_V8DF_INT_V4DF_UQI)
@@ -1374,9 +1374,9 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_gtv16si3_mask, "__builtin_ia
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_gtv8di3_mask, "__builtin_ia32_pcmpgtq512_mask", IX86_BUILTIN_PCMPGTQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv16si_mask, "__builtin_ia32_compresssi512_mask", IX86_BUILTIN_PCOMPRESSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv8di_mask, "__builtin_ia32_compressdi512_mask", IX86_BUILTIN_PCOMPRESSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_mask, "__builtin_ia32_expandsi512_mask", IX86_BUILTIN_PEXPANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandsi512_mask", IX86_BUILTIN_PEXPANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandsi512_maskz", IX86_BUILTIN_PEXPANDD512Z, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_mask, "__builtin_ia32_expanddi512_mask", IX86_BUILTIN_PEXPANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8di_mask, "__builtin_ia32_expanddi512_mask", IX86_BUILTIN_PEXPANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expanddi512_maskz", IX86_BUILTIN_PEXPANDQ512Z, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv16si3_mask, "__builtin_ia32_pmaxsd512_mask", IX86_BUILTIN_PMAXSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv8di3_mask, "__builtin_ia32_pmaxsq512_mask", IX86_BUILTIN_PMAXSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
@@ -2180,14 +2180,14 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressv4di_mask, "__buil
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressv2di_mask, "__builtin_ia32_compressdi128_mask", IX86_BUILTIN_PCOMPRESSQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressv8si_mask, "__builtin_ia32_compresssi256_mask", IX86_BUILTIN_PCOMPRESSD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressv4si_mask, "__builtin_ia32_compresssi128_mask", IX86_BUILTIN_PCOMPRESSD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4df_mask, "__builtin_ia32_expanddf256_mask", IX86_BUILTIN_EXPANDPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2df_mask, "__builtin_ia32_expanddf128_mask", IX86_BUILTIN_EXPANDPD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8sf_mask, "__builtin_ia32_expandsf256_mask", IX86_BUILTIN_EXPANDPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4sf_mask, "__builtin_ia32_expandsf128_mask", IX86_BUILTIN_EXPANDPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4di_mask, "__builtin_ia32_expanddi256_mask", IX86_BUILTIN_PEXPANDQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2di_mask, "__builtin_ia32_expanddi128_mask", IX86_BUILTIN_PEXPANDQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8si_mask, "__builtin_ia32_expandsi256_mask", IX86_BUILTIN_PEXPANDD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4si_mask, "__builtin_ia32_expandsi128_mask", IX86_BUILTIN_PEXPANDD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4df_mask, "__builtin_ia32_expanddf256_mask", IX86_BUILTIN_EXPANDPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv2df_mask, "__builtin_ia32_expanddf128_mask", IX86_BUILTIN_EXPANDPD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8sf_mask, "__builtin_ia32_expandsf256_mask", IX86_BUILTIN_EXPANDPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4sf_mask, "__builtin_ia32_expandsf128_mask", IX86_BUILTIN_EXPANDPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4di_mask, "__builtin_ia32_expanddi256_mask", IX86_BUILTIN_PEXPANDQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv2di_mask, "__builtin_ia32_expanddi128_mask", IX86_BUILTIN_PEXPANDQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8si_mask, "__builtin_ia32_expandsi256_mask", IX86_BUILTIN_PEXPANDD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4si_mask, "__builtin_ia32_expandsi128_mask", IX86_BUILTIN_PEXPANDD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4df_maskz, "__builtin_ia32_expanddf256_maskz", IX86_BUILTIN_EXPANDPD256Z, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2df_maskz, "__builtin_ia32_expanddf128_maskz", IX86_BUILTIN_EXPANDPD128Z, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8sf_maskz, "__builtin_ia32_expandsf256_maskz", IX86_BUILTIN_EXPANDPS256Z, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_UQI)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 897cf3eaea9..e1a3395726c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -702,6 +702,17 @@ (define_mode_iterator VI48F
(V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
(V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
(V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+(define_mode_iterator VI12_VI48F_AVX512VLBW
+ [(V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
+ (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
+ (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL")
+ (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
+ (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+ (V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
+ (V64QI "TARGET_AVX512BW") (V16QI "TARGET_AVX512VL")
+ (V32QI "TARGET_AVX512VL && TARGET_AVX512BW") (V32HI "TARGET_AVX512BW")
+ (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
+
(define_mode_iterator VI48F_256 [V8SI V8SF V4DI V4DF])
(define_mode_iterator VF_AVX512
@@ -22735,7 +22746,7 @@ (define_expand "<avx512>_expand<mode>_maskz"
"TARGET_AVX512F"
"operands[2] = CONST0_RTX (<MODE>mode);")
-(define_insn "<avx512>_expand<mode>_mask"
+(define_insn "expand<mode>_mask"
[(set (match_operand:VI48F 0 "register_operand" "=v,v")
(unspec:VI48F
[(match_operand:VI48F 1 "nonimmediate_operand" "v,m")
@@ -22763,6 +22774,62 @@ (define_insn "expand<mode>_mask"
(set_attr "memory" "none,load")
(set_attr "mode" "<sseinsnmode>")])
+(define_insn_and_split "*expand<mode>_mask"
+ [(set (match_operand:VI12_VI48F_AVX512VLBW 0 "register_operand")
+ (unspec:VI12_VI48F_AVX512VLBW
+ [(match_operand:VI12_VI48F_AVX512VLBW 1 "nonimmediate_operand")
+ (match_operand:VI12_VI48F_AVX512VLBW 2 "nonimm_or_0_operand")
+ (match_operand 3 "const_int_operand")]
+ UNSPEC_EXPAND))]
+ "ix86_pre_reload_split ()
+ && (TARGET_AVX512VBMI2 || GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) >= 4)"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+{
+ unsigned HOST_WIDE_INT mask = INTVAL (operands[3]);
+ bool has_zero = false;
+ unsigned n = GET_MODE_NUNITS (<MODE>mode), i;
+ unsigned ones = 0;
+
+ /* If all ones bits is in mask's lower part,
+ get number of ones and assign it to ONES. */
+ for (i = 0; i != n; i++)
+ {
+ if ((mask & HOST_WIDE_INT_1U << i) && has_zero)
+ break;
+
+ /* Record first zero bit. */
+ if (!(mask & HOST_WIDE_INT_1U << i) && !has_zero)
+ {
+ has_zero = true;
+ ones = i;
+ }
+ }
+
+ if (!has_zero)
+ ones = n;
+
+ if (i != n || (ones != 0 && ones != n))
+ {
+ rtx reg = gen_reg_rtx (<avx512fmaskmode>mode);
+ emit_move_insn (reg, operands[3]);
+ enum insn_code icode;
+ if (i == n)
+ /* For masks with all one bits in it's lower part,
+ we can transform v{,p}expand* to vmovdq* with
+ mask operand. */
+ icode = CODE_FOR_<avx512>_load<mode>_mask;
+ else
+ icode = CODE_FOR_expand<mode>_mask;
+ emit_insn (GEN_FCN (icode) (operands[0], operands[1], operands[2], reg));
+ }
+ else
+ /* For ALL_MASK_ONES or CONST0_RTX mask, transform it to simple mov. */
+ emit_move_insn (operands[0], ones ? operands[1] : operands[2]);
+ DONE;
+})
+
(define_expand "expand<mode>_maskz"
[(set (match_operand:VI12_AVX512VLBW 0 "register_operand")
(unspec:VI12_AVX512VLBW
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-1.c
new file mode 100644
index 00000000000..ce83d63bc73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vbmi2 -O2" } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+#include <immintrin.h>
+
+char *pi8;
+short *pi16;
+volatile __m512i xi16, xi8;
+
+void extern
+avx512f_test (void)
+{
+ xi8 = _mm512_mask_expand_epi8 (xi8, 0, xi8);
+ xi8 = _mm512_mask_expand_epi8 (xi8, -1, xi8);
+ xi8 = _mm512_mask_expand_epi8 (xi8, (1 << 8) - 1, xi8);
+ xi8 = _mm512_mask_expand_epi8 (xi8, (1 << 8) + 1, xi8);
+
+ xi8 = _mm512_mask_expandloadu_epi8 (xi8, 0, pi8);
+ xi8 = _mm512_mask_expandloadu_epi8 (xi8, -1, pi8);
+ xi8 = _mm512_mask_expandloadu_epi8 (xi8, (1 << 6) - 1, pi8);
+ xi8 = _mm512_mask_expandloadu_epi8 (xi8, (1 << 6) + 3, pi8);
+
+ xi16 = _mm512_mask_expand_epi16 (xi16, 0, xi16);
+ xi16 = _mm512_mask_expand_epi16 (xi16, -1, xi16);
+ xi16 = _mm512_mask_expand_epi16 (xi16, (1 << 3) - 1, xi16);
+ xi16 = _mm512_mask_expand_epi16 (xi16, (1 << 3) + 2, xi16);
+
+ xi16 = _mm512_mask_expandloadu_epi16 (xi16, 0, pi16);
+ xi16 = _mm512_mask_expandloadu_epi16 (xi16, -1, pi16);
+ xi16 = _mm512_mask_expandloadu_epi16 (xi16, (1 << 7) - 1, pi16);
+ xi16 = _mm512_mask_expandloadu_epi16 (xi16, (1 << 7) + 7, pi16);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-b-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-b-2.c
new file mode 100644
index 00000000000..424b485a203
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-b-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512F
+
+#define AVX512VBMI2
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 8)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (char *s, char *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & ((long long)1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, i_b) s1, res1, res2, res3, res4, res5, res6, res7, res8;
+ MASK_TYPE mask = (1 << (SIZE >> 2) - 1) - 1;
+ char s2[SIZE];
+ char res_ref1[SIZE];
+ char res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 12345 * (i + 200) * sign;
+ s2[i] = 67890 * (i + 300) * sign;
+ res1.a[i] = DEFAULT_VALUE;
+ res5.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res2.x = INTRINSIC (_mask_expand_epi8) (res1.x, MASK_ALL_ONES, s1.x);
+ res3.x = INTRINSIC (_mask_expand_epi8) (res1.x, 0, s1.x);
+ res4.x = INTRINSIC (_mask_expand_epi8) (res1.x, mask, s1.x);
+ res6.x = INTRINSIC (_mask_expandloadu_epi8) (res5.x, MASK_ALL_ONES, s2);
+ res7.x = INTRINSIC (_mask_expandloadu_epi8) (res5.x, 0, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_epi8) (res5.x, mask, s2);
+
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res2, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res1.a))
+ abort ();
+
+ MASK_MERGE (i_b) (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res4, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res6, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res7, res5.a))
+ abort ();
+
+ MASK_MERGE (i_b) (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res8, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-d-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-d-2.c
new file mode 100644
index 00000000000..24790b20cf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-d-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512F
+
+#define AVX512VBMI2
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (int *s, int *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & ((long long)1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, i_d) s1, res1, res2, res3, res4, res5, res6, res7, res8;
+ MASK_TYPE mask = (1 << SIZE - 1) - 1;
+ int s2[SIZE];
+ int res_ref1[SIZE];
+ int res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 12345 * (i + 200) * sign;
+ s2[i] = 67890 * (i + 300) * sign;
+ res1.a[i] = DEFAULT_VALUE;
+ res5.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res2.x = INTRINSIC (_mask_expand_epi32) (res1.x, MASK_ALL_ONES, s1.x);
+ res3.x = INTRINSIC (_mask_expand_epi32) (res1.x, 0, s1.x);
+ res4.x = INTRINSIC (_mask_expand_epi32) (res1.x, mask, s1.x);
+ res6.x = INTRINSIC (_mask_expandloadu_epi32) (res5.x, MASK_ALL_ONES, s2);
+ res7.x = INTRINSIC (_mask_expandloadu_epi32) (res5.x, 0, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_epi32) (res5.x, mask, s2);
+
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res2, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res1.a))
+ abort ();
+
+ MASK_MERGE (i_d) (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res4, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res6, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res7, res5.a))
+ abort ();
+
+ MASK_MERGE (i_d) (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res8, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-q-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-q-2.c
new file mode 100644
index 00000000000..e0320a50d62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-q-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512F
+
+#define AVX512VBMI2
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 64)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (long long *s, long long *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & ((long long)1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, i_q) s1, res1, res2, res3, res4, res5, res6, res7, res8;
+ MASK_TYPE mask = (1 << SIZE - 1) - 1;
+ long long s2[SIZE];
+ long long res_ref1[SIZE];
+ long long res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 12345 * (i + 200) * sign;
+ s2[i] = 67890 * (i + 300) * sign;
+ res1.a[i] = DEFAULT_VALUE;
+ res5.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res2.x = INTRINSIC (_mask_expand_epi32) (res1.x, MASK_ALL_ONES, s1.x);
+ res3.x = INTRINSIC (_mask_expand_epi32) (res1.x, 0, s1.x);
+ res4.x = INTRINSIC (_mask_expand_epi32) (res1.x, mask, s1.x);
+ res6.x = INTRINSIC (_mask_expandloadu_epi32) (res5.x, MASK_ALL_ONES, s2);
+ res7.x = INTRINSIC (_mask_expandloadu_epi32) (res5.x, 0, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_epi32) (res5.x, mask, s2);
+
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res2, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res3, res1.a))
+ abort ();
+
+ MASK_MERGE (i_q) (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res4, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res6, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res7, res5.a))
+ abort ();
+
+ MASK_MERGE (i_q) (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res8, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-w-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-w-2.c
new file mode 100644
index 00000000000..926e04d4df6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-w-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512F
+
+#define AVX512VBMI2
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 16)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (short *s, short *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & ((long long)1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, i_w) s1, res1, res2, res3, res4, res5, res6, res7, res8;
+ MASK_TYPE mask = (1 << (SIZE >> 2) - 1) - 1;
+ short s2[SIZE];
+ short res_ref1[SIZE];
+ short res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 12345 * (i + 200) * sign;
+ s2[i] = 67890 * (i + 300) * sign;
+ res1.a[i] = DEFAULT_VALUE;
+ res5.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res2.x = INTRINSIC (_mask_expand_epi16) (res1.x, MASK_ALL_ONES, s1.x);
+ res3.x = INTRINSIC (_mask_expand_epi16) (res1.x, 0, s1.x);
+ res4.x = INTRINSIC (_mask_expand_epi16) (res1.x, mask, s1.x);
+ res6.x = INTRINSIC (_mask_expandloadu_epi16) (res5.x, MASK_ALL_ONES, s2);
+ res7.x = INTRINSIC (_mask_expandloadu_epi16) (res5.x, 0, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_epi16) (res5.x, mask, s2);
+
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res2, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res3, res1.a))
+ abort ();
+
+ MASK_MERGE (i_w) (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res4, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res6, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res7, res5.a))
+ abort ();
+
+ MASK_MERGE (i_w) (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res8, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr100267-1.c b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-1.c
new file mode 100644
index 00000000000..5eb6de531c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-1.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 4 } } */
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 4 } } */
+#include <immintrin.h>
+
+int *pi32;
+long long *pi64;
+double *pd;
+float *pf;
+volatile __m512i xi32, xi64;
+volatile __m512d xd;
+volatile __m512 xf;
+
+void extern
+avx512f_test (void)
+{
+ xi32 = _mm512_mask_expand_epi32 (xi32, 0, xi32);
+ xi32 = _mm512_mask_expand_epi32 (xi32, -1, xi32);
+ xi32 = _mm512_mask_expand_epi32 (xi32, (1 << 8) - 1, xi32);
+ xi32 = _mm512_mask_expand_epi32 (xi32, (1 << 8) + 1, xi32);
+
+ xi32 = _mm512_mask_expandloadu_epi32 (xi32, 0, pi32);
+ xi32 = _mm512_mask_expandloadu_epi32 (xi32, (1 << 16) - 1, pi32);
+ xi32 = _mm512_mask_expandloadu_epi32 (xi32, (1 << 6) - 1, pi32);
+ xi32 = _mm512_mask_expandloadu_epi32 (xi32, (1 << 6) + 3, pi32);
+
+ xi64 = _mm512_mask_expand_epi64 (xi64, 0, xi64);
+ xi64 = _mm512_mask_expand_epi64 (xi64, -1, xi64);
+ xi64 = _mm512_mask_expand_epi64 (xi64, (1 << 3) - 1, xi64);
+ xi64 = _mm512_mask_expand_epi64 (xi64, (1 << 3) + 2, xi64);
+
+ xi64 = _mm512_mask_expandloadu_epi64 (xi64, 0, pi64);
+ xi64 = _mm512_mask_expandloadu_epi64 (xi64, (1 << 8) - 1, pi64);
+ xi64 = _mm512_mask_expandloadu_epi64 (xi64, (1 << 7) - 1, pi64);
+ xi64 = _mm512_mask_expandloadu_epi64 (xi64, (1 << 7) + 7, pi64);
+
+ xf = _mm512_mask_expand_ps (xf, 0, xf);
+ xf = _mm512_mask_expand_ps (xf, (1 << 16) - 1, xf);
+ xf = _mm512_mask_expand_ps (xf, (1 << 15) - 1, xf);
+ xf = _mm512_mask_expand_ps (xf, (1 << 14) + 3, xf);
+
+ xf = _mm512_mask_expandloadu_ps (xf, 0, pf);
+ xf = _mm512_mask_expandloadu_ps (xf, -1, pf);
+ xf = _mm512_mask_expandloadu_ps (xf, (1 << 13) - 1, pf);
+ xf = _mm512_mask_expandloadu_ps (xf, (1 << 13) + 5, pf);
+
+ xd = _mm512_mask_expand_pd (xd, 0, xd);
+ xd = _mm512_mask_expand_pd (xd, (1 << 8) - 1, xd);
+ xd = _mm512_mask_expand_pd (xd, (1 << 4) - 1, xd);
+ xd = _mm512_mask_expand_pd (xd, (1 << 4) + 1, xd);
+
+ xd = _mm512_mask_expandloadu_pd (xd, 0, pd);
+ xd = _mm512_mask_expandloadu_pd (xd, -1, pd);
+ xd = _mm512_mask_expandloadu_pd (xd, (1 << 5) - 1, pd);
+ xd = _mm512_mask_expandloadu_pd (xd, (1 << 5), pd);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr100267-pd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-pd-2.c
new file mode 100644
index 00000000000..7d653e8e853
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-pd-2.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+#define AVX512F
+
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 64)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (double *s, double *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & (1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, d) s1, res2, res3, res4, res5, res6, res7, res8, res9;
+ MASK_TYPE mask = (1 << SIZE - 1) - 1;
+ double s2[SIZE];
+ double res_ref1[SIZE];
+ double res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 123.456 * (i + 200) * sign;
+ s2[i] = 789.012 * (i + 300) * sign;
+ res2.a[i] = DEFAULT_VALUE;
+ res6.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res3.x = INTRINSIC (_mask_expand_pd) (res2.x, MASK_ALL_ONES, s1.x);
+ res4.x = INTRINSIC (_mask_expand_pd) (res2.x, 0, s1.x);
+ res5.x = INTRINSIC (_mask_expand_pd) (res2.x, mask, s1.x);
+
+ res7.x = INTRINSIC (_mask_expandloadu_pd) (res6.x, MASK_ALL_ONES, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_pd) (res6.x, 0, s2);
+ res9.x = INTRINSIC (_mask_expandloadu_pd) (res6.x, mask, s2);
+
+
+ /* no mask is the same as all ones mask. */
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, d) (res3, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, d) (res4, res2.a))
+ abort ();
+
+ MASK_MERGE (d) (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, d) (res5, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, d) (res7, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, d) (res8, res6.a))
+ abort ();
+
+ MASK_MERGE (d) (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, d) (res9, res_ref2))
+ abort ();
+
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr100267-ps-2.c b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-ps-2.c
new file mode 100644
index 00000000000..40f0dde0c88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-ps-2.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+#define AVX512F
+
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (float *s, float *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & (1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, ) s1, res2, res3, res4, res5, res6, res7, res8, res9;
+ MASK_TYPE mask = (1 << SIZE - 1) - 1;
+ float s2[SIZE];
+ float res_ref1[SIZE];
+ float res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 123.456 * (i + 200) * sign;
+ s2[i] = 789.012 * (i + 300) * sign;
+ res2.a[i] = DEFAULT_VALUE;
+ res6.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res3.x = INTRINSIC (_mask_expand_ps) (res2.x, MASK_ALL_ONES, s1.x);
+ res4.x = INTRINSIC (_mask_expand_ps) (res2.x, 0, s1.x);
+ res5.x = INTRINSIC (_mask_expand_ps) (res2.x, mask, s1.x);
+ res7.x = INTRINSIC (_mask_expandloadu_ps) (res6.x, MASK_ALL_ONES, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_ps) (res6.x, 0, s2);
+ res9.x = INTRINSIC (_mask_expandloadu_ps) (res6.x, mask, s2);
+
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, ) (res3, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, ) (res4, res2.a))
+ abort ();
+
+ MASK_MERGE () (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, ) (res5, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, ) (res7, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, ) (res8, res6.a))
+ abort ();
+
+ MASK_MERGE () (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, ) (res9, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-1.c
new file mode 100644
index 00000000000..9ffd381f6c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-1.c
@@ -0,0 +1,122 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 4 } } */
+/* { dg-final { scan-assembler-times "(?:vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}|blend\[a-z]+\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+)(?:\n|\[ \\t\]+#)" 4 } } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 4 } } */
+/* { dg-final { scan-assembler-times "(?:vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}|(?:blend\[a-z]+|movsd)\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+)(?:\n|\[ \\t\]+#)" 4 } } */
+
+#include <immintrin.h>
+
+int *pi32;
+long long *pi64;
+double *pd;
+float *pf;
+volatile __m256i xi32, xi64;
+volatile __m256d xd;
+volatile __m256 xf;
+
+volatile __m128i xi32_128, xi64_128;
+volatile __m128d xd_128;
+volatile __m128 xf_128;
+
+void extern
+avx512vl_test (void)
+{
+ xi32 = _mm256_mask_expand_epi32 (xi32, 0, xi32);
+ xi32 = _mm256_mask_expand_epi32 (xi32, -1, xi32);
+ xi32 = _mm256_mask_expand_epi32 (xi32, (1 << 4) - 1, xi32);
+ xi32 = _mm256_mask_expand_epi32 (xi32, (1 << 4) + 1, xi32);
+
+ xi32 = _mm256_mask_expandloadu_epi32 (xi32, 0, pi32);
+ xi32 = _mm256_mask_expandloadu_epi32 (xi32, (1 << 8) - 1, pi32);
+ xi32 = _mm256_mask_expandloadu_epi32 (xi32, (1 << 6) - 1, pi32);
+ xi32 = _mm256_mask_expandloadu_epi32 (xi32, (1 << 6) + 3, pi32);
+
+ xi64 = _mm256_mask_expand_epi64 (xi64, 0, xi64);
+ xi64 = _mm256_mask_expand_epi64 (xi64, -1, xi64);
+ xi64 = _mm256_mask_expand_epi64 (xi64, (1 << 3) - 1, xi64);
+ xi64 = _mm256_mask_expand_epi64 (xi64, (1 << 3) + 2, xi64);
+
+ xi64 = _mm256_mask_expandloadu_epi64 (xi64, 0, pi64);
+ xi64 = _mm256_mask_expandloadu_epi64 (xi64, (1 << 4) - 1, pi64);
+ xi64 = _mm256_mask_expandloadu_epi64 (xi64, (1 << 2) - 1, pi64);
+ xi64 = _mm256_mask_expandloadu_epi64 (xi64, (1 << 2), pi64);
+
+ xf = _mm256_mask_expand_ps (xf, 0, xf);
+ xf = _mm256_mask_expand_ps (xf, (1 << 8) - 1, xf);
+ xf = _mm256_mask_expand_ps (xf, (1 << 6) - 1, xf);
+ xf = _mm256_mask_expand_ps (xf, (1 << 6) + 3, xf);
+
+ xf = _mm256_mask_expandloadu_ps (xf, 0, pf);
+ xf = _mm256_mask_expandloadu_ps (xf, -1, pf);
+ xf = _mm256_mask_expandloadu_ps (xf, (1 << 7) - 1, pf);
+ xf = _mm256_mask_expandloadu_ps (xf, (1 << 7) + 5, pf);
+
+ xd = _mm256_mask_expand_pd (xd, 0, xd);
+ xd = _mm256_mask_expand_pd (xd, (1 << 4) - 1, xd);
+ xd = _mm256_mask_expand_pd (xd, (1 << 2) - 1, xd);
+ xd = _mm256_mask_expand_pd (xd, (1 << 2), xd);
+
+ xd = _mm256_mask_expandloadu_pd (xd, 0, pd);
+ xd = _mm256_mask_expandloadu_pd (xd, -1, pd);
+ xd = _mm256_mask_expandloadu_pd (xd, (1 << 2) - 1, pd);
+ xd = _mm256_mask_expandloadu_pd (xd, (1 << 2), pd);
+
+ xi32_128 = _mm_mask_expand_epi32 (xi32_128, 0, xi32_128);
+ xi32_128 = _mm_mask_expand_epi32 (xi32_128, -1, xi32_128);
+ xi32_128 = _mm_mask_expand_epi32 (xi32_128, (1 << 3) - 1, xi32_128);
+ xi32_128 = _mm_mask_expand_epi32 (xi32_128, (1 << 3) + 1, xi32_128);
+
+ xi32_128 = _mm_mask_expandloadu_epi32 (xi32_128, 0, pi32);
+ xi32_128 = _mm_mask_expandloadu_epi32 (xi32_128, (1 << 4) - 1, pi32);
+ xi32_128 = _mm_mask_expandloadu_epi32 (xi32_128, (1 << 2) - 1, pi32);
+ xi32_128 = _mm_mask_expandloadu_epi32 (xi32_128, (1 << 1) + 3, pi32);
+
+ xi64_128 = _mm_mask_expand_epi64 (xi64_128, 0, xi64_128);
+ xi64_128 = _mm_mask_expand_epi64 (xi64_128, -1, xi64_128);
+ xi64_128 = _mm_mask_expand_epi64 (xi64_128, (1 << 1) - 1, xi64_128);
+ xi64_128 = _mm_mask_expand_epi64 (xi64_128, 2, xi64_128);
+
+ xi64_128 = _mm_mask_expandloadu_epi64 (xi64_128, 0, pi64);
+ xi64_128 = _mm_mask_expandloadu_epi64 (xi64_128, 3, pi64);
+ xi64_128 = _mm_mask_expandloadu_epi64 (xi64_128, 1, pi64);
+ xi64_128 = _mm_mask_expandloadu_epi64 (xi64_128, 2, pi64);
+
+ xf_128 = _mm_mask_expand_ps (xf_128, 0, xf_128);
+ xf_128 = _mm_mask_expand_ps (xf_128, (1 << 4) - 1, xf_128);
+ xf_128 = _mm_mask_expand_ps (xf_128, (1 << 3) - 1, xf_128);
+ xf_128 = _mm_mask_expand_ps (xf_128, (1 << 2), xf_128);
+
+ xf_128 = _mm_mask_expandloadu_ps (xf_128, 0, pf);
+ xf_128 = _mm_mask_expandloadu_ps (xf_128, -1, pf);
+ xf_128 = _mm_mask_expandloadu_ps (xf_128, (1 << 3) - 1, pf);
+ xf_128 = _mm_mask_expandloadu_ps (xf_128, (1 << 1), pf);
+
+ xd_128 = _mm_mask_expand_pd (xd_128, 0, xd_128);
+ xd_128 = _mm_mask_expand_pd (xd_128, (1 << 2) - 1, xd_128);
+ xd_128 = _mm_mask_expand_pd (xd_128, 1, xd_128);
+ xd_128 = _mm_mask_expand_pd (xd_128, 2, xd_128);
+
+ xd_128 = _mm_mask_expandloadu_pd (xd_128, 0, pd);
+ xd_128 = _mm_mask_expandloadu_pd (xd_128, -1, pd);
+ xd_128 = _mm_mask_expandloadu_pd (xd_128, 1, pd);
+ xd_128 = _mm_mask_expandloadu_pd (xd_128, 2, pd);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-pd-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-pd-2.c
new file mode 100644
index 00000000000..e5f30374b51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-pd-2.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512f-pr100267-pd-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512f-pr100267-pd-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-ps-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-ps-2.c
new file mode 100644
index 00000000000..0149a8aaa55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-ps-2.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512f-pr100267-ps-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512f-pr100267-ps-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-1.c b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-1.c
new file mode 100644
index 00000000000..135dbd7577e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-1.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vbmi2 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "(?:vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}|blend\[a-z]*\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+)(?:\n|\[ \\t\]+#)" 2 } } */
+#include <immintrin.h>
+
+char *pi8;
+short *pi16;
+volatile __m256i xi16, xi8;
+volatile __m128i xi16_xmm, xi8_xmm;
+
+void extern
+avx512f_test (void)
+{
+ xi8 = _mm256_mask_expand_epi8 (xi8, 0, xi8);
+ xi8 = _mm256_mask_expand_epi8 (xi8, -1, xi8);
+ xi8 = _mm256_mask_expand_epi8 (xi8, (1 << 30) - 1, xi8);
+ xi8 = _mm256_mask_expand_epi8 (xi8, (1 << 16) + 1, xi8);
+
+ xi8 = _mm256_mask_expandloadu_epi8 (xi8, 0, pi8);
+ xi8 = _mm256_mask_expandloadu_epi8 (xi8, -1, pi8);
+ xi8 = _mm256_mask_expandloadu_epi8 (xi8, (1 << 28) - 1, pi8);
+ xi8 = _mm256_mask_expandloadu_epi8 (xi8, (1 << 15) + 3, pi8);
+
+ xi16 = _mm256_mask_expand_epi16 (xi16, 0, xi16);
+ xi16 = _mm256_mask_expand_epi16 (xi16, -1, xi16);
+ xi16 = _mm256_mask_expand_epi16 (xi16, (1 << 15) - 1, xi16);
+ xi16 = _mm256_mask_expand_epi16 (xi16, (1 << 14) + 2, xi16);
+
+ xi16 = _mm256_mask_expandloadu_epi16 (xi16, 0, pi16);
+ xi16 = _mm256_mask_expandloadu_epi16 (xi16, (1 << 16) - 1, pi16);
+ xi16 = _mm256_mask_expandloadu_epi16 (xi16, (1 << 14) - 1, pi16);
+ xi16 = _mm256_mask_expandloadu_epi16 (xi16, (1 << 13) + 7, pi16);
+
+ xi8_xmm = _mm_mask_expand_epi8 (xi8_xmm, 0, xi8_xmm);
+ xi8_xmm = _mm_mask_expand_epi8 (xi8_xmm, -1, xi8_xmm);
+ xi8_xmm = _mm_mask_expand_epi8 (xi8_xmm, (1 << 13) - 1, xi8_xmm);
+ xi8_xmm = _mm_mask_expand_epi8 (xi8_xmm, (1 << 12) + 1, xi8_xmm);
+
+ xi8_xmm = _mm_mask_expandloadu_epi8 (xi8_xmm, 0, pi8);
+ xi8_xmm = _mm_mask_expandloadu_epi8 (xi8_xmm, (1 << 16) - 1, pi8);
+ xi8_xmm = _mm_mask_expandloadu_epi8 (xi8_xmm, (1 << 12) - 1, pi8);
+ xi8_xmm = _mm_mask_expandloadu_epi8 (xi8_xmm, (1 << 11) + 3, pi8);
+
+ xi16_xmm = _mm_mask_expand_epi16 (xi16_xmm, 0, xi16_xmm);
+ xi16_xmm = _mm_mask_expand_epi16 (xi16_xmm, -1, xi16_xmm);
+ xi16_xmm = _mm_mask_expand_epi16 (xi16_xmm, (1 << 7) - 1, xi16_xmm);
+ xi16_xmm = _mm_mask_expand_epi16 (xi16_xmm, (1 << 4) + 2, xi16_xmm);
+
+ xi16_xmm = _mm_mask_expandloadu_epi16 (xi16_xmm, 0, pi16);
+ xi16_xmm = _mm_mask_expandloadu_epi16 (xi16_xmm, (1 << 8) - 1, pi16);
+ xi16_xmm = _mm_mask_expandloadu_epi16 (xi16_xmm, (1 << 3) - 1, pi16);
+ xi16_xmm = _mm_mask_expandloadu_epi16 (xi16_xmm, (1 << 6) + 7, pi16);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-b-2.c b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-b-2.c
new file mode 100644
index 00000000000..d54e8033a25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-b-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-b-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-b-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-d-2.c b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-d-2.c
new file mode 100644
index 00000000000..1e604cfb1aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-d-2.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-d-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-d-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-q-2.c b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-q-2.c
new file mode 100644
index 00000000000..1f9fe65ad16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-q-2.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-q-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-q-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-w-2.c b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-w-2.c
new file mode 100644
index 00000000000..a46ca78a621
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-w-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-w-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-w-2.c"
--
2.18.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][AVX512] Optimize vpexpand* to mask mov when mask have all ones in it's lower part (including 0 and -1).
2021-04-30 4:49 ` [PATCH][AVX512] Optimize vpexpand* to mask mov when mask have all ones in it's lower part (including 0 and -1) Hongtao Liu
@ 2021-05-12 7:28 ` Hongtao Liu
2021-06-21 11:46 ` Hongtao Liu
0 siblings, 1 reply; 6+ messages in thread
From: Hongtao Liu @ 2021-05-12 7:28 UTC (permalink / raw)
To: GCC Patches
ping
On Fri, Apr 30, 2021 at 12:49 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
> For v{,p}expand* When mask is 0, -1, or has all all one bits in its
> lower part, it can be optimized to simple mov or mask mov.
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,} and
> x86_64-linux-gnu{m32\ -march=cascadelake,-m64\ -march=cascadelake},
>
> gcc/ChangeLog:
>
> * config/i386/i386-builtin.def (BDESC): Adjust builtin name.
> * config/i386/sse.md (<avx512>_expand<mode>_mask): Rename to ..
> (expand<mode>_mask): this ..
> (*expand<mode>_mask): New pre_reload splitter to transform
> * v{,p}expand* to vmov* when mask is zero, all ones, or has
> all ones in its lower part, otherwise still generate v{,p}expand*.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512bw-pr100267-1.c: New test.
> * gcc.target/i386/avx512bw-pr100267-b-2.c: New test.
> * gcc.target/i386/avx512bw-pr100267-d-2.c: New test.
> * gcc.target/i386/avx512bw-pr100267-q-2.c: New test.
> * gcc.target/i386/avx512bw-pr100267-w-2.c: New test.
> * gcc.target/i386/avx512f-pr100267-1.c: New test.
> * gcc.target/i386/avx512f-pr100267-pd-2.c: New test.
> * gcc.target/i386/avx512f-pr100267-ps-2.c: New test.
> * gcc.target/i386/avx512vl-pr100267-1.c: New test.
> * gcc.target/i386/avx512vl-pr100267-pd-2.c: New test.
> * gcc.target/i386/avx512vl-pr100267-ps-2.c: New test.
> * gcc.target/i386/avx512vlbw-pr100267-1.c: New test.
> * gcc.target/i386/avx512vlbw-pr100267-b-2.c: New test.
> * gcc.target/i386/avx512vlbw-pr100267-d-2.c: New test.
> * gcc.target/i386/avx512vlbw-pr100267-q-2.c: New test.
> * gcc.target/i386/avx512vlbw-pr100267-w-2.c: New test.
--
BR,
Hongtao
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][AVX512] Fix ICE for vpexpand*.
2021-04-30 4:42 [PATCH][AVX512] Fix ICE for vpexpand* Hongtao Liu
2021-04-30 4:49 ` [PATCH][AVX512] Optimize vpexpand* to mask mov when mask have all ones in it's lower part (including 0 and -1) Hongtao Liu
@ 2021-05-12 7:29 ` Hongtao Liu
2021-06-21 11:24 ` Hongtao Liu
1 sibling, 1 reply; 6+ messages in thread
From: Hongtao Liu @ 2021-05-12 7:29 UTC (permalink / raw)
To: GCC Patches
ping
On Fri, Apr 30, 2021 at 12:42 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
> This patch is to fix ice which was introduced by my
> r11-5696-g35c4c67e6c534ef3d6ba7a7752ab7e0fbc91755b.
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> Ok for trunk and backport to GCC11?
>
> gcc/ChangeLog
>
> PR target/100310
> * config/i386/i386-expand.c
> (ix86_expand_special_args_builtin): Keep constm1_operand only
> if it satisfies insn's operand predicate.
>
> gcc/testsuite/ChangeLog
>
> PR target/100310
> * gcc.target/i386/pr100310.c: New test.
>
> Add test
> ---
> gcc/config/i386/i386-expand.c | 5 +++--
> gcc/testsuite/gcc.target/i386/pr100310.c | 12 ++++++++++++
> 2 files changed, 15 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/pr100310.c
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 516440eb5c1..b2bb2b1e3a1 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -10862,11 +10862,12 @@ ix86_expand_special_args_builtin (const
> struct builtin_description *d,
>
> op = fixup_modeless_constant (op, mode);
>
> - /* NB: 3-operands load implied it's a mask load,
> + /* NB: 3-operands load implied it's a mask load or v{p}expand*,
> and that mask operand shoud be at the end.
> Keep all-ones mask which would be simplified by the expander. */
> if (nargs == 3 && i == 2 && klass == load
> - && constm1_operand (op, mode))
> + && constm1_operand (op, mode)
> + && insn_p->operand[i].predicate (op, mode))
> ;
> else if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
> op = copy_to_mode_reg (mode, op);
> diff --git a/gcc/testsuite/gcc.target/i386/pr100310.c
> b/gcc/testsuite/gcc.target/i386/pr100310.c
> new file mode 100644
> index 00000000000..54ace18531b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr100310.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +#include <immintrin.h>
> +
> +double *p;
> +volatile __m512d x;
> +volatile __mmask8 m;
> +
> +void foo()
> +{
> + x = _mm512_mask_expandloadu_pd (x, 255, p);
> +}
> --
> 2.18.1
>
>
> --
> BR,
> Hongtao
--
BR,
Hongtao
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][AVX512] Fix ICE for vpexpand*.
2021-05-12 7:29 ` [PATCH][AVX512] Fix ICE for vpexpand* Hongtao Liu
@ 2021-06-21 11:24 ` Hongtao Liu
0 siblings, 0 replies; 6+ messages in thread
From: Hongtao Liu @ 2021-06-21 11:24 UTC (permalink / raw)
To: GCC Patches
[-- Attachment #1: Type: text/plain, Size: 2674 bytes --]
This is the patch I'm going to push to the trunk.
On Wed, May 12, 2021 at 3:29 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> ping
>
> On Fri, Apr 30, 2021 at 12:42 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > Hi:
> > This patch is to fix ice which was introduced by my
> > r11-5696-g35c4c67e6c534ef3d6ba7a7752ab7e0fbc91755b.
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > Ok for trunk and backport to GCC11?
> >
> > gcc/ChangeLog
> >
> > PR target/100310
> > * config/i386/i386-expand.c
> > (ix86_expand_special_args_builtin): Keep constm1_operand only
> > if it satisfies insn's operand predicate.
> >
> > gcc/testsuite/ChangeLog
> >
> > PR target/100310
> > * gcc.target/i386/pr100310.c: New test.
> >
> > Add test
> > ---
> > gcc/config/i386/i386-expand.c | 5 +++--
> > gcc/testsuite/gcc.target/i386/pr100310.c | 12 ++++++++++++
> > 2 files changed, 15 insertions(+), 2 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/i386/pr100310.c
> >
> > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > index 516440eb5c1..b2bb2b1e3a1 100644
> > --- a/gcc/config/i386/i386-expand.c
> > +++ b/gcc/config/i386/i386-expand.c
> > @@ -10862,11 +10862,12 @@ ix86_expand_special_args_builtin (const
> > struct builtin_description *d,
> >
> > op = fixup_modeless_constant (op, mode);
> >
> > - /* NB: 3-operands load implied it's a mask load,
> > + /* NB: 3-operands load implied it's a mask load or v{p}expand*,
> > and that mask operand shoud be at the end.
> > Keep all-ones mask which would be simplified by the expander. */
> > if (nargs == 3 && i == 2 && klass == load
> > - && constm1_operand (op, mode))
> > + && constm1_operand (op, mode)
> > + && insn_p->operand[i].predicate (op, mode))
> > ;
> > else if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
> > op = copy_to_mode_reg (mode, op);
> > diff --git a/gcc/testsuite/gcc.target/i386/pr100310.c
> > b/gcc/testsuite/gcc.target/i386/pr100310.c
> > new file mode 100644
> > index 00000000000..54ace18531b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr100310.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx512f -O2" } */
> > +#include <immintrin.h>
> > +
> > +double *p;
> > +volatile __m512d x;
> > +volatile __mmask8 m;
> > +
> > +void foo()
> > +{
> > + x = _mm512_mask_expandloadu_pd (x, 255, p);
> > +}
> > --
> > 2.18.1
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
--
BR,
Hongtao
[-- Attachment #2: 0001-i386-Fix-ICE-for-vpexpand.patch --]
[-- Type: text/x-patch, Size: 2006 bytes --]
From 274325ebab87bd56484a6a55cfeb358dc5189263 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 28 Apr 2021 14:52:59 +0800
Subject: [PATCH 1/2] [i386] Fix ICE for vpexpand*.
gcc/ChangeLog
PR target/100310
* config/i386/i386-expand.c
(ix86_expand_special_args_builtin): Keep constm1_operand only
if it satisfies insn's operand predicate.
gcc/testsuite/ChangeLog
PR target/100310
* gcc.target/i386/pr100310.c: New test.
Add test
---
gcc/config/i386/i386-expand.c | 5 +++--
gcc/testsuite/gcc.target/i386/pr100310.c | 12 ++++++++++++
2 files changed, 15 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/pr100310.c
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 8f4e4e4d884..cc2eaeed8df 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10969,11 +10969,12 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
op = fixup_modeless_constant (op, mode);
- /* NB: 3-operands load implied it's a mask load,
+ /* NB: 3-operands load implied it's a mask load or v{p}expand*,
and that mask operand shoud be at the end.
Keep all-ones mask which would be simplified by the expander. */
if (nargs == 3 && i == 2 && klass == load
- && constm1_operand (op, mode))
+ && constm1_operand (op, mode)
+ && insn_p->operand[i].predicate (op, mode))
;
else if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
op = copy_to_mode_reg (mode, op);
diff --git a/gcc/testsuite/gcc.target/i386/pr100310.c b/gcc/testsuite/gcc.target/i386/pr100310.c
new file mode 100644
index 00000000000..54ace18531b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100310.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+#include <immintrin.h>
+
+double *p;
+volatile __m512d x;
+volatile __mmask8 m;
+
+void foo()
+{
+ x = _mm512_mask_expandloadu_pd (x, 255, p);
+}
--
2.18.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][AVX512] Optimize vpexpand* to mask mov when mask have all ones in it's lower part (including 0 and -1).
2021-05-12 7:28 ` Hongtao Liu
@ 2021-06-21 11:46 ` Hongtao Liu
0 siblings, 0 replies; 6+ messages in thread
From: Hongtao Liu @ 2021-06-21 11:46 UTC (permalink / raw)
To: GCC Patches
[-- Attachment #1: Type: text/plain, Size: 2060 bytes --]
This is the patch I'm going to push to the trunk.
On Wed, May 12, 2021 at 3:28 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> ping
>
> On Fri, Apr 30, 2021 at 12:49 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > Hi:
> > For v{,p}expand* When mask is 0, -1, or has all all one bits in its
> > lower part, it can be optimized to simple mov or mask mov.
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,} and
> > x86_64-linux-gnu{m32\ -march=cascadelake,-m64\ -march=cascadelake},
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386-builtin.def (BDESC): Adjust builtin name.
> > * config/i386/sse.md (<avx512>_expand<mode>_mask): Rename to ..
> > (expand<mode>_mask): this ..
> > (*expand<mode>_mask): New pre_reload splitter to transform
> > * v{,p}expand* to vmov* when mask is zero, all ones, or has
> > all ones in its lower part, otherwise still generate v{,p}expand*.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/avx512bw-pr100267-1.c: New test.
> > * gcc.target/i386/avx512bw-pr100267-b-2.c: New test.
> > * gcc.target/i386/avx512bw-pr100267-d-2.c: New test.
> > * gcc.target/i386/avx512bw-pr100267-q-2.c: New test.
> > * gcc.target/i386/avx512bw-pr100267-w-2.c: New test.
> > * gcc.target/i386/avx512f-pr100267-1.c: New test.
> > * gcc.target/i386/avx512f-pr100267-pd-2.c: New test.
> > * gcc.target/i386/avx512f-pr100267-ps-2.c: New test.
> > * gcc.target/i386/avx512vl-pr100267-1.c: New test.
> > * gcc.target/i386/avx512vl-pr100267-pd-2.c: New test.
> > * gcc.target/i386/avx512vl-pr100267-ps-2.c: New test.
> > * gcc.target/i386/avx512vlbw-pr100267-1.c: New test.
> > * gcc.target/i386/avx512vlbw-pr100267-b-2.c: New test.
> > * gcc.target/i386/avx512vlbw-pr100267-d-2.c: New test.
> > * gcc.target/i386/avx512vlbw-pr100267-q-2.c: New test.
> > * gcc.target/i386/avx512vlbw-pr100267-w-2.c: New test.
>
>
>
> --
> BR,
> Hongtao
--
BR,
Hongtao
[-- Attachment #2: 0002-i386-Optimize-vpexpand-to-mask-mov-when-mask-have-al.patch --]
[-- Type: text/x-patch, Size: 56784 bytes --]
From 17e8b8e85da9d3a2bcacc108615a307ae04d67f3 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Thu, 29 Apr 2021 18:27:09 +0800
Subject: [PATCH 2/2] [i386] Optimize vpexpand* to mask mov when mask have all
ones in it's lower part (including 0 and -1).
gcc/ChangeLog:
* config/i386/i386-builtin.def (BDESC): Adjust builtin name.
* config/i386/sse.md (<avx512>_expand<mode>_mask): Rename to ..
(expand<mode>_mask): this ..
(*expand<mode>_mask): New pre_reload splitter to transform
v{,p}expand* to vmov* when mask is zero, all ones, or has all
ones in it's lower part, otherwise still generate
v{,p}expand*.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bw-pr100267-1.c: New test.
* gcc.target/i386/avx512bw-pr100267-b-2.c: New test.
* gcc.target/i386/avx512bw-pr100267-d-2.c: New test.
* gcc.target/i386/avx512bw-pr100267-q-2.c: New test.
* gcc.target/i386/avx512bw-pr100267-w-2.c: New test.
* gcc.target/i386/avx512f-pr100267-1.c: New test.
* gcc.target/i386/avx512f-pr100267-pd-2.c: New test.
* gcc.target/i386/avx512f-pr100267-ps-2.c: New test.
* gcc.target/i386/avx512vl-pr100267-1.c: New test.
* gcc.target/i386/avx512vl-pr100267-pd-2.c: New test.
* gcc.target/i386/avx512vl-pr100267-ps-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-1.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-b-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-d-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-q-2.c: New test.
* gcc.target/i386/avx512vlbw-pr100267-w-2.c: New test.
---
gcc/config/i386/i386-builtin.def | 48 +++----
gcc/config/i386/sse.md | 69 +++++++++-
.../gcc.target/i386/avx512bw-pr100267-1.c | 38 ++++++
.../gcc.target/i386/avx512bw-pr100267-b-2.c | 74 +++++++++++
.../gcc.target/i386/avx512bw-pr100267-d-2.c | 74 +++++++++++
.../gcc.target/i386/avx512bw-pr100267-q-2.c | 74 +++++++++++
.../gcc.target/i386/avx512bw-pr100267-w-2.c | 74 +++++++++++
.../gcc.target/i386/avx512f-pr100267-1.c | 66 ++++++++++
.../gcc.target/i386/avx512f-pr100267-pd-2.c | 76 +++++++++++
.../gcc.target/i386/avx512f-pr100267-ps-2.c | 72 +++++++++++
.../gcc.target/i386/avx512vl-pr100267-1.c | 122 ++++++++++++++++++
.../gcc.target/i386/avx512vl-pr100267-pd-2.c | 15 +++
.../gcc.target/i386/avx512vl-pr100267-ps-2.c | 15 +++
.../gcc.target/i386/avx512vlbw-pr100267-1.c | 66 ++++++++++
.../gcc.target/i386/avx512vlbw-pr100267-b-2.c | 16 +++
.../gcc.target/i386/avx512vlbw-pr100267-d-2.c | 15 +++
.../gcc.target/i386/avx512vlbw-pr100267-q-2.c | 15 +++
.../gcc.target/i386/avx512vlbw-pr100267-w-2.c | 16 +++
18 files changed, 920 insertions(+), 25 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-b-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-d-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-q-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-w-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr100267-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr100267-pd-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr100267-ps-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr100267-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr100267-pd-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr100267-ps-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-b-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-d-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-q-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-w-2.c
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 80c2a2c0294..31df3a613dd 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -204,13 +204,13 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev16sf_mask, "__
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev16si_mask, "__builtin_ia32_compressstoresi512_mask", IX86_BUILTIN_PCOMPRESSDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev8df_mask, "__builtin_ia32_compressstoredf512_mask", IX86_BUILTIN_COMPRESSPDSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressstorev8di_mask, "__builtin_ia32_compressstoredi512_mask", IX86_BUILTIN_PCOMPRESSQSTORE512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_mask, "__builtin_ia32_expandloadsf512_mask", IX86_BUILTIN_EXPANDPSLOAD512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandloadsf512_mask", IX86_BUILTIN_EXPANDPSLOAD512, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandloadsf512_maskz", IX86_BUILTIN_EXPANDPSLOAD512Z, UNKNOWN, (int) V16SF_FTYPE_PCV16SF_V16SF_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_mask, "__builtin_ia32_expandloadsi512_mask", IX86_BUILTIN_PEXPANDDLOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandloadsi512_mask", IX86_BUILTIN_PEXPANDDLOAD512, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandloadsi512_maskz", IX86_BUILTIN_PEXPANDDLOAD512Z, UNKNOWN, (int) V16SI_FTYPE_PCV16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_mask, "__builtin_ia32_expandloaddf512_mask", IX86_BUILTIN_EXPANDPDLOAD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8df_mask, "__builtin_ia32_expandloaddf512_mask", IX86_BUILTIN_EXPANDPDLOAD512, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expandloaddf512_maskz", IX86_BUILTIN_EXPANDPDLOAD512Z, UNKNOWN, (int) V8DF_FTYPE_PCV8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_mask, "__builtin_ia32_expandloaddi512_mask", IX86_BUILTIN_PEXPANDQLOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8di_mask, "__builtin_ia32_expandloaddi512_mask", IX86_BUILTIN_PEXPANDQLOAD512, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expandloaddi512_maskz", IX86_BUILTIN_PEXPANDQLOAD512Z, UNKNOWN, (int) V8DI_FTYPE_PCV8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv16si_mask, "__builtin_ia32_loaddqusi512_mask", IX86_BUILTIN_LOADDQUSI512, UNKNOWN, (int) V16SI_FTYPE_PCINT_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_loadv8di_mask, "__builtin_ia32_loaddqudi512_mask", IX86_BUILTIN_LOADDQUDI512, UNKNOWN, (int) V8DI_FTYPE_PCINT64_V8DI_UQI)
@@ -337,14 +337,14 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressstorev4di_mask, "_
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressstorev2di_mask, "__builtin_ia32_compressstoredi128_mask", IX86_BUILTIN_PCOMPRESSQSTORE128, UNKNOWN, (int) VOID_FTYPE_PV2DI_V2DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressstorev8si_mask, "__builtin_ia32_compressstoresi256_mask", IX86_BUILTIN_PCOMPRESSDSTORE256, UNKNOWN, (int) VOID_FTYPE_PV8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressstorev4si_mask, "__builtin_ia32_compressstoresi128_mask", IX86_BUILTIN_PCOMPRESSDSTORE128, UNKNOWN, (int) VOID_FTYPE_PV4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4df_mask, "__builtin_ia32_expandloaddf256_mask", IX86_BUILTIN_EXPANDPDLOAD256, UNKNOWN, (int) V4DF_FTYPE_PCV4DF_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2df_mask, "__builtin_ia32_expandloaddf128_mask", IX86_BUILTIN_EXPANDPDLOAD128, UNKNOWN, (int) V2DF_FTYPE_PCV2DF_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8sf_mask, "__builtin_ia32_expandloadsf256_mask", IX86_BUILTIN_EXPANDPSLOAD256, UNKNOWN, (int) V8SF_FTYPE_PCV8SF_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4sf_mask, "__builtin_ia32_expandloadsf128_mask", IX86_BUILTIN_EXPANDPSLOAD128, UNKNOWN, (int) V4SF_FTYPE_PCV4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4di_mask, "__builtin_ia32_expandloaddi256_mask", IX86_BUILTIN_PEXPANDQLOAD256, UNKNOWN, (int) V4DI_FTYPE_PCV4DI_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2di_mask, "__builtin_ia32_expandloaddi128_mask", IX86_BUILTIN_PEXPANDQLOAD128, UNKNOWN, (int) V2DI_FTYPE_PCV2DI_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8si_mask, "__builtin_ia32_expandloadsi256_mask", IX86_BUILTIN_PEXPANDDLOAD256, UNKNOWN, (int) V8SI_FTYPE_PCV8SI_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4si_mask, "__builtin_ia32_expandloadsi128_mask", IX86_BUILTIN_PEXPANDDLOAD128, UNKNOWN, (int) V4SI_FTYPE_PCV4SI_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4df_mask, "__builtin_ia32_expandloaddf256_mask", IX86_BUILTIN_EXPANDPDLOAD256, UNKNOWN, (int) V4DF_FTYPE_PCV4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv2df_mask, "__builtin_ia32_expandloaddf128_mask", IX86_BUILTIN_EXPANDPDLOAD128, UNKNOWN, (int) V2DF_FTYPE_PCV2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8sf_mask, "__builtin_ia32_expandloadsf256_mask", IX86_BUILTIN_EXPANDPSLOAD256, UNKNOWN, (int) V8SF_FTYPE_PCV8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4sf_mask, "__builtin_ia32_expandloadsf128_mask", IX86_BUILTIN_EXPANDPSLOAD128, UNKNOWN, (int) V4SF_FTYPE_PCV4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4di_mask, "__builtin_ia32_expandloaddi256_mask", IX86_BUILTIN_PEXPANDQLOAD256, UNKNOWN, (int) V4DI_FTYPE_PCV4DI_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv2di_mask, "__builtin_ia32_expandloaddi128_mask", IX86_BUILTIN_PEXPANDQLOAD128, UNKNOWN, (int) V2DI_FTYPE_PCV2DI_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8si_mask, "__builtin_ia32_expandloadsi256_mask", IX86_BUILTIN_PEXPANDDLOAD256, UNKNOWN, (int) V8SI_FTYPE_PCV8SI_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4si_mask, "__builtin_ia32_expandloadsi128_mask", IX86_BUILTIN_PEXPANDDLOAD128, UNKNOWN, (int) V4SI_FTYPE_PCV4SI_V4SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4df_maskz, "__builtin_ia32_expandloaddf256_maskz", IX86_BUILTIN_EXPANDPDLOAD256Z, UNKNOWN, (int) V4DF_FTYPE_PCV4DF_V4DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2df_maskz, "__builtin_ia32_expandloaddf128_maskz", IX86_BUILTIN_EXPANDPDLOAD128Z, UNKNOWN, (int) V2DF_FTYPE_PCV2DF_V2DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8sf_maskz, "__builtin_ia32_expandloadsf256_maskz", IX86_BUILTIN_EXPANDPSLOAD256Z, UNKNOWN, (int) V8SF_FTYPE_PCV8SF_V8SF_UQI)
@@ -1342,9 +1342,9 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_floatv8siv8df2_mask, "__builtin_ia32
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vcvtps2ph512_mask, "__builtin_ia32_vcvtps2ph512_mask", IX86_BUILTIN_CVTPS2PH512, UNKNOWN, (int) V16HI_FTYPE_V16SF_INT_V16HI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_ufloatv8siv8df2_mask, "__builtin_ia32_cvtudq2pd512_mask", IX86_BUILTIN_CVTUDQ2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_cvtusi2sd32, "__builtin_ia32_cvtusi2sd32", IX86_BUILTIN_CVTUSI2SD32, UNKNOWN, (int) V2DF_FTYPE_V2DF_UINT)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_mask, "__builtin_ia32_expanddf512_mask", IX86_BUILTIN_EXPANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8df_mask, "__builtin_ia32_expanddf512_mask", IX86_BUILTIN_EXPANDPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8df_maskz, "__builtin_ia32_expanddf512_maskz", IX86_BUILTIN_EXPANDPD512Z, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_mask, "__builtin_ia32_expandsf512_mask", IX86_BUILTIN_EXPANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16sf_mask, "__builtin_ia32_expandsf512_mask", IX86_BUILTIN_EXPANDPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16sf_maskz, "__builtin_ia32_expandsf512_maskz", IX86_BUILTIN_EXPANDPS512Z, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextractf32x4_mask, "__builtin_ia32_extractf32x4_mask", IX86_BUILTIN_EXTRACTF32X4, UNKNOWN, (int) V4SF_FTYPE_V16SF_INT_V4SF_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vextractf64x4_mask, "__builtin_ia32_extractf64x4_mask", IX86_BUILTIN_EXTRACTF64X4, UNKNOWN, (int) V4DF_FTYPE_V8DF_INT_V4DF_UQI)
@@ -1381,9 +1381,9 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_gtv16si3_mask, "__builtin_ia
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_gtv8di3_mask, "__builtin_ia32_pcmpgtq512_mask", IX86_BUILTIN_PCMPGTQ512_MASK, UNKNOWN, (int) UQI_FTYPE_V8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv16si_mask, "__builtin_ia32_compresssi512_mask", IX86_BUILTIN_PCOMPRESSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_compressv8di_mask, "__builtin_ia32_compressdi512_mask", IX86_BUILTIN_PCOMPRESSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_mask, "__builtin_ia32_expandsi512_mask", IX86_BUILTIN_PEXPANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv16si_mask, "__builtin_ia32_expandsi512_mask", IX86_BUILTIN_PEXPANDD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv16si_maskz, "__builtin_ia32_expandsi512_maskz", IX86_BUILTIN_PEXPANDD512Z, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_UHI)
-BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_mask, "__builtin_ia32_expanddi512_mask", IX86_BUILTIN_PEXPANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_expandv8di_mask, "__builtin_ia32_expanddi512_mask", IX86_BUILTIN_PEXPANDQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_expandv8di_maskz, "__builtin_ia32_expanddi512_maskz", IX86_BUILTIN_PEXPANDQ512Z, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv16si3_mask, "__builtin_ia32_pmaxsd512_mask", IX86_BUILTIN_PMAXSD512, UNKNOWN, (int) V16SI_FTYPE_V16SI_V16SI_V16SI_UHI)
BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_smaxv8di3_mask, "__builtin_ia32_pmaxsq512_mask", IX86_BUILTIN_PMAXSQ512, UNKNOWN, (int) V8DI_FTYPE_V8DI_V8DI_V8DI_UQI)
@@ -2187,14 +2187,14 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressv4di_mask, "__buil
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressv2di_mask, "__builtin_ia32_compressdi128_mask", IX86_BUILTIN_PCOMPRESSQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressv8si_mask, "__builtin_ia32_compresssi256_mask", IX86_BUILTIN_PCOMPRESSD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_compressv4si_mask, "__builtin_ia32_compresssi128_mask", IX86_BUILTIN_PCOMPRESSD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4df_mask, "__builtin_ia32_expanddf256_mask", IX86_BUILTIN_EXPANDPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2df_mask, "__builtin_ia32_expanddf128_mask", IX86_BUILTIN_EXPANDPD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8sf_mask, "__builtin_ia32_expandsf256_mask", IX86_BUILTIN_EXPANDPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4sf_mask, "__builtin_ia32_expandsf128_mask", IX86_BUILTIN_EXPANDPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4di_mask, "__builtin_ia32_expanddi256_mask", IX86_BUILTIN_PEXPANDQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2di_mask, "__builtin_ia32_expanddi128_mask", IX86_BUILTIN_PEXPANDQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8si_mask, "__builtin_ia32_expandsi256_mask", IX86_BUILTIN_PEXPANDD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4si_mask, "__builtin_ia32_expandsi128_mask", IX86_BUILTIN_PEXPANDD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4df_mask, "__builtin_ia32_expanddf256_mask", IX86_BUILTIN_EXPANDPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv2df_mask, "__builtin_ia32_expanddf128_mask", IX86_BUILTIN_EXPANDPD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8sf_mask, "__builtin_ia32_expandsf256_mask", IX86_BUILTIN_EXPANDPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4sf_mask, "__builtin_ia32_expandsf128_mask", IX86_BUILTIN_EXPANDPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4di_mask, "__builtin_ia32_expanddi256_mask", IX86_BUILTIN_PEXPANDQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv2di_mask, "__builtin_ia32_expanddi128_mask", IX86_BUILTIN_PEXPANDQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv8si_mask, "__builtin_ia32_expandsi256_mask", IX86_BUILTIN_PEXPANDD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_expandv4si_mask, "__builtin_ia32_expandsi128_mask", IX86_BUILTIN_PEXPANDD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv4df_maskz, "__builtin_ia32_expanddf256_maskz", IX86_BUILTIN_EXPANDPD256Z, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv2df_maskz, "__builtin_ia32_expanddf128_maskz", IX86_BUILTIN_EXPANDPD128Z, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_expandv8sf_maskz, "__builtin_ia32_expandsf256_maskz", IX86_BUILTIN_EXPANDPS256Z, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_UQI)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 94296bc773b..f5f9403db44 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -699,6 +699,17 @@ (define_mode_iterator VI48F
(V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
(V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
(V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+(define_mode_iterator VI12_VI48F_AVX512VLBW
+ [(V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
+ (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
+ (V8SI "TARGET_AVX512VL") (V8SF "TARGET_AVX512VL")
+ (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
+ (V4SI "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+ (V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
+ (V64QI "TARGET_AVX512BW") (V16QI "TARGET_AVX512VL")
+ (V32QI "TARGET_AVX512VL && TARGET_AVX512BW") (V32HI "TARGET_AVX512BW")
+ (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
+
(define_mode_iterator VI48F_256 [V8SI V8SF V4DI V4DF])
(define_mode_iterator VF_AVX512
@@ -23009,7 +23020,7 @@ (define_expand "<avx512>_expand<mode>_maskz"
"TARGET_AVX512F"
"operands[2] = CONST0_RTX (<MODE>mode);")
-(define_insn "<avx512>_expand<mode>_mask"
+(define_insn "expand<mode>_mask"
[(set (match_operand:VI48F 0 "register_operand" "=v,v")
(unspec:VI48F
[(match_operand:VI48F 1 "nonimmediate_operand" "v,m")
@@ -23037,6 +23048,62 @@ (define_insn "expand<mode>_mask"
(set_attr "memory" "none,load")
(set_attr "mode" "<sseinsnmode>")])
+(define_insn_and_split "*expand<mode>_mask"
+ [(set (match_operand:VI12_VI48F_AVX512VLBW 0 "register_operand")
+ (unspec:VI12_VI48F_AVX512VLBW
+ [(match_operand:VI12_VI48F_AVX512VLBW 1 "nonimmediate_operand")
+ (match_operand:VI12_VI48F_AVX512VLBW 2 "nonimm_or_0_operand")
+ (match_operand 3 "const_int_operand")]
+ UNSPEC_EXPAND))]
+ "ix86_pre_reload_split ()
+ && (TARGET_AVX512VBMI2 || GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) >= 4)"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+{
+ unsigned HOST_WIDE_INT mask = INTVAL (operands[3]);
+ bool has_zero = false;
+ unsigned n = GET_MODE_NUNITS (<MODE>mode), i;
+ unsigned ones = 0;
+
+ /* If all ones bits is in mask's lower part,
+ get number of ones and assign it to ONES. */
+ for (i = 0; i != n; i++)
+ {
+ if ((mask & HOST_WIDE_INT_1U << i) && has_zero)
+ break;
+
+ /* Record first zero bit. */
+ if (!(mask & HOST_WIDE_INT_1U << i) && !has_zero)
+ {
+ has_zero = true;
+ ones = i;
+ }
+ }
+
+ if (!has_zero)
+ ones = n;
+
+ if (i != n || (ones != 0 && ones != n))
+ {
+ rtx reg = gen_reg_rtx (<avx512fmaskmode>mode);
+ emit_move_insn (reg, operands[3]);
+ enum insn_code icode;
+ if (i == n)
+ /* For masks with all one bits in it's lower part,
+ we can transform v{,p}expand* to vmovdq* with
+ mask operand. */
+ icode = CODE_FOR_<avx512>_load<mode>_mask;
+ else
+ icode = CODE_FOR_expand<mode>_mask;
+ emit_insn (GEN_FCN (icode) (operands[0], operands[1], operands[2], reg));
+ }
+ else
+ /* For ALL_MASK_ONES or CONST0_RTX mask, transform it to simple mov. */
+ emit_move_insn (operands[0], ones ? operands[1] : operands[2]);
+ DONE;
+})
+
(define_expand "expand<mode>_maskz"
[(set (match_operand:VI12_AVX512VLBW 0 "register_operand")
(unspec:VI12_AVX512VLBW
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-1.c
new file mode 100644
index 00000000000..ce83d63bc73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vbmi2 -O2" } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+#include <immintrin.h>
+
+char *pi8;
+short *pi16;
+volatile __m512i xi16, xi8;
+
+void extern
+avx512f_test (void)
+{
+ xi8 = _mm512_mask_expand_epi8 (xi8, 0, xi8);
+ xi8 = _mm512_mask_expand_epi8 (xi8, -1, xi8);
+ xi8 = _mm512_mask_expand_epi8 (xi8, (1 << 8) - 1, xi8);
+ xi8 = _mm512_mask_expand_epi8 (xi8, (1 << 8) + 1, xi8);
+
+ xi8 = _mm512_mask_expandloadu_epi8 (xi8, 0, pi8);
+ xi8 = _mm512_mask_expandloadu_epi8 (xi8, -1, pi8);
+ xi8 = _mm512_mask_expandloadu_epi8 (xi8, (1 << 6) - 1, pi8);
+ xi8 = _mm512_mask_expandloadu_epi8 (xi8, (1 << 6) + 3, pi8);
+
+ xi16 = _mm512_mask_expand_epi16 (xi16, 0, xi16);
+ xi16 = _mm512_mask_expand_epi16 (xi16, -1, xi16);
+ xi16 = _mm512_mask_expand_epi16 (xi16, (1 << 3) - 1, xi16);
+ xi16 = _mm512_mask_expand_epi16 (xi16, (1 << 3) + 2, xi16);
+
+ xi16 = _mm512_mask_expandloadu_epi16 (xi16, 0, pi16);
+ xi16 = _mm512_mask_expandloadu_epi16 (xi16, -1, pi16);
+ xi16 = _mm512_mask_expandloadu_epi16 (xi16, (1 << 7) - 1, pi16);
+ xi16 = _mm512_mask_expandloadu_epi16 (xi16, (1 << 7) + 7, pi16);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-b-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-b-2.c
new file mode 100644
index 00000000000..424b485a203
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-b-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512F
+
+#define AVX512VBMI2
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 8)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (char *s, char *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & ((long long)1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, i_b) s1, res1, res2, res3, res4, res5, res6, res7, res8;
+ MASK_TYPE mask = (1 << (SIZE >> 2) - 1) - 1;
+ char s2[SIZE];
+ char res_ref1[SIZE];
+ char res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 12345 * (i + 200) * sign;
+ s2[i] = 67890 * (i + 300) * sign;
+ res1.a[i] = DEFAULT_VALUE;
+ res5.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res2.x = INTRINSIC (_mask_expand_epi8) (res1.x, MASK_ALL_ONES, s1.x);
+ res3.x = INTRINSIC (_mask_expand_epi8) (res1.x, 0, s1.x);
+ res4.x = INTRINSIC (_mask_expand_epi8) (res1.x, mask, s1.x);
+ res6.x = INTRINSIC (_mask_expandloadu_epi8) (res5.x, MASK_ALL_ONES, s2);
+ res7.x = INTRINSIC (_mask_expandloadu_epi8) (res5.x, 0, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_epi8) (res5.x, mask, s2);
+
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res2, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res3, res1.a))
+ abort ();
+
+ MASK_MERGE (i_b) (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res4, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res6, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res7, res5.a))
+ abort ();
+
+ MASK_MERGE (i_b) (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_b) (res8, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-d-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-d-2.c
new file mode 100644
index 00000000000..24790b20cf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-d-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512F
+
+#define AVX512VBMI2
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (int *s, int *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & ((long long)1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, i_d) s1, res1, res2, res3, res4, res5, res6, res7, res8;
+ MASK_TYPE mask = (1 << SIZE - 1) - 1;
+ int s2[SIZE];
+ int res_ref1[SIZE];
+ int res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 12345 * (i + 200) * sign;
+ s2[i] = 67890 * (i + 300) * sign;
+ res1.a[i] = DEFAULT_VALUE;
+ res5.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res2.x = INTRINSIC (_mask_expand_epi32) (res1.x, MASK_ALL_ONES, s1.x);
+ res3.x = INTRINSIC (_mask_expand_epi32) (res1.x, 0, s1.x);
+ res4.x = INTRINSIC (_mask_expand_epi32) (res1.x, mask, s1.x);
+ res6.x = INTRINSIC (_mask_expandloadu_epi32) (res5.x, MASK_ALL_ONES, s2);
+ res7.x = INTRINSIC (_mask_expandloadu_epi32) (res5.x, 0, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_epi32) (res5.x, mask, s2);
+
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res2, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res3, res1.a))
+ abort ();
+
+ MASK_MERGE (i_d) (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res4, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res6, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res7, res5.a))
+ abort ();
+
+ MASK_MERGE (i_d) (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_d) (res8, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-q-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-q-2.c
new file mode 100644
index 00000000000..119b50e6f79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-q-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512F
+
+#define AVX512VBMI2
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 64)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (long long *s, long long *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & ((long long)1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, i_q) s1, res1, res2, res3, res4, res5, res6, res7, res8;
+ MASK_TYPE mask = (1 << SIZE - 1) - 1;
+ long long s2[SIZE];
+ long long res_ref1[SIZE];
+ long long res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 12345 * (i + 200) * sign;
+ s2[i] = 67890 * (i + 300) * sign;
+ res1.a[i] = DEFAULT_VALUE;
+ res5.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res2.x = INTRINSIC (_mask_expand_epi64) (res1.x, MASK_ALL_ONES, s1.x);
+ res3.x = INTRINSIC (_mask_expand_epi64) (res1.x, 0, s1.x);
+ res4.x = INTRINSIC (_mask_expand_epi64) (res1.x, mask, s1.x);
+ res6.x = INTRINSIC (_mask_expandloadu_epi64) (res5.x, MASK_ALL_ONES, s2);
+ res7.x = INTRINSIC (_mask_expandloadu_epi64) (res5.x, 0, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_epi64) (res5.x, mask, s2);
+
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res2, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res3, res1.a))
+ abort ();
+
+ MASK_MERGE (i_q) (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res4, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res6, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res7, res5.a))
+ abort ();
+
+ MASK_MERGE (i_q) (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_q) (res8, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-w-2.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-w-2.c
new file mode 100644
index 00000000000..926e04d4df6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr100267-w-2.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512F
+
+#define AVX512VBMI2
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 16)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (short *s, short *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & ((long long)1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, i_w) s1, res1, res2, res3, res4, res5, res6, res7, res8;
+ MASK_TYPE mask = (1 << (SIZE >> 2) - 1) - 1;
+ short s2[SIZE];
+ short res_ref1[SIZE];
+ short res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 12345 * (i + 200) * sign;
+ s2[i] = 67890 * (i + 300) * sign;
+ res1.a[i] = DEFAULT_VALUE;
+ res5.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res2.x = INTRINSIC (_mask_expand_epi16) (res1.x, MASK_ALL_ONES, s1.x);
+ res3.x = INTRINSIC (_mask_expand_epi16) (res1.x, 0, s1.x);
+ res4.x = INTRINSIC (_mask_expand_epi16) (res1.x, mask, s1.x);
+ res6.x = INTRINSIC (_mask_expandloadu_epi16) (res5.x, MASK_ALL_ONES, s2);
+ res7.x = INTRINSIC (_mask_expandloadu_epi16) (res5.x, 0, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_epi16) (res5.x, mask, s2);
+
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res2, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res3, res1.a))
+ abort ();
+
+ MASK_MERGE (i_w) (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res4, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res6, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res7, res5.a))
+ abort ();
+
+ MASK_MERGE (i_w) (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, i_w) (res8, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr100267-1.c b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-1.c
new file mode 100644
index 00000000000..5eb6de531c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-1.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 4 } } */
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%zmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 4 } } */
+#include <immintrin.h>
+
+int *pi32;
+long long *pi64;
+double *pd;
+float *pf;
+volatile __m512i xi32, xi64;
+volatile __m512d xd;
+volatile __m512 xf;
+
+void extern
+avx512f_test (void)
+{
+ xi32 = _mm512_mask_expand_epi32 (xi32, 0, xi32);
+ xi32 = _mm512_mask_expand_epi32 (xi32, -1, xi32);
+ xi32 = _mm512_mask_expand_epi32 (xi32, (1 << 8) - 1, xi32);
+ xi32 = _mm512_mask_expand_epi32 (xi32, (1 << 8) + 1, xi32);
+
+ xi32 = _mm512_mask_expandloadu_epi32 (xi32, 0, pi32);
+ xi32 = _mm512_mask_expandloadu_epi32 (xi32, (1 << 16) - 1, pi32);
+ xi32 = _mm512_mask_expandloadu_epi32 (xi32, (1 << 6) - 1, pi32);
+ xi32 = _mm512_mask_expandloadu_epi32 (xi32, (1 << 6) + 3, pi32);
+
+ xi64 = _mm512_mask_expand_epi64 (xi64, 0, xi64);
+ xi64 = _mm512_mask_expand_epi64 (xi64, -1, xi64);
+ xi64 = _mm512_mask_expand_epi64 (xi64, (1 << 3) - 1, xi64);
+ xi64 = _mm512_mask_expand_epi64 (xi64, (1 << 3) + 2, xi64);
+
+ xi64 = _mm512_mask_expandloadu_epi64 (xi64, 0, pi64);
+ xi64 = _mm512_mask_expandloadu_epi64 (xi64, (1 << 8) - 1, pi64);
+ xi64 = _mm512_mask_expandloadu_epi64 (xi64, (1 << 7) - 1, pi64);
+ xi64 = _mm512_mask_expandloadu_epi64 (xi64, (1 << 7) + 7, pi64);
+
+ xf = _mm512_mask_expand_ps (xf, 0, xf);
+ xf = _mm512_mask_expand_ps (xf, (1 << 16) - 1, xf);
+ xf = _mm512_mask_expand_ps (xf, (1 << 15) - 1, xf);
+ xf = _mm512_mask_expand_ps (xf, (1 << 14) + 3, xf);
+
+ xf = _mm512_mask_expandloadu_ps (xf, 0, pf);
+ xf = _mm512_mask_expandloadu_ps (xf, -1, pf);
+ xf = _mm512_mask_expandloadu_ps (xf, (1 << 13) - 1, pf);
+ xf = _mm512_mask_expandloadu_ps (xf, (1 << 13) + 5, pf);
+
+ xd = _mm512_mask_expand_pd (xd, 0, xd);
+ xd = _mm512_mask_expand_pd (xd, (1 << 8) - 1, xd);
+ xd = _mm512_mask_expand_pd (xd, (1 << 4) - 1, xd);
+ xd = _mm512_mask_expand_pd (xd, (1 << 4) + 1, xd);
+
+ xd = _mm512_mask_expandloadu_pd (xd, 0, pd);
+ xd = _mm512_mask_expandloadu_pd (xd, -1, pd);
+ xd = _mm512_mask_expandloadu_pd (xd, (1 << 5) - 1, pd);
+ xd = _mm512_mask_expandloadu_pd (xd, (1 << 5), pd);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr100267-pd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-pd-2.c
new file mode 100644
index 00000000000..7d653e8e853
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-pd-2.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+#define AVX512F
+
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 64)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (double *s, double *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & (1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, d) s1, res2, res3, res4, res5, res6, res7, res8, res9;
+ MASK_TYPE mask = (1 << SIZE - 1) - 1;
+ double s2[SIZE];
+ double res_ref1[SIZE];
+ double res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 123.456 * (i + 200) * sign;
+ s2[i] = 789.012 * (i + 300) * sign;
+ res2.a[i] = DEFAULT_VALUE;
+ res6.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res3.x = INTRINSIC (_mask_expand_pd) (res2.x, MASK_ALL_ONES, s1.x);
+ res4.x = INTRINSIC (_mask_expand_pd) (res2.x, 0, s1.x);
+ res5.x = INTRINSIC (_mask_expand_pd) (res2.x, mask, s1.x);
+
+ res7.x = INTRINSIC (_mask_expandloadu_pd) (res6.x, MASK_ALL_ONES, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_pd) (res6.x, 0, s2);
+ res9.x = INTRINSIC (_mask_expandloadu_pd) (res6.x, mask, s2);
+
+
+ /* no mask is the same as all ones mask. */
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, d) (res3, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, d) (res4, res2.a))
+ abort ();
+
+ MASK_MERGE (d) (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, d) (res5, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, d) (res7, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, d) (res8, res6.a))
+ abort ();
+
+ MASK_MERGE (d) (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, d) (res9, res_ref2))
+ abort ();
+
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr100267-ps-2.c b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-ps-2.c
new file mode 100644
index 00000000000..40f0dde0c88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr100267-ps-2.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+#define AVX512F
+
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+#include "avx512f-mask-type.h"
+
+static void
+CALC (float *s, float *r, MASK_TYPE mask)
+{
+ int i, k;
+
+ for (i = 0, k = 0; i < SIZE; i++)
+ {
+ if (mask & (1 << i))
+ r[i] = s[k++];
+ }
+}
+
+void
+TEST (void)
+{
+ UNION_TYPE (AVX512F_LEN, ) s1, res2, res3, res4, res5, res6, res7, res8, res9;
+ MASK_TYPE mask = (1 << SIZE - 1) - 1;
+ float s2[SIZE];
+ float res_ref1[SIZE];
+ float res_ref2[SIZE];
+ int i, sign = 1;
+
+ for (i = 0; i < SIZE; i++)
+ {
+ s1.a[i] = 123.456 * (i + 200) * sign;
+ s2[i] = 789.012 * (i + 300) * sign;
+ res2.a[i] = DEFAULT_VALUE;
+ res6.a[i] = DEFAULT_VALUE;
+ sign = -sign;
+ }
+
+ res3.x = INTRINSIC (_mask_expand_ps) (res2.x, MASK_ALL_ONES, s1.x);
+ res4.x = INTRINSIC (_mask_expand_ps) (res2.x, 0, s1.x);
+ res5.x = INTRINSIC (_mask_expand_ps) (res2.x, mask, s1.x);
+ res7.x = INTRINSIC (_mask_expandloadu_ps) (res6.x, MASK_ALL_ONES, s2);
+ res8.x = INTRINSIC (_mask_expandloadu_ps) (res6.x, 0, s2);
+ res9.x = INTRINSIC (_mask_expandloadu_ps) (res6.x, mask, s2);
+
+ CALC (s1.a, res_ref1, mask);
+ CALC (s2, res_ref2, mask);
+
+ if (UNION_CHECK (AVX512F_LEN, ) (res3, s1.a))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, ) (res4, res2.a))
+ abort ();
+
+ MASK_MERGE () (res_ref1, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, ) (res5, res_ref1))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, ) (res7, s2))
+ abort ();
+
+ if (UNION_CHECK (AVX512F_LEN, ) (res8, res6.a))
+ abort ();
+
+ MASK_MERGE () (res_ref2, mask, SIZE);
+ if (UNION_CHECK (AVX512F_LEN, ) (res9, res_ref2))
+ abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-1.c
new file mode 100644
index 00000000000..9ffd381f6c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-1.c
@@ -0,0 +1,122 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 4 } } */
+/* { dg-final { scan-assembler-times "(?:vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}|blend\[a-z]+\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+)(?:\n|\[ \\t\]+#)" 4 } } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandd\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandq\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandpd\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vexpandps\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 4 } } */
+/* { dg-final { scan-assembler-times "(?:vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}|(?:blend\[a-z]+|movsd)\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+)(?:\n|\[ \\t\]+#)" 4 } } */
+
+#include <immintrin.h>
+
+int *pi32;
+long long *pi64;
+double *pd;
+float *pf;
+volatile __m256i xi32, xi64;
+volatile __m256d xd;
+volatile __m256 xf;
+
+volatile __m128i xi32_128, xi64_128;
+volatile __m128d xd_128;
+volatile __m128 xf_128;
+
+void extern
+avx512vl_test (void)
+{
+ xi32 = _mm256_mask_expand_epi32 (xi32, 0, xi32);
+ xi32 = _mm256_mask_expand_epi32 (xi32, -1, xi32);
+ xi32 = _mm256_mask_expand_epi32 (xi32, (1 << 4) - 1, xi32);
+ xi32 = _mm256_mask_expand_epi32 (xi32, (1 << 4) + 1, xi32);
+
+ xi32 = _mm256_mask_expandloadu_epi32 (xi32, 0, pi32);
+ xi32 = _mm256_mask_expandloadu_epi32 (xi32, (1 << 8) - 1, pi32);
+ xi32 = _mm256_mask_expandloadu_epi32 (xi32, (1 << 6) - 1, pi32);
+ xi32 = _mm256_mask_expandloadu_epi32 (xi32, (1 << 6) + 3, pi32);
+
+ xi64 = _mm256_mask_expand_epi64 (xi64, 0, xi64);
+ xi64 = _mm256_mask_expand_epi64 (xi64, -1, xi64);
+ xi64 = _mm256_mask_expand_epi64 (xi64, (1 << 3) - 1, xi64);
+ xi64 = _mm256_mask_expand_epi64 (xi64, (1 << 3) + 2, xi64);
+
+ xi64 = _mm256_mask_expandloadu_epi64 (xi64, 0, pi64);
+ xi64 = _mm256_mask_expandloadu_epi64 (xi64, (1 << 4) - 1, pi64);
+ xi64 = _mm256_mask_expandloadu_epi64 (xi64, (1 << 2) - 1, pi64);
+ xi64 = _mm256_mask_expandloadu_epi64 (xi64, (1 << 2), pi64);
+
+ xf = _mm256_mask_expand_ps (xf, 0, xf);
+ xf = _mm256_mask_expand_ps (xf, (1 << 8) - 1, xf);
+ xf = _mm256_mask_expand_ps (xf, (1 << 6) - 1, xf);
+ xf = _mm256_mask_expand_ps (xf, (1 << 6) + 3, xf);
+
+ xf = _mm256_mask_expandloadu_ps (xf, 0, pf);
+ xf = _mm256_mask_expandloadu_ps (xf, -1, pf);
+ xf = _mm256_mask_expandloadu_ps (xf, (1 << 7) - 1, pf);
+ xf = _mm256_mask_expandloadu_ps (xf, (1 << 7) + 5, pf);
+
+ xd = _mm256_mask_expand_pd (xd, 0, xd);
+ xd = _mm256_mask_expand_pd (xd, (1 << 4) - 1, xd);
+ xd = _mm256_mask_expand_pd (xd, (1 << 2) - 1, xd);
+ xd = _mm256_mask_expand_pd (xd, (1 << 2), xd);
+
+ xd = _mm256_mask_expandloadu_pd (xd, 0, pd);
+ xd = _mm256_mask_expandloadu_pd (xd, -1, pd);
+ xd = _mm256_mask_expandloadu_pd (xd, (1 << 2) - 1, pd);
+ xd = _mm256_mask_expandloadu_pd (xd, (1 << 2), pd);
+
+ xi32_128 = _mm_mask_expand_epi32 (xi32_128, 0, xi32_128);
+ xi32_128 = _mm_mask_expand_epi32 (xi32_128, -1, xi32_128);
+ xi32_128 = _mm_mask_expand_epi32 (xi32_128, (1 << 3) - 1, xi32_128);
+ xi32_128 = _mm_mask_expand_epi32 (xi32_128, (1 << 3) + 1, xi32_128);
+
+ xi32_128 = _mm_mask_expandloadu_epi32 (xi32_128, 0, pi32);
+ xi32_128 = _mm_mask_expandloadu_epi32 (xi32_128, (1 << 4) - 1, pi32);
+ xi32_128 = _mm_mask_expandloadu_epi32 (xi32_128, (1 << 2) - 1, pi32);
+ xi32_128 = _mm_mask_expandloadu_epi32 (xi32_128, (1 << 1) + 3, pi32);
+
+ xi64_128 = _mm_mask_expand_epi64 (xi64_128, 0, xi64_128);
+ xi64_128 = _mm_mask_expand_epi64 (xi64_128, -1, xi64_128);
+ xi64_128 = _mm_mask_expand_epi64 (xi64_128, (1 << 1) - 1, xi64_128);
+ xi64_128 = _mm_mask_expand_epi64 (xi64_128, 2, xi64_128);
+
+ xi64_128 = _mm_mask_expandloadu_epi64 (xi64_128, 0, pi64);
+ xi64_128 = _mm_mask_expandloadu_epi64 (xi64_128, 3, pi64);
+ xi64_128 = _mm_mask_expandloadu_epi64 (xi64_128, 1, pi64);
+ xi64_128 = _mm_mask_expandloadu_epi64 (xi64_128, 2, pi64);
+
+ xf_128 = _mm_mask_expand_ps (xf_128, 0, xf_128);
+ xf_128 = _mm_mask_expand_ps (xf_128, (1 << 4) - 1, xf_128);
+ xf_128 = _mm_mask_expand_ps (xf_128, (1 << 3) - 1, xf_128);
+ xf_128 = _mm_mask_expand_ps (xf_128, (1 << 2), xf_128);
+
+ xf_128 = _mm_mask_expandloadu_ps (xf_128, 0, pf);
+ xf_128 = _mm_mask_expandloadu_ps (xf_128, -1, pf);
+ xf_128 = _mm_mask_expandloadu_ps (xf_128, (1 << 3) - 1, pf);
+ xf_128 = _mm_mask_expandloadu_ps (xf_128, (1 << 1), pf);
+
+ xd_128 = _mm_mask_expand_pd (xd_128, 0, xd_128);
+ xd_128 = _mm_mask_expand_pd (xd_128, (1 << 2) - 1, xd_128);
+ xd_128 = _mm_mask_expand_pd (xd_128, 1, xd_128);
+ xd_128 = _mm_mask_expand_pd (xd_128, 2, xd_128);
+
+ xd_128 = _mm_mask_expandloadu_pd (xd_128, 0, pd);
+ xd_128 = _mm_mask_expandloadu_pd (xd_128, -1, pd);
+ xd_128 = _mm_mask_expandloadu_pd (xd_128, 1, pd);
+ xd_128 = _mm_mask_expandloadu_pd (xd_128, 2, pd);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-pd-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-pd-2.c
new file mode 100644
index 00000000000..e5f30374b51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-pd-2.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512f-pr100267-pd-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512f-pr100267-pd-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-ps-2.c b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-ps-2.c
new file mode 100644
index 00000000000..0149a8aaa55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-pr100267-ps-2.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512f-pr100267-ps-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512f-pr100267-ps-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-1.c b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-1.c
new file mode 100644
index 00000000000..135dbd7577e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-1.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vbmi2 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandb\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpexpandw\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+/* { dg-final { scan-assembler-times "vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\]*\\(\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "(?:vmov\[a-z0-9\]*\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+\{%k\[1-7\]\}|blend\[a-z]*\[ \\t\]+\[^\{\n\(]*%xmm\[0-9\]+)(?:\n|\[ \\t\]+#)" 2 } } */
+#include <immintrin.h>
+
+char *pi8;
+short *pi16;
+volatile __m256i xi16, xi8;
+volatile __m128i xi16_xmm, xi8_xmm;
+
+void extern
+avx512f_test (void)
+{
+ xi8 = _mm256_mask_expand_epi8 (xi8, 0, xi8);
+ xi8 = _mm256_mask_expand_epi8 (xi8, -1, xi8);
+ xi8 = _mm256_mask_expand_epi8 (xi8, (1 << 30) - 1, xi8);
+ xi8 = _mm256_mask_expand_epi8 (xi8, (1 << 16) + 1, xi8);
+
+ xi8 = _mm256_mask_expandloadu_epi8 (xi8, 0, pi8);
+ xi8 = _mm256_mask_expandloadu_epi8 (xi8, -1, pi8);
+ xi8 = _mm256_mask_expandloadu_epi8 (xi8, (1 << 28) - 1, pi8);
+ xi8 = _mm256_mask_expandloadu_epi8 (xi8, (1 << 15) + 3, pi8);
+
+ xi16 = _mm256_mask_expand_epi16 (xi16, 0, xi16);
+ xi16 = _mm256_mask_expand_epi16 (xi16, -1, xi16);
+ xi16 = _mm256_mask_expand_epi16 (xi16, (1 << 15) - 1, xi16);
+ xi16 = _mm256_mask_expand_epi16 (xi16, (1 << 14) + 2, xi16);
+
+ xi16 = _mm256_mask_expandloadu_epi16 (xi16, 0, pi16);
+ xi16 = _mm256_mask_expandloadu_epi16 (xi16, (1 << 16) - 1, pi16);
+ xi16 = _mm256_mask_expandloadu_epi16 (xi16, (1 << 14) - 1, pi16);
+ xi16 = _mm256_mask_expandloadu_epi16 (xi16, (1 << 13) + 7, pi16);
+
+ xi8_xmm = _mm_mask_expand_epi8 (xi8_xmm, 0, xi8_xmm);
+ xi8_xmm = _mm_mask_expand_epi8 (xi8_xmm, -1, xi8_xmm);
+ xi8_xmm = _mm_mask_expand_epi8 (xi8_xmm, (1 << 13) - 1, xi8_xmm);
+ xi8_xmm = _mm_mask_expand_epi8 (xi8_xmm, (1 << 12) + 1, xi8_xmm);
+
+ xi8_xmm = _mm_mask_expandloadu_epi8 (xi8_xmm, 0, pi8);
+ xi8_xmm = _mm_mask_expandloadu_epi8 (xi8_xmm, (1 << 16) - 1, pi8);
+ xi8_xmm = _mm_mask_expandloadu_epi8 (xi8_xmm, (1 << 12) - 1, pi8);
+ xi8_xmm = _mm_mask_expandloadu_epi8 (xi8_xmm, (1 << 11) + 3, pi8);
+
+ xi16_xmm = _mm_mask_expand_epi16 (xi16_xmm, 0, xi16_xmm);
+ xi16_xmm = _mm_mask_expand_epi16 (xi16_xmm, -1, xi16_xmm);
+ xi16_xmm = _mm_mask_expand_epi16 (xi16_xmm, (1 << 7) - 1, xi16_xmm);
+ xi16_xmm = _mm_mask_expand_epi16 (xi16_xmm, (1 << 4) + 2, xi16_xmm);
+
+ xi16_xmm = _mm_mask_expandloadu_epi16 (xi16_xmm, 0, pi16);
+ xi16_xmm = _mm_mask_expandloadu_epi16 (xi16_xmm, (1 << 8) - 1, pi16);
+ xi16_xmm = _mm_mask_expandloadu_epi16 (xi16_xmm, (1 << 3) - 1, pi16);
+ xi16_xmm = _mm_mask_expandloadu_epi16 (xi16_xmm, (1 << 6) + 7, pi16);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-b-2.c b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-b-2.c
new file mode 100644
index 00000000000..d54e8033a25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-b-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-b-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-b-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-d-2.c b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-d-2.c
new file mode 100644
index 00000000000..1e604cfb1aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-d-2.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-d-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-d-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-q-2.c b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-q-2.c
new file mode 100644
index 00000000000..1f9fe65ad16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-q-2.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-q-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-q-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-w-2.c b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-w-2.c
new file mode 100644
index 00000000000..a46ca78a621
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vlbw-pr100267-w-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl -mavx512vbmi2 -mavx512bw" } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-require-effective-target avx512vbmi2 } */
+
+#define AVX512VL
+#define AVX512F_LEN 256
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-w-2.c"
+
+#undef AVX512F_LEN
+#undef AVX512F_LEN_HALF
+
+#define AVX512F_LEN 128
+#define AVX512F_LEN_HALF 128
+#include "avx512bw-pr100267-w-2.c"
--
2.18.1
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-06-21 11:41 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-30 4:42 [PATCH][AVX512] Fix ICE for vpexpand* Hongtao Liu
2021-04-30 4:49 ` [PATCH][AVX512] Optimize vpexpand* to mask mov when mask have all ones in it's lower part (including 0 and -1) Hongtao Liu
2021-05-12 7:28 ` Hongtao Liu
2021-06-21 11:46 ` Hongtao Liu
2021-05-12 7:29 ` [PATCH][AVX512] Fix ICE for vpexpand* Hongtao Liu
2021-06-21 11:24 ` Hongtao Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).