Intel AVX10.1 Compiler Design and Support

* Intel AVX10.1 Compiler Design and Support
@ 2023-08-08  7:13 Haochen Jiang
  2023-08-08  7:13 ` [PATCH 1/3] Initial support for AVX10.1 Haochen Jiang
                   ` (12 more replies)
  0 siblings, 13 replies; 88+ messages in thread
From: Haochen Jiang @ 2023-08-08  7:13 UTC
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

Hi all,

We will send out our initial AVX10 support and some sample patches in this
thread, with more patches to follow. Before that, we would like to share our
proposed AVX10 design for GCC.

Here is a quick introduction to AVX10:
  - AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
  - With the introduction of AVX10, we would like to establish a common,
    converged vector instruction set across all Intel architectures, including
    Xeon Server, Atom Server and Clients.
  - The default maximum vector size for AVX10 will be 256 bit, while 512 bit is
    optional.
  - AVX10.1 will include all existing AVX512 instructions in Granite Rapids.
  - No new AVX512 CPUID bits will be introduced in the future. All EVEX vector
    instructions will be under the AVX10 umbrella.
  - AVX10 will be a version-based ISA instead of a large set of separate CPUID
    bits such as AVX512BW, AVX512DQ, AVX512FP16, etc.
  - Based on AVX10.1, AVX10.2 will introduce ymm embedded rounding, SAE
    (Suppressed All Exceptions) control and new instructions.

If you would like to take a closer look at the details, please follow the
links below:

Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification 
It describes the Intel Advanced Vector Extensions 10 Instruction Set
Architecture.
https://cdrdv2.intel.com/v1/dl/getContent/784267

The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
It provides introductory information regarding the converged vector ISA: Intel
Advanced Vector Extensions 10.
https://cdrdv2.intel.com/v1/dl/getContent/784343

Hence, we will have several compiler design ground rules for AVX10:
  - AVX10 is a converged ISA feature set.
    We will not provide -m[no-]xxx to enable/disable each individual vector
    feature within a version as we did before. Instead, a single option
    -m[no-]avx10.x is used. If the 512 bit version is needed, -mavx10.x-512 is
    all you need. Also, the maximum vector width should be the same across all
    enabled AVX10 versions. For example, enabling AVX10.1 with 512 bit vector
    width while enabling AVX10.2 with only 256 bit vector width is not a
    desired behavior.
  - AVX10 is an evolving ISA feature set.
    Every feature present in the current version will also be present in all
    future versions.
  - AVX10 is an independent ISA feature set.
    Although they share the same instructions and encodings, AVX10 and AVX512
    are conceptually independent features, which means they are orthogonal.

AVX10 brings several benefits: it makes AVX512 features available on Atom
Server and Clients, and it replaces the many AVX512 CPUID bits with a single
AVX10 option to enable features. We therefore lean towards adopting AVX10
instead of AVX512 from now on.

Based on the above, we would like to introduce the following compiler options:
  - -mavx10.x: The option will enable AVX10.1-AVX10.x features with a default
    256 bit vector width to ensure compatibility on all platforms.
  - -mavx10.x-512: The option will enable AVX10.1-AVX10.x features with 512 bit
    vector width. A "-mno-avx10.x-512" option will not be provided, to avoid
    confusion over whether it disables the 512 bit vector width or avx10.x
    itself.
  - -mavx10.x-256: The option will enable AVX10.1-AVX10.x features with 256 bit
    vector width. It will also disable the 512 bit vector width, since the
    vector size is stated explicitly in the option. A "-mno-avx10.x-256" option
    will not be provided, to stay aligned with the 512 bit one.
  - -mno-avx10.x: The option will disable all the features introduced in
    avx10.x and later (both 256 and 512 bit) and keep the features of earlier
    versions if they are enabled, just like how -mno- options have behaved so
    far.

When an option combination specifies different vector sizes
(e.g. -mavx10.x-512 -mavx10.y-256), we would like to emit a warning since the
vector sizes conflict. The warning message will indicate that the last
mentioned vector size is picked. The ISA set will be the highest version
mentioned.
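
To make the rule above concrete, here is a minimal sketch in C of how such a
combination could be resolved. It is only a model of the proposal, not the code
in the patches: the types and the function resolve_avx10 are hypothetical
names, and a plain -mavx10.x is assumed to leave a previously chosen width
untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one AVX10 option as written on the command line.  */
enum avx10_width { WIDTH_DEFAULT = 0, WIDTH_256 = 256, WIDTH_512 = 512 };

struct avx10_opt
{
  int minor;               /* The x in -mavx10.x.  */
  enum avx10_width width;  /* WIDTH_DEFAULT for a plain -mavx10.x.  */
};

struct avx10_result
{
  int minor;  /* Highest version requested, 0 if none.  */
  int width;  /* Resulting maximum vector width: 256 or 512.  */
  int warn;   /* Nonzero if explicit widths conflicted.  */
};

/* Walk the options left to right: the last explicit width wins, the highest
   version wins, and a change of explicit width sets the warning flag.  */
static struct avx10_result
resolve_avx10 (const struct avx10_opt *opts, size_t n)
{
  struct avx10_result r = { 0, 256, 0 };  /* 256 bit is the default.  */
  int explicit_width = 0;

  for (size_t i = 0; i < n; i++)
    {
      assert (opts[i].minor >= 1);
      if (opts[i].minor > r.minor)
	r.minor = opts[i].minor;
      if (opts[i].width != WIDTH_DEFAULT)
	{
	  if (explicit_width && explicit_width != (int) opts[i].width)
	    r.warn = 1;
	  explicit_width = (int) opts[i].width;
	  r.width = explicit_width;
	}
    }
  return r;
}
```

Under this model, "-mavx10.1-512 -mavx10.2-256" resolves to AVX10.2 with a
256 bit maximum vector width and the warning flag set.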

For auto dispatch support, including __builtin_cpu_supports (), function
multiversioning and function attributes, the behavior will be identical to
that of the compiler options, which means we will have avx10.x, avx10.x-256,
avx10.x-512 and no-avx10.x.
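
Runtime dispatch relies on CPUID enumeration. As an illustration, here is a
small C sketch that decodes the relevant register values. The bit positions
follow the definitions added to cpuid.h in patch 1/3; the struct and function
names are hypothetical, and the register values are passed in as plain integers
so that the sketch does not depend on running on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as defined in patch 1/3 (cpuid.h).  */
#define BIT_AVX10	(1u << 19)  /* CPUID.(EAX=7,ECX=1):EDX  */
#define BIT_AVX10_256	(1u << 17)  /* AVX10 leaf, EBX  */
#define BIT_AVX10_512	(1u << 18)  /* AVX10 leaf, EBX  */

/* Hypothetical summary of what the CPU reports.  */
struct avx10_caps
{
  int version;  /* 0 if AVX10 is not enumerated.  */
  int has_256;
  int has_512;
};

/* Decode EDX of leaf 7 subleaf 1 and EBX of the AVX10 leaf.  */
static struct avx10_caps
decode_avx10 (uint32_t leaf7_1_edx, uint32_t avx10_ebx)
{
  struct avx10_caps c = { 0, 0, 0 };

  if (!(leaf7_1_edx & BIT_AVX10))
    return c;
  c.version = (int) (avx10_ebx & 0xff);  /* Version is in EBX[7:0].  */
  c.has_256 = (avx10_ebx & BIT_AVX10_256) != 0;
  c.has_512 = (avx10_ebx & BIT_AVX10_512) != 0;
  return c;
}

/* Model of a query such as "avx10.<minor>-<width>".  */
static int
supports_avx10 (struct avx10_caps c, int minor, int width)
{
  assert (width == 256 || width == 512);
  if (c.version < minor)
    return 0;
  return width == 512 ? c.has_512 : c.has_256;
}
```

Because AVX10 is version-based, one comparison on the version number covers
every feature of that version and of all earlier versions.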

As mentioned before, we lean towards adopting AVX10 instead of AVX512 from
now on. Hence, we don't recommend combining AVX10 and legacy AVX512 options:
different users will have different expectations of compiler behavior for
option combinations like "-m[no-]avx10.1 -m[no-]avx512f", and it is hard to
tell whether the compiler should enable or disable the feature in those
scenarios. Furthermore, we don't guarantee that the behavior is consistent
between GCC and LLVM/ICX.

Based on our understanding, we propose to keep the AVX10 and AVX512 switches
independent. Therefore, enabling either one of them will turn on the feature,
regardless of whether the other one is enabled. We will emit a warning when the
user enables one feature but disables the other afterwards. Some typical
examples are given to help illustrate that:
  - -mno-avx512xxx: It will check whether AVX10.1 is disabled when handling the
    option. If AVX10.1 is disabled, the option is valid and disables AVX512xxx.
    If AVX10.1 is not disabled, a warning will be emitted and -mno-avx512xxx
    will be ignored.
  - -mno-avx10.1: It will check whether all AVX512 features in Granite Rapids
    are disabled when handling the option. If all are disabled, the option is
    valid and disables all the features. If not, a warning will be emitted and
    -mno-avx10.1 will be ignored.
  - -mno-avx10.x (x >= 2): It is always valid.
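
The three cases above can be written down as a small decision model in C. This
is a sketch of the proposed rules only, with hypothetical names, and it reduces
the compiler state to two booleans for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of handling a negative option under the proposed rules.  */
enum neg_result { NEG_APPLIED, NEG_IGNORED_WITH_WARNING };

/* -mno-avx512xxx takes effect only if AVX10.1 is not enabled.  */
static enum neg_result
check_no_avx512 (bool avx10_1_enabled)
{
  return avx10_1_enabled ? NEG_IGNORED_WITH_WARNING : NEG_APPLIED;
}

/* -mno-avx10.x: for x == 1 it takes effect only if no Granite Rapids AVX512
   feature is enabled; for x >= 2 it is always valid.  */
static enum neg_result
check_no_avx10 (int minor, bool any_gnr_avx512_enabled)
{
  assert (minor >= 1);
  if (minor >= 2)
    return NEG_APPLIED;
  return any_gnr_avx512_enabled ? NEG_IGNORED_WITH_WARNING : NEG_APPLIED;
}
```

For example, check_no_avx512 (true) models "-mavx10.1 -mno-avx512f" and yields
NEG_IGNORED_WITH_WARNING, so AVX512F stays enabled.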

Also, since the AVX10 and AVX512 switches are independent, a compiler option
combination of "-mavx10.x[-256] -mavx512xxx" will actually enable all the
AVX10.x 128/256 bit vector instruction support plus 512 bit vector instruction
support for AVX512xxx.

The last thing to mention is the -march options. We will imply AVX10 features
for future platforms where AVX10 is available, i.e., AVX10/512 for Xeon Servers
and AVX10/256 for Atom Servers and Clients. We propose to change the current
-march=graniterapids/graniterapids-d from implying AVX512 features to implying
AVX10.1/512. No obvious behavior changes will happen for these two -march
values.

There is one minor open question after this change: when using
-march=graniterapids -mno-avx512f or -mno-avx512f -march=graniterapids, AVX512F
will no longer be disabled, which is a change in behavior. Should we emit a
warning for that? Our current behavior is not to emit a warning, but I am open
to changes. However, if we finally choose to emit a warning, I suppose it
should only happen for Granite Rapids and Granite Rapids D, since by the next
generation Xeon Server product users should be aware of the AVX10 change.

Of the following nine patches, the first three are the initial support for
AVX10.1, while the latter six are the AVX10.1 support for AVX512DQ+AVX512VL.

If you have any questions, feel free to ask in this thread. Also, if you are
working on AVX512 related patterns during AVX10 upstreaming, especially
anything related to constraints, target checks or iterators, please cc me on
the patches since there might be some conflicts.

Thx,
Haochen

* [PATCH 1/3] Initial support for AVX10.1
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
@ 2023-08-08  7:13 ` Haochen Jiang
  2023-08-16  2:29   ` Hongtao Liu
  2023-08-08  7:13 ` [PATCH 2/3] Emit a warning when disabling AVX512 with AVX10 enabled or disabling AVX10 with AVX512 enabled Haochen Jiang
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 88+ messages in thread
From: Haochen Jiang @ 2023-08-08  7:13 UTC
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (get_available_features):
	Add avx10_set and version and detect avx10.1.
	(cpu_indicator_init): Handle avx10.1-512.
	* common/config/i386/i386-common.cc
	(OPTION_MASK_ISA2_AVX10_512BIT_SET): New.
	(OPTION_MASK_ISA2_AVX10_1_SET): Ditto.
	(OPTION_MASK_ISA2_AVX10_512BIT_UNSET): Ditto.
	(OPTION_MASK_ISA2_AVX10_1_UNSET): Ditto.
	(OPTION_MASK_ISA2_AVX2_UNSET): Modify for AVX10_1.
	(ix86_handle_option): Handle -mavx10.1, -mavx10.1-256 and
	-mavx10.1-512.
	* common/config/i386/i386-cpuinfo.h (enum processor_features):
	Add FEATURE_AVX10_512BIT, FEATURE_AVX10_1 and
	FEATURE_AVX10_1_512.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY for
	AVX10_512BIT, AVX10_1 and AVX10_1_512.
	* config/i386/constraints.md (Yk): Add AVX10_1.
	(Yv): Ditto.
	(k): Ditto.
	* config/i386/cpuid.h (bit_AVX10): New.
	(bit_AVX10_256): Ditto.
	(bit_AVX10_512): Ditto.
	* config/i386/i386-c.cc (ix86_target_macros_internal):
	Define AVX10_512BIT and AVX10_1.
	* config/i386/i386-isa.def
	(AVX10_512BIT): Add DEF_PTA(AVX10_512BIT).
	(AVX10_1): Add DEF_PTA(AVX10_1).
	* config/i386/i386-options.cc (isa2_opts): Add -mavx10.1.
	(ix86_valid_target_attribute_inner_p): Handle avx10-max-512bit, avx10.1,
	avx10.1-256 and avx10.1-512.
	(ix86_option_override_internal): Enable AVX512{F,VL,BW,DQ,CD,BF16,
	FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ} features for avx10.1-512.
	(ix86_valid_target_attribute_inner_p): Handle AVX10_1.
	* config/i386/i386.cc (ix86_get_ssemov): Add AVX10_1.
	(ix86_conditional_register_usage): Ditto.
	(ix86_hard_regno_mode_ok): Ditto.
	(ix86_rtx_costs): Ditto.
	* config/i386/i386.h (VALID_MASK_AVX10_MODE): New macro.
	* config/i386/i386.opt: Add option -mavx10.1, -mavx10.1-256 and
	-mavx10.1-512.
	* doc/extend.texi: Document avx10.1, avx10.1-256 and avx10.1-512.
	* doc/invoke.texi: Document -mavx10.1, -mavx10.1-256 and -mavx10.1-512.
	* doc/sourcebuild.texi: Document target avx10.1, avx10.1-256
	and avx10.1-512.

gcc/testsuite/ChangeLog:

	* g++.target/i386/mv33.C: New test.
	* gcc.target/i386/avx10_1-1.c: Ditto.
	* gcc.target/i386/avx10_1-2.c: Ditto.
	* gcc.target/i386/avx10_1-3.c: Ditto.
	* gcc.target/i386/avx10_1-4.c: Ditto.
	* gcc.target/i386/avx10_1-5.c: Ditto.
	* gcc.target/i386/avx10_1-6.c: Ditto.
	* gcc.target/i386/avx10_1-7.c: Ditto.
	* gcc.target/i386/avx10_1-8.c: Ditto.
	* gcc.target/i386/avx10_1-9.c: Ditto.
	* gcc.target/i386/avx10_1-10.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h           | 36 +++++++++++++++
 gcc/common/config/i386/i386-common.cc      | 53 +++++++++++++++++++++-
 gcc/common/config/i386/i386-cpuinfo.h      |  3 ++
 gcc/common/config/i386/i386-isas.h         |  5 ++
 gcc/config/i386/constraints.md             |  6 +--
 gcc/config/i386/cpuid.h                    |  6 +++
 gcc/config/i386/i386-c.cc                  |  4 ++
 gcc/config/i386/i386-isa.def               |  2 +
 gcc/config/i386/i386-options.cc            | 26 ++++++++++-
 gcc/config/i386/i386.cc                    | 18 ++++++--
 gcc/config/i386/i386.h                     |  3 ++
 gcc/config/i386/i386.opt                   | 19 ++++++++
 gcc/doc/extend.texi                        | 13 ++++++
 gcc/doc/invoke.texi                        | 16 +++++--
 gcc/doc/sourcebuild.texi                   |  9 ++++
 gcc/testsuite/g++.target/i386/mv33.C       | 30 ++++++++++++
 gcc/testsuite/gcc.target/i386/avx10_1-1.c  | 22 +++++++++
 gcc/testsuite/gcc.target/i386/avx10_1-10.c | 13 ++++++
 gcc/testsuite/gcc.target/i386/avx10_1-2.c  | 13 ++++++
 gcc/testsuite/gcc.target/i386/avx10_1-3.c  | 13 ++++++
 gcc/testsuite/gcc.target/i386/avx10_1-4.c  | 13 ++++++
 gcc/testsuite/gcc.target/i386/avx10_1-5.c  | 13 ++++++
 gcc/testsuite/gcc.target/i386/avx10_1-6.c  | 13 ++++++
 gcc/testsuite/gcc.target/i386/avx10_1-7.c  | 13 ++++++
 gcc/testsuite/gcc.target/i386/avx10_1-8.c  |  4 ++
 gcc/testsuite/gcc.target/i386/avx10_1-9.c  | 13 ++++++
 26 files changed, 366 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/mv33.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-9.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 30ef0d334ca..5abff83b4ca 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -688,6 +688,9 @@ get_available_features (struct __processor_model *cpu_model,
   int amx_usable = 0;
   /* Check if KL is usable.  */
   int has_kl = 0;
+  /* Record AVX10 version.  */
+  int avx10_set = 0;
+  int version = 0;
   if ((ecx & bit_OSXSAVE))
     {
       /* Check if XMM, YMM, OPMASK, upper 256 bits of ZMM0-ZMM15 and
@@ -906,6 +909,9 @@ get_available_features (struct __processor_model *cpu_model,
 	{
 	  if (eax & bit_AVX512BF16)
 	    set_feature (FEATURE_AVX512BF16);
+	  /* AVX10 has the same XSTATE with AVX512.  */
+	  if (edx & bit_AVX10)
+	    avx10_set = 1;
 	}
       if (amx_usable)
 	{
@@ -951,6 +957,24 @@ get_available_features (struct __processor_model *cpu_model,
 	}
     }
 
+  /* Get Advanced Features at level 0x24 (eax = 0x24).  */
+  if (avx10_set && max_cpuid_level >= 0x24)
+    {
+      __cpuid (0x24, eax, ebx, ecx, edx);
+      version = ebx & 0xff;
+      if (ebx & bit_AVX10_256)
+	switch (version)
+	  {
+	  case 1:
+	    set_feature (FEATURE_AVX10_1);
+	    break;
+	  default:
+	    gcc_unreachable ();
+	  }
+      if (ebx & bit_AVX10_512)
+	set_feature (FEATURE_AVX10_512BIT);
+    }
+
   /* Check cpuid level of extended features.  */
   __cpuid (0x80000000, ext_level, ebx, ecx, edx);
 
@@ -1155,6 +1179,18 @@ cpu_indicator_init (struct __processor_model *cpu_model,
 	}
     }
 
+#define SET_AVX10_512(A,B) \
+  if (has_cpu_feature (cpu_model, cpu_features2, FEATURE_AVX10_##A)) \
+    { \
+      CHECK___builtin_cpu_supports (B); \
+      set_cpu_feature (cpu_model, cpu_features2, FEATURE_AVX10_##A##_512); \
+    }
+
+  if (has_cpu_feature (cpu_model, cpu_features2, FEATURE_AVX10_512BIT))
+    SET_AVX10_512 (1, "avx10.1-512");
+
+#undef SET_AVX10_512
+
   gcc_assert (cpu_model->__cpu_vendor < VENDOR_MAX);
   gcc_assert (cpu_model->__cpu_type < CPU_TYPE_MAX);
   gcc_assert (cpu_model->__cpu_subtype < CPU_SUBTYPE_MAX);
diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 26005914079..6c3bebb1846 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -123,6 +123,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_SET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_SET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_SET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_AVX10_512BIT_SET OPTION_MASK_ISA2_AVX10_512BIT
+#define OPTION_MASK_ISA2_AVX10_1_SET OPTION_MASK_ISA2_AVX10_1
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -232,7 +234,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_AVX2_UNSET \
   (OPTION_MASK_ISA2_AVXIFMA_UNSET | OPTION_MASK_ISA2_AVXVNNI_UNSET \
    | OPTION_MASK_ISA2_AVXVNNIINT8_UNSET | OPTION_MASK_ISA2_AVXNECONVERT_UNSET \
-   | OPTION_MASK_ISA2_AVXVNNIINT16_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET)
+   | OPTION_MASK_ISA2_AVXVNNIINT16_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET \
+   | OPTION_MASK_ISA2_AVX10_1_UNSET)
 #define OPTION_MASK_ISA_AVX512F_UNSET \
   (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD_UNSET \
    | OPTION_MASK_ISA_AVX512PF_UNSET | OPTION_MASK_ISA_AVX512ER_UNSET \
@@ -309,6 +312,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_UNSET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_UNSET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_UNSET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_AVX10_512BIT_UNSET OPTION_MASK_ISA2_AVX10_512BIT
+#define OPTION_MASK_ISA2_AVX10_1_UNSET OPTION_MASK_ISA2_AVX10_1
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -1341,6 +1346,52 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mavx10_max_512bit:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_512BIT_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_512BIT_UNSET;
+	}
+      return true;
+
+    case OPT_mavx10_1:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_1_UNSET;
+	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_UNSET;
+	}
+      return true;
+
+    case OPT_mavx10_1_256:
+      opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
+      opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
+      opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_512BIT_SET;
+      opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
+      opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET;
+      opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET;
+      return true;
+
+    case OPT_mavx10_1_512:
+      opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
+      opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
+      opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
+      opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
+      opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET;
+      opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET;
+      return true;
+
     case OPT_mfma:
       if (value)
 	{
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 9153b4d0a54..8fbfb38baed 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -261,6 +261,9 @@ enum processor_features
   FEATURE_SM3,
   FEATURE_SHA512,
   FEATURE_SM4,
+  FEATURE_AVX10_512BIT,
+  FEATURE_AVX10_1,
+  FEATURE_AVX10_1_512,
   CPU_FEATURE_MAX
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index 2297903a45e..35be0cc3f2a 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -191,4 +191,9 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("sm3", FEATURE_SM3, P_NONE, "-msm3")
   ISA_NAMES_TABLE_ENTRY("sha512", FEATURE_SHA512, P_NONE, "-msha512")
   ISA_NAMES_TABLE_ENTRY("sm4", FEATURE_SM4, P_NONE, "-msm4")
+  ISA_NAMES_TABLE_ENTRY("avx10-max-512bit", FEATURE_AVX10_512BIT,
+			P_NONE, "-mavx10-max-512bit")
+  ISA_NAMES_TABLE_ENTRY("avx10.1", FEATURE_AVX10_1, P_NONE, "-mavx10.1")
+  ISA_NAMES_TABLE_ENTRY("avx10.1-256", FEATURE_AVX10_1, P_NONE, NULL)
+  ISA_NAMES_TABLE_ENTRY("avx10.1-512", FEATURE_AVX10_1_512, P_NONE, NULL)
 ISA_NAMES_TABLE_END
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index fd490f39110..4be6bc4816a 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -78,10 +78,10 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
-(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+(define_register_constraint "Yk" "(TARGET_AVX512F || TARGET_AVX10_1) ? MASK_REGS : NO_REGS"
 "@internal Any mask register that can be used as predicate, i.e. k1-k7.")
 
-(define_register_constraint "k" "TARGET_AVX512F ? ALL_MASK_REGS : NO_REGS"
+(define_register_constraint "k" "(TARGET_AVX512F || TARGET_AVX10_1) ? ALL_MASK_REGS : NO_REGS"
 "@internal Any mask register.")
 
 ;; Vector registers (also used for plain floating point nowadays).
@@ -146,7 +146,7 @@
  "@internal Lower SSE register when avoiding REX prefix and all SSE registers otherwise.")
 
 (define_register_constraint "Yv"
- "TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
+ "(TARGET_AVX512VL || TARGET_AVX10_1) ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
  "@internal For AVX512VL, any EVEX encodable SSE register (@code{%xmm0-%xmm31}), otherwise any SSE register.")
 
 (define_register_constraint "Yw"
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 73c15480350..ca5551cefca 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -149,6 +149,7 @@
 #define bit_AVXNECONVERT	(1 << 5)
 #define bit_AVXVNNIINT16	(1 << 10)
 #define bit_PREFETCHI	(1 << 14)
+#define bit_AVX10	(1 << 19)
 
 /* Extended State Enumeration Sub-leaf (%eax == 0xd, %ecx == 1) */
 #define bit_XSAVEOPT	(1 << 0)
@@ -159,6 +160,11 @@
 /* %ebx */
 #define bit_PTWRITE	(1 << 4)
 
+/* AVX10 sub leaf (%eax == 0x24) */
+/* %ebx */
+#define bit_AVX10_256	(1 << 17)
+#define bit_AVX10_512	(1 << 18)
+
 /* Keylocker leaf (%eax == 0x19) */
 /* %ebx */
 #define bit_AESKLE	( 1<<0 )
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 257950582c2..caef5531593 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -692,6 +692,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__SHA512__");
   if (isa_flag2 & OPTION_MASK_ISA2_SM4)
     def_or_undef (parse_in, "__SM4__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_512BIT)
+    def_or_undef (parse_in, "__AVX10_512BIT__");
+  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1)
+    def_or_undef (parse_in, "__AVX10_1__");
   if (TARGET_IAMCU)
     {
       def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
index aeafcf870ac..f7d741746c3 100644
--- a/gcc/config/i386/i386-isa.def
+++ b/gcc/config/i386/i386-isa.def
@@ -121,3 +121,5 @@ DEF_PTA(AVXVNNIINT16)
 DEF_PTA(SM3)
 DEF_PTA(SHA512)
 DEF_PTA(SM4)
+DEF_PTA(AVX10_512BIT)
+DEF_PTA(AVX10_1)
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 127ee24203c..b2281fbd4b5 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -243,7 +243,9 @@ static struct ix86_target_opts isa2_opts[] =
   { "-mavxvnniint16",	OPTION_MASK_ISA2_AVXVNNIINT16 },
   { "-msm3",		OPTION_MASK_ISA2_SM3 },
   { "-msha512",		OPTION_MASK_ISA2_SHA512 },
-  { "-msm4",            OPTION_MASK_ISA2_SM4 }
+  { "-msm4",            OPTION_MASK_ISA2_SM4 },
+  { "-mavx10-max-512bit",	OPTION_MASK_ISA2_AVX10_512BIT },
+  { "-mavx10.1",	OPTION_MASK_ISA2_AVX10_1 }
 };
 static struct ix86_target_opts isa_opts[] =
 {
@@ -983,7 +985,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     ix86_opt_ix86_no,
     ix86_opt_str,
     ix86_opt_enum,
-    ix86_opt_isa
+    ix86_opt_isa,
   };
 
   static const struct
@@ -1100,6 +1102,10 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_ISA ("sm3", OPT_msm3),
     IX86_ATTR_ISA ("sha512", OPT_msha512),
     IX86_ATTR_ISA ("sm4", OPT_msm4),
+    IX86_ATTR_ISA ("avx10-max-512bit", OPT_mavx10_max_512bit),
+    IX86_ATTR_ISA ("avx10.1", OPT_mavx10_1),
+    IX86_ATTR_ISA ("avx10.1-256", OPT_mavx10_1_256),
+    IX86_ATTR_ISA ("avx10.1-512", OPT_mavx10_1_512),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
@@ -2524,6 +2530,22 @@ ix86_option_override_internal (bool main_args_p,
       &= ~((OPTION_MASK_ISA_BMI | OPTION_MASK_ISA_BMI2 | OPTION_MASK_ISA_TBM)
 	   & ~opts->x_ix86_isa_flags_explicit);
 
+  /* Enable AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,
+     VPOPCNTDQ} features for AVX10.1/512.  */
+  if (TARGET_AVX10_1_P (opts->x_ix86_isa_flags2)
+      && TARGET_AVX10_512BIT_P (opts->x_ix86_isa_flags2))
+    {
+      opts->x_ix86_isa_flags
+	|= OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD
+	    | OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512BW
+	    | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512IFMA
+	    | OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VBMI2
+	    | OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VPOPCNTDQ
+	    | OPTION_MASK_ISA_AVX512BITALG;
+      opts->x_ix86_isa_flags2
+	|= OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_AVX512BF16;
+    }
+
   /* Validate -mpreferred-stack-boundary= value or default it to
      PREFERRED_STACK_BOUNDARY_DEFAULT.  */
   ix86_preferred_stack_boundary = PREFERRED_STACK_BOUNDARY_DEFAULT;
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5d57726e22c..e75614b993d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -513,8 +513,8 @@ ix86_conditional_register_usage (void)
   if (! (TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387))
     accessible_reg_set &= ~reg_class_contents[FLOAT_REGS];
 
-  /* If AVX512F is disabled, disable the registers.  */
-  if (! TARGET_AVX512F)
+  /* If AVX512F and AVX10 are both disabled, disable the registers.  */
+  if (!TARGET_AVX512F && !TARGET_AVX10_1)
     {
       for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
 	CLEAR_HARD_REG_BIT (accessible_reg_set, i);
@@ -5490,6 +5490,7 @@ ix86_get_ssemov (rtx *operands, unsigned size,
      we can only use zmm register move without memory operand.  */
   if (evex_reg_p
       && !TARGET_AVX512VL
+      && !TARGET_AVX10_1
       && GET_MODE_SIZE (mode) < 64)
     {
       /* NB: Even though ix86_hard_regno_mode_ok doesn't allow
@@ -20259,7 +20260,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 
       return ((TARGET_AVX512F && VALID_MASK_REG_MODE (mode))
 	      || (TARGET_AVX512BW
-		  && VALID_MASK_AVX512BW_MODE (mode)));
+		  && VALID_MASK_AVX512BW_MODE (mode))
+	      || (TARGET_AVX10_1 && VALID_MASK_AVX10_MODE (mode)));
     }
 
   if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
@@ -20294,6 +20296,13 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	      || VALID_AVX512VL_128_REG_MODE (mode)))
 	return true;
 
+      /* AVX10_1 allows sse regs16+ for 256 bit modes.  */
+      if (TARGET_AVX10_1
+	  && (VALID_AVX256_REG_OR_OI_MODE (mode)
+	      || VALID_AVX512VL_128_REG_MODE (mode)
+	      || VALID_AVX512F_SCALAR_MODE (mode)))
+	return true;
+
       /* xmm16-xmm31 are only available for AVX-512.  */
       if (EXT_REX_SSE_REGNO_P (regno))
 	return false;
@@ -21584,7 +21593,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
       mask = XEXP (x, 2);
       /* This is masked instruction, assume the same cost,
 	 as nonmasked variant.  */
-      if (TARGET_AVX512F && register_operand (mask, GET_MODE (mask)))
+      if ((TARGET_AVX512F || TARGET_AVX10_1)
+	  && register_operand (mask, GET_MODE (mask)))
 	*total = rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed);
       else
 	*total = cost->sse_op;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..77b50913458 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1080,6 +1080,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define VALID_MASK_AVX512BW_MODE(MODE) ((MODE) == SImode || (MODE) == DImode)
 
+#define VALID_MASK_AVX10_MODE(MODE) ((MODE) == SImode || (MODE) == HImode \
+				       || (MODE) == QImode)
+
 #define VALID_FP_MODE_P(MODE)						\
   ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode		\
    || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 1cc8563477a..0ce8e6204ff 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1298,3 +1298,22 @@ msm4
 Target Mask(ISA2_SM4) Var(ix86_isa_flags2) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and
 SM4 built-in functions and code generation.
+
+mavx10-max-512bit
+Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Save
+Indicates 512 bit vector width support for AVX10.
+
+mavx10.1
+Target Mask(ISA2_AVX10_1) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2,
+and AVX10.1 built-in functions and code generation.
+
+mavx10.1-256
+Target RejectNegative
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2,
+and AVX10.1 built-in functions and code generation.
+
+mavx10.1-512
+Target RejectNegative
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2,
+and AVX10.1-512 built-in functions and code generation.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89c5b4ea2b2..08e8b3b761c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7184,6 +7184,19 @@ Enable/disable the generation of the SHA512 instructions.
 @itemx no-sm4
 Enable/disable the generation of the SM4 instructions.
 
+@cindex @code{target("avx10.1")} function attribute, x86
+@item avx10.1
+@itemx no-avx10.1
+Enable/disable the generation of the AVX10.1 instructions.
+
+@cindex @code{target("avx10.1-256")} function attribute, x86
+@item avx10.1-256
+Enable the generation of the AVX10.1 instructions.
+
+@cindex @code{target("avx10.1-512")} function attribute, x86
+@item avx10.1-512
+Enable the generation of the AVX10.1 512 bit instructions.
+
 @cindex @code{target("cld")} function attribute, x86
 @item cld
 @itemx no-cld
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 674f956f4b8..43b6210c3c8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1436,6 +1436,7 @@ See RS/6000 and PowerPC Options.
 -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni
 -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd -mamx-fp16
 -mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4
+-mavx10.1 -mavx10.1-256 -mavx10.1-512
 -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops
 -minline-stringops-dynamically  -mstringop-strategy=@var{alg}
 -mkl -mwidekl
@@ -33670,6 +33671,15 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @need 200
 @opindex msm4
 @itemx -msm4
+@need 200
+@opindex mavx10.1
+@itemx -mavx10.1
+@need 200
+@opindex mavx10.1-256
+@itemx -mavx10.1-256
+@need 200
+@opindex mavx10.1-512
+@itemx -mavx10.1-512
 These switches enable the use of instructions in the MMX, SSE,
 AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA,
 AES, PCLMUL, CLFLUSHOPT, CLWB, FSGSBASE, PTWRITE, RDRND, F16C, FMA, PCONFIG,
@@ -33680,9 +33690,9 @@ GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
 ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
 UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512-FP16,
 AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AMX-FP16, PREFETCHI, RAOINT,
-AMX-COMPLEX, AVXVNNIINT16, SM3, SHA512, SM4 or CLDEMOTE extended instruction
-sets. Each has a corresponding @option{-mno-} option to disable use of these
-instructions.
+AMX-COMPLEX, AVXVNNIINT16, SM3, SHA512, SM4, AVX10.1 or CLDEMOTE extended
+instruction sets. Each has a corresponding @option{-mno-} option to disable
+use of these instructions.
 
 These extensions are also available as built-in functions: see
 @ref{x86 Built-in Functions}, for details of the functions enabled and
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1a78b3c1abb..cab8065cd8e 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2484,6 +2484,15 @@ Target supports compiling @code{avx} instructions.
 @item avx_runtime
 Target supports the execution of @code{avx} instructions.
 
+@item avx10.1
+Target supports the execution of @code{avx10.1} instructions.
+
+@item avx10.1-256
+Target supports the execution of @code{avx10.1} instructions.
+
+@item avx10.1-512
+Target supports the execution of @code{avx10.1-512} instructions.
+
 @item avx2
 Target supports compiling @code{avx2} instructions.
 
diff --git a/gcc/testsuite/g++.target/i386/mv33.C b/gcc/testsuite/g++.target/i386/mv33.C
new file mode 100644
index 00000000000..b50f13c5aa8
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/mv33.C
@@ -0,0 +1,30 @@
+// Test that dispatching can choose the right multiversion
+// for avx10.x-512 microarchitecture levels.
+
+// { dg-do run }
+// { dg-require-ifunc "" }
+// { dg-options "-O2" }
+
+#include <assert.h>
+
+int __attribute__ ((target("default")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("avx10.1-512"))) foo () {
+  return 1;
+}
+
+int main ()
+{
+  int val = foo ();
+
+  if  (__builtin_cpu_supports ("avx10.1-512"))
+    assert (val == 1);
+  else
+    assert (val == 0);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-1.c
new file mode 100644
index 00000000000..cfd9662bb13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=x86-64 -mavx10.1" } */
+
+#include <immintrin.h>
+
+void
+f1 ()
+{
+  register __m256d a __asm ("ymm17");
+  register __m256d b __asm ("ymm16");
+  a = _mm256_add_pd (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+void
+f2 ()
+{
+  register __m128d a __asm ("xmm17");
+  register __m128d b __asm ("xmm16");
+  a = _mm_add_pd (a, b);
+  asm volatile ("" : "+v" (a));
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-10.c b/gcc/testsuite/gcc.target/i386/avx10_1-10.c
new file mode 100644
index 00000000000..9a5892d8df9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-10.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64" } */
+/* { dg-final { scan-assembler "%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__attribute__ ((target ("avx10.1-512"))) __m512d
+foo ()
+{
+  __m512d a, b;
+  a = a + b;
+  return a;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-2.c b/gcc/testsuite/gcc.target/i386/avx10_1-2.c
new file mode 100644
index 00000000000..0b3991dcf74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64 -mavx10.1-512" } */
+/* { dg-final { scan-assembler "%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__m512d
+foo ()
+{
+  __m512d a, b;
+  a = a + b;
+  return a;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-3.c b/gcc/testsuite/gcc.target/i386/avx10_1-3.c
new file mode 100644
index 00000000000..3be988a1a62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-3.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx10.1" } */
+
+#include <immintrin.h>
+
+int
+foo (int c)
+{
+  register int a __asm ("k7") = c;
+  int b = foo (a);
+  asm volatile ("" : "+k" (b));
+  return b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-4.c b/gcc/testsuite/gcc.target/i386/avx10_1-4.c
new file mode 100644
index 00000000000..68cbf197d61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-4.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx10.1-512" } */
+
+#include <immintrin.h>
+
+long long
+foo (long long c)
+{
+  register long long a __asm ("k7") = c;
+  long long b = foo (a);
+  asm volatile ("" : "+k" (b));
+  return b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-5.c b/gcc/testsuite/gcc.target/i386/avx10_1-5.c
new file mode 100644
index 00000000000..5481ab2f386
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-5.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O0 -march=x86-64 -mavx10.1 -Wno-psabi" } */
+/* { dg-final { scan-assembler-not ".%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__m512d
+foo ()
+{
+  __m512d a, b;
+  a = a + b;
+  return a;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-6.c b/gcc/testsuite/gcc.target/i386/avx10_1-6.c
new file mode 100644
index 00000000000..827c80ce51e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-6.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx10.1" } */
+
+#include <immintrin.h>
+
+long long
+foo (long long c)
+{
+  register long long a __asm ("k7") = c;
+  long long b = foo (a);
+  asm volatile ("" : "+k" (b)); /* { dg-error "inconsistent operand constraints in an 'asm'" } */
+  return b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-7.c b/gcc/testsuite/gcc.target/i386/avx10_1-7.c
new file mode 100644
index 00000000000..d8b8d97590b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64 -Wno-psabi" } */
+/* { dg-final { scan-assembler-not ".%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__attribute__ ((target ("avx10.1"))) __m512d
+foo ()
+{
+  __m512d a, b;
+  a = a + b;
+  return a;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-8.c b/gcc/testsuite/gcc.target/i386/avx10_1-8.c
new file mode 100644
index 00000000000..8dbd201b336
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-8.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=x86-64 -mavx10.1-256" } */
+
+#include "avx10_1-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-9.c b/gcc/testsuite/gcc.target/i386/avx10_1-9.c
new file mode 100644
index 00000000000..00493098be7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-9.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64 -Wno-psabi" } */
+/* { dg-final { scan-assembler-not ".%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__attribute__ ((target ("avx10.1-256"))) __m512d
+foo ()
+{
+  __m512d a, b;
+  a = a + b;
+  return a;
+}
-- 
2.31.1
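
For readers following the cpuinfo.h hunk, the EBX decoding it performs can be
sketched in isolation as below. This is only an illustration, not code from
the patch: the bit positions and the 8-bit version field are taken from the
cpuid.h and cpuinfo.h hunks, while struct avx10_info and decode_avx10_ebx are
hypothetical names made up for this sketch.

```c
/* Bit positions copied from the cpuid.h hunk of the patch.  */
#define BIT_AVX10_256 (1u << 17)
#define BIT_AVX10_512 (1u << 18)

/* Hypothetical helper type for this sketch; not part of GCC.  */
struct avx10_info
{
  unsigned version;  /* Low 8 bits of EBX, as read in cpuinfo.h.  */
  int has_256;       /* Nonzero if the 256 bit width bit is set.  */
  int has_512;       /* Nonzero if the 512 bit width bit is set.  */
};

/* Decode an EBX value as returned by the AVX10 CPUID leaf.  */
static struct avx10_info
decode_avx10_ebx (unsigned ebx)
{
  struct avx10_info info;

  info.version = ebx & 0xff;
  info.has_256 = (ebx & BIT_AVX10_256) != 0;
  info.has_512 = (ebx & BIT_AVX10_512) != 0;
  return info;
}
```

With a made-up EBX value of 0x00060001, decode_avx10_ebx reports version 1
with both width bits set, which is the case where the patch sets both
FEATURE_AVX10_1 and FEATURE_AVX10_512BIT.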

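User code could key off the two predefined macros added in the i386-c.cc hunk
(__AVX10_1__ and __AVX10_512BIT__) roughly as follows. This is a minimal
sketch that assumes the macro names stay as in this version of the patch;
MAX_AVX10_VECTOR_BITS and max_avx10_vector_bits are hypothetical names used
only here.

```c
/* Pick the widest AVX10 vector length the compilation flags allow.
   Both macros are defined for -mavx10.1-512; only __AVX10_1__ is
   defined for -mavx10.1 and -mavx10.1-256.  */
#if defined (__AVX10_1__) && defined (__AVX10_512BIT__)
# define MAX_AVX10_VECTOR_BITS 512
#elif defined (__AVX10_1__)
# define MAX_AVX10_VECTOR_BITS 256
#else
# define MAX_AVX10_VECTOR_BITS 0  /* AVX10.1 not enabled.  */
#endif

/* Hypothetical accessor so the value can be checked at run time.  */
static int
max_avx10_vector_bits (void)
{
  return MAX_AVX10_VECTOR_BITS;
}
```

Note that __AVX10_512BIT__ on its own is treated as "no AVX10" here, since
-mavx10-max-512bit by itself does not enable AVX10.1 in this patch.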

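The interaction between -mavx10.1, -mavx10.1-256 and -mavx10.1-512 in
ix86_handle_option can be modelled with a small stand-alone sketch. The mask
values and the isa2_state type below are stand-ins invented for illustration
(the real OPTION_MASK_ISA2_* constants are generated from i386.opt), and the
AVX2 enabling done by the real handlers is left out.

```c
#include <stdbool.h>

/* Stand-in mask values; assumptions for this sketch only.  */
#define MASK_AVX10_1      (1u << 0)
#define MASK_AVX10_512BIT (1u << 1)

struct isa2_state
{
  unsigned flags;      /* Models opts->x_ix86_isa_flags2.  */
  unsigned explicit_;  /* Models opts->x_ix86_isa_flags2_explicit.  */
};

/* -mavx10.1 / -mno-avx10.1: toggles AVX10.1, leaves the width bit alone.  */
static void
handle_mavx10_1 (struct isa2_state *s, bool value)
{
  if (value)
    s->flags |= MASK_AVX10_1;
  else
    s->flags &= ~MASK_AVX10_1;
  s->explicit_ |= MASK_AVX10_1;
}

/* -mavx10.1-256: enables AVX10.1 and explicitly clears the 512 bit flag.  */
static void
handle_mavx10_1_256 (struct isa2_state *s)
{
  s->flags |= MASK_AVX10_1;
  s->flags &= ~MASK_AVX10_512BIT;
  s->explicit_ |= MASK_AVX10_1 | MASK_AVX10_512BIT;
}

/* -mavx10.1-512: enables AVX10.1 and the 512 bit flag together.  */
static void
handle_mavx10_1_512 (struct isa2_state *s)
{
  s->flags |= MASK_AVX10_1 | MASK_AVX10_512BIT;
  s->explicit_ |= MASK_AVX10_1 | MASK_AVX10_512BIT;
}
```

In this model the last width-specific option wins: applying the 512 handler
and then the 256 handler leaves only MASK_AVX10_1 set, with both bits
recorded as explicitly chosen.
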

* Re: [PATCH 1/3] Initial support for AVX10.1
  2023-08-08  7:13 ` [PATCH 1/3] Initial support for AVX10.1 Haochen Jiang
@ 2023-08-16  2:29   ` Hongtao Liu
  0 siblings, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-16  2:29 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: gcc-patches, ubizjak, hongtao.liu

On Tue, Aug 8, 2023 at 3:16 PM Haochen Jiang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> gcc/ChangeLog:
>
>         * common/config/i386/cpuinfo.h (get_available_features):
>         Add avx10_set and version and detect avx10.1.
>         (cpu_indicator_init): Handle avx10.1-512.
>         * common/config/i386/i386-common.cc
>         (OPTION_MASK_ISA2_AVX10_512BIT_SET): New.
>         (OPTION_MASK_ISA2_AVX10_1_SET): Ditto.
>         (OPTION_MASK_ISA2_AVX10_512BIT_UNSET): Ditto.
>         (OPTION_MASK_ISA2_AVX10_1_UNSET): Ditto.
>         (OPTION_MASK_ISA2_AVX2_UNSET): Modify for AVX10_1.
>         (ix86_handle_option): Handle -mavx10.1, -mavx10.1-256 and
>         -mavx10.1-512.
>         * common/config/i386/i386-cpuinfo.h (enum processor_features):
>         Add FEATURE_AVX10_512BIT, FEATURE_AVX10_1 and
>         FEATURE_AVX10_1_512.
>         * common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY for
>         AVX10_512BIT, AVX10_1 and AVX10_1_512.
>         * config/i386/constraints.md (Yk): Add AVX10_1.
>         (Yv): Ditto.
>         (k): Ditto.
>         * config/i386/cpuid.h (bit_AVX10): New.
>         (bit_AVX10_256): Ditto.
>         (bit_AVX10_512): Ditto.
>         * config/i386/i386-c.cc (ix86_target_macros_internal):
>         Define __AVX10_512BIT__ and __AVX10_1__.
>         * config/i386/i386-isa.def
>         (AVX10_512BIT): Add DEF_PTA(AVX10_512BIT).
>         (AVX10_1): Add DEF_PTA(AVX10_1).
>         * config/i386/i386-options.cc (isa2_opts): Add -mavx10-max-512bit
>         and -mavx10.1.
>         (ix86_valid_target_attribute_inner_p): Handle avx10-max-512bit,
>         avx10.1, avx10.1-256 and avx10.1-512.
>         (ix86_option_override_internal): Enable AVX512{F,VL,BW,DQ,CD,BF16,
>         FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ} features for avx10.1-512.
>         (ix86_valid_target_attribute_inner_p): Handle AVX10_1.
>         * config/i386/i386.cc (ix86_get_ssemov): Add AVX10_1.
>         (ix86_conditional_register_usage): Ditto.
>         (ix86_hard_regno_mode_ok): Ditto.
>         (ix86_rtx_costs): Ditto.
>         * config/i386/i386.h (VALID_MASK_AVX10_MODE): New macro.
>         * config/i386/i386.opt: Add option -mavx10.1, -mavx10.1-256 and
>         -mavx10.1-512.
>         * doc/extend.texi: Document avx10.1, avx10.1-256 and avx10.1-512.
>         * doc/invoke.texi: Document -mavx10.1, -mavx10.1-256 and -mavx10.1-512.
>         * doc/sourcebuild.texi: Document target avx10.1, avx10.1-256
>         and avx10.1-512.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.target/i386/mv33.C: New test.
>         * gcc.target/i386/avx10_1-1.c: Ditto.
>         * gcc.target/i386/avx10_1-2.c: Ditto.
>         * gcc.target/i386/avx10_1-3.c: Ditto.
>         * gcc.target/i386/avx10_1-4.c: Ditto.
>         * gcc.target/i386/avx10_1-5.c: Ditto.
>         * gcc.target/i386/avx10_1-6.c: Ditto.
>         * gcc.target/i386/avx10_1-7.c: Ditto.
>         * gcc.target/i386/avx10_1-8.c: Ditto.
>         * gcc.target/i386/avx10_1-9.c: Ditto.
>         * gcc.target/i386/avx10_1-10.c: Ditto.
OK (please wait an extra 24 hours before committing, in case there are objections).
> ---
>  gcc/common/config/i386/cpuinfo.h           | 36 +++++++++++++++
>  gcc/common/config/i386/i386-common.cc      | 53 +++++++++++++++++++++-
>  gcc/common/config/i386/i386-cpuinfo.h      |  3 ++
>  gcc/common/config/i386/i386-isas.h         |  5 ++
>  gcc/config/i386/constraints.md             |  6 +--
>  gcc/config/i386/cpuid.h                    |  6 +++
>  gcc/config/i386/i386-c.cc                  |  4 ++
>  gcc/config/i386/i386-isa.def               |  2 +
>  gcc/config/i386/i386-options.cc            | 26 ++++++++++-
>  gcc/config/i386/i386.cc                    | 18 ++++++--
>  gcc/config/i386/i386.h                     |  3 ++
>  gcc/config/i386/i386.opt                   | 19 ++++++++
>  gcc/doc/extend.texi                        | 13 ++++++
>  gcc/doc/invoke.texi                        | 16 +++++--
>  gcc/doc/sourcebuild.texi                   |  9 ++++
>  gcc/testsuite/g++.target/i386/mv33.C       | 30 ++++++++++++
>  gcc/testsuite/gcc.target/i386/avx10_1-1.c  | 22 +++++++++
>  gcc/testsuite/gcc.target/i386/avx10_1-10.c | 13 ++++++
>  gcc/testsuite/gcc.target/i386/avx10_1-2.c  | 13 ++++++
>  gcc/testsuite/gcc.target/i386/avx10_1-3.c  | 13 ++++++
>  gcc/testsuite/gcc.target/i386/avx10_1-4.c  | 13 ++++++
>  gcc/testsuite/gcc.target/i386/avx10_1-5.c  | 13 ++++++
>  gcc/testsuite/gcc.target/i386/avx10_1-6.c  | 13 ++++++
>  gcc/testsuite/gcc.target/i386/avx10_1-7.c  | 13 ++++++
>  gcc/testsuite/gcc.target/i386/avx10_1-8.c  |  4 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-9.c  | 13 ++++++
>  26 files changed, 366 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/mv33.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-10.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-8.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-9.c
>
> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> index 30ef0d334ca..5abff83b4ca 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -688,6 +688,9 @@ get_available_features (struct __processor_model *cpu_model,
>    int amx_usable = 0;
>    /* Check if KL is usable.  */
>    int has_kl = 0;
> +  /* Record AVX10 version.  */
> +  int avx10_set = 0;
> +  int version = 0;
>    if ((ecx & bit_OSXSAVE))
>      {
>        /* Check if XMM, YMM, OPMASK, upper 256 bits of ZMM0-ZMM15 and
> @@ -906,6 +909,9 @@ get_available_features (struct __processor_model *cpu_model,
>         {
>           if (eax & bit_AVX512BF16)
>             set_feature (FEATURE_AVX512BF16);
> +         /* AVX10 has the same XSTATE as AVX512.  */
> +         if (edx & bit_AVX10)
> +           avx10_set = 1;
>         }
>        if (amx_usable)
>         {
> @@ -951,6 +957,24 @@ get_available_features (struct __processor_model *cpu_model,
>         }
>      }
>
> +  /* Get Advanced Features at level 0x24 (eax = 0x24).  */
> +  if (avx10_set && max_cpuid_level >= 0x24)
> +    {
> +      __cpuid_count (0x24, 0, eax, ebx, ecx, edx);
> +      version = ebx & 0xff;
> +      if (ebx & bit_AVX10_256)
> +       switch (version)
> +         {
> +         case 1:
> +           set_feature (FEATURE_AVX10_1);
> +           break;
> +         default:
> +           gcc_unreachable ();
> +         }
> +      if (ebx & bit_AVX10_512)
> +       set_feature (FEATURE_AVX10_512BIT);
> +    }
> +
>    /* Check cpuid level of extended features.  */
>    __cpuid (0x80000000, ext_level, ebx, ecx, edx);
>
> @@ -1155,6 +1179,18 @@ cpu_indicator_init (struct __processor_model *cpu_model,
>         }
>      }
>
> +#define SET_AVX10_512(A,B) \
> +  if (has_cpu_feature (cpu_model, cpu_features2, FEATURE_AVX10_##A)) \
> +    { \
> +      CHECK___builtin_cpu_supports (B); \
> +      set_cpu_feature (cpu_model, cpu_features2, FEATURE_AVX10_##A##_512); \
> +    }
> +
> +  if (has_cpu_feature (cpu_model, cpu_features2, FEATURE_AVX10_512BIT))
> +    SET_AVX10_512 (1, "avx10.1-512");
> +
> +#undef SET_AVX10_512
> +
>    gcc_assert (cpu_model->__cpu_vendor < VENDOR_MAX);
>    gcc_assert (cpu_model->__cpu_type < CPU_TYPE_MAX);
>    gcc_assert (cpu_model->__cpu_subtype < CPU_SUBTYPE_MAX);
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index 26005914079..6c3bebb1846 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -123,6 +123,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_SM3_SET OPTION_MASK_ISA2_SM3
>  #define OPTION_MASK_ISA2_SHA512_SET OPTION_MASK_ISA2_SHA512
>  #define OPTION_MASK_ISA2_SM4_SET OPTION_MASK_ISA2_SM4
> +#define OPTION_MASK_ISA2_AVX10_512BIT_SET OPTION_MASK_ISA2_AVX10_512BIT
> +#define OPTION_MASK_ISA2_AVX10_1_SET OPTION_MASK_ISA2_AVX10_1
>
>  /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
>     as -msse4.2.  */
> @@ -232,7 +234,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_AVX2_UNSET \
>    (OPTION_MASK_ISA2_AVXIFMA_UNSET | OPTION_MASK_ISA2_AVXVNNI_UNSET \
>     | OPTION_MASK_ISA2_AVXVNNIINT8_UNSET | OPTION_MASK_ISA2_AVXNECONVERT_UNSET \
> -   | OPTION_MASK_ISA2_AVXVNNIINT16_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET)
> +   | OPTION_MASK_ISA2_AVXVNNIINT16_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET \
> +   | OPTION_MASK_ISA2_AVX10_1_UNSET)
>  #define OPTION_MASK_ISA_AVX512F_UNSET \
>    (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD_UNSET \
>     | OPTION_MASK_ISA_AVX512PF_UNSET | OPTION_MASK_ISA_AVX512ER_UNSET \
> @@ -309,6 +312,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_SM3_UNSET OPTION_MASK_ISA2_SM3
>  #define OPTION_MASK_ISA2_SHA512_UNSET OPTION_MASK_ISA2_SHA512
>  #define OPTION_MASK_ISA2_SM4_UNSET OPTION_MASK_ISA2_SM4
> +#define OPTION_MASK_ISA2_AVX10_512BIT_UNSET OPTION_MASK_ISA2_AVX10_512BIT
> +#define OPTION_MASK_ISA2_AVX10_1_UNSET OPTION_MASK_ISA2_AVX10_1
>
>  /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
>     as -mno-sse4.1. */
> @@ -1341,6 +1346,52 @@ ix86_handle_option (struct gcc_options *opts,
>         }
>        return true;
>
> +    case OPT_mavx10_max_512bit:
> +      if (value)
> +       {
> +         opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
> +       }
> +      else
> +       {
> +         opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_512BIT_UNSET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_512BIT_UNSET;
> +       }
> +      return true;
> +
> +    case OPT_mavx10_1:
> +      if (value)
> +       {
> +         opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
> +         opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET;
> +         opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET;
> +       }
> +      else
> +       {
> +         opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_1_UNSET;
> +         opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_UNSET;
> +       }
> +      return true;
> +
> +    case OPT_mavx10_1_256:
> +      opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
> +      opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
> +      opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_512BIT_SET;
> +      opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
> +      opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET;
> +      opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET;
> +      return true;
> +
> +    case OPT_mavx10_1_512:
> +      opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
> +      opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
> +      opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
> +      opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
> +      opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET;
> +      opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET;
> +      return true;
> +
>      case OPT_mfma:
>        if (value)
>         {
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
> index 9153b4d0a54..8fbfb38baed 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -261,6 +261,9 @@ enum processor_features
>    FEATURE_SM3,
>    FEATURE_SHA512,
>    FEATURE_SM4,
> +  FEATURE_AVX10_512BIT,
> +  FEATURE_AVX10_1,
> +  FEATURE_AVX10_1_512,
>    CPU_FEATURE_MAX
>  };
>
> diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
> index 2297903a45e..35be0cc3f2a 100644
> --- a/gcc/common/config/i386/i386-isas.h
> +++ b/gcc/common/config/i386/i386-isas.h
> @@ -191,4 +191,9 @@ ISA_NAMES_TABLE_START
>    ISA_NAMES_TABLE_ENTRY("sm3", FEATURE_SM3, P_NONE, "-msm3")
>    ISA_NAMES_TABLE_ENTRY("sha512", FEATURE_SHA512, P_NONE, "-msha512")
>    ISA_NAMES_TABLE_ENTRY("sm4", FEATURE_SM4, P_NONE, "-msm4")
> +  ISA_NAMES_TABLE_ENTRY("avx10-max-512bit", FEATURE_AVX10_512BIT,
> +                       P_NONE, "-mavx10-max-512bit")
> +  ISA_NAMES_TABLE_ENTRY("avx10.1", FEATURE_AVX10_1, P_NONE, "-mavx10.1")
> +  ISA_NAMES_TABLE_ENTRY("avx10.1-256", FEATURE_AVX10_1, P_NONE, NULL)
> +  ISA_NAMES_TABLE_ENTRY("avx10.1-512", FEATURE_AVX10_1_512, P_NONE, NULL)
>  ISA_NAMES_TABLE_END
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index fd490f39110..4be6bc4816a 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -78,10 +78,10 @@
>   "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
>   "Second from top of 80387 floating-point stack (@code{%st(1)}).")
>
> -(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
> +(define_register_constraint "Yk" "(TARGET_AVX512F || TARGET_AVX10_1) ? MASK_REGS : NO_REGS"
>  "@internal Any mask register that can be used as predicate, i.e. k1-k7.")
>
> -(define_register_constraint "k" "TARGET_AVX512F ? ALL_MASK_REGS : NO_REGS"
> +(define_register_constraint "k" "(TARGET_AVX512F || TARGET_AVX10_1) ? ALL_MASK_REGS : NO_REGS"
>  "@internal Any mask register.")
>
>  ;; Vector registers (also used for plain floating point nowadays).
> @@ -146,7 +146,7 @@
>   "@internal Lower SSE register when avoiding REX prefix and all SSE registers otherwise.")
>
>  (define_register_constraint "Yv"
> - "TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
> + "(TARGET_AVX512VL || TARGET_AVX10_1) ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
>   "@internal For AVX512VL, any EVEX encodable SSE register (@code{%xmm0-%xmm31}), otherwise any SSE register.")
>
>  (define_register_constraint "Yw"
> diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
> index 73c15480350..ca5551cefca 100644
> --- a/gcc/config/i386/cpuid.h
> +++ b/gcc/config/i386/cpuid.h
> @@ -149,6 +149,7 @@
>  #define bit_AVXNECONVERT       (1 << 5)
>  #define bit_AVXVNNIINT16       (1 << 10)
>  #define bit_PREFETCHI  (1 << 14)
> +#define bit_AVX10      (1 << 19)
>
>  /* Extended State Enumeration Sub-leaf (%eax == 0xd, %ecx == 1) */
>  #define bit_XSAVEOPT   (1 << 0)
> @@ -159,6 +160,11 @@
>  /* %ebx */
>  #define bit_PTWRITE    (1 << 4)
>
> +/* AVX10 sub leaf (%eax == 0x24) */
> +/* %ebx */
> +#define bit_AVX10_256  (1 << 17)
> +#define bit_AVX10_512  (1 << 18)
> +
>  /* Keylocker leaf (%eax == 0x19) */
>  /* %ebx */
>  #define bit_AESKLE     ( 1<<0 )
> diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
> index 257950582c2..caef5531593 100644
> --- a/gcc/config/i386/i386-c.cc
> +++ b/gcc/config/i386/i386-c.cc
> @@ -692,6 +692,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>      def_or_undef (parse_in, "__SHA512__");
>    if (isa_flag2 & OPTION_MASK_ISA2_SM4)
>      def_or_undef (parse_in, "__SM4__");
> +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_512BIT)
> +    def_or_undef (parse_in, "__AVX10_512BIT__");
> +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1)
> +    def_or_undef (parse_in, "__AVX10_1__");
>    if (TARGET_IAMCU)
>      {
>        def_or_undef (parse_in, "__iamcu");
> diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def
> index aeafcf870ac..f7d741746c3 100644
> --- a/gcc/config/i386/i386-isa.def
> +++ b/gcc/config/i386/i386-isa.def
> @@ -121,3 +121,5 @@ DEF_PTA(AVXVNNIINT16)
>  DEF_PTA(SM3)
>  DEF_PTA(SHA512)
>  DEF_PTA(SM4)
> +DEF_PTA(AVX10_512BIT)
> +DEF_PTA(AVX10_1)
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 127ee24203c..b2281fbd4b5 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -243,7 +243,9 @@ static struct ix86_target_opts isa2_opts[] =
>    { "-mavxvnniint16",  OPTION_MASK_ISA2_AVXVNNIINT16 },
>    { "-msm3",           OPTION_MASK_ISA2_SM3 },
>    { "-msha512",                OPTION_MASK_ISA2_SHA512 },
> -  { "-msm4",            OPTION_MASK_ISA2_SM4 }
> +  { "-msm4",            OPTION_MASK_ISA2_SM4 },
> +  { "-mavx10-max-512bit",      OPTION_MASK_ISA2_AVX10_512BIT },
> +  { "-mavx10.1",       OPTION_MASK_ISA2_AVX10_1 }
>  };
>  static struct ix86_target_opts isa_opts[] =
>  {
> @@ -983,7 +985,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
>      ix86_opt_ix86_no,
>      ix86_opt_str,
>      ix86_opt_enum,
> -    ix86_opt_isa
> +    ix86_opt_isa,
>    };
>
>    static const struct
> @@ -1100,6 +1102,10 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
>      IX86_ATTR_ISA ("sm3", OPT_msm3),
>      IX86_ATTR_ISA ("sha512", OPT_msha512),
>      IX86_ATTR_ISA ("sm4", OPT_msm4),
> +    IX86_ATTR_ISA ("avx10-max-512bit", OPT_mavx10_max_512bit),
> +    IX86_ATTR_ISA ("avx10.1", OPT_mavx10_1),
> +    IX86_ATTR_ISA ("avx10.1-256", OPT_mavx10_1_256),
> +    IX86_ATTR_ISA ("avx10.1-512", OPT_mavx10_1_512),
>
>      /* enum options */
>      IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_),
> @@ -2524,6 +2530,22 @@ ix86_option_override_internal (bool main_args_p,
>        &= ~((OPTION_MASK_ISA_BMI | OPTION_MASK_ISA_BMI2 | OPTION_MASK_ISA_TBM)
>            & ~opts->x_ix86_isa_flags_explicit);
>
> +  /* Enable AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,
> +     VPOPCNTDQ} features for AVX10.1/512.  */
> +  if (TARGET_AVX10_1_P (opts->x_ix86_isa_flags2)
> +      && TARGET_AVX10_512BIT_P (opts->x_ix86_isa_flags2))
> +    {
> +      opts->x_ix86_isa_flags
> +       |= OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD
> +           | OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512BW
> +           | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512IFMA
> +           | OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VBMI2
> +           | OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VPOPCNTDQ
> +           | OPTION_MASK_ISA_AVX512BITALG;
> +      opts->x_ix86_isa_flags2
> +       |= OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_AVX512BF16;
> +    }
> +
>    /* Validate -mpreferred-stack-boundary= value or default it to
>       PREFERRED_STACK_BOUNDARY_DEFAULT.  */
>    ix86_preferred_stack_boundary = PREFERRED_STACK_BOUNDARY_DEFAULT;
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 5d57726e22c..e75614b993d 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -513,8 +513,8 @@ ix86_conditional_register_usage (void)
>    if (! (TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387))
>      accessible_reg_set &= ~reg_class_contents[FLOAT_REGS];
>
> -  /* If AVX512F is disabled, disable the registers.  */
> -  if (! TARGET_AVX512F)
> +  /* If both AVX512F and AVX10 are disabled, disable the registers.  */
> +  if (!TARGET_AVX512F && !TARGET_AVX10_1)
>      {
>        for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
>         CLEAR_HARD_REG_BIT (accessible_reg_set, i);
> @@ -5490,6 +5490,7 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>       we can only use zmm register move without memory operand.  */
>    if (evex_reg_p
>        && !TARGET_AVX512VL
> +      && !TARGET_AVX10_1
>        && GET_MODE_SIZE (mode) < 64)
>      {
>        /* NB: Even though ix86_hard_regno_mode_ok doesn't allow
> @@ -20259,7 +20260,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>
>        return ((TARGET_AVX512F && VALID_MASK_REG_MODE (mode))
>               || (TARGET_AVX512BW
> -                 && VALID_MASK_AVX512BW_MODE (mode)));
> +                 && VALID_MASK_AVX512BW_MODE (mode))
> +             || (TARGET_AVX10_1 && VALID_MASK_AVX10_MODE (mode)));
>      }
>
>    if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
> @@ -20294,6 +20296,13 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>               || VALID_AVX512VL_128_REG_MODE (mode)))
>         return true;
>
> +      /* AVX10_1 allows sse regs16+ for 256 bit modes.  */
> +      if (TARGET_AVX10_1
> +         && (VALID_AVX256_REG_OR_OI_MODE (mode)
> +             || VALID_AVX512VL_128_REG_MODE (mode)
> +             || VALID_AVX512F_SCALAR_MODE (mode)))
> +       return true;
> +
>        /* xmm16-xmm31 are only available for AVX-512.  */
>        if (EXT_REX_SSE_REGNO_P (regno))
>         return false;
> @@ -21584,7 +21593,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>        mask = XEXP (x, 2);
>        /* This is masked instruction, assume the same cost,
>          as nonmasked variant.  */
> -      if (TARGET_AVX512F && register_operand (mask, GET_MODE (mask)))
> +      if ((TARGET_AVX512F || TARGET_AVX10_1)
> +         && register_operand (mask, GET_MODE (mask)))
>         *total = rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed);
>        else
>         *total = cost->sse_op;
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..77b50913458 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1080,6 +1080,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>
>  #define VALID_MASK_AVX512BW_MODE(MODE) ((MODE) == SImode || (MODE) == DImode)
>
> +#define VALID_MASK_AVX10_MODE(MODE) ((MODE) == SImode || (MODE) == HImode \
> +                                      || (MODE) == QImode)
> +
>  #define VALID_FP_MODE_P(MODE)                                          \
>    ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode            \
>     || (MODE) == SCmode || (MODE) == DCmode || (MODE) == XCmode)
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 1cc8563477a..0ce8e6204ff 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1298,3 +1298,22 @@ msm4
>  Target Mask(ISA2_SM4) Var(ix86_isa_flags2) Save
>  Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and
>  SM4 built-in functions and code generation.
> +
> +mavx10-max-512bit
> +Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Save
> +Indicates 512 bit vector width support for AVX10.
> +
> +mavx10.1
> +Target Mask(ISA2_AVX10_1) Var(ix86_isa_flags2) Save
> +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2,
> +and AVX10.1 built-in functions and code generation.
> +
> +mavx10.1-256
> +Target RejectNegative
> +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2,
> +and AVX10.1 built-in functions and code generation.
> +
> +mavx10.1-512
> +Target RejectNegative
> +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2,
> +and AVX10.1-512 built-in functions and code generation.
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 89c5b4ea2b2..08e8b3b761c 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -7184,6 +7184,19 @@ Enable/disable the generation of the SHA512 instructions.
>  @itemx no-sm4
>  Enable/disable the generation of the SM4 instructions.
>
> +@cindex @code{target("avx10.1")} function attribute, x86
> +@item avx10.1
> +@itemx no-avx10.1
> +Enable/disable the generation of the AVX10.1 instructions.
> +
> +@cindex @code{target("avx10.1-256")} function attribute, x86
> +@item avx10.1-256
> +Enable the generation of the AVX10.1 instructions.
> +
> +@cindex @code{target("avx10.1-512")} function attribute, x86
> +@item avx10.1-512
> +Enable the generation of the AVX10.1 512 bit instructions.
> +
>  @cindex @code{target("cld")} function attribute, x86
>  @item cld
>  @itemx no-cld
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 674f956f4b8..43b6210c3c8 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1436,6 +1436,7 @@ See RS/6000 and PowerPC Options.
>  -mamx-tile  -mamx-int8  -mamx-bf16 -muintr -mhreset -mavxvnni
>  -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd -mamx-fp16
>  -mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4
> +-mavx10.1 -mavx10.1-256 -mavx10.1-512
>  -mcldemote  -mms-bitfields  -mno-align-stringops  -minline-all-stringops
>  -minline-stringops-dynamically  -mstringop-strategy=@var{alg}
>  -mkl -mwidekl
> @@ -33670,6 +33671,15 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
>  @need 200
>  @opindex msm4
>  @itemx -msm4
> +@need 200
> +@opindex mavx10.1
> +@itemx -mavx10.1
> +@need 200
> +@opindex mavx10.1-256
> +@itemx -mavx10.1-256
> +@need 200
> +@opindex mavx10.1-512
> +@itemx -mavx10.1-512
>  These switches enable the use of instructions in the MMX, SSE,
>  AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA,
>  AES, PCLMUL, CLFLUSHOPT, CLWB, FSGSBASE, PTWRITE, RDRND, F16C, FMA, PCONFIG,
> @@ -33680,9 +33690,9 @@ GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16,
>  ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE,
>  UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512-FP16,
>  AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AMX-FP16, PREFETCHI, RAOINT,
> -AMX-COMPLEX, AVXVNNIINT16, SM3, SHA512, SM4 or CLDEMOTE extended instruction
> -sets. Each has a corresponding @option{-mno-} option to disable use of these
> -instructions.
> +AMX-COMPLEX, AVXVNNIINT16, SM3, SHA512, SM4, AVX10.1 or CLDEMOTE extended
> +instruction sets. Each has a corresponding @option{-mno-} option to disable
> +use of these instructions.
>
>  These extensions are also available as built-in functions: see
>  @ref{x86 Built-in Functions}, for details of the functions enabled and
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 1a78b3c1abb..cab8065cd8e 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2484,6 +2484,15 @@ Target supports compiling @code{avx} instructions.
>  @item avx_runtime
>  Target supports the execution of @code{avx} instructions.
>
> +@item avx10.1
> +Target supports the execution of @code{avx10.1} instructions.
> +
> +@item avx10.1-256
> +Target supports the execution of @code{avx10.1} instructions.
> +
> +@item avx10.1-512
> +Target supports the execution of @code{avx10.1-512} instructions.
> +
>  @item avx2
>  Target supports compiling @code{avx2} instructions.
>
> diff --git a/gcc/testsuite/g++.target/i386/mv33.C b/gcc/testsuite/g++.target/i386/mv33.C
> new file mode 100644
> index 00000000000..b50f13c5aa8
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/mv33.C
> @@ -0,0 +1,30 @@
> +// Test that dispatching can choose the right multiversion
> +// for avx10.x-512 microarchitecture levels.
> +
> +// { dg-do run }
> +// { dg-require-ifunc "" }
> +// { dg-options "-O2" }
> +
> +#include <assert.h>
> +
> +int __attribute__ ((target("default")))
> +foo ()
> +{
> +  return 0;
> +}
> +
> +int __attribute__ ((target("avx10.1-512"))) foo () {
> +  return 1;
> +}
> +
> +int main ()
> +{
> +  int val = foo ();
> +
> +  if  (__builtin_cpu_supports ("avx10.1-512"))
> +    assert (val == 1);
> +  else
> +    assert (val == 0);
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-1.c
> new file mode 100644
> index 00000000000..cfd9662bb13
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -march=x86-64 -mavx10.1" } */
> +
> +#include <immintrin.h>
> +
> +void
> +f1 ()
> +{
> +  register __m256d a __asm ("ymm17");
> +  register __m256d b __asm ("ymm16");
> +  a = _mm256_add_pd (a, b);
> +  asm volatile ("" : "+v" (a));
> +}
> +
> +void
> +f2 ()
> +{
> +  register __m128d a __asm ("xmm17");
> +  register __m128d b __asm ("xmm16");
> +  a = _mm_add_pd (a, b);
> +  asm volatile ("" : "+v" (a));
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-10.c b/gcc/testsuite/gcc.target/i386/avx10_1-10.c
> new file mode 100644
> index 00000000000..9a5892d8df9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-10.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64" } */
> +/* { dg-final { scan-assembler "%zmm" } } */
> +
> +typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
> +
> +__attribute__ ((target ("avx10.1-512"))) __m512d
> +foo ()
> +{
> +  __m512d a, b;
> +  a = a + b;
> +  return a;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-2.c b/gcc/testsuite/gcc.target/i386/avx10_1-2.c
> new file mode 100644
> index 00000000000..0b3991dcf74
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-2.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64 -mavx10.1-512" } */
> +/* { dg-final { scan-assembler "%zmm" } } */
> +
> +typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
> +
> +__m512d
> +foo ()
> +{
> +  __m512d a, b;
> +  a = a + b;
> +  return a;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-3.c b/gcc/testsuite/gcc.target/i386/avx10_1-3.c
> new file mode 100644
> index 00000000000..3be988a1a62
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-3.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=x86-64 -mavx10.1" } */
> +
> +#include <immintrin.h>
> +
> +int
> +foo (int c)
> +{
> +  register int a __asm ("k7") = c;
> +  int b = foo (a);
> +  asm volatile ("" : "+k" (b));
> +  return b;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-4.c b/gcc/testsuite/gcc.target/i386/avx10_1-4.c
> new file mode 100644
> index 00000000000..68cbf197d61
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-4.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=x86-64 -mavx10.1-512" } */
> +
> +#include <immintrin.h>
> +
> +long long
> +foo (long long c)
> +{
> +  register long long a __asm ("k7") = c;
> +  long long b = foo (a);
> +  asm volatile ("" : "+k" (b));
> +  return b;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-5.c b/gcc/testsuite/gcc.target/i386/avx10_1-5.c
> new file mode 100644
> index 00000000000..5481ab2f386
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-5.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O0 -march=x86-64 -mavx10.1 -Wno-psabi" } */
> +/* { dg-final { scan-assembler-not ".%zmm" } } */
> +
> +typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
> +
> +__m512d
> +foo ()
> +{
> +  __m512d a, b;
> +  a = a + b;
> +  return a;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-6.c b/gcc/testsuite/gcc.target/i386/avx10_1-6.c
> new file mode 100644
> index 00000000000..827c80ce51e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-6.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=x86-64 -mavx10.1" } */
> +
> +#include <immintrin.h>
> +
> +long long
> +foo (long long c)
> +{
> +  register long long a __asm ("k7") = c;
> +  long long b = foo (a);
> +  asm volatile ("" : "+k" (b)); /* { dg-error "inconsistent operand constraints in an 'asm'" } */
> +  return b;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-7.c b/gcc/testsuite/gcc.target/i386/avx10_1-7.c
> new file mode 100644
> index 00000000000..d8b8d97590b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-7.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64 -Wno-psabi" } */
> +/* { dg-final { scan-assembler-not ".%zmm" } } */
> +
> +typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
> +
> +__attribute__ ((target ("avx10.1"))) __m512d
> +foo ()
> +{
> +  __m512d a, b;
> +  a = a + b;
> +  return a;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-8.c b/gcc/testsuite/gcc.target/i386/avx10_1-8.c
> new file mode 100644
> index 00000000000..8dbd201b336
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-8.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -march=x86-64 -mavx10.1-256" } */
> +
> +#include "avx10_1-1.c"
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-9.c b/gcc/testsuite/gcc.target/i386/avx10_1-9.c
> new file mode 100644
> index 00000000000..00493098be7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-9.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64 -Wno-psabi" } */
> +/* { dg-final { scan-assembler-not ".%zmm" } } */
> +
> +typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
> +
> +__attribute__ ((target ("avx10.1-256"))) __m512d
> +foo ()
> +{
> +  __m512d a, b;
> +  a = a + b;
> +  return a;
> +}
> --
> 2.31.1
>


-- 
BR,
Hongtao


* [PATCH 2/3] Emit a warning when disabling AVX512 with AVX10 enabled or disabling AVX10 with AVX512 enabled
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
  2023-08-08  7:13 ` [PATCH 1/3] Initial support for AVX10.1 Haochen Jiang
@ 2023-08-08  7:13 ` Haochen Jiang
  2023-08-16  2:30   ` Hongtao Liu
  2023-08-08  7:13 ` [PATCH 3/3] Emit a warning when AVX10 options conflict in vector width Haochen Jiang
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 88+ messages in thread
From: Haochen Jiang @ 2023-08-08  7:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

gcc/ChangeLog:

	* config/i386/driver-i386.cc (host_detect_local_cpu):
	Do not append -mno-avx10.1 for -march=native.
	* common/config/i386/i386-common.cc
	(ix86_check_avx10): New function to check isa_flags and
	isa_flags_explicit to emit warning when AVX10 is enabled
	by "-m" option.
	(ix86_check_avx512): New function to check isa_flags and
	isa_flags_explicit to emit warning when AVX512 is enabled
	by "-m" option.
	(ix86_handle_option): Do not change the flags when warning
	is emitted.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_1-11.c: New test.
	* gcc.target/i386/avx10_1-12.c: Ditto.
	* gcc.target/i386/avx10_1-13.c: Ditto.
	* gcc.target/i386/avx10_1-14.c: Ditto.
---
 gcc/common/config/i386/i386-common.cc      | 68 +++++++++++++++++-----
 gcc/config/i386/driver-i386.cc             |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-11.c |  5 ++
 gcc/testsuite/gcc.target/i386/avx10_1-12.c | 13 +++++
 gcc/testsuite/gcc.target/i386/avx10_1-13.c |  5 ++
 gcc/testsuite/gcc.target/i386/avx10_1-14.c | 13 +++++
 6 files changed, 91 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-14.c
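
For readers skimming the diff below, here is a rough, standalone model of the
guard this patch adds. It is only a sketch under stated assumptions: the mask
names, the single flag word and the helper names (isa_state, check_avx10,
handle_avx512f, enable_avx10_1) are hypothetical simplifications and are not
the GCC data structures. It shows the one idea that matters: a negative AVX512
option is skipped, with a warning, when the AVX10.1 bit is set in both the
enabled flags and the explicit flags.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for OPTION_MASK_ISA_AVX512F and
   OPTION_MASK_ISA2_AVX10_1.  The values are made up for this sketch.  */
#define MASK_AVX512F (1u << 0)
#define MASK_AVX10_1 (1u << 1)

/* Simplified model of the option state: one flag word instead of the
   separate isa_flags / isa_flags2 words used in the patch.  */
struct isa_state
{
  unsigned flags;     /* ISAs currently enabled.  */
  unsigned explicit_; /* ISAs the user touched with an explicit option.  */
};

/* Same shape as ix86_check_avx10: warn and return false when AVX10.1 is
   both enabled and explicitly requested, otherwise return true.  */
static bool
check_avx10 (const struct isa_state *s)
{
  if (s->flags & s->explicit_ & MASK_AVX10_1)
    {
      fprintf (stderr, "warning: -mno-avx512f is ignored with -mavx10.1\n");
      return false;
    }
  return true;
}

/* Model of the -mavx512f / -mno-avx512f case: the negative form leaves
   the state untouched whenever check_avx10 has warned.  */
void
handle_avx512f (struct isa_state *s, bool value)
{
  if (value)
    {
      s->flags |= MASK_AVX512F;
      s->explicit_ |= MASK_AVX512F;
    }
  else if (check_avx10 (s))
    {
      s->flags &= ~MASK_AVX512F;
      s->explicit_ |= MASK_AVX512F;
    }
}

/* Model of -mavx10.1: mark the AVX10.1 bit as enabled and explicit.  */
void
enable_avx10_1 (struct isa_state *s)
{
  s->flags |= MASK_AVX10_1;
  s->explicit_ |= MASK_AVX10_1;
}
```

In this model, calling handle_avx512f (&s, false) after enable_avx10_1 (&s)
leaves s.flags unchanged and prints the warning, which is the behaviour that
avx10_1-11.c expects for "-mavx10.1 -mno-avx512f". The reverse direction
(ix86_check_avx512 guarding -mno-avx10.1) follows the same pattern with the
roles of the masks swapped.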

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 6c3bebb1846..ec94251dd4c 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -388,6 +388,46 @@ set_malign_value (const char **flag, unsigned value)
   *flag = r;
 }
 
+/* Emit a warning when using -mno-avx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,
+   vnni,ifma,bitalg,vpopcntdq} with -mavx10.1 and above.  */
+static bool
+ix86_check_avx10 (struct gcc_options *opts)
+{
+  if (opts->x_ix86_isa_flags2 & opts->x_ix86_isa_flags2_explicit
+      & OPTION_MASK_ISA2_AVX10_1)
+    {
+      warning (0, "%<-mno-avx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,"
+	       "bitalg,vpopcntdq}%> are ignored with %<-mavx10.1%> and above");
+      return false;
+    }
+
+  return true;
+}
+
+/* Emit a warning when using -mno-avx10.1 with -mavx512{f,vl,bw,dq,cd,bf16,
+   fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq}.  */
+static bool
+ix86_check_avx512 (struct gcc_options *opts)
+{
+  if ((opts->x_ix86_isa_flags & opts->x_ix86_isa_flags_explicit
+       & (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD
+	  | OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512BW
+	  | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512IFMA
+	  | OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VBMI2
+	  | OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VPOPCNTDQ
+	  | OPTION_MASK_ISA_AVX512BITALG))
+      || (opts->x_ix86_isa_flags2 & opts->x_ix86_isa_flags2_explicit
+	  & (OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_AVX512BF16)))
+    {
+      warning (0, "%<-mno-avx10.1%> is ignored when using with "
+	       "%<-mavx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,"
+	       "ifma,bitalg,vpopcntdq}%>");
+      return false;
+    }
+
+  return true;
+}
+
 /* Implement TARGET_HANDLE_OPTION.  */
 
 bool
@@ -609,7 +649,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
@@ -624,7 +664,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512CD_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512CD_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512CD_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512CD_UNSET;
@@ -898,7 +938,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VBMI2_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI2_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VBMI2_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI2_UNSET;
@@ -913,7 +953,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512FP16_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512FP16_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512FP16_UNSET;
 	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_UNSET;
@@ -926,7 +966,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VNNI_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VNNI_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VNNI_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VNNI_UNSET;
@@ -940,7 +980,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags_explicit
 	    |= OPTION_MASK_ISA_AVX512VPOPCNTDQ_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET;
 	  opts->x_ix86_isa_flags_explicit
@@ -954,7 +994,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512BITALG_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512BITALG_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512BITALG_UNSET;
 	  opts->x_ix86_isa_flags_explicit
@@ -970,7 +1010,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512BW_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512BW_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512BF16_UNSET;
 	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512BF16_UNSET;
@@ -1037,7 +1077,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512DQ_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512DQ_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512DQ_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512DQ_UNSET;
@@ -1050,7 +1090,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512BW_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512BW_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512BW_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512BW_UNSET;
@@ -1065,7 +1105,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VL_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VL_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VL_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VL_UNSET;
@@ -1078,7 +1118,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512IFMA_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512IFMA_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512IFMA_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512IFMA_UNSET;
@@ -1091,7 +1131,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VBMI_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI_SET;
 	}
-      else
+      else if (ix86_check_avx10 (opts))
 	{
 	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VBMI_UNSET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI_UNSET;
@@ -1367,7 +1407,7 @@ ix86_handle_option (struct gcc_options *opts,
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET;
 	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET;
 	}
-      else
+      else if (ix86_check_avx512 (opts))
 	{
 	  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_1_UNSET;
 	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_UNSET;
diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
index 08d0aed6183..227ace6ff83 100644
--- a/gcc/config/i386/driver-i386.cc
+++ b/gcc/config/i386/driver-i386.cc
@@ -854,7 +854,7 @@ const char *host_detect_local_cpu (int argc, const char **argv)
 		  options = concat (options, " ",
 				    isa_names_table[i].option, NULL);
 	      }
-	    else
+	    else if (isa_names_table[i].feature != FEATURE_AVX10_1)
 	      options = concat (options, neg_option,
 				isa_names_table[i].option + 2, NULL);
 	  }
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-11.c b/gcc/testsuite/gcc.target/i386/avx10_1-11.c
new file mode 100644
index 00000000000..10c8d781dd9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-11.c
@@ -0,0 +1,5 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=x86-64 -mavx10.1 -mno-avx512f" } */
+/* { dg-warning "'-mno-avx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq}' are ignored with '-mavx10.1' and above" "" { target *-*-* } 0 } */
+
+#include "avx10_1-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-12.c b/gcc/testsuite/gcc.target/i386/avx10_1-12.c
new file mode 100644
index 00000000000..b79c92ad002
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-12.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+#include <immintrin.h>
+
+__attribute__ ((target ("avx10.1,no-avx512f"))) void
+f1 ()
+{ /* { dg-warning "'-mno-avx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq}' are ignored with '-mavx10.1' and above" } */
+  register __m256d a __asm ("ymm17");
+  register __m256d b __asm ("ymm16");
+  a = _mm256_add_pd (a, b);
+  asm volatile ("" : "+v" (a));
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-13.c b/gcc/testsuite/gcc.target/i386/avx10_1-13.c
new file mode 100644
index 00000000000..156d59f1d35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-13.c
@@ -0,0 +1,5 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=x86-64 -mavx512f -mno-avx10.1" } */
+/* { dg-warning "'-mno-avx10.1' is ignored when using with '-mavx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq}'" "" { target *-*-* } 0 } */
+
+#include "avx10_1-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-14.c b/gcc/testsuite/gcc.target/i386/avx10_1-14.c
new file mode 100644
index 00000000000..23d2ba8bc64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-14.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64" } */
+/* { dg-final { scan-assembler "%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__attribute__ ((target ("avx512f,no-avx10.1"))) __m512d
+foo ()
+{ /* { dg-warning "'-mno-avx10.1' is ignored when using with '-mavx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq}'" } */
+  __m512d a, b;
+  a = a + b;
+  return a;
+}
-- 
2.31.1



* Re: [PATCH 2/3] Emit a warning when disabling AVX512 with AVX10 enabled or disabling AVX10 with AVX512 enabled
  2023-08-08  7:13 ` [PATCH 2/3] Emit a warning when disabling AVX512 with AVX10 enabled or disabling AVX10 with AVX512 enabled Haochen Jiang
@ 2023-08-16  2:30   ` Hongtao Liu
  0 siblings, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-16  2:30 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: gcc-patches, ubizjak, hongtao.liu

On Tue, Aug 8, 2023 at 3:15 PM Haochen Jiang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/driver-i386.cc (host_detect_local_cpu):
>         Do not append -mno-avx10.1 for -march=native.
>         * common/config/i386/i386-common.cc
>         (ix86_check_avx10): New function to check isa_flags and
>         isa_flags_explicit to emit warning when AVX10 is enabled
>         by "-m" option.
>         (ix86_check_avx512): New function to check isa_flags and
>         isa_flags_explicit to emit warning when AVX512 is enabled
>         by "-m" option.
>         (ix86_handle_option): Do not change the flags when warning
>         is emitted.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx10_1-11.c: New test.
>         * gcc.target/i386/avx10_1-12.c: Ditto.
>         * gcc.target/i386/avx10_1-13.c: Ditto.
>         * gcc.target/i386/avx10_1-14.c: Ditto.
OK. Please wait another 24 hours before committing, in case there are objections.
> ---
>  gcc/common/config/i386/i386-common.cc      | 68 +++++++++++++++++-----
>  gcc/config/i386/driver-i386.cc             |  2 +-
>  gcc/testsuite/gcc.target/i386/avx10_1-11.c |  5 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-12.c | 13 +++++
>  gcc/testsuite/gcc.target/i386/avx10_1-13.c |  5 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-14.c | 13 +++++
>  6 files changed, 91 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-11.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-12.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-13.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-14.c
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index 6c3bebb1846..ec94251dd4c 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -388,6 +388,46 @@ set_malign_value (const char **flag, unsigned value)
>    *flag = r;
>  }
>
> +/* Emit a warning when using -mno-avx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,
> +   vnni,ifma,bitalg,vpopcntdq} with -mavx10.1 and above.  */
> +static bool
> +ix86_check_avx10 (struct gcc_options *opts)
> +{
> +  if (opts->x_ix86_isa_flags2 & opts->x_ix86_isa_flags2_explicit
> +      & OPTION_MASK_ISA2_AVX10_1)
> +    {
> +      warning (0, "%<-mno-avx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,"
> +              "bitalg,vpopcntdq}%> are ignored with %<-mavx10.1%> and above");
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Emit a warning when using -mno-avx10.1 with -mavx512{f,vl,bw,dq,cd,bf16,
> +   fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq}.  */
> +static bool
> +ix86_check_avx512 (struct gcc_options *opts)
> +{
> +  if ((opts->x_ix86_isa_flags & opts->x_ix86_isa_flags_explicit
> +       & (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD
> +         | OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512BW
> +         | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512IFMA
> +         | OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VBMI2
> +         | OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VPOPCNTDQ
> +         | OPTION_MASK_ISA_AVX512BITALG))
> +      || (opts->x_ix86_isa_flags2 & opts->x_ix86_isa_flags2_explicit
> +         & (OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_AVX512BF16)))
> +    {
> +      warning (0, "%<-mno-avx10.1%> is ignored when using with "
> +              "%<-mavx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,"
> +              "ifma,bitalg,vpopcntdq}%>");
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
>  /* Implement TARGET_HANDLE_OPTION.  */
>
>  bool
> @@ -609,7 +649,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
> @@ -624,7 +664,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512CD_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512CD_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512CD_UNSET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512CD_UNSET;
> @@ -898,7 +938,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VBMI2_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI2_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VBMI2_UNSET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI2_UNSET;
> @@ -913,7 +953,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512FP16_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512FP16_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512FP16_UNSET;
>           opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512FP16_UNSET;
> @@ -926,7 +966,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VNNI_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VNNI_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VNNI_UNSET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VNNI_UNSET;
> @@ -940,7 +980,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags_explicit
>             |= OPTION_MASK_ISA_AVX512VPOPCNTDQ_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET;
>           opts->x_ix86_isa_flags_explicit
> @@ -954,7 +994,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512BITALG_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512BITALG_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512BITALG_UNSET;
>           opts->x_ix86_isa_flags_explicit
> @@ -970,7 +1010,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512BW_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512BW_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX512BF16_UNSET;
>           opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX512BF16_UNSET;
> @@ -1037,7 +1077,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512DQ_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512DQ_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512DQ_UNSET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512DQ_UNSET;
> @@ -1050,7 +1090,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512BW_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512BW_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512BW_UNSET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512BW_UNSET;
> @@ -1065,7 +1105,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VL_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VL_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VL_UNSET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VL_UNSET;
> @@ -1078,7 +1118,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512IFMA_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512IFMA_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512IFMA_UNSET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512IFMA_UNSET;
> @@ -1091,7 +1131,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VBMI_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI_SET;
>         }
> -      else
> +      else if (ix86_check_avx10 (opts))
>         {
>           opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VBMI_UNSET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI_UNSET;
> @@ -1367,7 +1407,7 @@ ix86_handle_option (struct gcc_options *opts,
>           opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET;
>           opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET;
>         }
> -      else
> +      else if (ix86_check_avx512 (opts))
>         {
>           opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_1_UNSET;
>           opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_UNSET;
> diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
> index 08d0aed6183..227ace6ff83 100644
> --- a/gcc/config/i386/driver-i386.cc
> +++ b/gcc/config/i386/driver-i386.cc
> @@ -854,7 +854,7 @@ const char *host_detect_local_cpu (int argc, const char **argv)
>                   options = concat (options, " ",
>                                     isa_names_table[i].option, NULL);
>               }
> -           else
> +           else if (isa_names_table[i].feature != FEATURE_AVX10_1)
>               options = concat (options, neg_option,
>                                 isa_names_table[i].option + 2, NULL);
>           }
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-11.c b/gcc/testsuite/gcc.target/i386/avx10_1-11.c
> new file mode 100644
> index 00000000000..10c8d781dd9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-11.c
> @@ -0,0 +1,5 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -march=x86-64 -mavx10.1 -mno-avx512f" } */
> +/* { dg-warning "'-mno-avx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq}' are ignored with '-mavx10.1' and above" "" { target *-*-* } 0 } */
> +
> +#include "avx10_1-1.c"
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-12.c b/gcc/testsuite/gcc.target/i386/avx10_1-12.c
> new file mode 100644
> index 00000000000..b79c92ad002
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-12.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2" } */
> +
> +#include <immintrin.h>
> +
> +__attribute__ ((target ("avx10.1,no-avx512f"))) void
> +f1 ()
> +{ /* { dg-warning "'-mno-avx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq}' are ignored with '-mavx10.1' and above" } */
> +  register __m256d a __asm ("ymm17");
> +  register __m256d b __asm ("ymm16");
> +  a = _mm256_add_pd (a, b);
> +  asm volatile ("" : "+v" (a));
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-13.c b/gcc/testsuite/gcc.target/i386/avx10_1-13.c
> new file mode 100644
> index 00000000000..156d59f1d35
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-13.c
> @@ -0,0 +1,5 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -march=x86-64 -mavx512f -mno-avx10.1" } */
> +/* { dg-warning "'-mno-avx10.1' is ignored when using with '-mavx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq}'" "" { target *-*-* } 0 } */
> +
> +#include "avx10_1-2.c"
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-14.c b/gcc/testsuite/gcc.target/i386/avx10_1-14.c
> new file mode 100644
> index 00000000000..23d2ba8bc64
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-14.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64" } */
> +/* { dg-final { scan-assembler "%zmm" } } */
> +
> +typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
> +
> +__attribute__ ((target ("avx512f,no-avx10.1"))) __m512d
> +foo ()
> +{ /* { dg-warning "'-mno-avx10.1' is ignored when using with '-mavx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq}'" } */
> +  __m512d a, b;
> +  a = a + b;
> +  return a;
> +}
> --
> 2.31.1
>


-- 
BR,
Hongtao


* [PATCH 3/3] Emit a warning when AVX10 options conflict in vector width
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
  2023-08-08  7:13 ` [PATCH 1/3] Initial support for AVX10.1 Haochen Jiang
  2023-08-08  7:13 ` [PATCH 2/3] Emit a warning when disabling AVX512 with AVX10 enabled or disabling AVX10 with AVX512 enabled Haochen Jiang
@ 2023-08-08  7:13 ` Haochen Jiang
  2023-08-16  2:30   ` Hongtao Liu
  2023-08-08  7:19 ` [PATCH 1/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins Haochen Jiang
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 88+ messages in thread
From: Haochen Jiang @ 2023-08-08  7:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

gcc/ChangeLog:

	* config/i386/driver-i386.cc (host_detect_local_cpu):
	Do not append -mno-avx10-max-512bit for -march=native.
	* common/config/i386/i386-common.cc
	(ix86_check_avx10_vector_width): New function to check isa_flags
	to emit a warning when there is a conflict in AVX10 options for
	vector width.
	(ix86_handle_option): Add check for avx10.1-256 and avx10.1-512.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_1-15.c: New test.
	* gcc.target/i386/avx10_1-16.c: Ditto.
	* gcc.target/i386/avx10_1-17.c: Ditto.
	* gcc.target/i386/avx10_1-18.c: Ditto.
---
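Note: the width check added in i386-common.cc below can be modelled on its own.
The following is a minimal sketch, not the GCC code. AVX10_512BIT is a
hypothetical stand-in for OPTION_MASK_ISA2_AVX10_512BIT (its bit position is
made up), struct avx10_flags and apply_avx10_width are illustrative names, and
the sketch assumes both options record the 512-bit flag as explicit, since
those lines are cut off by the hunk context.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for OPTION_MASK_ISA2_AVX10_512BIT.  */
#define AVX10_512BIT (UINT64_C (1) << 5)

/* Illustrative subset of struct gcc_options.  */
struct avx10_flags
{
  uint64_t isa2;          /* models x_ix86_isa_flags2 */
  uint64_t isa2_explicit; /* models x_ix86_isa_flags2_explicit */
};

/* Model of the conflict test.  Returns the width that wins (256 or 512)
   when the new option contradicts an earlier explicit choice, 0 when
   there is nothing to warn about.  */
static int
check_width_conflict (const struct avx10_flags *f, bool want_512)
{
  bool bit_set = (f->isa2 & AVX10_512BIT) != 0;
  bool bit_explicit = (f->isa2_explicit & AVX10_512BIT) != 0;

  if (want_512)
    /* Same test as (isa2 | ~MASK) == ~MASK: the bit is currently clear,
       yet some earlier option touched it explicitly.  */
    return (!bit_set && bit_explicit) ? 512 : 0;

  /* Asking for 256 while 512 was explicitly enabled earlier.  */
  return (bit_set && bit_explicit) ? 256 : 0;
}

/* Model of handling -mavx10.1-256 / -mavx10.1-512: check first, then
   update the flags.  Marking the bit explicit in both cases is an
   assumption of this sketch.  */
static int
apply_avx10_width (struct avx10_flags *f, bool want_512)
{
  int conflict = check_width_conflict (f, want_512);

  if (want_512)
    f->isa2 |= AVX10_512BIT;
  else
    f->isa2 &= ~AVX10_512BIT;
  f->isa2_explicit |= AVX10_512BIT;
  return conflict;
}
```

Starting from zeroed flags, applying 512 and then 256 reports 256 on the second
call, and applying 512 again afterwards reports 512. This matches the order
dependence that avx10_1-15.c through avx10_1-18.c test.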
 gcc/common/config/i386/i386-common.cc      | 20 ++++++++++++++++++++
 gcc/config/i386/driver-i386.cc             |  3 ++-
 gcc/config/i386/i386-options.cc            |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-15.c |  5 +++++
 gcc/testsuite/gcc.target/i386/avx10_1-16.c |  5 +++++
 gcc/testsuite/gcc.target/i386/avx10_1-17.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/avx10_1-18.c | 13 +++++++++++++
 7 files changed, 59 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-18.c
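
Note: the driver-i386.cc change can be illustrated in isolation. This is a
rough sketch of how the -march=native option string skips the negative form
for the two AVX10 features. The table contents, the detected bitmask and
build_native_options are illustrative assumptions; the real
host_detect_local_cpu uses concat and checks more conditions than shown here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative subset of the feature enumeration.  */
enum feature
{
  FEATURE_AVX2,
  FEATURE_AVX512F,
  FEATURE_AVX10_1,
  FEATURE_AVX10_512BIT
};

struct isa_name
{
  enum feature feature;
  const char *option;
};

/* Illustrative subset of isa_names_table.  */
static const struct isa_name isa_names[] = {
  { FEATURE_AVX2, "-mavx2" },
  { FEATURE_AVX512F, "-mavx512f" },
  { FEATURE_AVX10_1, "-mavx10.1" },
  { FEATURE_AVX10_512BIT, "-mavx10-max-512bit" },
};

/* Build the option string for a host whose detected features are given
   as a bitmask indexed by enum feature.  Features that are absent get
   "-mno-<name>", except the two AVX10 entries, which are left out.  */
static void
build_native_options (unsigned detected, char *buf, size_t size)
{
  const char *neg_option = " -mno-";

  buf[0] = '\0';
  for (size_t i = 0; i < sizeof isa_names / sizeof isa_names[0]; i++)
    {
      const struct isa_name *e = &isa_names[i];
      size_t len = strlen (buf);

      if (detected & (1u << e->feature))
        snprintf (buf + len, size - len, " %s", e->option);
      else if (e->feature != FEATURE_AVX10_1
               && e->feature != FEATURE_AVX10_512BIT)
        /* option + 2 skips the leading "-m".  */
        snprintf (buf + len, size - len, "%s%s", neg_option, e->option + 2);
    }
}
```

For a host with only AVX2 detected, the sketch yields " -mavx2 -mno-avx512f",
with no -mno-avx10.1 and no -mno-avx10-max-512bit appended.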

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index ec94251dd4c..db88befc9b8 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -428,6 +428,24 @@ ix86_check_avx512 (struct gcc_options *opts)
   return true;
 }
 
+/* Emit a warning when AVX10 options conflict in vector width.  */
+static void
+ix86_check_avx10_vector_width (struct gcc_options *opts, bool avx10_max_512)
+{
+  if (avx10_max_512)
+    {
+      if (((opts->x_ix86_isa_flags2 | ~OPTION_MASK_ISA2_AVX10_512BIT)
+	   == ~OPTION_MASK_ISA2_AVX10_512BIT)
+	  && (opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_AVX10_512BIT))
+	warning (0, "The options used for AVX10 have conflict vector width, "
+		 "using the latter 512 as vector width");
+    }
+  else if (opts->x_ix86_isa_flags2 & opts->x_ix86_isa_flags2_explicit
+	   & OPTION_MASK_ISA2_AVX10_512BIT)
+    warning (0, "The options used for AVX10 have conflict vector width, "
+	     "using the latter 256 as vector width");
+}
+
 /* Implement TARGET_HANDLE_OPTION.  */
 
 bool
@@ -1415,6 +1433,7 @@ ix86_handle_option (struct gcc_options *opts,
       return true;
 
     case OPT_mavx10_1_256:
+      ix86_check_avx10_vector_width (opts, false);
       opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
       opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
       opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_512BIT_SET;
@@ -1424,6 +1443,7 @@ ix86_handle_option (struct gcc_options *opts,
       return true;
 
     case OPT_mavx10_1_512:
+      ix86_check_avx10_vector_width (opts, true);
       opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
       opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
       opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
index 227ace6ff83..f4551a74e3a 100644
--- a/gcc/config/i386/driver-i386.cc
+++ b/gcc/config/i386/driver-i386.cc
@@ -854,7 +854,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
 		  options = concat (options, " ",
 				    isa_names_table[i].option, NULL);
 	      }
-	    else if (isa_names_table[i].feature != FEATURE_AVX10_1)
+	    else if ((isa_names_table[i].feature != FEATURE_AVX10_1)
+		     && (isa_names_table[i].feature != FEATURE_AVX10_512BIT))
 	      options = concat (options, neg_option,
 				isa_names_table[i].option + 2, NULL);
 	  }
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index b2281fbd4b5..8f9b825b527 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -985,7 +985,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     ix86_opt_ix86_no,
     ix86_opt_str,
     ix86_opt_enum,
-    ix86_opt_isa,
+    ix86_opt_isa
   };
 
   static const struct
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-15.c b/gcc/testsuite/gcc.target/i386/avx10_1-15.c
new file mode 100644
index 00000000000..fd873c9694c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-15.c
@@ -0,0 +1,5 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=x86-64 -mavx10.1-512 -mavx10.1-256" } */
+/* { dg-warning "The options used for AVX10 have conflict vector width, using the latter 256 as vector width" "" { target *-*-* } 0 } */
+
+#include "avx10_1-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-16.c b/gcc/testsuite/gcc.target/i386/avx10_1-16.c
new file mode 100644
index 00000000000..1e664ebd1f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-16.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64 -mavx10.1-256 -mavx10.1-512" } */
+/* { dg-warning "The options used for AVX10 have conflict vector width, using the latter 512 as vector width" "" { target *-*-* } 0 } */
+
+#include "avx10_1-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-17.c b/gcc/testsuite/gcc.target/i386/avx10_1-17.c
new file mode 100644
index 00000000000..7dfff3aeeac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-17.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+#include <immintrin.h>
+
+__attribute__ ((target ("avx10.1-512,avx10.1-256"))) void
+f1 ()
+{ /* { dg-warning "The options used for AVX10 have conflict vector width, using the latter 256 as vector width" } */
+  register __m256d a __asm ("ymm17");
+  register __m256d b __asm ("ymm16");
+  a = _mm256_add_pd (a, b);
+  asm volatile ("" : "+v" (a));
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-18.c b/gcc/testsuite/gcc.target/i386/avx10_1-18.c
new file mode 100644
index 00000000000..955cca185fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-18.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64" } */
+/* { dg-final { scan-assembler "%zmm" } } */
+
+typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
+
+__attribute__ ((target ("avx10.1-256,avx10.1-512"))) __m512d
+foo ()
+{ /* { dg-warning "The options used for AVX10 have conflict vector width, using the latter 512 as vector width" } */
+  __m512d a, b;
+  a = a + b;
+  return a;
+}
-- 
2.31.1



* Re: [PATCH 3/3] Emit a warning when AVX10 options conflict in vector width
  2023-08-08  7:13 ` [PATCH 3/3] Emit a warning when AVX10 options conflict in vector width Haochen Jiang
@ 2023-08-16  2:30   ` Hongtao Liu
  0 siblings, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-16  2:30 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: gcc-patches, ubizjak, hongtao.liu

On Tue, Aug 8, 2023 at 3:13 PM Haochen Jiang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> gcc/ChangeLog:
>
>         * config/i386/driver-i386.cc (host_detect_local_cpu):
>         Do not append -mno-avx10-max-512bit for -march=native.
>         * common/config/i386/i386-common.cc
>         (ix86_check_avx10_vector_width): New function to check isa_flags
>         to emit a warning when there is a conflict in AVX10 options for
>         vector width.
>         (ix86_handle_option): Add check for avx10.1-256 and avx10.1-512.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx10_1-15.c: New test.
>         * gcc.target/i386/avx10_1-16.c: Ditto.
>         * gcc.target/i386/avx10_1-17.c: Ditto.
>         * gcc.target/i386/avx10_1-18.c: Ditto.
> ---
OK (please wait an extra 24 hours before committing, in case anyone objects).
>  gcc/common/config/i386/i386-common.cc      | 20 ++++++++++++++++++++
>  gcc/config/i386/driver-i386.cc             |  3 ++-
>  gcc/config/i386/i386-options.cc            |  2 +-
>  gcc/testsuite/gcc.target/i386/avx10_1-15.c |  5 +++++
>  gcc/testsuite/gcc.target/i386/avx10_1-16.c |  5 +++++
>  gcc/testsuite/gcc.target/i386/avx10_1-17.c | 13 +++++++++++++
>  gcc/testsuite/gcc.target/i386/avx10_1-18.c | 13 +++++++++++++
>  7 files changed, 59 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-15.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-16.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-17.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-18.c
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index ec94251dd4c..db88befc9b8 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -428,6 +428,24 @@ ix86_check_avx512 (struct gcc_options *opts)
>    return true;
>  }
>
> +/* Emit a warning when AVX10 options conflict in vector width.  */
> +static void
> +ix86_check_avx10_vector_width (struct gcc_options *opts, bool avx10_max_512)
> +{
> +  if (avx10_max_512)
> +    {
> +      if (((opts->x_ix86_isa_flags2 | ~OPTION_MASK_ISA2_AVX10_512BIT)
> +          == ~OPTION_MASK_ISA2_AVX10_512BIT)
> +         && (opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_AVX10_512BIT))
> +       warning (0, "The options used for AVX10 have conflict vector width, "
> +                "using the latter 512 as vector width");
> +    }
> +  else if (opts->x_ix86_isa_flags2 & opts->x_ix86_isa_flags2_explicit
> +          & OPTION_MASK_ISA2_AVX10_512BIT)
> +    warning (0, "The options used for AVX10 have conflict vector width, "
> +            "using the latter 256 as vector width");
> +}
> +
>  /* Implement TARGET_HANDLE_OPTION.  */
>
>  bool
> @@ -1415,6 +1433,7 @@ ix86_handle_option (struct gcc_options *opts,
>        return true;
>
>      case OPT_mavx10_1_256:
> +      ix86_check_avx10_vector_width (opts, false);
>        opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
>        opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
>        opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_512BIT_SET;
> @@ -1424,6 +1443,7 @@ ix86_handle_option (struct gcc_options *opts,
>        return true;
>
>      case OPT_mavx10_1_512:
> +      ix86_check_avx10_vector_width (opts, true);
>        opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
>        opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
>        opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
> diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
> index 227ace6ff83..f4551a74e3a 100644
> --- a/gcc/config/i386/driver-i386.cc
> +++ b/gcc/config/i386/driver-i386.cc
> @@ -854,7 +854,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
>                   options = concat (options, " ",
>                                     isa_names_table[i].option, NULL);
>               }
> -           else if (isa_names_table[i].feature != FEATURE_AVX10_1)
> +           else if ((isa_names_table[i].feature != FEATURE_AVX10_1)
> +                    && (isa_names_table[i].feature != FEATURE_AVX10_512BIT))
>               options = concat (options, neg_option,
>                                 isa_names_table[i].option + 2, NULL);
>           }
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index b2281fbd4b5..8f9b825b527 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -985,7 +985,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
>      ix86_opt_ix86_no,
>      ix86_opt_str,
>      ix86_opt_enum,
> -    ix86_opt_isa,
> +    ix86_opt_isa
>    };
>
>    static const struct
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-15.c b/gcc/testsuite/gcc.target/i386/avx10_1-15.c
> new file mode 100644
> index 00000000000..fd873c9694c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-15.c
> @@ -0,0 +1,5 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -march=x86-64 -mavx10.1-512 -mavx10.1-256" } */
> +/* { dg-warning "The options used for AVX10 have conflict vector width, using the latter 256 as vector width" "" { target *-*-* } 0 } */
> +
> +#include "avx10_1-1.c"
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-16.c b/gcc/testsuite/gcc.target/i386/avx10_1-16.c
> new file mode 100644
> index 00000000000..1e664ebd1f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-16.c
> @@ -0,0 +1,5 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=x86-64 -mavx10.1-256 -mavx10.1-512" } */
> +/* { dg-warning "The options used for AVX10 have conflict vector width, using the latter 512 as vector width" "" { target *-*-* } 0 } */
> +
> +#include "avx10_1-2.c"
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-17.c b/gcc/testsuite/gcc.target/i386/avx10_1-17.c
> new file mode 100644
> index 00000000000..7dfff3aeeac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-17.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2" } */
> +
> +#include <immintrin.h>
> +
> +__attribute__ ((target ("avx10.1-512,avx10.1-256"))) void
> +f1 ()
> +{ /* { dg-warning "The options used for AVX10 have conflict vector width, using the latter 256 as vector width" } */
> +  register __m256d a __asm ("ymm17");
> +  register __m256d b __asm ("ymm16");
> +  a = _mm256_add_pd (a, b);
> +  asm volatile ("" : "+v" (a));
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-18.c b/gcc/testsuite/gcc.target/i386/avx10_1-18.c
> new file mode 100644
> index 00000000000..955cca185fd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-18.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64" } */
> +/* { dg-final { scan-assembler "%zmm" } } */
> +
> +typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));
> +
> +__attribute__ ((target ("avx10.1-256,avx10.1-512"))) __m512d
> +foo ()
> +{ /* { dg-warning "The options used for AVX10 have conflict vector width, using the latter 512 as vector width" } */
> +  __m512d a, b;
> +  a = a + b;
> +  return a;
> +}
> --
> 2.31.1
>


-- 
BR,
Hongtao


* [PATCH 1/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
                   ` (2 preceding siblings ...)
  2023-08-08  7:13 ` [PATCH 3/3] Emit a warning when AVX10 options conflict in vector width Haochen Jiang
@ 2023-08-08  7:19 ` Haochen Jiang
  2023-08-08  7:20 ` [PATCH 2/6] " Haochen Jiang
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 88+ messages in thread
From: Haochen Jiang @ 2023-08-08  7:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

gcc/ChangeLog:

	* config/i386/avx512vldqintrin.h: Remove target attribute.
	* config/i386/i386-builtin.def (BDESC):
	Add OPTION_MASK_ISA2_AVX10_1.
	* config/i386/i386-builtins.cc (def_builtin): Handle AVX10_1.
	* config/i386/i386-expand.cc
	(ix86_check_builtin_isa_match): Ditto.
	(ix86_expand_sse2_mulvxdi3): Add TARGET_AVX10_1.
	* config/i386/i386.md: Add new isa attribute avx10_1_or_avx512dq
	and avx10_1_or_avx512vl.
	* config/i386/sse.md (VF2_AVX512VLDQ_AVX10_1): New.
	(VF1_128_256VLDQ_AVX10_1): Ditto.
	(VI8_AVX512VLDQ_AVX10_1): Ditto.
	(<sse>_andnot<mode>3<mask_name>):
	Add TARGET_AVX10_1 and change isa attr from avx512dq to
	avx10_1_or_avx512dq.
	(*andnot<mode>3): Add TARGET_AVX10_1 and change isa attr from
	avx512vl to avx10_1_or_avx512vl.
	(fix<fixunssuffix>_trunc<mode><sseintvecmodelower>2<mask_name><round_saeonly_name>):
	Change iterator to VF2_AVX512VLDQ_AVX10_1. Remove target check.
	(fix_notrunc<mode><sseintvecmodelower>2<mask_name><round_name>):
	Ditto.
	(ufix_notrunc<mode><sseintvecmodelower>2<mask_name><round_name>):
	Ditto.
	(fix<fixunssuffix>_trunc<mode><sselongvecmodelower>2<mask_name><round_saeonly_name>):
	Change iterator to VF1_128_256VLDQ_AVX10_1. Remove target check.
	(avx512dq_fix<fixunssuffix>_truncv2sfv2di2<mask_name>):
	Add TARGET_AVX10_1.
	(fix<fixunssuffix>_truncv2sfv2di2): Ditto.
	(cond_mul<mode>): Change iterator to VI8_AVX10_1_AVX512DQVL.
	Remove target check.
	(avx512dq_mul<mode>3<mask_name>): Ditto.
	(*avx512dq_mul<mode>3<mask_name>): Ditto.
	(VI4F_BRCST32x2): Add TARGET_AVX512DQ and TARGET_AVX10_1.
	(<mask_codefor>avx512dq_broadcast<mode><mask_name>):
	Remove target check.
	(VI8F_BRCST64x2): Add TARGET_AVX512DQ and TARGET_AVX10_1.
	(<mask_codefor>avx512dq_broadcast<mode><mask_name>_1):
	Remove target check.
	* config/i386/subst.md (mask_mode512bit_condition): Add TARGET_AVX10_1.
	(mask_avx512vl_condition): Ditto.
	(mask): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add -mavx10.1.
	* gcc.target/i386/avx-2.c: Ditto.
	* gcc.target/i386/sse-26.c: Skip AVX512VLDQ intrin file.
---
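Note: the BDESC changes below put OPTION_MASK_ISA2_AVX10_1 in the second mask
field next to AVX512DQ | AVX512VL in the first. The sketch below models one
reading of that, suggested by the new isa attribute names avx10_1_or_avx512dq
and avx10_1_or_avx512vl: the builtin is usable when either requirement is met.
This is an assumption for illustration only. The real
ix86_check_builtin_isa_match is not shown in this excerpt and may treat other
mask combinations differently. All mask values and builtin_enabled are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real OPTION_MASK_* constants differ.  */
#define ISA_AVX512DQ (UINT64_C (1) << 0)
#define ISA_AVX512VL (UINT64_C (1) << 1)
#define ISA2_AVX10_1 (UINT64_C (1) << 0)

/* bisa/bisa2 are the masks a builtin asks for (the first two BDESC
   fields); isa/isa2 are the masks enabled for the current function.
   Either-or model: all bits of bisa enabled, or all bits of bisa2.  */
static bool
builtin_enabled (uint64_t bisa, uint64_t bisa2, uint64_t isa, uint64_t isa2)
{
  bool isa_ok = bisa != 0 && (isa & bisa) == bisa;
  bool isa2_ok = bisa2 != 0 && (isa2 & bisa2) == bisa2;

  return isa_ok || isa2_ok;
}
```

Under this model, a builtin described with AVX512DQ | AVX512VL and AVX10_1 is
accepted with -mavx512dq -mavx512vl, or with -mavx10.1 alone, but not with
-mavx512dq alone.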
 gcc/config/i386/avx512vldqintrin.h     | 12 ++--
 gcc/config/i386/i386-builtin.def       | 46 ++++++------
 gcc/config/i386/i386-builtins.cc       |  9 +--
 gcc/config/i386/i386-expand.cc         |  8 ++-
 gcc/config/i386/i386.md                |  7 +-
 gcc/config/i386/sse.md                 | 97 ++++++++++++++++----------
 gcc/config/i386/subst.md               |  7 +-
 gcc/testsuite/gcc.target/i386/avx-1.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx-2.c  |  2 +-
 gcc/testsuite/gcc.target/i386/sse-26.c |  6 ++
 10 files changed, 117 insertions(+), 79 deletions(-)

diff --git a/gcc/config/i386/avx512vldqintrin.h b/gcc/config/i386/avx512vldqintrin.h
index be4d59c34e4..4b8006f7b73 100644
--- a/gcc/config/i386/avx512vldqintrin.h
+++ b/gcc/config/i386/avx512vldqintrin.h
@@ -28,12 +28,6 @@
 #ifndef _AVX512VLDQINTRIN_H_INCLUDED
 #define _AVX512VLDQINTRIN_H_INCLUDED
 
-#if !defined(__AVX512VL__) || !defined(__AVX512DQ__)
-#pragma GCC push_options
-#pragma GCC target("avx512vl,avx512dq")
-#define __DISABLE_AVX512VLDQ__
-#endif /* __AVX512VLDQ__ */
-
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cvttpd_epi64 (__m256d __A)
@@ -679,6 +673,12 @@ _mm_maskz_andnot_ps (__mmask8 __U, __m128 __A, __m128 __B)
 						 (__mmask8) __U);
 }
 
+#if !defined(__AVX512VL__) || !defined(__AVX512DQ__)
+#pragma GCC push_options
+#pragma GCC target("avx512vl,avx512dq")
+#define __DISABLE_AVX512VLDQ__
+#endif /* __AVX512VLDQ__ */
+
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cvtps_epi64 (__m128 __A)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 8738b3b6a8a..18d8966f0de 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1718,31 +1718,31 @@ BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_iorv4df3
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_iorv2df3_mask, "__builtin_ia32_orpd128_mask", IX86_BUILTIN_ORPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_iorv8sf3_mask, "__builtin_ia32_orps256_mask", IX86_BUILTIN_ORPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_iorv4sf3_mask, "__builtin_ia32_orps128_mask", IX86_BUILTIN_ORPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_broadcastv8sf_mask, "__builtin_ia32_broadcastf32x2_256_mask", IX86_BUILTIN_BROADCASTF32x2_256, UNKNOWN, (int) V8SF_FTYPE_V4SF_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_broadcastv8si_mask, "__builtin_ia32_broadcasti32x2_256_mask", IX86_BUILTIN_BROADCASTI32x2_256, UNKNOWN, (int) V8SI_FTYPE_V4SI_V8SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_broadcastv4si_mask, "__builtin_ia32_broadcasti32x2_128_mask", IX86_BUILTIN_BROADCASTI32x2_128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_broadcastv4df_mask_1, "__builtin_ia32_broadcastf64x2_256_mask", IX86_BUILTIN_BROADCASTF64X2_256, UNKNOWN, (int) V4DF_FTYPE_V2DF_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_broadcastv4di_mask_1, "__builtin_ia32_broadcasti64x2_256_mask", IX86_BUILTIN_BROADCASTI64X2_256, UNKNOWN, (int) V4DI_FTYPE_V2DI_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_broadcastv8sf_mask, "__builtin_ia32_broadcastf32x2_256_mask", IX86_BUILTIN_BROADCASTF32x2_256, UNKNOWN, (int) V8SF_FTYPE_V4SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_broadcastv8si_mask, "__builtin_ia32_broadcasti32x2_256_mask", IX86_BUILTIN_BROADCASTI32x2_256, UNKNOWN, (int) V8SI_FTYPE_V4SI_V8SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_broadcastv4si_mask, "__builtin_ia32_broadcasti32x2_128_mask", IX86_BUILTIN_BROADCASTI32x2_128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_broadcastv4df_mask_1, "__builtin_ia32_broadcastf64x2_256_mask", IX86_BUILTIN_BROADCASTF64X2_256, UNKNOWN, (int) V4DF_FTYPE_V2DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_broadcastv4di_mask_1, "__builtin_ia32_broadcasti64x2_256_mask", IX86_BUILTIN_BROADCASTI64X2_256, UNKNOWN, (int) V4DI_FTYPE_V2DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_broadcastv8sf_mask_1, "__builtin_ia32_broadcastf32x4_256_mask", IX86_BUILTIN_BROADCASTF32X4_256, UNKNOWN, (int) V8SF_FTYPE_V4SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_broadcastv8si_mask_1, "__builtin_ia32_broadcasti32x4_256_mask", IX86_BUILTIN_BROADCASTI32X4_256, UNKNOWN, (int) V8SI_FTYPE_V4SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8sf, "__builtin_ia32_extractf32x4_256_mask", IX86_BUILTIN_EXTRACTF32X4_256, UNKNOWN, (int) V4SF_FTYPE_V8SF_INT_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8si, "__builtin_ia32_extracti32x4_256_mask", IX86_BUILTIN_EXTRACTI32X4_256, UNKNOWN, (int) V4SI_FTYPE_V8SI_INT_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512bw_dbpsadbwv16hi_mask, "__builtin_ia32_dbpsadbw256_mask", IX86_BUILTIN_DBPSADBW256, UNKNOWN, (int) V16HI_FTYPE_V32QI_V32QI_INT_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512bw_dbpsadbwv8hi_mask, "__builtin_ia32_dbpsadbw128_mask", IX86_BUILTIN_DBPSADBW128, UNKNOWN, (int) V8HI_FTYPE_V16QI_V16QI_INT_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fix_truncv4dfv4di2_mask, "__builtin_ia32_cvttpd2qq256_mask", IX86_BUILTIN_CVTTPD2QQ256, UNKNOWN, (int) V4DI_FTYPE_V4DF_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fix_truncv2dfv2di2_mask, "__builtin_ia32_cvttpd2qq128_mask", IX86_BUILTIN_CVTTPD2QQ128, UNKNOWN, (int) V2DI_FTYPE_V2DF_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fixuns_truncv4dfv4di2_mask, "__builtin_ia32_cvttpd2uqq256_mask", IX86_BUILTIN_CVTTPD2UQQ256, UNKNOWN, (int) V4DI_FTYPE_V4DF_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fixuns_truncv2dfv2di2_mask, "__builtin_ia32_cvttpd2uqq128_mask", IX86_BUILTIN_CVTTPD2UQQ128, UNKNOWN, (int) V2DI_FTYPE_V2DF_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fix_notruncv4dfv4di2_mask, "__builtin_ia32_cvtpd2qq256_mask", IX86_BUILTIN_CVTPD2QQ256, UNKNOWN, (int) V4DI_FTYPE_V4DF_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fix_notruncv2dfv2di2_mask, "__builtin_ia32_cvtpd2qq128_mask", IX86_BUILTIN_CVTPD2QQ128, UNKNOWN, (int) V2DI_FTYPE_V2DF_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fixuns_notruncv4dfv4di2_mask, "__builtin_ia32_cvtpd2uqq256_mask", IX86_BUILTIN_CVTPD2UQQ256, UNKNOWN, (int) V4DI_FTYPE_V4DF_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fixuns_notruncv2dfv2di2_mask, "__builtin_ia32_cvtpd2uqq128_mask", IX86_BUILTIN_CVTPD2UQQ128, UNKNOWN, (int) V2DI_FTYPE_V2DF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_fix_truncv4dfv4di2_mask, "__builtin_ia32_cvttpd2qq256_mask", IX86_BUILTIN_CVTTPD2QQ256, UNKNOWN, (int) V4DI_FTYPE_V4DF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_fix_truncv2dfv2di2_mask, "__builtin_ia32_cvttpd2qq128_mask", IX86_BUILTIN_CVTTPD2QQ128, UNKNOWN, (int) V2DI_FTYPE_V2DF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_fixuns_truncv4dfv4di2_mask, "__builtin_ia32_cvttpd2uqq256_mask", IX86_BUILTIN_CVTTPD2UQQ256, UNKNOWN, (int) V4DI_FTYPE_V4DF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_fixuns_truncv2dfv2di2_mask, "__builtin_ia32_cvttpd2uqq128_mask", IX86_BUILTIN_CVTTPD2UQQ128, UNKNOWN, (int) V2DI_FTYPE_V2DF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_fix_notruncv4dfv4di2_mask, "__builtin_ia32_cvtpd2qq256_mask", IX86_BUILTIN_CVTPD2QQ256, UNKNOWN, (int) V4DI_FTYPE_V4DF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_fix_notruncv2dfv2di2_mask, "__builtin_ia32_cvtpd2qq128_mask", IX86_BUILTIN_CVTPD2QQ128, UNKNOWN, (int) V2DI_FTYPE_V2DF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_fixuns_notruncv4dfv4di2_mask, "__builtin_ia32_cvtpd2uqq256_mask", IX86_BUILTIN_CVTPD2UQQ256, UNKNOWN, (int) V4DI_FTYPE_V4DF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_fixuns_notruncv2dfv2di2_mask, "__builtin_ia32_cvtpd2uqq128_mask", IX86_BUILTIN_CVTPD2UQQ128, UNKNOWN, (int) V2DI_FTYPE_V2DF_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fixuns_notruncv4dfv4si2_mask, "__builtin_ia32_cvtpd2udq256_mask", IX86_BUILTIN_CVTPD2UDQ256_MASK, UNKNOWN, (int) V4SI_FTYPE_V4DF_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fixuns_notruncv2dfv2si2_mask, "__builtin_ia32_cvtpd2udq128_mask", IX86_BUILTIN_CVTPD2UDQ128_MASK, UNKNOWN, (int) V4SI_FTYPE_V2DF_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fix_truncv4sfv4di2_mask, "__builtin_ia32_cvttps2qq256_mask", IX86_BUILTIN_CVTTPS2QQ256, UNKNOWN, (int) V4DI_FTYPE_V4SF_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_fix_truncv2sfv2di2_mask, "__builtin_ia32_cvttps2qq128_mask", IX86_BUILTIN_CVTTPS2QQ128, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fixuns_truncv4sfv4di2_mask, "__builtin_ia32_cvttps2uqq256_mask", IX86_BUILTIN_CVTTPS2UQQ256, UNKNOWN, (int) V4DI_FTYPE_V4SF_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_fixuns_truncv2sfv2di2_mask, "__builtin_ia32_cvttps2uqq128_mask", IX86_BUILTIN_CVTTPS2UQQ128, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_fix_truncv4sfv4di2_mask, "__builtin_ia32_cvttps2qq256_mask", IX86_BUILTIN_CVTTPS2QQ256, UNKNOWN, (int) V4DI_FTYPE_V4SF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_fix_truncv2sfv2di2_mask, "__builtin_ia32_cvttps2qq128_mask", IX86_BUILTIN_CVTTPS2QQ128, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_fixuns_truncv4sfv4di2_mask, "__builtin_ia32_cvttps2uqq256_mask", IX86_BUILTIN_CVTTPS2UQQ256, UNKNOWN, (int) V4DI_FTYPE_V4SF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_fixuns_truncv2sfv2di2_mask, "__builtin_ia32_cvttps2uqq128_mask", IX86_BUILTIN_CVTTPS2UQQ128, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fix_truncv8sfv8si2_mask, "__builtin_ia32_cvttps2dq256_mask", IX86_BUILTIN_CVTTPS2DQ256_MASK, UNKNOWN, (int) V8SI_FTYPE_V8SF_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fix_truncv4sfv4si2_mask, "__builtin_ia32_cvttps2dq128_mask", IX86_BUILTIN_CVTTPS2DQ128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_fixuns_truncv8sfv8si2_mask, "__builtin_ia32_cvttps2udq256_mask", IX86_BUILTIN_CVTTPS2UDQ256, UNKNOWN, (int) V8SI_FTYPE_V8SF_V8SI_UQI)
@@ -1936,16 +1936,16 @@ BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_smulv16h
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_smulv8hi3_highpart_mask, "__builtin_ia32_pmulhw128_mask", IX86_BUILTIN_PMULHW128_MASK, UNKNOWN,(int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_mulv16hi3_mask, "__builtin_ia32_pmullw256_mask"  , IX86_BUILTIN_PMULLW256_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_mulv8hi3_mask, "__builtin_ia32_pmullw128_mask", IX86_BUILTIN_PMULLW128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_mulv4di3_mask, "__builtin_ia32_pmullq256_mask", IX86_BUILTIN_PMULLQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_mulv2di3_mask, "__builtin_ia32_pmullq128_mask", IX86_BUILTIN_PMULLQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_mulv4di3_mask, "__builtin_ia32_pmullq256_mask", IX86_BUILTIN_PMULLQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_mulv2di3_mask, "__builtin_ia32_pmullq128_mask", IX86_BUILTIN_PMULLQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_andv4df3_mask, "__builtin_ia32_andpd256_mask", IX86_BUILTIN_ANDPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_andv2df3_mask, "__builtin_ia32_andpd128_mask", IX86_BUILTIN_ANDPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_andv8sf3_mask, "__builtin_ia32_andps256_mask", IX86_BUILTIN_ANDPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_andv4sf3_mask, "__builtin_ia32_andps128_mask", IX86_BUILTIN_ANDPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx_andnotv4df3_mask, "__builtin_ia32_andnpd256_mask", IX86_BUILTIN_ANDNPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_sse2_andnotv2df3_mask, "__builtin_ia32_andnpd128_mask", IX86_BUILTIN_ANDNPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx_andnotv8sf3_mask, "__builtin_ia32_andnps256_mask", IX86_BUILTIN_ANDNPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_sse_andnotv4sf3_mask, "__builtin_ia32_andnps128_mask", IX86_BUILTIN_ANDNPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx_andnotv4df3_mask, "__builtin_ia32_andnpd256_mask", IX86_BUILTIN_ANDNPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_sse2_andnotv2df3_mask, "__builtin_ia32_andnpd128_mask", IX86_BUILTIN_ANDNPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx_andnotv8sf3_mask, "__builtin_ia32_andnps256_mask", IX86_BUILTIN_ANDNPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_sse_andnotv4sf3_mask, "__builtin_ia32_andnps128_mask", IX86_BUILTIN_ANDNPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_ashlv8hi3_mask, "__builtin_ia32_psllwi128_mask", IX86_BUILTIN_PSLLWI128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_INT_V8HI_UQI_COUNT)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_ashlv4si3_mask, "__builtin_ia32_pslldi128_mask", IX86_BUILTIN_PSLLDI128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_INT_V4SI_UQI_COUNT)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_ashlv2di3_mask, "__builtin_ia32_psllqi128_mask", IX86_BUILTIN_PSLLQI128_MASK, UNKNOWN, (int) V2DI_FTYPE_V2DI_INT_V2DI_UQI_COUNT)
diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index 356b6dfd5fb..06667629d51 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -278,15 +278,16 @@ def_builtin (HOST_WIDE_INT mask, HOST_WIDE_INT mask2,
       if (((mask2 == 0 || (mask2 & ix86_isa_flags2) != 0)
 	   && (mask == 0 || (mask & ix86_isa_flags) != 0))
 	  || ((mask & OPTION_MASK_ISA_MMX) != 0 && TARGET_MMX_WITH_SSE)
-	  /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES intrinsics
-	     or AVX512VNNIVL/AVX512IFMAVL/VAESVL non-mask intrinsics should be
-	     defined whenever avxvnni/avxifma/aes or avx512vnni/avx512ifma/vaes
-	     && avx512vl exist.  */
+	  /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES/AVX10.1
+	     intrinsics or AVX512VNNIVL/AVX512IFMAVL/VAESVL/- non-mask
+	     intrinsics should be defined whenever avxvnni/avxifma/aes/avx10.1 or
+	     avx512vnni/avx512ifma/vaes/- && avx512vl exist.  */
 	  || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
 	  || (mask2 == OPTION_MASK_ISA2_AVXIFMA)
 	  || (mask2 == (OPTION_MASK_ISA2_AVXNECONVERT
 			| OPTION_MASK_ISA2_AVX512BF16))
 	  || ((mask2 & OPTION_MASK_ISA2_VAES) != 0)
+	  || ((mask2 & OPTION_MASK_ISA2_AVX10_1) != 0)
 	  || (lang_hooks.builtin_function
 	      == lang_hooks.builtin_function_ext_scope))
 	{
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index e9dc0bc2e9d..85e30552d6f 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -12718,6 +12718,8 @@ ix86_check_builtin_isa_match (unsigned int fcode,
 		 OPTION_MASK_ISA2_AVXNECONVERT);
   SHARE_BUILTIN (OPTION_MASK_ISA_AES, 0, OPTION_MASK_ISA_AVX512VL,
 		 OPTION_MASK_ISA2_VAES);
+  SHARE_BUILTIN (OPTION_MASK_ISA_AVX512VL, 0, 0, OPTION_MASK_ISA2_AVX10_1);
+  SHARE_BUILTIN (OPTION_MASK_ISA_AVX512DQ, 0, 0, OPTION_MASK_ISA2_AVX10_1);
   isa = tmp_isa;
   isa2 = tmp_isa2;
 
@@ -23949,9 +23951,11 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
 
   if (TARGET_AVX512DQ && mode == V8DImode)
     emit_insn (gen_avx512dq_mulv8di3 (op0, op1, op2));
-  else if (TARGET_AVX512DQ && TARGET_AVX512VL && mode == V4DImode)
+  else if (((TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1)
+	   && mode == V4DImode)
     emit_insn (gen_avx512dq_mulv4di3 (op0, op1, op2));
-  else if (TARGET_AVX512DQ && TARGET_AVX512VL && mode == V2DImode)
+  else if (((TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1)
+	   && mode == V2DImode)
     emit_insn (gen_avx512dq_mulv2di3 (op0, op1, op2));
   else if (TARGET_XOP && mode == V2DImode)
     {
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c906d75b13e..765dd0d1115 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -538,7 +538,8 @@
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
 		    avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
 		    avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16,avxifma,
-		    avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl"
+		    avx512ifmavl,avxneconvert,avx512bf16vl,vpclmulqdqvl,
+		    avx10_1_or_avx512dq,avx10_1_or_avx512vl"
   (const_string "base"))
 
 ;; The (bounding maximum) length of an instruction immediate.
@@ -919,6 +920,10 @@
 	   (symbol_ref "TARGET_AVX512BF16 && TARGET_AVX512VL")
 	 (eq_attr "isa" "vpclmulqdqvl")
 	   (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
+	 (eq_attr "isa" "avx10_1_or_avx512dq")
+	   (symbol_ref "TARGET_AVX512DQ || TARGET_AVX10_1")
+	 (eq_attr "isa" "avx10_1_or_avx512vl")
+	   (symbol_ref "TARGET_AVX512VL || TARGET_AVX10_1")
 
 	 (eq_attr "mmx_isa" "native")
 	   (symbol_ref "!TARGET_MMX_WITH_SSE")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2c698af4664..5d19aaf380f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -388,6 +388,10 @@
 (define_mode_iterator VF1_128_256VL
   [V8SF (V4SF "TARGET_AVX512VL")])
 
+(define_mode_iterator VF1_128_256VLDQ_AVX10_1
+  [(V8SF "TARGET_AVX512DQ")
+   (V4SF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
+
 ;; All DFmode vector float modes
 (define_mode_iterator VF2
   [(V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF])
@@ -463,6 +467,11 @@
 (define_mode_iterator VF2_AVX512VL
   [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
+(define_mode_iterator VF2_AVX512VLDQ_AVX10_1
+  [(V8DF "TARGET_AVX512DQ")
+   (V4DF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V2DF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
+
 (define_mode_iterator VF1_AVX512VL
   [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
 
@@ -528,6 +537,11 @@
 (define_mode_iterator VI8_AVX512VL
   [V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
+(define_mode_iterator VI8_AVX512VLDQ_AVX10_1
+  [(V8DI "TARGET_AVX512DQ")
+   (V4DI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V2DI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
+
 (define_mode_iterator VI8_256_512
   [V8DI (V4DI "TARGET_AVX512VL")])
 
@@ -4774,13 +4788,13 @@
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512dq,avx512f")
+  [(set_attr "isa" "noavx,avx,avx10_1_or_avx512dq,avx512f")
    (set_attr "type" "sselog")
    (set_attr "prefix" "orig,maybe_vex,evex,evex")
    (set (attr "mode")
 	(cond [(and (match_test "<mask_applied>")
 		    (and (eq_attr "alternative" "1")
-			 (match_test "!TARGET_AVX512DQ")))
+			 (match_test "!(TARGET_AVX512DQ || TARGET_AVX10_1)")))
 		 (const_string "<sseintvecmode2>")
 	       (eq_attr "alternative" "3")
 		 (const_string "<sseintvecmode2>")
@@ -5031,7 +5045,7 @@
       ops = "vandn%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
       break;
     case 2:
-      if (TARGET_AVX512DQ)
+      if (TARGET_AVX512DQ || TARGET_AVX10_1)
 	ops = "vandn%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
       else
 	{
@@ -5056,12 +5070,12 @@
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+  [(set_attr "isa" "noavx,avx,avx10_1_or_avx512vl,avx512f")
    (set_attr "type" "sselog")
    (set_attr "prefix" "orig,vex,evex,evex")
    (set (attr "mode")
 	(cond [(eq_attr "alternative" "2")
-		 (if_then_else (match_test "TARGET_AVX512DQ")
+		 (if_then_else (match_test "TARGET_AVX512DQ || TARGET_AVX10_1")
 			       (const_string "<ssevecmode>")
 			       (const_string "TI"))
 	       (eq_attr "alternative" "3")
@@ -8870,8 +8884,8 @@
 (define_insn "fix<fixunssuffix>_trunc<mode><sseintvecmodelower>2<mask_name><round_saeonly_name>"
   [(set (match_operand:<sseintvecmode> 0 "register_operand" "=v")
 	(any_fix:<sseintvecmode>
-	  (match_operand:VF2_AVX512VL 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")))]
-  "TARGET_AVX512DQ && <round_saeonly_mode512bit_condition>"
+	  (match_operand:VF2_AVX512VLDQ_AVX10_1 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")))]
+  "<round_saeonly_mode512bit_condition>"
   "vcvttpd2<fixsuffix>qq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8880,9 +8894,9 @@
 (define_insn "fix_notrunc<mode><sseintvecmodelower>2<mask_name><round_name>"
   [(set (match_operand:<sseintvecmode> 0 "register_operand" "=v")
 	(unspec:<sseintvecmode>
-	  [(match_operand:VF2_AVX512VL 1 "<round_nimm_predicate>" "<round_constraint>")]
+	  [(match_operand:VF2_AVX512VLDQ_AVX10_1 1 "<round_nimm_predicate>" "<round_constraint>")]
 	  UNSPEC_FIX_NOTRUNC))]
-  "TARGET_AVX512DQ && <round_mode512bit_condition>"
+  "<round_mode512bit_condition>"
   "vcvtpd2qq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8891,9 +8905,9 @@
 (define_insn "fixuns_notrunc<mode><sseintvecmodelower>2<mask_name><round_name>"
   [(set (match_operand:<sseintvecmode> 0 "register_operand" "=v")
 	(unspec:<sseintvecmode>
-	  [(match_operand:VF2_AVX512VL 1 "nonimmediate_operand" "<round_constraint>")]
+	  [(match_operand:VF2_AVX512VLDQ_AVX10_1 1 "nonimmediate_operand" "<round_constraint>")]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
-  "TARGET_AVX512DQ && <round_mode512bit_condition>"
+  "<round_mode512bit_condition>"
   "vcvtpd2uqq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8902,8 +8916,8 @@
 (define_insn "fix<fixunssuffix>_trunc<mode><sselongvecmodelower>2<mask_name><round_saeonly_name>"
   [(set (match_operand:<sselongvecmode> 0 "register_operand" "=v")
 	(any_fix:<sselongvecmode>
-	  (match_operand:VF1_128_256VL 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")))]
-  "TARGET_AVX512DQ && <round_saeonly_modev8sf_condition>"
+	  (match_operand:VF1_128_256VLDQ_AVX10_1 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")))]
+  "<round_saeonly_modev8sf_condition>"
   "vcvttps2<fixsuffix>qq\t{<round_saeonly_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_saeonly_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8915,7 +8929,7 @@
 	  (vec_select:V2SF
 	    (match_operand:V4SF 1 "nonimmediate_operand" "vm")
 	    (parallel [(const_int 0) (const_int 1)]))))]
-  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1"
   "vcvttps2<fixsuffix>qq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8925,7 +8939,7 @@
   [(set (match_operand:V2DI 0 "register_operand")
 	(any_fix:V2DI
 	  (match_operand:V2SF 1 "register_operand")))]
-  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1"
 {
   rtx op1 = force_reg (V2SFmode, operands[1]);
   op1 = lowpart_subreg (V4SFmode, op1, V2SFmode);
@@ -15840,14 +15854,14 @@
    (set_attr "mode" "TI")])
 
 (define_expand "cond_mul<mode>"
-  [(set (match_operand:VI8_AVX512VL 0 "register_operand")
-	(vec_merge:VI8_AVX512VL
-	  (mult:VI8_AVX512VL
-	    (match_operand:VI8_AVX512VL 2 "vector_operand")
-	    (match_operand:VI8_AVX512VL 3 "vector_operand"))
-	  (match_operand:VI8_AVX512VL 4 "nonimm_or_0_operand")
+  [(set (match_operand:VI8_AVX512VLDQ_AVX10_1 0 "register_operand")
+	(vec_merge:VI8_AVX512VLDQ_AVX10_1
+	  (mult:VI8_AVX512VLDQ_AVX10_1
+	    (match_operand:VI8_AVX512VLDQ_AVX10_1 2 "vector_operand")
+	    (match_operand:VI8_AVX512VLDQ_AVX10_1 3 "vector_operand"))
+	  (match_operand:VI8_AVX512VLDQ_AVX10_1 4 "nonimm_or_0_operand")
 	  (match_operand:<avx512fmaskmode> 1 "register_operand")))]
-  "TARGET_AVX512DQ"
+  ""
 {
   emit_insn (gen_avx512dq_mul<mode>3_mask (operands[0],
 					   operands[2],
@@ -15858,19 +15872,19 @@
 })
 
 (define_expand "avx512dq_mul<mode>3<mask_name>"
-  [(set (match_operand:VI8_AVX512VL 0 "register_operand")
-	(mult:VI8_AVX512VL
-	  (match_operand:VI8_AVX512VL 1 "bcst_vector_operand")
-	  (match_operand:VI8_AVX512VL 2 "bcst_vector_operand")))]
-  "TARGET_AVX512DQ && <mask_mode512bit_condition>"
+  [(set (match_operand:VI8_AVX512VLDQ_AVX10_1 0 "register_operand")
+	(mult:VI8_AVX512VLDQ_AVX10_1
+	  (match_operand:VI8_AVX512VLDQ_AVX10_1 1 "bcst_vector_operand")
+	  (match_operand:VI8_AVX512VLDQ_AVX10_1 2 "bcst_vector_operand")))]
+  "<mask_mode512bit_condition>"
   "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
 
 (define_insn "*avx512dq_mul<mode>3<mask_name>"
-  [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v")
-	(mult:VI8_AVX512VL
-	  (match_operand:VI8_AVX512VL 1 "bcst_vector_operand" "%v")
-	  (match_operand:VI8_AVX512VL 2 "bcst_vector_operand" "vmBr")))]
-  "TARGET_AVX512DQ && <mask_mode512bit_condition>
+  [(set (match_operand:VI8_AVX512VLDQ_AVX10_1 0 "register_operand" "=v")
+	(mult:VI8_AVX512VLDQ_AVX10_1
+	  (match_operand:VI8_AVX512VLDQ_AVX10_1 1 "bcst_vector_operand" "%v")
+	  (match_operand:VI8_AVX512VLDQ_AVX10_1 2 "bcst_vector_operand" "vmBr")))]
+  "<mask_mode512bit_condition>
   && ix86_binary_operator_ok (MULT, <MODE>mode, operands)"
 {
   if (TARGET_DEST_FALSE_DEP_FOR_GLC
@@ -17506,7 +17520,8 @@
 		       ? "<ssemodesuffix>" : "");
 	  break;
 	default:
-	  ssesuffix = TARGET_AVX512VL && which_alternative == 2 ? "q" : "";
+	  ssesuffix = (TARGET_AVX512VL || TARGET_AVX10_1)
+		      && which_alternative == 2 ? "q" : "";
 	}
       break;
 
@@ -26826,8 +26841,11 @@
 
 ;; For broadcast[i|f]32x2.  Yes there is no v4sf version, only v4si.
 (define_mode_iterator VI4F_BRCST32x2
-  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
-   V16SF (V8SF "TARGET_AVX512VL")])
+  [(V16SI "TARGET_AVX512DQ")
+   (V8SI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V4SI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V16SF "TARGET_AVX512DQ")
+   (V8SF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
 
 (define_mode_attr 64x2mode
   [(V8DF "V2DF") (V8DI "V2DI") (V4DI "V2DI") (V4DF "V2DF")])
@@ -26842,7 +26860,7 @@
 	  (vec_select:<32x2mode>
 	    (match_operand:<ssexmmmode> 1 "nonimmediate_operand" "vm")
 	    (parallel [(const_int 0) (const_int 1)]))))]
-  "TARGET_AVX512DQ"
+  ""
   "vbroadcast<shuffletype>32x2\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
   [(set_attr "type" "ssemov")
    (set_attr "prefix_extra" "1")
@@ -26877,13 +26895,16 @@
 
 ;; For broadcast[i|f]64x2
 (define_mode_iterator VI8F_BRCST64x2
-  [V8DI V8DF (V4DI "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")])
+  [(V8DI "TARGET_AVX512DQ")
+   (V8DF "TARGET_AVX512DQ")
+   (V4DI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V4DF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
 
 (define_insn "<mask_codefor>avx512dq_broadcast<mode><mask_name>_1"
   [(set (match_operand:VI8F_BRCST64x2 0 "register_operand" "=v,v")
        (vec_duplicate:VI8F_BRCST64x2
          (match_operand:<64x2mode> 1 "nonimmediate_operand" "v,m")))]
-  "TARGET_AVX512DQ"
+  ""
   "@
    vshuf<shuffletype>64x2\t{$0x0, %<xtg_mode>1, %<xtg_mode>1, %0<mask_operand2>|%0<mask_operand2>, %<xtg_mode>1, %<xtg_mode>1, 0x0}
    vbroadcast<shuffletype>64x2\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index c5de75bb89d..59c4b395a9d 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -61,8 +61,9 @@
 (define_subst_attr "mask_operand19" "mask" "" "%{%20%}%N19")
 (define_subst_attr "mask_codefor" "mask" "*" "")
 (define_subst_attr "mask_operand_arg34" "mask" "" ", operands[3], operands[4]")
-(define_subst_attr "mask_mode512bit_condition" "mask" "1" "(<MODE_SIZE> == 64 || TARGET_AVX512VL)")
-(define_subst_attr "mask_avx512vl_condition" "mask" "1" "TARGET_AVX512VL")
+(define_subst_attr "mask_mode512bit_condition" "mask" "1" "(<MODE_SIZE> == 64 || TARGET_AVX512VL
+							    || TARGET_AVX10_1)")
+(define_subst_attr "mask_avx512vl_condition" "mask" "1" "(TARGET_AVX512VL || TARGET_AVX10_1)")
 (define_subst_attr "mask_avx512bw_condition" "mask" "1" "TARGET_AVX512BW")
 (define_subst_attr "mask_avx512dq_condition" "mask" "1" "TARGET_AVX512DQ")
 (define_subst_attr "mask_prefix" "mask" "vex" "evex")
@@ -81,7 +82,7 @@
 (define_subst "mask"
   [(set (match_operand:SUBST_V 0)
         (match_operand:SUBST_V 1))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F || TARGET_AVX10_1"
   [(set (match_dup 0)
         (vec_merge:SUBST_V
 	  (match_dup 1)
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index a6589deca84..bb72555939a 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16 -mavx512vl -mprefetchi" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -maes -mpclmul -mgfni -mavx512bw -mavx512fp16 -mavx512vl -mprefetchi -mavx10.1" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/avx-2.c b/gcc/testsuite/gcc.target/i386/avx-2.c
index 642ae4d7bfb..85fd7213d1b 100644
--- a/gcc/testsuite/gcc.target/i386/avx-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16 -mavx512vl" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mavx2 -msse4a -maes -mpclmul -mavx512bw -mavx512fp16 -mavx512vl -mavx10.1" } */
 /* { dg-add-options bind_pic_locally } */
 
 #include <mm_malloc.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-26.c b/gcc/testsuite/gcc.target/i386/sse-26.c
index 04ffe10f42a..89db33b8b8c 100644
--- a/gcc/testsuite/gcc.target/i386/sse-26.c
+++ b/gcc/testsuite/gcc.target/i386/sse-26.c
@@ -2,4 +2,10 @@
 /* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse2 -mmmx -mno-sse3 -mno-3dnow -mno-fma -mno-fxsr -mno-xsave -mno-rtm -mno-prfchw -mno-rdseed -mno-adx -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-vaes -mno-vpclmulqdq" } */
 /* { dg-add-options bind_pic_locally } */
 
+/* Skip the intrinsic headers whose target attribute was removed; without
+   that attribute GCC issues a "target option mismatch" error for their
+   intrinsics.  */
+
+#define _AVX512VLDQINTRIN_H_INCLUDED
+
 #include "sse-13.c"
-- 
2.31.1
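
A note for readers following the series: the conditions added to the new
*_AVX10_1 mode iterators in the sse.md hunks above all reduce to one
predicate. The following is a minimal sketch in plain C, not GCC code.
`struct isa_flags` and `dq_insn_enabled` are hypothetical names used only for
illustration, and the three booleans merely stand in for TARGET_AVX512DQ,
TARGET_AVX512VL and TARGET_AVX10_1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TARGET_AVX512DQ, TARGET_AVX512VL and
   TARGET_AVX10_1; these are not GCC's real option variables.  */
struct isa_flags
{
  bool avx512dq;
  bool avx512vl;
  bool avx10_1;
};

/* Mirror of the conditions in the new *_AVX10_1 mode iterators:
   512-bit modes need AVX512DQ, while 128-bit and 256-bit modes need
   either AVX512DQ together with AVX512VL, or AVX10.1.  */
static bool
dq_insn_enabled (struct isa_flags f, int vector_bits)
{
  assert (vector_bits == 128 || vector_bits == 256 || vector_bits == 512);

  if (vector_bits == 512)
    return f.avx512dq;
  return (f.avx512dq && f.avx512vl) || f.avx10_1;
}
```

Under this model, a target with only the AVX10.1 flag set enables the 128-bit
and 256-bit forms but not the 512-bit ones, and AVX512DQ without AVX512VL
enables only the 512-bit forms.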


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 2/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
                   ` (3 preceding siblings ...)
  2023-08-08  7:19 ` [PATCH 1/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins Haochen Jiang
@ 2023-08-08  7:20 ` Haochen Jiang
  2023-08-08  7:20 ` [PATCH 3/6] " Haochen Jiang
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 88+ messages in thread
From: Haochen Jiang @ 2023-08-08  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_1-vandnpd-1.c: New test.
	* gcc.target/i386/avx10_1-vandnps-1.c: Ditto.
	* gcc.target/i386/avx10_1-vbroadcastf32x2-1.c: Ditto.
	* gcc.target/i386/avx10_1-vbroadcastf64x2-1.c: Ditto.
	* gcc.target/i386/avx10_1-vbroadcasti32x2-1.c: Ditto.
	* gcc.target/i386/avx10_1-vbroadcasti64x2-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvtpd2qq-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvtpd2uqq-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvttpd2qq-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvttpd2uqq-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvttps2qq-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvttps2uqq-1.c: Ditto.
	* gcc.target/i386/avx10_1-vpmullq-1.c: Ditto.
---
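
For reviewers less familiar with the masked forms these tests scan for, here
is a scalar sketch of merge-masking (the `{%k}` patterns) versus zero-masking
(the `{%k}{z}` patterns) for a bitwise and-not on 64-bit lanes. The helper
names `mask_andnot_u64` and `maskz_andnot_u64` are hypothetical; they only
model the lane-wise behaviour of the masked and-not intrinsics and are not
how the intrinsics are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Merge-masking: lane I becomes (~A[I] & B[I]) when bit I of K is set,
   otherwise it keeps the value from SRC.  */
static void
mask_andnot_u64 (uint64_t *dst, const uint64_t *src, uint8_t k,
                 const uint64_t *a, const uint64_t *b, size_t lanes)
{
  assert (lanes <= 8);  /* K carries one bit per lane.  */
  for (size_t i = 0; i < lanes; i++)
    dst[i] = ((k >> i) & 1) ? (~a[i] & b[i]) : src[i];
}

/* Zero-masking: same computation, but unselected lanes are cleared.  */
static void
maskz_andnot_u64 (uint64_t *dst, uint8_t k,
                  const uint64_t *a, const uint64_t *b, size_t lanes)
{
  assert (lanes <= 8);  /* K carries one bit per lane.  */
  for (size_t i = 0; i < lanes; i++)
    dst[i] = ((k >> i) & 1) ? (~a[i] & b[i]) : 0;
}
```

With four lanes this corresponds to the ymm variants on double-precision data
and with two lanes to the xmm variants; the tests below only check that the
expected instruction forms are emitted, not these values.
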
 .../gcc.target/i386/avx10_1-vandnpd-1.c       | 21 +++++++++++++
 .../gcc.target/i386/avx10_1-vandnps-1.c       | 21 +++++++++++++
 .../i386/avx10_1-vbroadcastf32x2-1.c          | 19 ++++++++++++
 .../i386/avx10_1-vbroadcastf64x2-1.c          | 19 ++++++++++++
 .../i386/avx10_1-vbroadcasti32x2-1.c          | 25 ++++++++++++++++
 .../i386/avx10_1-vbroadcasti64x2-1.c          | 19 ++++++++++++
 .../gcc.target/i386/avx10_1-vcvtpd2qq-1.c     | 29 ++++++++++++++++++
 .../gcc.target/i386/avx10_1-vcvtpd2uqq-1.c    | 29 ++++++++++++++++++
 .../gcc.target/i386/avx10_1-vcvttpd2qq-1.c    | 30 +++++++++++++++++++
 .../gcc.target/i386/avx10_1-vcvttpd2uqq-1.c   | 29 ++++++++++++++++++
 .../gcc.target/i386/avx10_1-vcvttps2qq-1.c    | 27 +++++++++++++++++
 .../gcc.target/i386/avx10_1-vcvttps2uqq-1.c   | 26 ++++++++++++++++
 .../gcc.target/i386/avx10_1-vpmullq-1.c       | 24 +++++++++++++++
 13 files changed, 318 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vandnpd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vandnps-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vbroadcastf32x2-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vbroadcastf64x2-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vbroadcasti32x2-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vbroadcasti64x2-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvtpd2qq-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvtpd2uqq-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvttpd2qq-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvttpd2uqq-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvttps2qq-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvttps2uqq-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vpmullq-1.c

diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vandnpd-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vandnpd-1.c
new file mode 100644
index 00000000000..a9a8bd7ca8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vandnpd-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vandnpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandnpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandnpd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandnpd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d y;
+volatile __m128d x;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_mask_andnot_pd (y, m, y, y);
+  y = _mm256_maskz_andnot_pd (m, y, y);
+  x = _mm_mask_andnot_pd (x, m, x, x);
+  x = _mm_maskz_andnot_pd (m, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vandnps-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vandnps-1.c
new file mode 100644
index 00000000000..c33141021cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vandnps-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vandnps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandnps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandnps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandnps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256 y;
+volatile __m128 x;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_mask_andnot_ps (y, m, y, y);
+  y = _mm256_maskz_andnot_ps (m, y, y);
+  x = _mm_mask_andnot_ps (x, m, x, x);
+  x = _mm_maskz_andnot_ps (m, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcastf32x2-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcastf32x2-1.c
new file mode 100644
index 00000000000..b6d73714282
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcastf32x2-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcastf32x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcastf32x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcastf32x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256 x;
+volatile __m128 y;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  x = _mm256_broadcast_f32x2 (y);
+  x = _mm256_mask_broadcast_f32x2 (x, m, y);
+  x = _mm256_maskz_broadcast_f32x2 (m, y);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcastf64x2-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcastf64x2-1.c
new file mode 100644
index 00000000000..26a391552c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcastf64x2-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcastf64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\]|vshuff64x2\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcastf64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\[^\{\]|vshuff64x2\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcastf64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}|vshuff64x2\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d y;
+volatile __m128d x;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_broadcast_f64x2 (x);
+  y = _mm256_mask_broadcast_f64x2 (y, m, x);
+  y = _mm256_maskz_broadcast_f64x2 (m, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcasti32x2-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcasti32x2-1.c
new file mode 100644
index 00000000000..b26e2a3f33a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcasti32x2-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcasti32x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcasti32x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcasti32x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcasti32x2\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcasti32x2\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcasti32x2\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x;
+volatile __m128i y;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  x = _mm256_broadcast_i32x2 (y);
+  x = _mm256_mask_broadcast_i32x2 (x, m, y);
+  x = _mm256_maskz_broadcast_i32x2 (m, y);
+  y = _mm_broadcast_i32x2 (y);
+  y = _mm_mask_broadcast_i32x2 (y, m, y);
+  y = _mm_maskz_broadcast_i32x2 (m, y);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcasti64x2-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcasti64x2-1.c
new file mode 100644
index 00000000000..29e255a8724
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vbroadcasti64x2-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcasti64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\]|vshufi64x2\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcasti64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\[^\{\]|vshufi64x2\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcasti64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}|vshufi64x2\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i y;
+volatile __m128i x;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_broadcast_i64x2 (x);
+  y = _mm256_mask_broadcast_i64x2 (y, m, x);
+  y = _mm256_maskz_broadcast_i64x2 (m, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvtpd2qq-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtpd2qq-1.c
new file mode 100644
index 00000000000..ec213071f68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtpd2qq-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtpd2qq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2qq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2qq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d s1;
+volatile __m128d s2;
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  res1 = _mm256_cvtpd_epi64 (s1);
+  res2 = _mm_cvtpd_epi64 (s2);
+
+  res1 = _mm256_mask_cvtpd_epi64 (res1, m, s1);
+  res2 = _mm_mask_cvtpd_epi64 (res2, m, s2);
+
+  res1 = _mm256_maskz_cvtpd_epi64 (m, s1);
+  res2 = _mm_maskz_cvtpd_epi64 (m, s2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvtpd2uqq-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtpd2uqq-1.c
new file mode 100644
index 00000000000..d84e96860c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtpd2uqq-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtpd2uqq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2uqq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2uqq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtpd2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d s1;
+volatile __m128d s2;
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  res1 = _mm256_cvtpd_epu64 (s1);
+  res2 = _mm_cvtpd_epu64 (s2);
+
+  res1 = _mm256_mask_cvtpd_epu64 (res1, m, s1);
+  res2 = _mm_mask_cvtpd_epu64 (res2, m, s2);
+
+  res1 = _mm256_maskz_cvtpd_epu64 (m, s1);
+  res2 = _mm_maskz_cvtpd_epu64 (m, s2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvttpd2qq-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvttpd2qq-1.c
new file mode 100644
index 00000000000..a677176102f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvttpd2qq-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttpd2qq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vcvttpd2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vcvttpd2qq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttpd2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttpd2qq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttpd2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d s1;
+volatile __m128d s2;
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  res1 = _mm256_cvttpd_epi64 (s1);
+  res2 = _mm_cvttpd_epi64 (s2);
+
+  res1 = _mm256_mask_cvttpd_epi64 (res1, m, s1);
+  res2 = _mm_mask_cvttpd_epi64 (res2, m, s2);
+
+  res1 = _mm256_maskz_cvttpd_epi64 (m, s1);
+  res2 = _mm_maskz_cvttpd_epi64 (m, s2);
+
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvttpd2uqq-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvttpd2uqq-1.c
new file mode 100644
index 00000000000..d970b2ee633
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvttpd2uqq-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttpd2uqq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vcvttpd2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vcvttpd2uqq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttpd2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttpd2uqq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttpd2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d s1;
+volatile __m128d s2;
+volatile __m256i res1;
+volatile __m128i res2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  res1 = _mm256_cvttpd_epu64 (s1);
+  res2 = _mm_cvttpd_epu64 (s2);
+
+  res1 = _mm256_mask_cvttpd_epu64 (res1, m, s1);
+  res2 = _mm_mask_cvttpd_epu64 (res2, m, s2);
+
+  res1 = _mm256_maskz_cvttpd_epu64 (m, s1);
+  res2 = _mm_maskz_cvttpd_epu64 (m, s2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvttps2qq-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvttps2qq-1.c
new file mode 100644
index 00000000000..95610023b3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvttps2qq-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x1;
+volatile __m128i x2;
+volatile __m256 z1;
+volatile __m128 z2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  x1 = _mm256_cvttps_epi64 (z2);
+  x1 = _mm256_mask_cvttps_epi64 (x1, m, z2);
+  x1 = _mm256_maskz_cvttps_epi64 (m, z2);
+  x2 = _mm_cvttps_epi64 (z2);
+  x2 = _mm_mask_cvttps_epi64 (x2, m, z2);
+  x2 = _mm_maskz_cvttps_epi64 (m, z2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvttps2uqq-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvttps2uqq-1.c
new file mode 100644
index 00000000000..8e42fcf9caf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvttps2uqq-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvttps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvttps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x1;
+volatile __m128i x2;
+volatile __m128 z;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  x1 = _mm256_cvttps_epu64 (z);
+  x1 = _mm256_mask_cvttps_epu64 (x1, m, z);
+  x1 = _mm256_maskz_cvttps_epu64 (m, z);
+  x2 = _mm_cvttps_epu64 (z);
+  x2 = _mm_mask_cvttps_epu64 (x2, m, z);
+  x2 = _mm_maskz_cvttps_epu64 (m, z);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vpmullq-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vpmullq-1.c
new file mode 100644
index 00000000000..a26fc70c1dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vpmullq-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vpmullq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vpmullq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmullq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmullq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vpmullq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmullq\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i _x1, _y1, _z1;
+volatile __m128i _x2, _y2, _z2;
+
+void extern
+avx10_1_test (void)
+{
+  _x2 = _mm_mullo_epi64 (_y2, _z2);
+  _x2 = _mm_mask_mullo_epi64 (_x2, 2, _y2, _z2);
+  _x2 = _mm_maskz_mullo_epi64 (2, _y2, _z2);
+  _x1 = _mm256_mullo_epi64 (_y1, _z1);
+  _x1 = _mm256_mask_mullo_epi64 (_x1, 3, _y1, _z1);
+  _x1 = _mm256_maskz_mullo_epi64 (3, _y1, _z1);
+}
-- 
2.31.1


* [PATCH 3/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
                   ` (4 preceding siblings ...)
  2023-08-08  7:20 ` [PATCH 2/6] " Haochen Jiang
@ 2023-08-08  7:20 ` Haochen Jiang
  2023-08-08  7:20 ` [PATCH 4/6] " Haochen Jiang
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 88+ messages in thread
From: Haochen Jiang @ 2023-08-08  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

gcc/ChangeLog:

	* config/i386/avx512vldqintrin.h: Move the avx512vl,avx512dq target
	pragma below the intrinsics shared with AVX10.1.
	* config/i386/i386-builtin.def (BDESC):
	Add OPTION_MASK_ISA2_AVX10_1.
	* config/i386/i386.cc (standard_sse_constant_opcode): Add TARGET_AVX10_1.
	* config/i386/i386.md: Add new isa attribute
	avx10_1_or_avx512vl.
	* config/i386/sse.md (VI48_AVX512VL_AVX10_1): New.
	(VI48_AVX512VLDQ_AVX10_1): Ditto.
	(VF2_AVX512VL): Remove.
	(VI8_256_512VLDQ_AVX10_1): Rename from VI8_256_512.
	Add TARGET_AVX10_1.
	(*<code><mode>3<mask_name>): Change isa attribute to
	avx10_1_or_avx512dq. Add TARGET_AVX10_1.
	(<code><mode>3): Add TARGET_AVX10_1. Change isa attr
	to avx10_1_or_avx512vl.
	(<mask_codefor>avx512dq_cvtps2qq<mode><mask_name><round_name>):
	Change iterator to VI8_256_512VLDQ_AVX10_1. Remove target check.
	(<mask_codefor>avx512dq_cvtps2qqv2di<mask_name>):
	Add TARGET_AVX10_1.
	(<mask_codefor>avx512dq_cvtps2uqq<mode><mask_name><round_name>):
	Change iterator to VI8_256_512VLDQ_AVX10_1. Remove target check.
	(<mask_codefor>avx512dq_cvtps2uqqv2di<mask_name>):
	Add TARGET_AVX10_1.
	(float<floatunssuffix><sseintvecmodelower><mode>2<mask_name><round_name>):
	Change iterator to VF2_AVX512VLDQ_AVX10_1. Remove target check.
	(float<floatunssuffix><sselongvecmodelower><mode>2<mask_name><round_name>):
	Change iterator to VF1_128_256VLDQ_AVX10_1. Remove target check.
	(float<floatunssuffix>v4div4sf2<mask_name>):
	Add TARGET_AVX10_1.
	(avx512dq_float<floatunssuffix>v2div2sf2): Ditto.
	(*avx512dq_float<floatunssuffix>v2div2sf2): Ditto.
	(float<floatunssuffix>v2div2sf2): Ditto.
	(float<floatunssuffix>v2div2sf2_mask): Ditto.
	(*float<floatunssuffix>v2div2sf2_mask): Ditto.
	(*float<floatunssuffix>v2div2sf2_mask_1): Ditto.
	(<avx512>_cvt<ssemodesuffix>2mask<mode>):
	Change iterator to VI48_AVX512VLDQ_AVX10_1. Remove target check.
	(<avx512>_cvtmask2<ssemodesuffix><mode>): Ditto.
	(*<avx512>_cvtmask2<ssemodesuffix><mode>):
	Change iterator to VI48_AVX512VL_AVX10_1. Remove target check.
	Change when constraint is enabled.
---
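Note on the i386-builtin.def changes: each touched BDESC row keeps
OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL in its first mask column
and changes the second column from 0 to OPTION_MASK_ISA2_AVX10_1. The toy
model below is only a sketch of the effect this is presumably meant to have,
namely that a builtin becomes usable with either AVX512DQ+AVX512VL or AVX10.1
enabled. The TOY_* constants, struct toy_bdesc and toy_builtin_enabled are
hypothetical names invented for illustration; the real mask values and the
real availability check live in the i386 backend and are not part of this
patch, so the "either/or" rule here is an assumption drawn from the subject
line.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the option mask bits.  The real values are
   generated by GCC's option machinery and are not shown in this patch.  */
#define TOY_ISA_AVX512DQ  (UINT64_C (1) << 0)
#define TOY_ISA_AVX512VL  (UINT64_C (1) << 1)
#define TOY_ISA2_AVX10_1  (UINT64_C (1) << 0)

/* The two mask columns of one BDESC row.  */
struct toy_bdesc
{
  uint64_t isa;   /* First column, e.g. AVX512DQ | AVX512VL.  */
  uint64_t isa2;  /* Second column: 0 before the patch, AVX10_1 after.  */
};

/* A row as it looked before this patch and as it looks after it.  */
const struct toy_bdesc toy_row_before
  = { TOY_ISA_AVX512DQ | TOY_ISA_AVX512VL, 0 };
const struct toy_bdesc toy_row_after
  = { TOY_ISA_AVX512DQ | TOY_ISA_AVX512VL, TOY_ISA2_AVX10_1 };

/* Return true if the builtin described by D is usable.
   Assumed rule: either every bit of D->isa is enabled, or D->isa2 is
   nonzero and at least one of its bits is enabled.  */
bool
toy_builtin_enabled (const struct toy_bdesc *d,
		     uint64_t enabled_isa, uint64_t enabled_isa2)
{
  bool legacy = (enabled_isa & d->isa) == d->isa;
  bool avx10 = d->isa2 != 0 && (enabled_isa2 & d->isa2) != 0;
  return legacy || avx10;
}
```

Under this assumed rule, toy_row_before is rejected when only the AVX10.1 bit
is enabled, while toy_row_after is accepted both with AVX10.1 alone and with
AVX512DQ+AVX512VL alone. This matches what the new avx10_1-*.c tests rely on
when they compile with just -mavx10.1.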
 gcc/config/i386/avx512vldqintrin.h |  12 +--
 gcc/config/i386/i386-builtin.def   |  64 ++++++++--------
 gcc/config/i386/i386.cc            |   8 +-
 gcc/config/i386/sse.md             | 114 +++++++++++++++++------------
 4 files changed, 109 insertions(+), 89 deletions(-)

diff --git a/gcc/config/i386/avx512vldqintrin.h b/gcc/config/i386/avx512vldqintrin.h
index 4b8006f7b73..a8d14a4efc9 100644
--- a/gcc/config/i386/avx512vldqintrin.h
+++ b/gcc/config/i386/avx512vldqintrin.h
@@ -673,12 +673,6 @@ _mm_maskz_andnot_ps (__mmask8 __U, __m128 __A, __m128 __B)
 						 (__mmask8) __U);
 }
 
-#if !defined(__AVX512VL__) || !defined(__AVX512DQ__)
-#pragma GCC push_options
-#pragma GCC target("avx512vl,avx512dq")
-#define __DISABLE_AVX512VLDQ__
-#endif /* __AVX512VLDQ__ */
-
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cvtps_epi64 (__m128 __A)
@@ -1337,6 +1331,12 @@ _mm256_movepi64_mask (__m256i __A)
   return (__mmask8) __builtin_ia32_cvtq2mask256 ((__v4di) __A);
 }
 
+#if !defined(__AVX512VL__) || !defined(__AVX512DQ__)
+#pragma GCC push_options
+#pragma GCC target("avx512vl,avx512dq")
+#define __DISABLE_AVX512VLDQ__
+#endif /* __AVX512VLDQ__ */
+
 #ifdef __OPTIMIZE__
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 18d8966f0de..aa0a29caa9f 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1710,14 +1710,14 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_subv2df3_mask, "__builtin_ia32_subp
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_subv4df3_mask, "__builtin_ia32_subpd256_mask", IX86_BUILTIN_SUBPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_subv4sf3_mask, "__builtin_ia32_subps128_mask", IX86_BUILTIN_SUBPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_subv8sf3_mask, "__builtin_ia32_subps256_mask", IX86_BUILTIN_SUBPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_xorv4df3_mask, "__builtin_ia32_xorpd256_mask", IX86_BUILTIN_XORPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_xorv2df3_mask, "__builtin_ia32_xorpd128_mask", IX86_BUILTIN_XORPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_xorv8sf3_mask, "__builtin_ia32_xorps256_mask", IX86_BUILTIN_XORPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_xorv4sf3_mask, "__builtin_ia32_xorps128_mask", IX86_BUILTIN_XORPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_iorv4df3_mask, "__builtin_ia32_orpd256_mask", IX86_BUILTIN_ORPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_iorv2df3_mask, "__builtin_ia32_orpd128_mask", IX86_BUILTIN_ORPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_iorv8sf3_mask, "__builtin_ia32_orps256_mask", IX86_BUILTIN_ORPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_iorv4sf3_mask, "__builtin_ia32_orps128_mask", IX86_BUILTIN_ORPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_xorv4df3_mask, "__builtin_ia32_xorpd256_mask", IX86_BUILTIN_XORPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_xorv2df3_mask, "__builtin_ia32_xorpd128_mask", IX86_BUILTIN_XORPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_xorv8sf3_mask, "__builtin_ia32_xorps256_mask", IX86_BUILTIN_XORPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_xorv4sf3_mask, "__builtin_ia32_xorps128_mask", IX86_BUILTIN_XORPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_iorv4df3_mask, "__builtin_ia32_orpd256_mask", IX86_BUILTIN_ORPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_iorv2df3_mask, "__builtin_ia32_orpd128_mask", IX86_BUILTIN_ORPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_iorv8sf3_mask, "__builtin_ia32_orps256_mask", IX86_BUILTIN_ORPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_iorv4sf3_mask, "__builtin_ia32_orps128_mask", IX86_BUILTIN_ORPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_broadcastv8sf_mask, "__builtin_ia32_broadcastf32x2_256_mask", IX86_BUILTIN_BROADCASTF32x2_256, UNKNOWN, (int) V8SF_FTYPE_V4SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_broadcastv8si_mask, "__builtin_ia32_broadcasti32x2_256_mask", IX86_BUILTIN_BROADCASTI32x2_256, UNKNOWN, (int) V8SI_FTYPE_V4SI_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_broadcastv4si_mask, "__builtin_ia32_broadcasti32x2_128_mask", IX86_BUILTIN_BROADCASTI32x2_128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_UQI)
@@ -1938,10 +1938,10 @@ BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_mulv16hi
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_mulv8hi3_mask, "__builtin_ia32_pmullw128_mask", IX86_BUILTIN_PMULLW128_MASK, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_mulv4di3_mask, "__builtin_ia32_pmullq256_mask", IX86_BUILTIN_PMULLQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_mulv2di3_mask, "__builtin_ia32_pmullq128_mask", IX86_BUILTIN_PMULLQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_andv4df3_mask, "__builtin_ia32_andpd256_mask", IX86_BUILTIN_ANDPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_andv2df3_mask, "__builtin_ia32_andpd128_mask", IX86_BUILTIN_ANDPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_andv8sf3_mask, "__builtin_ia32_andps256_mask", IX86_BUILTIN_ANDPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_andv4sf3_mask, "__builtin_ia32_andps128_mask", IX86_BUILTIN_ANDPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_andv4df3_mask, "__builtin_ia32_andpd256_mask", IX86_BUILTIN_ANDPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_andv2df3_mask, "__builtin_ia32_andpd128_mask", IX86_BUILTIN_ANDPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_andv8sf3_mask, "__builtin_ia32_andps256_mask", IX86_BUILTIN_ANDPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_andv4sf3_mask, "__builtin_ia32_andps128_mask", IX86_BUILTIN_ANDPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx_andnotv4df3_mask, "__builtin_ia32_andnpd256_mask", IX86_BUILTIN_ANDNPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_sse2_andnotv2df3_mask, "__builtin_ia32_andnpd128_mask", IX86_BUILTIN_ANDNPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx_andnotv8sf3_mask, "__builtin_ia32_andnps256_mask", IX86_BUILTIN_ANDNPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
@@ -2090,10 +2090,10 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx_fix_notruncv8sfv8si_mask, "__bu
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_sse2_fix_notruncv4sfv4si_mask, "__builtin_ia32_cvtps2dq128_mask", IX86_BUILTIN_CVTPS2DQ128_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_fixuns_notruncv8sfv8si_mask, "__builtin_ia32_cvtps2udq256_mask", IX86_BUILTIN_CVTPS2UDQ256, UNKNOWN, (int) V8SI_FTYPE_V8SF_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_fixuns_notruncv4sfv4si_mask, "__builtin_ia32_cvtps2udq128_mask", IX86_BUILTIN_CVTPS2UDQ128, UNKNOWN, (int) V4SI_FTYPE_V4SF_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_cvtps2qqv4di_mask, "__builtin_ia32_cvtps2qq256_mask", IX86_BUILTIN_CVTPS2QQ256, UNKNOWN, (int) V4DI_FTYPE_V4SF_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_cvtps2qqv2di_mask, "__builtin_ia32_cvtps2qq128_mask", IX86_BUILTIN_CVTPS2QQ128, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_cvtps2uqqv4di_mask, "__builtin_ia32_cvtps2uqq256_mask", IX86_BUILTIN_CVTPS2UQQ256, UNKNOWN, (int) V4DI_FTYPE_V4SF_V4DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_cvtps2uqqv2di_mask, "__builtin_ia32_cvtps2uqq128_mask", IX86_BUILTIN_CVTPS2UQQ128, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_cvtps2qqv4di_mask, "__builtin_ia32_cvtps2qq256_mask", IX86_BUILTIN_CVTPS2QQ256, UNKNOWN, (int) V4DI_FTYPE_V4SF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_cvtps2qqv2di_mask, "__builtin_ia32_cvtps2qq128_mask", IX86_BUILTIN_CVTPS2QQ128, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_cvtps2uqqv4di_mask, "__builtin_ia32_cvtps2uqq256_mask", IX86_BUILTIN_CVTPS2UQQ256, UNKNOWN, (int) V4DI_FTYPE_V4SF_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_cvtps2uqqv2di_mask, "__builtin_ia32_cvtps2uqq128_mask", IX86_BUILTIN_CVTPS2UQQ128, UNKNOWN, (int) V2DI_FTYPE_V4SF_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_getmantv8sf_mask, "__builtin_ia32_getmantps256_mask", IX86_BUILTIN_GETMANTPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_INT_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_getmantv4sf_mask, "__builtin_ia32_getmantps128_mask", IX86_BUILTIN_GETMANTPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_INT_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_getmantv4df_mask, "__builtin_ia32_getmantpd256_mask", IX86_BUILTIN_GETMANTPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_INT_V4DF_UQI)
@@ -2104,14 +2104,14 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx_movshdup256_mask, "__builtin_ia
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_sse3_movshdup_mask, "__builtin_ia32_movshdup128_mask", IX86_BUILTIN_MOVSHDUP128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx_movsldup256_mask, "__builtin_ia32_movsldup256_mask", IX86_BUILTIN_MOVSLDUP256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_sse3_movsldup_mask, "__builtin_ia32_movsldup128_mask", IX86_BUILTIN_MOVSLDUP128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_floatv4div4sf2_mask, "__builtin_ia32_cvtqq2ps256_mask", IX86_BUILTIN_CVTQQ2PS256, UNKNOWN, (int) V4SF_FTYPE_V4DI_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_floatv2div2sf2_mask, "__builtin_ia32_cvtqq2ps128_mask", IX86_BUILTIN_CVTQQ2PS128, UNKNOWN, (int) V4SF_FTYPE_V2DI_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_floatunsv4div4sf2_mask, "__builtin_ia32_cvtuqq2ps256_mask", IX86_BUILTIN_CVTUQQ2PS256, UNKNOWN, (int) V4SF_FTYPE_V4DI_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_floatunsv2div2sf2_mask, "__builtin_ia32_cvtuqq2ps128_mask", IX86_BUILTIN_CVTUQQ2PS128, UNKNOWN, (int) V4SF_FTYPE_V2DI_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_floatv4div4df2_mask, "__builtin_ia32_cvtqq2pd256_mask", IX86_BUILTIN_CVTQQ2PD256, UNKNOWN, (int) V4DF_FTYPE_V4DI_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_floatv2div2df2_mask, "__builtin_ia32_cvtqq2pd128_mask", IX86_BUILTIN_CVTQQ2PD128, UNKNOWN, (int) V2DF_FTYPE_V2DI_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_floatunsv4div4df2_mask, "__builtin_ia32_cvtuqq2pd256_mask", IX86_BUILTIN_CVTUQQ2PD256, UNKNOWN, (int) V4DF_FTYPE_V4DI_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_floatunsv2div2df2_mask, "__builtin_ia32_cvtuqq2pd128_mask", IX86_BUILTIN_CVTUQQ2PD128, UNKNOWN, (int) V2DF_FTYPE_V2DI_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_floatv4div4sf2_mask, "__builtin_ia32_cvtqq2ps256_mask", IX86_BUILTIN_CVTQQ2PS256, UNKNOWN, (int) V4SF_FTYPE_V4DI_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_floatv2div2sf2_mask, "__builtin_ia32_cvtqq2ps128_mask", IX86_BUILTIN_CVTQQ2PS128, UNKNOWN, (int) V4SF_FTYPE_V2DI_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_floatunsv4div4sf2_mask, "__builtin_ia32_cvtuqq2ps256_mask", IX86_BUILTIN_CVTUQQ2PS256, UNKNOWN, (int) V4SF_FTYPE_V4DI_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_floatunsv2div2sf2_mask, "__builtin_ia32_cvtuqq2ps128_mask", IX86_BUILTIN_CVTUQQ2PS128, UNKNOWN, (int) V4SF_FTYPE_V2DI_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_floatv4div4df2_mask, "__builtin_ia32_cvtqq2pd256_mask", IX86_BUILTIN_CVTQQ2PD256, UNKNOWN, (int) V4DF_FTYPE_V4DI_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_floatv2div2df2_mask, "__builtin_ia32_cvtqq2pd128_mask", IX86_BUILTIN_CVTQQ2PD128, UNKNOWN, (int) V2DF_FTYPE_V2DI_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_floatunsv4div4df2_mask, "__builtin_ia32_cvtuqq2pd256_mask", IX86_BUILTIN_CVTUQQ2PD256, UNKNOWN, (int) V4DF_FTYPE_V4DI_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_floatunsv2div2df2_mask, "__builtin_ia32_cvtuqq2pd128_mask", IX86_BUILTIN_CVTUQQ2PD128, UNKNOWN, (int) V2DF_FTYPE_V2DI_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpermt2varv4di3_mask, "__builtin_ia32_vpermt2varq256_mask", IX86_BUILTIN_VPERMT2VARQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpermt2varv4di3_maskz, "__builtin_ia32_vpermt2varq256_maskz", IX86_BUILTIN_VPERMT2VARQ256_MASKZ, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vpermt2varv8si3_mask, "__builtin_ia32_vpermt2vard256_mask", IX86_BUILTIN_VPERMT2VARD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI_UQI)
@@ -2194,18 +2194,18 @@ BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtb2maskv32qi, "__builtin_ia32_cvtb2mask256", IX86_BUILTIN_CVTB2MASK256, UNKNOWN, (int) USI_FTYPE_V32QI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtw2maskv8hi, "__builtin_ia32_cvtw2mask128", IX86_BUILTIN_CVTW2MASK128, UNKNOWN, (int) UQI_FTYPE_V8HI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtw2maskv16hi, "__builtin_ia32_cvtw2mask256", IX86_BUILTIN_CVTW2MASK256, UNKNOWN, (int) UHI_FTYPE_V16HI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtd2maskv4si, "__builtin_ia32_cvtd2mask128", IX86_BUILTIN_CVTD2MASK128, UNKNOWN, (int) UQI_FTYPE_V4SI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtd2maskv8si, "__builtin_ia32_cvtd2mask256", IX86_BUILTIN_CVTD2MASK256, UNKNOWN, (int) UQI_FTYPE_V8SI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtq2maskv2di, "__builtin_ia32_cvtq2mask128", IX86_BUILTIN_CVTQ2MASK128, UNKNOWN, (int) UQI_FTYPE_V2DI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtq2maskv4di, "__builtin_ia32_cvtq2mask256", IX86_BUILTIN_CVTQ2MASK256, UNKNOWN, (int) UQI_FTYPE_V4DI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_cvtd2maskv4si, "__builtin_ia32_cvtd2mask128", IX86_BUILTIN_CVTD2MASK128, UNKNOWN, (int) UQI_FTYPE_V4SI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_cvtd2maskv8si, "__builtin_ia32_cvtd2mask256", IX86_BUILTIN_CVTD2MASK256, UNKNOWN, (int) UQI_FTYPE_V8SI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_cvtq2maskv2di, "__builtin_ia32_cvtq2mask128", IX86_BUILTIN_CVTQ2MASK128, UNKNOWN, (int) UQI_FTYPE_V2DI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_cvtq2maskv4di, "__builtin_ia32_cvtq2mask256", IX86_BUILTIN_CVTQ2MASK256, UNKNOWN, (int) UQI_FTYPE_V4DI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtmask2bv16qi, "__builtin_ia32_cvtmask2b128", IX86_BUILTIN_CVTMASK2B128, UNKNOWN, (int) V16QI_FTYPE_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtmask2bv32qi, "__builtin_ia32_cvtmask2b256", IX86_BUILTIN_CVTMASK2B256, UNKNOWN, (int) V32QI_FTYPE_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtmask2wv8hi, "__builtin_ia32_cvtmask2w128", IX86_BUILTIN_CVTMASK2W128, UNKNOWN, (int) V8HI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtmask2wv16hi, "__builtin_ia32_cvtmask2w256", IX86_BUILTIN_CVTMASK2W256, UNKNOWN, (int) V16HI_FTYPE_UHI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtmask2dv4si, "__builtin_ia32_cvtmask2d128", IX86_BUILTIN_CVTMASK2D128, UNKNOWN, (int) V4SI_FTYPE_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtmask2dv8si, "__builtin_ia32_cvtmask2d256", IX86_BUILTIN_CVTMASK2D256, UNKNOWN, (int) V8SI_FTYPE_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtmask2qv2di, "__builtin_ia32_cvtmask2q128", IX86_BUILTIN_CVTMASK2Q128, UNKNOWN, (int) V2DI_FTYPE_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtmask2qv4di, "__builtin_ia32_cvtmask2q256", IX86_BUILTIN_CVTMASK2Q256, UNKNOWN, (int) V4DI_FTYPE_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_cvtmask2dv4si, "__builtin_ia32_cvtmask2d128", IX86_BUILTIN_CVTMASK2D128, UNKNOWN, (int) V4SI_FTYPE_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_cvtmask2dv8si, "__builtin_ia32_cvtmask2d256", IX86_BUILTIN_CVTMASK2D256, UNKNOWN, (int) V8SI_FTYPE_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_cvtmask2qv2di, "__builtin_ia32_cvtmask2q128", IX86_BUILTIN_CVTMASK2Q128, UNKNOWN, (int) V2DI_FTYPE_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_cvtmask2qv4di, "__builtin_ia32_cvtmask2q256", IX86_BUILTIN_CVTMASK2Q256, UNKNOWN, (int) V4DI_FTYPE_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_eqv16qi3_mask, "__builtin_ia32_pcmpeqb128_mask", IX86_BUILTIN_PCMPEQB128_MASK, UNKNOWN, (int) UHI_FTYPE_V16QI_V16QI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_eqv32qi3_mask, "__builtin_ia32_pcmpeqb256_mask", IX86_BUILTIN_PCMPEQB256_MASK, UNKNOWN, (int) USI_FTYPE_V32QI_V32QI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_eqv8hi3_mask, "__builtin_ia32_pcmpeqw128_mask", IX86_BUILTIN_PCMPEQW128_MASK, UNKNOWN, (int) UQI_FTYPE_V8HI_V8HI_UQI)
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index e75614b993d..dd297709e44 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5316,8 +5316,8 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	case MODE_V4DF:
 	  if (!EXT_REX_SSE_REG_P (operands[0]))
 	    return "vxorpd\t%x0, %x0, %x0";
-	  else if (TARGET_AVX512DQ)
-	    return (TARGET_AVX512VL
+	  else if (TARGET_AVX512DQ || TARGET_AVX10_1)
+	    return ((TARGET_AVX512VL || TARGET_AVX10_1)
 		    ? "vxorpd\t%x0, %x0, %x0"
 		    : "vxorpd\t%g0, %g0, %g0");
 	  else
@@ -5333,8 +5333,8 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx *operands)
 	case MODE_V8SF:
 	  if (!EXT_REX_SSE_REG_P (operands[0]))
 	    return "vxorps\t%x0, %x0, %x0";
-	  else if (TARGET_AVX512DQ)
-	    return (TARGET_AVX512VL
+	  else if (TARGET_AVX512DQ || TARGET_AVX10_1)
+	    return ((TARGET_AVX512VL || TARGET_AVX10_1)
 		    ? "vxorps\t%x0, %x0, %x0"
 		    : "vxorps\t%g0, %g0, %g0");
 	  else
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5d19aaf380f..9003776ee01 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -441,6 +441,20 @@
   [V16SI (V8SI  "TARGET_AVX512VL") (V4SI  "TARGET_AVX512VL")
    V8DI  (V4DI  "TARGET_AVX512VL") (V2DI  "TARGET_AVX512VL")])
 
+(define_mode_iterator VI48_AVX512VL_AVX10_1
+  [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX512VL || TARGET_AVX10_1")
+   (V4SI "TARGET_AVX512VL || TARGET_AVX10_1") (V8DI "TARGET_AVX512F")
+   (V4DI "TARGET_AVX512VL || TARGET_AVX10_1")
+   (V2DI "TARGET_AVX512VL || TARGET_AVX10_1")])
+
+(define_mode_iterator VI48_AVX512VLDQ_AVX10_1
+  [(V16SI "TARGET_AVX512DQ")
+   (V8SI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V4SI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V8DI "TARGET_AVX512DQ")
+   (V4DI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V2DI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
+
 (define_mode_iterator VI1248_AVX512VLBW
   [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX512VL && TARGET_AVX512BW")
    (V16QI "TARGET_AVX512VL && TARGET_AVX512BW")
@@ -464,9 +478,6 @@
    V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
-(define_mode_iterator VF2_AVX512VL
-  [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
-
 (define_mode_iterator VF2_AVX512VLDQ_AVX10_1
   [(V8DF "TARGET_AVX512DQ")
    (V4DF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
@@ -542,8 +553,9 @@
    (V4DI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
    (V2DI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
 
-(define_mode_iterator VI8_256_512
-  [V8DI (V4DI "TARGET_AVX512VL")])
+(define_mode_iterator VI8_256_512VLDQ_AVX10_1
+  [(V8DI "TARGET_AVX512DQ")
+   (V4DI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
 
 (define_mode_iterator VI1_AVX2
   [(V32QI "TARGET_AVX2") V16QI])
@@ -4909,13 +4921,13 @@
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512dq,avx512f")
+  [(set_attr "isa" "noavx,avx,avx10_1_or_avx512dq,avx512f")
    (set_attr "type" "sselog")
    (set_attr "prefix" "orig,maybe_evex,evex,evex")
    (set (attr "mode")
 	(cond [(and (match_test "<mask_applied>")
 		    (and (eq_attr "alternative" "1")
-			 (match_test "!TARGET_AVX512DQ")))
+			 (match_test "!(TARGET_AVX512DQ || TARGET_AVX10_1)")))
 		 (const_string "<sseintvecmode2>")
 	       (eq_attr "alternative" "3")
 		 (const_string "<sseintvecmode2>")
@@ -5169,7 +5181,7 @@
       ops = "<logic>%s\t{%%2, %%0|%%0, %%2}";
       break;
     case 2:
-      if (!TARGET_AVX512DQ)
+      if (!TARGET_AVX512DQ && !TARGET_AVX10_1)
 	{
 	  suffix = <MODE>mode == DFmode ? "q" : "d";
 	  ops = "vp<logic>%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
@@ -5196,12 +5208,12 @@
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx512vl,avx512f")
+  [(set_attr "isa" "noavx,avx,avx10_1_or_avx512vl,avx512f")
    (set_attr "type" "sselog")
    (set_attr "prefix" "orig,vex,evex,evex")
    (set (attr "mode")
 	(cond [(eq_attr "alternative" "2")
-		 (if_then_else (match_test "TARGET_AVX512DQ")
+		 (if_then_else (match_test "TARGET_AVX512DQ || TARGET_AVX10_1")
 			       (const_string "<ssevecmode>")
 			       (const_string "TI"))
 	       (eq_attr "alternative" "3")
@@ -8073,10 +8085,11 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<mask_codefor>avx512dq_cvtps2qq<mode><mask_name><round_name>"
-  [(set (match_operand:VI8_256_512 0 "register_operand" "=v")
-	(unspec:VI8_256_512 [(match_operand:<ssePSmode2> 1 "nonimmediate_operand" "<round_constraint>")]
-		     UNSPEC_FIX_NOTRUNC))]
-  "TARGET_AVX512DQ && <round_mode512bit_condition>"
+  [(set (match_operand:VI8_256_512VLDQ_AVX10_1 0 "register_operand" "=v")
+	(unspec:VI8_256_512VLDQ_AVX10_1
+	  [(match_operand:<ssePSmode2> 1 "nonimmediate_operand" "<round_constraint>")]
+	  UNSPEC_FIX_NOTRUNC))]
+  "<round_mode512bit_condition>"
   "vcvtps2qq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8089,17 +8102,18 @@
 	     (match_operand:V4SF 1 "nonimmediate_operand" "vm")
 	     (parallel [(const_int 0) (const_int 1)]))]
 	  UNSPEC_FIX_NOTRUNC))]
-  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1"
   "vcvtps2qq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
    (set_attr "mode" "TI")])
 
 (define_insn "<mask_codefor>avx512dq_cvtps2uqq<mode><mask_name><round_name>"
-  [(set (match_operand:VI8_256_512 0 "register_operand" "=v")
-	(unspec:VI8_256_512 [(match_operand:<ssePSmode2> 1 "nonimmediate_operand" "<round_constraint>")]
-		     UNSPEC_UNSIGNED_FIX_NOTRUNC))]
-  "TARGET_AVX512DQ && <round_mode512bit_condition>"
+  [(set (match_operand:VI8_256_512VLDQ_AVX10_1 0 "register_operand" "=v")
+	(unspec:VI8_256_512VLDQ_AVX10_1
+	  [(match_operand:<ssePSmode2> 1 "nonimmediate_operand" "<round_constraint>")]
+	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
+  "<round_mode512bit_condition>"
   "vcvtps2uqq\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8112,7 +8126,7 @@
 	     (match_operand:V4SF 1 "nonimmediate_operand" "vm")
 	     (parallel [(const_int 0) (const_int 1)]))]
 	  UNSPEC_UNSIGNED_FIX_NOTRUNC))]
-  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1"
   "vcvtps2uqq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8418,10 +8432,10 @@
    (set_attr "mode" "<MODE>")])
 
 (define_insn "float<floatunssuffix><sseintvecmodelower><mode>2<mask_name><round_name>"
-  [(set (match_operand:VF2_AVX512VL 0 "register_operand" "=v")
-	(any_float:VF2_AVX512VL
+  [(set (match_operand:VF2_AVX512VLDQ_AVX10_1 0 "register_operand" "=v")
+	(any_float:VF2_AVX512VLDQ_AVX10_1
 	  (match_operand:<sseintvecmode> 1 "nonimmediate_operand" "<round_constraint>")))]
-  "TARGET_AVX512DQ"
+  ""
   "vcvt<floatsuffix>qq2pd\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8442,10 +8456,10 @@
    (V8DF "OI") (V4DF "TI")])
 
 (define_insn "float<floatunssuffix><sselongvecmodelower><mode>2<mask_name><round_name>"
-  [(set (match_operand:VF1_128_256VL 0 "register_operand" "=v")
-	 (any_float:VF1_128_256VL
+  [(set (match_operand:VF1_128_256VLDQ_AVX10_1 0 "register_operand" "=v")
+	 (any_float:VF1_128_256VLDQ_AVX10_1
 	   (match_operand:<sselongvecmode> 1 "nonimmediate_operand" "<round_constraint>")))]
-  "TARGET_AVX512DQ && <round_modev8sf_condition>"
+  "<round_modev8sf_condition>"
   "vcvt<floatsuffix>qq2ps<qq2pssuff>\t{<round_mask_op2>%1, %0<mask_operand2>|%0<mask_operand2>, %1<round_mask_op2>}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8456,7 +8470,7 @@
 	(vec_concat:V4SF
 	    (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
 	    (match_dup 2)))]
-  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1"
   "operands[2] = CONST0_RTX (V2SFmode);")
 
 (define_insn "*avx512dq_float<floatunssuffix>v2div2sf2"
@@ -8464,7 +8478,7 @@
 	(vec_concat:V4SF
 	    (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
 	    (match_operand:V2SF 2 "const0_operand")))]
-  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1"
   "vcvt<floatsuffix>qq2ps{x}\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8473,7 +8487,7 @@
 (define_expand "float<floatunssuffix>v2div2sf2"
   [(set (match_operand:V2SF 0 "register_operand")
 	(any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand")))]
-  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1"
 {
   rtx op0 = gen_reg_rtx (V4SFmode);
 
@@ -8557,7 +8571,7 @@
                 (parallel [(const_int 0) (const_int 1)]))
             (match_operand:QI 3 "register_operand" "Yk"))
 	    (match_dup 4)))]
-  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1"
   "operands[4] = CONST0_RTX (V2SFmode);")
 
 (define_insn "*float<floatunssuffix>v2div2sf2_mask"
@@ -8570,7 +8584,7 @@
                 (parallel [(const_int 0) (const_int 1)]))
             (match_operand:QI 3 "register_operand" "Yk"))
 	    (match_operand:V2SF 4 "const0_operand")))]
-  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1"
   "vcvt<floatsuffix>qq2ps{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -8585,7 +8599,7 @@
 	    (match_operand:V2SF 3 "const0_operand")
 	    (match_operand:QI 2 "register_operand" "Yk"))
 	    (match_operand:V2SF 4 "const0_operand")))]
-  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1"
   "vcvt<floatsuffix>qq2ps{x}\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}"
   [(set_attr "type" "ssecvt")
    (set_attr "prefix" "evex")
@@ -9401,9 +9415,9 @@
 (define_insn "<avx512>_cvt<ssemodesuffix>2mask<mode>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
 	(unspec:<avx512fmaskmode>
-	 [(match_operand:VI48_AVX512VL 1 "register_operand" "v")]
+	 [(match_operand:VI48_AVX512VLDQ_AVX10_1 1 "register_operand" "v")]
 	 UNSPEC_CVTINT2MASK))]
-  "TARGET_AVX512DQ"
+  ""
   "vpmov<ssemodesuffix>2m\t{%1, %0|%0, %1}"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
@@ -9432,42 +9446,48 @@
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_expand "<avx512>_cvtmask2<ssemodesuffix><mode>"
-  [(set (match_operand:VI48_AVX512VL 0 "register_operand")
-	(vec_merge:VI48_AVX512VL
+  [(set (match_operand:VI48_AVX512VL_AVX10_1 0 "register_operand")
+	(vec_merge:VI48_AVX512VL_AVX10_1
 	  (match_dup 2)
 	  (match_dup 3)
 	  (match_operand:<avx512fmaskmode> 1 "register_operand")))]
-  "TARGET_AVX512F"
+  ""
   "{
     operands[2] = CONSTM1_RTX (<MODE>mode);
     operands[3] = CONST0_RTX (<MODE>mode);
   }")
 
 (define_insn_and_split "*<avx512>_cvtmask2<ssemodesuffix><mode>"
-  [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v,v")
-	(vec_merge:VI48_AVX512VL
-	  (match_operand:VI48_AVX512VL 2 "vector_all_ones_operand")
-	  (match_operand:VI48_AVX512VL 3 "const0_operand")
+  [(set (match_operand:VI48_AVX512VL_AVX10_1 0 "register_operand" "=v,v")
+	(vec_merge:VI48_AVX512VL_AVX10_1
+	  (match_operand:VI48_AVX512VL_AVX10_1 2 "vector_all_ones_operand")
+	  (match_operand:VI48_AVX512VL_AVX10_1 3 "const0_operand")
 	  (match_operand:<avx512fmaskmode> 1 "register_operand" "k,Yk")))]
-  "TARGET_AVX512F"
+  ""
   "@
    vpmovm2<ssemodesuffix>\t{%1, %0|%0, %1}
    vpternlog<ssemodesuffix>\t{$0x81, %0, %0, %0%{%1%}%{z%}|%0%{%1%}%{z%}, %0, %0, 0x81}"
-  "&& !TARGET_AVX512DQ && reload_completed
+  "&& !TARGET_AVX512DQ
+   && (!TARGET_AVX10_1 || <MODE_SIZE> == 64)
+   && reload_completed
    && optimize_function_for_speed_p (cfun)"
   [(set (match_dup 0) (match_dup 4))
    (parallel
     [(set (match_dup 0)
-	  (vec_merge:VI48_AVX512VL
+	  (vec_merge:VI48_AVX512VL_AVX10_1
 	    (match_dup 2)
 	    (match_dup 3)
 	    (match_dup 1)))
      (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])]
   "operands[4] = CONST0_RTX (<MODE>mode);"
-  [(set_attr "isa" "avx512dq,*")
-   (set_attr "length_immediate" "0,1")
+  [(set_attr "length_immediate" "0,1")
    (set_attr "prefix" "evex")
-   (set_attr "mode" "<sseinsnmode>")])
+   (set_attr "mode" "<sseinsnmode>")
+   (set (attr "enabled")
+	(if_then_else (eq_attr "alternative" "0")
+		      (symbol_ref "(TARGET_AVX10_1 && <MODE_SIZE> != 64)
+				   || TARGET_AVX512DQ")
+		      (const_int 1)))])
 
 (define_insn "*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep"
   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
-- 
2.31.1



* [PATCH 4/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
                   ` (5 preceding siblings ...)
  2023-08-08  7:20 ` [PATCH 3/6] " Haochen Jiang
@ 2023-08-08  7:20 ` Haochen Jiang
  2023-08-08  7:20 ` [PATCH 5/6] " Haochen Jiang
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 88+ messages in thread
From: Haochen Jiang @ 2023-08-08  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_1-abs-copysign-1.c: New test.
	* gcc.target/i386/avx10_1-vandpd-1.c: Ditto.
	* gcc.target/i386/avx10_1-vandps-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvtps2qq-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvtps2uqq-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvtqq2pd-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvtqq2ps-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvtuqq2pd-1.c: Ditto.
	* gcc.target/i386/avx10_1-vcvtuqq2ps-1.c: Ditto.
	* gcc.target/i386/avx10_1-vorpd-1.c: Ditto.
	* gcc.target/i386/avx10_1-vorps-1.c: Ditto.
	* gcc.target/i386/avx10_1-vpmovd2m-1.c: Ditto.
	* gcc.target/i386/avx10_1-vpmovm2d-1.c: Ditto.
	* gcc.target/i386/avx10_1-vpmovm2q-1.c: Ditto.
	* gcc.target/i386/avx10_1-vpmovq2m-1.c: Ditto.
	* gcc.target/i386/avx10_1-vxorpd-1.c: Ditto.
	* gcc.target/i386/avx10_1-vxorps-1.c: Ditto.
---
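The tests in this patch are compiled with -mavx10.1 and use only 128- and
256-bit operands. As a rough illustration of why that may be enough, the
per-mode conditions of the VI48_AVX512VLDQ_AVX10_1 iterator from the previous
patch can be modelled in plain C. This is a sketch, not GCC code: struct
isa_flags and vldq_avx10_1_mode_enabled are hypothetical names standing in for
the TARGET_* macros, and it assumes that -mavx10.1 on its own sets neither the
AVX512DQ nor the AVX512VL flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's TARGET_AVX512DQ, TARGET_AVX512VL and
   TARGET_AVX10_1 macros.  This is an illustration only, not GCC code.  */
struct isa_flags
{
  bool avx512dq;
  bool avx512vl;
  bool avx10_1;
};

/* Model of the per-mode conditions in VI48_AVX512VLDQ_AVX10_1:
   64-byte modes (V16SI, V8DI) require AVX512DQ, while 16- and 32-byte
   modes accept either AVX512DQ together with AVX512VL, or AVX10.1.  */
static bool
vldq_avx10_1_mode_enabled (struct isa_flags f, int mode_size_bytes)
{
  /* Only the three vector sizes covered by the iterator are modelled.  */
  assert (mode_size_bytes == 16 || mode_size_bytes == 32
	  || mode_size_bytes == 64);

  if (mode_size_bytes == 64)
    return f.avx512dq;

  return (f.avx512dq && f.avx512vl) || f.avx10_1;
}
```

Under that assumption, calling the helper with only avx10_1 set returns true
for sizes 16 and 32 and false for size 64, which matches the xmm/ymm-only
patterns the scan-assembler directives look for.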
 .../gcc.target/i386/avx10_1-abs-copysign-1.c  | 69 +++++++++++++++++++
 .../gcc.target/i386/avx10_1-vandpd-1.c        | 21 ++++++
 .../gcc.target/i386/avx10_1-vandps-1.c        | 21 ++++++
 .../gcc.target/i386/avx10_1-vcvtps2qq-1.c     | 28 ++++++++
 .../gcc.target/i386/avx10_1-vcvtps2uqq-1.c    | 27 ++++++++
 .../gcc.target/i386/avx10_1-vcvtqq2pd-1.c     | 27 ++++++++
 .../gcc.target/i386/avx10_1-vcvtqq2ps-1.c     | 26 +++++++
 .../gcc.target/i386/avx10_1-vcvtuqq2pd-1.c    | 27 ++++++++
 .../gcc.target/i386/avx10_1-vcvtuqq2ps-1.c    | 27 ++++++++
 .../gcc.target/i386/avx10_1-vorpd-1.c         | 22 ++++++
 .../gcc.target/i386/avx10_1-vorps-1.c         | 22 ++++++
 .../gcc.target/i386/avx10_1-vpmovd2m-1.c      | 17 +++++
 .../gcc.target/i386/avx10_1-vpmovm2d-1.c      | 17 +++++
 .../gcc.target/i386/avx10_1-vpmovm2q-1.c      | 17 +++++
 .../gcc.target/i386/avx10_1-vpmovq2m-1.c      | 17 +++++
 .../gcc.target/i386/avx10_1-vxorpd-1.c        | 23 +++++++
 .../gcc.target/i386/avx10_1-vxorps-1.c        | 22 ++++++
 17 files changed, 430 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-abs-copysign-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vandpd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vandps-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvtps2qq-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvtps2uqq-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvtqq2pd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvtqq2ps-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvtuqq2pd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vcvtuqq2ps-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vorpd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vorps-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vpmovd2m-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vpmovm2d-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vpmovm2q-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vpmovq2m-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vxorpd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vxorps-1.c

diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-abs-copysign-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-abs-copysign-1.c
new file mode 100644
index 00000000000..e9e45e44051
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-abs-copysign-1.c
@@ -0,0 +1,69 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-Ofast -mavx10.1" } */
+
+void
+f1 (float x)
+{
+  register float a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = __builtin_fabsf (a);
+  asm volatile ("" : "+v" (a));
+}
+/*
+void
+f2 (float x, float y)
+{
+  register float a __asm ("xmm16"), b __asm ("xmm17");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = __builtin_copysignf (a, b);
+  asm volatile ("" : "+v" (a));
+}
+*/
+void
+f3 (float x)
+{
+  register float a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = -a;
+  asm volatile ("" : "+v" (a));
+}
+
+void
+f4 (double x)
+{
+  register double a __asm ("xmm18");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = __builtin_fabs (a);
+  asm volatile ("" : "+v" (a));
+}
+/*
+void
+f5 (double x, double y)
+{
+  register double a __asm ("xmm18"), b __asm ("xmm19");
+  a = x;
+  b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = __builtin_copysign (a, b);
+  asm volatile ("" : "+v" (a));
+}
+*/
+void
+f6 (double x)
+{
+  register double a __asm ("xmm18");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = -a;
+  asm volatile ("" : "+v" (a));
+}
+
+/* { dg-final { scan-assembler "vandps\[^\n\r\]*xmm16" } } */
+/* { dg-final { scan-assembler "vxorps\[^\n\r\]*xmm16" } } */
+/* { dg-final { scan-assembler "vandpd\[^\n\r\]*xmm18" } } */
+/* { dg-final { scan-assembler "vxorpd\[^\n\r\]*xmm18" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vandpd-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vandpd-1.c
new file mode 100644
index 00000000000..3a765479f6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vandpd-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vandpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandpd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandpd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d y;
+volatile __m128d x;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_mask_and_pd (y, m, y, y);
+  y = _mm256_maskz_and_pd (m, y, y);
+  x = _mm_mask_and_pd (x, m, x, x);
+  x = _mm_maskz_and_pd (m, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vandps-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vandps-1.c
new file mode 100644
index 00000000000..ed785af5f30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vandps-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vandps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vandps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256 y;
+volatile __m128 x;
+volatile __mmask8 m2;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_mask_and_ps (y, m2, y, y);
+  y = _mm256_maskz_and_ps (m2, y, y);
+  x = _mm_mask_and_ps (x, m2, x, x);
+  x = _mm_maskz_and_ps (m2, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvtps2qq-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtps2qq-1.c
new file mode 100644
index 00000000000..dad6dbe778d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtps2qq-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2qq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x1;
+volatile __m128i x2;
+volatile __m256 z1;
+volatile __m128 z2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  x1 = _mm256_cvtps_epi64 (z2);
+  x1 = _mm256_mask_cvtps_epi64 (x1, m, z2);
+  x1 = _mm256_maskz_cvtps_epi64 (m, z2);
+  x2 = _mm_cvtps_epi64 (z2);
+  x2 = _mm_mask_cvtps_epi64 (x2, m, z2);
+  x2 = _mm_maskz_cvtps_epi64 (m, z2);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvtps2uqq-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtps2uqq-1.c
new file mode 100644
index 00000000000..24de26bd5e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtps2uqq-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtps2uqq\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x1;
+volatile __m128i x2;
+volatile __m256 z1;
+volatile __m128 z2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  x1 = _mm256_cvtps_epu64 (z2);
+  x1 = _mm256_mask_cvtps_epu64 (x1, m, z2);
+  x1 = _mm256_maskz_cvtps_epu64 (m, z2);
+  x2 = _mm_cvtps_epu64 (z2);
+  x2 = _mm_mask_cvtps_epu64 (x2, m, z2);
+  x2 = _mm_maskz_cvtps_epu64 (m, z2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvtqq2pd-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtqq2pd-1.c
new file mode 100644
index 00000000000..5a2472292b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtqq2pd-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtqq2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i s1;
+volatile __m128i s2;
+volatile __m256d res1;
+volatile __m128d res2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  res1 = _mm256_cvtepi64_pd (s1);
+  res1 = _mm256_mask_cvtepi64_pd (res1, m, s1);
+  res1 = _mm256_maskz_cvtepi64_pd (m, s1);
+  res2 = _mm_cvtepi64_pd (s2);
+  res2 = _mm_mask_cvtepi64_pd (res2, m, s2);
+  res2 = _mm_maskz_cvtepi64_pd (m, s2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvtqq2ps-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtqq2ps-1.c
new file mode 100644
index 00000000000..7d735eb4c9c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtqq2ps-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtqq2psx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2psx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2psx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2psy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2psy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtqq2psy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i s1;
+volatile __m128i s2;
+volatile __m128 res;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  res = _mm256_cvtepi64_ps (s1);
+  res = _mm256_mask_cvtepi64_ps (res, m, s1);
+  res = _mm256_maskz_cvtepi64_ps (m, s1);
+  res = _mm_cvtepi64_ps (s2);
+  res = _mm_mask_cvtepi64_ps (res, m, s2);
+  res = _mm_maskz_cvtepi64_ps (m, s2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvtuqq2pd-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtuqq2pd-1.c
new file mode 100644
index 00000000000..ab433a2ecde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtuqq2pd-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtuqq2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2pd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i s1;
+volatile __m128i s2;
+volatile __m256d res1;
+volatile __m128d res2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  res1 = _mm256_cvtepu64_pd (s1);
+  res1 = _mm256_mask_cvtepu64_pd (res1, m, s1);
+  res1 = _mm256_maskz_cvtepu64_pd (m, s1);
+  res2 = _mm_cvtepu64_pd (s2);
+  res2 = _mm_mask_cvtepu64_pd (res2, m, s2);
+  res2 = _mm_maskz_cvtepu64_pd (m, s2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vcvtuqq2ps-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtuqq2ps-1.c
new file mode 100644
index 00000000000..ac9e788e4c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vcvtuqq2ps-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vcvtuqq2psx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2psx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2psx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2psy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2psy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtuqq2psy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i s1;
+volatile __m128i s2;
+volatile __m256 res1;
+volatile __m128 res2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  res2 = _mm256_cvtepu64_ps (s1);
+  res2 = _mm256_mask_cvtepu64_ps (res2, m, s1);
+  res2 = _mm256_maskz_cvtepu64_ps (m, s1);
+  res2 = _mm_cvtepu64_ps (s2);
+  res2 = _mm_mask_cvtepu64_ps (res2, m, s2);
+  res2 = _mm_maskz_cvtepu64_ps (m, s2);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vorpd-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vorpd-1.c
new file mode 100644
index 00000000000..d2367d136a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vorpd-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vorpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vorpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vorpd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vorpd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d y;
+volatile __m128d x;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_mask_or_pd (y, m, y, y);
+  y = _mm256_maskz_or_pd (m, y, y);
+
+  x = _mm_mask_or_pd (x, m, x, x);
+  x = _mm_maskz_or_pd (m, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vorps-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vorps-1.c
new file mode 100644
index 00000000000..2ba919ed2e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vorps-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vorps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vorps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vorps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vorps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256 y;
+volatile __m128 x;
+volatile __mmask8 n;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_mask_or_ps (y, n, y, y);
+  y = _mm256_maskz_or_ps (n, y, y);
+
+  x = _mm_mask_or_ps (x, n, x, x);
+  x = _mm_maskz_or_ps (n, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vpmovd2m-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vpmovd2m-1.c
new file mode 100644
index 00000000000..68f1a9485ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vpmovd2m-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vpmovd2m\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovd2m\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x256;
+volatile __m128i x128;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  m = _mm_movepi32_mask (x128);
+  m = _mm256_movepi32_mask (x256);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vpmovm2d-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vpmovm2d-1.c
new file mode 100644
index 00000000000..89ac3bd49ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vpmovm2d-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vpmovm2d\[ \\t\]+\[^\{\n\]*%k\[0-7\]\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovm2d\[ \\t\]+\[^\{\n\]*%k\[0-7\]\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x256;
+volatile __m128i x128;
+volatile __mmask8 m8;
+
+void extern
+avx10_1_test (void)
+{
+  x128 = _mm_movm_epi32 (m8);
+  x256 = _mm256_movm_epi32 (m8);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vpmovm2q-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vpmovm2q-1.c
new file mode 100644
index 00000000000..b5a3298c4ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vpmovm2q-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vpmovm2q\[ \\t\]+\[^\{\n\]*%k\[0-7\]\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovm2q\[ \\t\]+\[^\{\n\]*%k\[0-7\]\[^\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x256;
+volatile __m128i x128;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  x128 = _mm_movm_epi64 (m);
+  x256 = _mm256_movm_epi64 (m);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vpmovq2m-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vpmovq2m-1.c
new file mode 100644
index 00000000000..2eb1f81a7ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vpmovq2m-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vpmovq2m\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovq2m\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x256;
+volatile __m128i x128;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  m = _mm_movepi64_mask (x128);
+  m = _mm256_movepi64_mask (x256);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vxorpd-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vxorpd-1.c
new file mode 100644
index 00000000000..062acc9b011
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vxorpd-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vxorpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vxorpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vxorpd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vxorpd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d y;
+volatile __m128d x;
+volatile __mmask8 m;
+
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_mask_xor_pd (y, m, y, y);
+  y = _mm256_maskz_xor_pd (m, y, y);
+
+  x = _mm_mask_xor_pd (x, m, x, x);
+  x = _mm_maskz_xor_pd (m, x, x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vxorps-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vxorps-1.c
new file mode 100644
index 00000000000..04473ce0468
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vxorps-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vxorps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vxorps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vxorps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vxorps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256 y;
+volatile __m128 x;
+volatile __mmask8 n;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_mask_xor_ps (y, n, y, y);
+  y = _mm256_maskz_xor_ps (n, y, y);
+
+  x = _mm_mask_xor_ps (x, n, x, x);
+  x = _mm_maskz_xor_ps (n, x, x);
+}
-- 
2.31.1



* [PATCH 5/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
                   ` (6 preceding siblings ...)
  2023-08-08  7:20 ` [PATCH 4/6] " Haochen Jiang
@ 2023-08-08  7:20 ` Haochen Jiang
  2023-08-08  7:20 ` [PATCH 6/6] " Haochen Jiang
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 88+ messages in thread
From: Haochen Jiang @ 2023-08-08  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

gcc/ChangeLog:

	* config/i386/avx512vldqintrin.h: Remove target attribute.
	* config/i386/i386-builtin.def (BDESC): Add OPTION_MASK_ISA2_AVX10_1.
	* config/i386/sse.md (VF_AVX512VLDQ_AVX10_1): New.
	(VFH_AVX512VLDQ_AVX10_1): Ditto.
	(VF1_AVX512VLDQ_AVX10_1): Ditto.
	(<mask_codefor>reducep<mode><mask_name><round_saeonly_name>):
	Change iterator to VFH_AVX512VLDQ_AVX10_1. Remove target check.
	(vec_pack<floatprefix>_float_<mode>): Change iterator to
	VI8_AVX512VLDQ_AVX10_1. Remove target check.
	(vec_unpack_<fixprefix>fix_trunc_lo_<mode>): Change iterator to
	VF1_AVX512VLDQ_AVX10_1. Remove target check.
	(vec_unpack_<fixprefix>fix_trunc_hi_<mode>): Ditto.
	(VI48F_256_DQVL_AVX10_1): Rename from VI48F_256_DQ.
	(avx512vl_vextractf128<mode>): Change iterator to
	VI48F_256_DQVL_AVX10_1. Remove target check.
	(vec_extract_hi_<mode>_mask): Add TARGET_AVX10_1.
	(vec_extract_hi_<mode>): Ditto.
	(avx512vl_vinsert<mode>): Ditto.
	(vec_set_lo_<mode><mask_name>): Ditto.
	(vec_set_hi_<mode><mask_name>): Ditto.
	(avx512dq_rangep<mode><mask_name><round_saeonly_name>): Change
	iterator to VF_AVX512VLDQ_AVX10_1. Remove target check.
	(avx512dq_fpclass<mode><mask_scalar_merge_name>): Change
	iterator to VFH_AVX512VLDQ_AVX10_1. Remove target check.
	* config/i386/subst.md (mask_avx512dq_condition): Add
	TARGET_AVX10_1.
	(mask_scalar_merge): Ditto.
---
 gcc/config/i386/avx512vldqintrin.h | 11 ----
 gcc/config/i386/i386-builtin.def   | 32 +++++-----
 gcc/config/i386/sse.md             | 94 ++++++++++++++++++------------
 gcc/config/i386/subst.md           |  4 +-
 4 files changed, 76 insertions(+), 65 deletions(-)
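
Note for reviewers (not part of the commit): the enablement rule that the new
iterators encode can be read as a small predicate. The sketch below is only an
illustrative model in plain C, not GCC code. The struct isa_flags and the two
helper functions are hypothetical names standing in for the TARGET_* macros
and for the per-mode conditions in VF_AVX512VLDQ_AVX10_1. The second helper
assumes the mnemonic choice shown in the 256-bit extract-high hunk of sse.md.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for TARGET_AVX512DQ, TARGET_AVX512VL and
   TARGET_AVX10_1.  These are not GCC data structures.  */
struct isa_flags
{
  bool avx512dq;
  bool avx512vl;
  bool avx10_1;
};

/* Model of the per-mode conditions in VF_AVX512VLDQ_AVX10_1:
   512-bit modes (V16SF, V8DF) still require AVX512DQ, while
   128/256-bit modes are enabled by AVX512DQ+AVX512VL or by AVX10.1.  */
bool
dq_mode_enabled (struct isa_flags f, int vector_bits)
{
  if (vector_bits == 512)
    return f.avx512dq;
  return (f.avx512dq && f.avx512vl) || f.avx10_1;
}

/* Model of the mnemonic selection in the unmasked 256-bit
   extract-high pattern: prefer the 64x2 form when the DQ+VL
   instructions are available through either route, fall back to
   32x4 with AVX512VL alone, and to the VEX form otherwise.  */
const char *
extract_hi_mnemonic (struct isa_flags f)
{
  if ((f.avx512dq && f.avx512vl) || f.avx10_1)
    return "vextract<shuffletype>64x2";
  else if (f.avx512vl)
    return "vextract<shuffletype>32x4";
  return "vextract<i128>";
}
```

With only avx10_1 set, dq_mode_enabled returns true for 128 and 256 bits and
false for 512 bits, which matches the 256-bit default vector size described in
the cover letter. With only avx512dq set, the 512-bit modes remain enabled and
the narrower ones do not.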

diff --git a/gcc/config/i386/avx512vldqintrin.h b/gcc/config/i386/avx512vldqintrin.h
index a8d14a4efc9..1fbf93a0b52 100644
--- a/gcc/config/i386/avx512vldqintrin.h
+++ b/gcc/config/i386/avx512vldqintrin.h
@@ -1331,12 +1331,6 @@ _mm256_movepi64_mask (__m256i __A)
   return (__mmask8) __builtin_ia32_cvtq2mask256 ((__v4di) __A);
 }
 
-#if !defined(__AVX512VL__) || !defined(__AVX512DQ__)
-#pragma GCC push_options
-#pragma GCC target("avx512vl,avx512dq")
-#define __DISABLE_AVX512VLDQ__
-#endif /* __AVX512VLDQ__ */
-
 #ifdef __OPTIMIZE__
 extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -2008,9 +2002,4 @@ _mm256_maskz_insertf64x2 (__mmask8 __U, __m256d __A, __m128d __B,
 
 #endif
 
-#ifdef __DISABLE_AVX512VLDQ__
-#undef __DISABLE_AVX512VLDQ__
-#pragma GCC pop_options
-#endif /* __DISABLE_AVX512VLDQ__ */
-
 #endif /* _AVX512VLDQINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index aa0a29caa9f..34768552e78 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1782,8 +1782,8 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vec_dup_gprv2di_mask, "__b
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vec_dupv8sf_mask, "__builtin_ia32_broadcastss256_mask", IX86_BUILTIN_BROADCASTSS256, UNKNOWN, (int) V8SF_FTYPE_V4SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vec_dupv4sf_mask, "__builtin_ia32_broadcastss128_mask", IX86_BUILTIN_BROADCASTSS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vec_dupv4df_mask, "__builtin_ia32_broadcastsd256_mask", IX86_BUILTIN_BROADCASTSD256, UNKNOWN, (int) V4DF_FTYPE_V2DF_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v4df, "__builtin_ia32_extractf64x2_256_mask", IX86_BUILTIN_EXTRACTF64X2_256, UNKNOWN, (int) V2DF_FTYPE_V4DF_INT_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v4di, "__builtin_ia32_extracti64x2_256_mask", IX86_BUILTIN_EXTRACTI64X2_256, UNKNOWN, (int) V2DI_FTYPE_V4DI_INT_V2DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_vextractf128v4df, "__builtin_ia32_extractf64x2_256_mask", IX86_BUILTIN_EXTRACTF64X2_256, UNKNOWN, (int) V2DF_FTYPE_V4DF_INT_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_vextractf128v4di, "__builtin_ia32_extracti64x2_256_mask", IX86_BUILTIN_EXTRACTI64X2_256, UNKNOWN, (int) V2DI_FTYPE_V4DI_INT_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vinsertv8sf, "__builtin_ia32_insertf32x4_256_mask", IX86_BUILTIN_INSERTF32X4_256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V4SF_INT_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vinsertv8si, "__builtin_ia32_inserti32x4_256_mask", IX86_BUILTIN_INSERTI32X4_256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V4SI_INT_V8SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx2_sign_extendv16qiv16hi2_mask, "__builtin_ia32_pmovsxbw256_mask", IX86_BUILTIN_PMOVSXBW256_MASK, UNKNOWN, (int) V16HI_FTYPE_V16QI_V16HI_UHI)
@@ -1810,10 +1810,10 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx2_zero_extendv4hiv4di2_mask, "__
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_sse4_1_zero_extendv2hiv2di2_mask, "__builtin_ia32_pmovzxwq128_mask", IX86_BUILTIN_PMOVZXWQ128_MASK, UNKNOWN, (int) V2DI_FTYPE_V8HI_V2DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx2_zero_extendv4siv4di2_mask, "__builtin_ia32_pmovzxdq256_mask", IX86_BUILTIN_PMOVZXDQ256_MASK, UNKNOWN, (int) V4DI_FTYPE_V4SI_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_sse4_1_zero_extendv2siv2di2_mask, "__builtin_ia32_pmovzxdq128_mask", IX86_BUILTIN_PMOVZXDQ128_MASK, UNKNOWN, (int) V2DI_FTYPE_V4SI_V2DI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_reducepv4df_mask, "__builtin_ia32_reducepd256_mask", IX86_BUILTIN_REDUCEPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_INT_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_reducepv2df_mask, "__builtin_ia32_reducepd128_mask", IX86_BUILTIN_REDUCEPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_INT_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_reducepv8sf_mask, "__builtin_ia32_reduceps256_mask", IX86_BUILTIN_REDUCEPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_INT_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_reducepv4sf_mask, "__builtin_ia32_reduceps128_mask", IX86_BUILTIN_REDUCEPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_INT_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_reducepv4df_mask, "__builtin_ia32_reducepd256_mask", IX86_BUILTIN_REDUCEPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_INT_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_reducepv2df_mask, "__builtin_ia32_reducepd128_mask", IX86_BUILTIN_REDUCEPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_INT_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_reducepv8sf_mask, "__builtin_ia32_reduceps256_mask", IX86_BUILTIN_REDUCEPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_INT_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_reducepv4sf_mask, "__builtin_ia32_reduceps128_mask", IX86_BUILTIN_REDUCEPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_INT_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducesv2df_mask, "__builtin_ia32_reducesd_mask", IX86_BUILTIN_REDUCESD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_reducesv4sf_mask, "__builtin_ia32_reducess_mask", IX86_BUILTIN_REDUCESS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_permvarv16hi_mask, "__builtin_ia32_permvarhi256_mask", IX86_BUILTIN_VPERMVARHI256_MASK, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
@@ -1908,10 +1908,10 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_ss_truncatev2div2si2_mask,
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_ss_truncatev4div4si2_mask, "__builtin_ia32_pmovsqd256_mask", IX86_BUILTIN_PMOVSQD256, UNKNOWN, (int) V4SI_FTYPE_V4DI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_us_truncatev2div2si2_mask, "__builtin_ia32_pmovusqd128_mask", IX86_BUILTIN_PMOVUSQD128, UNKNOWN, (int) V4SI_FTYPE_V2DI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_us_truncatev4div4si2_mask, "__builtin_ia32_pmovusqd256_mask", IX86_BUILTIN_PMOVUSQD256, UNKNOWN, (int) V4SI_FTYPE_V4DI_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_rangepv4df_mask, "__builtin_ia32_rangepd256_mask", IX86_BUILTIN_RANGEPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_INT_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_rangepv2df_mask, "__builtin_ia32_rangepd128_mask", IX86_BUILTIN_RANGEPD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_rangepv8sf_mask, "__builtin_ia32_rangeps256_mask", IX86_BUILTIN_RANGEPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_INT_V8SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_rangepv4sf_mask, "__builtin_ia32_rangeps128_mask", IX86_BUILTIN_RANGEPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_rangepv4df_mask, "__builtin_ia32_rangepd256_mask", IX86_BUILTIN_RANGEPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_INT_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_rangepv2df_mask, "__builtin_ia32_rangepd128_mask", IX86_BUILTIN_RANGEPD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_rangepv8sf_mask, "__builtin_ia32_rangeps256_mask", IX86_BUILTIN_RANGEPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_INT_V8SF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_rangepv4sf_mask, "__builtin_ia32_rangeps128_mask", IX86_BUILTIN_RANGEPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_getexpv8sf_mask, "__builtin_ia32_getexpps256_mask", IX86_BUILTIN_GETEXPPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_getexpv4df_mask, "__builtin_ia32_getexppd256_mask", IX86_BUILTIN_GETEXPPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_getexpv4sf_mask, "__builtin_ia32_getexpps128_mask", IX86_BUILTIN_GETEXPPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_UQI)
@@ -2076,8 +2076,8 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_fmsubadd_v4df_mask3, "__bu
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_fmsubadd_v2df_mask3, "__builtin_ia32_vfmsubaddpd128_mask3", IX86_BUILTIN_VFMSUBADDPD128_MASK3, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_fmsubadd_v8sf_mask3, "__builtin_ia32_vfmsubaddps256_mask3", IX86_BUILTIN_VFMSUBADDPS256_MASK3, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_fmsubadd_v4sf_mask3, "__builtin_ia32_vfmsubaddps128_mask3", IX86_BUILTIN_VFMSUBADDPS128_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vinsertv4df, "__builtin_ia32_insertf64x2_256_mask", IX86_BUILTIN_INSERTF64X2_256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V2DF_INT_V4DF_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vinsertv4di, "__builtin_ia32_inserti64x2_256_mask", IX86_BUILTIN_INSERTI64X2_256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V2DI_INT_V4DI_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_vinsertv4df, "__builtin_ia32_insertf64x2_256_mask", IX86_BUILTIN_INSERTF64X2_256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V2DF_INT_V4DF_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512vl_vinsertv4di, "__builtin_ia32_inserti64x2_256_mask", IX86_BUILTIN_INSERTI64X2_256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V2DI_INT_V4DI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_ashrvv16hi_mask, "__builtin_ia32_psrav16hi_mask", IX86_BUILTIN_PSRAVV16HI, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI_V16HI_UHI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_ashrvv8hi_mask, "__builtin_ia32_psrav8hi_mask", IX86_BUILTIN_PSRAVV8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512bw_pmaddubsw512v16hi_mask, "__builtin_ia32_pmaddubsw256_mask", IX86_BUILTIN_PMADDUBSW256_MASK, UNKNOWN, (int) V16HI_FTYPE_V32QI_V32QI_V16HI_UHI)
@@ -2184,11 +2184,11 @@ BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_rorvv4si_mask, "__builtin_
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_rolvv4si_mask, "__builtin_ia32_prolvd128_mask", IX86_BUILTIN_PROLVD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_rorv4si_mask, "__builtin_ia32_prord128_mask", IX86_BUILTIN_PRORD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_INT_V4SI_UQI)
 BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_rolv4si_mask, "__builtin_ia32_prold128_mask", IX86_BUILTIN_PROLD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_INT_V4SI_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_fpclassv4df_mask, "__builtin_ia32_fpclasspd256_mask", IX86_BUILTIN_FPCLASSPD256, UNKNOWN, (int) QI_FTYPE_V4DF_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_fpclassv2df_mask, "__builtin_ia32_fpclasspd128_mask", IX86_BUILTIN_FPCLASSPD128, UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_fpclassv4df_mask, "__builtin_ia32_fpclasspd256_mask", IX86_BUILTIN_FPCLASSPD256, UNKNOWN, (int) QI_FTYPE_V4DF_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_fpclassv2df_mask, "__builtin_ia32_fpclasspd128_mask", IX86_BUILTIN_FPCLASSPD128, UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv2df_mask, "__builtin_ia32_fpclasssd_mask", IX86_BUILTIN_FPCLASSSD_MASK, UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_fpclassv8sf_mask, "__builtin_ia32_fpclassps256_mask", IX86_BUILTIN_FPCLASSPS256, UNKNOWN, (int) QI_FTYPE_V8SF_INT_UQI)
-BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512dq_fpclassv4sf_mask, "__builtin_ia32_fpclassps128_mask", IX86_BUILTIN_FPCLASSPS128, UNKNOWN, (int) QI_FTYPE_V4SF_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_fpclassv8sf_mask, "__builtin_ia32_fpclassps256_mask", IX86_BUILTIN_FPCLASSPS256, UNKNOWN, (int) QI_FTYPE_V8SF_INT_UQI)
+BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX10_1, CODE_FOR_avx512dq_fpclassv4sf_mask, "__builtin_ia32_fpclassps128_mask", IX86_BUILTIN_FPCLASSPS128, UNKNOWN, (int) QI_FTYPE_V4SF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv4sf_mask, "__builtin_ia32_fpclassss_mask", IX86_BUILTIN_FPCLASSSS_MASK, UNKNOWN, (int) QI_FTYPE_V4SF_INT_UQI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtb2maskv16qi, "__builtin_ia32_cvtb2mask128", IX86_BUILTIN_CVTB2MASK128, UNKNOWN, (int) UHI_FTYPE_V16QI)
 BDESC (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_cvtb2maskv32qi, "__builtin_ia32_cvtb2mask256", IX86_BUILTIN_CVTB2MASK256, UNKNOWN, (int) USI_FTYPE_V32QI)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9003776ee01..6784a8c5369 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -467,6 +467,14 @@
   [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
+(define_mode_iterator VF_AVX512VLDQ_AVX10_1
+  [(V16SF "TARGET_AVX512DQ")
+   (V8SF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V4SF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V8DF "TARGET_AVX512DQ")
+   (V4DF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V2DF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
+
 ;; AVX512ER SF plus 128- and 256-bit SF vector modes
 (define_mode_iterator VF1_AVX512ER_128_256
   [(V16SF "TARGET_AVX512ER") (V8SF "TARGET_AVX") V4SF])
@@ -478,6 +486,17 @@
    V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
+(define_mode_iterator VFH_AVX512VLDQ_AVX10_1
+  [(V32HF "TARGET_AVX512FP16")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V16SF "TARGET_AVX512DQ")
+   (V8SF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V4SF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V8DF "TARGET_AVX512DQ")
+   (V4DF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V2DF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
+
 (define_mode_iterator VF2_AVX512VLDQ_AVX10_1
   [(V8DF "TARGET_AVX512DQ")
    (V4DF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
@@ -486,6 +505,11 @@
 (define_mode_iterator VF1_AVX512VL
   [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
 
+(define_mode_iterator VF1_AVX512VLDQ_AVX10_1
+  [(V16SF "TARGET_AVX512DQ")
+   (V8SF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V4SF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
+
 (define_mode_iterator VF_AVX512FP16
   [V32HF V16HF V8HF])
 
@@ -3520,12 +3544,12 @@
 })
 
 (define_insn "<mask_codefor>reducep<mode><mask_name><round_saeonly_name>"
-  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v")
-	(unspec:VFH_AVX512VL
-	  [(match_operand:VFH_AVX512VL 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
+  [(set (match_operand:VFH_AVX512VLDQ_AVX10_1 0 "register_operand" "=v")
+	(unspec:VFH_AVX512VLDQ_AVX10_1
+	  [(match_operand:VFH_AVX512VLDQ_AVX10_1 1 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
 	   (match_operand:SI 2 "const_0_to_255_operand")]
 	  UNSPEC_REDUCE))]
-  "TARGET_AVX512DQ || (VALID_AVX512FP16_REG_MODE (<MODE>mode))"
+  ""
   "vreduce<ssemodesuffix>\t{%2, <round_saeonly_mask_op3>%1, %0<mask_operand3>|%0<mask_operand3>, %1<round_saeonly_mask_op3>, %2}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
@@ -8514,9 +8538,9 @@
 (define_expand "vec_pack<floatprefix>_float_<mode>"
   [(match_operand:<ssePSmode> 0 "register_operand")
    (any_float:<ssePSmode>
-     (match_operand:VI8_AVX512VL 1 "register_operand"))
-   (match_operand:VI8_AVX512VL 2 "register_operand")]
-  "TARGET_AVX512DQ"
+     (match_operand:VI8_AVX512VLDQ_AVX10_1 1 "register_operand"))
+   (match_operand:VI8_AVX512VLDQ_AVX10_1 2 "register_operand")]
+  ""
 {
   rtx r1 = gen_reg_rtx (<vpckfloat_temp_mode>mode);
   rtx r2 = gen_reg_rtx (<vpckfloat_temp_mode>mode);
@@ -8975,8 +8999,8 @@
 (define_expand "vec_unpack_<fixprefix>fix_trunc_lo_<mode>"
   [(match_operand:<vunpckfixt_mode> 0 "register_operand")
    (any_fix:<vunpckfixt_mode>
-     (match_operand:VF1_AVX512VL 1 "register_operand"))]
-  "TARGET_AVX512DQ"
+     (match_operand:VF1_AVX512VLDQ_AVX10_1 1 "register_operand"))]
+  ""
 {
   rtx tem = operands[1];
   rtx (*gen) (rtx, rtx);
@@ -8998,8 +9022,8 @@
 (define_expand "vec_unpack_<fixprefix>fix_trunc_hi_<mode>"
   [(match_operand:<vunpckfixt_mode> 0 "register_operand")
    (any_fix:<vunpckfixt_mode>
-     (match_operand:VF1_AVX512VL 1 "register_operand"))]
-  "TARGET_AVX512DQ"
+     (match_operand:VF1_AVX512VLDQ_AVX10_1 1 "register_operand"))]
+  ""
 {
   rtx tem;
   rtx (*gen) (rtx, rtx);
@@ -11812,16 +11836,19 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_mode_iterator VI48F_256_DQ
-  [V8SI V8SF (V4DI "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ")])
+(define_mode_iterator VI48F_256_DQVL_AVX10_1
+  [(V8SI "TARGET_AVX512VL")
+   (V8SF "TARGET_AVX512VL")
+   (V4DI "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")
+   (V4DF "(TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1")])
 
 (define_expand "avx512vl_vextractf128<mode>"
   [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
-   (match_operand:VI48F_256_DQ 1 "register_operand")
+   (match_operand:VI48F_256_DQVL_AVX10_1 1 "register_operand")
    (match_operand:SI 2 "const_0_to_1_operand")
    (match_operand:<ssehalfvecmode> 3 "nonimm_or_0_operand")
    (match_operand:QI 4 "register_operand")]
-  "TARGET_AVX512VL"
+  ""
 {
   rtx (*insn)(rtx, rtx, rtx, rtx);
   rtx dest = operands[0];
@@ -11960,8 +11987,7 @@
 	    (parallel [(const_int 0) (const_int 1)]))
 	  (match_operand:<ssehalfvecmode> 2 "nonimm_or_0_operand" "0C,0")
 	  (match_operand:QI 3 "register_operand" "Yk,Yk")))]
-  "TARGET_AVX512DQ
-   && TARGET_AVX512VL
+  "((TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1)
    && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
   "vextract<shuffletype>64x2\t{$0x0, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x0}"
    [(set_attr "type" "sselog1")
@@ -11997,8 +12023,7 @@
 	    (parallel [(const_int 2) (const_int 3)]))
 	  (match_operand:<ssehalfvecmode> 2 "nonimm_or_0_operand" "0C,0")
 	  (match_operand:QI 3 "register_operand" "Yk,Yk")))]
-  "TARGET_AVX512DQ
-   && TARGET_AVX512VL
+  "((TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1)
    && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
   "vextract<shuffletype>64x2\t{$0x1, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x1}"
   [(set_attr "type" "sselog1")
@@ -12013,13 +12038,10 @@
 	  (parallel [(const_int 2) (const_int 3)])))]
   "TARGET_AVX"
 {
-  if (TARGET_AVX512VL)
-    {
-      if (TARGET_AVX512DQ)
-	return "vextract<shuffletype>64x2\t{$0x1, %1, %0|%0, %1, 0x1}";
-      else
-	return "vextract<shuffletype>32x4\t{$0x1, %1, %0|%0, %1, 0x1}";
-    }
+  if ((TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1)
+    return "vextract<shuffletype>64x2\t{$0x1, %1, %0|%0, %1, 0x1}";
+  else if (TARGET_AVX512VL)
+    return "vextract<shuffletype>32x4\t{$0x1, %1, %0|%0, %1, 0x1}";
   else
     return "vextract<i128>\t{$0x1, %1, %0|%0, %1, 0x1}";
 }
@@ -27201,7 +27223,7 @@
    (match_operand:SI 3 "const_0_to_1_operand")
    (match_operand:VI48F_256 4 "register_operand")
    (match_operand:<avx512fmaskmode> 5 "register_operand")]
-  "TARGET_AVX512VL"
+  "TARGET_AVX512VL || TARGET_AVX10_1"
 {
   rtx (*insn)(rtx, rtx, rtx, rtx, rtx);
 
@@ -27256,7 +27278,7 @@
 	    (parallel [(const_int 2) (const_int 3)]))))]
   "TARGET_AVX && <mask_avx512dq_condition>"
 {
-  if (TARGET_AVX512DQ)
+  if ((TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1)
     return "vinsert<shuffletype>64x2\t{$0x0, %2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2, 0x0}";
   else if (TARGET_AVX512VL)
     return "vinsert<shuffletype>32x4\t{$0x0, %2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2, 0x0}";
@@ -27278,7 +27300,7 @@
 	  (match_operand:<ssehalfvecmode> 2 "nonimmediate_operand" "vm")))]
   "TARGET_AVX && <mask_avx512dq_condition>"
 {
-  if (TARGET_AVX512DQ)
+  if ((TARGET_AVX512DQ && TARGET_AVX512VL) || TARGET_AVX10_1)
     return "vinsert<shuffletype>64x2\t{$0x1, %2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2, 0x1}";
   else if (TARGET_AVX512VL)
     return "vinsert<shuffletype>32x4\t{$0x1, %2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2, 0x1}";
@@ -28549,13 +28571,13 @@
   "operands[2] = CONST0_RTX (<MODE>mode);")
 
 (define_insn "avx512dq_rangep<mode><mask_name><round_saeonly_name>"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
-	(unspec:VF_AVX512VL
-	  [(match_operand:VF_AVX512VL 1 "register_operand" "v")
-	   (match_operand:VF_AVX512VL 2 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
+  [(set (match_operand:VF_AVX512VLDQ_AVX10_1 0 "register_operand" "=v")
+	(unspec:VF_AVX512VLDQ_AVX10_1
+	  [(match_operand:VF_AVX512VLDQ_AVX10_1 1 "register_operand" "v")
+	   (match_operand:VF_AVX512VLDQ_AVX10_1 2 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
 	   (match_operand:SI 3 "const_0_to_15_operand")]
 	  UNSPEC_RANGE))]
-  "TARGET_AVX512DQ && <round_saeonly_mode512bit_condition>"
+  "<round_saeonly_mode512bit_condition>"
 {
   if (TARGET_DEST_FALSE_DEP_FOR_GLC
       && <mask4_dest_false_dep_for_glc_cond>
@@ -28594,10 +28616,10 @@
 (define_insn "avx512dq_fpclass<mode><mask_scalar_merge_name>"
   [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=k")
           (unspec:<avx512fmaskmode>
-            [(match_operand:VFH_AVX512VL 1 "vector_operand" "vm")
+            [(match_operand:VFH_AVX512VLDQ_AVX10_1 1 "vector_operand" "vm")
              (match_operand 2 "const_0_to_255_operand")]
              UNSPEC_FPCLASS))]
-   "TARGET_AVX512DQ || VALID_AVX512FP16_REG_MODE(<MODE>mode)"
+   ""
    "vfpclass<ssemodesuffix><vecmemsuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}";
   [(set_attr "type" "sse")
    (set_attr "length_immediate" "1")
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 59c4b395a9d..fe923458ab8 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -65,7 +65,7 @@
 							    || TARGET_AVX10_1)")
 (define_subst_attr "mask_avx512vl_condition" "mask" "1" "(TARGET_AVX512VL || TARGET_AVX10_1)")
 (define_subst_attr "mask_avx512bw_condition" "mask" "1" "TARGET_AVX512BW")
-(define_subst_attr "mask_avx512dq_condition" "mask" "1" "TARGET_AVX512DQ")
+(define_subst_attr "mask_avx512dq_condition" "mask" "1" "(TARGET_AVX512DQ || TARGET_AVX10_1)")
 (define_subst_attr "mask_prefix" "mask" "vex" "evex")
 (define_subst_attr "mask_prefix2" "mask" "maybe_vex" "evex")
 (define_subst_attr "mask_prefix3" "mask" "orig,vex" "evex,evex")
@@ -120,7 +120,7 @@
 (define_subst "mask_scalar_merge"
   [(set (match_operand:SUBST_S 0)
         (match_operand:SUBST_S 1))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F || TARGET_AVX10_1"
   [(set (match_dup 0)
         (and:SUBST_S
 	  (match_dup 1)
-- 
2.31.1
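
Editorial sketch (not part of the patch): the hunk at
"@@ -12013,13 +12038,10 @@" rewrites the nested if/else that picks the
extract mnemonic. The plain C model below is only an illustration. The
struct isa_flags type and the old_mnemonic/new_mnemonic/check_old_new_agree
helpers are hypothetical stand-ins for the TARGET_* macros and the output
template, and the strings assume a floating-point mode, where
<shuffletype> expands to "f" and <i128> to "f128". Under those assumptions
it suggests that the rewrite keeps the old behaviour whenever the AVX10.1
flag is clear, and that the flag by itself now selects the 64x2 form.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for TARGET_AVX512DQ, TARGET_AVX512VL and
   TARGET_AVX10_1; not GCC's real data structures.  */
struct isa_flags
{
  bool avx512dq;
  bool avx512vl;
  bool avx10_1;
};

/* Selection logic as it was before the patch (the removed lines).  */
const char *
old_mnemonic (struct isa_flags f)
{
  if (f.avx512vl)
    return f.avx512dq ? "vextractf64x2" : "vextractf32x4";
  return "vextractf128";
}

/* Selection logic after the patch (the added lines).  */
const char *
new_mnemonic (struct isa_flags f)
{
  if ((f.avx512dq && f.avx512vl) || f.avx10_1)
    return "vextractf64x2";
  else if (f.avx512vl)
    return "vextractf32x4";
  else
    return "vextractf128";
}

/* With the AVX10.1 flag clear, old and new logic should agree for every
   combination of the two AVX512 flags.  */
void
check_old_new_agree (void)
{
  for (int dq = 0; dq < 2; dq++)
    for (int vl = 0; vl < 2; vl++)
      {
	struct isa_flags f = { dq != 0, vl != 0, false };
	assert (strcmp (old_mnemonic (f), new_mnemonic (f)) == 0);
      }
}
```

In this model, calling new_mnemonic with only avx10_1 set returns the 64x2
form, while old_mnemonic for the same flags falls through to the plain
128-bit extract.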


* [PATCH 6/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
                   ` (7 preceding siblings ...)
  2023-08-08  7:20 ` [PATCH 5/6] " Haochen Jiang
@ 2023-08-08  7:20 ` Haochen Jiang
  2023-08-16  2:36   ` Hongtao Liu
  2023-08-08  7:42 ` Intel AVX10.1 Compiler Design and Support Jakub Jelinek
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 88+ messages in thread
From: Haochen Jiang @ 2023-08-08  7:20 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_1-vextractf64x2-1.c: New test.
	* gcc.target/i386/avx10_1-vextracti64x2-1.c: Ditto.
	* gcc.target/i386/avx10_1-vfpclasspd-1.c: Ditto.
	* gcc.target/i386/avx10_1-vfpclassps-1.c: Ditto.
	* gcc.target/i386/avx10_1-vinsertf64x2-1.c: Ditto.
	* gcc.target/i386/avx10_1-vinserti64x2-1.c: Ditto.
	* gcc.target/i386/avx10_1-vrangepd-1.c: Ditto.
	* gcc.target/i386/avx10_1-vrangeps-1.c: Ditto.
	* gcc.target/i386/avx10_1-vreducepd-1.c: Ditto.
	* gcc.target/i386/avx10_1-vreduceps-1.c: Ditto.
---
 .../gcc.target/i386/avx10_1-vextractf64x2-1.c | 18 ++++++++++++
 .../gcc.target/i386/avx10_1-vextracti64x2-1.c | 19 ++++++++++++
 .../gcc.target/i386/avx10_1-vfpclasspd-1.c    | 21 ++++++++++++++
 .../gcc.target/i386/avx10_1-vfpclassps-1.c    | 21 ++++++++++++++
 .../gcc.target/i386/avx10_1-vinsertf64x2-1.c  | 18 ++++++++++++
 .../gcc.target/i386/avx10_1-vinserti64x2-1.c  | 18 ++++++++++++
 .../gcc.target/i386/avx10_1-vrangepd-1.c      | 27 +++++++++++++++++
 .../gcc.target/i386/avx10_1-vrangeps-1.c      | 27 +++++++++++++++++
 .../gcc.target/i386/avx10_1-vreducepd-1.c     | 29 +++++++++++++++++++
 .../gcc.target/i386/avx10_1-vreduceps-1.c     | 29 +++++++++++++++++++
 10 files changed, 227 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vextractf64x2-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vextracti64x2-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vfpclasspd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vfpclassps-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vinsertf64x2-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vinserti64x2-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vrangepd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vrangeps-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vreducepd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vreduceps-1.c

diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vextractf64x2-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vextractf64x2-1.c
new file mode 100644
index 00000000000..4c7e54dc198
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vextractf64x2-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vextractf64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vextractf64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vextractf64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d x;
+volatile __m128d y;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_extractf64x2_pd (x, 1);
+  y = _mm256_mask_extractf64x2_pd (y, 2, x, 1);
+  y = _mm256_maskz_extractf64x2_pd (2, x, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vextracti64x2-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vextracti64x2-1.c
new file mode 100644
index 00000000000..c0bd7700d52
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vextracti64x2-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vextracti64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vextracti64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vextracti64x2\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x;
+volatile __m128i y;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_extracti64x2_epi64 (x, 1);
+  y = _mm256_mask_extracti64x2_epi64 (y, 2, x, 1);
+  y = _mm256_maskz_extracti64x2_epi64 (2, x, 1);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vfpclasspd-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vfpclasspd-1.c
new file mode 100644
index 00000000000..806ba800023
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vfpclasspd-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vfpclasspdy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclasspdx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclasspdy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclasspdx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d x256;
+volatile __m128d x128;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  m = _mm256_fpclass_pd_mask (x256, 13);
+  m = _mm_fpclass_pd_mask (x128, 13);
+  m = _mm256_mask_fpclass_pd_mask (2, x256, 13);
+  m = _mm_mask_fpclass_pd_mask (2, x128, 13);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vfpclassps-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vfpclassps-1.c
new file mode 100644
index 00000000000..174903c7676
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vfpclassps-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vfpclasspsy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclasspsx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclasspsy\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfpclasspsx\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\]\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256 x256;
+volatile __m128 x128;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  m = _mm256_fpclass_ps_mask (x256, 13);
+  m = _mm_fpclass_ps_mask (x128, 13);
+  m = _mm256_mask_fpclass_ps_mask (2, x256, 13);
+  m = _mm_mask_fpclass_ps_mask (2, x128, 13);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vinsertf64x2-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vinsertf64x2-1.c
new file mode 100644
index 00000000000..5a196844e76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vinsertf64x2-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vinsertf64x2\[^\n\]*ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vinsertf64x2\[^\n\]*ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vinsertf64x2\[^\n\]*ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d x;
+volatile __m128d y;
+
+void extern
+avx10_1_test (void)
+{
+  x = _mm256_insertf64x2 (x, y, 1);
+  x = _mm256_mask_insertf64x2 (x, 2, x, y, 1);
+  x = _mm256_maskz_insertf64x2 (2, x, y, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vinserti64x2-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vinserti64x2-1.c
new file mode 100644
index 00000000000..69ee06f0f08
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vinserti64x2-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vinserti64x2\[^\n\]*ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vinserti64x2\[^\n\]*ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vinserti64x2\[^\n\]*ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include <immintrin.h>
+
+volatile __m256i x;
+volatile __m128i y;
+
+void extern
+avx10_1_test (void)
+{
+  x = _mm256_inserti64x2 (x, y, 1);
+  x = _mm256_mask_inserti64x2 (x, 2, x, y, 1);
+  x = _mm256_maskz_inserti64x2 (2, x, y, 1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vrangepd-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vrangepd-1.c
new file mode 100644
index 00000000000..995b6de64ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vrangepd-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vrangepd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vrangepd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vrangepd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangepd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangepd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangepd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256d y;
+volatile __m128d x;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_range_pd (y, y, 15);
+  x = _mm_range_pd (x, x, 15);
+
+  y = _mm256_mask_range_pd (y, m, y, y, 15);
+  x = _mm_mask_range_pd (x, m, x, x, 15);
+
+  y = _mm256_maskz_range_pd (m, y, y, 15);
+  x = _mm_maskz_range_pd (m, x, x, 15);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vrangeps-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vrangeps-1.c
new file mode 100644
index 00000000000..faf844a9ae1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vrangeps-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vrangeps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vrangeps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vrangeps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangeps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangeps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangeps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m256 y;
+volatile __m128 x;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  y = _mm256_range_ps (y, y, 15);
+  x = _mm_range_ps (x, x, 15);
+
+  y = _mm256_mask_range_ps (y, m, y, y, 15);
+  x = _mm_mask_range_ps (x, m, x, x, 15);
+
+  y = _mm256_maskz_range_ps (m, y, y, 15);
+  x = _mm_maskz_range_ps (m, x, x, 15);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vreducepd-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vreducepd-1.c
new file mode 100644
index 00000000000..76bcec0d2f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vreducepd-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vreducepd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vreducepd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vreducepd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreducepd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreducepd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreducepd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+#define IMM 123
+
+volatile __m256d x1;
+volatile __m128d x2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  x1 = _mm256_reduce_pd (x1, IMM);
+  x2 = _mm_reduce_pd (x2, IMM);
+
+  x1 = _mm256_mask_reduce_pd (x1, m, x1, IMM);
+  x2 = _mm_mask_reduce_pd (x2, m, x2, IMM);
+
+  x1 = _mm256_maskz_reduce_pd (m, x1, IMM);
+  x2 = _mm_maskz_reduce_pd (m, x2, IMM);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vreduceps-1.c b/gcc/testsuite/gcc.target/i386/avx10_1-vreduceps-1.c
new file mode 100644
index 00000000000..9d3aeb362fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-vreduceps-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.1 -O2" } */
+/* { dg-final { scan-assembler-times "vreduceps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vreduceps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vreduceps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreduceps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreduceps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vreduceps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+#define IMM 123
+
+volatile __m256 x1;
+volatile __m128 x2;
+volatile __mmask8 m;
+
+void extern
+avx10_1_test (void)
+{
+  x1 = _mm256_reduce_ps (x1, IMM);
+  x2 = _mm_reduce_ps (x2, IMM);
+
+  x1 = _mm256_mask_reduce_ps (x1, m, x1, IMM);
+  x2 = _mm_mask_reduce_ps (x2, m, x2, IMM);
+
+  x1 = _mm256_maskz_reduce_ps (m, x1, IMM);
+  x2 = _mm_maskz_reduce_ps (m, x2, IMM);
+}
-- 
2.31.1
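
Editorial sketch (not part of the patch): each extract test above exercises
three intrinsic forms: plain, merge-masked (_mask_) and zero-masked
(_maskz_). The scalar C model below is a minimal illustration of how those
forms differ for the 256-bit to 128-bit double extract. The function names
are hypothetical and the code does not use the real intrinsics, so it runs
on any host.

```c
#include <stdint.h>

/* Plain form: copy the two doubles of 128-bit lane IMM (0 or 1) of SRC
   into DST.  */
void
model_extract64x2 (double dst[2], const double src[4], int imm)
{
  dst[0] = src[2 * (imm & 1)];
  dst[1] = src[2 * (imm & 1) + 1];
}

/* Merge-masking: element i comes from the extracted lane when bit i of K
   is set, otherwise it is taken from W.  DST may alias W.  */
void
model_mask_extract64x2 (double dst[2], const double w[2], uint8_t k,
			const double src[4], int imm)
{
  double tmp[2];
  model_extract64x2 (tmp, src, imm);
  for (int i = 0; i < 2; i++)
    dst[i] = ((k >> i) & 1) ? tmp[i] : w[i];
}

/* Zero-masking: element i comes from the extracted lane when bit i of K
   is set, otherwise it is zero.  */
void
model_maskz_extract64x2 (double dst[2], uint8_t k, const double src[4],
			 int imm)
{
  double tmp[2];
  model_extract64x2 (tmp, src, imm);
  for (int i = 0; i < 2; i++)
    dst[i] = ((k >> i) & 1) ? tmp[i] : 0.0;
}
```

With mask 2 and immediate 1, the values used in the tests, only the upper
element of the result is taken from the source's high lane. The lower
element is preserved from W in the merge form and cleared in the zero form.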


* Re: [PATCH 6/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins
  2023-08-08  7:20 ` [PATCH 6/6] " Haochen Jiang
@ 2023-08-16  2:36   ` Hongtao Liu
  0 siblings, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-16  2:36 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: gcc-patches, ubizjak, hongtao.liu

On Tue, Aug 8, 2023 at 3:23 PM Haochen Jiang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx10_1-vextractf64x2-1.c: New test.
>         * gcc.target/i386/avx10_1-vextracti64x2-1.c: Ditto.
>         * gcc.target/i386/avx10_1-vfpclasspd-1.c: Ditto.
>         * gcc.target/i386/avx10_1-vfpclassps-1.c: Ditto.
>         * gcc.target/i386/avx10_1-vinsertf64x2-1.c: Ditto.
>         * gcc.target/i386/avx10_1-vinserti64x2-1.c: Ditto.
>         * gcc.target/i386/avx10_1-vrangepd-1.c: Ditto.
>         * gcc.target/i386/avx10_1-vrangeps-1.c: Ditto.
>         * gcc.target/i386/avx10_1-vreducepd-1.c: Ditto.
>         * gcc.target/i386/avx10_1-vreduceps-1.c: Ditto.
OK for all 6 patches (please wait an extra 24 hours before committing, in
case there are objections).


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
                   ` (8 preceding siblings ...)
  2023-08-08  7:20 ` [PATCH 6/6] " Haochen Jiang
@ 2023-08-08  7:42 ` Jakub Jelinek
  2023-08-08  8:14   ` Jiang, Haochen
  2023-08-08 19:55 ` Joseph Myers
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-08  7:42 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: gcc-patches, ubizjak, hongtao.liu

On Tue, Aug 08, 2023 at 03:13:09PM +0800, Haochen Jiang via Gcc-patches wrote:
> We will send out our initial support of AVX10 and some sample patches in this
> mailing thread. And there will be more coming up afterwards. Therefore, we would
> like to share our proposed AVX10 design in GCC.
> 
> Here is a quick introduction to AVX10:
>   - AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
>   - Since the introduction of AVX10, we would like to establish a common,
>     converged vector instruction set across all Intel architectures, including
>     Xeon Server, Atom Server and Clients.
>   - The default maximum vector size for AVX10 will be 256 bit, while 512 bit is
>     optional.

So, what does this imply for the current ISAs?
The expectation in lots of config/i386/* is that -mavx512f / TARGET_AVX512F
means 512 bit vector support is available and most of the various -mavx512XXX
options imply -mavx512f (and -mno-avx512f turns those off).  And if
-mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able to
access [xy]mm16+).
Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
AVX512F CPUID even when the 512-bit vectors aren't present? What happens if
one mixes the -mavx10* options together with -mno-avx512vl or similar
options?  Will -mno-avx512f still imply -mno-avx512vl etc.?

	Jakub


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-08  7:42 ` Intel AVX10.1 Compiler Design and Support Jakub Jelinek
@ 2023-08-08  8:14   ` Jiang, Haochen
  2023-08-08 12:44     ` Richard Biener
  0 siblings, 1 reply; 88+ messages in thread
From: Jiang, Haochen @ 2023-08-08  8:14 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, ubizjak, Liu, Hongtao

Hi Jakub,

> So, what does this imply for the current ISAs?

AVX10 will imply AVX2 on the ISA level. And we treat AVX10 as an
independent ISA feature set. Although they share the same instructions and
encodings, AVX10 and AVX512 are conceptually independent features, which
means they are orthogonal.

> The expectations in lots of config/i386/* is that -mavx512f / TARGET_AVX512F
> means 512 bit vector support is available and most of the various -mavx512XXX
> options imply -mavx512f (and -mno-avx512f turns those off).  And if
> -mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
> 512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able to
> access [xy]mm16+).

For AVX10, the 128/256/scalar versions of the instructions are always there, and
so is [xy]mm16+. The 512 version is "optional", and the user needs to request it
in the options. When the 512 version is enabled, the 128/256/scalar versions are
also enabled, which is roughly the reverse of the current AVX512F/AVX512VL relation.

Since we treat AVX10 and AVX512 as orthogonal, we will add OR logic to the current
patterns, as shown in our AVX512DQ+VL sample patches.

> Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
> AVX512F CPUID even when the 512-bit vectors aren't present? What happens if
> one mixes the -mavx10* options together with -mno-avx512vl or similar
> options?  Will -mno-avx512f still imply -mno-avx512vl etc.?

For the CPUID part, AVX10 and AVX512 have different enumeration. Only Xeon Server
will have AVX512 related CPUIDs for backward compatibility. For GNR, it will be
AVX512F, AVX512VL, AVX512CD, AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI,
AVX512_VNNI, AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AVX512_VBMI2,
AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom Server and
client will only have AVX10 CPUIDs with 256 bit support set.
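
To illustrate the difference in CPUID reporting, here is a minimal sketch of a
decoder for the AVX10 bits. The bit positions are my reading of the AVX10
architecture specification (CPUID.(EAX=07H,ECX=01H):EDX[19] for presence, and
CPUID.(EAX=24H,ECX=00H):EBX for version and vector lengths); please treat them
as assumptions and check the current revision. The struct and function names
are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch.  */
struct avx10_info
{
  bool present;          /* AVX10 is enumerated at all.  */
  unsigned version;      /* 1 would mean AVX10.1.  */
  unsigned max_vl_bits;  /* 128, 256 or 512; 0 when not enumerated.  */
};

/* Pure decoder: it takes raw register values, so it can be exercised
   without executing CPUID on the host.
   Assumed layout (verify against the specification):
     leaf 07H, subleaf 1, EDX bit 19  -> AVX10 present
     leaf 24H, subleaf 0, EBX[7:0]    -> AVX10 version
     leaf 24H, subleaf 0, EBX bits 16/17/18 -> 128/256/512-bit support  */
struct avx10_info
decode_avx10 (uint32_t leaf7_sub1_edx, uint32_t leaf24_ebx)
{
  struct avx10_info info = { false, 0, 0 };

  if (((leaf7_sub1_edx >> 19) & 1u) == 0)
    return info;

  info.present = true;
  info.version = leaf24_ebx & 0xffu;
  if ((leaf24_ebx >> 18) & 1u)
    info.max_vl_bits = 512;
  else if ((leaf24_ebx >> 17) & 1u)
    info.max_vl_bits = 256;
  else if ((leaf24_ebx >> 16) & 1u)
    info.max_vl_bits = 128;
  return info;
}
```

On real hardware the two inputs could be read with __get_cpuid_count from
GCC's <cpuid.h>. Under the assumptions above, a GNR-like part would decode to
a 512-bit maximum and an Atom Server or client part to 256 bits.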

-mno-avx512f will still imply -mno-avx512vl.

As we mentioned below, we don't recommend that users combine the AVX10 and legacy
AVX512 options. We understand that there will be different opinions on how the
compiler should behave for some controversial option combinations.

If someone does mix the options, the golden rule is that we use OR logic.
Therefore, enabling either feature will turn on the shared instructions, even if the other
feature is not mentioned or is disabled. That is why we emit a warning in some scenarios,
which is also mentioned in the letter.
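
To make the OR rule concrete, here is a minimal sketch. It is not the actual
i386 backend code: the struct and function names are hypothetical and are not
the real TARGET_* macros. It only models the condition for a 128/256-bit
instruction that is shared by AVX512DQ+VL and AVX10.1.

```c
#include <stdbool.h>

/* Hypothetical model of the relevant ISA flags after option processing.  */
struct isa_flags
{
  bool avx512dq;
  bool avx512vl;
  bool avx10_1;   /* AVX10.1 with at least 256-bit vector support.  */
};

/* A 128/256-bit instruction such as vreducepd on %ymm belongs to both
   feature sets.  Under the OR rule it is usable when either the legacy
   AVX512DQ+AVX512VL pair or AVX10.1 is enabled, regardless of the state
   of the other side.  */
bool
shared_256bit_insn_enabled (struct isa_flags f)
{
  return (f.avx512dq && f.avx512vl) || f.avx10_1;
}
```

In this model, something like -mavx10.1 -mno-avx512vl still leaves the shared
256-bit form enabled, which is the kind of combination where a warning helps.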

Thx,
Haochen

> 
> 	Jakub

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-08  8:14   ` Jiang, Haochen
@ 2023-08-08 12:44     ` Richard Biener
  2023-08-09  2:06       ` Hongtao Liu
  2023-08-09  6:30       ` Jiang, Haochen
  0 siblings, 2 replies; 88+ messages in thread
From: Richard Biener @ 2023-08-08 12:44 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: Jakub Jelinek, gcc-patches, ubizjak, Liu, Hongtao

On Tue, Aug 8, 2023 at 10:15 AM Jiang, Haochen via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi Jakub,
>
> > So, what does this imply for the current ISAs?
>
> AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
> independent ISA feature set. Although sharing the same instructions and
> encodings, AVX10 and AVX512 are conceptual independent features, which
> means they are orthogonal.
>
> > The expectations in lots of config/i386/* is that -mavx512f / TARGET_AVX512F
> > means 512 bit vector support is available and most of the various -mavx512XXX
> > options imply -mavx512f (and -mno-avx512f turns those off).  And if
> > -mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
> > 512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able to
> > access [xy]mm16+).
>
> For AVX10, the 128/256/scalar version of the instructions are always there, and
> also for [xy]mm16+. 512 version is "optional", which needs user to indicate them
> in options. When 512 version is enabled, 128/256/scalar version is also enabled,
> which is kind of reverse relation between the current AVX512F/AVX512VL.
>
> Since we take AVX10 and AVX512 are orthogonal, we will add OR logic for the current
> pattern, which is shown in our AVX512DQ+VL sample patches.

Hmm, so it sounds like AVX10 is currently, at the 10.1 level, a way to specify
AVX512F and AVX512VL "differently", so wouldn't it make sense to make it
complement those only so one can use, say, -mavx10 -mno-avx512bf16 to disable
parts of the former AVX512 ISA one doesn't like to get code generated for?
-mavx10 would then enable all the existing sub-AVX512 ISAs?

> > Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
> > AVX512F CPUID even when the 512-bit vectors aren't present? What happens if
> > one mixes the -mavx10* options together with -mno-avx512vl or similar
> > options?  Will -mno-avx512f still imply -mno-avx512vl etc.?
>
> For the CPUID part, AVX10 and AVX512 have different emulation. Only Xeon Server
> will have AVX512 related CPUIDs for backward compatibility. For GNR, it will be
> AVX512F, AVX512VL, AVX512CD, AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI,
> AVX512_VNNI, AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AV512_VBMI2,
> AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom Server and
> client will only have AVX10 CPUIDs with 256 bit support set.
>
> -mno-avx512f will still imply -mno-avx512vl.
>
> As we mentioned below, we don't recommend users to combine the AVX10 and legacy
> AVX512 options. We understand that there will be different opinions on what should
> compiler behave on some controversial option combinations.
>
> If there is someone mixes the options, the golden rule is that we are using OR logic.
> Therefore, enabling either feature will turn on the shared instructions, no matter the other
> feature is not mentioned or closed. That is why we are emitting warning for some scenarios,
> which is also mentioned in the letter.

I'm refraining from commenting on the senselessness of AVX10 as you're
likely on the same receiving side as us.

Thanks,
Richard.

> Thx,
> Haochen
>
> >
> >       Jakub
>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-08 12:44     ` Richard Biener
@ 2023-08-09  2:06       ` Hongtao Liu
  2023-08-09  2:08         ` Hongtao Liu
  2023-08-09  6:30       ` Jiang, Haochen
  1 sibling, 1 reply; 88+ messages in thread
From: Hongtao Liu @ 2023-08-09  2:06 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jiang, Haochen, Jakub Jelinek, gcc-patches, ubizjak, Liu, Hongtao

On Tue, Aug 8, 2023 at 8:45 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Tue, Aug 8, 2023 at 10:15 AM Jiang, Haochen via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi Jakub,
> >
> > > So, what does this imply for the current ISAs?
> >
> > AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
> > independent ISA feature set. Although sharing the same instructions and
> > encodings, AVX10 and AVX512 are conceptual independent features, which
> > means they are orthogonal.
> >
> > > The expectations in lots of config/i386/* is that -mavx512f / TARGET_AVX512F
> > > means 512 bit vector support is available and most of the various -mavx512XXX
> > > options imply -mavx512f (and -mno-avx512f turns those off).  And if
> > > -mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
> > > 512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able to
> > > access [xy]mm16+).
> >
> > For AVX10, the 128/256/scalar version of the instructions are always there, and
> > also for [xy]mm16+. 512 version is "optional", which needs user to indicate them
> > in options. When 512 version is enabled, 128/256/scalar version is also enabled,
> > which is kind of reverse relation between the current AVX512F/AVX512VL.
> >
> > Since we take AVX10 and AVX512 are orthogonal, we will add OR logic for the current
> > pattern, which is shown in our AVX512DQ+VL sample patches.
>
> Hmm, so it sounds like AVX10 is currently, at the 10.1 level, a way to specify
> AVX512F and AVX512VL "differently", so wouldn't it make sense to make it
In the future there will be platforms that only support AVX10.x-256, but not
AVX512 stuff; it doesn't make much sense on such a platform to disable
part of AVX512.
We really want to make AVX10.x an indivisible feature, just like other
individual CPUIDs.
> complement those only so one can use, say, -mavx10 -mno-avx512bf16 to disable
> parts of the former AVX512 ISA one doesn't like to get code generated for?
> -mavx10 would then enable all the existing sub-AVX512 ISAs?
Another alternative solution is
>
> > > Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
> > > AVX512F CPUID even when the 512-bit vectors aren't present? What happens if
> > > one mixes the -mavx10* options together with -mno-avx512vl or similar
> > > options?  Will -mno-avx512f still imply -mno-avx512vl etc.?
> >
> > For the CPUID part, AVX10 and AVX512 have different emulation. Only Xeon Server
> > will have AVX512 related CPUIDs for backward compatibility. For GNR, it will be
> > AVX512F, AVX512VL, AVX512CD, AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI,
> > AVX512_VNNI, AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AV512_VBMI2,
> > AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom Server and
> > client will only have AVX10 CPUIDs with 256 bit support set.
> >
> > -mno-avx512f will still imply -mno-avx512vl.
> >
> > As we mentioned below, we don't recommend users to combine the AVX10 and legacy
> > AVX512 options. We understand that there will be different opinions on what should
> > compiler behave on some controversial option combinations.
> >
> > If there is someone mixes the options, the golden rule is that we are using OR logic.
> > Therefore, enabling either feature will turn on the shared instructions, no matter the other
> > feature is not mentioned or closed. That is why we are emitting warning for some scenarios,
> > which is also mentioned in the letter.
>
> I'm refraining from commenting on the senslesness of AVX10 as you're
> likely on the same
> receiving side as us.
>
> Thanks,
> Richard.
>
> > Thx,
> > Haochen
> >
> > >
> > >       Jakub
> >



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  2:06       ` Hongtao Liu
@ 2023-08-09  2:08         ` Hongtao Liu
  0 siblings, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-09  2:08 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jiang, Haochen, Jakub Jelinek, gcc-patches, ubizjak, Liu, Hongtao

On Wed, Aug 9, 2023 at 10:06 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 8, 2023 at 8:45 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Tue, Aug 8, 2023 at 10:15 AM Jiang, Haochen via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi Jakub,
> > >
> > > > So, what does this imply for the current ISAs?
> > >
> > > AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
> > > independent ISA feature set. Although sharing the same instructions and
> > > encodings, AVX10 and AVX512 are conceptual independent features, which
> > > means they are orthogonal.
> > >
> > > > The expectations in lots of config/i386/* is that -mavx512f / TARGET_AVX512F
> > > > means 512 bit vector support is available and most of the various -mavx512XXX
> > > > options imply -mavx512f (and -mno-avx512f turns those off).  And if
> > > > -mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
> > > > 512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able to
> > > > access [xy]mm16+).
> > >
> > > For AVX10, the 128/256/scalar version of the instructions are always there, and
> > > also for [xy]mm16+. 512 version is "optional", which needs user to indicate them
> > > in options. When 512 version is enabled, 128/256/scalar version is also enabled,
> > > which is kind of reverse relation between the current AVX512F/AVX512VL.
> > >
> > > Since we take AVX10 and AVX512 are orthogonal, we will add OR logic for the current
> > > pattern, which is shown in our AVX512DQ+VL sample patches.
> >
> > Hmm, so it sounds like AVX10 is currently, at the 10.1 level, a way to specify
> > AVX512F and AVX512VL "differently", so wouldn't it make sense to make it
> In the future there're plantfomrs only support AVX10.x-256, but not
> AVX512 stuffs, it doesn't make much sense on that platfrom to disable
> part of AVX512.
> We really want to make AVX10.x a indivisible features, just like other
> individual CPUID.
> > complement those only so one can use, say, -mavx10 -mno-avx512bf16 to disable
> > parts of the former AVX512 ISA one doesn't like to get code generated for?
> > -mavx10 would then enable all the existing sub-AVX512 ISAs?
> Another alternative solution is
is to split AVX512 into AVX512-256 and AVX512-512, like AVX512F-256,
AVX512FP16-256, AVX512FP16-512, and make AVX10.1-256
imply those AVX512-256 features and AVX10.1-512 imply the AVX512-512 ones.
> >
> > > > Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
> > > > AVX512F CPUID even when the 512-bit vectors aren't present? What happens if
> > > > one mixes the -mavx10* options together with -mno-avx512vl or similar
> > > > options?  Will -mno-avx512f still imply -mno-avx512vl etc.?
> > >
> > > For the CPUID part, AVX10 and AVX512 have different emulation. Only Xeon Server
> > > will have AVX512 related CPUIDs for backward compatibility. For GNR, it will be
> > > AVX512F, AVX512VL, AVX512CD, AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI,
> > > AVX512_VNNI, AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AV512_VBMI2,
> > > AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom Server and
> > > client will only have AVX10 CPUIDs with 256 bit support set.
> > >
> > > -mno-avx512f will still imply -mno-avx512vl.
> > >
> > > As we mentioned below, we don't recommend users to combine the AVX10 and legacy
> > > AVX512 options. We understand that there will be different opinions on what should
> > > compiler behave on some controversial option combinations.
> > >
> > > If there is someone mixes the options, the golden rule is that we are using OR logic.
> > > Therefore, enabling either feature will turn on the shared instructions, no matter the other
> > > feature is not mentioned or closed. That is why we are emitting warning for some scenarios,
> > > which is also mentioned in the letter.
> >
> > I'm refraining from commenting on the senslesness of AVX10 as you're
> > likely on the same
> > receiving side as us.
> >
> > Thanks,
> > Richard.
> >
> > > Thx,
> > > Haochen
> > >
> > > >
> > > >       Jakub
> > >
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-08 12:44     ` Richard Biener
  2023-08-09  2:06       ` Hongtao Liu
@ 2023-08-09  6:30       ` Jiang, Haochen
  1 sibling, 0 replies; 88+ messages in thread
From: Jiang, Haochen @ 2023-08-09  6:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, gcc-patches, ubizjak, Liu, Hongtao

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Tuesday, August 8, 2023 8:45 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: Jakub Jelinek <jakub@redhat.com>; gcc-patches@gcc.gnu.org;
> ubizjak@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 8, 2023 at 10:15 AM Jiang, Haochen via Gcc-patches <gcc-
> patches@gcc.gnu.org> wrote:
> >
> > Hi Jakub,
> >
> > > So, what does this imply for the current ISAs?
> >
> > AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
> > independent ISA feature set. Although sharing the same instructions
> > and encodings, AVX10 and AVX512 are conceptual independent features,
> > which means they are orthogonal.
> >
> > > The expectations in lots of config/i386/* is that -mavx512f /
> > > TARGET_AVX512F means 512 bit vector support is available and most of
> > > the various -mavx512XXX options imply -mavx512f (and -mno-avx512f
> > > turns those off).  And if -mavx512vl / TARGET_AVX512VL isn't
> > > available, tons of places just use 512-bit EVEX instructions for
> > > 256-bit or 128-bit stuff (mostly to be able to access [xy]mm16+).
> >
> > For AVX10, the 128/256/scalar version of the instructions are always
> > there, and also for [xy]mm16+. 512 version is "optional", which needs
> > user to indicate them in options. When 512 version is enabled,
> > 128/256/scalar version is also enabled, which is kind of reverse relation
> > between the current AVX512F/AVX512VL.
> >
> > Since we take AVX10 and AVX512 are orthogonal, we will add OR logic
> > for the current pattern, which is shown in our AVX512DQ+VL sample patches.
> 
> Hmm, so it sounds like AVX10 is currently, at the 10.1 level, a way to specify
> AVX512F and AVX512VL "differently", so wouldn't it make sense to make it
> complement those only so one can use, say, -mavx10 -mno-avx512bf16 to disable
> parts of the former AVX512 ISA one doesn't like to get code generated for?
> -mavx10 would then enable all the existing sub-AVX512 ISAs?
>

We treat AVX10 and AVX512 as two independent ISAs.

Therefore, it would be quite weird for an option of one ISA to disable something
in another, unrelated ISA.
I don't think -mavx10.1 -mno-avx512f should disable anything.

Thx,
Haochen

> > > Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they
> > > have AVX512F CPUID even when the 512-bit vectors aren't present?
> > > What happens if one mixes the -mavx10* options together with
> > > -mno-avx512vl or similar options?  Will -mno-avx512f still imply -mno-avx512vl etc.?
> >
> > For the CPUID part, AVX10 and AVX512 have different emulation. Only
> > Xeon Server will have AVX512 related CPUIDs for backward
> > compatibility. For GNR, it will be AVX512F, AVX512VL, AVX512CD,
> > AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI, AVX512_VNNI,
> > AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AV512_VBMI2,
> > AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom
> Server and client will only have AVX10 CPUIDs with 256 bit support set.
> >
> > -mno-avx512f will still imply -mno-avx512vl.
> >
> > As we mentioned below, we don't recommend users to combine the AVX10
> > and legacy
> > AVX512 options. We understand that there will be different opinions on
> > what should compiler behave on some controversial option combinations.
> >
> > If there is someone mixes the options, the golden rule is that we are using OR logic.
> > Therefore, enabling either feature will turn on the shared
> > instructions, no matter the other feature is not mentioned or closed.
> > That is why we are emitting warning for some scenarios, which is also
> > mentioned in the letter.
> 
> I'm refraining from commenting on the senslesness of AVX10 as you're likely on
> the same receiving side as us.
> 
> Thanks,
> Richard.
> 
> > Thx,
> > Haochen
> >
> > >
> > >       Jakub
> >

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
                   ` (9 preceding siblings ...)
  2023-08-08  7:42 ` Intel AVX10.1 Compiler Design and Support Jakub Jelinek
@ 2023-08-08 19:55 ` Joseph Myers
  2023-08-09  1:21   ` Hongtao Liu
  2023-08-10 15:08 ` Jiang, Haochen
  2023-08-19 22:44 ` ZiNgA BuRgA
  12 siblings, 1 reply; 88+ messages in thread
From: Joseph Myers @ 2023-08-08 19:55 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: gcc-patches, ubizjak, hongtao.liu

Do you have any comments on the interaction of AVX10 with the 
micro-architecture levels defined in the ABI (and supported with 
glibc-hwcaps directories in glibc)?  Given that the levels are cumulative, 
should we take it that any future levels will be ones supporting 512-bit 
vector width for AVX10 (because x86-64-v4 requires the current AVX512F, 
AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors 
that only support 256-bit vector width will be considered to match the 
x86-64-v3 micro-architecture level but not any higher level?
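
A minimal sketch of the concern, with hypothetical names: since x86-64-v4 is
defined in terms of the AVX512 CPUID features listed above, a selection
routine along these lines would never place a 256-bit-only AVX10 processor
above x86-64-v3. The v3_baseline flag is a simplification that stands for
everything x86-64-v3 requires, and levels below v3 are not modelled.

```c
#include <stdbool.h>

/* Hypothetical feature summary for this sketch.  */
struct cpu_features
{
  bool v3_baseline;  /* All features required by x86-64-v3.  */
  bool avx512f, avx512bw, avx512cd, avx512dq, avx512vl;
  bool avx10_256;    /* AVX10.N with a 256-bit maximum vector width.  */
};

/* Return 4 or 3 for the matching level, or 0 for "below v3" (not
   modelled here).  avx10_256 is deliberately not consulted: x86-64-v4
   is specified through the AVX512 features, which a 256-bit-only part
   would not report.  */
int
highest_level (const struct cpu_features *f)
{
  if (!f->v3_baseline)
    return 0;
  if (f->avx512f && f->avx512bw && f->avx512cd
      && f->avx512dq && f->avx512vl)
    return 4;
  return 3;
}
```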

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-08 19:55 ` Joseph Myers
@ 2023-08-09  1:21   ` Hongtao Liu
  2023-08-09  2:14     ` Hongtao Liu
  0 siblings, 1 reply; 88+ messages in thread
From: Hongtao Liu @ 2023-08-09  1:21 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Haochen Jiang, gcc-patches, ubizjak, hongtao.liu, Zhang, Annita

On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> Do you have any comments on the interaction of AVX10 with the
> micro-architecture levels defined in the ABI (and supported with
> glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
> should we take it that any future levels will be ones supporting 512-bit
> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
> that only support 256-bit vector width will be considered to match the
> x86-64-v3 micro-architecture level but not any higher level?
This is actually something we really want to discuss in the community.
Our proposal for x86-64-v5 is AVX10.2-256 (implying AVX10.1-256) + APX.
One big reason is that Intel E-cores will only support AVX10 at 256 bits; if we
want to use x86-64-v5 across server and client, it's better to make
256-bit the default.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  1:21   ` Hongtao Liu
@ 2023-08-09  2:14     ` Hongtao Liu
  2023-08-09  2:18       ` Hongtao Liu
  2023-08-09  7:17       ` Jan Beulich
  0 siblings, 2 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-09  2:14 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Haochen Jiang, gcc-patches, ubizjak, hongtao.liu, Zhang, Annita,
	phoebe.wang, x86-64-abi, llvm-dev, Craig Topper

On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers <joseph@codesourcery.com> wrote:
> >
> > Do you have any comments on the interaction of AVX10 with the
> > micro-architecture levels defined in the ABI (and supported with
> > glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
> > should we take it that any future levels will be ones supporting 512-bit
> > vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
> > AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
> > that only support 256-bit vector width will be considered to match the
> > x86-64-v3 micro-architecture level but not any higher level?
> This is actually something we really want to discuss in the community,
> our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
> One big reason is Intel E-core will only support AVX10 256-bit, if we
> want to use x86-64-v5 accross  server and client, it's better to
> 256-bit default.
+ ABI and LLVM folked for this topic.
> >
> > --
> > Joseph S. Myers
> > joseph@codesourcery.com
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  2:14     ` Hongtao Liu
@ 2023-08-09  2:18       ` Hongtao Liu
  2023-08-09  3:59         ` Wang, Phoebe
  2023-08-09  4:01         ` Phoebe Wang
  2023-08-09  7:17       ` Jan Beulich
  1 sibling, 2 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-09  2:18 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Haochen Jiang, gcc-patches, ubizjak, hongtao.liu, Zhang, Annita,
	phoebe.wang, x86-64-abi, llvm-dev, Craig Topper

On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers <joseph@codesourcery.com> wrote:
> > >
> > > Do you have any comments on the interaction of AVX10 with the
> > > micro-architecture levels defined in the ABI (and supported with
> > > glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
> > > should we take it that any future levels will be ones supporting 512-bit
> > > vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
> > > AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
> > > that only support 256-bit vector width will be considered to match the
> > > x86-64-v3 micro-architecture level but not any higher level?
> > This is actually something we really want to discuss in the community,
> > our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
> > One big reason is Intel E-core will only support AVX10 256-bit, if we
> > want to use x86-64-v5 accross  server and client, it's better to
> > 256-bit default.
> + ABI and LLVM folked for this topic.
s/folked/folks/

> > >
> > > --
> > > Joseph S. Myers
> > > joseph@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-09  2:18       ` Hongtao Liu
@ 2023-08-09  3:59         ` Wang, Phoebe
  2023-08-09 20:43           ` Joseph Myers
  2023-08-09  4:01         ` Phoebe Wang
  1 sibling, 1 reply; 88+ messages in thread
From: Wang, Phoebe @ 2023-08-09  3:59 UTC (permalink / raw)
  To: Hongtao Liu, Joseph Myers
  Cc: Jiang, Haochen, gcc-patches, ubizjak, Liu, Hongtao, Zhang,
	Annita, x86-64-abi, llvm-dev, Craig Topper

I have some proposals for unifying the ABI on AVX10 for both 256-bit and 512-bit.

Proposal 1: Promote the attribute from AVX10-256 to AVX10-512 for any function that passes or returns vectors of 512 bits or more.
Problem: The binary cannot run on an AVX10-256-only target.
Reason:
When users try to pass/return a 512-bit vector, they should be aware that the code becomes target dependent. Users should be taught not to do this on 256-bit targets, and unexpected things will happen if they insist.
Actually, ICC and MSVC have already chosen to promote for such arguments: https://godbolt.org/z/vcrf9qW5z
I think if the compiler has to choose between two kinds of misbehavior, a wrong result or a crash due to an illegal instruction, the latter is definitely better than the former.
In this way, we can also declare that x86-64-v5 inherits from x86-64-v4 and interoperates with previous versions.

Proposal 2: Abort compilation when the user tries to pass/return 512-bit vectors.
Reason: This turns a possible run-time crash into a compile-time error.

Proposal 3: Change the ABI so that 512-bit vectors are always passed/returned in memory.
Reason: We expect AVX10-256 to be a universal configuration, and in most scenarios 512-bit vectors won't bring performance improvements. So we can sacrifice a little 512-bit performance to achieve interoperability between AVX10-256 and AVX10-512. In this way, there won't be any runtime issues in the future either.
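
As a source-level picture of what is at stake, here is a minimal sketch using the GNU vector extension. The type and function names are made up for this example. The first function takes a 512-bit vector by value, so how it is passed depends on whether 512-bit registers are enabled for the translation unit. The second takes it through a pointer, which is roughly what Proposal 3 would make the compiler do implicitly for every 512-bit argument.

```c
#include <stddef.h>

/* Passing a 64-byte vector by value without 512-bit support enabled
   triggers an ABI note in GCC; it is silenced here on purpose.  */
#pragma GCC diagnostic ignored "-Wpsabi"

/* 64-byte (512-bit) vector of floats via the GNU vector extension.  */
typedef float v16sf __attribute__ ((vector_size (64)));

/* By value: the calling convention for V depends on the enabled ISA.  */
float
sum_by_value (v16sf v)
{
  float s = 0.0f;
  for (size_t i = 0; i < 16; i++)
    s += v[i];
  return s;
}

/* Through memory: the same interface works whatever the vector width
   of the caller and callee.  */
float
sum_by_memory (const v16sf *v)
{
  float s = 0.0f;
  for (size_t i = 0; i < 16; i++)
    s += (*v)[i];
  return s;
}
```

Both functions compute the same result. The difference is only in whether caller and callee must agree on the availability of 512-bit registers.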

Thanks
Phoebe

-----Original Message-----
From: Hongtao Liu <crazylht@gmail.com> 
Sent: Wednesday, August 9, 2023 10:19 AM
To: Joseph Myers <joseph@codesourcery.com>
Cc: Jiang, Haochen <haochen.jiang@intel.com>; gcc-patches@gcc.gnu.org; ubizjak@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Zhang, Annita <annita.zhang@intel.com>; Wang, Phoebe <phoebe.wang@intel.com>; x86-64-abi <x86-64-abi@googlegroups.com>; llvm-dev <llvm-dev@lists.llvm.org>; Craig Topper <craig.topper@gmail.com>
Subject: Re: Intel AVX10.1 Compiler Design and Support

On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers <joseph@codesourcery.com> wrote:
> > >
> > > Do you have any comments on the interaction of AVX10 with the 
> > > micro-architecture levels defined in the ABI (and supported with 
> > > glibc-hwcaps directories in glibc)?  Given that the levels are 
> > > cumulative, should we take it that any future levels will be ones 
> > > supporting 512-bit vector width for AVX10 (because x86-64-v4 
> > > requires the current AVX512F, AVX512BW, AVX512CD, AVX512DQ and 
> > > AVX512VL) - and so any future processors that only support 256-bit 
> > > vector width will be considered to match the
> > > x86-64-v3 micro-architecture level but not any higher level?
> > This is actually something we really want to discuss in the 
> > community, our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
> > One big reason is that Intel E-cores will only support 256-bit AVX10; if
> > we want to use x86-64-v5 across server and client, it's better to
> > default to 256-bit.
> + ABI and LLVM folked for this topic.
s/folked/folks/

> > >
> > > --
> > > Joseph S. Myers
> > > joseph@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao

--
BR,
Hongtao


* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-09  3:59         ` Wang, Phoebe
@ 2023-08-09 20:43           ` Joseph Myers
  2023-08-09 20:49             ` Jakub Jelinek
  2023-08-10 12:36             ` Phoebe Wang
  0 siblings, 2 replies; 88+ messages in thread
From: Joseph Myers @ 2023-08-09 20:43 UTC (permalink / raw)
  To: Wang, Phoebe
  Cc: Hongtao Liu, Jiang, Haochen, gcc-patches, ubizjak, Liu, Hongtao,
	Zhang, Annita, x86-64-abi, llvm-dev, Craig Topper

On Wed, 9 Aug 2023, Wang, Phoebe via Gcc-patches wrote:

> Proposal 3: Change the ABI of 512-bit vector and always be 
> passed/returned from memory.

Changing ABIs like that for existing code that has worked for some time on 
existing hardware is a bad idea.

At this point it seems appropriate to remind people of another ABI 
consideration for vector extensions.  glibc's libmvec defines vector 
versions of various functions, including AVX512 ones (of course those 
function versions only work on hardware with the relevant instructions).  
glibc's headers use both _Pragma ("omp declare simd notinbranch") and 
__attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler 
including those headers, what function variants are available in glibc.
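
A minimal sketch of what such a declaration looks like from the compiler's side. The function my_scale and the DECL_SIMD macro are hypothetical stand-ins, not glibc names; only the attribute spelling is taken from the text above. With GCC, the attribute asks for vector variants of the function in addition to the scalar one, and a loop calling it may be vectorized using them.

```c
/* Hypothetical stand-in for a libmvec-style header declaration.
   my_scale and DECL_SIMD are illustrative names, not part of glibc.  */
#if defined(__GNUC__) && !defined(__clang__)
# define DECL_SIMD __attribute__ ((__simd__ ("notinbranch")))
#else
# define DECL_SIMD /* attribute not assumed to be available */
#endif

/* What a header would expose: the scalar prototype plus the promise
   that vector variants exist.  */
DECL_SIMD double my_scale (double x);

/* The scalar definition; with GCC the attribute requests vector
   clones of it as well.  */
DECL_SIMD double
my_scale (double x)
{
  return 2.0 * x;
}

/* A caller: the compiler may replace the scalar calls in this loop
   with calls to a vector variant, using that variant's fixed ABI.  */
void
scale_array (double *out, const double *in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = my_scale (in[i]);
}
```

Which vector variants the attribute implies, and how they take their arguments, is exactly the part that has to stay fixed for existing headers.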

Existing glibc versions need to continue to work with new compiler 
versions.  That is, it's part of the ABI, which must remain stable, 
exactly which function versions the above pragma and attribute imply are 
available - and of course the details of how those functions versions take 
arguments / return results are also part of the ABI (it would be OK for a 
new compiler to choose not to use some of those vector versions, but not 
to start calling them with a different ABI).

Maybe you'll want to add new vector function versions, with different 
interfaces, to libmvec in future.  If so, you need a *different* pragma or 
attribute to declare to the compiler that the libmvec version using that 
pragma or attribute has the additional functions - so new compilers using 
the existing header will not try to generate calls to new function 
versions that don't exist in that glibc version (but new compilers using a 
new header version from new glibc will see the new pragma or attribute and 
so be able to generate the relevant calls to new functions).  And once 
you've defined the ABI for such a new pragma or attribute, that itself 
then becomes a stable interface - so if you end up with vector extensions 
involving yet another set of interfaces, they need another corresponding 
new pragma / attribute for libmvec to declare to the compiler that the new 
interfaces exist.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09 20:43           ` Joseph Myers
@ 2023-08-09 20:49             ` Jakub Jelinek
  2023-08-10 12:36             ` Phoebe Wang
  1 sibling, 0 replies; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-09 20:49 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Wang, Phoebe, Hongtao Liu, Jiang, Haochen, gcc-patches, ubizjak,
	Liu, Hongtao, Zhang, Annita, x86-64-abi, llvm-dev, Craig Topper

On Wed, Aug 09, 2023 at 08:43:00PM +0000, Joseph Myers wrote:
> At this point it seems appropriate to remind people of another ABI 
> consideration for vector extensions.  glibc's libmvec defines vector 
> versions of various functions, including AVX512 ones (of course those 
> function versions only work on hardware with the relevant instructions).  
> glibc's headers use both _Pragma ("omp declare simd notinbranch") and 
> __attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler 
> including those headers, what function variants are available in glibc.

For omp declare simd or the simd attribute, that simply implies that the
variants with 512-bit vectors may only be called from code compiled with
-mavx512f or -mavx10.1-512 (or whatever the switch ends up being called),
not from -mavx10.1.
We shouldn't change that ABI because of AVX10.
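
As a rough sketch of that rule in terms of existing predefined macros only (how an AVX10.1-256 switch would be reflected in macros is not assumed here, and the function names are made up for illustration):

```c
/* Widest vector register width, in bits, that the options used for
   this translation unit allow.  Only long-standing x86 macros are
   checked; anything AVX10-specific is deliberately left out.  */
int
max_vector_bits (void)
{
#if defined(__AVX512F__)
  return 512;
#elif defined(__AVX__)
  return 256;
#elif defined(__SSE2__)
  return 128;
#else
  return 0;
#endif
}

/* The 512-bit variants of a declare-simd function are only usable
   from code for which 512-bit registers are enabled.  */
int
may_call_512bit_variant (void)
{
  return max_vector_bits () >= 512;
}
```

Code built without 512-bit support simply never selects the 512-bit variants; the variants themselves keep their ABI.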

	Jakub



* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09 20:43           ` Joseph Myers
  2023-08-09 20:49             ` Jakub Jelinek
@ 2023-08-10 12:36             ` Phoebe Wang
  2023-08-10 12:45               ` Richard Biener
  1 sibling, 1 reply; 88+ messages in thread
From: Phoebe Wang @ 2023-08-10 12:36 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Wang, Phoebe, Hongtao Liu, Jiang, Haochen, gcc-patches, ubizjak,
	Liu, Hongtao, Zhang, Annita, x86-64-abi, llvm-dev, Craig Topper

>  Changing ABIs like that for existing code that has worked for some time on
>  existing hardware is a bad idea.

I agree, so Proposal 3 is the last choice.

The goal of the proposals is to solve the ABI incompatibility between
AVX10-256 and AVX10-512 when passing/returning 512-bit vectors. So we are
discussing the default ABI rather than other vector variants.

If you believe that changing the 512-bit ABI (the 512-bit version) is a bad
idea, how about Proposals 1 and 2? I don't want to call the non-512-bit
version an ABI because it doesn't provide interoperability between 256-bit
and 512-bit targets. Besides, LLVM also behaves differently from GCC on
non-512-bit targets. It is a good time to solve the problem together if we make
the 512-bit ABI consistent and target independent. WDYT?

Thanks
Phoebe



* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-10 12:36             ` Phoebe Wang
@ 2023-08-10 12:45               ` Richard Biener
  2023-08-10 13:12                 ` Phoebe Wang
  2023-08-10 22:16                 ` Joseph Myers
  0 siblings, 2 replies; 88+ messages in thread
From: Richard Biener @ 2023-08-10 12:45 UTC (permalink / raw)
  To: Phoebe Wang
  Cc: Joseph Myers, Wang, Phoebe, Hongtao Liu, Jiang, Haochen,
	gcc-patches, ubizjak, Liu, Hongtao, Zhang, Annita, x86-64-abi,
	llvm-dev, Craig Topper

On Thu, Aug 10, 2023 at 2:37 PM Phoebe Wang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> >  Changing ABIs like that for existing code that has worked for some time on
> >  existing hardware is a bad idea.
>
> I agree, so Proposal 3 is the last choice.
>
> The target of the proposals is to solve the ABI incompatible issue between
> AVX10-256 and AVX10-512 when passing/returning 512 vectors. So we are
> discussing the default ABI rather than other vector variants.
>
> If you believe that changing 512-bit ABI (the 512-bit version) is a bad
> idea, how about Proposal 1 and 2? I don't want to call the non 512-bit
> version an ABI because it doesn't provide the interaction between 256-bit
> and 512-bit targets. Besides, LLVM also behaves differently with GCC on non
> 512-bit targets. It is a good time to solve the problem together if we make
> the 512-bit ABI consistent and target independent. WDYT?

Isn't this situation similar to the undefined ABI for passing generic
vectors (via __attribute__((vector_size))) that do not map to vectors supported
by the current ISA?  There are cases like vector<2> char or vector<1> double
to consider, for example, that would fit in the low part of a supported vector
register, and, as in the AVX512 case, vectors that are larger than any supported
vector register.

The psABI should have some simple rule covering all of the above I think.
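
To make the cases concrete, here is a small sketch using the GCC/Clang generic vector extension. The typedef and function names are illustrative only. The helper goes through pointers on purpose, so that nothing in it depends on how a vector would be passed by value.

```c
#include <stddef.h>

/* Generic vectors of the kinds mentioned above.  */
typedef char   v2qi  __attribute__ ((vector_size (2)));   /* vector<2> char   */
typedef double v1df  __attribute__ ((vector_size (8)));   /* vector<1> double */
typedef float  v16sf __attribute__ ((vector_size (64)));  /* 512 bits wide    */

/* The sizes are fixed by the type, whatever the enabled ISA is.  */
size_t size_of_v2qi (void)  { return sizeof (v2qi); }
size_t size_of_v1df (void)  { return sizeof (v1df); }
size_t size_of_v16sf (void) { return sizeof (v16sf); }

/* Element-wise addition through memory.  The compiler lowers the
   64-byte operation to whatever registers are actually available,
   so this works with or without 512-bit support.  */
void
add_v16sf_mem (const v16sf *a, const v16sf *b, v16sf *out)
{
  *out = *a + *b;
}
```

The open question is only what happens when such types cross a function boundary by value.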

Richard.



* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-10 12:45               ` Richard Biener
@ 2023-08-10 13:12                 ` Phoebe Wang
  2023-08-10 13:30                   ` Jan Beulich
  2023-08-10 22:16                 ` Joseph Myers
  1 sibling, 1 reply; 88+ messages in thread
From: Phoebe Wang @ 2023-08-10 13:12 UTC (permalink / raw)
  To: Richard Biener
  Cc: Joseph Myers, Wang, Phoebe, Hongtao Liu, Jiang, Haochen,
	gcc-patches, ubizjak, Liu, Hongtao, Zhang, Annita, x86-64-abi,
	llvm-dev, Craig Topper

>  The psABI should have some simple rule covering all of the above I think.

That the psABI has a rule for this case doesn't mean the rule is a well-defined
ABI in practice. A well-defined ABI should guarantee that code is 1) interlinkable
across different compile options within the same compiler; 2) interlinkable across
different compilers. The non-512-bit version fails on both counts.

1) is more important than 2) and becomes more critical on AVX10 targets,
because we expect AVX10-256 to be the general setting for binaries that can run
on both AVX10-256 and AVX10-512. It would be common for binaries compiled
with AVX10-256 to be linked with natively built binaries on AVX10-512 targets.

Both 1) and 2) show the problem with the current rule in the psABI. So I
think the psABI should be updated to solve them.

Thanks
Phoebe



* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-10 13:12                 ` Phoebe Wang
@ 2023-08-10 13:30                   ` Jan Beulich
  2023-08-10 13:52                     ` Richard Biener
  2023-08-10 14:15                     ` Jiang, Haochen
  0 siblings, 2 replies; 88+ messages in thread
From: Jan Beulich @ 2023-08-10 13:30 UTC (permalink / raw)
  To: Phoebe Wang
  Cc: Joseph Myers, Wang, Phoebe, Hongtao Liu, Jiang, Haochen,
	gcc-patches, ubizjak, Liu, Hongtao, Zhang, Annita, x86-64-abi,
	llvm-dev, Craig Topper, Richard Biener

On 10.08.2023 15:12, Phoebe Wang wrote:
>>  The psABI should have some simple rule covering all of the above I think.
> 
> psABI has a rule for the case doesn't mean the rule is a well defined ABI
> in practice. A well defined ABI should guarantee 1) interlinkable across
> different compile options within the same compiler; 2) interlinkable across
> different compilers. Both aspects are failed in the non 512-bit version.
> 
> 1) is more important than 2) and becomes more critical on AVX10 targets.
> Because we expect AVX10-256 is a general setting for binaries that can run
> on both AVX10-256 and AVX10-512. It would be common that binaries compiled
> with AVX10-256 may link with native built binaries on AVX10-512 targets.

But you're only describing a pre-existing problem here afaict. Code compiled
with -mavx512f passing __m512 type data to a function compiled with only,
say, -mavx2 won't interoperate properly either. What's worse, imo the psABI
doesn't sufficiently define what __m256 etc actually are. After all these
aren't types defined by the C standard (as opposed to at least most other
types in the respective table there), and you can't really make assumptions
like "this is what certain compilers think this is".

Jan


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-10 13:30                   ` Jan Beulich
@ 2023-08-10 13:52                     ` Richard Biener
  2023-08-10 14:15                     ` Jiang, Haochen
  1 sibling, 0 replies; 88+ messages in thread
From: Richard Biener @ 2023-08-10 13:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Phoebe Wang, Joseph Myers, Wang, Phoebe, Hongtao Liu, Jiang,
	Haochen, gcc-patches, ubizjak, Liu, Hongtao, Zhang, Annita,
	x86-64-abi, llvm-dev, Craig Topper

On Thu, Aug 10, 2023 at 3:31 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 10.08.2023 15:12, Phoebe Wang wrote:
> >>  The psABI should have some simple rule covering all of the above I think.
> >
> > psABI has a rule for the case doesn't mean the rule is a well defined ABI
> > in practice. A well defined ABI should guarantee 1) interlinkable across
> > different compile options within the same compiler; 2) interlinkable across
> > different compilers. Both aspects are failed in the non 512-bit version.
> >
> > 1) is more important than 2) and becomes more critical on AVX10 targets.
> > Because we expect AVX10-256 is a general setting for binaries that can run
> > on both AVX10-256 and AVX10-512. It would be common that binaries compiled
> > with AVX10-256 may link with native built binaries on AVX10-512 targets.
>
> But you're only describing a pre-existing problem here afaict. Code compiled
> with -mavx512f passing __m512 type data to a function compiled with only,
> say, -mavx2 won't interoperate properly either. What's worse, imo the psABI
> doesn't sufficiently define what __m256 etc actually are. After all these
> aren't types defined by the C standard (as opposed to at least most other
> types in the respective table there), and you can't really make assumptions
> like "this is what certain compilers think this is".

You might be able to speak in terms of OpenMP SIMD with simdlen?

Richard.

> Jan


* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-10 13:30                   ` Jan Beulich
  2023-08-10 13:52                     ` Richard Biener
@ 2023-08-10 14:15                     ` Jiang, Haochen
  2023-08-10 15:08                       ` Zhang, Annita
  1 sibling, 1 reply; 88+ messages in thread
From: Jiang, Haochen @ 2023-08-10 14:15 UTC (permalink / raw)
  To: Beulich, Jan, Phoebe Wang
  Cc: Joseph Myers, Wang, Phoebe, Hongtao Liu, gcc-patches, ubizjak,
	Liu, Hongtao, Zhang, Annita, x86-64-abi, llvm-dev, Craig Topper,
	Richard Biener

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, August 10, 2023 9:31 PM
> To: Phoebe Wang <phoebe.pf.w@gmail.com>
> Cc: Joseph Myers <joseph@codesourcery.com>; Wang, Phoebe
> <phoebe.wang@intel.com>; Hongtao Liu <crazylht@gmail.com>; Jiang, Haochen
> <haochen.jiang@intel.com>; gcc-patches@gcc.gnu.org; ubizjak@gmail.com; Liu,
> Hongtao <hongtao.liu@intel.com>; Zhang, Annita <annita.zhang@intel.com>;
> x86-64-abi <x86-64-abi@googlegroups.com>; llvm-dev <llvm-
> dev@lists.llvm.org>; Craig Topper <craig.topper@gmail.com>; Richard Biener
> <richard.guenther@gmail.com>
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On 10.08.2023 15:12, Phoebe Wang wrote:
> >>  The psABI should have some simple rule covering all of the above I think.
> >
> > psABI has a rule for the case doesn't mean the rule is a well defined
> > ABI in practice. A well defined ABI should guarantee 1) interlinkable
> > across different compile options within the same compiler; 2)
> > interlinkable across different compilers. Both aspects are failed in the
> > non 512-bit version.
> >
> > 1) is more important than 2) and becomes more critical on AVX10 targets.
> > Because we expect AVX10-256 is a general setting for binaries that can
> > run on both AVX10-256 and AVX10-512. It would be common that binaries
> > compiled with AVX10-256 may link with native built binaries on AVX10-512
> > targets.

IMO it is not acceptable for AVX10-256 code to use zmm registers.

If I have to choose among the three proposals, the second is the best.

But the best choice, I suppose, is to keep what we are doing currently, which is
passing them in memory and emitting a warning. It is reasonable behavior.

Thx,
Haochen



* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-10 14:15                     ` Jiang, Haochen
@ 2023-08-10 15:08                       ` Zhang, Annita
  2023-08-10 15:18                         ` Jakub Jelinek
  0 siblings, 1 reply; 88+ messages in thread
From: Zhang, Annita @ 2023-08-10 15:08 UTC (permalink / raw)
  To: Jiang, Haochen, Beulich, Jan, Phoebe Wang
  Cc: Joseph Myers, Wang, Phoebe, Hongtao Liu, gcc-patches, ubizjak,
	Liu, Hongtao, x86-64-abi, llvm-dev, Craig Topper, Richard Biener

For the ABI change proposal, I'd suggest raising a discussion in the x86-64-abi group.

Thx,
Annita



* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-10 15:08                       ` Zhang, Annita
@ 2023-08-10 15:18                         ` Jakub Jelinek
  0 siblings, 0 replies; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-10 15:18 UTC (permalink / raw)
  To: Zhang, Annita
  Cc: Jiang, Haochen, Beulich, Jan, Phoebe Wang, Joseph Myers, Wang,
	Phoebe, Hongtao Liu, gcc-patches, ubizjak, Liu, Hongtao,
	x86-64-abi, llvm-dev, Craig Topper, Richard Biener

On Thu, Aug 10, 2023 at 03:08:11PM +0000, Zhang, Annita via Gcc-patches wrote:
> > IMO it is not acceptable for AVX10-256 to generate zmm registers.
> > 
> > If I have to choose among the three proposal, the second is better.
> > 
> > But the best choice I suppose is to keep what we are doing currently, which is
> > passing them in memory and emit a warning. It is a reasonable behavior.

Completely agree on this.  If anything in the psABI should be changed, that
IMHO would be just a clarification, if it is not clear enough, that when __m256
and/or __m512 are passed on ISAs which do not support them, they are passed
in memory.  That is what the psABI was clearly effectively saying before the
__m256 resp. __m512 support has been added there.
So yes, warn and use memory if the ISA doesn't support those.
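
A minimal sketch of the situation being described, using a generic 64-byte vector rather than __m512 so that it does not depend on <immintrin.h>. The names are illustrative. When this is built without 512-bit support enabled, GCC typically reports a -Wpsabi diagnostic for the by-value parameter and return value, and the code still runs correctly.

```c
/* A 512-bit generic vector type; the typedef name is illustrative.  */
typedef float v16sf __attribute__ ((vector_size (64)));

/* Passing and returning the vector by value is where the calling
   convention matters: with 512-bit registers enabled a register can
   be used; without them the value has to travel some other way.  */
v16sf
add_v16sf (v16sf a, v16sf b)
{
  return a + b;
}

/* Small driver: fill every lane, add, and read one lane back.  */
float
first_lane_of_sum (float x, float y)
{
  v16sf a, b;
  for (int i = 0; i < 16; i++)
    {
      a[i] = x;
      b[i] = y;
    }
  v16sf r = add_v16sf (a, b);
  return r[0];
}
```

Within one set of compile options the caller and callee agree, so the result is correct either way; the concern in this thread is objects built with different options calling each other.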

	Jakub


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-10 12:45               ` Richard Biener
  2023-08-10 13:12                 ` Phoebe Wang
@ 2023-08-10 22:16                 ` Joseph Myers
  1 sibling, 0 replies; 88+ messages in thread
From: Joseph Myers @ 2023-08-10 22:16 UTC (permalink / raw)
  To: Richard Biener
  Cc: Phoebe Wang, Wang, Phoebe, Hongtao Liu, Jiang, Haochen,
	gcc-patches, ubizjak, Liu, Hongtao, Zhang, Annita, x86-64-abi,
	llvm-dev, Craig Topper

On Thu, 10 Aug 2023, Richard Biener via Gcc-patches wrote:

> Isn't this situation similar to the not defined ABI when passing generic
> vectors (via __attribute__((vector_size))) that do not map to vectors supported
> by the current ISA?  There's cases like vector<2> char or vector<1> double
> to consider for example that would fit in a lowpart of a supported vector
> register and as in the AVX512 case vectors that are larger than any supported
> vector register.

Note there is a difference in some cases (I don't know if this is relevant 
for x86) between "vectors supported by the current ISA" and "vectors whose 
ABI, for ISAs that do support them, can be implemented using the current 
ISA".

Specifically, when working on the VFP AAPCS variant for 32-bit Arm, I made 
sure that generic vectors had the same ABI on all processors supporting 
VFP, whether or not the vector parts of the instruction set were supported 
on the chosen processor.  On 32-bit Arm that's possible because vector 
registers are the same as floating-point registers (and even the 
single-precision-only VFP variant has suitable load and store 
instructions).

Of course if your ABI for some kinds of vectors uses registers not 
supported on all processors, and on the processors that do support those 
registers you use that ABI for corresponding generic vectors, then you 
won't be able to be compatible with that ABI for those generic vectors on 
processors without those registers.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  2:18       ` Hongtao Liu
  2023-08-09  3:59         ` Wang, Phoebe
@ 2023-08-09  4:01         ` Phoebe Wang
  2023-08-09  5:37           ` Richard Biener
  1 sibling, 1 reply; 88+ messages in thread
From: Phoebe Wang @ 2023-08-09  4:01 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Joseph Myers, Haochen Jiang, gcc-patches, ubizjak, hongtao.liu,
	Zhang, Annita, phoebe.wang, x86-64-abi, llvm-dev, Craig Topper

[-- Attachment #1: Type: text/plain, Size: 3545 bytes --]

I have some proposals about unifying ABI on AVX10 for both 256-bit and
512-bit.

Proposal 1: Promote the target attribute from AVX10-256 to AVX10-512 for any
function that passes or returns vectors of 512 bits or more.

Problem: The binary cannot run on an AVX10-256-only target.

Reason:

When users pass or return a 512-bit vector, they should be aware that the
code becomes target dependent. Users should be taught not to do this on
256-bit targets, and that unexpected things will happen if they insist.

Actually, ICC and MSVC have already chosen to promote for the argument:
https://godbolt.org/z/vcrf9qW5z I think that if the compiler has to choose
between two misbehaviors, a wrong result or a crash due to an illegal
instruction, the latter is definitely better than the former.

In this way, we can also declare that x86-64-v5 inherits from x86-64-v4 and
stays compatible with previous versions.

Proposal 2: Abort compilation when the user tries to pass/return 512-bit
vectors.

Reason: This turns a possible run-time crash into a compile-time error.

Proposal 3: Change the ABI so that 512-bit vectors are always
passed/returned in memory.

Reason: We expect AVX10-256 to be a universal configuration, and in most
scenarios 512-bit vectors won't bring performance improvements. So we can
sacrifice a little 512-bit performance to achieve interoperability between
AVX10-256 and AVX10-512. In this way, there won't be any run-time issue in
the future either.

Thanks

Phoebe

On Wed, Aug 9, 2023 at 10:18, Hongtao Liu <crazylht@gmail.com> wrote:

> On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers <joseph@codesourcery.com>
> wrote:
> > > >
> > > > Do you have any comments on the interaction of AVX10 with the
> > > > micro-architecture levels defined in the ABI (and supported with
> > > > glibc-hwcaps directories in glibc)?  Given that the levels are
> cumulative,
> > > > should we take it that any future levels will be ones supporting
> 512-bit
> > > > vector width for AVX10 (because x86-64-v4 requires the current
> AVX512F,
> > > > AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future
> processors
> > > > that only support 256-bit vector width will be considered to match
> the
> > > > x86-64-v3 micro-architecture level but not any higher level?
> > > This is actually something we really want to discuss in the community,
> > > our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
> > > One big reason is Intel E-core will only support AVX10 256-bit, if we
> > > want to use x86-64-v5 accross  server and client, it's better to
> > > 256-bit default.
> > + ABI and LLVM folked for this topic.
> s/folked/folks/
>
> > > >
> > > > --
> > > > Joseph S. Myers
> > > > joseph@codesourcery.com
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao
>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  4:01         ` Phoebe Wang
@ 2023-08-09  5:37           ` Richard Biener
  2023-08-09  6:24             ` Jiang, Haochen
  2023-08-09  8:14             ` Florian Weimer
  0 siblings, 2 replies; 88+ messages in thread
From: Richard Biener @ 2023-08-09  5:37 UTC (permalink / raw)
  To: Phoebe Wang
  Cc: Hongtao Liu, Joseph Myers, Haochen Jiang, gcc-patches, ubizjak,
	hongtao.liu, Zhang, Annita, phoebe.wang, x86-64-abi, llvm-dev,
	Craig Topper



> On 09.08.2023 at 06:02, Phoebe Wang via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> I have some proposals about unifying ABI on AVX10 for both 256-bit and
> 512-bit.
> 
> 
> 
> Proposal 1: Promote attribute from AVX10-256 to AVX10-512 for any function
> which has 512-bit or above vectors in passing/returning arguments.
> 
> Problem: Binary cannot run on AVX10-256 only target.
> 
> Reason:
> 
> When user tries to pass/return 512-bit vector, they should be aware of it
> will become target dependent. User should be taught not to use it on
> 256-bit targets and there will be unexpected things happening if they
> insist.
> 
> Actually, ICC and MSVC already have chosen to promote for the argument:
> https://godbolt.org/z/vcrf9qW5z I think if compiler have to choose the
> misbehavior between fail in result and crash due to illegal instruction,
> the latter is definitely better than the former.
> 
> In this way, we can also declare x86-64-v5 is inherit from x86-64-v4 and
> has the interaction with previous versions.
> 
> 
> 
> Proposal 2: Abort compilation when user tries to pass/return 512-bit
> vectors.
> 
> Reason: This turns possible run time crash into compile time error.
> 
> 
> 
> Proposal 3: Change the ABI of 512-bit vector and always be passed/returned
> from memory.

I don’t think we can realistically change the ABI.  If we could, passing
them in two 256-bit registers would be possible as well.

Note I fully expect Intel to turn around and implement 512 bits on a
256-bit data path on the E cores in 5 years.  And it will take at least
that long for AVX10 to take off (look at AVX512 for this and how they
cautiously chose to include bf16 to cut off Zen4).  So IMHO we shouldn’t
worry at all and just wait and see for AVX42 to arrive.

Richard 

> Reason: We expect AVX10-256 is a universal configuration and in most
> scenarios, 512-bit vector won't bring performance improvements. So we can
> sacrifice a little 512-bit performance to achieve the interaction between
> AVX10-256 and AVX10-512. In this way, there won't have any runtime issue in
> the future either.
> 
> 
> 
> Thanks
> 
> Phoebe
> 
> On Wed, Aug 9, 2023 at 10:18, Hongtao Liu <crazylht@gmail.com> wrote:
> 
>>> On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu <crazylht@gmail.com> wrote:
>>> 
>>> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
>>>> 
>>>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers <joseph@codesourcery.com>
>> wrote:
>>>>> 
>>>>> Do you have any comments on the interaction of AVX10 with the
>>>>> micro-architecture levels defined in the ABI (and supported with
>>>>> glibc-hwcaps directories in glibc)?  Given that the levels are
>> cumulative,
>>>>> should we take it that any future levels will be ones supporting
>> 512-bit
>>>>> vector width for AVX10 (because x86-64-v4 requires the current
>> AVX512F,
>>>>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future
>> processors
>>>>> that only support 256-bit vector width will be considered to match
>> the
>>>>> x86-64-v3 micro-architecture level but not any higher level?
>>>> This is actually something we really want to discuss in the community,
>>>> our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
>>>> One big reason is Intel E-core will only support AVX10 256-bit, if we
>>>> want to use x86-64-v5 accross  server and client, it's better to
>>>> 256-bit default.
>>> + ABI and LLVM folked for this topic.
>> s/folked/folks/
>> 
>>>>> 
>>>>> --
>>>>> Joseph S. Myers
>>>>> joseph@codesourcery.com
>>>> 
>>>> 
>>>> 
>>>> --
>>>> BR,
>>>> Hongtao
>>> 
>>> 
>>> 
>>> --
>>> BR,
>>> Hongtao
>> 
>> 
>> 
>> --
>> BR,
>> Hongtao
>> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-09  5:37           ` Richard Biener
@ 2023-08-09  6:24             ` Jiang, Haochen
  2023-08-09  8:14             ` Florian Weimer
  1 sibling, 0 replies; 88+ messages in thread
From: Jiang, Haochen @ 2023-08-09  6:24 UTC (permalink / raw)
  To: Richard Biener, Phoebe Wang
  Cc: Hongtao Liu, Joseph Myers, gcc-patches, ubizjak, Liu, Hongtao,
	Zhang, Annita, Wang, Phoebe, x86-64-abi, llvm-dev, Craig Topper

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, August 9, 2023 1:38 PM
> To: Phoebe Wang <phoebe.pf.w@gmail.com>
> Cc: Hongtao Liu <crazylht@gmail.com>; Joseph Myers
> <joseph@codesourcery.com>; Jiang, Haochen <haochen.jiang@intel.com>; gcc-
> patches@gcc.gnu.org; ubizjak@gmail.com; Liu, Hongtao
> <hongtao.liu@intel.com>; Zhang, Annita <annita.zhang@intel.com>; Wang,
> Phoebe <phoebe.wang@intel.com>; x86-64-abi <x86-64-
> abi@googlegroups.com>; llvm-dev <llvm-dev@lists.llvm.org>; Craig Topper
> <craig.topper@gmail.com>
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> 
> 
> > On 09.08.2023 at 06:02, Phoebe Wang via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >
> > I have some proposals about unifying ABI on AVX10 for both 256-bit
> > and 512-bit.
> >
> >
> >
> > Proposal 1: Promote attribute from AVX10-256 to AVX10-512 for any
> > function which has 512-bit or above vectors in passing/returning arguments.
> >
> > Problem: Binary cannot run on AVX10-256 only target.
> >
> > Reason:
> >
> > When user tries to pass/return 512-bit vector, they should be aware of
> > it will become target dependent. User should be taught not to use it
> > on 256-bit targets and there will be unexpected things happening if
> > they insist.
> >
> > Actually, ICC and MSVC already have chosen to promote for the argument:
> > https://godbolt.org/z/vcrf9qW5z I think if compiler have to choose the
> > misbehavior between fail in result and crash due to illegal
> > instruction, the latter is definitely better than the former.
> >
> > In this way, we can also declare x86-64-v5 is inherit from x86-64-v4
> > and has the interaction with previous versions.
> >
> >
> >
> > Proposal 2: Abort compilation when user tries to pass/return 512-bit
> > vectors.
> >
> > Reason: This turns possible run time crash into compile time error.
> >
> >
> >
> > Proposal 3: Change the ABI of 512-bit vector and always be
> > passed/returned from memory.
> 
> I don’t think we can realistically change the ABI.  If we could passing them in two
> 256bit registers would be possible as well.
> 
> Note I fully expect intel to turn around and implement 512 bits on a 256 but data
> path on the E cores in 5 years.  And it will take at least that time for AVX10 to take
> off (look at AVX512 for this and how they cautionously chose to include bf16 to
> cut off Zen4).  So IMHO we shouldn’t worry at all and just wait and see for AVX42
> to arrive.

Let me try to clarify the whole thing.

I suppose Phoebe's "change" is based on LLVM.

In GCC, the current behavior is to pass a 512-bit vector in memory when there
is no 512-bit support. But when there is support, everything should be passed
in registers.

For AVX10, I would prefer to keep this pattern. But if most of you want to
change it, I have no objection, since AVX10 is a new start.

Thx,
Haochen

> 
> Richard
> 
> > Reason: We expect AVX10-256 is a universal configuration and in most
> > scenarios, 512-bit vector won't bring performance improvements. So we
> > can sacrifice a little 512-bit performance to achieve the interaction
> > between
> > AVX10-256 and AVX10-512. In this way, there won't have any runtime
> > issue in the future either.
> >
> >
> >
> > Thanks
> >
> > Phoebe
> >
> > On Wed, Aug 9, 2023 at 10:18, Hongtao Liu <crazylht@gmail.com> wrote:
> >
> >>> On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >>>
> >>> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >>>>
> >>>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers
> >>>> <joseph@codesourcery.com>
> >> wrote:
> >>>>>
> >>>>> Do you have any comments on the interaction of AVX10 with the
> >>>>> micro-architecture levels defined in the ABI (and supported with
> >>>>> glibc-hwcaps directories in glibc)?  Given that the levels are
> >> cumulative,
> >>>>> should we take it that any future levels will be ones supporting
> >> 512-bit
> >>>>> vector width for AVX10 (because x86-64-v4 requires the current
> >> AVX512F,
> >>>>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future
> >> processors
> >>>>> that only support 256-bit vector width will be considered to match
> >> the
> >>>>> x86-64-v3 micro-architecture level but not any higher level?
> >>>> This is actually something we really want to discuss in the
> >>>> community, our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-
> 256) + APX.
> >>>> One big reason is Intel E-core will only support AVX10 256-bit, if
> >>>> we want to use x86-64-v5 accross  server and client, it's better to
> >>>> 256-bit default.
> >>> + ABI and LLVM folked for this topic.
> >> s/folked/folks/
> >>
> >>>>>
> >>>>> --
> >>>>> Joseph S. Myers
> >>>>> joseph@codesourcery.com
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> BR,
> >>>> Hongtao
> >>>
> >>>
> >>>
> >>> --
> >>> BR,
> >>> Hongtao
> >>
> >>
> >>
> >> --
> >> BR,
> >> Hongtao
> >>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  5:37           ` Richard Biener
  2023-08-09  6:24             ` Jiang, Haochen
@ 2023-08-09  8:14             ` Florian Weimer
  2023-08-09  8:24               ` Hongtao Liu
  1 sibling, 1 reply; 88+ messages in thread
From: Florian Weimer @ 2023-08-09  8:14 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches
  Cc: Phoebe Wang, Richard Biener, Hongtao Liu, Joseph Myers,
	Haochen Jiang, ubizjak, hongtao.liu, Zhang, Annita, phoebe.wang,
	x86-64-abi, llvm-dev, Craig Topper

* Richard Biener via Gcc-patches:

> I don’t think we can realistically change the ABI.  If we could
> passing them in two 256bit registers would be possible as well.
>
> Note I fully expect intel to turn around and implement 512 bits on a
> 256 but data path on the E cores in 5 years.  And it will take at
> least that time for AVX10 to take off (look at AVX512 for this and how
> they cautionously chose to include bf16 to cut off Zen4).  So IMHO we
> shouldn’t worry at all and just wait and see for AVX42 to arrive.

Yes, the direction is a bit unclear.  In retrospect, we could have
defined x86-64-v4 to use 256 bit vector width, so it could eventually be
compatible with AVX10; it's also what current Intel CPUs prefer (and
past, with the exception of the Xeon Phi line).  But in the meantime,
AMD has started to ship CPUs that seem to prefer 512 bit vectors,
despite having a double pumped implementation.  (Disclaimer: All CPU
preferences inferred from current compiler tuning defaults, not actual
experiments. 8-/)

To me, this looks like we may have defined x86-64-v4 prematurely, and
this suggests we should wait a bit to see where things are heading.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  8:14             ` Florian Weimer
@ 2023-08-09  8:24               ` Hongtao Liu
  0 siblings, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-09  8:24 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Richard Biener via Gcc-patches, Phoebe Wang, Richard Biener,
	Joseph Myers, Haochen Jiang, ubizjak, hongtao.liu, Zhang, Annita,
	phoebe.wang, x86-64-abi, llvm-dev, Craig Topper

On Wed, Aug 9, 2023 at 4:14 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Richard Biener via Gcc-patches:
>
> > I don’t think we can realistically change the ABI.  If we could
> > passing them in two 256bit registers would be possible as well.
> >
> > Note I fully expect intel to turn around and implement 512 bits on a
> > 256 but data path on the E cores in 5 years.  And it will take at
> > least that time for AVX10 to take off (look at AVX512 for this and how
> > they cautionously chose to include bf16 to cut off Zen4).  So IMHO we
> > shouldn’t worry at all and just wait and see for AVX42 to arrive.
>
> Yes, the direction is a bit unclear.  In retrospect, we could have
> defined x86-64-v4 to use 256 bit vector width, so it could eventually be
> compatible with AVX10; it's also what current Intel CPUs prefer (and
Note that avx10.x-256 also inhibits the use of 64-bit kmasks, which are
supposed to be used only by zmm instructions.
But in theory, those 64-bit kmask intrinsics can be used standalone,
e.g. kshift/kand/kor.
> past, with the exception of the Xeon Phi line).  But in the meantime,
> AMD has started to ship CPUs that seem to prefer 512 bit vectors,
> despite having a double pumped implementation.  (Disclaimer: All CPU
> preferences inferred from current compiler tuning defaults, not actual
> experiments. 8-/)
>
> To me, this looks like we may have defined x86-64-v4 prematurely, and
> this suggests we should wait a bit to see where things are heading.
>
> Thanks,
> Florian
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  2:14     ` Hongtao Liu
  2023-08-09  2:18       ` Hongtao Liu
@ 2023-08-09  7:17       ` Jan Beulich
  2023-08-09  7:38         ` Hongtao Liu
  1 sibling, 1 reply; 88+ messages in thread
From: Jan Beulich @ 2023-08-09  7:17 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Haochen Jiang, gcc-patches, ubizjak, hongtao.liu, Zhang, Annita,
	phoebe.wang, x86-64-abi, llvm-dev, Craig Topper, Joseph Myers

On 09.08.2023 04:14, Hongtao Liu wrote:
> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
>>
>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers <joseph@codesourcery.com> wrote:
>>>
>>> Do you have any comments on the interaction of AVX10 with the
>>> micro-architecture levels defined in the ABI (and supported with
>>> glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
>>> should we take it that any future levels will be ones supporting 512-bit
>>> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
>>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
>>> that only support 256-bit vector width will be considered to match the
>>> x86-64-v3 micro-architecture level but not any higher level?
>> This is actually something we really want to discuss in the community,
>> our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
>> One big reason is Intel E-core will only support AVX10 256-bit, if we
>> want to use x86-64-v5 accross  server and client, it's better to
>> 256-bit default.

Aiui these ABI levels were intended to be incremental, i.e. higher versions
would include everything earlier ones cover. Without such a guarantee, how
would you propose compatibility checks to be implemented in a way
applicable both forwards and backwards? If a new level is wanted here, then
I guess it could only be something like v3.5.
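
To make the compatibility-check concern concrete, here is a rough sketch
under stated assumptions.  The feature groups and set names below are a
deliberately coarse, hypothetical model (they are not psABI or glibc
identifiers): with cumulative levels a single integer comparison suffices,
while a 256-bit-only AVX10.1 level needs a feature-set inclusion test
because it is neither a subset nor a superset of x86-64-v4.

```c
#include <stdbool.h>

/* Hypothetical, coarse feature groups used only for this illustration. */
enum {
    FEAT_V3       = 1u << 0,  /* everything required up to x86-64-v3 */
    FEAT_EVEX     = 1u << 1,  /* AVX512F/BW/CD/DQ/VL instructions at <= 256 bits */
    FEAT_VEC512   = 1u << 2,  /* 512-bit vector width */
    FEAT_FP16BF16 = 1u << 3   /* avx512fp16, avx512bf16 and similar additions */
};

/* Feature sets for the levels mentioned in the discussion (modelled). */
enum {
    SET_V3         = FEAT_V3,
    SET_V4         = FEAT_V3 | FEAT_EVEX | FEAT_VEC512,
    SET_AVX10_1_256 = FEAT_V3 | FEAT_EVEX | FEAT_FP16BF16,
    SET_AVX10_1_512 = FEAT_V3 | FEAT_EVEX | FEAT_VEC512 | FEAT_FP16BF16
};

/* Cumulative levels: a higher number implies every lower level. */
bool ordered_ok(int cpu_level, int required_level)
{
    return cpu_level >= required_level;
}

/* Branching levels: the CPU must provide every required feature group. */
bool subset_ok(unsigned cpu_features, unsigned required_features)
{
    return (cpu_features & required_features) == required_features;
}
```

In this model subset_ok(SET_AVX10_1_256, SET_V4) and
subset_ok(SET_V4, SET_AVX10_1_256) are both false, so no position in a
single ascending chain describes the 256-bit level correctly.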

Jan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  7:17       ` Jan Beulich
@ 2023-08-09  7:38         ` Hongtao Liu
  2023-08-09  8:04           ` Jan Beulich
  2023-08-09  9:15           ` Florian Weimer
  0 siblings, 2 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-09  7:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Haochen Jiang, gcc-patches, ubizjak, hongtao.liu, Zhang, Annita,
	phoebe.wang, x86-64-abi, llvm-dev, Craig Topper, Joseph Myers

On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 09.08.2023 04:14, Hongtao Liu wrote:
> > On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >>
> >> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers <joseph@codesourcery.com> wrote:
> >>>
> >>> Do you have any comments on the interaction of AVX10 with the
> >>> micro-architecture levels defined in the ABI (and supported with
> >>> glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
> >>> should we take it that any future levels will be ones supporting 512-bit
> >>> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
> >>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
> >>> that only support 256-bit vector width will be considered to match the
> >>> x86-64-v3 micro-architecture level but not any higher level?
> >> This is actually something we really want to discuss in the community,
> >> our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
> >> One big reason is Intel E-core will only support AVX10 256-bit, if we
> >> want to use x86-64-v5 accross  server and client, it's better to
> >> 256-bit default.
>
> Aiui these ABI levels were intended to be incremental, i.e. higher versions
> would include everything earlier ones cover. Without such a guarantee, how
> would you propose compatibility checks to be implemented in a way
Are there many software implementations based on this assumption?
At least in GCC, it's not a big problem; we can adjust the code for the
new micro-architecture level.
> applicable both forwards and backwards? If a new level is wanted here, then
> I guess it could only be something like v3.5.
But if we use avx10.1 as v3.5, it's still not a subset of
x86-64-v4 (avx10.1 contains avx512fp16, avx512bf16, etc., which are not in
x86-64-v4), so the levels would still diverge.
Then the 256-bit part of x86-64-v4 as v3.5? That's too weird to me.

Our main proposal is to make AVX10.x a new micro-architecture level
with a 256-bit default; either v3.5 or v5 would be acceptable if it's
just the name.
>
> Jan



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  7:38         ` Hongtao Liu
@ 2023-08-09  8:04           ` Jan Beulich
  2023-08-09  9:15           ` Florian Weimer
  1 sibling, 0 replies; 88+ messages in thread
From: Jan Beulich @ 2023-08-09  8:04 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Haochen Jiang, gcc-patches, ubizjak, hongtao.liu, Zhang, Annita,
	phoebe.wang, x86-64-abi, llvm-dev, Craig Topper, Joseph Myers

On 09.08.2023 09:38, Hongtao Liu wrote:
> On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 09.08.2023 04:14, Hongtao Liu wrote:
>>> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu <crazylht@gmail.com> wrote:
>>>>
>>>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers <joseph@codesourcery.com> wrote:
>>>>>
>>>>> Do you have any comments on the interaction of AVX10 with the
>>>>> micro-architecture levels defined in the ABI (and supported with
>>>>> glibc-hwcaps directories in glibc)?  Given that the levels are cumulative,
>>>>> should we take it that any future levels will be ones supporting 512-bit
>>>>> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
>>>>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
>>>>> that only support 256-bit vector width will be considered to match the
>>>>> x86-64-v3 micro-architecture level but not any higher level?
>>>> This is actually something we really want to discuss in the community,
>>>> our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256) + APX.
>>>> One big reason is Intel E-core will only support AVX10 256-bit, if we
>>>> want to use x86-64-v5 accross  server and client, it's better to
>>>> 256-bit default.
>>
>> Aiui these ABI levels were intended to be incremental, i.e. higher versions
>> would include everything earlier ones cover. Without such a guarantee, how
>> would you propose compatibility checks to be implemented in a way
> Are there many software implemenation based on this assumption?
> At least in GCC, it's not a big problem, we can adjust code for the
> new micro-architecture level.
>> applicable both forwards and backwards? If a new level is wanted here, then
>> I guess it could only be something like v3.5.
> But if we use avx10.1 as v3.5, it's still not subset of
> x86-64-v4(avx10.1 contains avx512fp16,avx512bf16 .etc which are not in
> x86-64-v4), there will be still a diverge.

Hmm, yes. But something will end up being odd in any event. Versions no
longer being integral values is kind of indicating a "branch", i.e. v4
not being a successor. Maybe v3.1 would be better, for it to then have
possible successors v3.2, v3.3, etc. Of course it would be possible to
"merge" branches back then, into e.g. v5 covering AVX10.2/512 (and
thus fully covering everything that's in v4).

Jan

> Then 256-bit of x86-64-v4 as v3.5? that's too weired to me.
> 
> Our main proposal is to make AVX10.x as new micro-architecture level
> with 256-bit default, either v3.5 or v5 would be acceptable if it's
> just the name.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  7:38         ` Hongtao Liu
  2023-08-09  8:04           ` Jan Beulich
@ 2023-08-09  9:15           ` Florian Weimer
  2023-08-09 10:15             ` Hongtao Liu
  2023-08-09 10:17             ` Zhang, Annita
  1 sibling, 2 replies; 88+ messages in thread
From: Florian Weimer @ 2023-08-09  9:15 UTC (permalink / raw)
  To: Hongtao Liu
  Cc: Jan Beulich, Haochen Jiang, gcc-patches, ubizjak, hongtao.liu,
	Zhang, Annita, phoebe.wang, x86-64-abi, llvm-dev, Craig Topper,
	Joseph Myers

* Hongtao Liu:

> On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich <jbeulich@suse.com> wrote:
>> Aiui these ABI levels were intended to be incremental, i.e. higher versions
>> would include everything earlier ones cover. Without such a guarantee, how
>> would you propose compatibility checks to be implemented in a way

Correct, this was the intent.  But it's mostly to foster adoption and
make it easier for developers to pick the variants that they want to
target custom builds.  If it's an ascending chain, the trade-offs are
simpler.

> Are there many software implemenation based on this assumption?
> At least in GCC, it's not a big problem, we can adjust code for the
> new micro-architecture level.

The glibc framework can deal with alternate choices in principle,
although I'd prefer not to go there for the reasons indicated.

>> applicable both forwards and backwards? If a new level is wanted here, then
>> I guess it could only be something like v3.5.

> But if we use avx10.1 as v3.5, it's still not subset of
> x86-64-v4(avx10.1 contains avx512fp16,avx512bf16 .etc which are not in
> x86-64-v4), there will be still a diverge.
> Then 256-bit of x86-64-v4 as v3.5? that's too weired to me.

The question is whether you want to mandate the 16-bit floating point
extensions.  You might get better adoption if you stay compatible with
shipping CPUs.  Furthermore, the 256-bit tuning apparently benefits
current Intel CPUs, even though they can do 512-bit vectors.

(The thread subject is a bit misleading for this sub-topic, by the way.)

Thanks,
Florian


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-09  9:15           ` Florian Weimer
@ 2023-08-09 10:15             ` Hongtao Liu
  2023-08-09 10:17             ` Zhang, Annita
  1 sibling, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-09 10:15 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Jan Beulich, Haochen Jiang, gcc-patches, ubizjak, hongtao.liu,
	Zhang, Annita, phoebe.wang, x86-64-abi, llvm-dev, Craig Topper,
	Joseph Myers

On Wed, Aug 9, 2023 at 5:15 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Hongtao Liu:
>
> > On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich <jbeulich@suse.com> wrote:
> >> Aiui these ABI levels were intended to be incremental, i.e. higher versions
> >> would include everything earlier ones cover. Without such a guarantee, how
> >> would you propose compatibility checks to be implemented in a way
>
> Correct, this was the intent.  But it's mostly to foster adoption and
> make it easier for developers to pick the variants that they want to
> target custom builds.  If it's an ascending chain, the trade-offs are
> simpler.
>
> > Are there many software implemenation based on this assumption?
> > At least in GCC, it's not a big problem, we can adjust code for the
> > new micro-architecture level.
>
> The glibc framework can deal with alternate choices in principle,
> although I'd prefer not to go there for the reasons indicated.
>
> >> applicable both forwards and backwards? If a new level is wanted here, then
> >> I guess it could only be something like v3.5.
>
> > But if we use avx10.1 as v3.5, it's still not subset of
> > x86-64-v4(avx10.1 contains avx512fp16,avx512bf16 .etc which are not in
> > x86-64-v4), there will be still a diverge.
> > Then 256-bit of x86-64-v4 as v3.5? that's too weired to me.
>
> The question is whether you want to mandate the 16-bit floating point
> extensions.  You might get better adoption if you stay compatible with
> shipping CPUs.  Furthermore, the 256-bit tuning apparently benefits
> current Intel CPUs, even though they can do 512-bit vectors.
Not only 16-bit floating point; the whole picture of AVX512 -> AVX10 is in
"Figure 1-1. Intel® AVX-512 Feature Flags Across Intel® Xeon® Processor
Generations vs. Intel® AVX10"
and "Figure 1-2. Intel® ISA Families and Features"
at https://cdrdv2.intel.com/v1/dl/getContent/784343 (this link is a
direct download of a PDF).



>
> (The thread subject is a bit misleading for this sub-topic, by the way.)
>
> Thanks,
> Florian
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-09  9:15           ` Florian Weimer
  2023-08-09 10:15             ` Hongtao Liu
@ 2023-08-09 10:17             ` Zhang, Annita
  2023-08-09 13:54               ` Michael Matz
  1 sibling, 1 reply; 88+ messages in thread
From: Zhang, Annita @ 2023-08-09 10:17 UTC (permalink / raw)
  To: Florian Weimer, Hongtao Liu
  Cc: Beulich, Jan, Jiang, Haochen, gcc-patches, ubizjak, Liu, Hongtao,
	Wang, Phoebe, x86-64-abi, llvm-dev, Craig Topper, Joseph Myers



> -----Original Message-----
> From: Florian Weimer <fweimer@redhat.com>
> Sent: Wednesday, August 9, 2023 5:16 PM
> To: Hongtao Liu <crazylht@gmail.com>
> Cc: Beulich, Jan <JBeulich@suse.com>; Jiang, Haochen
> <haochen.jiang@intel.com>; gcc-patches@gcc.gnu.org; ubizjak@gmail.com;
> Liu, Hongtao <hongtao.liu@intel.com>; Zhang, Annita
> <annita.zhang@intel.com>; Wang, Phoebe <phoebe.wang@intel.com>; x86-
> 64-abi <x86-64-abi@googlegroups.com>; llvm-dev <llvm-dev@lists.llvm.org>;
> Craig Topper <craig.topper@gmail.com>; Joseph Myers
> <joseph@codesourcery.com>
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> * Hongtao Liu:
> 
> > On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich <jbeulich@suse.com> wrote:
> >> Aiui these ABI levels were intended to be incremental, i.e. higher
> >> versions would include everything earlier ones cover. Without such a
> >> guarantee, how would you propose compatibility checks to be
> >> implemented in a way
> 
> Correct, this was the intent.  But it's mostly to foster adoption and make it
> easier for developers to pick the variants that they want to target custom
> builds.  If it's an ascending chain, the trade-offs are simpler.
> 
> > Are there many software implementations based on this assumption?
> > At least in GCC, it's not a big problem, we can adjust code for the
> > new micro-architecture level.
> 
> The glibc framework can deal with alternate choices in principle, although I'd
> prefer not to go there for the reasons indicated.
> 
> >> applicable both forwards and backwards? If a new level is wanted
> >> here, then I guess it could only be something like v3.5.
> 
> > But if we use avx10.1 as v3.5, it's still not a subset of
> > x86-64-v4 (avx10.1 contains avx512fp16, avx512bf16, etc., which are not in
> > x86-64-v4), so there will still be a divergence.
> > Then 256-bit of x86-64-v4 as v3.5? That's too weird to me.
> 
> The question is whether you want to mandate the 16-bit floating point
> extensions.  You might get better adoption if you stay compatible with shipping
> CPUs.  Furthermore, the 256-bit tuning apparently benefits current Intel CPUs,
> even though they can do 512-bit vectors.
> 
> (The thread subject is a bit misleading for this sub-topic, by the way.)
> 
> Thanks,
> Florian

Since the 256-bit and 512-bit variants diverge starting with AVX10.1 and will continue to do so in future AVX10 versions, I think it's hard for a single, monotonically increasing version number to cover both. Hence I'd like to suggest x86-64-v5 for 512-bit and x86-64-v5-256 for 256-bit, and so on.

Thx,
Annita



 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-09 10:17             ` Zhang, Annita
@ 2023-08-09 13:54               ` Michael Matz
  2023-08-09 14:34                 ` Zhang, Annita
  0 siblings, 1 reply; 88+ messages in thread
From: Michael Matz @ 2023-08-09 13:54 UTC (permalink / raw)
  To: Zhang, Annita
  Cc: Florian Weimer, Hongtao Liu, Beulich, Jan, Jiang, Haochen,
	gcc-patches, ubizjak, Liu, Hongtao, Wang, Phoebe, x86-64-abi,
	llvm-dev, Craig Topper, Joseph Myers

Hello,

On Wed, 9 Aug 2023, Zhang, Annita via Gcc-patches wrote:

> > The question is whether you want to mandate the 16-bit floating point
> > extensions.  You might get better adoption if you stay compatible with shipping
> > CPUs.  Furthermore, the 256-bit tuning apparently benefits current Intel CPUs,
> > even though they can do 512-bit vectors.
> > 
> > (The thread subject is a bit misleading for this sub-topic, by the way.)
> > 
> > Thanks,
> > Florian
> 
> Since 256bit and 512bit are diverged from AVX10.1 and will continue in 
> the future AVX10 versions, I think it's hard to keep a single version 
> number to cover both and increase monotonically. Hence I'd like to 
> suggest x86-64-v5 for 512bit and x86-64-v5-256 for 256bit, and so on.

The raison d'etre for the x86-64-vX scheme is to make life sensible for
distributors.  That goal can only be achieved if this scheme contains only
a few components that have a simple relationship.  That basically means:
one dimension only.  If you now add a second dimension (with and without
-512), we have to add another one when Intel (or whoever else) next does a
marketing stunt for feature "foobar", and we end up with x86-64-v6,
x86-64-v6-512, x86-64-v6-1024, x86-64-v6-foobar, x86-64-v6-512-foobar,
x86-64-v6-1024-foobar.

In short: no.

It isn't the right time anyway to assign meaning to x86-64-v5, as it
wasn't the right time for assigning x86-64-v4 (as we now see).  These are
supposed to reflect generally useful feature sets actually shipped in
generally available CPUs in the market, and be vendor independent.  As
such it's much too early to define v5 based purely on text documents.
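
As an illustration of the one-dimension argument (a sketch only, not part
of any proposal): with an ascending chain, the question "can this CPU run
that build?" is a single integer comparison, while a hypothetical second
dimension for vector width turns it into a partial order in which some
pairs are incomparable.  The names chain_can_run, grid_can_run and
struct level2d, and the v5/v6 numbers used with them, are invented for
this example.

```c
#include <assert.h>
#include <stdbool.h>

/* One dimension: levels form an ascending chain, so compatibility is a
   single comparison of level numbers.  */
bool
chain_can_run (int cpu_level, int binary_level)
{
  assert (cpu_level >= 1 && binary_level >= 1);
  return binary_level <= cpu_level;
}

/* Two dimensions (hypothetical): a version plus a maximum vector width.  */
struct level2d
{
  int version;
  int max_bits;
};

/* Both components must be covered, which makes this a partial order:
   two levels can be incompatible in both directions.  */
bool
grid_can_run (struct level2d cpu, struct level2d bin)
{
  return bin.version <= cpu.version && bin.max_bits <= cpu.max_bits;
}
```

Under these assumptions a "v5-512" machine cannot run a "v6-256" build and
a "v6-256" machine cannot run a "v5-512" build, so neither can be ranked
above the other.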


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-09 13:54               ` Michael Matz
@ 2023-08-09 14:34                 ` Zhang, Annita
  0 siblings, 0 replies; 88+ messages in thread
From: Zhang, Annita @ 2023-08-09 14:34 UTC (permalink / raw)
  To: Michael Matz
  Cc: Florian Weimer, Hongtao Liu, Beulich, Jan, Jiang, Haochen,
	gcc-patches, ubizjak, Liu, Hongtao, Wang, Phoebe, x86-64-abi,
	llvm-dev, Craig Topper, Joseph Myers



> -----Original Message-----
> From: Michael Matz <matz@suse.de>
> Sent: Wednesday, August 9, 2023 9:54 PM
> To: Zhang, Annita <annita.zhang@intel.com>
> Cc: Florian Weimer <fweimer@redhat.com>; Hongtao Liu
> <crazylht@gmail.com>; Beulich, Jan <JBeulich@suse.com>; Jiang, Haochen
> <haochen.jiang@intel.com>; gcc-patches@gcc.gnu.org; ubizjak@gmail.com;
> Liu, Hongtao <hongtao.liu@intel.com>; Wang, Phoebe
> <phoebe.wang@intel.com>; x86-64-abi <x86-64-abi@googlegroups.com>;
> llvm-dev <llvm-dev@lists.llvm.org>; Craig Topper <craig.topper@gmail.com>;
> Joseph Myers <joseph@codesourcery.com>
> Subject: RE: Intel AVX10.1 Compiler Design and Support
> 
> Hello,
> 
> On Wed, 9 Aug 2023, Zhang, Annita via Gcc-patches wrote:
> 
> > > The question is whether you want to mandate the 16-bit floating
> > > point extensions.  You might get better adoption if you stay
> > > compatible with shipping CPUs.  Furthermore, the 256-bit tuning
> > > apparently benefits current Intel CPUs, even though they can do 512-bit
> vectors.
> > >
> > > (The thread subject is a bit misleading for this sub-topic, by the
> > > way.)
> > >
> > > Thanks,
> > > Florian
> >
> > Since 256bit and 512bit are diverged from AVX10.1 and will continue in
> > the future AVX10 versions, I think it's hard to keep a single version
> > number to cover both and increase monotonically. Hence I'd like to
> > suggest x86-64-v5 for 512bit and x86-64-v5-256 for 256bit, and so on.
> 
> The raison d'etre for the x86-64-vX scheme is to make life sensible as
> distributor.  That goal can only be achieved if this scheme contains only a few
> components that have a simple relationship.  That basically means:
> one dimension only.  If you now add a second dimension (with and without
> -512) we have to add another one if Intel (or whomever else) next does a
> marketing stunt for feature "foobar" and end up with x86-64-v6, x86-64-v6-
> 512, x86-64-v6-1024, x86-64-v6-foobar, x86-64-v6-512-foobar, x86-64-v6-
> 1024-foobar.
> 
> In short: no.
> 
> It isn't the right time anyway to assign meaning to x86-64-v5, as it wasn't the
> right time for assigning x86-64-v4 (as we now see).  These are supposed to
> reflect generally useful feature sets actually shipped in generally available CPUs
> in the market, and be vendor independent.  As such it's much too early to
> define v5 based purely on text documents.
> 
> 
> Ciao,
> Michael.

Makes sense.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
                   ` (10 preceding siblings ...)
  2023-08-08 19:55 ` Joseph Myers
@ 2023-08-10 15:08 ` Jiang, Haochen
  2023-08-10 16:00   ` Jakub Jelinek
  2023-08-19 22:44 ` ZiNgA BuRgA
  12 siblings, 1 reply; 88+ messages in thread
From: Jiang, Haochen @ 2023-08-10 15:08 UTC (permalink / raw)
  To: Jiang, Haochen, gcc-patches
  Cc: ubizjak, Liu, Hongtao, Beulich, Jan, Richard Biener,
	Joseph Myers, Phoebe Wang

Hi all,

There has been a lot of discussion on arch levels and ABIs, and I really appreciate that.

For the arch level issue, it might be a little early to discuss it, and it should not
block these patches.

For the ABI issue, the problem actually comes from GCC and clang/LLVM currently
differing in how they return an __m512 value when 512-bit support is not enabled.
That raised the question of how to unify them, which led to the whole discussion.
However, it is a corner case.

So let's first focus on the option design and its behavior. We can continue to
discuss those two issues after the main behavior is settled. Richard has raised
some concerns about option combinations. Any other concerns?

Thx,
Haochen

> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+haochen.jiang=intel.com@gcc.gnu.org> On Behalf Of Haochen Jiang via
> Gcc-patches
> Sent: Tuesday, August 8, 2023 3:13 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubizjak@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
> Subject: Intel AVX10.1 Compiler Design and Support
> 
> Hi all,
> 
> We will send out our initial support of AVX10 and some sample patches in this
> mailing thread. And there will be more coming up afterwards. Therefore, we
> would like to share our proposed AVX10 design in GCC.
> 
> Here is a quick introduction to AVX10:
>   - AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
>   - Since the introduction of AVX10, we would like to establish a common,
>     converged vector instruction set across all Intel architectures, including
>     Xeon Server, Atom Server and Clients.
>   - The default maximum vector size for AVX10 will be 256 bit, while 512 bit is
>     optional.
>   - AVX10.1 will include all existing AVX512 instructions in Granite Rapids.
>   - There will be no new AVX512 CPUID introduced in future. All EVEX vector
>     instructions will be under AVX10 umbrella.
>   - AVX10 will be version-based ISA instead of tons of different CPUIDs like
>     AVX512BW, AVX512DQ, AVX512FP16, etc.
>   - Based on AVX10.1, AVX10.2 will introduce ymm embedded rounding, SAE
>     (Suppressed All Exceptions) control and new instructions.
> 
> If you would like to have a closed look at the details, please follow the links
> below:
> 
> Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification It
> describes the Intel Advanced Vector Extensions 10 Instruction Set Architecture.
> https://cdrdv2.intel.com/v1/dl/getContent/784267
> 
> The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper It
> provides introductory information regarding the converged vector ISA: Intel
> Advanced Vector Extensions 10.
> https://cdrdv2.intel.com/v1/dl/getContent/784343
> 
> Hence, we will have several compiler design ground rules for AVX10:
>   - AVX10 is a converged ISA feature set.
>     We will not provide -m[no-]xxx to enable/disable each single vector feature
>     in one version as we used to before. Instead, a simple option -m[no-]avx10.x
>     is used. If 512 bit version is needed, -mavx10.x-512 is all you need. Also,
>     maximum vector width should be the same when different version of AVX10 is
>     used. For example, enabling AVX10.1 with 512 bit vector width while enabling
>     AVX10.2 with only 256 bit vector width is not a desired behavior.
>   - AVX10 is an evolving ISA feature set.
>     Every feature showed up in the current version will always show up in future
>     version.
>   - AVX10 is an independent ISA feature set.
>     Although sharing the same instructions and encodings, AVX10 and AVX512 are
>     conceptual independent features, which means they are orthogonal.
> 
> Since AVX10 will have several benefits like bringing AVX512 features on Atom
> Server and Clients and getting rid of tons of AVX512 CPUIDs but a simple AVX10
> option to enable features, we lean towards the adoption of AVX10 instead of
> AVX512 from now on.
> 
> Based on all we got, we would like to introduce the following compiler options:
>   - -mavx10.x: The option will enable AVX10.1-AVX10.x features with a default
>     256 bit vector width to make sure the compatibility on all platforms.
>   - -mavx10.x-512: The option will enable AVX10.1-AVX10.x features with 512 bit
>     vector width. “-mno-avx10.x-512” option will not be provided to avoid
>     confusion of disabling 512 vector width or avx10.x itself.
>   - -mavx10.x-256: The option will enable AVX10.1-AVX10.x features with 256 bit
>     vector width. But it will disable 512 bit vector width since the vector size
>     is indicated in option. “-mno-avx10.x-256” option will not be provided to
>     keep align with the 512 ones.
>   - -mno-avx10.x: The option will disable all the features introduced >=avx10.x
>     (both 256 and 512 bit) and keep features <avx10.x if enabled, just like how
>     -mno- options behave previously.
> 
> When there comes an option combination of various vector size indicated (e.g. -
> mavx10.x-512 -mavx10.y-256), we would like to emit a warning since the vector
> size conflicts under this scenario. Also in the warning message, we will indicate
> the last mentioned vector size will be picked. The ISA set will be the highest one.
> 
> For the auto dispatch support including function __builtin_cpu_supports (),
> function multi versioning, function attribute usage, the behavior will be identical
> to compiler options, which means we will have avx10.x, avx10.x-256,
> avx10.x-512 and no-avx10.x.
> 
> As we have mentioned before, we lean towards the adoption of AVX10 instead of
> AVX512 from now on. Hence, we don’t recommend users to combine the AVX10
> and legacy AVX512 options since different users will have different opinions on
> compiler behavior with option combinations like “-m[no-]avx10.1 -m[no-
> ]avx512f"
> and it is hard to tell whether compiler should open or close the feature under
> those scenarios. Furthermore, we don't guarantee that the behavior is consistent
> between GCC and LLVM/ICX.
> 
> From our understanding, we propose to maintain the independency between
> AVX10 and AVX512 switches. Therefore, opening one of them will turn on the
> feature, no matter the other one is opened or not. We will emit a warning when
> user enables one feature but disable the other afterwards. Some typical examples
> are given to help better understand that:
>   - -mno-avx512xxx: It will check if AVX10.1 is disabled when handling the
>     option. If AVX10.1 is  disabled, it is valid and then disables AVX512xxx.
>     If AVX10.1 not disabled, a warning will be emitted and -mno-avx512xxx will
>     be ignored.
>   - -mno-avx10.1: It will check if all AVX512 features in Granite Rapids are
>     disabled when handling the option. If all disabled, it is valid and then
>     disables all the features. If not, a warning will be emitted and
>     -mno-avx10.1 will be ignored.
>   - -mno-avx10.x (x >= 2): It is always valid.
> 
> Also, since we maintain the independency between AVX10 and AVX512 switches,
> when using a compiler option of “-mavx10.x[-256] -mavx512xxx”, it will actually
> open all the AVX10.x 128/256 bit vector instruction support and 512 bit vector
> instruction support for AVX512xxx.
> 
> Last thing needed to be mentioned is -march options. We will imply AVX10
> features for future platforms with AVX10 available, i.e., AVX10/512 for Xeon
> Servers and AVX10/256 for Atom Servers and Clients. We purpose to change the
> current -march=graniterapids/graniterapids-d from implying AVX512 features to
> AVX10.1/512. No obvious behavior changes will happen for these two -march.
> 
> There will be a minor open after implying change: when we are using -
> march=graniterapids -mno-avx512f or -mno-avx512f -march=graniterapids, it will
> not disable AVX512F and it is a change in behavior. Should we emit a warning for
> that? Our current behavior is not to emit a warning but I am open for changes.
> However, I suppose if we finally choose to emit a warning, it should only happen
> in Granite Rapids and Granite Rapids D since for the next generation Xeon Server
> product, user should be aware of AVX10 change.
> 
> For the following nine patches, first three of them will be the initial support for
> AVX10.1 while the latter six is the AVX10.1 support for AVX512DQ+AVX512VL.
> 
> If you have any questions, feel free to ask in this thread. Also, if you are working
> on AVX512 related patterns during AVX10 upstreaming, especially constraints,
> target check and iterators related, please kindly cc me in the patches since there
> might be some conflicts.
> 
> Thx,
> Haochen
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-10 15:08 ` Jiang, Haochen
@ 2023-08-10 16:00   ` Jakub Jelinek
  0 siblings, 0 replies; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-10 16:00 UTC (permalink / raw)
  To: Jiang, Haochen
  Cc: gcc-patches, ubizjak, Liu, Hongtao, Beulich, Jan, Richard Biener,
	Joseph Myers, Phoebe Wang

On Thu, Aug 10, 2023 at 03:08:31PM +0000, Jiang, Haochen via Gcc-patches wrote:
> There are lots of discussions on arch level and ABIs and I really appreciate that.
> 
> For the arch level issue, it might be a little early to discuss and should not block
> these patches.
> 
> For ABI issue, the problem actually comes from the current behavior between
> GCC and clang/LLVM are different in return value for m512 w/o 512 bit support.
> Then it becomes a question to get unified and we get the whole discussion.
> However, it is a corner case.

What LLVM does looks just wrong to me.

Try:

typedef int V256 __attribute__((vector_size (32)));
typedef int V512 __attribute__((vector_size (64)));
typedef int V1024 __attribute__((vector_size (128)));

V256
foo256 (V256 x, V256 y)
{
  return x + y;
}

V512
foo512 (V512 x, V512 y)
{
  return x + y;
}

V1024
foo1024 (V1024 x, V1024 y)
{
  return x + y;
}

with -msse4, -mavx2 and -mavx512f.
GCC passes arguments and return values in memory (with warnings) for all
three functions in the first case, for all but foo256 in the second case,
and only for foo1024 in the last case.  That matches the psABI without/with
the __m256 and/or __m512 additions.  It is unfortunate that there is no
interoperability between the pre-AVX2 vs. AVX2+ and the pre-AVX512F vs.
AVX512F+ passing/returning, but that is a consequence of wanting to get
fast code on new ISAs.

LLVM passes all the arguments the same way as GCC (though without
warnings), but it returns results differently.  For foo256 it returns the
result in the xmm0/xmm1 pair with -msse4 and in ymm0 with -mavx2 and later.
For foo512 it returns the result in the xmm0/xmm1/xmm2/xmm3 quadruplet with
-msse4, in the ymm0/ymm1 pair with -mavx2 and finally in zmm0 with -mavx512f.
And for foo1024 it returns the result in memory with -msse4, in the
ymm0/ymm1/ymm2/ymm3 quadruplet with -mavx2 and in the zmm0/zmm1 pair with
-mavx512f.  I have no idea what in the psABI that would be based on, in
particular the different treatment of passing arguments vs. returning
results.  More importantly, this means not 2 different ABIs for one function
depending on ISA flags, but 3, maybe 4 (with -mno-sse?).
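
For code that has to be callable across translation units built with
different ISA flags, one way to sidestep the problem (a sketch, not
something proposed in this thread) is to keep wide vectors out of the
function signature and pass them by pointer, since pointer arguments are
passed the same way whatever vector ISA is enabled.  The function name
add512 is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef int V512 __attribute__((vector_size (64)));

/* Operands and result travel through memory behind pointers, so the call
   sequence does not depend on whether 512-bit registers are available.  */
void
add512 (V512 *r, const V512 *x, const V512 *y)
{
  assert (r != NULL && x != NULL && y != NULL);
  *r = *x + *y;
}
```

The body is still compiled with whatever ISA the translation unit enables;
only the interface stops depending on it.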

	Jakub


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
                   ` (11 preceding siblings ...)
  2023-08-10 15:08 ` Jiang, Haochen
@ 2023-08-19 22:44 ` ZiNgA BuRgA
  2023-08-20  5:44   ` Richard Biener
  2023-08-21  1:19   ` Hongtao Liu
  12 siblings, 2 replies; 88+ messages in thread
From: ZiNgA BuRgA @ 2023-08-19 22:44 UTC (permalink / raw)
  To: haochen.jiang; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1136 bytes --]

Hi,

With the proposed design of these switches, how would I restrict AVX10.1 
to particular AVX-512 subsets?

For example, usage of the |_mm256_rol_epi32| intrinsic should be
compatible with any AVX10/256 implementation, /as well as/ any AVX-512VL
implementation without AVX10 (e.g. Skylake-X).  But how do I signal that
I want compatibility with both these targets?

  * |-mavx512vl| lets the compiler use 512-bit registers -> incompatible
    with 256-bit AVX10.
  * |-mavx512vl -mprefer-vector-width=256| might steer the compiler away
    from 512-bit registers, but I don't think it guarantees it.
  * |-mavx10.1-256| lets the compiler use all Sapphire Rapids AVX-512
    features at 256-bit wide (so in theory, it could choose to compile
    it with |vpshldd|) -> incompatible with Skylake-X.
  * |-mavx10.1-256 -mno-avx512fp16 -mno-avx512...| will emit a warning
    and ignore the attempts at disabling AVX-512 subsets.
  * |-mavx10.1-256 -mavx512vl| takes the /union/ of the features, not
    the /intersection./

Is there something like |-mavx512vl -mmax-vector-width=256|, or am I 
misunderstanding the situation?
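
To make the scenario concrete, here is a minimal sketch, assuming GCC or
Clang: the intrinsic is compiled inside a function carrying the existing
target("avx512f,avx512vl") attribute and selected at run time with
__builtin_cpu_supports, with a scalar fallback elsewhere.  The helper
names rol7, rol7_evex256 and rol7_scalar are invented here.  This does not
answer the question above: nothing in the attribute forbids the compiler
from using 512-bit registers in that function, and the run-time check only
looks at the legacy AVX512VL bit, so it presumably would not take the fast
path on a CPU that reports only AVX10/256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Built for AVX-512F+VL regardless of the command-line ISA flags.
   N must be a multiple of 8.  */
__attribute__((target ("avx512f,avx512vl")))
static void
rol7_evex256 (uint32_t *v, size_t n)
{
  for (size_t i = 0; i < n; i += 8)
    {
      __m256i x = _mm256_loadu_si256 ((const __m256i *) (v + i));
      x = _mm256_rol_epi32 (x, 7);	/* 256-bit EVEX rotate.  */
      _mm256_storeu_si256 ((__m256i *) (v + i), x);
    }
}
#endif

/* Portable fallback: rotate each element left by 7 bits.  */
static void
rol7_scalar (uint32_t *v, size_t n)
{
  for (size_t i = 0; i < n; i++)
    v[i] = (v[i] << 7) | (v[i] >> 25);
}

/* Rotate every element of V left by 7 bits, picking a path at run time.  */
void
rol7 (uint32_t *v, size_t n)
{
  assert (v != NULL || n == 0);
#if defined(__x86_64__) || defined(__i386__)
  if (n % 8 == 0 && __builtin_cpu_supports ("avx512vl"))
    {
      rol7_evex256 (v, n);
      return;
    }
#endif
  rol7_scalar (v, n);
}
```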

Thanks!

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-19 22:44 ` ZiNgA BuRgA
@ 2023-08-20  5:44   ` Richard Biener
  2023-08-21  1:19   ` Hongtao Liu
  1 sibling, 0 replies; 88+ messages in thread
From: Richard Biener @ 2023-08-20  5:44 UTC (permalink / raw)
  To: ZiNgA BuRgA; +Cc: haochen.jiang, gcc-patches



> Am 20.08.2023 um 00:45 schrieb ZiNgA BuRgA via Gcc-patches <gcc-patches@gcc.gnu.org>:
> 
> Hi,
> 
> With the proposed design of these switches, how would I restrict AVX10.1 to particular AVX-512 subsets?
> 
> For example, usage of the |_mm256_rol_epi32| intrinsic should be compatible on any AVX10/256 implementation, /as well as /any AVX-512VL without AVX10 implementation (e.g. Skylake-X).  But how do I signal that I want compatibility with both these targets?
> 
> * |-mavx512vl| lets the compiler use 512-bit registers -> incompatible
>   with 256-bit AVX10.
> * |-mavx512vl -mprefer-vector-width=256| might steer the compiler away
>   from 512-bit registers, but I don't think it guarantees it.

We’ve been taking these cases as bugs (but yes, intrinsics are still allowed, so in some cases it might prove difficult to guarantee this).

I don’t see any other way of doing what you want within the constraints of this design.

> * |-mavx10.1-256| lets the compiler use all Sapphire Rapids AVX-512
>   features at 256-bit wide (so in theory, it could choose to compile
>   it with |vpshldd|) -> incompatible with Skylake-X.
> * |-mavx10.1-256 -mno-avx512fp16 -mno-avx512...| will emit a warning
>   and ignore the attempts at disabling AVX-512 subsets.
> * |-mavx10.1-256 -mavx512vl| takes the /union/ of the features, not
>   the /intersection./
> 
> Is there something like |-mavx512vl -mmax-vector-width=256|, or am I misunderstanding the situation?
> 
> Thanks!

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-19 22:44 ` ZiNgA BuRgA
  2023-08-20  5:44   ` Richard Biener
@ 2023-08-21  1:19   ` Hongtao Liu
  2023-08-21  7:36     ` Richard Biener
  2023-08-21  7:49     ` ZiNgA BuRgA
  1 sibling, 2 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-21  1:19 UTC (permalink / raw)
  To: ZiNgA BuRgA; +Cc: haochen.jiang, gcc-patches

On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> With the proposed design of these switches, how would I restrict AVX10.1
> to particular AVX-512 subsets?
We can't; avx10.1 is treated as an indivisible ISA that contains all
AVX512-related instructions.

> We’ve been taking these cases as bugs (but yes, intrinsics are still allowed, so in some cases it might prove difficult to guarantee this).
Intel SDE supports an avx10.1-256 target, which can be used to validate the
binary (i.e. to check whether a 512-bit vector register or a 64-bit kmask
register is used, which would be invalid).
> I don’t see any other way of doing what you want within the constraints of this design.
It looks like the requirement is a -mavx10-vector-width=256 option (or
maybe reusing -mprefer-vector-width=256) that acts on the original
-mavx512XXX options to produce an avx10.1-256 compatible binary. We can't
use -mavx10.1-256 since it may include avx512fp16 instructions and thus
not be backward compatible with SKX/CLX/ICX.
>
> For example, usage of the |_mm256_rol_epi32| intrinsic should be
> compatible on any AVX10/256 implementation, /as well as /any AVX-512VL
> without AVX10 implementation (e.g. Skylake-X).  But how do I signal that
> I want compatibility with both these targets?
>
>   * |-mavx512vl| lets the compiler use 512-bit registers -> incompatible
>     with 256-bit AVX10.
>   * |-mavx512vl -mprefer-vector-width=256| might steer the compiler away
>     from 512-bit registers, but I don't think it guarantees it.
>   * |-mavx10.1-256| lets the compiler use all Sapphire Rapids AVX-512
>     features at 256-bit wide (so in theory, it could choose to compile
>     it with |vpshldd|) -> incompatible with Skylake-X.
>   * |-mavx10.1-256 -mno-avx512fp16 -mno-avx512...| will emit a warning
>     and ignore the attempts at disabling AVX-512 subsets.
>   * |-mavx10.1-256 -mavx512vl| takes the /union/ of the features, not
>     the /intersection./
>
> Is there something like |-mavx512vl -mmax-vector-width=256|, or am I
> misunderstanding the situation?
>
> Thanks!



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-21  1:19   ` Hongtao Liu
@ 2023-08-21  7:36     ` Richard Biener
  2023-08-21  8:09       ` Jakub Jelinek
  2023-08-21  9:26       ` ZiNgA BuRgA
  2023-08-21  7:49     ` ZiNgA BuRgA
  1 sibling, 2 replies; 88+ messages in thread
From: Richard Biener @ 2023-08-21  7:36 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: ZiNgA BuRgA, haochen.jiang, gcc-patches

On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi,
> >
> > With the proposed design of these switches, how would I restrict AVX10.1
> > to particular AVX-512 subsets?
> We can't, avx10.1 is taken as an indivisible ISA which contains all
> AVX512 related instructions.
>
> > We’ve been taking these cases as bugs (but yes, intrinsics are still allowed, so in some cases it might prove difficult to guarantee this).
> intel sde support avx10.1-256 target which can be used to validate the
> binary(if there's invalid 512-bit vector register or 64-bit kmask
> register is used).
> > I don’t see any other way of doing what you want within the constraints of this design.
> It looks like the requirement is that we want a
> -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> option that acts on the original -mavx512XXX option to produce
> avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> include avx512fp16 directives and thus not be backward compatible
> SKX/CLX/ICX.

Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
flag that would restrict AVX512VL to 256bit, possibly using a common internal
flag for this and the -mavx10.1-256 vector size effect.

Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
-mavx512vl-256?  Having written these out, the last looks most sensible to me.
Note it should combine with -mavx512vl to give -mavx512vl-256, so that
-march=native -mavx512vl-256 works (I think we should also allow the
flag together with -mavx10.1*?)

mavx512vl-256
Target ...
Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
the 256bit vector ISA subset of AVX512.

Richard.

> >
> > For example, usage of the |_mm256_rol_epi32| intrinsic should be
> > compatible on any AVX10/256 implementation, /as well as /any AVX-512VL
> > without AVX10 implementation (e.g. Skylake-X).  But how do I signal that
> > I want compatibility with both these targets?
> >
> >   * |-mavx512vl| lets the compiler use 512-bit registers -> incompatible
> >     with 256-bit AVX10.
> >   * |-mavx512vl -mprefer-vector-width=256| might steer the compiler away
> >     from 512-bit registers, but I don't think it guarantees it.
> >   * |-mavx10.1-256| lets the compiler use all Sapphire Rapids AVX-512
> >     features at 256-bit wide (so in theory, it could choose to compile
> >     it with |vpshldd|) -> incompatible with Skylake-X.
> >   * |-mavx10.1-256 -mno-avx512fp16 -mno-avx512...| will emit a warning
> >     and ignore the attempts at disabling AVX-512 subsets.
> >   * |-mavx10.1-256 -mavx512vl| takes the /union/ of the features, not
> >     the /intersection./
> >
> > Is there something like |-mavx512vl -mmax-vector-width=256|, or am I
> > misunderstanding the situation?
> >
> > Thanks!
>
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-21  7:36     ` Richard Biener
@ 2023-08-21  8:09       ` Jakub Jelinek
  2023-08-21  8:28         ` Hongtao Liu
  2023-08-21  9:26       ` ZiNgA BuRgA
  1 sibling, 1 reply; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-21  8:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: Hongtao Liu, ZiNgA BuRgA, haochen.jiang, gcc-patches

On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches wrote:
> > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi,
> > >
> > > With the proposed design of these switches, how would I restrict AVX10.1
> > > to particular AVX-512 subsets?
> > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > AVX512 related instructions.
> >
> > > We’ve been taking these cases as bugs (but yes, intrinsics are still allowed, so in some cases it might prove difficult to guarantee this).
> > intel sde support avx10.1-256 target which can be used to validate the
> > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > register is used).
> > > I don’t see any other way of doing what you want within the constraints of this design.
> > It looks like the requirement is that we want a
> > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > option that acts on the original -mavx512XXX option to produce
> > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > include avx512fp16 directives and thus not be backward compatible
> > SKX/CLX/ICX.
> 
> Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> flag that would restrict AVX512VL to 256bit, possibly using a common internal
> flag for this and the -mavx10.1-256 vector size effect.
> 
> Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> -mavx512vl-256?  Writing these the last looks most sensible to me?
> Note it should combine with -mavx512vl to -mavx512vl-256 to make
> -march=native -mavx512vl-256 work (I think we should also allow the
> flag together with -mavx10.1*?)
> 
> mavx512vl-256
> Target ...
> Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> the 256bit vector ISA subset of AVX512.

Wouldn't it be better to have it similarly to other ISA options as something
positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
EVEX.128)?
Have -mavx512f (and anything that implies it right now) imply also -mevex512
but allow -mno-evex512 which wouldn't unset everything dependent on
-mavx512f.  There is one gotcha: if -mavx512vl isn't enabled in the end,
then -mavx512f -mno-evex512 should disable the whole TARGET_AVX512F because
nothing is left.
TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which operate
on 512-bit vector registers or 64-bit mask registers (in addition to the
other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
512-bit modes can be used etc.
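
To make the intended semantics concrete, here is a minimal sketch in C of
how the flags could resolve.  It is only a model: struct isa_flags,
resolve_isa and can_use_512bit are hypothetical names for illustration, not
existing GCC code, and -mevex512 is only the name proposed above.

```c
#include <stdbool.h>

/* Hypothetical model of the proposed flags; not GCC's real data structures. */
struct isa_flags {
  bool avx512f;   /* -mavx512f */
  bool avx512vl;  /* -mavx512vl */
  bool evex512;   /* proposed -mevex512 */
};

/* Resolve the flags after command-line processing.
   explicit_no_evex512 models the user passing -mno-evex512.  */
static struct isa_flags
resolve_isa (bool want_avx512f, bool want_avx512vl, bool explicit_no_evex512)
{
  struct isa_flags f;
  f.avx512vl = want_avx512vl;
  /* AVX512VL depends on AVX512F, so enabling VL enables F as well.  */
  f.avx512f = want_avx512f || want_avx512vl;
  /* -mavx512f implies -mevex512 unless -mno-evex512 was given.  */
  f.evex512 = f.avx512f && !explicit_no_evex512;
  /* The gotcha: without 512-bit support and without VL, nothing is left
     of AVX512F, so it is disabled as a whole.  */
  if (!f.evex512 && !f.avx512vl)
    f.avx512f = false;
  return f;
}

/* Guard for intrinsics operating on 512-bit vectors or 64-bit masks.  */
static bool
can_use_512bit (const struct isa_flags *f)
{
  return f->avx512f && f->evex512;
}
```

With this model, -mavx512f -mavx512vl -mno-evex512 keeps AVX512F and VL but
rejects 512-bit use, while -mavx512f -mno-evex512 alone ends up with AVX512F
disabled.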

	Jakub



* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-21  8:09       ` Jakub Jelinek
@ 2023-08-21  8:28         ` Hongtao Liu
  2023-08-21  8:37           ` Jakub Jelinek
  2023-08-21  9:34           ` Richard Biener
  0 siblings, 2 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-21  8:28 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, ZiNgA BuRgA, haochen.jiang, gcc-patches

On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches wrote:
> > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > With the proposed design of these switches, how would I restrict AVX10.1
> > > > to particular AVX-512 subsets?
> > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > AVX512 related instructions.
> > >
> > > > We’ve been taking these cases as bugs (but yes, intrinsics are still allowed, so in some cases it might prove difficult to guarantee this).
> > > intel sde support avx10.1-256 target which can be used to validate the
> > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > register is used).
> > > > I don’t see any other way of doing what you want within the constraints of this design.
> > > It looks like the requirement is that we want a
> > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > option that acts on the original -mavx512XXX option to produce
> > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > include avx512fp16 directives and thus not be backward compatible
> > > SKX/CLX/ICX.
> >
> > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > flag that would restrict AVX512VL to 256bit, possibly using a common internal
> > flag for this and the -mavx10.1-256 vector size effect.
> >
> > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > -march=native -mavx512vl-256 work (I think we should also allow the
> > flag together with -mavx10.1*?)
> >
> > mavx512vl-256
> > Target ...
> > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > the 256bit vector ISA subset of AVX512.
>
> Wouldn't it be better to have it similarly to other ISA options as something
> positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> EVEX.128)?
> Have -mavx512f (and anything that implies it right now) imply also -mevex512
> but allow -mno-evex512 which wouldn't unset everything dependent on
> -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> nothing is left.
> TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which operate
> on 512-bit vector registers or 64-bit mask registers (in addition to the
> other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> 512-bit modes can be used etc.
We have an undocumented option mavx10-max-512bit.

;; Only for implementation use
mavx10-max-512bit
Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
Indicates 512 bit vector width support for AVX10.

Currently it's used for AVX10 only; maybe we can extend it to the
existing AVX512*** flags, so users can use -mavx512XXX
-mno-avx10-max-512bit to get avx10.1-256 compatible binaries.

From the implementation perspective, we need to restrict all 512-bit
vector patterns/builtins/intrinsics under both AVX512XXX and
TARGET_AVX10_512BIT.
The same applies to register allocation, parameter passing, return values,
vector_mode_supported_p, the gather/scatter hooks, and all other hooks.
After that, -mavx10-max-512bit will divide the existing AVX512 into two
parts: AVX512XXX-256 and AVX512XXX-512.
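
As a rough illustration of that guard, a vector_mode_supported_p-style
check might look like the sketch below.  struct target and
evex_vector_bits_supported_p are hypothetical stand-ins, not the real i386
backend macros or hooks, and the sketch only models EVEX-encoded vectors
(it ignores SSE/AVX, which make 128/256-bit modes available on their own).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the target flags discussed above.  */
struct target {
  bool avx512f;     /* some -mavx512XXX that implies AVX512F is on */
  bool avx512vl;    /* -mavx512vl */
  bool max_512bit;  /* the 512-bit width flag (TARGET_AVX10_512BIT above) */
};

/* Sketch of a width check for EVEX-encoded vectors, by size in bits.  */
static bool
evex_vector_bits_supported_p (const struct target *t, unsigned bits)
{
  switch (bits)
    {
    case 512:
      /* 512-bit needs both the ISA and the width flag.  */
      return t->avx512f && t->max_512bit;
    case 256:
    case 128:
      /* EVEX-encoded 128/256-bit forms need AVX512VL.  */
      return t->avx512f && t->avx512vl;
    default:
      return false;
    }
}
```

The same predicate shape would have to be repeated in the patterns, the
builtins and the remaining hooks, which is where most of the work is.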


>
>         Jakub
>


-- 
BR,
Hongtao


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-21  8:28         ` Hongtao Liu
@ 2023-08-21  8:37           ` Jakub Jelinek
  2023-08-21  8:46             ` Hongtao Liu
  2023-08-21  9:34           ` Richard Biener
  1 sibling, 1 reply; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-21  8:37 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Richard Biener, ZiNgA BuRgA, haochen.jiang, gcc-patches

On Mon, Aug 21, 2023 at 04:28:20PM +0800, Hongtao Liu wrote:
> We have an undocumented option mavx10-max-512bit.

What it is called internally is one thing, but it is weird to use
avx10 in an option name that is meant for finding the common subset
of -mavx512xxx and -mavx10.1-256.

	Jakub



* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-21  8:37           ` Jakub Jelinek
@ 2023-08-21  8:46             ` Hongtao Liu
  0 siblings, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-21  8:46 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, ZiNgA BuRgA, haochen.jiang, gcc-patches

On Mon, Aug 21, 2023 at 4:38 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Aug 21, 2023 at 04:28:20PM +0800, Hongtao Liu wrote:
> > We have an undocumented option mavx10-max-512bit.
>
> How it is called internally is one thing, but it is weird to use
> avx10 in an option name which would be meant for finding common subset
> of -mavx512xxx and -mavx10.1-256.
We can have an alias for the name, but internally use the same bit
since they're doing the same thing.
And the option is somewhat orthogonal to AVX512XXX/AVX10; it only
cares about vector/kmask size.
>
>         Jakub
>


-- 
BR,
Hongtao


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-21  8:28         ` Hongtao Liu
  2023-08-21  8:37           ` Jakub Jelinek
@ 2023-08-21  9:34           ` Richard Biener
  2023-08-21  9:36             ` Richard Biener
  2023-08-21  9:50             ` Hongtao Liu
  1 sibling, 2 replies; 88+ messages in thread
From: Richard Biener @ 2023-08-21  9:34 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jakub Jelinek, ZiNgA BuRgA, haochen.jiang, gcc-patches

On Mon, Aug 21, 2023 at 10:28 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches wrote:
> > > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > With the proposed design of these switches, how would I restrict AVX10.1
> > > > > to particular AVX-512 subsets?
> > > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > > AVX512 related instructions.
> > > >
> > > > > We’ve been taking these cases as bugs (but yes, intrinsics are still allowed, so in some cases it might prove difficult to guarantee this).
> > > > intel sde support avx10.1-256 target which can be used to validate the
> > > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > > register is used).
> > > > > I don’t see any other way of doing what you want within the constraints of this design.
> > > > It looks like the requirement is that we want a
> > > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > > option that acts on the original -mavx512XXX option to produce
> > > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > > include avx512fp16 directives and thus not be backward compatible
> > > > SKX/CLX/ICX.
> > >
> > > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > > flag that would restrict AVX512VL to 256bit, possibly using a common internal
> > > flag for this and the -mavx10.1-256 vector size effect.
> > >
> > > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > > -march=native -mavx512vl-256 work (I think we should also allow the
> > > flag together with -mavx10.1*?)
> > >
> > > mavx512vl-256
> > > Target ...
> > > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > > the 256bit vector ISA subset of AVX512.
> >
> > Wouldn't it be better to have it similarly to other ISA options as something
> > positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> > EVEX.128)?
> > Have -mavx512f (and anything that implies it right now) imply also -mevex512
> > but allow -mno-evex512 which wouldn't unset everything dependent on
> > -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> > then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> > nothing is left.
> > TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which operate
> > on 512-bit vector registers or 64-bit mask registers (in addition to the
> > other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> > 512-bit modes can be used etc.
> We have an undocumented option mavx10-max-512bit.
>
> ;; Only for implementation use
> mavx10-max-512bit
> Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
> Indicates 512 bit vector width support for AVX10.

Ah, missed that, but ...

> Currently it's only used for AVX10 only, maybe we can extend it to
> existing AVX512*** FLAGS.
> so users can use -mavx512XXX -mno-avx10-max-512bit to get avx10.1-256
> compatible binaries.

... -mno-avx10-max-512bit sounds awkward, no-..-max implies the max doesn't
apply, so what is it then?

If you think -mavx512vl-256 isn't good then maybe -mavx-width-512
and -mno-avx-width-512 would be better (applying to both avx512 and avx10).
I chose -mavx512vl-256 because of the existing -mavx10.1-256.  Btw,
will we then have -mavx10.2-256 as well?  Do we allow -mavx10.1-512
-mavx10.2-256 then, thus just enable 256bit for 10.2 extensions to 10.1?!
I think we opened up too many holes here and the options should be fixed
to decouple the size from the base ISA.

What variable we map this to internally doesn't really matter but yes,
we'd need to guard 512bit patterns with (AVX512VL || AVX10) && 512-enabled-flag

Richard.

> From the implementation perspective, we need to restrict all 512-bit
> vector patterns/builtins/intrinsics under both AVX512XXX and
> TARGET_AVX10_512BIT.
> similar for register allocation, parameter passing, return value,
> vector_mode_supported_p, gather/scatter hook, and all other hooks.
> After that, the -mavx10-max-512bit will divide existing AVX512 into 2
> parts, AVX512XXX-256, AVX512XXX-512.
>
>
> >
> >         Jakub
> >
>
>
> --
> BR,
> Hongtao


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-21  9:34           ` Richard Biener
@ 2023-08-21  9:36             ` Richard Biener
  2023-08-21  9:50             ` Hongtao Liu
  1 sibling, 0 replies; 88+ messages in thread
From: Richard Biener @ 2023-08-21  9:36 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jakub Jelinek, ZiNgA BuRgA, haochen.jiang, gcc-patches

On Mon, Aug 21, 2023 at 11:34 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Aug 21, 2023 at 10:28 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches wrote:
> > > > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > With the proposed design of these switches, how would I restrict AVX10.1
> > > > > > to particular AVX-512 subsets?
> > > > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > > > AVX512 related instructions.
> > > > >
> > > > > > We’ve been taking these cases as bugs (but yes, intrinsics are still allowed, so in some cases it might prove difficult to guarantee this).
> > > > > intel sde support avx10.1-256 target which can be used to validate the
> > > > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > > > register is used).
> > > > > > I don’t see any other way of doing what you want within the constraints of this design.
> > > > > It looks like the requirement is that we want a
> > > > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > > > option that acts on the original -mavx512XXX option to produce
> > > > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > > > include avx512fp16 directives and thus not be backward compatible
> > > > > SKX/CLX/ICX.
> > > >
> > > > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > > > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > > > flag that would restrict AVX512VL to 256bit, possibly using a common internal
> > > > flag for this and the -mavx10.1-256 vector size effect.
> > > >
> > > > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > > > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > > > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > > > -march=native -mavx512vl-256 work (I think we should also allow the
> > > > flag together with -mavx10.1*?)
> > > >
> > > > mavx512vl-256
> > > > Target ...
> > > > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > > > the 256bit vector ISA subset of AVX512.
> > >
> > > Wouldn't it be better to have it similarly to other ISA options as something
> > > positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> > > EVEX.128)?
> > > Have -mavx512f (and anything that implies it right now) imply also -mevex512
> > > but allow -mno-evex512 which wouldn't unset everything dependent on
> > > -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> > > then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> > > nothing is left.
> > > TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which operate
> > > on 512-bit vector registers or 64-bit mask registers (in addition to the
> > > other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> > > 512-bit modes can be used etc.
> > We have an undocumented option mavx10-max-512bit.
> >
> > ;; Only for implementation use
> > mavx10-max-512bit
> > Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
> > Indicates 512 bit vector width support for AVX10.
>
> Ah, missed that, but ...
>
> > Currently it's only used for AVX10 only, maybe we can extend it to
> > existing AVX512*** FLAGS.
> > so users can use -mavx512XXX -mno-avx10-max-512bit to get avx10.1-256
> > compatible binaries.
>
> ... -mno-avx10-max-512bit sounds awkward, no-..-max implies the max doesn't
> apply, so what is it then?
>
> If you think -mavx512vl-256 isn't good then maybe -mavx-width-512
> and -mno-avx-width-512 would be better (applying to both avx512 and avx10).
> I chose -mavx512vl-256 because of the existing -mavx10.1-256.  Btw,
> will we then have -mavx10.2-256 as well?  Do we allow -mavx10.1-512
> -mavx10.2-256 then, thus just enable 256bit for 10.2 extensions to 10.1?!
> I think we opened up too many holes here and the options should be fixed
> to decouple the size from the base ISA.

How about -mavx10.1 and -mavx10.2 plus a -mavx10-512, where
-mavx10.[12...] enables just 256 bits (the intended default, as Intel sees
it) and -mavx10-512 enables 512 bits but for the whole selected ISA
(maybe have it enable -mavx10.1 if that wasn't specified, maybe not)?
We could then also allow -mno-avx10-512 with AVX512.

>
> What variable we map this to internally doesn't really matter but yes,
> we'd need to guard 512bit patterns with (AVX512VL || AVX10) && 512-enabled-flag
>
> Richard.
>
> > From the implementation perspective, we need to restrict all 512-bit
> > vector patterns/builtins/intrinsics under both AVX512XXX and
> > TARGET_AVX10_512BIT.
> > similar for register allocation, parameter passing, return value,
> > vector_mode_supported_p, gather/scatter hook, and all other hooks.
> > After that, the -mavx10-max-512bit will divide existing AVX512 into 2
> > parts, AVX512XXX-256, AVX512XXX-512.
> >
> >
> > >
> > >         Jakub
> > >
> >
> >
> > --
> > BR,
> > Hongtao


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-21  9:34           ` Richard Biener
  2023-08-21  9:36             ` Richard Biener
@ 2023-08-21  9:50             ` Hongtao Liu
  1 sibling, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-21  9:50 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, ZiNgA BuRgA, haochen.jiang, gcc-patches

On Mon, Aug 21, 2023 at 5:35 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Aug 21, 2023 at 10:28 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches wrote:
> > > > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > With the proposed design of these switches, how would I restrict AVX10.1
> > > > > > to particular AVX-512 subsets?
> > > > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > > > AVX512 related instructions.
> > > > >
> > > > > > We’ve been taking these cases as bugs (but yes, intrinsics are still allowed, so in some cases it might prove difficult to guarantee this).
> > > > > intel sde support avx10.1-256 target which can be used to validate the
> > > > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > > > register is used).
> > > > > > I don’t see any other way of doing what you want within the constraints of this design.
> > > > > It looks like the requirement is that we want a
> > > > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > > > option that acts on the original -mavx512XXX option to produce
> > > > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > > > include avx512fp16 directives and thus not be backward compatible
> > > > > SKX/CLX/ICX.
> > > >
> > > > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > > > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > > > flag that would restrict AVX512VL to 256bit, possibly using a common internal
> > > > flag for this and the -mavx10.1-256 vector size effect.
> > > >
> > > > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > > > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > > > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > > > -march=native -mavx512vl-256 work (I think we should also allow the
> > > > flag together with -mavx10.1*?)
> > > >
> > > > mavx512vl-256
> > > > Target ...
> > > > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > > > the 256bit vector ISA subset of AVX512.
> > >
> > > Wouldn't it be better to have it similarly to other ISA options as something
> > > positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> > > EVEX.128)?
> > > Have -mavx512f (and anything that implies it right now) imply also -mevex512
> > > but allow -mno-evex512 which wouldn't unset everything dependent on
> > > -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> > > then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> > > nothing is left.
> > > TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which operate
> > > on 512-bit vector registers or 64-bit mask registers (in addition to the
> > > other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> > > 512-bit modes can be used etc.
> > We have an undocumented option mavx10-max-512bit.
> >
> > ;; Only for implementation use
> > mavx10-max-512bit
> > Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
> > Indicates 512 bit vector width support for AVX10.
>
> Ah, missed that, but ...
>
> > Currently it's only used for AVX10 only, maybe we can extend it to
> > existing AVX512*** FLAGS.
> > so users can use -mavx512XXX -mno-avx10-max-512bit to get avx10.1-256
> > compatible binaries.
>
> ... -mno-avx10-max-512bit sounds awkward, no-..-max implies the max doesn't
> apply, so what is it then?
>
> If you think -mavx512vl-256 isn't good then maybe -mavx-width-512
> and -mno-avx-width-512 would be better (applying to both avx512 and avx10).
> I chose -mavx512vl-256 because of the existing -mavx10.1-256.  Btw,
> will we then have -mavx10.2-256 as well?  Do we allow -mavx10.1-512
> -mavx10.2-256 then, thus just enable 256bit for 10.2 extensions to 10.1?!
We're only allowing a single vector width.
-mavx10.1-512 -mavx10.2-256 will only enable -mavx10.2-256 + -mavx10.1-256.
> I think we opened up too many holes here and the options should be fixed
> to decouple the size from the base ISA.
I see, we can try to use -mavx-max-512bit (maybe under another name) to
decouple the size from the base ISA, and make
 -mavx10.1-256 just imply all -mavx512XXX + -mno-avx-max-512bit,
 -mavx10.1-512 imply -mavx512XXX + -mavx-max-512bit.
Then -mavx512vl-256 is just equal to -mavx512vl + -mno-avx-max-512bit.

Lots of work to do, but still not too late for GCC14.1
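
A minimal sketch of that mapping in C, to show the idea.  The names
struct isa_state and apply_option are hypothetical, -mavx-max-512bit is
only the tentative option name from above, and the sketch assumes options
are applied left to right so that a later option overrides the width bit.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical state: the ISA bits and the width bit are kept separate.  */
struct isa_state {
  bool avx512_all;  /* stands for "all -mavx512XXX" */
  bool avx512vl;    /* -mavx512vl */
  bool max_512bit;  /* -mavx-max-512bit (name not final) */
};

/* Apply one option; returns false for options this sketch does not model.  */
static bool
apply_option (struct isa_state *s, const char *opt)
{
  if (strcmp (opt, "-mavx10.1-256") == 0)
    {
      /* All -mavx512XXX + -mno-avx-max-512bit.  */
      s->avx512_all = true;
      s->avx512vl = true;
      s->max_512bit = false;
    }
  else if (strcmp (opt, "-mavx10.1-512") == 0)
    {
      /* -mavx512XXX + -mavx-max-512bit.  */
      s->avx512_all = true;
      s->avx512vl = true;
      s->max_512bit = true;
    }
  else if (strcmp (opt, "-mavx512vl-256") == 0)
    {
      /* -mavx512vl + -mno-avx-max-512bit.  */
      s->avx512vl = true;
      s->max_512bit = false;
    }
  else if (strcmp (opt, "-mavx-max-512bit") == 0)
    s->max_512bit = true;
  else if (strcmp (opt, "-mno-avx-max-512bit") == 0)
    s->max_512bit = false;
  else
    return false;
  return true;
}
```

Since there is only one width bit, a command line can never end up with
two different vector widths for different parts of the ISA.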
>
> What variable we map this to internally doesn't really matter but yes,
> we'd need to guard 512bit patterns with (AVX512VL || AVX10) && 512-enabled-flag
>
> Richard.
>
> > From the implementation perspective, we need to restrict all 512-bit
> > vector patterns/builtins/intrinsics under both AVX512XXX and
> > TARGET_AVX10_512BIT.
> > similar for register allocation, parameter passing, return value,
> > vector_mode_supported_p, gather/scatter hook, and all other hooks.
> > After that, the -mavx10-max-512bit will divide existing AVX512 into 2
> > parts, AVX512XXX-256, AVX512XXX-512.
> >
> >
> > >
> > >         Jakub
> > >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-21  7:36     ` Richard Biener
  2023-08-21  8:09       ` Jakub Jelinek
@ 2023-08-21  9:26       ` ZiNgA BuRgA
  2023-08-22  3:20         ` Jiang, Haochen
  1 sibling, 1 reply; 88+ messages in thread
From: ZiNgA BuRgA @ 2023-08-21  9:26 UTC (permalink / raw)
  To: Richard Biener, Hongtao Liu; +Cc: haochen.jiang, gcc-patches

Another way (not saying this is better, just throwing out ideas) is to 
break AVX10.1 into all the AVX-512 subsets.
So you'd have something like -mavx10.1-256-vl, -mavx10.1-512-vbmi etc.

* -mavx10.1-256  would effectively be an alias for all the 128+256-bit 
subsets, and set the __AVX10_1__ define
* -mavx512vbmi  would effectively be an alias for `-mavx10.1-128-vbmi 
-mavx10.1-256-vbmi -mavx10.1-512-vbmi` and set the __AVX512VBMI__ define 
(`-mavx10.1-512-vl` might not make much sense unless it implies AVX512F?)
* -mno-avx512vbmi  would similarly be an alias for 
`-mno-avx10.1-128-vbmi -mno-avx10.1-256-vbmi -mno-avx10.1-512-vbmi`; 
with this, `-mavx10.1-256 -mno-avx512vbmi` would make sense, even if 
unusual (enable all AVX10.1 but disable all VBMI)
* -mavx10.2-256  would act as a single feature, cementing in AVX10.2 
like the current AVX10.1 proposal, and AVX-512 subsets can't be turned off
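
As a rough sketch of what I mean (in C, with made-up names: struct
isa_matrix, set_subset and enable_avx10_1_256 are purely illustrative, and
only three subsets are shown), each (subset, width) pair gets its own bit
and the existing options become aliases over rows or columns:

```c
#include <stdbool.h>

/* Illustrative subsets and widths; the real list would be much longer.  */
enum subset { SUBSET_F, SUBSET_BW, SUBSET_VBMI, SUBSET_COUNT };
enum width { W128, W256, W512, WIDTH_COUNT };

/* One bit per (subset, width) pair.  */
struct isa_matrix {
  bool on[SUBSET_COUNT][WIDTH_COUNT];
};

/* -mavx512<subset> / -mno-avx512<subset>: toggle a subset at all widths.  */
static void
set_subset (struct isa_matrix *m, enum subset s, bool value)
{
  for (int w = 0; w < WIDTH_COUNT; w++)
    m->on[s][w] = value;
}

/* -mavx10.1-256: enable every subset at 128 and 256 bits only.  */
static void
enable_avx10_1_256 (struct isa_matrix *m)
{
  for (int s = 0; s < SUBSET_COUNT; s++)
    {
      m->on[s][W128] = true;
      m->on[s][W256] = true;
    }
}
```

With that, `-mavx10.1-256 -mno-avx512vbmi` is just enable_avx10_1_256
followed by set_subset (m, SUBSET_VBMI, false).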


On 21/08/2023 5:36 pm, Richard Biener wrote:
> On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
> Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> flag that would restrict AVX512VL to 256bit, possibly using a common internal
> flag for this and the -mavx10.1-256 vector size effect.
>
> Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> -mavx512vl-256?  Writing these the last looks most sensible to me?
> Note it should combine with -mavx512vl to -mavx512vl-256 to make
> -march=native -mavx512vl-256 work (I think we should also allow the
> flag together with -mavx10.1*?)
>
> mavx512vl-256
> Target ...
> Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> the 256bit vector ISA subset of AVX512.
>
> Richard.




* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-21  9:26       ` ZiNgA BuRgA
@ 2023-08-22  3:20         ` Jiang, Haochen
  2023-08-22  7:36           ` Richard Biener
  0 siblings, 1 reply; 88+ messages in thread
From: Jiang, Haochen @ 2023-08-22  3:20 UTC (permalink / raw)
  To: ZiNgA BuRgA, Richard Biener, Hongtao Liu; +Cc: gcc-patches

> -----Original Message-----
> From: ZiNgA BuRgA <zingaburga@hotmail.com>
> Sent: Monday, August 21, 2023 5:27 PM
> To: Richard Biener <richard.guenther@gmail.com>; Hongtao Liu
> <crazylht@gmail.com>
> Cc: Jiang, Haochen <haochen.jiang@intel.com>; gcc-patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> Another way (not saying this is better, just throwing out ideas) is to
> break AVX10.1 into all the AVX-512 subsets.
> So you'd have something like -mavx10.1-256-vl, -mavx10.1-512-vbmi etc.
> 
> * -mavx10.1-256  would effectively be an alias for all the 128+256-bit
> subsets, and set the __AVX10_1__ define
> * -mavx512vbmi  would effectively be an alias for `-mavx10.1-128-vbmi
> -mavx10.1-256-vbmi -mavx10.1-512-vbmi` and set the __AVX512VBMI__ define
> (`-mavx10.1-512-vl` might not make much sense unless it implies AVX512F?)
> * -mno-avx512vbmi  would similarly be an alias for
> `-mno-avx10.1-128-vbmi -mno-avx10.1-256-vbmi -mno-avx10.1-512-vbmi`;
> with this, `-mavx10.1-256 -mno-avx512vbmi` would make sense, even if
> unusual (enable all AVX10.1 but disable all VBMI)
> * -mavx10.2-256  would act as a single feature, cementing in AVX10.2
> like the current AVX10.1 proposal, and AVX-512 subsets can't be turned off

I am considering a proposal quite similar to this if we want to change the
design to make it more flexible.

But there are a few proposals on the table. The concern with this one is
whether splitting every AVX512 feature by width is over-design, since in most
scenarios we just need to keep the vector width the same.

Thx,
Haochen

> 
> 
> On 21/08/2023 5:36 pm, Richard Biener wrote:
> > On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >
> > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > flag that would restrict AVX512VL to 256bit, possibly using a common internal
> > flag for this and the -mavx10.1-256 vector size effect.
> >
> > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > -march=native -mavx512vl-256 work (I think we should also allow the
> > flag together with -mavx10.1*?)
> >
> > mavx512vl-256
> > Target ...
> > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > the 256bit vector ISA subset of AVX512.
> >
> > Richard.
> 



* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22  3:20         ` Jiang, Haochen
@ 2023-08-22  7:36           ` Richard Biener
  2023-08-22  8:34             ` Jakub Jelinek
  0 siblings, 1 reply; 88+ messages in thread
From: Richard Biener @ 2023-08-22  7:36 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: ZiNgA BuRgA, Hongtao Liu, gcc-patches

On Tue, Aug 22, 2023 at 5:20 AM Jiang, Haochen <haochen.jiang@intel.com> wrote:
>
> > -----Original Message-----
> > From: ZiNgA BuRgA <zingaburga@hotmail.com>
> > Sent: Monday, August 21, 2023 5:27 PM
> > To: Richard Biener <richard.guenther@gmail.com>; Hongtao Liu
> > <crazylht@gmail.com>
> > Cc: Jiang, Haochen <haochen.jiang@intel.com>; gcc-patches@gcc.gnu.org
> > Subject: Re: Intel AVX10.1 Compiler Design and Support
> >
> > Another way (not saying this is better, just throwing out ideas) is to
> > break AVX10.1 into all the AVX-512 subsets.
> > So you'd have something like -mavx10.1-256-vl, -mavx10.1-512-vbmi etc.
> >
> > * -mavx10.1-256  would effectively be an alias for all the 128+256-bit
> > subsets, and set the __AVX10_1__ define
> > * -mavx512vbmi  would effectively be an alias for `-mavx10.1-128-vbmi
> > -mavx10.1-256-vbmi -mavx10.1-512-vbmi` and set the __AVX512VBMI__ define
> > (`-mavx10.1-512-vl` might not make much sense unless it implies AVX512F?)
> > * -mno-avx512vbmi  would similarly be an alias for
> > `-mno-avx10.1-128-vbmi -mno-avx10.1-256-vbmi -mno-avx10.1-512-vbmi`;
> > with this, `-mavx10.1-256 -mno-avx512vbmi` would make sense, even if
> > unusual (enable all AVX10.1 but disable all VBMI)
> > * -mavx10.2-256  would act as a single feature, cementing in AVX10.2
> > like the current AVX10.1 proposal, and AVX-512 subsets can't be turned off
>
> I am considering a proposal quite similar to this if we want to change the
> design so that it is flexible.
>
> But there are a few proposals on the table. The problem for this proposal
> is that if it is a over-design to make each AVX512 feature to split since in most
> scenarios we just need to keep the vector width as the same.

I think internally we should have conditional 512bit support work across
AVX512 and AVX10.

I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
enable the respective AVX512 features.  AVX10.2 would then internally
cover the ISA extensions added in 10.2 only.  Both would reduce the
redundancy and possibly make providing inter-operation between
AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
just as "re-branding" latest AVX512, so we should treat it that way
(making it an alias to the AVX512 features).

Whether we want to allow -mavx10.1 -mno-avx512cd or whether
we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
is an entirely separate question.  But I think to not wreck the core idea
(more interoperability, here between small/big cores) we absolutely have to
provide a subset of avx10.1 but with disabled 512bit vectors, which
effectively means AVX512 with disabled 512bit support.

Richard.

>
> Thx,
> Haochen
>
> >
> >
> > On 21/08/2023 5:36 pm, Richard Biener wrote:
> > > On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > > flag that would restrict AVX512VL to 256bit, possibly using a common internal
> > > flag for this and the -mavx10.1-256 vector size effect.
> > >
> > > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > > -march=native -mavx512vl-256 work (I think we should also allow the
> > > flag together with -mavx10.1*?)
> > >
> > > mavx512vl-256
> > > Target ...
> > > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > > the 256bit vector ISA subset of AVX512.
> > >
> > > Richard.
> >
>


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22  7:36           ` Richard Biener
@ 2023-08-22  8:34             ` Jakub Jelinek
  2023-08-22  8:35               ` Richard Biener
  2023-08-22 13:02               ` Hongtao Liu
  0 siblings, 2 replies; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-22  8:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jiang, Haochen, ZiNgA BuRgA, Hongtao Liu, gcc-patches

On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches wrote:
> I think internally we should have conditional 512bit support work across
> AVX512 and AVX10.
> 
> I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> enable the respective AVX512 features.  AVX10.2 would then internally
> cover the ISA extensions added in 10.2 only.  Both would reduce the
> redundancy and possibly make providing inter-operation between
> AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> just as "re-branding" latest AVX512, so we should treat it that way
> (making it an alias to the AVX512 features).
> 
> Whether we want to allow -mavx10.1 -mno-avx512cd or whether
> we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> is an entirely separate
> question.  But I think to not wreck the core idea (more interoperability,
> here between small/big cores) we absolutely have to
> provide a subset of avx10.1 but with disabled 512bit vectors which
> effectively means AVX512 with disabled 512bit support.

Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
name to represent whether the effective ISA set allows 512-bit vectors or
not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
option IMHO should be, in the same spirit as all the others, a positive enablement,
not both positive (enable avx512{f,cd,bw,dq,...}) and negative (disallow
512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
former would allow 512-bit vectors, the latter shouldn't disable those again
because it isn't a -mno-* option.  Sure, instructions which are specific to
AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
enabled only in 128/256 bit variants if we differentiate that level.
But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
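
To make the "union, not intersection" point concrete, here is a minimal sketch
in Python.  It is not GCC code: the Isa enum, the EVEX512 bit and the ENABLES
table are all hypothetical names made up for illustration, and the sub-ISA
list is abbreviated.

```python
from enum import Flag, auto


class Isa(Flag):
    """Hypothetical ISA bits; EVEX512 stands for '512-bit vectors allowed'."""
    AVX2 = auto()
    AVX512F = auto()
    AVX512VL = auto()
    AVX512BW = auto()
    AVX512DQ = auto()
    AVX512CD = auto()
    EVEX512 = auto()


# What each positive option switches on (assumed, abbreviated).
ENABLES = {
    "-mavx2": Isa.AVX2,
    "-mavx512f": Isa.AVX512F | Isa.EVEX512,
    # Enables the AVX512 sub-ISAs but neither sets nor clears EVEX512.
    "-mavx10.1-256": (Isa.AVX512F | Isa.AVX512VL | Isa.AVX512BW
                      | Isa.AVX512DQ | Isa.AVX512CD),
}


def effective_isa(options):
    """OR the enable sets together; a positive option never clears a bit."""
    isa = Isa(0)
    for opt in options:
        isa |= ENABLES[opt]
    return isa


print(Isa.EVEX512 in effective_isa(["-mavx512f", "-mavx10.1-256"]))  # True
print(Isa.EVEX512 in effective_isa(["-mavx2", "-mavx10.1-256"]))     # False
```

Since every option only ORs bits in, the result does not depend on the order
of the options, just like -mavx2 -mbmi today.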

	Jakub


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22  8:34             ` Jakub Jelinek
@ 2023-08-22  8:35               ` Richard Biener
  2023-08-22  8:52                 ` Jiang, Haochen
  2023-08-22 13:02               ` Hongtao Liu
  1 sibling, 1 reply; 88+ messages in thread
From: Richard Biener @ 2023-08-22  8:35 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jiang, Haochen, ZiNgA BuRgA, Hongtao Liu, gcc-patches

On Tue, Aug 22, 2023 at 10:34 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches wrote:
> > I think internally we should have conditional 512bit support work across
> > AVX512 and AVX10.
> >
> > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > enable the respective AVX512 features.  AVX10.2 would then internally
> > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > redundancy and possibly make providing inter-operation between
> > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > just as "re-branding" latest AVX512, so we should treat it that way
> > (making it an alias to the AVX512 features).
> >
> > Whether we want to allow -mavx10.1 -mno-avx512cd or whether
> > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > is an entirely separate
> > question.  But I think to not wreck the core idea (more interoperability,
> > here between small/big cores) we absolutely have to
> > provide a subset of avx10.1 but with disabled 512bit vectors which
> > effectively means AVX512 with disabled 512bit support.
>
> Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> name to represent whether the effective ISA set allows 512-bit vectors or
> not.

Works for me.  Note it also implies mask regs are SImode, not DImode,
not sure if that relates to evex more than mask reg encodings are all evex ...
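
The mask width follows from simple arithmetic, sketched below in Python (the
helper name mask_bits is made up for illustration).  A mask has one bit per
vector element, so byte elements give the widest mask: 32 bits for a 256-bit
vector (fits SImode) and 64 bits for a 512-bit vector (needs DImode).

```python
def mask_bits(vector_bits, element_bits=8):
    """One mask bit per element; byte elements give the widest mask."""
    return vector_bits // element_bits


for width in (128, 256, 512):
    print(width, mask_bits(width))
# 128 16
# 256 32
# 512 64
```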

>  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> option IMHO should be in the same spirit to all the others a positive enablement,
> not both positive (enable avx512{f,cd,bw,dq,...}) and negative (disallow
> 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> former would allow 512-bit vectors, the latter shouldn't disable those again
> because it isn't a -mno-* option.  Sure, instructions which are specific to
> AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
> enabled only in 128/256 bit variants if we differentiate that level.
> But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
>
>         Jakub
>


* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-22  8:35               ` Richard Biener
@ 2023-08-22  8:52                 ` Jiang, Haochen
  2023-08-22  9:23                   ` Richard Biener
  0 siblings, 1 reply; 88+ messages in thread
From: Jiang, Haochen @ 2023-08-22  8:52 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek; +Cc: ZiNgA BuRgA, Hongtao Liu, gcc-patches

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Tuesday, August 22, 2023 4:36 PM
> To: Jakub Jelinek <jakub@redhat.com>
> Cc: Jiang, Haochen <haochen.jiang@intel.com>; ZiNgA BuRgA
> <zingaburga@hotmail.com>; Hongtao Liu <crazylht@gmail.com>; gcc-
> patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 10:34 AM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches
> wrote:
> > > I think internally we should have conditional 512bit support work across
> > > AVX512 and AVX10.
> > >
> > > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > > enable the respective AVX512 features.  AVX10.2 would then internally
> > > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > > redundancy and possibly make providing inter-operation between
> > > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > > just as "re-branding" latest AVX512, so we should treat it that way
> > > (making it an alias to the AVX512 features).
> > >
> > > Whether we want to allow -mavx10.1 -mno-avx512cd or whether
> > > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > > is an entirely separate
> > > question.  But I think to not wreck the core idea (more interoperability,
> > > here between small/big cores) we absolutely have to
> > > provide a subset of avx10.1 but with disabled 512bit vectors which
> > > effectively means AVX512 with disabled 512bit support.
> >
> > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > name to represent whether the effective ISA set allows 512-bit vectors or
> > not.
> 
> Works for me.  Note it also implies mask regs are SImode, not DImode,
> not sure if that relates to evex more than mask reg encodings are all evex ...
> 

Just in case we are not on the same page.

So we are looking forward to an "extended" -m[no-]avx10-max-512bit option,
which can also be used on AVX512. The other basic logic will not change.

BTW, -mevex512 is not a good name since there will be 64 bit mask operations
promoted to EVEX128 in APX, which might cause confusion.

Thx,
Haochen

> >  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > option IMHO should be in the same spirit to all the others a positive
> enablement,
> > not both positive (enable avx512{f,cd,bw,dq,...}) and negative (disallow
> > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > former would allow 512-bit vectors, the latter shouldn't disable those again
> > because it isn't a -mno-* option.  Sure, instructions which are specific to
> > AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
> > enabled only in 128/256 bit variants if we differentiate that level.
> > But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> > it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
> >
> >         Jakub
> >


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22  8:52                 ` Jiang, Haochen
@ 2023-08-22  9:23                   ` Richard Biener
  0 siblings, 0 replies; 88+ messages in thread
From: Richard Biener @ 2023-08-22  9:23 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: Jakub Jelinek, ZiNgA BuRgA, Hongtao Liu, gcc-patches

On Tue, Aug 22, 2023 at 10:53 AM Jiang, Haochen <haochen.jiang@intel.com> wrote:
>
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Tuesday, August 22, 2023 4:36 PM
> > To: Jakub Jelinek <jakub@redhat.com>
> > Cc: Jiang, Haochen <haochen.jiang@intel.com>; ZiNgA BuRgA
> > <zingaburga@hotmail.com>; Hongtao Liu <crazylht@gmail.com>; gcc-
> > patches@gcc.gnu.org
> > Subject: Re: Intel AVX10.1 Compiler Design and Support
> >
> > On Tue, Aug 22, 2023 at 10:34 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches
> > wrote:
> > > > I think internally we should have conditional 512bit support work across
> > > > AVX512 and AVX10.
> > > >
> > > > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > > > enable the respective AVX512 features.  AVX10.2 would then internally
> > > > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > > > redundancy and possibly make providing inter-operation between
> > > > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > > > just as "re-branding" latest AVX512, so we should treat it that way
> > > > (making it an alias to the AVX512 features).
> > > >
> > > > Whether we want to allow -mavx10.1 -mno-avx512cd or whether
> > > > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > > > is an entirely separate
> > > > question.  But I think to not wreck the core idea (more interoperability,
> > > > here between small/big cores) we absolutely have to
> > > > provide a subset of avx10.1 but with disabled 512bit vectors which
> > > > effectively means AVX512 with disabled 512bit support.
> > >
> > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > > name to represent whether the effective ISA set allows 512-bit vectors or
> > > not.
> >
> > Works for me.  Note it also implies mask regs are SImode, not DImode,
> > not sure if that relates to evex more than mask reg encodings are all evex ...
> >
>
> Just in case we are not on the same page.
>
> So we are looking forward to an "extended" -m[no-]avx10-max-512bit option,
> which can also be used on AVX512. The other basic logic will not change.

Yes, I think that fulfills the main complaints.

Internally I'd also like to avoid having TARGET_AVX10.1 guards in the md file
but alias -mavx10.1 to the set of AVX512 sub-ISAs it covers.  Only have
TARGET_AVX10.2 covering ISA extensions introduced with 10.2.
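
A rough sketch of the aliasing idea, in Python rather than GCC's option
machinery.  The helper names are hypothetical, the sub-ISA list is the one
enumerated later in this thread under the "no delta" assumption, and
dependencies between sub-ISAs are deliberately ignored.

```python
# Sub-ISAs that -mavx10.1 would stand for (assuming AVX10.1 adds no delta).
AVX10_1_SUBSETS = ("f", "vl", "bw", "dq", "cd", "bf16", "fp16", "vbmi",
                   "vbmi2", "vnni", "ifma", "bitalg", "vpopcntdq")


def expand_aliases(options):
    """Rewrite -mavx10.1 into the -mavx512* options it is an alias for."""
    expanded = []
    for opt in options:
        if opt == "-mavx10.1":
            expanded.extend(f"-mavx512{s}" for s in AVX10_1_SUBSETS)
        else:
            expanded.append(opt)
    return expanded


def enabled_features(options):
    """Apply options left to right: -mX adds X, -mno-X removes X."""
    features = set()
    for opt in expand_aliases(options):
        if opt.startswith("-mno-"):
            features.discard(opt[len("-mno-"):])
        else:
            features.add(opt[len("-m"):])
    return features


print("avx512cd" in enabled_features(["-mavx10.1", "-mno-avx512cd"]))  # False
```

With the alias expanded up front, nothing downstream needs an AVX10.1-specific
guard; only the AVX512 feature names remain.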

> BTW, -mevex512 is not a good name since there will be 64 bit mask operations
> promoted to EVEX128 in APX, which might cause confusion.
>
> Thx,
> Haochen
>
> > >  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > > option IMHO should be in the same spirit to all the others a positive
> > enablement,
> > > not both positive (enable avx512{f,cd,bw,dq,...}) and negative (disallow
> > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > > former would allow 512-bit vectors, the latter shouldn't disable those again
> > > because it isn't a -mno-* option.  Sure, instructions which are specific to
> > > AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
> > > enabled only in 128/256 bit variants if we differentiate that level.
> > > But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> > > it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
> > >
> > >         Jakub
> > >


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22  8:34             ` Jakub Jelinek
  2023-08-22  8:35               ` Richard Biener
@ 2023-08-22 13:02               ` Hongtao Liu
  2023-08-22 13:16                 ` Jakub Jelinek
  1 sibling, 1 reply; 88+ messages in thread
From: Hongtao Liu @ 2023-08-22 13:02 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, Jiang, Haochen, ZiNgA BuRgA, gcc-patches

On Tue, Aug 22, 2023 at 4:34 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches wrote:
> > I think internally we should have conditional 512bit support work across
> > AVX512 and AVX10.
> >
> > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > enable the respective AVX512 features.  AVX10.2 would then internally
> > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > redundancy and possibly make providing inter-operation between
> > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > just as "re-branding" latest AVX512, so we should treat it that way
> > (making it an alias to the AVX512 features).
> >
> > Whether we want to allow -mavx10.1 -mno-avx512cd or whether
> > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > is an entirely separate
> > question.  But I think to not wreck the core idea (more interoperability,
> > here between small/big cores) we absolutely have to
> > provide a subset of avx10.1 but with disabled 512bit vectors which
> > effectively means AVX512 with disabled 512bit support.
>
> Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> name to represent whether the effective ISA set allows 512-bit vectors or
> not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> option IMHO should be in the same spirit to all the others a positive enablement,
> not both positive (enable avx512{f,cd,bw,dq,...}) and negative (disallow
> 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> former would allow 512-bit vectors, the latter shouldn't disable those again
> because it isn't a -mno-* option.  Sure, instructions which are specific to
But there's an implicit negative (disallow 512-bit vectors); I think
-mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
512-bit vectors.
Further, we should disallow a mix of evex512 and non-evex512 (e.g.
-mavx10.1-512 -mavx10.2-256); there should be a unified separate switch
that either disallows both or allows both, instead of some ISAs
allowing it and some ISAs disallowing it.
> AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
> enabled only in 128/256 bit variants if we differentiate that level.
> But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
>
>         Jakub
>


-- 
BR,
Hongtao


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22 13:02               ` Hongtao Liu
@ 2023-08-22 13:16                 ` Jakub Jelinek
  2023-08-22 13:23                   ` Richard Biener
  0 siblings, 1 reply; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-22 13:16 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Richard Biener, Jiang, Haochen, ZiNgA BuRgA, gcc-patches

On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > name to represent whether the effective ISA set allows 512-bit vectors or
> > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > option IMHO should be in the same spirit to all the others a positive enablement,
> > not both positive (enable avx512{f,cd,bw,dq,...}) and negative (disallow
> > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > former would allow 512-bit vectors, the latter shouldn't disable those again
> > because it isn't a -mno-* option.  Sure, instructions which are specific to
> But there's implicit negative (disallow 512-bit vector), I think

That is wrong.

> -mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> 512-bit vector.

Because then the -mavx10.1-256 option behaves completely differently from
all the other isa options.

We have the -march= options which are processed separately, but the normal
ISA options either only enable something (when -mwhatever), or only disable something
(when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
ISAs, like say -mavx2 -mbmi is, not an intersection or something even
harder to understand.

> Further, we should disallow a mix of evex512 and non-evex512 (e.g.
> -mavx10.1-512 -mavx10.2-256), they should be a unified separate switch
> that either disallows both or allows both. Instead of some isa
> allowing it and some isa disallowing it.

No, it will be a really terrible user experience if the new options behave
completely differently from everything else.  Because then we'll need to
document in detail how they behave and users will have a hard time figuring
it out, and specify what they do not just on the command line, but also when
mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 should
be a union of those two ISAs.  Either internally there is an ISA flag whether
the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
instructions from the 10.1 to 10.2 delta, or if there is no such separation
internally, it will just enable full AVX10.2-512.  User has asked for it.
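
The two alternatives can be sketched as follows (Python, purely illustrative;
the function names and the (minor version, vector bits) encoding of the
options are assumptions, not anything GCC has):

```python
def combine_separate(options):
    """Alternative 1: track the widest vector length per version delta."""
    width = {}
    for version, bits in options:
        # avx10.N pulls in every earlier avx10.x delta at the same width.
        for v in range(1, version + 1):
            width[v] = max(width.get(v, 0), bits)
    return width


def combine_single(options):
    """Alternative 2: one global width, so the union widens everything."""
    top = max(version for version, _ in options)
    bits = max(bits for _, bits in options)
    return {v: bits for v in range(1, top + 1)}


opts = [(1, 512), (2, 256)]        # -mavx10.1-512 -mavx10.2-256
print(combine_separate(opts))      # {1: 512, 2: 256}
print(combine_single(opts))        # {1: 512, 2: 512}
```

In both cases the result is a union: nothing the user asked for is taken away
by the second option.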

	Jakub


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22 13:16                 ` Jakub Jelinek
@ 2023-08-22 13:23                   ` Richard Biener
  2023-08-22 13:35                     ` Hongtao Liu
  0 siblings, 1 reply; 88+ messages in thread
From: Richard Biener @ 2023-08-22 13:23 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Hongtao Liu, Jiang, Haochen, ZiNgA BuRgA, gcc-patches

On Tue, Aug 22, 2023 at 3:16 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > > name to represent whether the effective ISA set allows 512-bit vectors or
> > > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > > option IMHO should be in the same spirit to all the others a positive enablement,
> > > not both positive (enable avx512{f,cd,bw,dq,...}) and negative (disallow
> > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > > former would allow 512-bit vectors, the latter shouldn't disable those again
> > > because it isn't a -mno-* option.  Sure, instructions which are specific to
> > But there's implicit negative (disallow 512-bit vector), I think
>
> That is wrong.
>
> > -mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> > 512-bit vector.
>
> Because then the -mavx10.1-256 option behaves completely differently from
> all the other isa options.
>
> We have the -march= options which are processed separately, but the normal
> ISA options either only enable something (when -mwhatever), or only disable something
> (when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
> ISAs, like say -mavx2 -mbmi is, not an intersection or something even
> harder to understand.
>
> > Further, we should disallow a mix of evex512 and non-evex512 (e.g.
> > -mavx10.1-512 -mavx10.2-256), they should be a unified separate switch
> > that either disallows both or allows both. Instead of some isa
> > allowing it and some isa disallowing it.
>
> No, it will be really terrible user experience if the new options behave
> completely differently from everything else.  Because then we'll need to
> document it in detail how it behaves and users will have hard time to figure
> it out, and specify what it does not just on the command line, but also when
> mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 should
> be a union of those two ISAs.  Either internally there is an ISA flag whether
> the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
> 512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
> enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
> instructions from the 10.1 to 10.2 delta, or if there is no such separation
> internally, it will just enable full AVX10.2-512.  User has asked for it.

I think having all three -mavx10.1, -mavx10.1-256 and -mavx10.1-512 is just
confusing.  Please separate ISA (avx10.1) from size.  If -m[no-]evex512 isn't
good propose something else.  -mavx512f will enable 512bits, -mavx10.1
will not unless -mevex512.  -mavx512f -mavx512vl -mno-evex512 will disable
512bits.

So scrap -mavx10.1-256 and -mavx10.1-512 please.
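
As a sketch of what I mean by separating ISA from size (Python, illustrative
only; the process helper is hypothetical, the AVX512 option list is
abbreviated, and "the last size-affecting option wins" is an assumption of
this sketch):

```python
AVX512_OPTS = {"-mavx512f", "-mavx512vl"}   # abbreviated


def process(options):
    """Return (isa_features, evex512_allowed) after left-to-right processing."""
    features, evex512 = set(), False
    for opt in options:
        if opt in AVX512_OPTS:
            features.add(opt[2:])
            evex512 = True              # legacy AVX512 options turn 512-bit on
        elif opt == "-mavx10.1":
            features.add("avx10.1")     # ISA only, vector size untouched
        elif opt == "-mevex512":
            evex512 = True
        elif opt == "-mno-evex512":
            evex512 = False
        else:
            raise ValueError(f"option not modelled in this sketch: {opt}")
    return features, evex512


print(process(["-mavx10.1"])[1])                                   # False
print(process(["-mavx10.1", "-mevex512"])[1])                      # True
print(process(["-mavx512f", "-mavx512vl", "-mno-evex512"])[1])     # False
```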

Richard.

>         Jakub
>


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22 13:23                   ` Richard Biener
@ 2023-08-22 13:35                     ` Hongtao Liu
  2023-08-22 13:54                       ` Jakub Jelinek
  2023-08-22 14:39                       ` Hongtao Liu
  0 siblings, 2 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-22 13:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Jiang, Haochen, ZiNgA BuRgA, gcc-patches

On Tue, Aug 22, 2023 at 9:24 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Aug 22, 2023 at 3:16 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > > > name to represent whether the effective ISA set allows 512-bit vectors or
> > > > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > > > option IMHO should be in the same spirit to all the others a positive enablement,
> > > > not both positive (enable avx512{f,cd,bw,dq,...}) and negative (disallow
> > > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > > > former would allow 512-bit vectors, the latter shouldn't disable those again
> > > > because it isn't a -mno-* option.  Sure, instructions which are specific to
> > > But there's implicit negative (disallow 512-bit vector), I think
> >
> > That is wrong.
> >
> > > -mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> > > 512-bit vector.
> >
> > Because then the -mavx10.1-256 option behaves completely differently from
> > all the other isa options.
> >
> > We have the -march= options which are processed separately, but the normal
> > ISA options either only enable something (when -mwhatever), or only disable something
> > (when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
> > ISAs, like say -mavx2 -mbmi is, not an intersection or something even
> > harder to understand.
> >
> > > Further, we should disallow a mix of evex512 and non-evex512 (e.g.
> > > -mavx10.1-512 -mavx10.2-256), they should be a unified separate switch
> > > that either disallows both or allows both. Instead of some isa
> > > allowing it and some isa disallowing it.
> >
> > No, it will be really terrible user experience if the new options behave
> > completely differently from everything else.  Because then we'll need to
Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
evex instruction patterns.
> > document it in detail how it behaves and users will have hard time to figure
> > it out, and specify what it does not just on the command line, but also when
> > mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 should
> > be a union of those two ISAs.  Either internally there is an ISA flag whether
> > the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
> > 512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
> > enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
> > instructions from the 10.1 to 10.2 delta, or if there is no such separation
> > internally, it will just enable full AVX10.2-512.  User has asked for it.
>
> I think having all three -mavx10.1, -mavx10.1-256 and -mavx10.1-512 is just
> confusing.  Please separate ISA (avx10.1) from size.  If -m[no-]evex512 isn't
> good propose something else.  -mavx512f will enable 512bits, -mavx10.1
> will not unless -mevex512.  -mavx512f -mavx512vl -mno-evex512 will disable
> 512bits.
>
> So scrap -mavx10.1-256 and -mavx10.1-512 please.

It sounds to me we would have something like
avx512XXX
    ^
    |
"independent": TARGET_AVX512VL || TARGET_AVX10_1 will enable
128/256-bit instruction.
    |
avx10.1-256 <----implied---- avx10.1-512
    ^                             ^
    |                             |
 implied                       implied
    |                             |
avx10.2-256 <----implied---- avx10.2-512
    ^                             ^
    |                             |
 implied                       implied
    |                             |
avx10.3-256 <----implied---- avx10.3-512
  .....

And put every existing and new instruction under those flags
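
The implication arrows above can be written down as a small graph.  This is
only a Python sketch of the diagram (the IMPLIES table and the closure helper
are made-up names), with avx512XXX left out because it is "independent":

```python
# flag -> flags it directly implies, as drawn in the diagram.
IMPLIES = {
    "avx10.1-512": ["avx10.1-256"],
    "avx10.2-256": ["avx10.1-256"],
    "avx10.2-512": ["avx10.2-256", "avx10.1-512"],
    "avx10.3-256": ["avx10.2-256"],
    "avx10.3-512": ["avx10.3-256", "avx10.2-512"],
}


def closure(flag):
    """Return the flag plus everything it transitively implies."""
    seen, stack = set(), [flag]
    while stack:
        f = stack.pop()
        if f not in seen:
            seen.add(f)
            stack.extend(IMPLIES.get(f, []))
    return seen


print(sorted(closure("avx10.2-256")))
# ['avx10.1-256', 'avx10.2-256']
print(sorted(closure("avx10.2-512")))
# ['avx10.1-256', 'avx10.1-512', 'avx10.2-256', 'avx10.2-512']
```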

>
> Richard.
>
> >         Jakub
> >



-- 
BR,
Hongtao


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22 13:35                     ` Hongtao Liu
@ 2023-08-22 13:54                       ` Jakub Jelinek
  2023-08-22 14:35                         ` Hongtao Liu
  2023-08-22 14:39                       ` Hongtao Liu
  1 sibling, 1 reply; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-22 13:54 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Richard Biener, Jiang, Haochen, ZiNgA BuRgA, gcc-patches

On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> evex instruction patterns.

Why?
Internally for md etc. purposes, we should have the current
TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
(TARGET_EVEX512 even if it is not completely descriptive because of kandq
etc., or some other name) which says if 512-bit vector modes can be used,
if g modifier can be used, if the 64-bit mask operations can be used etc.
Plus, if AVX10.1 contains any instructions not covered in the preexisting
TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
keep -mavx10.1 just as a command line option which enables/disables
other stuff.
The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
like now, except that the current AVX512* sets imply also EVEX512/whatever
it will be called, that option itself enables nothing (or TARGET_AVX512F),
and unsetting it doesn't disable all the TARGET_AVX512*.
-mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
At the end of the option processing, if EVEX512/whatever is set but
TARGET_AVX512VL is not, disable TARGET_AVX512F with all its dependencies,
because VL is a precondition of 128/256-bit EVEX and if 512-bit EVEX is not
enabled, there is nothing left.
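
A minimal sketch of that final check, in Python rather than the real option
code.  It reads the rule as applying when 512-bit EVEX is NOT enabled and
AVX512VL is not enabled either, which is what the rationale above suggests;
that reading, the finalize helper and the feature names are assumptions of
this sketch, and scalar EVEX instructions are not modelled at all.

```python
def finalize(features):
    """Drop AVX512F and its dependents when no EVEX vector length is usable,
    i.e. neither 512-bit EVEX nor AVX512VL (128/256-bit EVEX) is enabled."""
    if "evex512" not in features and "avx512vl" not in features:
        return {f for f in features if not f.startswith("avx512")}
    return set(features)


print(sorted(finalize({"avx2", "avx512f", "avx512bw"})))
# ['avx2']
print(sorted(finalize({"avx2", "avx512f", "avx512vl"})))
# ['avx2', 'avx512f', 'avx512vl']
print(sorted(finalize({"avx2", "avx512f", "evex512"})))
# ['avx2', 'avx512f', 'evex512']
```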

	Jakub



* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22 13:54                       ` Jakub Jelinek
@ 2023-08-22 14:35                         ` Hongtao Liu
  2023-08-22 15:01                           ` Jakub Jelinek
  2023-08-23  7:32                           ` Richard Biener
  0 siblings, 2 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-22 14:35 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, Jiang, Haochen, ZiNgA BuRgA, gcc-patches

On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > evex instruction patterns.
>
> Why?
> Internally for md etc. purposes, we should have the current
> TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> etc., or some other name) which says if 512-bit vector modes can be used,
> if g modifier can be used, if the 64-bit mask operations can be used etc.
> Plus, if AVX10.1 contains any instructions not covered in the preexisting
> TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> keep -mavx10.1 just as a command line option which enables/disables
Let's assume there's no delta now, AVX10.1-512 is equal to
AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> other stuff.
> The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> like now, except that the current AVX512* sets imply also EVEX512/whatever
> it will be called, that option itself enables nothing (or TARGET_AVX512F),
> and unsetting it doesn't disable all the TARGET_AVX512*.
> -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, and
-mavx10.1-256 doesn't clear EVEX512 but just enables all AVX512* sets?
Then the combination is basically equal to AVX10.1-512 (AVX512* sets +
EVEX512).
If this is your assumption, yes, there's no need for TARGET_AVX10_1.
(My former understanding was that you wanted -mavx512bw -mavx10.1-256 to
enable all 128/256/scalar variants but only the avx512bw 512-bit
variants; this can't be done without TARGET_AVX10_1.)
So the whole point is that -mavx10.x-256 should neither clear nor set EVEX512,
and -mavx10.x-512 should set EVEX512.
> At the end of the option processing, if EVEX512/whatever is set but
> TARGET_AVX512VL is not, disable TARGET_AVX512F with all its dependencies,
> because VL is a precondition of 128/256-bit EVEX and if 512-bit EVEX is not
> enabled, there is nothing left.
There are scalar EVEX instructions under TARGET_AVX512F (and other
non-AVX512VL sets) without EVEX512, so it is not true that nothing is left.
>
>         Jakub
>


-- 
BR,
Hongtao


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22 14:35                         ` Hongtao Liu
@ 2023-08-22 15:01                           ` Jakub Jelinek
  2023-08-23  1:57                             ` Jiang, Haochen
  2023-08-23  7:32                           ` Richard Biener
  1 sibling, 1 reply; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-22 15:01 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Richard Biener, Jiang, Haochen, ZiNgA BuRgA, gcc-patches

On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> Let's assume there's no delta now, AVX10.1-512 is equal to
> AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> > other stuff.
> > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > and unsetting it doesn't disable all the TARGET_AVX512*.
> > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> then the combination basically is equal to AVX10.1-512(AVX512* sets +
> EVEX512)
> If this is your assumption, yes, there's no need for TARGET_AVX10_1.

I think that would be my expectation.  -mavx512bw currently implies
512-bit vector support of avx512f and avx512bw, and with -mavx512{bw,vl}
also 128-bit/256-bit vector support.  All pre-AVX10 chips which do support
AVX512BW support 512-bit vectors.  Now, -mavx10.1 will bring in also
vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you wrote
which weren't enabled before, but unless there is some existing or planned
CPU which would support 512-bit vectors in avx512f and avx512bw ISAs and
only support 128/256-bit vectors in those
dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I think there
is no need to differentiate further; the only CPUs which will support both
what -mavx512bw and -mavx10.1 requires will be (if there is no delta)
either CPUs with 128/256/512-bit vector support of those
f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
-mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other side
disable all 512-bit vector instructions and in the end just mean the
same as -mavx10.1-256.
For just
-mavx512bw -mno-evex512 -mavx10.1-256
the question is if that -mno-evex512 turns off also avx512bw/avx512f because
avx512vl isn't enabled at that point during processing, or if we do that
only at the end as a special case.  Of course, in this exact case there is
no difference, because -mavx10.1-256 turns that back on.
But it would make a difference on
-mavx512bw -mno-evex512 -mavx512vl
(when processed right away it would disable AVX512BW (because VL isn't on)
and in the end enable VL,F including EVEX512; if processed at the end it
would be equivalent to just -mavx512bw -mavx512vl, because -mavx512vl
implies -mevex512 again).
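
To make the two processing orders concrete, here is a minimal toy model in C.
It is only a sketch under stated assumptions: the bit names, the option enum
and the process/prune helpers are made up for illustration and are not GCC's
OPTION_MASK_ISA_* machinery.  It assumes that -mavx512bw and -mavx512vl each
imply AVX512F and EVEX512, and that AVX512F and its dependents are dropped
once neither EVEX512 nor AVX512VL is left.

```c
#include <assert.h>

/* Toy ISA bits; these are NOT GCC's OPTION_MASK_ISA_* values.  */
enum { F = 1 << 0, BW = 1 << 1, VL = 1 << 2, EVEX512 = 1 << 3 };

/* Hypothetical option codes for -mavx512bw, -mavx512vl, -mno-evex512.  */
enum opt { M_AVX512BW, M_AVX512VL, M_NO_EVEX512 };

/* Drop AVX512F and what depends on it (here only BW) when neither
   512-bit EVEX nor VL is available.  */
static unsigned
prune (unsigned isa)
{
  if (!(isa & EVEX512) && !(isa & VL))
    isa &= ~(unsigned) (F | BW);
  return isa;
}

/* Process N options left to right.  With EAGER nonzero the pruning is
   done as soon as -mno-evex512 is seen, otherwise only once at the end.  */
static unsigned
process (const enum opt *opts, int n, int eager)
{
  unsigned isa = 0;
  for (int i = 0; i < n; i++)
    switch (opts[i])
      {
      case M_AVX512BW:
        isa |= BW | F | EVEX512;
        break;
      case M_AVX512VL:
        isa |= VL | F | EVEX512;
        break;
      case M_NO_EVEX512:
        isa &= ~(unsigned) EVEX512;
        if (eager)
          isa = prune (isa);
        break;
      }
  return eager ? isa : prune (isa);
}
```

For the option list { M_AVX512BW, M_NO_EVEX512, M_AVX512VL }, the eager
variant ends with VL | F | EVEX512 (AVX512BW is lost), while the deferred
variant ends with BW | VL | F | EVEX512, i.e. the same as -mavx512bw
-mavx512vl.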

	Jakub


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-22 15:01                           ` Jakub Jelinek
@ 2023-08-23  1:57                             ` Jiang, Haochen
  2023-08-23  2:19                               ` Hongtao Liu
  2023-08-23  8:16                               ` Jakub Jelinek
  0 siblings, 2 replies; 88+ messages in thread
From: Jiang, Haochen @ 2023-08-23  1:57 UTC (permalink / raw)
  To: Jakub Jelinek, Hongtao Liu; +Cc: Richard Biener, ZiNgA BuRgA, gcc-patches

> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Tuesday, August 22, 2023 11:02 PM
> To: Hongtao Liu <crazylht@gmail.com>
> Cc: Richard Biener <richard.guenther@gmail.com>; Jiang, Haochen
> <haochen.jiang@intel.com>; ZiNgA BuRgA <zingaburga@hotmail.com>; gcc-
> patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > Let's assume there's no delta now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > EVEX512)
> > If this is your assumption, yes, there's no need for TARGET_AVX10_1.

I think we still need that.  Currently, without AVX512VL, we enable not only
the 512-bit vector instructions but also the scalar instructions, which
means that for -mavx512bw -mno-evex512 we should still enable the scalar
instructions.

The scalar instructions will also be enabled in AVX10.1-256, so we need
something to distinguish them from the rest of the ISA set w/o AVX512VL.

Thx,
Haochen

> 
> I think that would be my expectation.  -mavx512bw currently implies
> 512-bit vector support of avx512f and avx512bw, and with -mavx512{bw,vl}
> also 128-bit/256-bit vector support.  All pre-AVX10 chips which do support
> AVX512BW support 512-bit vectors.  Now, -mavx10.1 will bring in also
> vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you wrote
> which weren't enabled before, but unless there is some existing or planned
> CPU which would support 512-bit vectors in avx512f and avx512bw ISAs and
> only support 128/256-bit vectors in those
> dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I think there
> is no need to differentiate further; the only CPUs which will support both
> what -mavx512bw and -mavx10.1 requires will be (if there is no delta)
> either CPUs with 128/256/512-bit vector support of those
> f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
> -mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other side
> disable all 512-bit vector instructions and in the end just mean the
> same as -mavx10.1-256.
> For just
> -mavx512bw -mno-evex512 -mavx10.1-256
> the question is if that -mno-evex512 turns off also avx512bw/avx512f because
> avx512vl isn't enabled at that point during processing, or if we do that
> only at the end as a special case.  Of course, in this exact case there is
> no difference, because -mavx10.1-256 turns that back on.
> But it would make a difference on
> -mavx512bw -mno-evex512 -mavx512vl
> (when processed right away would disable AVX512BW (because VL isn't on)
> and in the end enable VL,F including EVEX512, or be equivalent to just
> -mavx512bw -mavx512vl if processed at the end, because -mavx512vl implied
> -mevex512 again.
> 
> 	Jakub


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-23  1:57                             ` Jiang, Haochen
@ 2023-08-23  2:19                               ` Hongtao Liu
  2023-08-23  6:47                                 ` Jiang, Haochen
  2023-08-23  8:16                               ` Jakub Jelinek
  1 sibling, 1 reply; 88+ messages in thread
From: Hongtao Liu @ 2023-08-23  2:19 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: Jakub Jelinek, Richard Biener, ZiNgA BuRgA, gcc-patches

On Wed, Aug 23, 2023 at 9:58 AM Jiang, Haochen <haochen.jiang@intel.com> wrote:
>
> > -----Original Message-----
> > From: Jakub Jelinek <jakub@redhat.com>
> > Sent: Tuesday, August 22, 2023 11:02 PM
> > To: Hongtao Liu <crazylht@gmail.com>
> > Cc: Richard Biener <richard.guenther@gmail.com>; Jiang, Haochen
> > <haochen.jiang@intel.com>; ZiNgA BuRgA <zingaburga@hotmail.com>; gcc-
> > patches@gcc.gnu.org
> > Subject: Re: Intel AVX10.1 Compiler Design and Support
> >
> > On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > > Let's assume there's no delta now, AVX10.1-512 is equal to
> > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > other stuff.
> > > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > > EVEX512)
> > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
>
> I think we still need that since the current w/o AVX512VL, we will not only
> enable 512 bit vector instructions but also enable scalar instructions, which
> means when it comes to -mavx512bw -mno-evex512, we should enable
> the scalar function.
>
> And scalar functions will also be enabled in AVX10.1-256, we need something
> to distinguish them out from the ISA set w/o AVX512VL.
Why do we need to distinguish scalar EVEX instructions?
As long as -mavx512XXX -mno-evex512 does not generate zmm or 64-bit kmask
instructions, it should be ok.

Assuming there's no delta in AVX10.1, it sounds to me like the design should be:

avx512*       <== mno-evex512 ==    avx512* + mevex512
(no-evex512)                        (original AVX512 stuff)
    /\                                  /\
    || (equal)                          || (equal)
    \/                                  \/
avx10.1-256                         avx10.1-512
    /\                                  /\
    ||                                  ||
    ||                                  ||
  implied                             implied
    ||                                  ||
    ||                                  ||
avx10.2-256   <==   implied   ==    avx10.2-512
    /\                                  /\
    ||                                  ||
    ||                                  ||
  implied                             implied
    ||                                  ||
    ||                                  ||
avx10.3-256   <==   implied   ==    avx10.3-512

1. The new instructions in avx10.x should be put in either avx10.x-256
or avx10.x-512 according to their vector/kmask size.
2. -mno-evex512 should disable -mavx10.x-512.
3. -mavx512* will enable -mevex512 by default, but -mavx10.1-256 will
just enable -mavx512* and not -mevex512.
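
The implication diagram and the three points above can be sketched as bit
sets in C.  This is only an illustration under the "no delta" assumption:
every name below (AVX512_SETS, AVX10_2_256, SET_AVX10_2_512, no_evex512, ...)
is hypothetical and does not correspond to an existing GCC macro, and the
AVX10.2 bits merely stand in for "new instructions split by vector/kmask
size".

```c
#include <assert.h>

enum
{
  AVX512_SETS = 1 << 0, /* stand-in for all AVX512{F,VL,BW,...} sets */
  EVEX512     = 1 << 1, /* 512-bit vectors and 64-bit kmasks */
  AVX10_2_256 = 1 << 2, /* hypothetical: new AVX10.2 insns up to 256 bits */
  AVX10_2_512 = 1 << 3  /* hypothetical: new AVX10.2 512-bit insns */
};

/* avx10.1-256 equals avx512* without EVEX512; avx10.1-512 adds EVEX512.  */
#define SET_AVX10_1_256 (AVX512_SETS)
#define SET_AVX10_1_512 (SET_AVX10_1_256 | EVEX512)

/* Each later version implies the previous one, and each -512 variant
   implies the matching -256 variant.  */
#define SET_AVX10_2_256 (SET_AVX10_1_256 | AVX10_2_256)
#define SET_AVX10_2_512 (SET_AVX10_2_256 | SET_AVX10_1_512 | AVX10_2_512)

/* Point 2: -mno-evex512 removes EVEX512 and every avx10.x-512-only part.  */
static unsigned
no_evex512 (unsigned isa)
{
  return isa & ~(unsigned) (EVEX512 | AVX10_2_512);
}
```

In this model, applying no_evex512 to SET_AVX10_1_512 gives SET_AVX10_1_256,
and applying it to SET_AVX10_2_512 gives SET_AVX10_2_256, which is the
horizontal "<==" direction in the diagram.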

>
> Thx,
> Haochen
>
> >
> > I think that would be my expectation.  -mavx512bw currently implies
> > 512-bit vector support of avx512f and avx512bw, and with -mavx512{bw,vl}
> > also 128-bit/256-bit vector support.  All pre-AVX10 chips which do support
> > AVX512BW support 512-bit vectors.  Now, -mavx10.1 will bring in also
> > vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you wrote
> > which weren't enabled before, but unless there is some existing or planned
> > CPU which would support 512-bit vectors in avx512f and avx512bw ISAs and
> > only support 128/256-bit vectors in those
> > dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I think there
> > is no need to differentiate further; the only CPUs which will support both
> > what -mavx512bw and -mavx10.1 requires will be (if there is no delta)
> > either CPUs with 128/256/512-bit vector support of those
> > f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
> > -mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other side
> > disable all 512-bit vector instructions and in the end just mean the
> > same as -mavx10.1-256.
> > For just
> > -mavx512bw -mno-evex512 -mavx10.1-256
> > the question is if that -mno-evex512 turns off also avx512bw/avx512f because
> > avx512vl isn't enabled at that point during processing, or if we do that
> > only at the end as a special case.  Of course, in this exact case there is
> > no difference, because -mavx10.1-256 turns that back on.
> > But it would make a difference on
> > -mavx512bw -mno-evex512 -mavx512vl
> > (when processed right away would disable AVX512BW (because VL isn't on)
> > and in the end enable VL,F including EVEX512, or be equivalent to just
> > -mavx512bw -mavx512vl if processed at the end, because -mavx512vl implied
> > -mevex512 again.
> >
> >       Jakub
>


--
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-23  2:19                               ` Hongtao Liu
@ 2023-08-23  6:47                                 ` Jiang, Haochen
  0 siblings, 0 replies; 88+ messages in thread
From: Jiang, Haochen @ 2023-08-23  6:47 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jakub Jelinek, Richard Biener, ZiNgA BuRgA, gcc-patches

> -----Original Message-----
> From: Hongtao Liu <crazylht@gmail.com>
> Sent: Wednesday, August 23, 2023 10:19 AM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: Jakub Jelinek <jakub@redhat.com>; Richard Biener
> <richard.guenther@gmail.com>; ZiNgA BuRgA <zingaburga@hotmail.com>;
> gcc-patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Wed, Aug 23, 2023 at 9:58 AM Jiang, Haochen
> <haochen.jiang@intel.com> wrote:
> >
> > > -----Original Message-----
> > > From: Jakub Jelinek <jakub@redhat.com>
> > > Sent: Tuesday, August 22, 2023 11:02 PM
> > > To: Hongtao Liu <crazylht@gmail.com>
> > > Cc: Richard Biener <richard.guenther@gmail.com>; Jiang, Haochen
> > > <haochen.jiang@intel.com>; ZiNgA BuRgA <zingaburga@hotmail.com>;
> > > gcc- patches@gcc.gnu.org
> > > Subject: Re: Intel AVX10.1 Compiler Design and Support
> > >
> > > On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > > > Let's assume there's no delta now, AVX10.1-512 is equal to
> > > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > > other stuff.
> > > > > The current common/config/i386/i386-common.cc
> > > > > OPTION_MASK_ISA*SET* would be like now, except that the current
> > > > > AVX512* sets imply also EVEX512/whatever it will be called, that
> > > > > option itself enables nothing (or TARGET_AVX512F), and unsetting it
> > > > > doesn't disable all the TARGET_AVX512*.
> > > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > > > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > > > EVEX512)
> > > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
> >
> > I think we still need that since the current w/o AVX512VL, we will not
> > only enable 512 bit vector instructions but also enable scalar
> > instructions, which means when it comes to -mavx512bw -mno-evex512, we
> > should enable the scalar function.
> >
> > And scalar functions will also be enabled in AVX10.1-256, we need
> > something to distinguish them out from the ISA set w/o AVX512VL.
> Why do we need to distinguish scalar evex instruction?
> As long as -mavx512XXX -mno-evex does not generate zmm/64-bit kmask, it
> should be ok.
> 
> Assume there's no delta in AVX10.1, It sounds to me the design should be like
> 
> avx512*  <== mno-evex512==  avx512* + mevex512
> (no-evex512)                            (original AVX512 stuff)
>    /\                                              /\
>    ||(equal)                                   ||(equal)
>    \/                                              \/
> avx10.1-256                       avx10.1-512
>     /\                                              /\
>     ||                                              ||
>     ||                                              ||
> implied                                    implied
>     ||                                              ||
>     ||                                              ||
> avx10.2-256 <== implied ==  avx10.2-512
>     /\                                             /\
>     ||                                             ||
>     ||                                             ||
> implied                                    Implied
>     ||                                             ||
>     ||                                             ||
> avx10.3-256 <== implied ==   avx10.3-512
> 
> 1. The new instructions in avx10.x should be put in either avx10.x-256 or
> avx10.x-512 according to vector/kmask size
> 2. -mno-evex512 should disable -avx10.x-512.
> 3. -mavx512* will defaultly enable -mevex512, but -mavx10.1-256 will just
> enable -mavx512* but not -mevex512

Since the design has changed, I will revert all the AVX10.1 patches that have
been committed to trunk if there is no objection within 24 hours.

I am also working on a sample patch for -mevex512. Although there is a small
encoding issue with the APX EVEX-promoted KMOVQ, most users will not
notice it, and -mevex512 itself is quite straightforward.

Thx,
Haochen

> 
> >
> > Thx,
> > Haochen
> >
> > >
> > > I think that would be my expectation.  -mavx512bw currently implies
> > > 512-bit vector support of avx512f and avx512bw, and with
> > > -mavx512{bw,vl} also 128-bit/256-bit vector support.  All pre-AVX10
> > > chips which do support AVX512BW support 512-bit vectors.  Now,
> > > -mavx10.1 will bring in also
> > > vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you
> > > wrote which weren't enabled before, but unless there is some
> > > existing or planned CPU which would support 512-bit vectors in
> > > avx512f and avx512bw ISAs and only support 128/256-bit vectors in
> > > those dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I
> > > think there is no need to differentiate further; the only CPUs which
> > > will support both what -mavx512bw and -mavx10.1 requires will be (if
> > > there is no delta) either CPUs with 128/256/512-bit vector support of
> > > those f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
> > > -mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other
> > > side disable all 512-bit vector instructions and in the end just
> > > mean the same as -mavx10.1-256.
> > > For just
> > > -mavx512bw -mno-evex512 -mavx10.1-256 the question is if that
> > > -mno-evex512 turns off also avx512bw/avx512f because avx512vl isn't
> > > enabled at that point during processing, or if we do that only at
> > > the end as a special case.  Of course, in this exact case there is
> > > no difference, because -mavx10.1-256 turns that back on.
> > > But it would make a difference on
> > > -mavx512bw -mno-evex512 -mavx512vl
> > > (when processed right away would disable AVX512BW (because VL isn't
> > > on) and in the end enable VL,F including EVEX512, or be equivalent
> > > to just -mavx512bw -mavx512vl if processed at the end, because
> > > -mavx512vl implied
> > > -mevex512 again.
> > >
> > >       Jakub
> >
> 
> 
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-23  1:57                             ` Jiang, Haochen
  2023-08-23  2:19                               ` Hongtao Liu
@ 2023-08-23  8:16                               ` Jakub Jelinek
  2023-08-23  8:27                                 ` Hongtao Liu
  1 sibling, 1 reply; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-23  8:16 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: Hongtao Liu, Richard Biener, ZiNgA BuRgA, gcc-patches

On Wed, Aug 23, 2023 at 01:57:59AM +0000, Jiang, Haochen wrote:
> > > Let's assume there's no delta now, AVX10.1-512 is equal to
> > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > other stuff.
> > > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > > EVEX512)
> > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
> 
> I think we still need that since the current w/o AVX512VL, we will not only
> enable 512 bit vector instructions but also enable scalar instructions, which
> means when it comes to -mavx512bw -mno-evex512, we should enable
> the scalar function.
> 
> And scalar functions will also be enabled in AVX10.1-256, we need something
> to distinguish them out from the ISA set w/o AVX512VL.

Ah, I forgot about scalar instructions; even better, then we don't have to do
that special case.  So, I think TARGET_AVX512F && !TARGET_EVEX512 && !TARGET_AVX512VL
in general should disable 512-bit modes in ix86_hard_regno_mode_ok.  That
should avoid the need to replace TARGET_AVX512F with TARGET_EVEX512 on all
the patterns which refer to 512-bit modes.  I also wonder if it
wouldn't be easiest to make the "v" constraint in that case equivalent to
just "x", so that all those hacks to make xmm16+ registers work in various
instructions through g modifiers wouldn't trigger.  Sure, that would
also penalize scalar instructions, but the above case wouldn't be something
any CPU actually supports; it would be only the common subset of, say,
XeonPhi and AVX10.1-256.
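
As a rough illustration of that idea, the following C sketch models the
check as a standalone predicate.  It is not the real
ix86_hard_regno_mode_ok: the function name, its parameters and the two flag
bits are assumptions made for this example, AVX512F is assumed to be enabled
throughout, and the sketch additionally assumes that 512-bit modes need
EVEX512 regardless of VL.

```c
#include <assert.h>

/* Toy flags; AVX512F itself is assumed to be on.  */
enum { VL = 1 << 0, EVEX512 = 1 << 1 };

/* Return nonzero if a value of BITS bits (32/64 for scalars, 128, 256 or
   512 for vectors) may live in SSE register REGNO (0..31) under ISA.  */
static int
sse_regno_mode_ok (int regno, int bits, unsigned isa)
{
  /* 512-bit modes need 512-bit EVEX support.  */
  if (bits == 512)
    return (isa & EVEX512) != 0;

  /* With neither EVEX512 nor VL, behave as if "v" were just "x":
     xmm16+ is never handed out, not even for scalars.  */
  if (regno >= 16)
    return (isa & (VL | EVEX512)) != 0;

  return 1;
}
```

Under these assumptions, sse_regno_mode_ok (16, 128, 0) is false while
sse_regno_mode_ok (16, 256, VL) is true, and any 512-bit request is rejected
unless EVEX512 is set.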

	Jakub


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-23  8:16                               ` Jakub Jelinek
@ 2023-08-23  8:27                                 ` Hongtao Liu
  0 siblings, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-23  8:27 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jiang, Haochen, Richard Biener, ZiNgA BuRgA, gcc-patches

On Wed, Aug 23, 2023 at 4:16 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Aug 23, 2023 at 01:57:59AM +0000, Jiang, Haochen wrote:
> > > > Let's assume there's no delta now, AVX10.1-512 is equal to
> > > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > > other stuff.
> > > > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > > > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > > > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > > > EVEX512)
> > > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
> >
> > I think we still need that since the current w/o AVX512VL, we will not only
> > enable 512 bit vector instructions but also enable scalar instructions, which
> > means when it comes to -mavx512bw -mno-evex512, we should enable
> > the scalar function.
> >
> > And scalar functions will also be enabled in AVX10.1-256, we need something
> > to distinguish them out from the ISA set w/o AVX512VL.
>
> Ah, forgot about scalar instructions, even better, then we don't have to do
> that special case.  So, I think TARGET_AVX512F && !TARGET_EVEX512 && !TARGET_AVX512VL
> in general should disable 512-bit modes in ix86_hard_regno_mode_ok.  That
> should prevent the need to replace TARGET_AVX512F to TARGET_EVEX512 on all
> the patterns which refer to 512-bit modes.  Also wonder if it
> wouldn't be easiest to make "v" constraint in that case be equivalent to
> just "x" so that all those hacks to make xmm16+ registers working in various
We can clear the EVEX SSE registers (xmm16+) in ix86_conditional_register_usage
when TARGET_AVX512F && !TARGET_EVEX512 && !TARGET_AVX512VL, if we don't care
much about the scalar ones.
> instructions through g modifiers wouldn't trigger.  Sure, that would
> penalize also scalar instructions, but the above case wouldn't be something
> any CPU actually supports, it would be only the common subset of say XeonPhi
> and AVX10.1-256.
>
>         Jakub
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22 14:35                         ` Hongtao Liu
  2023-08-22 15:01                           ` Jakub Jelinek
@ 2023-08-23  7:32                           ` Richard Biener
  2023-08-23  8:03                             ` Jiang, Haochen
  2023-08-23  8:24                             ` Hongtao Liu
  1 sibling, 2 replies; 88+ messages in thread
From: Richard Biener @ 2023-08-23  7:32 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jakub Jelinek, Jiang, Haochen, ZiNgA BuRgA, gcc-patches

On Tue, Aug 22, 2023 at 4:36 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > > evex instruction patterns.
> >
> > Why?
> > Internally for md etc. purposes, we should have the current
> > TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> > (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> > etc., or some other name) which says if 512-bit vector modes can be used,
> > if g modifier can be used, if the 64-bit mask operations can be used etc.
> > Plus, if AVX10.1 contains any instructions not covered in the preexisting
> > TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> > keep -mavx10.1 just as an command line option which enables/disables
> Let's assume there's no delta now, AVX10.1-512 is equal to
> AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> > other stuff.
> > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > and unsetting it doesn't disable all the TARGET_AVX512*.
> > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.

As I said earlier, -mavx10.1-256 (and -mavx10.1-512) should not exist.
So instead we'd have -mavx512bw -mavx10.1, where -mavx512bw enables evex512
and -mavx10.1 will enable the 10.1 ISAs _not affecting_ whether evex512 is
set or not.

We then have the -mevex512 flag (or whatever name we agree to) to enable
(or disable) 512bit support.

If you insist on having -mavx10.1-256, that should alias to -mavx10.1 +
-mno-evex512, but Jakub disagrees here, so I'd rather not have it at all.
We could have -mavx10.1-512 aliasing to -mavx10.1 + -mevex512 (Jakub would
agree here).

Richard.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: Intel AVX10.1 Compiler Design and Support
  2023-08-23  7:32                           ` Richard Biener
@ 2023-08-23  8:03                             ` Jiang, Haochen
  2023-08-23  8:31                               ` Jakub Jelinek
  2023-08-23  8:24                             ` Hongtao Liu
  1 sibling, 1 reply; 88+ messages in thread
From: Jiang, Haochen @ 2023-08-23  8:03 UTC (permalink / raw)
  To: Richard Biener, Hongtao Liu; +Cc: Jakub Jelinek, ZiNgA BuRgA, gcc-patches

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, August 23, 2023 3:32 PM
> To: Hongtao Liu <crazylht@gmail.com>
> Cc: Jakub Jelinek <jakub@redhat.com>; Jiang, Haochen
> <haochen.jiang@intel.com>; ZiNgA BuRgA <zingaburga@hotmail.com>; gcc-
> patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 4:36 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > > > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > > > evex instruction patterns.
> > >
> > > Why?
> > > Internally for md etc. purposes, we should have the current
> > > TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> > > (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> > > etc., or some other name) which says if 512-bit vector modes can be used,
> > > if g modifier can be used, if the 64-bit mask operations can be used etc.
> > > Plus, if AVX10.1 contains any instructions not covered in the preexisting
> > > TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> > > keep -mavx10.1 just as an command line option which enables/disables
> > Let's assume there's no delta now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> 
> As I said earlier -mavx10.1-256 (and -mavx10.1-512) should not exist.
> So instead
> we'd have -mavx512bw -mavx10.1 where -mavx512bw enables evex512 and
> -mavx10.1 will enable the 10.1 ISAs _not affecting_ whether evex512 is
> set or not.
> 
> We then have the -mevex512 flag (or whatever name we agree to) to enable
> (or disable) 512bit support.
> 
> If you insist on having -mavx10.1-256 that should alias to -mavx10.1 +
> -mno-evex512,
> but Jakub disagrees here, so I'd rather not have it at all.  We could have
> -mavx10.1-512 aliasing to -mavx10.1 + -mevex512 (Jakub would agree here).

We could work on -mevex512 first and then discuss -mavx10.1-256/512 further,
since the -mavx10.1-256/512 options are quite controversial.

Just to clarify: -mno-evex512 -mavx512f should not enable 512-bit vectors, right?

Thx,
Haochen

> 
> Richard.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-23  8:03                             ` Jiang, Haochen
@ 2023-08-23  8:31                               ` Jakub Jelinek
  2023-08-23  8:47                                 ` Hongtao Liu
  0 siblings, 1 reply; 88+ messages in thread
From: Jakub Jelinek @ 2023-08-23  8:31 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: Richard Biener, Hongtao Liu, ZiNgA BuRgA, gcc-patches

On Wed, Aug 23, 2023 at 08:03:58AM +0000, Jiang, Haochen wrote:
> We could first work on -mevex512 then further discuss -mavx10.1-256/512 since
> these -mavx10.1-256/512 is quite controversial.
> 
> Just to clarify, -mno-evex512 -mavx512f should not enable 512 bit vector right?

I think it should enable them because -mavx512f is after it.  But it seems the
option handling is more complex than I thought, e.g. -mavx512bw -mno-avx512bw
just cancel each other out, rather than
enabling AVX512BW, AVX512F, AVX2 and all their dependencies (like -mavx512bw
alone does) and then just disabling AVX512BW (like -mno-avx512bw does).
But, if one uses separate target pragmas, it does behave that way (enable
everything, then disable only AVX512BW):
#pragma GCC target ("avx512bw")
#ifdef __AVX512F__
int a;
#endif
#ifdef __AVX512BW__
int b;
#endif
#pragma GCC target ("no-avx512bw")
#ifdef __AVX512F__
int c;
#endif
#ifdef __AVX512BW__
int d;
#endif
The above defines a, b and c vars even without any special -march= or other
command line option.
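
To make the pragma behaviour concrete, here is a toy model of it.  This is only
a sketch: the bit names, the two masks and the helper functions are made up for
illustration and are not the real OPTION_MASK_ISA* definitions from
i386-common.cc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only (not GCC's real masks).  */
enum {
  ISA_AVX2     = 1u << 0,
  ISA_AVX512F  = 1u << 1,
  ISA_AVX512BW = 1u << 2
};

/* A "SET" mask pulls in the feature plus its prerequisites; an "UNSET" mask
   drops the feature itself (AVX512BW has no dependents in this toy model).  */
#define AVX512BW_SET   (ISA_AVX512BW | ISA_AVX512F | ISA_AVX2)
#define AVX512BW_UNSET (ISA_AVX512BW)

/* Models #pragma GCC target ("xxx"): OR in the SET mask.  */
static uint32_t apply_target (uint32_t isa, uint32_t set_mask)
{
  return isa | set_mask;
}

/* Models #pragma GCC target ("no-xxx"): clear the UNSET mask.  */
static uint32_t apply_no_target (uint32_t isa, uint32_t unset_mask)
{
  return isa & ~unset_mask;
}

/* The two pragmas from the example above, applied one after the other.  */
static uint32_t pragma_sequence (void)
{
  uint32_t isa = 0;
  isa = apply_target (isa, AVX512BW_SET);      /* a and b get defined  */
  isa = apply_no_target (isa, AVX512BW_UNSET); /* c still defined, d not  */
  return isa;
}
```

In this model AVX512F and AVX2 survive the second pragma, which matches the
a, b and c result described above.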

So, first important decision would be whether to make EVEX512
OPTION_MASK_ISA_EVEX512 or OPTION_MASK_ISA2_EVEX512, the former would need
to move some other ISA flag from the first to second set.
That OPTION_MASK_ISA*_EVEX512 then should be added to
OPTION_MASK_ISA_AVX512F_SET or OPTION_MASK_ISA2_AVX512F_SET (but, if it is
the latter, we also need to do that for tons of other AVX512*_SET),
and then just arrange for -mavx10.1-256 to enable
OPTION_MASK_ISA*_AVX512*_SET of everything it needs except the EVEX512 set
(but, only disable it from the newly added set, not actually act as
-mavx512{f,bw,...} -mno-evex512).
OPTION_MASK_ISA*_EVEX512_SET dunno, should it enable OPTION_MASK_ISA_AVX512F
or just EVEX512?
And then the UNSET cases...

	Jakub


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-23  8:31                               ` Jakub Jelinek
@ 2023-08-23  8:47                                 ` Hongtao Liu
  0 siblings, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-23  8:47 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jiang, Haochen, Richard Biener, ZiNgA BuRgA, gcc-patches

On Wed, Aug 23, 2023 at 4:31 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Aug 23, 2023 at 08:03:58AM +0000, Jiang, Haochen wrote:
> > We could first work on -mevex512 and then discuss -mavx10.1-256/512 further,
> > since -mavx10.1-256/512 are quite controversial.
> >
> > Just to clarify, -mno-evex512 -mavx512f should not enable 512-bit vectors, right?
>
> I think it should enable them because -mavx512f is after it.  But it seems the
> option handling is more complex than I thought, e.g. -mavx512bw -mno-avx512bw
> just cancels each other, rather than
> enabling AVX512BW, AVX512F, AVX2 and all its dependencies (like -mavx512bw
> alone does) and then just disabling AVX512BW (like -mno-avx512bw does).
> But, if one uses separate target pragmas, it behaves like that:
> #pragma GCC target ("avx512bw")
> #ifdef __AVX512F__
> int a;
> #endif
> #ifdef __AVX512BW__
> int b;
> #endif
> #pragma GCC target ("no-avx512bw")
> #ifdef __AVX512F__
> int c;
> #endif
> #ifdef __AVX512BW__
> int d;
> #endif
> The above defines a, b and c vars even without any special -march= or other
> command line option.
>
> So, first important decision would be whether to make EVEX512
> OPTION_MASK_ISA_EVEX512 or OPTION_MASK_ISA2_EVEX512, the former would need
> to move some other ISA flag from the first to second set.
> That OPTION_MASK_ISA*_EVEX512 then should be added to
> OPTION_MASK_ISA_AVX512F_SET or OPTION_MASK_ISA2_AVX512F_SET (but, if it is
> the latter, we also need to do that for tons of other AVX512*_SET),
> and then just arrange for -mavx10.1-256 to enable
> OPTION_MASK_ISA*_AVX512*_SET of everything it needs except the EVEX512 set
> (but, only disable it from the newly added set, not actually act as
> -mavx512{f,bw,...} -mno-evex512).
> OPTION_MASK_ISA*_EVEX512_SET dunno, should it enable OPTION_MASK_ISA_AVX512F
> or just EVEX512?
> And then the UNSET cases...
We can make it OPTION_MASK_ISA2_EVEX512, but not set/unset it in
ix86_handle_option; instead, handle it in ix86_option_override_internal.
After all set/unset handling for the existing AVX512***, if
OPTION_MASK_ISA_AVX512F is still set and there is no explicit set/unset of
OPTION_MASK_ISA2_EVEX512, then we set OPTION_MASK_ISA2_EVEX512.
That would make -mavx512*** implicitly set -mevex512, but when
there's an explicit -mno-evex512, -mavx512f won't set -mevex512 no matter
where -mno-evex512 is put (-mno-evex512 -mavx512f still disables
512-bit).
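
A minimal sketch of that resolution step, assuming a much simplified option
state.  The struct, its fields and both function names are hypothetical and
only stand in for what ix86_option_override_internal would look at; what
-mevex512 should mean without AVX512F is left open here, as in the discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified option state; the real GCC structures differ.  */
struct isa_state
{
  bool avx512f;          /* AVX512F still enabled after all -m[no-]avx512*  */
  bool evex512;          /* resulting 512-bit enablement  */
  bool evex512_explicit; /* user wrote -mevex512 or -mno-evex512 somewhere  */
};

/* Runs once, after every -m[no-]avx512* option has been processed, so the
   position of -mno-evex512 on the command line cannot matter.  */
static void resolve_evex512 (struct isa_state *s)
{
  if (s->avx512f && !s->evex512_explicit)
    s->evex512 = true; /* implicit yes from -mavx512*  */
}

/* evex512_opt: 0 = option absent, 1 = -mevex512, -1 = -mno-evex512.  */
static bool evex512_enabled (bool m_avx512f, int evex512_opt)
{
  struct isa_state s = { m_avx512f, evex512_opt > 0, evex512_opt != 0 };
  resolve_evex512 (&s);
  return s.evex512;
}
```

With this, evex512_enabled (true, 0) is true (plain -mavx512f), while
evex512_enabled (true, -1) is false for both -mno-evex512 -mavx512f and
-mavx512f -mno-evex512, since ordering is not modelled at all.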
>
>         Jakub
>


-- 
BR,
Hongtao


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-23  7:32                           ` Richard Biener
  2023-08-23  8:03                             ` Jiang, Haochen
@ 2023-08-23  8:24                             ` Hongtao Liu
  1 sibling, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-23  8:24 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Jiang, Haochen, ZiNgA BuRgA, gcc-patches

On Wed, Aug 23, 2023 at 3:33 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Aug 22, 2023 at 4:36 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > > > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > > > evex instruction patterns.
> > >
> > > Why?
> > > Internally for md etc. purposes, we should have the current
> > > TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> > > (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> > > etc., or some other name) which says if 512-bit vector modes can be used,
> > > if g modifier can be used, if the 64-bit mask operations can be used etc.
> > > Plus, if AVX10.1 contains any instructions not covered in the preexisting
> > > TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> > > keep -mavx10.1 just as a command line option which enables/disables
> > Let's assume there's no delta now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enables all AVX512* sets?
>
> As I said earlier -mavx10.1-256 (and -mavx10.1-512) should not exist.
> So instead
> we'd have -mavx512bw -mavx10.1 where -mavx512bw enables evex512 and
> -mavx10.1 will enable the 10.1 ISAs _not affecting_ whether evex512 is
> set or not.
>
> We then have the -mevex512 flag (or whatever name we agree to) to enable
> (or disable) 512bit support.
>
> If you insist on having -mavx10.1-256 that should alias to -mavx10.1 +
> -mno-evex512,
> but Jakub disagrees here, so I'd rather not have it at all.  We could have
I think we can just support -mevex512 for now; avx10.1-256/512
can wait for a while, considering it doesn't have new instructions
and is controversial.
Basically, -mno-evex512 is good enough for most needs.
The only part where I disagree with Jakub is that, for -mavx512f
-mno-evex512 -mavx512bw, I think we need to disable 512-bit: an explicit
-mno-evex512 should take precedence over an implicit yes.
> -mavx10.1-512 aliasing to -mavx10.1 + -mevex512 (Jakub would agree here).
>
> Richard.



-- 
BR,
Hongtao


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-22 13:35                     ` Hongtao Liu
  2023-08-22 13:54                       ` Jakub Jelinek
@ 2023-08-22 14:39                       ` Hongtao Liu
  1 sibling, 0 replies; 88+ messages in thread
From: Hongtao Liu @ 2023-08-22 14:39 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Jiang, Haochen, ZiNgA BuRgA, gcc-patches

On Tue, Aug 22, 2023 at 9:35 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Tue, Aug 22, 2023 at 9:24 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Aug 22, 2023 at 3:16 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > > > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > > > > name to represent whether the effective ISA set allows 512-bit vectors or
> > > > > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > > > > option IMHO should be in the same spirit to all the others a positive enablement,
> > > > > not both positive (enable avx512{f,cd,bw,dq,...}) and negative (disallow
> > > > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > > > > former would allow 512-bit vectors, the latter shouldn't disable those again
> > > > > because it isn't a -mno-* option.  Sure, instructions which are specific to
> > > > But there's implicit negative (disallow 512-bit vector), I think
> > >
> > > That is wrong.
> > >
> > > > -mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> > > > 512-bit vector.
> > >
> > > Because then the -mavx10.1-256 option behaves completely differently from
> > > all the other isa options.
> > >
> > > We have the -march= options which are processed separately, but the normal
> > > ISA options either only enable something (when -mwhatever), or only disable something
> > > (when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
> > > ISAs, like say -mavx2 -mbmi is, not an intersection or something even
> > > harder to understand.
> > >
> > > > Further, we should disallow a mix of evex512 and non-evex512 (e.g.
> > > > -mavx10.1-512 -mavx10.2-256), they should be a unified separate switch
> > > > that either disallows both or allows both. Instead of some isa
> > > > allowing it and some isa disallowing it.
> > >
> > > No, it will be really terrible user experience if the new options behave
> > > completely differently from everything else.  Because then we'll need to
> Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> evex instruction patterns.
> > > document it in detail how it behaves and users will have hard time to figure
> > > it out, and specify what it does not just on the command line, but also when
> > > mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 should
> > > be a union of those two ISAs.  Either internally there is an ISA flag whether
> > > the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
> > > 512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
> > > enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
> > > instructions from the 10.1 to 10.2 delta, or if there is no such separation
> > > internally, it will just enable full AVX10.2-512.  User has asked for it.
> >
> > I think having all three -mavx10.1, -mavx10.1-256 and -mavx10.1-512 is just
> > confusing.  Please separate ISA (avx10.1) from size.  If -m[no-]evex512 isn't
> > good propose something else.  -mavx512f will enable 512bits, -mavx10.1
> > will not unless -mevex512.  -mavx512f -mavx512vl -mno-evex512 will disable
> > 512bits.
> >
> > So scrap -mavx10.1-256 and -mavx10.1-512 please.
The related issue is what -mno-avx10.1-256/-mno-avx10.1-512 should mean.
For -mno-avx10.1-256, maybe it just disables the whole of avx10.1.
But for -mno-avx10.1-512, should it disable the whole of avx10.1 or just EVEX512?
Or maybe we just don't provide -mno-avx10.1-512 and only provide
-mno-avx10.1-256,
and use -mno-evex512 to disable 512-bit vectors.
>
> It sounds to me we would have something like
> avx512XXX
>     ^
>     |
> "independent": TARGET_AVX512VL || TARGET_AVX10_1 will enable
> 128/256-bit instruction.
>     |
> avx10.1-256 <----implied---- avx10.1-512
>     ^                             ^
>     |                             |
>  implied                       implied
>     |                             |
> avx10.2-256 <----implied---- avx10.2-512
>     ^                             ^
>     |                             |
>  implied                       implied
>     |                             |
> avx10.3-256 <----implied---- avx10.3-512
>   .....
>
> And put every existing and new instruction under those flags
>
> >
> > Richard.
> >
> > >         Jakub
> > >
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


* Re: Intel AVX10.1 Compiler Design and Support
  2023-08-21  1:19   ` Hongtao Liu
  2023-08-21  7:36     ` Richard Biener
@ 2023-08-21  7:49     ` ZiNgA BuRgA
  1 sibling, 0 replies; 88+ messages in thread
From: ZiNgA BuRgA @ 2023-08-21  7:49 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: haochen.jiang, gcc-patches

Thanks for the responses!

It'd be unfortunate if AVX10 adoption is desired, yet there's no way to 
compile existing 256-bit code to be compatible with it.
Relying on SDE to check the output isn't a particularly viable solution.

It looks like `-mavx512vl -mprefer-vector-width=256` is my best bet 
under this design, and hope it works.  Fortunately, I'm not relying on 
3rd party code here, so I control all intrinsics used.

Something like a `-mmax-vector-width=256` option sounds preferable 
though, particularly for those using 3rd party code which checks the 
`__AVX512VL__` define and assumes 512-bit vectors are available.
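
A rough sketch of the kind of check meant here.  MAX_VECTOR_BITS is a made-up 
project-local override macro, not something GCC defines, and the whole 
snippet is illustrative rather than taken from any particular library:

```c
#include <assert.h>

/* MAX_VECTOR_BITS is a hypothetical project-local override, not a GCC
   macro; it only exists if the build defines it by hand.  */
#if defined(MAX_VECTOR_BITS)
# define VECTOR_BITS MAX_VECTOR_BITS
#elif defined(__AVX512VL__)
/* The problematic assumption: AVX512VL present means 512-bit is usable.  */
# define VECTOR_BITS 512
#elif defined(__AVX2__)
# define VECTOR_BITS 256
#else
# define VECTOR_BITS 128
#endif

/* Width in bytes that the rest of such a library would build its loops on.  */
static int vector_bytes (void)
{
  return VECTOR_BITS / 8;
}
```

Without a compiler-level cap, code written like the `__AVX512VL__` branch 
above would pick 512 bits even on a 256-bit-only target unless every such 
project grows its own override.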


end of thread, other threads:[~2023-08-23  8:48 UTC | newest]

Thread overview: 88+ messages
2023-08-08  7:13 Intel AVX10.1 Compiler Design and Support Haochen Jiang
2023-08-08  7:13 ` [PATCH 1/3] Initial support for AVX10.1 Haochen Jiang
2023-08-16  2:29   ` Hongtao Liu
2023-08-08  7:13 ` [PATCH 2/3] Emit a warning when disabling AVX512 with AVX10 enabled or disabling AVX10 with AVX512 enabled Haochen Jiang
2023-08-16  2:30   ` Hongtao Liu
2023-08-08  7:13 ` [PATCH 3/3] Emit a warning when AVX10 options conflict in vector width Haochen Jiang
2023-08-16  2:30   ` Hongtao Liu
2023-08-08  7:19 ` [PATCH 1/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins Haochen Jiang
2023-08-08  7:20 ` [PATCH 2/6] " Haochen Jiang
2023-08-08  7:20 ` [PATCH 3/6] " Haochen Jiang
2023-08-08  7:20 ` [PATCH 4/6] " Haochen Jiang
2023-08-08  7:20 ` [PATCH 5/6] " Haochen Jiang
2023-08-08  7:20 ` [PATCH 6/6] " Haochen Jiang
2023-08-16  2:36   ` Hongtao Liu
2023-08-08  7:42 ` Intel AVX10.1 Compiler Design and Support Jakub Jelinek
2023-08-08  8:14   ` Jiang, Haochen
2023-08-08 12:44     ` Richard Biener
2023-08-09  2:06       ` Hongtao Liu
2023-08-09  2:08         ` Hongtao Liu
2023-08-09  6:30       ` Jiang, Haochen
2023-08-08 19:55 ` Joseph Myers
2023-08-09  1:21   ` Hongtao Liu
2023-08-09  2:14     ` Hongtao Liu
2023-08-09  2:18       ` Hongtao Liu
2023-08-09  3:59         ` Wang, Phoebe
2023-08-09 20:43           ` Joseph Myers
2023-08-09 20:49             ` Jakub Jelinek
2023-08-10 12:36             ` Phoebe Wang
2023-08-10 12:45               ` Richard Biener
2023-08-10 13:12                 ` Phoebe Wang
2023-08-10 13:30                   ` Jan Beulich
2023-08-10 13:52                     ` Richard Biener
2023-08-10 14:15                     ` Jiang, Haochen
2023-08-10 15:08                       ` Zhang, Annita
2023-08-10 15:18                         ` Jakub Jelinek
2023-08-10 22:16                 ` Joseph Myers
2023-08-09  4:01         ` Phoebe Wang
2023-08-09  5:37           ` Richard Biener
2023-08-09  6:24             ` Jiang, Haochen
2023-08-09  8:14             ` Florian Weimer
2023-08-09  8:24               ` Hongtao Liu
2023-08-09  7:17       ` Jan Beulich
2023-08-09  7:38         ` Hongtao Liu
2023-08-09  8:04           ` Jan Beulich
2023-08-09  9:15           ` Florian Weimer
2023-08-09 10:15             ` Hongtao Liu
2023-08-09 10:17             ` Zhang, Annita
2023-08-09 13:54               ` Michael Matz
2023-08-09 14:34                 ` Zhang, Annita
2023-08-10 15:08 ` Jiang, Haochen
2023-08-10 16:00   ` Jakub Jelinek
2023-08-19 22:44 ` ZiNgA BuRgA
2023-08-20  5:44   ` Richard Biener
2023-08-21  1:19   ` Hongtao Liu
2023-08-21  7:36     ` Richard Biener
2023-08-21  8:09       ` Jakub Jelinek
2023-08-21  8:28         ` Hongtao Liu
2023-08-21  8:37           ` Jakub Jelinek
2023-08-21  8:46             ` Hongtao Liu
2023-08-21  9:34           ` Richard Biener
2023-08-21  9:36             ` Richard Biener
2023-08-21  9:50             ` Hongtao Liu
2023-08-21  9:26       ` ZiNgA BuRgA
2023-08-22  3:20         ` Jiang, Haochen
2023-08-22  7:36           ` Richard Biener
2023-08-22  8:34             ` Jakub Jelinek
2023-08-22  8:35               ` Richard Biener
2023-08-22  8:52                 ` Jiang, Haochen
2023-08-22  9:23                   ` Richard Biener
2023-08-22 13:02               ` Hongtao Liu
2023-08-22 13:16                 ` Jakub Jelinek
2023-08-22 13:23                   ` Richard Biener
2023-08-22 13:35                     ` Hongtao Liu
2023-08-22 13:54                       ` Jakub Jelinek
2023-08-22 14:35                         ` Hongtao Liu
2023-08-22 15:01                           ` Jakub Jelinek
2023-08-23  1:57                             ` Jiang, Haochen
2023-08-23  2:19                               ` Hongtao Liu
2023-08-23  6:47                                 ` Jiang, Haochen
2023-08-23  8:16                               ` Jakub Jelinek
2023-08-23  8:27                                 ` Hongtao Liu
2023-08-23  7:32                           ` Richard Biener
2023-08-23  8:03                             ` Jiang, Haochen
2023-08-23  8:31                               ` Jakub Jelinek
2023-08-23  8:47                                 ` Hongtao Liu
2023-08-23  8:24                             ` Hongtao Liu
2023-08-22 14:39                       ` Hongtao Liu
2023-08-21  7:49     ` ZiNgA BuRgA
