public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] i386: Share AES xmm intrin with VAES
@ 2023-04-18  7:18 Haochen Jiang
  2023-04-18  7:28 ` Haochen Jiang
  2023-04-19  2:31 ` Hongtao Liu
  0 siblings, 2 replies; 12+ messages in thread
From: Haochen Jiang @ 2023-04-18  7:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak

Hi all,

Currently in GCC, the 128 bit intrin for instruction vaes{end,dec}{last,}
is under AES ISA. Because there is no dependency between ISA set AES
and VAES, The 128 bit intrin is not available when we use compiler flag
-mvaes -mavx512vl and there is no other way to use that intrin. But it
should according to Intel SDM.

Although VAES aims to be a VEX/EVEX promotion for AES, but it is only part
of it. Therefore, we share the AES xmm intrin with VAES.

Also, since -mvaes indicates that we could use VEX encoding for ymm, we
should imply AVX for VAES.

Tested on x86_64-pc-linux-gnu. Ok for trunk?

BRs,
Haochen

gcc/ChangeLog:

	* common/config/i386/i386-common.cc
	(OPTION_MASK_ISA2_AVX_UNSET): Add OPTION_MASK_ISA2_VAES_UNSET.
	(ix86_handle_option): Set AVX flag for VAES.
	* config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
	Add OPTION_MASK_ISA2_VAES_UNSET.
	(def_builtin): Share builtin between AES and VAES.
	* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
	Ditto.
	* config/i386/i386.md (aes): New isa attribute.
	* config/i386/sse.md (aesenc): Add pattern for VAES with xmm.
	(aesenclast): Ditto.
	(aesdec): Ditto.
	(aesdeclast): Ditto.
	* config/i386/vaesintrin.h: Remove redundant avx target push.
	* config/i386/wmmintrin.h (_mm_aesdec_si128): Change to macro.
	(_mm_aesdeclast_si128): Ditto.
	(_mm_aesenc_si128): Ditto.
	(_mm_aesenclast_si128): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fvl-vaes-1.c: Add VAES xmm test.
	* gcc.target/i386/pr84335.c: Modify error message.
---
 gcc/common/config/i386/i386-common.cc         |  5 +-
 gcc/config/i386/i386-builtins.cc              | 21 ++++---
 gcc/config/i386/i386-expand.cc                |  1 +
 gcc/config/i386/i386.md                       |  3 +-
 gcc/config/i386/sse.md                        | 60 ++++++++++---------
 gcc/config/i386/vaesintrin.h                  |  4 +-
 gcc/config/i386/wmmintrin.h                   | 29 +++------
 .../gcc.target/i386/avx512fvl-vaes-1.c        | 11 ++++
 gcc/testsuite/gcc.target/i386/pr84335.c       |  4 +-
 9 files changed, 75 insertions(+), 63 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index c7954da8e34..bf126f14073 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -348,7 +348,8 @@ along with GCC; see the file COPYING3.  If not see
    | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
 #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
   OPTION_MASK_ISA2_SSE_UNSET
-#define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
+#define OPTION_MASK_ISA2_AVX_UNSET \
+  (OPTION_MASK_ISA2_AVX2_UNSET | OPTION_MASK_ISA2_VAES_UNSET)
 #define OPTION_MASK_ISA2_SSE4_2_UNSET OPTION_MASK_ISA2_AVX_UNSET
 #define OPTION_MASK_ISA2_SSE4_1_UNSET OPTION_MASK_ISA2_SSE4_2_UNSET
 #define OPTION_MASK_ISA2_SSE4_UNSET OPTION_MASK_ISA2_SSE4_1_UNSET
@@ -685,6 +686,8 @@ ix86_handle_option (struct gcc_options *opts,
 	{
 	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_VAES_SET;
 	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_VAES_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX_SET;
 	}
       else
 	{
diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index fc0c82b156e..28f404da288 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -279,14 +279,15 @@ def_builtin (HOST_WIDE_INT mask, HOST_WIDE_INT mask2,
       if (((mask2 == 0 || (mask2 & ix86_isa_flags2) != 0)
 	   && (mask == 0 || (mask & ix86_isa_flags) != 0))
 	  || ((mask & OPTION_MASK_ISA_MMX) != 0 && TARGET_MMX_WITH_SSE)
-	  /* "Unified" builtin used by either AVXVNNI/AVXIFMA intrinsics
-	     or AVX512VNNIVL/AVX512IFMAVL non-mask intrinsics should be
-	     defined whenever avxvnni/avxifma or avx512vnni/avxifma &&
-	     avx512vl exist.  */
+	  /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES intrinsics
+	     or AVX512VNNIVL/AVX512IFMAVL/VAESVL non-mask intrinsics should be
+	     defined whenever avxvnni/avxifma/aes or avx512vnni/avx512ifma/vaes
+	     && avx512vl exist.  */
 	  || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
 	  || (mask2 == OPTION_MASK_ISA2_AVXIFMA)
 	  || (mask2 == (OPTION_MASK_ISA2_AVXNECONVERT
 			| OPTION_MASK_ISA2_AVX512BF16))
+	  || ((mask2 & OPTION_MASK_ISA2_VAES) != 0)
 	  || (lang_hooks.builtin_function
 	      == lang_hooks.builtin_function_ext_scope))
 	{
@@ -661,16 +662,20 @@ ix86_init_mmx_sse_builtins (void)
 	       VOID_FTYPE_UNSIGNED_UNSIGNED, IX86_BUILTIN_MWAIT);
 
   /* AES */
-  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
+  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
+		     OPTION_MASK_ISA2_VAES,
 		     "__builtin_ia32_aesenc128",
 		     V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESENC128);
-  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
+  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
+		     OPTION_MASK_ISA2_VAES,
 		     "__builtin_ia32_aesenclast128",
 		     V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESENCLAST128);
-  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
+  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
+		     OPTION_MASK_ISA2_VAES,
 		     "__builtin_ia32_aesdec128",
 		     V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESDEC128);
-  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
+  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
+		     OPTION_MASK_ISA2_VAES,
 		     "__builtin_ia32_aesdeclast128",
 		     V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESDECLAST128);
   def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 54d5dfae677..28574a5809b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -12624,6 +12624,7 @@ ix86_check_builtin_isa_match (unsigned int fcode,
 		 OPTION_MASK_ISA2_AVXIFMA);
   SHARE_BUILTIN (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, 0,
 		 OPTION_MASK_ISA2_AVXNECONVERT);
+  SHARE_BUILTIN (OPTION_MASK_ISA_AES, 0, 0, OPTION_MASK_ISA2_VAES);
   isa = tmp_isa;
   isa2 = tmp_isa2;
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index acc994226e7..15c366cb595 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -836,7 +836,7 @@
 
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
-		    x64_avx,x64_avx512bw,x64_avx512dq,
+		    x64_avx,x64_avx512bw,x64_avx512dq,aes,
 		    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
 		    avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
@@ -863,6 +863,7 @@
 	   (symbol_ref "TARGET_64BIT && TARGET_AVX512BW")
 	 (eq_attr "isa" "x64_avx512dq")
 	   (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
+	 (eq_attr "isa" "aes") (symbol_ref "TARGET_AES")
 	 (eq_attr "isa" "sse_noavx")
 	   (symbol_ref "TARGET_SSE && !TARGET_AVX")
 	 (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 33e281901cf..e7d565a8389 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -25107,67 +25107,71 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define_insn "aesenc"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
 		      UNSPEC_AESENC))]
-  "TARGET_AES"
+  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenc\t{%2, %0|%0, %2}
+   vaesenc\t{%2, %1, %0|%0, %1, %2}
    vaesenc\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "double,double")
+   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "btver2_decode" "double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesenclast"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
 		      UNSPEC_AESENCLAST))]
-  "TARGET_AES"
+  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenclast\t{%2, %0|%0, %2}
+   vaesenclast\t{%2, %1, %0|%0, %1, %2}
    vaesenclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "double,double") 
+   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "btver2_decode" "double,double,double") 
    (set_attr "mode" "TI")])
 
 (define_insn "aesdec"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
 		      UNSPEC_AESDEC))]
-  "TARGET_AES"
+  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdec\t{%2, %0|%0, %2}
+   vaesdec\t{%2, %1, %0|%0, %1, %2}
    vaesdec\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "double,double") 
+   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "btver2_decode" "double,double,double") 
    (set_attr "mode" "TI")])
 
 (define_insn "aesdeclast"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
 		      UNSPEC_AESDECLAST))]
-  "TARGET_AES"
+  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdeclast\t{%2, %0|%0, %2}
+   vaesdeclast\t{%2, %1, %0|%0, %1, %2}
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "double,double")
+   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "btver2_decode" "double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesimc"
diff --git a/gcc/config/i386/vaesintrin.h b/gcc/config/i386/vaesintrin.h
index 0f1cffe71e9..58fc19c9eb3 100644
--- a/gcc/config/i386/vaesintrin.h
+++ b/gcc/config/i386/vaesintrin.h
@@ -24,9 +24,9 @@
 #ifndef __VAESINTRIN_H_INCLUDED
 #define __VAESINTRIN_H_INCLUDED
 
-#if !defined(__VAES__) || !defined(__AVX__)
+#if !defined(__VAES__)
 #pragma GCC push_options
-#pragma GCC target("vaes,avx")
+#pragma GCC target("vaes")
 #define __DISABLE_VAES__
 #endif /* __VAES__ */
 
diff --git a/gcc/config/i386/wmmintrin.h b/gcc/config/i386/wmmintrin.h
index ae15cea429e..da314dbd44d 100644
--- a/gcc/config/i386/wmmintrin.h
+++ b/gcc/config/i386/wmmintrin.h
@@ -40,36 +40,23 @@
 
 /* Performs 1 round of AES decryption of the first m128i using 
    the second m128i as a round key.  */
-extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_aesdec_si128 (__m128i __X, __m128i __Y)
-{
-  return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X, (__v2di)__Y);
-}
+#define _mm_aesdec_si128(X, Y) \
+  (__m128i) __builtin_ia32_aesdec128 ((__v2di) (X), (__v2di) (Y))
 
 /* Performs the last round of AES decryption of the first m128i 
    using the second m128i as a round key.  */
-extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_aesdeclast_si128 (__m128i __X, __m128i __Y)
-{
-  return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X,
-						 (__v2di)__Y);
-}
+#define _mm_aesdeclast_si128(X, Y) \
+  (__m128i) __builtin_ia32_aesdeclast128 ((__v2di) (X), (__v2di) (Y))
 
 /* Performs 1 round of AES encryption of the first m128i using 
    the second m128i as a round key.  */
-extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_aesenc_si128 (__m128i __X, __m128i __Y)
-{
-  return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X, (__v2di)__Y);
-}
+#define _mm_aesenc_si128(X, Y) \
+  (__m128i) __builtin_ia32_aesenc128 ((__v2di) (X), (__v2di) (Y))
 
 /* Performs the last round of AES encryption of the first m128i
    using the second m128i as a round key.  */
-extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_aesenclast_si128 (__m128i __X, __m128i __Y)
-{
-  return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X, (__v2di)__Y);
-}
+#define _mm_aesenclast_si128(X, Y) \
+  (__m128i) __builtin_ia32_aesenclast128 ((__v2di) (X), (__v2di) (Y))
 
 /* Performs the InverseMixColumn operation on the source m128i 
    and stores the result into m128i destination.  */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c b/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
index c65b570cd47..f35742ec98b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
@@ -10,10 +10,16 @@
 /* { dg-final { scan-assembler-times "vaesenc\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
 /* { dg-final { scan-assembler-times "vaesenclast\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
 
+/* { dg-final { scan-assembler-times "vaesdec\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vaesdeclast\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vaesenc\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vaesenclast\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+
 #include <immintrin.h>
 
 volatile __m512i x,y;
 volatile __m256i x256, y256;
+volatile __m128i x128, y128;
 
 void extern
 avx512f_test (void)
@@ -27,4 +33,9 @@ avx512f_test (void)
   x256 = _mm256_aesdeclast_epi128 (x256, y256);
   x256 = _mm256_aesenc_epi128 (x256, y256);
   x256 = _mm256_aesenclast_epi128 (x256, y256);
+
+  x128 = _mm_aesdec_si128 (x128, y128);
+  x128 = _mm_aesdeclast_si128 (x128, y128);
+  x128 = _mm_aesenc_si128 (x128, y128);
+  x128 = _mm_aesenclast_si128 (x128, y128);
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr84335.c b/gcc/testsuite/gcc.target/i386/pr84335.c
index c8d2a712f1f..5e45e2b322a 100644
--- a/gcc/testsuite/gcc.target/i386/pr84335.c
+++ b/gcc/testsuite/gcc.target/i386/pr84335.c
@@ -6,5 +6,5 @@ typedef long long V __attribute__ ((__vector_size__ (16)));
 V
 foo (V *a, V *b)
 {
-  return __builtin_ia32_aesenc128 (*a, *b);	/* { dg-error "needs isa option" } */
-}
+  return __builtin_ia32_aesenc128 (*a, *b);	/* { dg-warning "implicit declaration of function" } */
+}						/* { dg-error "incompatible types when returning type" "" { target *-*-* } .-1 } */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] i386: Share AES xmm intrin with VAES
  2023-04-18  7:18 [PATCH] i386: Share AES xmm intrin with VAES Haochen Jiang
@ 2023-04-18  7:28 ` Haochen Jiang
  2023-04-19  2:31 ` Hongtao Liu
  1 sibling, 0 replies; 12+ messages in thread
From: Haochen Jiang @ 2023-04-18  7:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, ubizjak

Hi all,

I realized that I attached a old version of my patch. We should change
the error message of pr109117-1.c but not pr84335.c.

Please review this patch.

Thx,
Haochen

gcc/ChangeLog:

	* common/config/i386/i386-common.cc
	(OPTION_MASK_ISA2_AVX_UNSET): Add OPTION_MASK_ISA2_VAES_UNSET.
	(ix86_handle_option): Set AVX flag for VAES.
	* config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
	Add OPTION_MASK_ISA2_VAES_UNSET.
	(def_builtin): Share builtin between AES and VAES.
	* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
	Ditto.
	* config/i386/i386.md (aes): New isa attribute.
	* config/i386/sse.md (aesenc): Add pattern for VAES with xmm.
	(aesenclast): Ditto.
	(aesdec): Ditto.
	(aesdeclast): Ditto.
	* config/i386/vaesintrin.h: Remove redundant avx target push.
	* config/i386/wmmintrin.h (_mm_aesdec_si128): Change to macro.
	(_mm_aesdeclast_si128): Ditto.
	(_mm_aesenc_si128): Ditto.
	(_mm_aesenclast_si128): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fvl-vaes-1.c: Add VAES xmm test.
	* gcc.target/i386/pr109117-1.c: Modify error message.
---
 gcc/common/config/i386/i386-common.cc         |  5 +-
 gcc/config/i386/i386-builtins.cc              | 21 ++++---
 gcc/config/i386/i386-expand.cc                |  1 +
 gcc/config/i386/i386.md                       |  3 +-
 gcc/config/i386/sse.md                        | 60 ++++++++++---------
 gcc/config/i386/vaesintrin.h                  |  4 +-
 gcc/config/i386/wmmintrin.h                   | 29 +++------
 .../gcc.target/i386/avx512fvl-vaes-1.c        | 11 ++++
 gcc/testsuite/gcc.target/i386/pr109117-1.c    |  4 +-
 9 files changed, 75 insertions(+), 63 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index c7954da8e34..bf126f14073 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -348,7 +348,8 @@ along with GCC; see the file COPYING3.  If not see
    | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
 #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
   OPTION_MASK_ISA2_SSE_UNSET
-#define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
+#define OPTION_MASK_ISA2_AVX_UNSET \
+  (OPTION_MASK_ISA2_AVX2_UNSET | OPTION_MASK_ISA2_VAES_UNSET)
 #define OPTION_MASK_ISA2_SSE4_2_UNSET OPTION_MASK_ISA2_AVX_UNSET
 #define OPTION_MASK_ISA2_SSE4_1_UNSET OPTION_MASK_ISA2_SSE4_2_UNSET
 #define OPTION_MASK_ISA2_SSE4_UNSET OPTION_MASK_ISA2_SSE4_1_UNSET
@@ -685,6 +686,8 @@ ix86_handle_option (struct gcc_options *opts,
 	{
 	  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_VAES_SET;
 	  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_VAES_SET;
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX_SET;
 	}
       else
 	{
diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index fc0c82b156e..28f404da288 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -279,14 +279,15 @@ def_builtin (HOST_WIDE_INT mask, HOST_WIDE_INT mask2,
       if (((mask2 == 0 || (mask2 & ix86_isa_flags2) != 0)
 	   && (mask == 0 || (mask & ix86_isa_flags) != 0))
 	  || ((mask & OPTION_MASK_ISA_MMX) != 0 && TARGET_MMX_WITH_SSE)
-	  /* "Unified" builtin used by either AVXVNNI/AVXIFMA intrinsics
-	     or AVX512VNNIVL/AVX512IFMAVL non-mask intrinsics should be
-	     defined whenever avxvnni/avxifma or avx512vnni/avxifma &&
-	     avx512vl exist.  */
+	  /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES intrinsics
+	     or AVX512VNNIVL/AVX512IFMAVL/VAESVL non-mask intrinsics should be
+	     defined whenever avxvnni/avxifma/aes or avx512vnni/avx512ifma/vaes
+	     && avx512vl exist.  */
 	  || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
 	  || (mask2 == OPTION_MASK_ISA2_AVXIFMA)
 	  || (mask2 == (OPTION_MASK_ISA2_AVXNECONVERT
 			| OPTION_MASK_ISA2_AVX512BF16))
+	  || ((mask2 & OPTION_MASK_ISA2_VAES) != 0)
 	  || (lang_hooks.builtin_function
 	      == lang_hooks.builtin_function_ext_scope))
 	{
@@ -661,16 +662,20 @@ ix86_init_mmx_sse_builtins (void)
 	       VOID_FTYPE_UNSIGNED_UNSIGNED, IX86_BUILTIN_MWAIT);
 
   /* AES */
-  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
+  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
+		     OPTION_MASK_ISA2_VAES,
 		     "__builtin_ia32_aesenc128",
 		     V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESENC128);
-  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
+  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
+		     OPTION_MASK_ISA2_VAES,
 		     "__builtin_ia32_aesenclast128",
 		     V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESENCLAST128);
-  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
+  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
+		     OPTION_MASK_ISA2_VAES,
 		     "__builtin_ia32_aesdec128",
 		     V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESDEC128);
-  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
+  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
+		     OPTION_MASK_ISA2_VAES,
 		     "__builtin_ia32_aesdeclast128",
 		     V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESDECLAST128);
   def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 54d5dfae677..28574a5809b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -12624,6 +12624,7 @@ ix86_check_builtin_isa_match (unsigned int fcode,
 		 OPTION_MASK_ISA2_AVXIFMA);
   SHARE_BUILTIN (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, 0,
 		 OPTION_MASK_ISA2_AVXNECONVERT);
+  SHARE_BUILTIN (OPTION_MASK_ISA_AES, 0, 0, OPTION_MASK_ISA2_VAES);
   isa = tmp_isa;
   isa2 = tmp_isa2;
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index acc994226e7..15c366cb595 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -836,7 +836,7 @@
 
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
-		    x64_avx,x64_avx512bw,x64_avx512dq,
+		    x64_avx,x64_avx512bw,x64_avx512dq,aes,
 		    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
 		    avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
@@ -863,6 +863,7 @@
 	   (symbol_ref "TARGET_64BIT && TARGET_AVX512BW")
 	 (eq_attr "isa" "x64_avx512dq")
 	   (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
+	 (eq_attr "isa" "aes") (symbol_ref "TARGET_AES")
 	 (eq_attr "isa" "sse_noavx")
 	   (symbol_ref "TARGET_SSE && !TARGET_AVX")
 	 (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 33e281901cf..e7d565a8389 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -25107,67 +25107,71 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define_insn "aesenc"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
 		      UNSPEC_AESENC))]
-  "TARGET_AES"
+  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenc\t{%2, %0|%0, %2}
+   vaesenc\t{%2, %1, %0|%0, %1, %2}
    vaesenc\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "double,double")
+   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "btver2_decode" "double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesenclast"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
 		      UNSPEC_AESENCLAST))]
-  "TARGET_AES"
+  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenclast\t{%2, %0|%0, %2}
+   vaesenclast\t{%2, %1, %0|%0, %1, %2}
    vaesenclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "double,double") 
+   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "btver2_decode" "double,double,double") 
    (set_attr "mode" "TI")])
 
 (define_insn "aesdec"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
 		      UNSPEC_AESDEC))]
-  "TARGET_AES"
+  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdec\t{%2, %0|%0, %2}
+   vaesdec\t{%2, %1, %0|%0, %1, %2}
    vaesdec\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "double,double") 
+   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "btver2_decode" "double,double,double") 
    (set_attr "mode" "TI")])
 
 (define_insn "aesdeclast"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
-		       (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
 		      UNSPEC_AESDECLAST))]
-  "TARGET_AES"
+  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdeclast\t{%2, %0|%0, %2}
+   vaesdeclast\t{%2, %1, %0|%0, %1, %2}
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,aes,avx512vl")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex")
-   (set_attr "btver2_decode" "double,double")
+   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "btver2_decode" "double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesimc"
diff --git a/gcc/config/i386/vaesintrin.h b/gcc/config/i386/vaesintrin.h
index 0f1cffe71e9..58fc19c9eb3 100644
--- a/gcc/config/i386/vaesintrin.h
+++ b/gcc/config/i386/vaesintrin.h
@@ -24,9 +24,9 @@
 #ifndef __VAESINTRIN_H_INCLUDED
 #define __VAESINTRIN_H_INCLUDED
 
-#if !defined(__VAES__) || !defined(__AVX__)
+#if !defined(__VAES__)
 #pragma GCC push_options
-#pragma GCC target("vaes,avx")
+#pragma GCC target("vaes")
 #define __DISABLE_VAES__
 #endif /* __VAES__ */
 
diff --git a/gcc/config/i386/wmmintrin.h b/gcc/config/i386/wmmintrin.h
index ae15cea429e..da314dbd44d 100644
--- a/gcc/config/i386/wmmintrin.h
+++ b/gcc/config/i386/wmmintrin.h
@@ -40,36 +40,23 @@
 
 /* Performs 1 round of AES decryption of the first m128i using 
    the second m128i as a round key.  */
-extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_aesdec_si128 (__m128i __X, __m128i __Y)
-{
-  return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X, (__v2di)__Y);
-}
+#define _mm_aesdec_si128(X, Y) \
+  (__m128i) __builtin_ia32_aesdec128 ((__v2di) (X), (__v2di) (Y))
 
 /* Performs the last round of AES decryption of the first m128i 
    using the second m128i as a round key.  */
-extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_aesdeclast_si128 (__m128i __X, __m128i __Y)
-{
-  return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X,
-						 (__v2di)__Y);
-}
+#define _mm_aesdeclast_si128(X, Y) \
+  (__m128i) __builtin_ia32_aesdeclast128 ((__v2di) (X), (__v2di) (Y))
 
 /* Performs 1 round of AES encryption of the first m128i using 
    the second m128i as a round key.  */
-extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_aesenc_si128 (__m128i __X, __m128i __Y)
-{
-  return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X, (__v2di)__Y);
-}
+#define _mm_aesenc_si128(X, Y) \
+  (__m128i) __builtin_ia32_aesenc128 ((__v2di) (X), (__v2di) (Y))
 
 /* Performs the last round of AES encryption of the first m128i
    using the second m128i as a round key.  */
-extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_aesenclast_si128 (__m128i __X, __m128i __Y)
-{
-  return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X, (__v2di)__Y);
-}
+#define _mm_aesenclast_si128(X, Y) \
+  (__m128i) __builtin_ia32_aesenclast128 ((__v2di) (X), (__v2di) (Y))
 
 /* Performs the InverseMixColumn operation on the source m128i 
    and stores the result into m128i destination.  */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c b/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
index c65b570cd47..f35742ec98b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
@@ -10,10 +10,16 @@
 /* { dg-final { scan-assembler-times "vaesenc\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
 /* { dg-final { scan-assembler-times "vaesenclast\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
 
+/* { dg-final { scan-assembler-times "vaesdec\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vaesdeclast\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vaesenc\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+/* { dg-final { scan-assembler-times "vaesenclast\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+
 #include <immintrin.h>
 
 volatile __m512i x,y;
 volatile __m256i x256, y256;
+volatile __m128i x128, y128;
 
 void extern
 avx512f_test (void)
@@ -27,4 +33,9 @@ avx512f_test (void)
   x256 = _mm256_aesdeclast_epi128 (x256, y256);
   x256 = _mm256_aesenc_epi128 (x256, y256);
   x256 = _mm256_aesenclast_epi128 (x256, y256);
+
+  x128 = _mm_aesdec_si128 (x128, y128);
+  x128 = _mm_aesdeclast_si128 (x128, y128);
+  x128 = _mm_aesenc_si128 (x128, y128);
+  x128 = _mm_aesenclast_si128 (x128, y128);
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr109117-1.c b/gcc/testsuite/gcc.target/i386/pr109117-1.c
index 87a5c0e7fc9..1c4da997c36 100644
--- a/gcc/testsuite/gcc.target/i386/pr109117-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr109117-1.c
@@ -10,5 +10,5 @@ volatile __m128i res;
 void
 foo (void)
 {
-      res = __builtin_ia32_vaesdec_v16qi (x, y); /* { dg-warning "implicit declaration of function" } */
-}     /* { dg-error "incompatible types when assigning to type" "" { target *-*-* } .-1 } */
+      res = __builtin_ia32_vaesdec_v16qi (x, y); /* { dg-error "incompatible types when assigning to type" } */
+}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] i386: Share AES xmm intrin with VAES
  2023-04-18  7:18 [PATCH] i386: Share AES xmm intrin with VAES Haochen Jiang
  2023-04-18  7:28 ` Haochen Jiang
@ 2023-04-19  2:31 ` Hongtao Liu
  2023-04-19  2:40   ` Jiang, Haochen
  1 sibling, 1 reply; 12+ messages in thread
From: Hongtao Liu @ 2023-04-19  2:31 UTC (permalink / raw)
  To: Haochen Jiang; +Cc: gcc-patches, hongtao.liu, ubizjak

On Tue, Apr 18, 2023 at 3:19 PM Haochen Jiang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi all,
>
> Currently in GCC, the 128 bit intrin for instruction vaes{end,dec}{last,}
> is under AES ISA. Because there is no dependency between ISA set AES
> and VAES, The 128 bit intrin is not available when we use compiler flag
> -mvaes -mavx512vl and there is no other way to use that intrin. But it
> should according to Intel SDM.
>
> Although VAES aims to be a VEX/EVEX promotion for AES, but it is only part
> of it. Therefore, we share the AES xmm intrin with VAES.
>
> Also, since -mvaes indicates that we could use VEX encoding for ymm, we
> should imply AVX for VAES.
>
> Tested on x86_64-pc-linux-gnu. Ok for trunk?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
>         * common/config/i386/i386-common.cc
>         (OPTION_MASK_ISA2_AVX_UNSET): Add OPTION_MASK_ISA2_VAES_UNSET.
>         (ix86_handle_option): Set AVX flag for VAES.
>         * config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
>         Add OPTION_MASK_ISA2_VAES_UNSET.
>         (def_builtin): Share builtin between AES and VAES.
>         * config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
>         Ditto.
>         * config/i386/i386.md (aes): New isa attribute.
>         * config/i386/sse.md (aesenc): Add pattern for VAES with xmm.
>         (aesenclast): Ditto.
>         (aesdec): Ditto.
>         (aesdeclast): Ditto.
>         * config/i386/vaesintrin.h: Remove redundant avx target push.
>         * config/i386/wmmintrin.h (_mm_aesdec_si128): Change to macro.
>         (_mm_aesdeclast_si128): Ditto.
>         (_mm_aesenc_si128): Ditto.
>         (_mm_aesenclast_si128): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx512fvl-vaes-1.c: Add VAES xmm test.
>         * gcc.target/i386/pr84335.c: Modify error message.
> ---
>  gcc/common/config/i386/i386-common.cc         |  5 +-
>  gcc/config/i386/i386-builtins.cc              | 21 ++++---
>  gcc/config/i386/i386-expand.cc                |  1 +
>  gcc/config/i386/i386.md                       |  3 +-
>  gcc/config/i386/sse.md                        | 60 ++++++++++---------
>  gcc/config/i386/vaesintrin.h                  |  4 +-
>  gcc/config/i386/wmmintrin.h                   | 29 +++------
>  .../gcc.target/i386/avx512fvl-vaes-1.c        | 11 ++++
>  gcc/testsuite/gcc.target/i386/pr84335.c       |  4 +-
>  9 files changed, 75 insertions(+), 63 deletions(-)
>
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index c7954da8e34..bf126f14073 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -348,7 +348,8 @@ along with GCC; see the file COPYING3.  If not see
>     | OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET)
>  #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
>    OPTION_MASK_ISA2_SSE_UNSET
> -#define OPTION_MASK_ISA2_AVX_UNSET OPTION_MASK_ISA2_AVX2_UNSET
> +#define OPTION_MASK_ISA2_AVX_UNSET \
> +  (OPTION_MASK_ISA2_AVX2_UNSET | OPTION_MASK_ISA2_VAES_UNSET)
>  #define OPTION_MASK_ISA2_SSE4_2_UNSET OPTION_MASK_ISA2_AVX_UNSET
>  #define OPTION_MASK_ISA2_SSE4_1_UNSET OPTION_MASK_ISA2_SSE4_2_UNSET
>  #define OPTION_MASK_ISA2_SSE4_UNSET OPTION_MASK_ISA2_SSE4_1_UNSET
> @@ -685,6 +686,8 @@ ix86_handle_option (struct gcc_options *opts,
>         {
>           opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_VAES_SET;
>           opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_VAES_SET;
> +         opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX_SET;
> +         opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX_SET;
>         }
>        else
>         {
> diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
> index fc0c82b156e..28f404da288 100644
> --- a/gcc/config/i386/i386-builtins.cc
> +++ b/gcc/config/i386/i386-builtins.cc
> @@ -279,14 +279,15 @@ def_builtin (HOST_WIDE_INT mask, HOST_WIDE_INT mask2,
>        if (((mask2 == 0 || (mask2 & ix86_isa_flags2) != 0)
>            && (mask == 0 || (mask & ix86_isa_flags) != 0))
>           || ((mask & OPTION_MASK_ISA_MMX) != 0 && TARGET_MMX_WITH_SSE)
> -         /* "Unified" builtin used by either AVXVNNI/AVXIFMA intrinsics
> -            or AVX512VNNIVL/AVX512IFMAVL non-mask intrinsics should be
> -            defined whenever avxvnni/avxifma or avx512vnni/avxifma &&
> -            avx512vl exist.  */
> +         /* "Unified" builtin used by either AVXVNNI/AVXIFMA/AES intrinsics
> +            or AVX512VNNIVL/AVX512IFMAVL/VAESVL non-mask intrinsics should be
> +            defined whenever avxvnni/avxifma/aes or avx512vnni/avx512ifma/vaes
> +            && avx512vl exist.  */
>           || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
>           || (mask2 == OPTION_MASK_ISA2_AVXIFMA)
>           || (mask2 == (OPTION_MASK_ISA2_AVXNECONVERT
>                         | OPTION_MASK_ISA2_AVX512BF16))
> +         || ((mask2 & OPTION_MASK_ISA2_VAES) != 0)
>           || (lang_hooks.builtin_function
>               == lang_hooks.builtin_function_ext_scope))
>         {
> @@ -661,16 +662,20 @@ ix86_init_mmx_sse_builtins (void)
>                VOID_FTYPE_UNSIGNED_UNSIGNED, IX86_BUILTIN_MWAIT);
>
>    /* AES */
> -  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
> +  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
> +                    OPTION_MASK_ISA2_VAES,
>                      "__builtin_ia32_aesenc128",
>                      V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESENC128);
> -  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
> +  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
> +                    OPTION_MASK_ISA2_VAES,
>                      "__builtin_ia32_aesenclast128",
>                      V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESENCLAST128);
> -  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
> +  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
> +                    OPTION_MASK_ISA2_VAES,
>                      "__builtin_ia32_aesdec128",
>                      V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESDEC128);
> -  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
> +  def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2,
> +                    OPTION_MASK_ISA2_VAES,
>                      "__builtin_ia32_aesdeclast128",
>                      V2DI_FTYPE_V2DI_V2DI, IX86_BUILTIN_AESDECLAST128);
>    def_builtin_const (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE2, 0,
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 54d5dfae677..28574a5809b 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -12624,6 +12624,7 @@ ix86_check_builtin_isa_match (unsigned int fcode,
>                  OPTION_MASK_ISA2_AVXIFMA);
>    SHARE_BUILTIN (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512BF16, 0,
>                  OPTION_MASK_ISA2_AVXNECONVERT);
> +  SHARE_BUILTIN (OPTION_MASK_ISA_AES, 0, 0, OPTION_MASK_ISA2_VAES);
>    isa = tmp_isa;
>    isa2 = tmp_isa2;
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index acc994226e7..15c366cb595 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -836,7 +836,7 @@
>
>  ;; Used to control the "enabled" attribute on a per-instruction basis.
>  (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
> -                   x64_avx,x64_avx512bw,x64_avx512dq,
> +                   x64_avx,x64_avx512bw,x64_avx512dq,aes,
>                     sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
>                     avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
>                     avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
> @@ -863,6 +863,7 @@
>            (symbol_ref "TARGET_64BIT && TARGET_AVX512BW")
>          (eq_attr "isa" "x64_avx512dq")
>            (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
> +        (eq_attr "isa" "aes") (symbol_ref "TARGET_AES")
>          (eq_attr "isa" "sse_noavx")
>            (symbol_ref "TARGET_SSE && !TARGET_AVX")
>          (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 33e281901cf..e7d565a8389 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -25107,67 +25107,71 @@
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>
>  (define_insn "aesenc"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
>                       UNSPEC_AESENC))]
> -  "TARGET_AES"
> +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesenc\t{%2, %0|%0, %2}
> +   vaesenc\t{%2, %1, %0|%0, %1, %2}
>     vaesenc\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,avx")
> +  [(set_attr "isa" "noavx,aes,avx512vl")
Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
TARGET_AVX512VL)" from condition.
Similar for below patterns.
Others LGTM.
>     (set_attr "type" "sselog1")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex")
> -   (set_attr "btver2_decode" "double,double")
> +   (set_attr "prefix" "orig,vex,evex")
> +   (set_attr "btver2_decode" "double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesenclast"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
>                       UNSPEC_AESENCLAST))]
> -  "TARGET_AES"
> +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesenclast\t{%2, %0|%0, %2}
> +   vaesenclast\t{%2, %1, %0|%0, %1, %2}
>     vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,avx")
> +  [(set_attr "isa" "noavx,aes,avx512vl")
>     (set_attr "type" "sselog1")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex")
> -   (set_attr "btver2_decode" "double,double")
> +   (set_attr "prefix" "orig,vex,evex")
> +   (set_attr "btver2_decode" "double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesdec"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
>                       UNSPEC_AESDEC))]
> -  "TARGET_AES"
> +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesdec\t{%2, %0|%0, %2}
> +   vaesdec\t{%2, %1, %0|%0, %1, %2}
>     vaesdec\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,avx")
> +  [(set_attr "isa" "noavx,aes,avx512vl")
>     (set_attr "type" "sselog1")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex")
> -   (set_attr "btver2_decode" "double,double")
> +   (set_attr "prefix" "orig,vex,evex")
> +   (set_attr "btver2_decode" "double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesdeclast"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xBm,xm,vm")]
>                       UNSPEC_AESDECLAST))]
> -  "TARGET_AES"
> +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesdeclast\t{%2, %0|%0, %2}
> +   vaesdeclast\t{%2, %1, %0|%0, %1, %2}
>     vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,avx")
> +  [(set_attr "isa" "noavx,aes,avx512vl")
>     (set_attr "type" "sselog1")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex")
> -   (set_attr "btver2_decode" "double,double")
> +   (set_attr "prefix" "orig,vex,evex")
> +   (set_attr "btver2_decode" "double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesimc"
> diff --git a/gcc/config/i386/vaesintrin.h b/gcc/config/i386/vaesintrin.h
> index 0f1cffe71e9..58fc19c9eb3 100644
> --- a/gcc/config/i386/vaesintrin.h
> +++ b/gcc/config/i386/vaesintrin.h
> @@ -24,9 +24,9 @@
>  #ifndef __VAESINTRIN_H_INCLUDED
>  #define __VAESINTRIN_H_INCLUDED
>
> -#if !defined(__VAES__) || !defined(__AVX__)
> +#if !defined(__VAES__)
>  #pragma GCC push_options
> -#pragma GCC target("vaes,avx")
> +#pragma GCC target("vaes")
>  #define __DISABLE_VAES__
>  #endif /* __VAES__ */
>
> diff --git a/gcc/config/i386/wmmintrin.h b/gcc/config/i386/wmmintrin.h
> index ae15cea429e..da314dbd44d 100644
> --- a/gcc/config/i386/wmmintrin.h
> +++ b/gcc/config/i386/wmmintrin.h
> @@ -40,36 +40,23 @@
>
>  /* Performs 1 round of AES decryption of the first m128i using
>     the second m128i as a round key.  */
> -extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_aesdec_si128 (__m128i __X, __m128i __Y)
> -{
> -  return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X, (__v2di)__Y);
> -}
> +#define _mm_aesdec_si128(X, Y) \
> +  (__m128i) __builtin_ia32_aesdec128 ((__v2di) (X), (__v2di) (Y))
>
>  /* Performs the last round of AES decryption of the first m128i
>     using the second m128i as a round key.  */
> -extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_aesdeclast_si128 (__m128i __X, __m128i __Y)
> -{
> -  return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X,
> -                                                (__v2di)__Y);
> -}
> +#define _mm_aesdeclast_si128(X, Y) \
> +  (__m128i) __builtin_ia32_aesdeclast128 ((__v2di) (X), (__v2di) (Y))
>
>  /* Performs 1 round of AES encryption of the first m128i using
>     the second m128i as a round key.  */
> -extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_aesenc_si128 (__m128i __X, __m128i __Y)
> -{
> -  return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X, (__v2di)__Y);
> -}
> +#define _mm_aesenc_si128(X, Y) \
> +  (__m128i) __builtin_ia32_aesenc128 ((__v2di) (X), (__v2di) (Y))
>
>  /* Performs the last round of AES encryption of the first m128i
>     using the second m128i as a round key.  */
> -extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_aesenclast_si128 (__m128i __X, __m128i __Y)
> -{
> -  return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X, (__v2di)__Y);
> -}
> +#define _mm_aesenclast_si128(X, Y) \
> +  (__m128i) __builtin_ia32_aesenclast128 ((__v2di) (X), (__v2di) (Y))
>
>  /* Performs the InverseMixColumn operation on the source m128i
>     and stores the result into m128i destination.  */
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c b/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
> index c65b570cd47..f35742ec98b 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
> @@ -10,10 +10,16 @@
>  /* { dg-final { scan-assembler-times "vaesenc\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
>  /* { dg-final { scan-assembler-times "vaesenclast\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
>
> +/* { dg-final { scan-assembler-times "vaesdec\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
> +/* { dg-final { scan-assembler-times "vaesdeclast\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
> +/* { dg-final { scan-assembler-times "vaesenc\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
> +/* { dg-final { scan-assembler-times "vaesenclast\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
> +
>  #include <immintrin.h>
>
>  volatile __m512i x,y;
>  volatile __m256i x256, y256;
> +volatile __m128i x128, y128;
>
>  void extern
>  avx512f_test (void)
> @@ -27,4 +33,9 @@ avx512f_test (void)
>    x256 = _mm256_aesdeclast_epi128 (x256, y256);
>    x256 = _mm256_aesenc_epi128 (x256, y256);
>    x256 = _mm256_aesenclast_epi128 (x256, y256);
> +
> +  x128 = _mm_aesdec_si128 (x128, y128);
> +  x128 = _mm_aesdeclast_si128 (x128, y128);
> +  x128 = _mm_aesenc_si128 (x128, y128);
> +  x128 = _mm_aesenclast_si128 (x128, y128);
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/pr84335.c b/gcc/testsuite/gcc.target/i386/pr84335.c
> index c8d2a712f1f..5e45e2b322a 100644
> --- a/gcc/testsuite/gcc.target/i386/pr84335.c
> +++ b/gcc/testsuite/gcc.target/i386/pr84335.c
> @@ -6,5 +6,5 @@ typedef long long V __attribute__ ((__vector_size__ (16)));
>  V
>  foo (V *a, V *b)
>  {
> -  return __builtin_ia32_aesenc128 (*a, *b);    /* { dg-error "needs isa option" } */
> -}
> +  return __builtin_ia32_aesenc128 (*a, *b);    /* { dg-warning "implicit declaration of function" } */
> +}                                              /* { dg-error "incompatible types when returning type" "" { target *-*-* } .-1 } */
> --
> 2.31.1
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] i386: Share AES xmm intrin with VAES
  2023-04-19  2:31 ` Hongtao Liu
@ 2023-04-19  2:40   ` Jiang, Haochen
  2023-04-19  2:42     ` Liu, Hongtao
  2024-04-04  8:41     ` [PATCH] i386: Fix aes/vaes patterns [PR114576] Jakub Jelinek
  0 siblings, 2 replies; 12+ messages in thread
From: Jiang, Haochen @ 2023-04-19  2:40 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: gcc-patches, Liu, Hongtao, ubizjak

> > a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > 33e281901cf..e7d565a8389 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -25107,67 +25107,71 @@
> >
> > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> > ;;
> >
> >  (define_insn "aesenc"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +                      (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >                       UNSPEC_AESENC))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >    "@
> >     aesenc\t{%2, %0|%0, %2}
> > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> >     vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> TARGET_AVX512VL)" from condition.

Since VAES should not imply AES, we need that "|| (TARGET_VAES && 
TARGET_AVX512VL)"

And there is no need to add vaes_avx512vl since the last alternative will only
be hit when there is no aes. When there is no aes, the pattern will need vaes
and avx512vl both or we could not use this pattern. avx512vl here is just like
a placeholder.

BRs,
Haochen

> Similar for below patterns.
> Others LGTM.
> >     (set_attr "type" "sselog1")
> >     (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> >     (set_attr "mode" "TI")])
> >
> >  (define_insn "aesenclast"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +                      (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >                       UNSPEC_AESENCLAST))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >    "@
> >     aesenclast\t{%2, %0|%0, %2}
> > +   vaesenclast\t{%2, %1, %0|%0, %1, %2}
> >     vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> >     (set_attr "type" "sselog1")
> >     (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> >     (set_attr "mode" "TI")])
> >
> >  (define_insn "aesdec"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +                      (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >                       UNSPEC_AESDEC))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >    "@
> >     aesdec\t{%2, %0|%0, %2}
> > +   vaesdec\t{%2, %1, %0|%0, %1, %2}
> >     vaesdec\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> >     (set_attr "type" "sselog1")
> >     (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> >     (set_attr "mode" "TI")])
> >
> >  (define_insn "aesdeclast"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +                      (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >                       UNSPEC_AESDECLAST))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >    "@
> >     aesdeclast\t{%2, %0|%0, %2}
> > +   vaesdeclast\t{%2, %1, %0|%0, %1, %2}
> >     vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> >     (set_attr "type" "sselog1")
> >     (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> >     (set_attr "mode" "TI")])
> >
> >  (define_insn "aesimc"
> > diff --git a/gcc/config/i386/vaesintrin.h
> > b/gcc/config/i386/vaesintrin.h index 0f1cffe71e9..58fc19c9eb3 100644
> > --- a/gcc/config/i386/vaesintrin.h
> > +++ b/gcc/config/i386/vaesintrin.h
> > @@ -24,9 +24,9 @@
> >  #ifndef __VAESINTRIN_H_INCLUDED
> >  #define __VAESINTRIN_H_INCLUDED
> >
> > -#if !defined(__VAES__) || !defined(__AVX__)
> > +#if !defined(__VAES__)
> >  #pragma GCC push_options
> > -#pragma GCC target("vaes,avx")
> > +#pragma GCC target("vaes")
> >  #define __DISABLE_VAES__
> >  #endif /* __VAES__ */
> >
> > diff --git a/gcc/config/i386/wmmintrin.h b/gcc/config/i386/wmmintrin.h
> > index ae15cea429e..da314dbd44d 100644
> > --- a/gcc/config/i386/wmmintrin.h
> > +++ b/gcc/config/i386/wmmintrin.h
> > @@ -40,36 +40,23 @@
> >
> >  /* Performs 1 round of AES decryption of the first m128i using
> >     the second m128i as a round key.  */ -extern __inline __m128i
> > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm_aesdec_si128 (__m128i __X, __m128i __Y) -{
> > -  return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X,
> > (__v2di)__Y); -}
> > +#define _mm_aesdec_si128(X, Y) \
> > +  (__m128i) __builtin_ia32_aesdec128 ((__v2di) (X), (__v2di) (Y))
> >
> >  /* Performs the last round of AES decryption of the first m128i
> >     using the second m128i as a round key.  */ -extern __inline
> > __m128i __attribute__((__gnu_inline__, __always_inline__,
> > __artificial__))
> > -_mm_aesdeclast_si128 (__m128i __X, __m128i __Y) -{
> > -  return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X,
> > -                                                (__v2di)__Y);
> > -}
> > +#define _mm_aesdeclast_si128(X, Y) \
> > +  (__m128i) __builtin_ia32_aesdeclast128 ((__v2di) (X), (__v2di) (Y))
> >
> >  /* Performs 1 round of AES encryption of the first m128i using
> >     the second m128i as a round key.  */ -extern __inline __m128i
> > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm_aesenc_si128 (__m128i __X, __m128i __Y) -{
> > -  return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X,
> > (__v2di)__Y); -}
> > +#define _mm_aesenc_si128(X, Y) \
> > +  (__m128i) __builtin_ia32_aesenc128 ((__v2di) (X), (__v2di) (Y))
> >
> >  /* Performs the last round of AES encryption of the first m128i
> >     using the second m128i as a round key.  */ -extern __inline
> > __m128i __attribute__((__gnu_inline__, __always_inline__,
> > __artificial__))
> > -_mm_aesenclast_si128 (__m128i __X, __m128i __Y) -{
> > -  return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X,
> > (__v2di)__Y); -}
> > +#define _mm_aesenclast_si128(X, Y) \
> > +  (__m128i) __builtin_ia32_aesenclast128 ((__v2di) (X), (__v2di) (Y))
> >
> >  /* Performs the InverseMixColumn operation on the source m128i
> >     and stores the result into m128i destination.  */ diff --git
> > a/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
> > b/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
> > index c65b570cd47..f35742ec98b 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
> > @@ -10,10 +10,16 @@
> >  /* { dg-final { scan-assembler-times "vaesenc\[
> > \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-
> 9\]+\[^\{\n\]*%ymm\[0-9\
> > ]+(?:\n|\[ \\t\]+#)"  1 } } */
> >  /* { dg-final { scan-assembler-times "vaesenclast\[
> > \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-
> 9\]+\[^\{\n\]*%ymm\[0-9\
> > ]+(?:\n|\[ \\t\]+#)"  1 } } */
> >
> > +/* { dg-final { scan-assembler-times "vaesdec\[
> > +\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-
> 9\]+\[^\{\n\]*%xmm\[0-9
> > +\]+(?:\n|\[ \\t\]+#)"  1 } } */
> > +/* { dg-final { scan-assembler-times "vaesdeclast\[
> > +\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-
> 9\]+\[^\{\n\]*%xmm\[0-9
> > +\]+(?:\n|\[ \\t\]+#)"  1 } } */
> > +/* { dg-final { scan-assembler-times "vaesenc\[
> > +\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-
> 9\]+\[^\{\n\]*%xmm\[0-9
> > +\]+(?:\n|\[ \\t\]+#)"  1 } } */
> > +/* { dg-final { scan-assembler-times "vaesenclast\[
> > +\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-
> 9\]+\[^\{\n\]*%xmm\[0-9
> > +\]+(?:\n|\[ \\t\]+#)"  1 } } */
> > +
> >  #include <immintrin.h>
> >
> >  volatile __m512i x,y;
> >  volatile __m256i x256, y256;
> > +volatile __m128i x128, y128;
> >
> >  void extern
> >  avx512f_test (void)
> > @@ -27,4 +33,9 @@ avx512f_test (void)
> >    x256 = _mm256_aesdeclast_epi128 (x256, y256);
> >    x256 = _mm256_aesenc_epi128 (x256, y256);
> >    x256 = _mm256_aesenclast_epi128 (x256, y256);
> > +
> > +  x128 = _mm_aesdec_si128 (x128, y128);
> > +  x128 = _mm_aesdeclast_si128 (x128, y128);
> > +  x128 = _mm_aesenc_si128 (x128, y128);
> > +  x128 = _mm_aesenclast_si128 (x128, y128);
> >  }
> > diff --git a/gcc/testsuite/gcc.target/i386/pr84335.c
> > b/gcc/testsuite/gcc.target/i386/pr84335.c
> > index c8d2a712f1f..5e45e2b322a 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr84335.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr84335.c
> > @@ -6,5 +6,5 @@ typedef long long V __attribute__ ((__vector_size__
> > (16)));  V  foo (V *a, V *b)  {
> > -  return __builtin_ia32_aesenc128 (*a, *b);    /* { dg-error "needs isa
> option" } */
> > -}
> > +  return __builtin_ia32_aesenc128 (*a, *b);    /* { dg-warning "implicit
> declaration of function" } */
> > +}                                              /* { dg-error "incompatible types when returning
> type" "" { target *-*-* } .-1 } */
> > --
> > 2.31.1
> >
> 
> 
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] i386: Share AES xmm intrin with VAES
  2023-04-19  2:40   ` Jiang, Haochen
@ 2023-04-19  2:42     ` Liu, Hongtao
  2024-04-04  8:41     ` [PATCH] i386: Fix aes/vaes patterns [PR114576] Jakub Jelinek
  1 sibling, 0 replies; 12+ messages in thread
From: Liu, Hongtao @ 2023-04-19  2:42 UTC (permalink / raw)
  To: Jiang, Haochen, Hongtao Liu; +Cc: gcc-patches, ubizjak



> -----Original Message-----
> From: Jiang, Haochen <haochen.jiang@intel.com>
> Sent: Wednesday, April 19, 2023 10:41 AM
> To: Hongtao Liu <crazylht@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao.liu@intel.com>;
> ubizjak@gmail.com
> Subject: RE: [PATCH] i386: Share AES xmm intrin with VAES
> 
> > > a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > > 33e281901cf..e7d565a8389 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -25107,67 +25107,71 @@
> > >
> > > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> > > ;;
> > > ;;
> > >
> > >  (define_insn "aesenc"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +                      (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >                       UNSPEC_AESENC))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >    "@
> > >     aesenc\t{%2, %0|%0, %2}
> > > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> > >     vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> > TARGET_AVX512VL)" from condition.
> 
> Since VAES should not imply AES, we need that "|| (TARGET_VAES &&
> TARGET_AVX512VL)"
> 
> And there is no need to add vaes_avx512vl since the last alternative will only
> be hit when there is no aes. When there is no aes, the pattern will need vaes
> and avx512vl both or we could not use this pattern. avx512vl here is just like a
> placeholder.
Ok, I see, then LGTM.
> 
> BRs,
> Haochen
> 
> > Similar for below patterns.
> > Others LGTM.
> > >     (set_attr "type" "sselog1")
> > >     (set_attr "prefix_extra" "1")
> > > -   (set_attr "prefix" "orig,vex")
> > > -   (set_attr "btver2_decode" "double,double")
> > > +   (set_attr "prefix" "orig,vex,evex")
> > > +   (set_attr "btver2_decode" "double,double,double")
> > >     (set_attr "mode" "TI")])
> > >
> > >  (define_insn "aesenclast"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +                      (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >                       UNSPEC_AESENCLAST))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >    "@
> > >     aesenclast\t{%2, %0|%0, %2}
> > > +   vaesenclast\t{%2, %1, %0|%0, %1, %2}
> > >     vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > >     (set_attr "type" "sselog1")
> > >     (set_attr "prefix_extra" "1")
> > > -   (set_attr "prefix" "orig,vex")
> > > -   (set_attr "btver2_decode" "double,double")
> > > +   (set_attr "prefix" "orig,vex,evex")
> > > +   (set_attr "btver2_decode" "double,double,double")
> > >     (set_attr "mode" "TI")])
> > >
> > >  (define_insn "aesdec"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +                      (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >                       UNSPEC_AESDEC))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >    "@
> > >     aesdec\t{%2, %0|%0, %2}
> > > +   vaesdec\t{%2, %1, %0|%0, %1, %2}
> > >     vaesdec\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > >     (set_attr "type" "sselog1")
> > >     (set_attr "prefix_extra" "1")
> > > -   (set_attr "prefix" "orig,vex")
> > > -   (set_attr "btver2_decode" "double,double")
> > > +   (set_attr "prefix" "orig,vex,evex")
> > > +   (set_attr "btver2_decode" "double,double,double")
> > >     (set_attr "mode" "TI")])
> > >
> > >  (define_insn "aesdeclast"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +                      (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >                       UNSPEC_AESDECLAST))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >    "@
> > >     aesdeclast\t{%2, %0|%0, %2}
> > > +   vaesdeclast\t{%2, %1, %0|%0, %1, %2}
> > >     vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > >     (set_attr "type" "sselog1")
> > >     (set_attr "prefix_extra" "1")
> > > -   (set_attr "prefix" "orig,vex")
> > > -   (set_attr "btver2_decode" "double,double")
> > > +   (set_attr "prefix" "orig,vex,evex")
> > > +   (set_attr "btver2_decode" "double,double,double")
> > >     (set_attr "mode" "TI")])
> > >
> > >  (define_insn "aesimc"
> > > diff --git a/gcc/config/i386/vaesintrin.h
> > > b/gcc/config/i386/vaesintrin.h index 0f1cffe71e9..58fc19c9eb3 100644
> > > --- a/gcc/config/i386/vaesintrin.h
> > > +++ b/gcc/config/i386/vaesintrin.h
> > > @@ -24,9 +24,9 @@
> > >  #ifndef __VAESINTRIN_H_INCLUDED
> > >  #define __VAESINTRIN_H_INCLUDED
> > >
> > > -#if !defined(__VAES__) || !defined(__AVX__)
> > > +#if !defined(__VAES__)
> > >  #pragma GCC push_options
> > > -#pragma GCC target("vaes,avx")
> > > +#pragma GCC target("vaes")
> > >  #define __DISABLE_VAES__
> > >  #endif /* __VAES__ */
> > >
> > > diff --git a/gcc/config/i386/wmmintrin.h
> > > b/gcc/config/i386/wmmintrin.h index ae15cea429e..da314dbd44d 100644
> > > --- a/gcc/config/i386/wmmintrin.h
> > > +++ b/gcc/config/i386/wmmintrin.h
> > > @@ -40,36 +40,23 @@
> > >
> > >  /* Performs 1 round of AES decryption of the first m128i using
> > >     the second m128i as a round key.  */ -extern __inline __m128i
> > > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > -_mm_aesdec_si128 (__m128i __X, __m128i __Y) -{
> > > -  return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X,
> > > (__v2di)__Y); -}
> > > +#define _mm_aesdec_si128(X, Y) \
> > > +  (__m128i) __builtin_ia32_aesdec128 ((__v2di) (X), (__v2di) (Y))
> > >
> > >  /* Performs the last round of AES decryption of the first m128i
> > >     using the second m128i as a round key.  */ -extern __inline
> > > __m128i __attribute__((__gnu_inline__, __always_inline__,
> > > __artificial__))
> > > -_mm_aesdeclast_si128 (__m128i __X, __m128i __Y) -{
> > > -  return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X,
> > > -                                                (__v2di)__Y);
> > > -}
> > > +#define _mm_aesdeclast_si128(X, Y) \
> > > +  (__m128i) __builtin_ia32_aesdeclast128 ((__v2di) (X), (__v2di)
> > > +(Y))
> > >
> > >  /* Performs 1 round of AES encryption of the first m128i using
> > >     the second m128i as a round key.  */ -extern __inline __m128i
> > > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > -_mm_aesenc_si128 (__m128i __X, __m128i __Y) -{
> > > -  return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X,
> > > (__v2di)__Y); -}
> > > +#define _mm_aesenc_si128(X, Y) \
> > > +  (__m128i) __builtin_ia32_aesenc128 ((__v2di) (X), (__v2di) (Y))
> > >
> > >  /* Performs the last round of AES encryption of the first m128i
> > >     using the second m128i as a round key.  */ -extern __inline
> > > __m128i __attribute__((__gnu_inline__, __always_inline__,
> > > __artificial__))
> > > -_mm_aesenclast_si128 (__m128i __X, __m128i __Y) -{
> > > -  return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X,
> > > (__v2di)__Y); -}
> > > +#define _mm_aesenclast_si128(X, Y) \
> > > +  (__m128i) __builtin_ia32_aesenclast128 ((__v2di) (X), (__v2di)
> > > +(Y))
> > >
> > >  /* Performs the InverseMixColumn operation on the source m128i
> > >     and stores the result into m128i destination.  */ diff --git
> > > a/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
> > > b/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
> > > index c65b570cd47..f35742ec98b 100644
> > > --- a/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
> > > +++ b/gcc/testsuite/gcc.target/i386/avx512fvl-vaes-1.c
> > > @@ -10,10 +10,16 @@
> > >  /* { dg-final { scan-assembler-times "vaesenc\[
> > > \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-
> > 9\]+\[^\{\n\]*%ymm\[0-9\
> > > ]+(?:\n|\[ \\t\]+#)"  1 } } */
> > >  /* { dg-final { scan-assembler-times "vaesenclast\[
> > > \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\{\n\]*%ymm\[0-
> > 9\]+\[^\{\n\]*%ymm\[0-9\
> > > ]+(?:\n|\[ \\t\]+#)"  1 } } */
> > >
> > > +/* { dg-final { scan-assembler-times "vaesdec\[
> > > +\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-
> > 9\]+\[^\{\n\]*%xmm\[0-9
> > > +\]+(?:\n|\[ \\t\]+#)"  1 } } */
> > > +/* { dg-final { scan-assembler-times "vaesdeclast\[
> > > +\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-
> > 9\]+\[^\{\n\]*%xmm\[0-9
> > > +\]+(?:\n|\[ \\t\]+#)"  1 } } */
> > > +/* { dg-final { scan-assembler-times "vaesenc\[
> > > +\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-
> > 9\]+\[^\{\n\]*%xmm\[0-9
> > > +\]+(?:\n|\[ \\t\]+#)"  1 } } */
> > > +/* { dg-final { scan-assembler-times "vaesenclast\[
> > > +\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\{\n\]*%xmm\[0-
> > 9\]+\[^\{\n\]*%xmm\[0-9
> > > +\]+(?:\n|\[ \\t\]+#)"  1 } } */
> > > +
> > >  #include <immintrin.h>
> > >
> > >  volatile __m512i x,y;
> > >  volatile __m256i x256, y256;
> > > +volatile __m128i x128, y128;
> > >
> > >  void extern
> > >  avx512f_test (void)
> > > @@ -27,4 +33,9 @@ avx512f_test (void)
> > >    x256 = _mm256_aesdeclast_epi128 (x256, y256);
> > >    x256 = _mm256_aesenc_epi128 (x256, y256);
> > >    x256 = _mm256_aesenclast_epi128 (x256, y256);
> > > +
> > > +  x128 = _mm_aesdec_si128 (x128, y128);
> > > +  x128 = _mm_aesdeclast_si128 (x128, y128);
> > > +  x128 = _mm_aesenc_si128 (x128, y128);
> > > +  x128 = _mm_aesenclast_si128 (x128, y128);
> > >  }
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr84335.c
> > > b/gcc/testsuite/gcc.target/i386/pr84335.c
> > > index c8d2a712f1f..5e45e2b322a 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr84335.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr84335.c
> > > @@ -6,5 +6,5 @@ typedef long long V __attribute__ ((__vector_size__
> > > (16)));  V  foo (V *a, V *b)  {
> > > -  return __builtin_ia32_aesenc128 (*a, *b);    /* { dg-error "needs isa
> > option" } */
> > > -}
> > > +  return __builtin_ia32_aesenc128 (*a, *b);    /* { dg-warning "implicit
> > declaration of function" } */
> > > +}                                              /* { dg-error "incompatible types when
> returning
> > type" "" { target *-*-* } .-1 } */
> > > --
> > > 2.31.1
> > >
> >
> >
> > --
> > BR,
> > Hongtao

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] i386: Fix aes/vaes patterns [PR114576]
  2023-04-19  2:40   ` Jiang, Haochen
  2023-04-19  2:42     ` Liu, Hongtao
@ 2024-04-04  8:41     ` Jakub Jelinek
  2024-04-08 12:33       ` Jiang, Haochen
  2024-04-09  3:23       ` Hongtao Liu
  1 sibling, 2 replies; 12+ messages in thread
From: Jakub Jelinek @ 2024-04-04  8:41 UTC (permalink / raw)
  To: Hongtao Liu, Jiang, Haochen; +Cc: gcc-patches, Liu, Hongtao, ubizjak

On Wed, Apr 19, 2023 at 02:40:59AM +0000, Jiang, Haochen via Gcc-patches wrote:
> > >  (define_insn "aesenc"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +                      (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >                       UNSPEC_AESENC))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >    "@
> > >     aesenc\t{%2, %0|%0, %2}
> > > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> > >     vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> > TARGET_AVX512VL)" from condition.
> 
> Since VAES should not imply AES, we need that "|| (TARGET_VAES && 
> TARGET_AVX512VL)"
> 
> And there is no need to add vaes_avx512vl since the last alternative will only
> be hit when there is no aes. When there is no aes, the pattern will need vaes
> and avx512vl both or we could not use this pattern. avx512vl here is just like
> a placeholder.

As the following testcase shows, the above change was incorrect.

Using aes isa for the second alternative is obviously wrong, aes is enabled
whenever -maes is, regardless of -mavx or -mno-avx, so the above change
means that for -maes -mno-avx RA can choose, either it matches the first
alternative with the dup operand, or it matches the second one (but that
is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID).

The big question is if "Since VAES should not imply AES" is the case or not.
Looking around at what LLVM does on godbolt, seems since clang 6 which added
-mvaes support -mvaes there implies -maes, but GCC treats those two
independent.

Now, if we'd take the LLVM path of making -mvaes imply -maes and -mno-aes
imply -mno-vaes, then we should probably just revert the above patch and
tweak common/config/i386/ to do the implications (+ add the testcase from
this patch).

If we keep the current behavior, where AES and VAES are completely
independent extensions, then we need to do more changes as the following
patch attempts to do.
We should use the aesenc etc. insns for noavx as before, we know at that
point that TARGET_AES must be true because (TARGET_VAES && TARGET_AVX512VL)
won't be true when !TARGET_AVX - TARGET_AVX512VL implies TARGET_AVX.
For the second alternative, i.e. the AVX AES VEX encoded case, the patch
uses aes_avx isa which requires both.  Now, for the third one we can't
use avx512vl isa attribute, because one could compile with
-maes -mavx512vl -mno-vaes and in that case we want VEX encoded vaesenc
which can't use %xmm16+ (nor EGPRs), so we need vaes_avx512vl isa to
ensure it is enabled only for -mvaes -mavx512vl.  And there is another
problem, with -mno-aes -mvaes -mavx512vl we could emit VEX encoded vaesenc
which requires AES and AVX ISAs rather than the VAES and AVX512VL which
are enabled.  So the patch uses the {evex} prefix for those cases.
And similarly for the vaes*_<mode> instructions, if they aren't 128-bit
or use %xmm16+ registers, the current case is fine, but if they are 128-bit
and use only %xmm0-15 registers, assembler would again emit VEX encoded insn
which needs AES & AVX CPUID, rather than the EVEX encoded ones which need
VAES & AVX512VL CPUIDs.
Still, I wonder if -mvaes shouldn't imply at least -mavx512f and
-mno-avx512f shouldn't imply -mno-vaes, because otherwise can't see how
it could use 512-bit registers (this part not done in the patch).

The following patch has been successfully bootstrapped/regtested on
x86_64-linux and i686-linux.

2024-04-04  Jakub Jelinek  <jakub@redhat.com>

	PR target/114576
	* config/i386/i386.md (isa): Remove aes, add aes_avx, vaes_avx512vl.
	(enabled): Remove aes isa check, add aes_avx and vaes_avx512vl.
	* config/i386/sse.md (aesenc, aesenclast, aesdec, aesdeclast): Add
	4th alternative, emit {evex} prefix for the third one, use
	noavx,aes_avx,vaes_avx512vl,vaes_avx512vl isa attribute, use jm
	rather than m constraint on the 2nd and 3rd alternative input.
	(vaesdec_<mode>, vaesdeclast_<mode>, vaesenc_<mode>,
	vaesenclast_<mode>): Add second alternative with x instead of v
	and jm instead of m.

	* gcc.target/i386/aes-pr114576.c: New test.

--- gcc/config/i386/i386.md.jj	2024-03-18 22:15:43.165839479 +0100
+++ gcc/config/i386/i386.md	2024-04-04 00:48:46.575511556 +0200
@@ -568,13 +568,14 @@ (define_attr "unit" "integer,i387,sse,mm
 
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
-		    x64_avx,x64_avx512bw,x64_avx512dq,aes,apx_ndd,
+		    x64_avx,x64_avx512bw,x64_avx512dq,apx_ndd,
 		    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
 		    noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq,
 		    noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
 		    avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
-		    avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl"
+		    avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl,
+		    aes_avx,vaes_avx512vl"
   (const_string "base"))
 
 ;; The (bounding maximum) length of an instruction immediate.
@@ -915,7 +916,6 @@ (define_attr "enabled" ""
 	   (symbol_ref "TARGET_64BIT && TARGET_AVX512BW")
 	 (eq_attr "isa" "x64_avx512dq")
 	   (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
-	 (eq_attr "isa" "aes") (symbol_ref "TARGET_AES")
 	 (eq_attr "isa" "sse_noavx")
 	   (symbol_ref "TARGET_SSE && !TARGET_AVX")
 	 (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
@@ -968,6 +968,10 @@ (define_attr "enabled" ""
 	   (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
 	 (eq_attr "isa" "apx_ndd")
 	   (symbol_ref "TARGET_APX_NDD")
+	 (eq_attr "isa" "aes_avx")
+	   (symbol_ref "TARGET_AES && TARGET_AVX")
+	 (eq_attr "isa" "vaes_avx512vl")
+	   (symbol_ref "TARGET_VAES && TARGET_AVX512VL")
 
 	 (eq_attr "mmx_isa" "native")
 	   (symbol_ref "!TARGET_MMX_WITH_SSE")
--- gcc/config/i386/sse.md.jj	2024-03-18 22:15:43.168839437 +0100
+++ gcc/config/i386/sse.md	2024-04-04 00:58:56.482090689 +0200
@@ -26277,75 +26277,79 @@ (define_insn "xop_vpermil2<mode>3"
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define_insn "aesenc"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
 		      UNSPEC_AESENC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenc\t{%2, %0|%0, %2}
    vaesenc\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}
    vaesenc\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
    (set_attr "type" "sselog1")
-   (set_attr "addr" "gpr16,*,*")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double")
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesenclast"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
 		      UNSPEC_AESENCLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenclast\t{%2, %0|%0, %2}
    vaesenclast\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}
    vaesenclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
    (set_attr "type" "sselog1")
-   (set_attr "addr" "gpr16,*,*")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double") 
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesdec"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
 		      UNSPEC_AESDEC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdec\t{%2, %0|%0, %2}
    vaesdec\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}
    vaesdec\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
    (set_attr "type" "sselog1")
-   (set_attr "addr" "gpr16,*,*")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double") 
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double") 
    (set_attr "mode" "TI")])
 
 (define_insn "aesdeclast"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
 		      UNSPEC_AESDECLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdeclast\t{%2, %0|%0, %2}
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
-   (set_attr "addr" "gpr16,*,*")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double")
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesimc"
@@ -30246,44 +30250,60 @@ (define_insn "vpdpwssds_<mode>_maskz_1"
    [(set_attr ("prefix") ("evex"))])
 
 (define_insn "vaesdec_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESDEC))]
   "TARGET_VAES"
-  "vaesdec\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesdec\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesdeclast_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESDECLAST))]
   "TARGET_VAES"
-  "vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesenc_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESENC))]
   "TARGET_VAES"
-  "vaesenc\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesenc\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesenclast_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESENCLAST))]
   "TARGET_VAES"
-  "vaesenclast\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesenclast\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vpclmulqdq_<mode>"
   [(set (match_operand:VI8_FVL 0 "register_operand" "=v")
--- gcc/testsuite/gcc.target/i386/aes-pr114576.c.jj	2024-04-04 09:50:17.117757179 +0200
+++ gcc/testsuite/gcc.target/i386/aes-pr114576.c	2024-04-04 09:51:45.211544801 +0200
@@ -0,0 +1,63 @@
+/* PR target/114576 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -maes -mno-avx" } */
+/* { dg-final { scan-assembler-times "\taesenc\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesdec\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesenclast\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesdeclast\t" 2 } } */
+/* { dg-final { scan-assembler-not "\tvaesenc" } } */
+/* { dg-final { scan-assembler-not "\tvaesdec" } } */
+
+#include <immintrin.h>
+
+__m128i
+f1 (__m128i x, __m128i y)
+{
+  return _mm_aesenc_si128 (x, y);
+}
+
+__m128i
+f2 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesenc_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f3 (__m128i x, __m128i y)
+{
+  return _mm_aesdec_si128 (x, y);
+}
+
+__m128i
+f4 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesdec_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f5 (__m128i x, __m128i y)
+{
+  return _mm_aesenclast_si128 (x, y);
+}
+
+__m128i
+f6 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesenclast_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f7 (__m128i x, __m128i y)
+{
+  return _mm_aesdeclast_si128 (x, y);
+}
+
+__m128i
+f8 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesdeclast_si128 (x, y);
+  return z + x + y;
+}


	Jakub


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] i386: Fix aes/vaes patterns [PR114576]
  2024-04-04  8:41     ` [PATCH] i386: Fix aes/vaes patterns [PR114576] Jakub Jelinek
@ 2024-04-08 12:33       ` Jiang, Haochen
  2024-04-08 12:43         ` Jakub Jelinek
  2024-04-09  3:23       ` Hongtao Liu
  1 sibling, 1 reply; 12+ messages in thread
From: Jiang, Haochen @ 2024-04-08 12:33 UTC (permalink / raw)
  To: Jakub Jelinek, Hongtao Liu; +Cc: gcc-patches, Liu, Hongtao, ubizjak

Hi Jakub,

Sorry for the late response since I am on vacation for now.

> As the following testcase shows, the above change was incorrect.
> 
> Using aes isa for the second alternative is obviously wrong, aes is enabled
> whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> means that for -maes -mno-avx RA can choose, either it matches the first
> alternative with the dup operand, or it matches the second one (but that
> is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID).

When I wrote that patch, I suppose it will never match the second one when
AVX is not enabled because it will immediately drop to the first one so the
second one is automatically AES && AVX, which is tricky here.

But this patch is buggy when "-maes -mavx512vl -mno-vaes" with %xmm16+ so
your change is needed, really appreciate that.

> 
> The big question is if "Since VAES should not imply AES" is the case or not.
> Looking around at what LLVM does on godbolt, seems since clang 6 which added
> -mvaes support -mvaes there implies -maes, but GCC treats those two
> independent.
> 
> Now, if we'd take the LLVM path of making -mvaes imply -maes and -mno-aes
> imply -mno-vaes, then we should probably just revert the above patch and
> tweak common/config/i386/ to do the implications (+ add the testcase from
> this patch).

LLVM always had less restrictions on ISA under such circumstances, I would like to
stick to how SDM did when implementing that, which is a little conservative.

However, I am also ok with VAES implying AES if there is no real HW that has
VAES w/o AES to reduce complexity in this scenario.

Thx,
Haochen

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] i386: Fix aes/vaes patterns [PR114576]
  2024-04-08 12:33       ` Jiang, Haochen
@ 2024-04-08 12:43         ` Jakub Jelinek
  2024-04-08 12:46           ` Jiang, Haochen
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Jelinek @ 2024-04-08 12:43 UTC (permalink / raw)
  To: Jiang, Haochen; +Cc: Hongtao Liu, gcc-patches, Liu, Hongtao, ubizjak

On Mon, Apr 08, 2024 at 12:33:39PM +0000, Jiang, Haochen wrote:
> Sorry for the late response since I am on vacation for now.
> 
> > As the following testcase shows, the above change was incorrect.
> > 
> > Using aes isa for the second alternative is obviously wrong, aes is enabled
> > whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> > means that for -maes -mno-avx RA can choose, either it matches the first
> > alternative with the dup operand, or it matches the second one (but that
> > is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID).
> 
> When I wrote that patch, I suppose it will never match the second one when
> AVX is not enabled because it will immediately drop to the first one so the
> second one is automatically AES && AVX, which is tricky here.

Before the -mvaes changes the alternatives were noavx,avx isa and so clearly
it was either the first alternative is the solely available, or the second,
depending on TARGET_AVX.  But with noavx,aes on the first alternative is
enabled only for !TARGET_AVX, but the second one whenever TARGET_AES, which
is both if !TARGET_AVX and TARGET_AVX.  So, the RA is free to consider both
alternatives, and because the first one is more restrictive (requires
output matching input), if there is a match between those, it will use the
first alternative, but if there isn't, it will happily use the second
alternative.

> LLVM always had less restrictions on ISA under such circumstances, I would like to
> stick to how SDM did when implementing that, which is a little conservative.
> 
> However, I am also ok with VAES implying AES if there is no real HW that has
> VAES w/o AES to reduce complexity in this scenario.

I'm fine with -mvaes not implying -maes, just want to mention that it is
fairly user visible thing and so we shouldn't be changing it after deciding
if we do it one way or another.  Now, I thought -mvaes was added in GCC 14,
but it has been around for a few years, so that means it is likely a bad
idea to change it now.

	Jakub


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] i386: Fix aes/vaes patterns [PR114576]
  2024-04-08 12:43         ` Jakub Jelinek
@ 2024-04-08 12:46           ` Jiang, Haochen
  0 siblings, 0 replies; 12+ messages in thread
From: Jiang, Haochen @ 2024-04-08 12:46 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Hongtao Liu, gcc-patches, Liu, Hongtao, ubizjak

> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Monday, April 8, 2024 9:43 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: Hongtao Liu <crazylht@gmail.com>; gcc-patches@gcc.gnu.org; Liu, Hongtao
> <hongtao.liu@intel.com>; ubizjak@gmail.com
> Subject: Re: [PATCH] i386: Fix aes/vaes patterns [PR114576]
> 
> On Mon, Apr 08, 2024 at 12:33:39PM +0000, Jiang, Haochen wrote:
> > Sorry for the late response since I am on vacation for now.
> >
> > > As the following testcase shows, the above change was incorrect.
> > >
> > > Using aes isa for the second alternative is obviously wrong, aes is enabled
> > > whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> > > means that for -maes -mno-avx RA can choose, either it matches the first
> > > alternative with the dup operand, or it matches the second one (but that
> > > is of course wrong because vaesenc VEX encoded insn needs AES & AVX
> CPUID).
> >
> > When I wrote that patch, I suppose it will never match the second one when
> > AVX is not enabled because it will immediately drop to the first one so the
> > second one is automatically AES && AVX, which is tricky here.
> 
> Before the -mvaes changes the alternatives were noavx,avx isa and so clearly
> it was either the first alternative is the solely available, or the second,
> depending on TARGET_AVX.  But with noavx,aes on the first alternative is
> enabled only for !TARGET_AVX, but the second one whenever TARGET_AES, which
> is both if !TARGET_AVX and TARGET_AVX.  So, the RA is free to consider both
> alternatives, and because the first one is more restrictive (requires
> output matching input), if there is a match between those, it will use the
> first alternative, but if there isn't, it will happily use the second
> alternative.
> 

Aha, I see. Thanks for the explanation.

Thx,
Haochen

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] i386: Fix aes/vaes patterns [PR114576]
  2024-04-04  8:41     ` [PATCH] i386: Fix aes/vaes patterns [PR114576] Jakub Jelinek
  2024-04-08 12:33       ` Jiang, Haochen
@ 2024-04-09  3:23       ` Hongtao Liu
  2024-04-09  9:18         ` [PATCH] i386, v2: " Jakub Jelinek
  1 sibling, 1 reply; 12+ messages in thread
From: Hongtao Liu @ 2024-04-09  3:23 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jiang, Haochen, gcc-patches, Liu, Hongtao, ubizjak

On Thu, Apr 4, 2024 at 4:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Apr 19, 2023 at 02:40:59AM +0000, Jiang, Haochen via Gcc-patches wrote:
> > > >  (define_insn "aesenc"
> > > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > > +                      (match_operand:V2DI 2 "vector_operand"
> > > > + "xBm,xm,vm")]
> > > >                       UNSPEC_AESENC))]
> > > > -  "TARGET_AES"
> > > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > > >    "@
> > > >     aesenc\t{%2, %0|%0, %2}
> > > > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> > > >     vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > > > -  [(set_attr "isa" "noavx,avx")
> > > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > > Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> > > TARGET_AVX512VL)" from condition.
> >
> > Since VAES should not imply AES, we need that "|| (TARGET_VAES &&
> > TARGET_AVX512VL)"
> >
> > And there is no need to add vaes_avx512vl since the last alternative will only
> > be hit when there is no aes. When there is no aes, the pattern will need vaes
> > and avx512vl both or we could not use this pattern. avx512vl here is just like
> > a placeholder.
>
> As the following testcase shows, the above change was incorrect.
>
> Using aes isa for the second alternative is obviously wrong, aes is enabled
> whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> means that for -maes -mno-avx RA can choose, either it matches the first
> alternative with the dup operand, or it matches the second one (but that
> is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID).
>
> The big question is if "Since VAES should not imply AES" is the case or not.
> Looking around at what LLVM does on godbolt, seems since clang 6 which added
> -mvaes support -mvaes there implies -maes, but GCC treats those two
> independent.
>
> Now, if we'd take the LLVM path of making -mvaes imply -maes and -mno-aes
> imply -mno-vaes, then we should probably just revert the above patch and
> tweak common/config/i386/ to do the implications (+ add the testcase from
> this patch).
>
> If we keep the current behavior, where AES and VAES are completely
> independent extensions, then we need to do more changes as the following
> patch attempts to do.
> We should use the aesenc etc. insns for noavx as before, we know at that
> point that TARGET_AES must be true because (TARGET_VAES && TARGET_AVX512VL)
> won't be true when !TARGET_AVX - TARGET_AVX512VL implies TARGET_AVX.
> For the second alternative, i.e. the AVX AES VEX encoded case, the patch
> uses aes_avx isa which requires both.  Now, for the third one we can't
> use avx512vl isa attribute, because one could compile with
> -maes -mavx512vl -mno-vaes and in that case we want VEX encoded vaesenc
> which can't use %xmm16+ (nor EGPRs), so we need vaes_avx512vl isa to
> ensure it is enabled only for -mvaes -mavx512vl.  And there is another
> problem, with -mno-aes -mvaes -mavx512vl we could emit VEX encoded vaesenc
> which requires AES and AVX ISAs rather than the VAES and AVX512VL which
> are enabled.  So the patch uses the {evex} prefix for those cases.
> And similarly for the vaes*_<mode> instructions, if they aren't 128-bit
> or use %xmm16+ registers, the current case is fine, but if they are 128-bit
> and use only %xmm0-15 registers, assembler would again emit VEX encoded insn
> which needs AES & AVX CPUID, rather than the EVEX encoded ones which need
> VAES & AVX512VL CPUIDs.
> Still, I wonder if -mvaes shouldn't imply at least -mavx512f and
> -mno-avx512f shouldn't imply -mno-vaes, because otherwise can't see how
> it could use 512-bit registers (this part not done in the patch).
>
> The following patch has been successfully bootstrapped/regtested on
> x86_64-linux and i686-linux.
>
> 2024-04-04  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/114576
>         * config/i386/i386.md (isa): Remove aes, add aes_avx, vaes_avx512vl.
>         (enabled): Remove aes isa check, add aes_avx and vaes_avx512vl.
>         * config/i386/sse.md (aesenc, aesenclast, aesdec, aesdeclast): Add
>         4th alternative, emit {evex} prefix for the third one, use
>         noavx,aes_avx,vaes_avx512vl,vaes_avx512vl isa attribute, use jm
>         rather than m constraint on the 2nd and 3rd alternative input.
>         (vaesdec_<mode>, vaesdeclast_<mode>, vaesenc_<mode>,
>         vaesenclast_<mode>): Add second alternative with x instead of v
>         and jm instead of m.
>
>         * gcc.target/i386/aes-pr114576.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2024-03-18 22:15:43.165839479 +0100
> +++ gcc/config/i386/i386.md     2024-04-04 00:48:46.575511556 +0200
> @@ -568,13 +568,14 @@ (define_attr "unit" "integer,i387,sse,mm
>
>  ;; Used to control the "enabled" attribute on a per-instruction basis.
>  (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
> -                   x64_avx,x64_avx512bw,x64_avx512dq,aes,apx_ndd,
> +                   x64_avx,x64_avx512bw,x64_avx512dq,apx_ndd,
>                     sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
>                     avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
>                     noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq,
>                     noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
>                     avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
> -                   avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl"
> +                   avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl,
> +                   aes_avx,vaes_avx512vl"
>    (const_string "base"))
>
>  ;; The (bounding maximum) length of an instruction immediate.
> @@ -915,7 +916,6 @@ (define_attr "enabled" ""
>            (symbol_ref "TARGET_64BIT && TARGET_AVX512BW")
>          (eq_attr "isa" "x64_avx512dq")
>            (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
> -        (eq_attr "isa" "aes") (symbol_ref "TARGET_AES")
>          (eq_attr "isa" "sse_noavx")
>            (symbol_ref "TARGET_SSE && !TARGET_AVX")
>          (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
> @@ -968,6 +968,10 @@ (define_attr "enabled" ""
>            (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
>          (eq_attr "isa" "apx_ndd")
>            (symbol_ref "TARGET_APX_NDD")
> +        (eq_attr "isa" "aes_avx")
> +          (symbol_ref "TARGET_AES && TARGET_AVX")
> +        (eq_attr "isa" "vaes_avx512vl")
> +          (symbol_ref "TARGET_VAES && TARGET_AVX512VL")
>
>          (eq_attr "mmx_isa" "native")
>            (symbol_ref "!TARGET_MMX_WITH_SSE")
> --- gcc/config/i386/sse.md.jj   2024-03-18 22:15:43.168839437 +0100
> +++ gcc/config/i386/sse.md      2024-04-04 00:58:56.482090689 +0200
> @@ -26277,75 +26277,79 @@ (define_insn "xop_vpermil2<mode>3"
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>
>  (define_insn "aesenc"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
>                       UNSPEC_AESENC))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesenc\t{%2, %0|%0, %2}
>     vaesenc\t{%2, %1, %0|%0, %1, %2}
> +   %{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}
I think we can merge alternative 2 with 3 to
*  return TARGET_AES ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}"\" :
\"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";
Then it can handle vaes_avx512vl + -mno-aes case.
>     vaesenc\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> +  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
>     (set_attr "type" "sselog1")
> -   (set_attr "addr" "gpr16,*,*")
> +   (set_attr "addr" "gpr16,*,*,*")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> -   (set_attr "btver2_decode" "double,double,double")
> +   (set_attr "prefix" "orig,vex,evex,evex")
> +   (set_attr "btver2_decode" "double,double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesenclast"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
>                       UNSPEC_AESENCLAST))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesenclast\t{%2, %0|%0, %2}
>     vaesenclast\t{%2, %1, %0|%0, %1, %2}
> +   %{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}
Ditto.
>     vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> +  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
>     (set_attr "type" "sselog1")
> -   (set_attr "addr" "gpr16,*,*")
> +   (set_attr "addr" "gpr16,*,*,*")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> -   (set_attr "btver2_decode" "double,double,double")
> +   (set_attr "prefix" "orig,vex,evex,evex")
> +   (set_attr "btver2_decode" "double,double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesdec"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
>                       UNSPEC_AESDEC))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesdec\t{%2, %0|%0, %2}
>     vaesdec\t{%2, %1, %0|%0, %1, %2}
> +   %{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}
Ditto.
>     vaesdec\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> +  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
>     (set_attr "type" "sselog1")
> -   (set_attr "addr" "gpr16,*,*")
> +   (set_attr "addr" "gpr16,*,*,*")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> -   (set_attr "btver2_decode" "double,double,double")
> +   (set_attr "prefix" "orig,vex,evex,evex")
> +   (set_attr "btver2_decode" "double,double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesdeclast"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
>                       UNSPEC_AESDECLAST))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesdeclast\t{%2, %0|%0, %2}
>     vaesdeclast\t{%2, %1, %0|%0, %1, %2}
> +   %{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}
Ditto.
>     vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> -   (set_attr "addr" "gpr16,*,*")
> +  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
> +   (set_attr "addr" "gpr16,*,*,*")
>     (set_attr "type" "sselog1")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> -   (set_attr "btver2_decode" "double,double,double")
> +   (set_attr "prefix" "orig,vex,evex,evex")
> +   (set_attr "btver2_decode" "double,double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesimc"
> @@ -30246,44 +30250,60 @@ (define_insn "vpdpwssds_<mode>_maskz_1"
>     [(set_attr ("prefix") ("evex"))])
>
>  (define_insn "vaesdec_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESDEC))]
>    "TARGET_VAES"
> -  "vaesdec\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
Similar, but something like
*  return TARGET_AES || <MODE>mode != V16QImode ? \"vaesenc\t{%2, %1,
%0|%0, %1, %2}"\" : \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";

> +  else
> +    return "vaesdec\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vaesdeclast_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESDECLAST))]
>    "TARGET_VAES"
> -  "vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
Ditto.
> +  else
> +    return "vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vaesenc_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESENC))]
>    "TARGET_VAES"
> -  "vaesenc\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}";
Ditto.
> +  else
> +    return "vaesenc\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vaesenclast_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESENCLAST))]
>    "TARGET_VAES"
> -  "vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}";
Ditto.
> +  else
> +    return "vaesenclast\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vpclmulqdq_<mode>"
>    [(set (match_operand:VI8_FVL 0 "register_operand" "=v")
> --- gcc/testsuite/gcc.target/i386/aes-pr114576.c.jj     2024-04-04 09:50:17.117757179 +0200
> +++ gcc/testsuite/gcc.target/i386/aes-pr114576.c        2024-04-04 09:51:45.211544801 +0200
> @@ -0,0 +1,63 @@
> +/* PR target/114576 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -maes -mno-avx" } */
> +/* { dg-final { scan-assembler-times "\taesenc\t" 2 } } */
> +/* { dg-final { scan-assembler-times "\taesdec\t" 2 } } */
> +/* { dg-final { scan-assembler-times "\taesenclast\t" 2 } } */
> +/* { dg-final { scan-assembler-times "\taesdeclast\t" 2 } } */
> +/* { dg-final { scan-assembler-not "\tvaesenc" } } */
> +/* { dg-final { scan-assembler-not "\tvaesdec" } } */
> +
> +#include <immintrin.h>
> +
> +__m128i
> +f1 (__m128i x, __m128i y)
> +{
> +  return _mm_aesenc_si128 (x, y);
> +}
> +
> +__m128i
> +f2 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesenc_si128 (x, y);
> +  return z + x + y;
> +}
> +
> +__m128i
> +f3 (__m128i x, __m128i y)
> +{
> +  return _mm_aesdec_si128 (x, y);
> +}
> +
> +__m128i
> +f4 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesdec_si128 (x, y);
> +  return z + x + y;
> +}
> +
> +__m128i
> +f5 (__m128i x, __m128i y)
> +{
> +  return _mm_aesenclast_si128 (x, y);
> +}
> +
> +__m128i
> +f6 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesenclast_si128 (x, y);
> +  return z + x + y;
> +}
> +
> +__m128i
> +f7 (__m128i x, __m128i y)
> +{
> +  return _mm_aesdeclast_si128 (x, y);
> +}
> +
> +__m128i
> +f8 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesdeclast_si128 (x, y);
> +  return z + x + y;
> +}
>
>
>         Jakub
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] i386, v2: Fix aes/vaes patterns [PR114576]
  2024-04-09  3:23       ` Hongtao Liu
@ 2024-04-09  9:18         ` Jakub Jelinek
  2024-04-09 10:32           ` Hongtao Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Jelinek @ 2024-04-09  9:18 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Jiang, Haochen, gcc-patches, Liu, Hongtao, ubizjak

On Tue, Apr 09, 2024 at 11:23:40AM +0800, Hongtao Liu wrote:
> I think we can merge alternative 2 with 3 to
> *  return TARGET_AES ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}"\" :
> \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";
> Then it can handle vaes_avx512vl + -mno-aes case.

Ok, done in the patch below.

> > @@ -30246,44 +30250,60 @@ (define_insn "vpdpwssds_<mode>_maskz_1"
> >     [(set_attr ("prefix") ("evex"))])
> >
> >  (define_insn "vaesdec_<mode>"
> > -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> > +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
> >         (unspec:VI1_AVX512VL_F
> > -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> > -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> > +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> > +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
> >           UNSPEC_VAESDEC))]
> >    "TARGET_VAES"
> > -  "vaesdec\t{%2, %1, %0|%0, %1, %2}"
> > -)
> > +{
> > +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> > +    return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
> Similar, but something like
> *  return TARGET_AES || <MODE>mode != V16QImode ? \"vaesenc\t{%2, %1,
> %0|%0, %1, %2}"\" : \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";

For a single alternative, it would need to be
{
  return x86_evex_reg_mentioned_p (operands, 3)
	 ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}\"
	 : \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";
}
(* return would just mean uselessly too long line).
Is that what you want instead?  I thought the 2 separate alternatives
where only the latter covers those cases is more readable...

The following patch just changes the aes* patterns, not the vaes* ones.

2024-04-09  Jakub Jelinek  <jakub@redhat.com>

	PR target/114576
	* config/i386/i386.md (isa): Remove aes, add vaes_avx512vl.
	(enabled): Remove aes isa check, add vaes_avx512vl.
	* config/i386/sse.md (aesenc, aesenclast, aesdec, aesdeclast): Use
	jm instead of m for second alternative and emit {evex} prefix
	for it if !TARGET_AES.  Use noavx,avx,vaes_avx512vl isa attribute.
	(vaesdec_<mode>, vaesdeclast_<mode>, vaesenc_<mode>,
	vaesenclast_<mode>): Add second alternative with x instead of v
	and jm instead of m.

	* gcc.target/i386/aes-pr114576.c: New test.

--- gcc/config/i386/i386.md.jj	2024-04-09 08:12:29.259451422 +0200
+++ gcc/config/i386/i386.md	2024-04-09 10:53:24.965516804 +0200
@@ -568,13 +568,14 @@ (define_attr "unit" "integer,i387,sse,mm
 
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
-		    x64_avx,x64_avx512bw,x64_avx512dq,aes,apx_ndd,
+		    x64_avx,x64_avx512bw,x64_avx512dq,apx_ndd,
 		    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
 		    noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq,
 		    noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
 		    avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
-		    avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl"
+		    avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl,
+		    vaes_avx512vl"
   (const_string "base"))
 
 ;; The (bounding maximum) length of an instruction immediate.
@@ -915,7 +916,6 @@ (define_attr "enabled" ""
 	   (symbol_ref "TARGET_64BIT && TARGET_AVX512BW")
 	 (eq_attr "isa" "x64_avx512dq")
 	   (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
-	 (eq_attr "isa" "aes") (symbol_ref "TARGET_AES")
 	 (eq_attr "isa" "sse_noavx")
 	   (symbol_ref "TARGET_SSE && !TARGET_AVX")
 	 (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
@@ -968,6 +968,8 @@ (define_attr "enabled" ""
 	   (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
 	 (eq_attr "isa" "apx_ndd")
 	   (symbol_ref "TARGET_APX_NDD")
+	 (eq_attr "isa" "vaes_avx512vl")
+	   (symbol_ref "TARGET_VAES && TARGET_AVX512VL")
 
 	 (eq_attr "mmx_isa" "native")
 	   (symbol_ref "!TARGET_MMX_WITH_SSE")
--- gcc/config/i386/sse.md.jj	2024-04-04 10:43:32.107789627 +0200
+++ gcc/config/i386/sse.md	2024-04-09 10:53:06.138772957 +0200
@@ -26279,72 +26279,72 @@ (define_insn "xop_vpermil2<mode>3"
 (define_insn "aesenc"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,vm")]
 		      UNSPEC_AESENC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenc\t{%2, %0|%0, %2}
-   vaesenc\t{%2, %1, %0|%0, %1, %2}
+   * return TARGET_AES ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}\" : \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";
    vaesenc\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,avx,vaes_avx512vl")
    (set_attr "type" "sselog1")
    (set_attr "addr" "gpr16,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "prefix" "orig,maybe_evex,evex")
    (set_attr "btver2_decode" "double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesenclast"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,vm")]
 		      UNSPEC_AESENCLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenclast\t{%2, %0|%0, %2}
-   vaesenclast\t{%2, %1, %0|%0, %1, %2}
+   * return TARGET_AES ? \"vaesenclast\t{%2, %1, %0|%0, %1, %2}\" : \"%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}\";
    vaesenclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,avx,vaes_avx512vl")
    (set_attr "type" "sselog1")
    (set_attr "addr" "gpr16,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double") 
+   (set_attr "prefix" "orig,maybe_evex,evex")
+   (set_attr "btver2_decode" "double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesdec"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,vm")]
 		      UNSPEC_AESDEC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdec\t{%2, %0|%0, %2}
-   vaesdec\t{%2, %1, %0|%0, %1, %2}
+   * return TARGET_AES ? \"vaesdec\t{%2, %1, %0|%0, %1, %2}\" : \"%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}\";
    vaesdec\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,avx,vaes_avx512vl")
    (set_attr "type" "sselog1")
    (set_attr "addr" "gpr16,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "prefix" "orig,maybe_evex,evex")
    (set_attr "btver2_decode" "double,double,double") 
    (set_attr "mode" "TI")])
 
 (define_insn "aesdeclast"
   [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,vm")]
 		      UNSPEC_AESDECLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdeclast\t{%2, %0|%0, %2}
-   vaesdeclast\t{%2, %1, %0|%0, %1, %2}
+   * return TARGET_AES ? \"vaesdeclast\t{%2, %1, %0|%0, %1, %2}\" : \"%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}\";
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,avx,vaes_avx512vl")
    (set_attr "addr" "gpr16,*,*")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "prefix" "orig,maybe_evex,evex")
    (set_attr "btver2_decode" "double,double,double")
    (set_attr "mode" "TI")])
 
@@ -30246,44 +30246,60 @@ (define_insn "vpdpwssds_<mode>_maskz_1"
    [(set_attr ("prefix") ("evex"))])
 
 (define_insn "vaesdec_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESDEC))]
   "TARGET_VAES"
-  "vaesdec\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesdec\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesdeclast_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESDECLAST))]
   "TARGET_VAES"
-  "vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesenc_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESENC))]
   "TARGET_VAES"
-  "vaesenc\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesenc\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesenclast_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESENCLAST))]
   "TARGET_VAES"
-  "vaesenclast\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesenclast\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vpclmulqdq_<mode>"
   [(set (match_operand:VI8_FVL 0 "register_operand" "=v")
--- gcc/testsuite/gcc.target/i386/aes-pr114576.c.jj	2024-04-09 10:27:32.782646751 +0200
+++ gcc/testsuite/gcc.target/i386/aes-pr114576.c	2024-04-09 10:27:32.782646751 +0200
@@ -0,0 +1,63 @@
+/* PR target/114576 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -maes -mno-avx" } */
+/* { dg-final { scan-assembler-times "\taesenc\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesdec\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesenclast\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesdeclast\t" 2 } } */
+/* { dg-final { scan-assembler-not "\tvaesenc" } } */
+/* { dg-final { scan-assembler-not "\tvaesdec" } } */
+
+#include <immintrin.h>
+
+__m128i
+f1 (__m128i x, __m128i y)
+{
+  return _mm_aesenc_si128 (x, y);
+}
+
+__m128i
+f2 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesenc_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f3 (__m128i x, __m128i y)
+{
+  return _mm_aesdec_si128 (x, y);
+}
+
+__m128i
+f4 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesdec_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f5 (__m128i x, __m128i y)
+{
+  return _mm_aesenclast_si128 (x, y);
+}
+
+__m128i
+f6 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesenclast_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f7 (__m128i x, __m128i y)
+{
+  return _mm_aesdeclast_si128 (x, y);
+}
+
+__m128i
+f8 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesdeclast_si128 (x, y);
+  return z + x + y;
+}


	Jakub


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] i386, v2: Fix aes/vaes patterns [PR114576]
  2024-04-09  9:18         ` [PATCH] i386, v2: " Jakub Jelinek
@ 2024-04-09 10:32           ` Hongtao Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Hongtao Liu @ 2024-04-09 10:32 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jiang, Haochen, gcc-patches, Liu, Hongtao, ubizjak

On Tue, Apr 9, 2024 at 5:18 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Apr 09, 2024 at 11:23:40AM +0800, Hongtao Liu wrote:
> > I think we can merge alternative 2 with 3 to
> > *  return TARGET_AES ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}"\" :
> > \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";
> > Then it can handle vaes_avx512vl + -mno-aes case.
>
> Ok, done in the patch below.
>
> > > @@ -30246,44 +30250,60 @@ (define_insn "vpdpwssds_<mode>_maskz_1"
> > >     [(set_attr ("prefix") ("evex"))])
> > >
> > >  (define_insn "vaesdec_<mode>"
> > > -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> > > +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
> > >         (unspec:VI1_AVX512VL_F
> > > -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> > > -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> > > +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> > > +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
> > >           UNSPEC_VAESDEC))]
> > >    "TARGET_VAES"
> > > -  "vaesdec\t{%2, %1, %0|%0, %1, %2}"
> > > -)
> > > +{
> > > +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> > > +    return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
> > Similar, but something like
> > *  return TARGET_AES || <MODE>mode != V16QImode ? \"vaesenc\t{%2, %1,
> > %0|%0, %1, %2}"\" : \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";
>
> For a single alternative, it would need to be
> {
>   return x86_evex_reg_mentioned_p (operands, 3)
>          ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}\"
>          : \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";
> }
> (* return would just mean uselessly too long line).
> Is that what you want instead?  I thought the 2 separate alternatives
> where only the latter covers those cases is more readable...
>
> The following patch just changes the aes* patterns, not the vaes* ones.
Patch LGTM.
>
> 2024-04-09  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/114576
>         * config/i386/i386.md (isa): Remove aes, add vaes_avx512vl.
>         (enabled): Remove aes isa check, add vaes_avx512vl.
>         * config/i386/sse.md (aesenc, aesenclast, aesdec, aesdeclast): Use
>         jm instead of m for second alternative and emit {evex} prefix
>         for it if !TARGET_AES.  Use noavx,avx,vaes_avx512vl isa attribute.
>         (vaesdec_<mode>, vaesdeclast_<mode>, vaesenc_<mode>,
>         vaesenclast_<mode>): Add second alternative with x instead of v
>         and jm instead of m.
>
>         * gcc.target/i386/aes-pr114576.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2024-04-09 08:12:29.259451422 +0200
> +++ gcc/config/i386/i386.md     2024-04-09 10:53:24.965516804 +0200
> @@ -568,13 +568,14 @@ (define_attr "unit" "integer,i387,sse,mm
>
>  ;; Used to control the "enabled" attribute on a per-instruction basis.
>  (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
> -                   x64_avx,x64_avx512bw,x64_avx512dq,aes,apx_ndd,
> +                   x64_avx,x64_avx512bw,x64_avx512dq,apx_ndd,
>                     sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
>                     avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
>                     noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq,
>                     noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
>                     avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
> -                   avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl"
> +                   avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl,
> +                   vaes_avx512vl"
>    (const_string "base"))
>
>  ;; The (bounding maximum) length of an instruction immediate.
> @@ -915,7 +916,6 @@ (define_attr "enabled" ""
>            (symbol_ref "TARGET_64BIT && TARGET_AVX512BW")
>          (eq_attr "isa" "x64_avx512dq")
>            (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
> -        (eq_attr "isa" "aes") (symbol_ref "TARGET_AES")
>          (eq_attr "isa" "sse_noavx")
>            (symbol_ref "TARGET_SSE && !TARGET_AVX")
>          (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
> @@ -968,6 +968,8 @@ (define_attr "enabled" ""
>            (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
>          (eq_attr "isa" "apx_ndd")
>            (symbol_ref "TARGET_APX_NDD")
> +        (eq_attr "isa" "vaes_avx512vl")
> +          (symbol_ref "TARGET_VAES && TARGET_AVX512VL")
>
>          (eq_attr "mmx_isa" "native")
>            (symbol_ref "!TARGET_MMX_WITH_SSE")
> --- gcc/config/i386/sse.md.jj   2024-04-04 10:43:32.107789627 +0200
> +++ gcc/config/i386/sse.md      2024-04-09 10:53:06.138772957 +0200
> @@ -26279,72 +26279,72 @@ (define_insn "xop_vpermil2<mode>3"
>  (define_insn "aesenc"
>    [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
>         (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,vm")]
>                       UNSPEC_AESENC))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesenc\t{%2, %0|%0, %2}
> -   vaesenc\t{%2, %1, %0|%0, %1, %2}
> +   * return TARGET_AES ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}\" : \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";
>     vaesenc\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> +  [(set_attr "isa" "noavx,avx,vaes_avx512vl")
>     (set_attr "type" "sselog1")
>     (set_attr "addr" "gpr16,*,*")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> +   (set_attr "prefix" "orig,maybe_evex,evex")
>     (set_attr "btver2_decode" "double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesenclast"
>    [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
>         (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,vm")]
>                       UNSPEC_AESENCLAST))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesenclast\t{%2, %0|%0, %2}
> -   vaesenclast\t{%2, %1, %0|%0, %1, %2}
> +   * return TARGET_AES ? \"vaesenclast\t{%2, %1, %0|%0, %1, %2}\" : \"%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}\";
>     vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> +  [(set_attr "isa" "noavx,avx,vaes_avx512vl")
>     (set_attr "type" "sselog1")
>     (set_attr "addr" "gpr16,*,*")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> -   (set_attr "btver2_decode" "double,double,double")
> +   (set_attr "prefix" "orig,maybe_evex,evex")
> +   (set_attr "btver2_decode" "double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesdec"
>    [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
>         (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,vm")]
>                       UNSPEC_AESDEC))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesdec\t{%2, %0|%0, %2}
> -   vaesdec\t{%2, %1, %0|%0, %1, %2}
> +   * return TARGET_AES ? \"vaesdec\t{%2, %1, %0|%0, %1, %2}\" : \"%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}\";
>     vaesdec\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> +  [(set_attr "isa" "noavx,avx,vaes_avx512vl")
>     (set_attr "type" "sselog1")
>     (set_attr "addr" "gpr16,*,*")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> +   (set_attr "prefix" "orig,maybe_evex,evex")
>     (set_attr "btver2_decode" "double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesdeclast"
>    [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
>         (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,vm")]
>                       UNSPEC_AESDECLAST))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesdeclast\t{%2, %0|%0, %2}
> -   vaesdeclast\t{%2, %1, %0|%0, %1, %2}
> +   * return TARGET_AES ? \"vaesdeclast\t{%2, %1, %0|%0, %1, %2}\" : \"%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}\";
>     vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> +  [(set_attr "isa" "noavx,avx,vaes_avx512vl")
>     (set_attr "addr" "gpr16,*,*")
>     (set_attr "type" "sselog1")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> +   (set_attr "prefix" "orig,maybe_evex,evex")
>     (set_attr "btver2_decode" "double,double,double")
>     (set_attr "mode" "TI")])
>
> @@ -30246,44 +30246,60 @@ (define_insn "vpdpwssds_<mode>_maskz_1"
>     [(set_attr ("prefix") ("evex"))])
>
>  (define_insn "vaesdec_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESDEC))]
>    "TARGET_VAES"
> -  "vaesdec\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
> +  else
> +    return "vaesdec\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vaesdeclast_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESDECLAST))]
>    "TARGET_VAES"
> -  "vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
> +  else
> +    return "vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vaesenc_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESENC))]
>    "TARGET_VAES"
> -  "vaesenc\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}";
> +  else
> +    return "vaesenc\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vaesenclast_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESENCLAST))]
>    "TARGET_VAES"
> -  "vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}";
> +  else
> +    return "vaesenclast\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vpclmulqdq_<mode>"
>    [(set (match_operand:VI8_FVL 0 "register_operand" "=v")
> --- gcc/testsuite/gcc.target/i386/aes-pr114576.c.jj     2024-04-09 10:27:32.782646751 +0200
> +++ gcc/testsuite/gcc.target/i386/aes-pr114576.c        2024-04-09 10:27:32.782646751 +0200
> @@ -0,0 +1,63 @@
> +/* PR target/114576 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -maes -mno-avx" } */
> +/* { dg-final { scan-assembler-times "\taesenc\t" 2 } } */
> +/* { dg-final { scan-assembler-times "\taesdec\t" 2 } } */
> +/* { dg-final { scan-assembler-times "\taesenclast\t" 2 } } */
> +/* { dg-final { scan-assembler-times "\taesdeclast\t" 2 } } */
> +/* { dg-final { scan-assembler-not "\tvaesenc" } } */
> +/* { dg-final { scan-assembler-not "\tvaesdec" } } */
> +
> +#include <immintrin.h>
> +
> +__m128i
> +f1 (__m128i x, __m128i y)
> +{
> +  return _mm_aesenc_si128 (x, y);
> +}
> +
> +__m128i
> +f2 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesenc_si128 (x, y);
> +  return z + x + y;
> +}
> +
> +__m128i
> +f3 (__m128i x, __m128i y)
> +{
> +  return _mm_aesdec_si128 (x, y);
> +}
> +
> +__m128i
> +f4 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesdec_si128 (x, y);
> +  return z + x + y;
> +}
> +
> +__m128i
> +f5 (__m128i x, __m128i y)
> +{
> +  return _mm_aesenclast_si128 (x, y);
> +}
> +
> +__m128i
> +f6 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesenclast_si128 (x, y);
> +  return z + x + y;
> +}
> +
> +__m128i
> +f7 (__m128i x, __m128i y)
> +{
> +  return _mm_aesdeclast_si128 (x, y);
> +}
> +
> +__m128i
> +f8 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesdeclast_si128 (x, y);
> +  return z + x + y;
> +}
>
>
>         Jakub
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-04-09 10:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-18  7:18 [PATCH] i386: Share AES xmm intrin with VAES Haochen Jiang
2023-04-18  7:28 ` Haochen Jiang
2023-04-19  2:31 ` Hongtao Liu
2023-04-19  2:40   ` Jiang, Haochen
2023-04-19  2:42     ` Liu, Hongtao
2024-04-04  8:41     ` [PATCH] i386: Fix aes/vaes patterns [PR114576] Jakub Jelinek
2024-04-08 12:33       ` Jiang, Haochen
2024-04-08 12:43         ` Jakub Jelinek
2024-04-08 12:46           ` Jiang, Haochen
2024-04-09  3:23       ` Hongtao Liu
2024-04-09  9:18         ` [PATCH] i386, v2: " Jakub Jelinek
2024-04-09 10:32           ` Hongtao Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).