public inbox for gcc-patches@gcc.gnu.org
* PATCH: Enable Intel AES/CLMUL
@ 2008-04-03 14:31 H.J. Lu
From: H.J. Lu @ 2008-04-03 14:31 UTC
  To: gcc-patches; +Cc: ubizjak

Hi,

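This patch adds support for the Intel AES and CLMUL instructions
(new -maes and -mclmul options, built-in functions, and the
wmmintrin.h intrinsics header):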

http://softwareprojects.intel.com/avx/

OK for mainline?

Thanks.

H.J.
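
For readers unfamiliar with PCLMULQDQ, here is a minimal portable
sketch of what the instruction computes, so the expected values in
pclmulqdq.c can be checked without AES/CLMUL hardware.  It is only a
software model and not part of the patch; the names u128, clmul64,
clmul_select and check_clmul_model are hypothetical.  The input and
expected values are the ones from pclmulqdq.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value as two 64-bit halves.  */
typedef struct { uint64_t lo, hi; } u128;

/* Carry-less multiply of two 64-bit values.  Partial products are
   combined with XOR instead of addition, so no carries propagate.  */
static u128
clmul64 (uint64_t a, uint64_t b)
{
  u128 r = { 0, 0 };
  int i;

  for (i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
        r.lo ^= a << i;
        /* Guard i == 0: shifting a 64-bit value by 64 is undefined.  */
        if (i != 0)
          r.hi ^= a >> (64 - i);
      }
  return r;
}

/* Software model of the immediate operand: bit 0 selects the 64-bit
   half of X, bit 4 selects the 64-bit half of Y.  */
static u128
clmul_select (u128 x, u128 y, int imm)
{
  uint64_t a = (imm & 0x01) ? x.hi : x.lo;
  uint64_t b = (imm & 0x10) ? y.hi : y.lo;

  return clmul64 (a, b);
}

/* Check the model against the vectors used in pclmulqdq.c.  */
static void
check_clmul_model (void)
{
  const u128 s1 = { 0x63746F725D53475DULL, 0x7B5B546573745665ULL };
  const u128 s2 = { 0x5B477565726F6E5DULL, 0x4869285368617929ULL };
  u128 r;

  r = clmul_select (s1, s2, 0x00);
  assert (r.hi == 0x1D4D84C85C3440C0ULL && r.lo == 0x929633D5D36F0451ULL);
  r = clmul_select (s1, s2, 0x11);
  assert (r.hi == 0x1D1E1F2C592E7C45ULL && r.lo == 0xD66EE03E410FD4EDULL);
}
```

Calling check_clmul_model () from a small main is enough to exercise
it.  The bit-at-a-time loop is chosen for clarity, not speed.
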
----
gcc/

2008-04-03  H.J. Lu  <hongjiu.lu@intel.com>

	* config.gcc (extra_headers): Add wmmintrin.h for x86 and x86-64.

	* config/i386/cpuid.h (bit_AES): New.
	(bit_CLMUL): Likewise.

	* config/i386/i386.c (OPTION_MASK_ISA_AES_SET): New.
	(OPTION_MASK_ISA_CLMUL_SET): Likewise.
	(OPTION_MASK_ISA_AES_UNSET): Likewise.
	(OPTION_MASK_ISA_CLMUL_UNSET): Likewise.
	(OPTION_MASK_ISA_SSE4_2_UNSET): Add OPTION_MASK_ISA_AES_UNSET
	and OPTION_MASK_ISA_CLMUL_UNSET.
	(ix86_handle_option): Handle OPT_maes and OPT_mclmul.
	(pta_flags): Add PTA_AES and PTA_PCLMULQDQ.
	(override_options): Handle PTA_AES and PTA_PCLMULQDQ.
	(ix86_builtins): Add IX86_BUILTIN_AESENC128,
	IX86_BUILTIN_AESENCLAST128, IX86_BUILTIN_AESDEC128,
	IX86_BUILTIN_AESDECLAST128, IX86_BUILTIN_AESIMC128,
	IX86_BUILTIN_AESKEYGENASSIST128 and IX86_BUILTIN_PCLMULQDQ128.
	(bdesc_sse_3arg): Add __builtin_ia32_pclmulqdq128.
	(bdesc_2arg): Add __builtin_ia32_aesenc128,
	__builtin_ia32_aesenclast128, __builtin_ia32_aesdec128,
	__builtin_ia32_aesdeclast128, __builtin_ia32_aesimc128 and
	IX86_BUILTIN_AESKEYGENASSIST128.
	(bdesc_1arg): Add __builtin_ia32_aesimc128.
	(ix86_init_mmx_sse_builtins): Handle V2DImode for bdesc_1arg.
	Define __builtin_ia32_aeskeygenassist128.
	* config/i386/i386.c (ix86_expand_binop_imm_builtin): New.
	(ix86_expand_builtin): Use it for IX86_BUILTIN_PSLLDQI128 and
	IX86_BUILTIN_PSRLDQI128.  Handle IX86_BUILTIN_AESKEYGENASSIST128.

	* config/i386/i386.h (TARGET_AES): New.
	(TARGET_CLMUL): Likewise.
	(TARGET_CPU_CPP_BUILTINS): Handle TARGET_AES and TARGET_CLMUL.

	* config/i386/i386.md (UNSPEC_AESENC): New.
	(UNSPEC_AESENCLAST): Likewise.
	(UNSPEC_AESDEC): Likewise.
	(UNSPEC_AESDECLAST): Likewise.
	(UNSPEC_AESIMC): Likewise.
	(UNSPEC_AESKEYGENASSIST): Likewise.
	(UNSPEC_CLMUL): Likewise.

	* config/i386/i386.opt (maes): New.
	(mclmul): Likewise.

	* config/i386/sse.md (aesenc): New pattern.
	(aesenclast): Likewise.
	(aesdec): Likewise.
	(aesdeclast): Likewise.
	(aesimc): Likewise.
	(aeskeygenassist): Likewise.
	(pclmulqdq): Likewise.

	* config/i386/wmmintrin.h: New.

	* doc/extend.texi: Document AES and CLMUL built-in functions.

	* doc/invoke.texi: Document -maes and -mclmul.

gcc/testsuite/

2008-04-03  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/aes-check.h: New.
	* gcc.target/i386/aesdec.c: Likewise.
	* gcc.target/i386/aesdeclast.c: Likewise.
	* gcc.target/i386/aesenc.c: Likewise.
	* gcc.target/i386/aesenclast.c: Likewise.
	* gcc.target/i386/aesimc.c: Likewise.
	* gcc.target/i386/aeskeygenassist.c: Likewise.
	* gcc.target/i386/pclmulqdq.c: Likewise.
	* gcc.target/i386/clmul-check.h: Likewise.

	* gcc.target/i386/i386.exp (check_effective_target_aes): New.
	(check_effective_target_clmul): Likewise.

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(.../fsf/trunk)	(revision 2007)
+++ gcc/doc/extend.texi	(.../branches/aes)	(revision 2007)
@@ -8013,6 +8013,27 @@ depending on the size of @code{unsigned 
 Generates the @code{popcntq} machine instruction.
 @end table
 
+The following built-in functions are available when @option{-maes} is
+used.  All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v2di __builtin_ia32_aesenc128 (v2di, v2di)
+v2di __builtin_ia32_aesenclast128 (v2di, v2di)
+v2di __builtin_ia32_aesdec128 (v2di, v2di)
+v2di __builtin_ia32_aesdeclast128 (v2di, v2di)
+v2di __builtin_ia32_aeskeygenassist128 (v2di, const int)
+v2di __builtin_ia32_aesimc128 (v2di)
+@end smallexample
+
+The following built-in function is available when @option{-mclmul} is
+used.
+
+@table @code
+@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)
+Generates the @code{pclmulqdq} machine instruction.
+@end table
+
 The following built-in functions are available when @option{-msse4a} is used.
 All of them generate the machine instruction that is part of the name.
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(.../fsf/trunk)	(revision 2007)
+++ gcc/doc/invoke.texi	(.../branches/aes)	(revision 2007)
@@ -555,6 +555,7 @@ Objective-C and Objective-C++ Dialects}.
 -mno-wide-multiply  -mrtd  -malign-double @gol
 -mpreferred-stack-boundary=@var{num} -mcx16 -msahf -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
+-maes -mclmul @gol
 -msse4a -m3dnow -mpopcnt -mabm -msse5 @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -10720,6 +10721,10 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-sse4.2
 @itemx -msse4
 @itemx -mno-sse4
+@itemx -maes
+@itemx -mno-aes
+@itemx -mclmul
+@itemx -mno-clmul
 @itemx -msse4a
 @itemx -mno-sse4a
 @itemx -msse5
@@ -10737,8 +10742,8 @@ preferred alignment to @option{-mpreferr
 @opindex m3dnow
 @opindex mno-3dnow
 These switches enable or disable the use of instructions in the MMX,
-SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4A, SSE5, ABM or 3DNow!@: extended
-instruction sets.
+SSE, SSE2, SSE3, SSSE3, SSE4.1, AES, CLMUL, SSE4A, SSE5, ABM or
+3DNow!@: extended instruction sets.
 These extensions are also available as built-in functions: see
 @ref{X86 Built-in Functions}, for details of the functions enabled and
 disabled by these switches.
Index: gcc/testsuite/gcc.target/i386/i386.exp
===================================================================
--- gcc/testsuite/gcc.target/i386/i386.exp	(.../fsf/trunk)	(revision 2007)
+++ gcc/testsuite/gcc.target/i386/i386.exp	(.../branches/aes)	(revision 2007)
@@ -51,6 +51,34 @@ proc check_effective_target_sse4 { } {
     } "-O2 -msse4.1" ]
 }
 
+# Return 1 if aes instructions can be compiled.
+proc check_effective_target_aes { } {
+    return [check_no_compiler_messages aes object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i _mm_aesimc_si128 (__m128i __X)
+	{
+	    return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+	}
+    } "-O2 -maes" ]
+}
+
+# Return 1 if clmul instructions can be compiled.
+proc check_effective_target_clmul { } {
+    return [check_no_compiler_messages clmul object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i pclmulqdq_test (__m128i __X, __m128i __Y)
+	{
+	    return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+							  (__v2di)__Y,
+							  1);
+	}
+    } "-O2 -mclmul" ]
+}
+
 # Return 1 if sse4a instructions can be compiled.
 proc check_effective_target_sse4a { } {
     return [check_no_compiler_messages sse4a object {
Index: gcc/testsuite/gcc.target/i386/aesdeclast.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aesdeclast.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aesdeclast.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set of
+   input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x72a593d0, 0xd410637b,
+			     0x6b317f95, 0xc5a391ef);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdeclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdeclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdeclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdeclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdeclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdeclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdeclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdeclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdeclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdeclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdeclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdeclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdeclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdeclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdeclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdeclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/testsuite/gcc.target/i386/pclmulqdq.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pclmulqdq.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/pclmulqdq.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+/* { dg-require-effective-target clmul } */
+/* { dg-options "-O2 -mclmul" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "clmul-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i s1[NUM];
+static __m128i s2[NUM];
+/* We need this array to generate the memory form of the instruction.  */
+static __m128i s2m[NUM];
+
+static __m128i e_00[NUM];
+static __m128i e_01[NUM];
+static __m128i e_10[NUM];
+static __m128i e_11[NUM];
+
+static __m128i d_00[NUM];
+static __m128i d_01[NUM];
+static __m128i d_10[NUM];
+static __m128i d_11[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *ls1, __m128i *ls2, __m128i *le_00, __m128i *le_01,
+	   __m128i *le_10, __m128i *le_11)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      ls1[i] = _mm_set_epi32 (0x7B5B5465, 0x73745665,
+			      0x63746F72, 0x5D53475D);
+      ls2[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      s2m[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      le_00[i] = _mm_set_epi32 (0x1D4D84C8, 0x5C3440C0,
+				0x929633D5, 0xD36F0451);
+      le_01[i] = _mm_set_epi32 (0x1A2BF6DB, 0x3A30862F,
+				0xBABF262D, 0xF4B7D5C9);
+      le_10[i] = _mm_set_epi32 (0x1BD17C8D, 0x556AB5A1,
+				0x7FA540AC, 0x2A281315);
+      le_11[i] = _mm_set_epi32 (0x1D1E1F2C, 0x592E7C45,
+				0xD66EE03E, 0x410FD4ED);
+    }
+}
+
+static void
+clmul_test (void)
+{
+  int i;
+
+  init_data (s1, s2, e_00, e_01, e_10, e_11);
+
+  for (i = 0; i < NUM; i += 2)
+    {
+      d_00[i] = _mm_clmulepi64_si128 (s1[i], s2m[i], 0x00);
+      d_01[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x01);
+      d_10[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x10);
+      d_11[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x11);
+
+      d_11[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x11);
+      d_00[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x00);
+      d_10[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2m[i + 1], 0x10);
+      d_01[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x01);
+    }
+
+  for (i = 0; i < NUM; i++)
+    {
+      if (memcmp (d_00 + i, e_00 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_01 + i, e_01 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_10 + i, e_10 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_11 + i, e_11 + i, sizeof (__m128i)))
+	abort ();
+    }
+}

Property changes on: gcc/testsuite/gcc.target/i386/pclmulqdq.c
___________________________________________________________________
Name: svn:executable
   + *

Index: gcc/testsuite/gcc.target/i386/clmul-check.h
===================================================================
--- gcc/testsuite/gcc.target/i386/clmul-check.h	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/clmul-check.h	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void clmul_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run PCLMULQDQ test only if host has PCLMULQDQ support.  */
+  if (ecx & bit_CLMUL)
+    {
+      clmul_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/i386/aes-check.h
===================================================================
--- gcc/testsuite/gcc.target/i386/aes-check.h	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aes-check.h	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void aes_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AES test only if host has AES support.  */
+  if (ecx & bit_AES)
+    {
+      aes_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/i386/aeskeygenassist.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aeskeygenassist.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aeskeygenassist.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+#define IMM8 1
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x16157e2b, 0xa6d2ae28,
+			      0x8815f7ab, 0x3c4fcf09);
+      d[i] = _mm_setr_epi32 (0x24b5e434, 0x3424b5e5,
+			     0xeb848a01, 0x01eb848b);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aeskeygenassist_si128 (src1[i], IMM8);
+      resdst[i + 1] = _mm_aeskeygenassist_si128 (src1[i + 1], IMM8);
+      resdst[i + 2] = _mm_aeskeygenassist_si128 (src1[i + 2], IMM8);
+      resdst[i + 3] = _mm_aeskeygenassist_si128 (src1[i + 3], IMM8);
+      resdst[i + 4] = _mm_aeskeygenassist_si128 (src1[i + 4], IMM8);
+      resdst[i + 5] = _mm_aeskeygenassist_si128 (src1[i + 5], IMM8);
+      resdst[i + 6] = _mm_aeskeygenassist_si128 (src1[i + 6], IMM8);
+      resdst[i + 7] = _mm_aeskeygenassist_si128 (src1[i + 7], IMM8);
+      resdst[i + 8] = _mm_aeskeygenassist_si128 (src1[i + 8], IMM8);
+      resdst[i + 9] = _mm_aeskeygenassist_si128 (src1[i + 9], IMM8);
+      resdst[i + 10] = _mm_aeskeygenassist_si128 (src1[i + 10], IMM8);
+      resdst[i + 11] = _mm_aeskeygenassist_si128 (src1[i + 11], IMM8);
+      resdst[i + 12] = _mm_aeskeygenassist_si128 (src1[i + 12], IMM8);
+      resdst[i + 13] = _mm_aeskeygenassist_si128 (src1[i + 13], IMM8);
+      resdst[i + 14] = _mm_aeskeygenassist_si128 (src1[i + 14], IMM8);
+      resdst[i + 15] = _mm_aeskeygenassist_si128 (src1[i + 15], IMM8);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/testsuite/gcc.target/i386/aesenclast.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aesenclast.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aesenclast.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one
+   set of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x53fdc611, 0x177ec425,
+			     0x938c5964, 0xc7fb881e);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/testsuite/gcc.target/i386/aesimc.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aesimc.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aesimc.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      d[i] = _mm_setr_epi32 (0x81c3b3e5, 0x2b18330a,
+			     0x44b109c8, 0x627a6f66);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesimc_si128 (src1[i]);
+      resdst[i + 1] = _mm_aesimc_si128 (src1[i + 1]);
+      resdst[i + 2] = _mm_aesimc_si128 (src1[i + 2]);
+      resdst[i + 3] = _mm_aesimc_si128 (src1[i + 3]);
+      resdst[i + 4] = _mm_aesimc_si128 (src1[i + 4]);
+      resdst[i + 5] = _mm_aesimc_si128 (src1[i + 5]);
+      resdst[i + 6] = _mm_aesimc_si128 (src1[i + 6]);
+      resdst[i + 7] = _mm_aesimc_si128 (src1[i + 7]);
+      resdst[i + 8] = _mm_aesimc_si128 (src1[i + 8]);
+      resdst[i + 9] = _mm_aesimc_si128 (src1[i + 9]);
+      resdst[i + 10] = _mm_aesimc_si128 (src1[i + 10]);
+      resdst[i + 11] = _mm_aesimc_si128 (src1[i + 11]);
+      resdst[i + 12] = _mm_aesimc_si128 (src1[i + 12]);
+      resdst[i + 13] = _mm_aesimc_si128 (src1[i + 13]);
+      resdst[i + 14] = _mm_aesimc_si128 (src1[i + 14]);
+      resdst[i + 15] = _mm_aesimc_si128 (src1[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/testsuite/gcc.target/i386/aesenc.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aesenc.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aesenc.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0xded7e595, 0x8b104b58,
+			     0x9fdba3c5, 0xa8311c2f);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenc_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenc_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenc_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenc_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenc_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenc_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenc_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenc_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenc_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenc_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenc_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenc_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenc_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenc_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenc_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenc_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/testsuite/gcc.target/i386/aesdec.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aesdec.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aesdec.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0xb730392a, 0xb58eb95e,
+			     0xfaea2787, 0x138ac342);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdec_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdec_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdec_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdec_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdec_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdec_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdec_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdec_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdec_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdec_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdec_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdec_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdec_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdec_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdec_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdec_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(.../fsf/trunk)	(revision 2007)
+++ gcc/config.gcc	(.../branches/aes)	(revision 2007)
@@ -309,13 +309,15 @@ i[34567]86-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	need_64bit_hwint=yes
 	;;
 ia64-*-*)
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/i386.h	(.../branches/aes)	(revision 2007)
@@ -46,6 +46,8 @@ along with GCC; see the file COPYING3.  
 #define TARGET_SSSE3	OPTION_ISA_SSSE3
 #define TARGET_SSE4_1	OPTION_ISA_SSE4_1
 #define TARGET_SSE4_2	OPTION_ISA_SSE4_2
+#define TARGET_AES	OPTION_ISA_AES
+#define TARGET_CLMUL	OPTION_ISA_CLMUL
 #define TARGET_SSE4A	OPTION_ISA_SSE4A
 #define TARGET_SSE5	OPTION_ISA_SSE5
 #define TARGET_ROUND	OPTION_ISA_ROUND
@@ -683,6 +685,10 @@ extern const char *host_detect_local_cpu
 	builtin_define ("__SSE4_1__");				\
       if (TARGET_SSE4_2)					\
 	builtin_define ("__SSE4_2__");				\
+      if (TARGET_AES)						\
+	builtin_define ("__AES__");				\
+      if (TARGET_CLMUL)						\
+	builtin_define ("__CLMUL__");				\
       if (TARGET_SSE4A)						\
  	builtin_define ("__SSE4A__");		                \
       if (TARGET_SSE5)						\
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/i386.md	(.../branches/aes)	(revision 2007)
@@ -186,6 +186,17 @@
    (UNSPEC_FRCZ			156)
    (UNSPEC_CVTPH2PS		157)
    (UNSPEC_CVTPS2PH		158)
+
+   ; For AES support
+   (UNSPEC_AESENC		159)
+   (UNSPEC_AESENCLAST		160)
+   (UNSPEC_AESDEC		161)
+   (UNSPEC_AESDECLAST		162)
+   (UNSPEC_AESIMC		163)
+   (UNSPEC_AESKEYGENASSIST	164)
+
+   ; For CLMUL support
+   (UNSPEC_CLMUL		165)
   ])
 
 (define_constants
Index: gcc/config/i386/wmmintrin.h
===================================================================
--- gcc/config/i386/wmmintrin.h	(.../fsf/trunk)	(revision 0)
+++ gcc/config/i386/wmmintrin.h	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,124 @@
+/* Copyright (C) 2008 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to
+   the Free Software Foundation, 59 Temple Place - Suite 330,
+   Boston, MA 02111-1307, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 11.0.  */
+
+#ifndef _WMMINTRIN_H_INCLUDED
+#define _WMMINTRIN_H_INCLUDED
+
+#if !defined (__AES__) && !defined (__CLMUL__)
+# error "AES/CLMUL instruction set not enabled"
+#else
+
+/* We need definitions from the SSE4, SSSE3, SSE3, SSE2 and SSE header
+   files.  */
+#include <smmintrin.h>
+
+/* AES */
+
+#ifdef __AES__
+/* Performs 1 round of AES decryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdec_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES decryption of the first m128i 
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdeclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X,
+						 (__v2di)__Y);
+}
+
+/* Performs 1 round of AES encryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenc_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES encryption of the first m128i
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the InverseMixColumn operation on the source m128i 
+   and stores the result into m128i destination.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesimc_si128 (__m128i __X)
+{
+  return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+}
+
+/* Generates a m128i round key for the input m128i AES cipher key and
+   byte round constant.  The second parameter must be a compile time
+   constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aeskeygenassist_si128 (__m128i __X, const int __C)
+{
+  return (__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)__X, __C);
+}
+#else
+#define _mm_aeskeygenassist_si128(X, C)					\
+  ((__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)(__m128i)(X),	\
+						(int)(C)))
+#endif
+#endif  /* __AES__ */
+
+/* CLMUL */
+
+#ifdef __CLMUL__
+/* Performs carry-less integer multiplication of 64-bit halves of
+   128-bit input operands.  The third parameter indicates which 64-bit
+   halves of the input parameters v1 and v2 should be used.  It must be
+   a compile time constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
+{
+  return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+						(__v2di)__Y, __I);
+}
+#else
+#define _mm_clmulepi64_si128(X, Y, I)					\
+  ((__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)(__m128i)(X),		\
+					  (__v2di)(__m128i)(Y), (int)(I)))
+#endif
+#endif  /* __CLMUL__  */
+
+#endif /* __AES__/__CLMUL__ */
+
+#endif /* _WMMINTRIN_H_INCLUDED */
Index: gcc/config/i386/cpuid.h
===================================================================
--- gcc/config/i386/cpuid.h	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/cpuid.h	(.../branches/aes)	(revision 2007)
@@ -33,11 +33,13 @@
 
 /* %ecx */
 #define bit_SSE3	(1 << 0)
+#define bit_CLMUL	(1 << 1)
 #define bit_SSSE3	(1 << 9)
 #define bit_CMPXCHG16B	(1 << 13)
 #define bit_SSE4_1	(1 << 19)
 #define bit_SSE4_2	(1 << 20)
 #define bit_POPCNT	(1 << 23)
+#define bit_AES		(1 << 25)
 
 /* %edx */
 #define bit_CMPXCHG8B	(1 << 8)
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/sse.md	(.../branches/aes)	(revision 2007)
@@ -8047,3 +8047,80 @@
 }
   [(set_attr "type" "ssecmp")
    (set_attr "mode" "TI")])
+
+(define_insn "aesenc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENC))]
+  "TARGET_AES"
+  "aesenc\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesenclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENCLAST))]
+  "TARGET_AES"
+  "aesenclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdec"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDEC))]
+  "TARGET_AES"
+  "aesdec\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdeclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDECLAST))]
+  "TARGET_AES"
+  "aesdeclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesimc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESIMC))]
+  "TARGET_AES"
+  "aesimc\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aeskeygenassist"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")
+		      (match_operand:SI 2 "const_0_to_255_operand" "n")]
+		     UNSPEC_AESKEYGENASSIST))]
+  "TARGET_AES"
+  "aeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "pclmulqdq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		      (match_operand:V2DI 2 "nonimmediate_operand" "xm")
+		      (match_operand:SI 3 "const_0_to_255_operand" "n")]
+		     UNSPEC_CLMUL))]
+  "TARGET_CLMUL"
+  "pclmulqdq\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
Index: gcc/config/i386/i386.opt
===================================================================
--- gcc/config/i386/i386.opt	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/i386.opt	(.../branches/aes)	(revision 2007)
@@ -236,6 +236,14 @@ msse4
 Target RejectNegative Report Mask(ISA_SSE4_2) MaskExists Var(ix86_isa_flags) VarExists
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and code generation
 
+maes
+Target Report Mask(ISA_AES) Var(ix86_isa_flags) VarExists
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES built-in functions and code generation
+
+mclmul
+Target Report Mask(ISA_CLMUL) Var(ix86_isa_flags) VarExists
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and CLMUL built-in functions and code generation
+
 mno-sse4
 Target RejectNegative Report InverseMask(ISA_SSE4_1) MaskExists Var(ix86_isa_flags) VarExists
 Do not support SSE4.1 and SSE4.2 built-in functions and code generation
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/i386.c	(.../branches/aes)	(revision 2007)
@@ -1786,6 +1786,10 @@ static int ix86_isa_flags_explicit;
   (OPTION_MASK_ISA_SSE4_1 | OPTION_MASK_ISA_SSSE3_SET)
 #define OPTION_MASK_ISA_SSE4_2_SET \
   (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_SSE4_1_SET)
+#define OPTION_MASK_ISA_AES_SET \
+  (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE4_2_SET)
+#define OPTION_MASK_ISA_CLMUL_SET \
+  (OPTION_MASK_ISA_CLMUL | OPTION_MASK_ISA_SSE4_2_SET)
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -1817,7 +1821,12 @@ static int ix86_isa_flags_explicit;
   (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_SSE4_1_UNSET)
 #define OPTION_MASK_ISA_SSE4_1_UNSET \
   (OPTION_MASK_ISA_SSE4_1 | OPTION_MASK_ISA_SSE4_2_UNSET)
-#define OPTION_MASK_ISA_SSE4_2_UNSET OPTION_MASK_ISA_SSE4_2
+#define OPTION_MASK_ISA_SSE4_2_UNSET \
+  (OPTION_MASK_ISA_SSE4_2 \
+   | OPTION_MASK_ISA_AES_UNSET \
+   | OPTION_MASK_ISA_CLMUL_UNSET)
+#define OPTION_MASK_ISA_AES_UNSET OPTION_MASK_ISA_AES
+#define OPTION_MASK_ISA_CLMUL_UNSET OPTION_MASK_ISA_CLMUL
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -1947,6 +1956,32 @@ ix86_handle_option (size_t code, const c
 	}
       return true;
 
+    case OPT_maes:
+      if (value)
+	{
+	  ix86_isa_flags |= OPTION_MASK_ISA_AES_SET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_AES_SET;
+	}
+      else
+	{
+	  ix86_isa_flags &= ~OPTION_MASK_ISA_AES_UNSET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_AES_UNSET;
+	}
+      return true;
+
+    case OPT_mclmul:
+      if (value)
+	{
+	  ix86_isa_flags |= OPTION_MASK_ISA_CLMUL_SET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_CLMUL_SET;
+	}
+      else
+	{
+	  ix86_isa_flags &= ~OPTION_MASK_ISA_CLMUL_UNSET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_CLMUL_UNSET;
+	}
+      return true;
+
     case OPT_msse4:
       ix86_isa_flags |= OPTION_MASK_ISA_SSE4_SET;
       ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_SET;
@@ -2078,7 +2113,9 @@ override_options (void)
       PTA_NO_SAHF = 1 << 13,
       PTA_SSE4_1 = 1 << 14,
       PTA_SSE4_2 = 1 << 15,
-      PTA_SSE5 = 1 << 16
+      PTA_SSE5 = 1 << 16,
+      PTA_AES = 1 << 17,
+      PTA_CLMUL = 1 << 18
     };
 
   static struct pta
@@ -2368,6 +2405,12 @@ override_options (void)
 	if (processor_alias_table[i].flags & PTA_SSE4_2
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4_2))
 	  ix86_isa_flags |= OPTION_MASK_ISA_SSE4_2;
+	if (processor_alias_table[i].flags & PTA_AES
+	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_AES))
+	  ix86_isa_flags |= OPTION_MASK_ISA_AES;
+	if (processor_alias_table[i].flags & PTA_CLMUL
+	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_CLMUL))
+	  ix86_isa_flags |= OPTION_MASK_ISA_CLMUL;
 	if (processor_alias_table[i].flags & PTA_SSE4A
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4A))
 	  ix86_isa_flags |= OPTION_MASK_ISA_SSE4A;
@@ -17559,6 +17602,17 @@ enum ix86_builtins
 
   IX86_BUILTIN_PCMPGTQ,
 
+  /* AES instructions */
+  IX86_BUILTIN_AESENC128,
+  IX86_BUILTIN_AESENCLAST128,
+  IX86_BUILTIN_AESDEC128,
+  IX86_BUILTIN_AESDECLAST128,
+  IX86_BUILTIN_AESIMC128,
+  IX86_BUILTIN_AESKEYGENASSIST128,
+
+  /* CLMUL instruction */
+  IX86_BUILTIN_PCLMULQDQ128,
+
   /* TFmode support builtins.  */
   IX86_BUILTIN_INFQ,
   IX86_BUILTIN_FABSQ,
@@ -17914,6 +17968,9 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, 0, IX86_BUILTIN_ROUNDSD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, 0, IX86_BUILTIN_ROUNDSS, UNKNOWN, 0 },
+
+  /* CLMUL */
+  { OPTION_MASK_ISA_CLMUL, CODE_FOR_pclmulqdq, "__builtin_ia32_pclmulqdq128", IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_2arg[] =
@@ -18194,6 +18251,13 @@ static const struct builtin_description 
 
   /* SSE4.2 */
   { OPTION_MASK_ISA_SSE4_2, CODE_FOR_sse4_2_gtv2di3, "__builtin_ia32_pcmpgtq", IX86_BUILTIN_PCMPGTQ, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_AES, CODE_FOR_aesenc, "__builtin_ia32_aesenc128", IX86_BUILTIN_AESENC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_AES, CODE_FOR_aesenclast, "__builtin_ia32_aesenclast128", IX86_BUILTIN_AESENCLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_AES, CODE_FOR_aesdec, "__builtin_ia32_aesdec128", IX86_BUILTIN_AESDEC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_AES, CODE_FOR_aesdeclast, "__builtin_ia32_aesdeclast128", IX86_BUILTIN_AESDECLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_AES, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -18271,6 +18335,9 @@ static const struct builtin_description 
   /* Fake 1 arg builtins with a constant smaller than 8 bits as the 2nd arg.  */
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundpd, 0, IX86_BUILTIN_ROUNDPD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundps, 0, IX86_BUILTIN_ROUNDPS, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_AES, CODE_FOR_aesimc, "__builtin_ia32_aesimc128", IX86_BUILTIN_AESIMC128, UNKNOWN, 0 },
 };
 
 /* SSE5 */
@@ -19214,6 +19281,9 @@ ix86_init_mmx_sse_builtins (void)
 	case V4SImode:
 	  type = v4si_ftype_v4si;
 	  break;
+	case V2DImode:
+	  type = v2di_ftype_v2di;
+	  break;
 	case V2DFmode:
 	  type = v2df_ftype_v2df;
 	  break;
@@ -19513,6 +19583,9 @@ ix86_init_mmx_sse_builtins (void)
 				    NULL_TREE);
   def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_crc32di", ftype, IX86_BUILTIN_CRC32DI);
 
+  /* AES */
+  def_builtin_const (OPTION_MASK_ISA_AES, "__builtin_ia32_aeskeygenassist128", v2di_ftype_v2di_int, IX86_BUILTIN_AESKEYGENASSIST128);
+
   /* AMDFAM10 SSE4A New built-ins  */
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntsd", void_ftype_pdouble_v2df, IX86_BUILTIN_MOVNTSD);
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntss", void_ftype_pfloat_v4sf, IX86_BUILTIN_MOVNTSS);
@@ -19793,6 +19866,44 @@ ix86_expand_crc32 (enum insn_code icode,
   return target;
 }
 
+/* Subroutine of ix86_expand_builtin to take care of binop insns
+   with an immediate.  */
+
+static rtx
+ix86_expand_binop_imm_builtin (enum insn_code icode, tree exp,
+				rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  enum machine_mode tmode = insn_data[icode].operand[0].mode;
+  enum machine_mode mode0 = insn_data[icode].operand[1].mode;
+  enum machine_mode mode1 = insn_data[icode].operand[2].mode;
+
+  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+    {
+      op0 = copy_to_reg (op0);
+      op0 = simplify_gen_subreg (mode0, op0, GET_MODE (op0), 0);
+    }
+
+  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+    {
+      error ("the last operand must be an immediate");
+      return const0_rtx;
+    }
+
+  target = gen_reg_rtx (V2DImode);
+  pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target,
+					      V2DImode, 0),
+			 op0, op1);
+  if (! pat)
+    return 0;
+  emit_insn (pat);
+  return target;
+}
+
 /* Subroutine of ix86_expand_builtin to take care of binop insns.  */
 
 static rtx
@@ -20922,34 +21033,18 @@ ix86_expand_builtin (tree exp, rtx targe
       return target;
 
     case IX86_BUILTIN_PSLLDQI128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_ashlti3,
+					     exp, target);
+      break;
+
     case IX86_BUILTIN_PSRLDQI128:
-      icode = (fcode == IX86_BUILTIN_PSLLDQI128 ? CODE_FOR_sse2_ashlti3
-	       : CODE_FOR_sse2_lshrti3);
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      op0 = expand_normal (arg0);
-      op1 = expand_normal (arg1);
-      tmode = insn_data[icode].operand[0].mode;
-      mode1 = insn_data[icode].operand[1].mode;
-      mode2 = insn_data[icode].operand[2].mode;
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_lshrti3,
+					     exp, target);
+      break;
 
-      if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
-	{
-	  op0 = copy_to_reg (op0);
-	  op0 = simplify_gen_subreg (mode1, op0, GET_MODE (op0), 0);
-	}
-      if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
-	{
-	  error ("shift must be an immediate");
-	  return const0_rtx;
-	}
-      target = gen_reg_rtx (V2DImode);
-      pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target, V2DImode, 0),
-			     op0, op1);
-      if (! pat)
-	return 0;
-      emit_insn (pat);
-      return target;
+    case IX86_BUILTIN_AESKEYGENASSIST128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_aeskeygenassist,
+					     exp, target);
 
     case IX86_BUILTIN_FEMMS:
       emit_insn (gen_mmx_femms ());
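
For readers unfamiliar with carry-less multiplication, the operation that
_mm_clmulepi64_si128 exposes can be modelled in portable C.  This is only a
sketch for illustration, not code from the patch: the u128 type and the
clmul64, clmul_select and clmul_self_check names are hypothetical.  It assumes
the PCLMULQDQ convention that bit 0 of the immediate selects the 64-bit half
of the first operand and bit 4 selects the half of the second operand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value as two 64-bit halves.  */
typedef struct { uint64_t lo, hi; } u128;

/* Carry-less product of two 64-bit values: partial products are
   combined with XOR instead of addition, so no carries propagate.  */
static u128 clmul64 (uint64_t a, uint64_t b)
{
  u128 r = { 0, 0 };
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
        r.lo ^= a << i;
        if (i != 0)
          r.hi ^= a >> (64 - i);
      }
  return r;
}

/* Model of the immediate selector (assumed convention): bit 0 picks
   the half of x, bit 4 picks the half of y.  */
static u128 clmul_select (u128 x, u128 y, int imm)
{
  uint64_t a = (imm & 0x01) ? x.hi : x.lo;
  uint64_t b = (imm & 0x10) ? y.hi : y.lo;
  return clmul64 (a, b);
}

/* Small self-check: (x + 1) * (x + 1) = x^2 + 1 over GF(2).  */
static void clmul_self_check (void)
{
  u128 r = clmul64 (0x3, 0x3);
  assert (r.lo == 0x5 && r.hi == 0);
}
```

Calling clmul_select (x, y, 0x11) multiplies the two high halves, which is
what the intrinsic would do with the same constant under the assumption above.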

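The cpuid.h hunk above adds bit_CLMUL (bit 1) and bit_AES (bit 25) for %ecx.
A minimal sketch of how such bits might be tested at run time follows.  The
SKETCH_BIT_* macros and the function names are hypothetical and chosen so they
do not clash with cpuid.h; the sketch assumes the bits are reported in %ecx of
CPUID leaf 1 and uses the __get_cpuid helper that GCC's <cpuid.h> provides on
x86 targets.  On other targets it simply reports no features.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit positions as the patch adds to cpuid.h (hypothetical names).  */
#define SKETCH_BIT_CLMUL (1u << 1)
#define SKETCH_BIT_AES   (1u << 25)

/* Pure decoders, usable on any saved %ecx value.  */
static bool ecx_has_aes (unsigned int ecx)
{
  return (ecx & SKETCH_BIT_AES) != 0;
}

static bool ecx_has_clmul (unsigned int ecx)
{
  return (ecx & SKETCH_BIT_CLMUL) != 0;
}

#if defined (__GNUC__) && (defined (__i386__) || defined (__x86_64__))
#include <cpuid.h>

/* Returns %ecx of CPUID leaf 1, or 0 if that leaf is unavailable.  */
static unsigned int cpuid_leaf1_ecx (void)
{
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 0;
  return ecx;
}
#else
/* Non-x86 fallback: report no features.  */
static unsigned int cpuid_leaf1_ecx (void)
{
  return 0;
}
#endif
```

A caller would write ecx_has_aes (cpuid_leaf1_ecx ()) before taking a code
path that was compiled with -maes.
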
^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-03 14:31 PATCH: Enable Intel AES/CLMUL H.J. Lu
@ 2008-04-03 16:21 ` Daniel Berlin
  2008-04-03 16:23   ` H.J. Lu
       [not found] ` <alpine.LSU.1.00.0804062013110.22304@acrux.dbai.tuwien.ac.at>
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Berlin @ 2008-04-03 16:21 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, ubizjak

So, when are you going to teach the tree level to auto-transform aes
encryption/etc into these instructions?
:)


On Thu, Apr 3, 2008 at 9:50 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> Hi,
>
>  This patch enables Intel AES/CLMUL:
>
>  http://softwareprojects.intel.com/avx/
>
>  OK for mainline?
>
>  Thanks.
>
>  H.J.


* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-03 16:21 ` Daniel Berlin
@ 2008-04-03 16:23   ` H.J. Lu
  0 siblings, 0 replies; 23+ messages in thread
From: H.J. Lu @ 2008-04-03 16:23 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc-patches, ubizjak

On Thu, Apr 3, 2008 at 9:13 AM, Daniel Berlin <dberlin@dberlin.org> wrote:
> So, when are you going to teach the tree level to auto-transform aes
>  encryption/etc into these instructions?
>  :)
>

That will be a fun project. Are there any examples?


H.J.
>
>
>  On Thu, Apr 3, 2008 at 9:50 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>  > Hi,
>  >
>  >  This patch enables Intel AES/CLMUL:
>  >
>  >  http://softwareprojects.intel.com/avx/
>  >
>  >  OK for mainline?
>  >
>  >  Thanks.
>  >
>  >  H.J.
>


* PATCH: Mention Intel AES/PCLMUL
       [not found] ` <alpine.LSU.1.00.0804062013110.22304@acrux.dbai.tuwien.ac.at>
@ 2008-04-06 21:06   ` H.J. Lu
  2008-04-06 21:17     ` Gerald Pfeifer
  0 siblings, 1 reply; 23+ messages in thread
From: H.J. Lu @ 2008-04-06 21:06 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: gcc-patches

On Sun, Apr 06, 2008 at 08:13:38PM +0200, Gerald Pfeifer wrote:
> Hi HJ,
> 
> On Thu, 3 Apr 2008, H.J. Lu wrote:
> > This patch enables Intel AES/CLMUL:
> > 
> > http://softwareprojects.intel.com/avx/
> 
> would you mind also adding a note to gcc-4.4/changes.html?

Like this?

Thanks.


H.J.
---
Index: gcc-4.4/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.4/changes.html,v
retrieving revision 1.6
diff -r1.6 changes.html
54a55,62
> <h3>IA-32/x86-64</h3>
>   <ul>
>     <li>Support for Intel AES built-in functions and code generation is
> 	available via <code>-maes</code>.</li>
>     <li>Support for the Intel PCLMUL built-in function and code generation is
> 	available via <code>-mpclmul</code>.</li>
>   </ul>
> 


* Re: PATCH: Mention Intel AES/PCLMUL
  2008-04-06 21:06   ` PATCH: Mention Intel AES/PCLMUL H.J. Lu
@ 2008-04-06 21:17     ` Gerald Pfeifer
  0 siblings, 0 replies; 23+ messages in thread
From: Gerald Pfeifer @ 2008-04-06 21:17 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches

On Sun, 6 Apr 2008, H.J. Lu wrote:
> Like this?

Yep. :-)

Gerald


* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 20:27           ` Michael Meissner
@ 2008-04-04 20:43             ` H.J. Lu
  0 siblings, 0 replies; 23+ messages in thread
From: H.J. Lu @ 2008-04-04 20:43 UTC (permalink / raw)
  To: Michael Meissner, Uros Bizjak, GCC Patches

On Fri, Apr 04, 2008 at 02:41:41PM -0400, Michael Meissner wrote:
> On Fri, Apr 04, 2008 at 06:28:54AM -0700, H.J. Lu wrote:
> > There is a proposal to get rid of all *mmintrin.h.  Users should
> > include one header file, something like <ia32intrin.h>, for all
> > current and future intrinsics.  The name of the meta intrinsic
> > header file hasn't been decided. Do we have any preferences/suggestions?
> 
> While I tend to hate the proliferation of *mmintrin.h files and have been
> guilty of the same, I should note that other compilers for the x86 use these
> files also, and that we should strive to make sure that there exist
> bmmintrin.h, ammintrin.h, smmintrin.h, emmintrin.h, etc. that include
> ia32intrin.h or whatever so that these programs will continue to work.

We will keep the existing *mmintrin.h for backward compatibility.


H.J.


* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 13:32         ` H.J. Lu
  2008-04-04 13:56           ` Uros Bizjak
@ 2008-04-04 20:27           ` Michael Meissner
  2008-04-04 20:43             ` H.J. Lu
  1 sibling, 1 reply; 23+ messages in thread
From: Michael Meissner @ 2008-04-04 20:27 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Uros Bizjak, GCC Patches

On Fri, Apr 04, 2008 at 06:28:54AM -0700, H.J. Lu wrote:
> There is a proposal to get rid of all *mmintrin.h.  Users should
> include one header file, something like <ia32intrin.h>, for all
> current and future intrinsics.  The name of the meta intrinsic
> header file hasn't been decided. Do we have any preferences/suggestions?

While I tend to hate the proliferation of *mmintrin.h files and have been
guilty of the same, I should note that other compilers for the x86 use these
files also, and that we should strive to make sure that there exist
bmmintrin.h, ammintrin.h, smmintrin.h, emmintrin.h, etc. that include
ia32intrin.h or whatever so that these programs will continue to work.

-- 
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
michael.meissner@amd.com


* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 15:00                 ` Jakub Jelinek
  2008-04-04 15:58                   ` H.J. Lu
@ 2008-04-04 16:33                   ` Uros Bizjak
  1 sibling, 0 replies; 23+ messages in thread
From: Uros Bizjak @ 2008-04-04 16:33 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, GCC Patches

Jakub Jelinek wrote:
> On Fri, Apr 04, 2008 at 04:33:51PM +0200, Uros Bizjak wrote:
>   
>> This will work just fine even without SSE (although a warning about
>> changed ABI will be issued). We can even add AES functionality to the
>> library this way (hint, hint ;)
>>
>> And using intrinsic, the situation will be actually just reversed to
>> the situation you described below.
>>     
>
> This doesn't make sense.  The aes instructions use SSE2 registers,
> so IMNSHO you really have to enable sse2 to be able to emit aes/pclmul
> instructions.  So it makes perfect sense for -maes to enable -msse2.
>   

Indeed. I have reversed -msse/-mno-sse logic for -maes. This is an 
example I had in mind:

--cut here--
typedef long long __v2di __attribute__ ((vector_size (16)));
typedef long long __m128i __attribute__ ((vector_size (16)));

__v2di __X, __Y;

#ifndef __AES__

__v2di __builtin_ia32_aesenc128 (__v2di __X, __v2di __Y)
{
  /* Not really AES implementation */
  return __X + __Y;
}
#endif

void crypto_func (void)
{
  volatile __m128i AA;

  AA = (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X, (__v2di)__Y);
}
--cut here--

We need -msse2 when -maes is enabled. We _can_ pass parameters to the 
fallback function through the stack, but  -maes is _disabled_ at this 
time and we are free to have -msse/-mno-sse without -maes.

Today is just not my day.

Uros.


* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 15:31                 ` H.J. Lu
@ 2008-04-04 16:08                   ` Uros Bizjak
  0 siblings, 0 replies; 23+ messages in thread
From: Uros Bizjak @ 2008-04-04 16:08 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

H.J. Lu wrote:
> On Fri, Apr 04, 2008 at 04:33:51PM +0200, Uros Bizjak wrote:
>   
>> The patch is OK, but please can you reconsider SSE2 requirement?
>>     
>
> The only problem I have is if -maes doesn't turn on SSE, it doesn't
> really enable AES. You may need -msse2. It may be confusing to users.
>
> Here is the patch. I will check it in.
>   

No, please commit your previous version. On second thought, it was me 
that had the logic of -msse/-mno-sse2 in the header reversed.

Please accept my apologies,
Uros.


* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 15:00                 ` Jakub Jelinek
@ 2008-04-04 15:58                   ` H.J. Lu
  2008-04-04 16:33                   ` Uros Bizjak
  1 sibling, 0 replies; 23+ messages in thread
From: H.J. Lu @ 2008-04-04 15:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Uros Bizjak, GCC Patches

On Fri, Apr 4, 2008 at 7:50 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Apr 04, 2008 at 04:33:51PM +0200, Uros Bizjak wrote:
>  > This will work just fine even without SSE (although a warning about
>  > changed ABI will be issued). We can even add AES functionality to the
>  > library this way (hint, hint ;)
>  >
>  > And using intrinsic, the situation will be actually just reversed to
>  > the situation you described below.
>
>  This doesn't make sense.  The aes instructions use SSE2 registers,
>  so IMNSHO you really have to enable sse2 to be able to emit aes/pclmul
>  instructions.  So it makes perfect sense for -maes to enable -msse2.
>
>

That is what I prefer. You will need SSE2 when you use -maes. Otherwise,
you can't really do much with AES. I will check in the version in which -maes
enables SSE2. We can change it later, before the 4.4 release.
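
The dependency being discussed (enabling AES pulls in SSE2, and disabling SSE2
takes AES with it) follows the same SET/UNSET mask pattern that i386.c uses
for the SSE levels.  The sketch below is a simplified model only: the ISA_*
names, bit values, and helper functions are hypothetical and are not GCC's
actual definitions.

```c
#include <assert.h>

/* Hypothetical feature bits for the model.  */
enum
{
  ISA_SSE2   = 1 << 0,
  ISA_AES    = 1 << 1,
  ISA_PCLMUL = 1 << 2
};

/* Enabling a feature also enables what it depends on.  */
#define ISA_SSE2_SET    (ISA_SSE2)
#define ISA_AES_SET     (ISA_AES | ISA_SSE2_SET)
#define ISA_PCLMUL_SET  (ISA_PCLMUL | ISA_SSE2_SET)

/* Disabling a feature also disables everything that depends on it.  */
#define ISA_AES_UNSET     (ISA_AES)
#define ISA_PCLMUL_UNSET  (ISA_PCLMUL)
#define ISA_SSE2_UNSET    (ISA_SSE2 | ISA_AES_UNSET | ISA_PCLMUL_UNSET)

/* Model of "-mfoo": OR in the SET mask.  */
static int isa_enable (int flags, int set_mask)
{
  return flags | set_mask;
}

/* Model of "-mno-foo": clear every bit in the UNSET mask.  */
static int isa_disable (int flags, int unset_mask)
{
  return flags & ~unset_mask;
}
```

In this model, isa_enable (0, ISA_AES_SET) yields both the AES and SSE2 bits,
and a later isa_disable (flags, ISA_SSE2_UNSET) clears AES and PCLMUL as well,
so the flags can never claim AES without SSE2.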

Thanks.


H.J.


* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 14:56               ` Uros Bizjak
  2008-04-04 15:00                 ` Jakub Jelinek
@ 2008-04-04 15:31                 ` H.J. Lu
  2008-04-04 16:08                   ` Uros Bizjak
  1 sibling, 1 reply; 23+ messages in thread
From: H.J. Lu @ 2008-04-04 15:31 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches

On Fri, Apr 04, 2008 at 04:33:51PM +0200, Uros Bizjak wrote:
> 
> The patch is OK, but please can you reconsider SSE2 requirement?

The only problem I have is if -maes doesn't turn on SSE, it doesn't
really enable AES. You may need -msse2. It may be confusing to users.

Here is the patch. I will check it in.


H.J.
---
gcc/

2008-04-04  H.J. Lu  <hongjiu.lu@intel.com>

	* config.gcc (extra_headers): Add wmmintrin.h for x86 and x86-64.

	* config/i386/cpuid.h (bit_AES): New.
	(bit_PCLMUL): Likewise.

	* config/i386/i386.c (pta_flags): Add PTA_AES and PTA_PCLMUL.
	(override_options): Handle PTA_AES and PTA_PCLMUL.
	(ix86_builtins): Add IX86_BUILTIN_AESENC128,
	IX86_BUILTIN_AESENCLAST128, IX86_BUILTIN_AESDEC128,
	IX86_BUILTIN_AESDECLAST128, IX86_BUILTIN_AESIMC128,
	IX86_BUILTIN_AESKEYGENASSIST128 and IX86_BUILTIN_PCLMULQDQ128.
	(bdesc_sse_3arg): Add IX86_BUILTIN_PCLMULQDQ128.
	(bdesc_2arg): Add IX86_BUILTIN_AESENC128,
	IX86_BUILTIN_AESENCLAST128, IX86_BUILTIN_AESDEC128,
	IX86_BUILTIN_AESDECLAST128 and IX86_BUILTIN_AESKEYGENASSIST128.
	(bdesc_1arg): Add IX86_BUILTIN_AESIMC128.
	(ix86_init_mmx_sse_builtins): Define __builtin_ia32_aesenc128,
	__builtin_ia32_aesenclast128, __builtin_ia32_aesdec128,
	__builtin_ia32_aesdeclast128, __builtin_ia32_aesimc128,
	__builtin_ia32_aeskeygenassist128 and
	__builtin_ia32_pclmulqdq128.
	* config/i386/i386.c (ix86_expand_binop_imm_builtin): New.
	(ix86_expand_builtin): Use it for IX86_BUILTIN_PSLLDQI128 and
	IX86_BUILTIN_PSRLDQI128.  Handle IX86_BUILTIN_AESKEYGENASSIST128.

	* config/i386/i386.h (TARGET_AES): New.
	(TARGET_PCLMUL): Likewise.
	(TARGET_CPU_CPP_BUILTINS): Handle TARGET_AES and TARGET_PCLMUL.

	* config/i386/i386.md (UNSPEC_AESENC): New.
	(UNSPEC_AESENCLAST): Likewise.
	(UNSPEC_AESDEC): Likewise.
	(UNSPEC_AESDECLAST): Likewise.
	(UNSPEC_AESIMC): Likewise.
	(UNSPEC_AESKEYGENASSIST): Likewise.
	(UNSPEC_PCLMULQDQ): Likewise.

	* config/i386/i386.opt (maes): New.
	(mpclmul): Likewise.

	* config/i386/sse.md (aesenc): New pattern.
	(aesenclast): Likewise.
	(aesdec): Likewise.
	(aesdeclast): Likewise.
	(aesimc): Likewise.
	(aeskeygenassist): Likewise.
	(pclmulqdq): Likewise.

	* config/i386/wmmintrin.h: New.

	* doc/extend.texi: Document AES and PCLMUL built-in functions.

	* doc/invoke.texi: Document -maes and -mpclmul.

gcc/testsuite/

2008-04-04  H.J. Lu  <hongjiu.lu@intel.com>

	* g++.dg/other/i386-2.C: Include <wmmintrin.h>.
	* g++.dg/other/i386-3.C: Likewise.
	* gcc.target/i386/sse-13.c: Likewise.
	* gcc.target/i386/sse-14.c: Likewise.

	* gcc.target/i386/aes-check.h: New.
	* gcc.target/i386/aesdec.c: Likewise.
	* gcc.target/i386/aesdeclast.c: Likewise.
	* gcc.target/i386/aesenc.c: Likewise.
	* gcc.target/i386/aesenclast.c: Likewise.
	* gcc.target/i386/aesimc.c: Likewise.
	* gcc.target/i386/aeskeygenassist.c: Likewise.
	* gcc.target/i386/pclmulqdq.c: Likewise.
	* gcc.target/i386/pclmul-check.h: Likewise.

	* gcc.target/i386/i386.exp (check_effective_target_aes): New.
	(check_effective_target_pclmul): Likewise.

--- gcc/config.gcc.aes	2008-04-03 22:14:41.000000000 -0700
+++ gcc/config.gcc	2008-04-03 22:14:44.000000000 -0700
@@ -309,13 +309,15 @@ i[34567]86-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	need_64bit_hwint=yes
 	;;
 ia64-*-*)
--- gcc/config/i386/cpuid.h.aes	2008-02-19 20:52:26.000000000 -0800
+++ gcc/config/i386/cpuid.h	2008-04-04 06:40:30.000000000 -0700
@@ -33,11 +33,13 @@
 
 /* %ecx */
 #define bit_SSE3	(1 << 0)
+#define bit_PCLMUL	(1 << 1)
 #define bit_SSSE3	(1 << 9)
 #define bit_CMPXCHG16B	(1 << 13)
 #define bit_SSE4_1	(1 << 19)
 #define bit_SSE4_2	(1 << 20)
 #define bit_POPCNT	(1 << 23)
+#define bit_AES		(1 << 25)
 
 /* %edx */
 #define bit_CMPXCHG8B	(1 << 8)
--- gcc/config/i386/i386.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/config/i386/i386.c	2008-04-04 07:42:41.000000000 -0700
@@ -2078,7 +2078,9 @@ override_options (void)
       PTA_NO_SAHF = 1 << 13,
       PTA_SSE4_1 = 1 << 14,
       PTA_SSE4_2 = 1 << 15,
-      PTA_SSE5 = 1 << 16
+      PTA_SSE5 = 1 << 16,
+      PTA_AES = 1 << 17,
+      PTA_PCLMUL = 1 << 18
     };
 
   static struct pta
@@ -2385,6 +2387,10 @@ override_options (void)
 	  x86_prefetch_sse = true;
 	if (!(TARGET_64BIT && (processor_alias_table[i].flags & PTA_NO_SAHF)))
 	  x86_sahf = true;
+	if (processor_alias_table[i].flags & PTA_AES)
+	  x86_aes = true;
+	if (processor_alias_table[i].flags & PTA_PCLMUL)
+	  x86_pclmul = true;
 
 	break;
       }
@@ -17646,6 +17652,17 @@ enum ix86_builtins
 
   IX86_BUILTIN_PCMPGTQ,
 
+  /* AES instructions */
+  IX86_BUILTIN_AESENC128,
+  IX86_BUILTIN_AESENCLAST128,
+  IX86_BUILTIN_AESDEC128,
+  IX86_BUILTIN_AESDECLAST128,
+  IX86_BUILTIN_AESIMC128,
+  IX86_BUILTIN_AESKEYGENASSIST128,
+
+  /* PCLMUL instruction */
+  IX86_BUILTIN_PCLMULQDQ128,
+
   /* TFmode support builtins.  */
   IX86_BUILTIN_INFQ,
   IX86_BUILTIN_FABSQ,
@@ -18007,6 +18024,9 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, "__builtin_ia32_roundsd", IX86_BUILTIN_ROUNDSD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, "__builtin_ia32_roundss", IX86_BUILTIN_ROUNDSS, UNKNOWN, 0 },
+
+  /* PCLMUL */
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_pclmulqdq, 0, IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_2arg[] =
@@ -18287,6 +18307,13 @@ static const struct builtin_description 
 
   /* SSE4.2 */
   { OPTION_MASK_ISA_SSE4_2, CODE_FOR_sse4_2_gtv2di3, "__builtin_ia32_pcmpgtq", IX86_BUILTIN_PCMPGTQ, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesenc, 0, IX86_BUILTIN_AESENC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesenclast, 0, IX86_BUILTIN_AESENCLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesdec, 0, IX86_BUILTIN_AESDEC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesdeclast, 0, IX86_BUILTIN_AESDECLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -18364,6 +18391,9 @@ static const struct builtin_description 
   /* Fake 1 arg builtins with a constant smaller than 8 bits as the 2nd arg.  */
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundpd, 0, IX86_BUILTIN_ROUNDPD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundps, 0, IX86_BUILTIN_ROUNDPS, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN, 0 },
 };
 
 /* SSE5 */
@@ -19600,6 +19630,25 @@ ix86_init_mmx_sse_builtins (void)
 				    NULL_TREE);
   def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_crc32di", ftype, IX86_BUILTIN_CRC32DI);
 
+  /* AES */
+  if (TARGET_AES)
+    {
+      /* Define AES built-in functions only if AES is enabled.  */
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aesenc128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESENC128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aesenclast128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESENCLAST128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aesdec128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESDEC128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aesdeclast128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESDECLAST128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aesimc128", v2di_ftype_v2di, IX86_BUILTIN_AESIMC128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aeskeygenassist128", v2di_ftype_v2di_int, IX86_BUILTIN_AESKEYGENASSIST128);
+    }
+
+  /* PCLMUL */
+  if (TARGET_PCLMUL)
+    {
+      /* Define PCLMUL built-in function only if PCLMUL is enabled.  */
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_pclmulqdq128", v2di_ftype_v2di_v2di_int, IX86_BUILTIN_PCLMULQDQ128);
+    }
+
   /* AMDFAM10 SSE4A New built-ins  */
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntsd", void_ftype_pdouble_v2df, IX86_BUILTIN_MOVNTSD);
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntss", void_ftype_pfloat_v4sf, IX86_BUILTIN_MOVNTSS);
@@ -19880,6 +19929,44 @@ ix86_expand_crc32 (enum insn_code icode,
   return target;
 }
 
+/* Subroutine of ix86_expand_builtin to take care of binop insns
+   with an immediate.  */
+
+static rtx
+ix86_expand_binop_imm_builtin (enum insn_code icode, tree exp,
+				rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  enum machine_mode tmode = insn_data[icode].operand[0].mode;
+  enum machine_mode mode0 = insn_data[icode].operand[1].mode;
+  enum machine_mode mode1 = insn_data[icode].operand[2].mode;
+
+  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+    {
+      op0 = copy_to_reg (op0);
+      op0 = simplify_gen_subreg (mode0, op0, GET_MODE (op0), 0);
+    }
+
+  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+    {
+      error ("the last operand must be an immediate");
+      return const0_rtx;
+    }
+
+  target = gen_reg_rtx (V2DImode);
+  pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target,
+					      V2DImode, 0),
+			 op0, op1);
+  if (! pat)
+    return 0;
+  emit_insn (pat);
+  return target;
+}
+
 /* Subroutine of ix86_expand_builtin to take care of binop insns.  */
 
 static rtx
@@ -20972,34 +21059,18 @@ ix86_expand_builtin (tree exp, rtx targe
       return target;
 
     case IX86_BUILTIN_PSLLDQI128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_ashlti3,
+					     exp, target);
+      break;
+
     case IX86_BUILTIN_PSRLDQI128:
-      icode = (fcode == IX86_BUILTIN_PSLLDQI128 ? CODE_FOR_sse2_ashlti3
-	       : CODE_FOR_sse2_lshrti3);
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      op0 = expand_normal (arg0);
-      op1 = expand_normal (arg1);
-      tmode = insn_data[icode].operand[0].mode;
-      mode1 = insn_data[icode].operand[1].mode;
-      mode2 = insn_data[icode].operand[2].mode;
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_lshrti3,
+					     exp, target);
+      break;
 
-      if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
-	{
-	  op0 = copy_to_reg (op0);
-	  op0 = simplify_gen_subreg (mode1, op0, GET_MODE (op0), 0);
-	}
-      if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
-	{
-	  error ("shift must be an immediate");
-	  return const0_rtx;
-	}
-      target = gen_reg_rtx (V2DImode);
-      pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target, V2DImode, 0),
-			     op0, op1);
-      if (! pat)
-	return 0;
-      emit_insn (pat);
-      return target;
+    case IX86_BUILTIN_AESKEYGENASSIST128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_aeskeygenassist,
+					     exp, target);
 
     case IX86_BUILTIN_FEMMS:
       emit_insn (gen_mmx_femms ());
--- gcc/config/i386/i386.h.aes	2008-04-03 22:14:41.000000000 -0700
+++ gcc/config/i386/i386.h	2008-04-04 06:31:48.000000000 -0700
@@ -395,6 +395,8 @@ extern int x86_prefetch_sse;
 #define TARGET_SAHF		x86_sahf
 #define TARGET_RECIP		x86_recip
 #define TARGET_FUSED_MADD	x86_fused_muladd
+#define TARGET_AES		(TARGET_SSE2 && x86_aes)
+#define TARGET_PCLMUL		(TARGET_SSE2 && x86_pclmul)
 
 #define ASSEMBLER_DIALECT	(ix86_asm_dialect)
 
@@ -683,6 +685,10 @@ extern const char *host_detect_local_cpu
 	builtin_define ("__SSE4_1__");				\
       if (TARGET_SSE4_2)					\
 	builtin_define ("__SSE4_2__");				\
+      if (TARGET_AES)						\
+	builtin_define ("__AES__");				\
+      if (TARGET_PCLMUL)					\
+	builtin_define ("__PCLMUL__");				\
       if (TARGET_SSE4A)						\
  	builtin_define ("__SSE4A__");		                \
       if (TARGET_SSE5)						\
--- gcc/config/i386/i386.md.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/config/i386/i386.md	2008-04-04 06:35:25.000000000 -0700
@@ -186,6 +186,17 @@
    (UNSPEC_FRCZ			156)
    (UNSPEC_CVTPH2PS		157)
    (UNSPEC_CVTPS2PH		158)
+
+   ; For AES support
+   (UNSPEC_AESENC		159)
+   (UNSPEC_AESENCLAST		160)
+   (UNSPEC_AESDEC		161)
+   (UNSPEC_AESDECLAST		162)
+   (UNSPEC_AESIMC		163)
+   (UNSPEC_AESKEYGENASSIST	164)
+
+   ; For PCLMUL support
+   (UNSPEC_PCLMUL		165)
   ])
 
 (define_constants
--- gcc/config/i386/i386.opt.aes	2007-09-13 06:25:13.000000000 -0700
+++ gcc/config/i386/i386.opt	2008-04-04 06:32:30.000000000 -0700
@@ -275,3 +275,11 @@ Target Report Var(x86_fused_muladd) Init
 Enable automatic generation of fused floating point multiply-add instructions
 if the ISA supports such instructions.  The -mfused-madd option is on by
 default.
+
+maes
+Target Report RejectNegative Var(x86_aes)
+Support AES built-in functions and code generation
+
+mpclmul
+Target Report RejectNegative Var(x86_pclmul)
+Support PCLMUL built-in functions and code generation
--- gcc/config/i386/sse.md.aes	2008-04-02 20:36:24.000000000 -0700
+++ gcc/config/i386/sse.md	2008-04-04 06:35:17.000000000 -0700
@@ -7897,3 +7897,80 @@
 }
   [(set_attr "type" "ssecmp")
    (set_attr "mode" "TI")])
+
+(define_insn "aesenc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENC))]
+  "TARGET_AES"
+  "aesenc\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesenclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENCLAST))]
+  "TARGET_AES"
+  "aesenclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdec"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDEC))]
+  "TARGET_AES"
+  "aesdec\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdeclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDECLAST))]
+  "TARGET_AES"
+  "aesdeclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesimc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESIMC))]
+  "TARGET_AES"
+  "aesimc\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aeskeygenassist"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")
+		      (match_operand:SI 2 "const_0_to_255_operand" "n")]
+		     UNSPEC_AESKEYGENASSIST))]
+  "TARGET_AES"
+  "aeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "pclmulqdq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		      (match_operand:V2DI 2 "nonimmediate_operand" "xm")
+		      (match_operand:SI 3 "const_0_to_255_operand" "n")]
+		     UNSPEC_PCLMUL))]
+  "TARGET_PCLMUL"
+  "pclmulqdq\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
--- gcc/config/i386/wmmintrin.h.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/config/i386/wmmintrin.h	2008-04-04 07:48:42.000000000 -0700
@@ -0,0 +1,123 @@
+/* Copyright (C) 2008 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to
+   the Free Software Foundation, 59 Temple Place - Suite 330,
+   Boston, MA 02111-1307, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 10.1.  */
+
+#ifndef _WMMINTRIN_H_INCLUDED
+#define _WMMINTRIN_H_INCLUDED
+
+/* We need definitions from the SSE2 header file.  */
+#include <emmintrin.h>
+
+#if !defined (__AES__) && !defined (__PCLMUL__)
+# error "AES/PCLMUL instructions not enabled"
+#else
+
+/* AES */
+
+#ifdef __AES__
+/* Performs 1 round of AES decryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdec_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES decryption of the first m128i 
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdeclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X,
+						 (__v2di)__Y);
+}
+
+/* Performs 1 round of AES encryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenc_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES encryption of the first m128i
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the InverseMixColumn operation on the source m128i 
+   and stores the result into m128i destination.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesimc_si128 (__m128i __X)
+{
+  return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+}
+
+/* Generates a m128i round key for the input m128i AES cipher key and
+   byte round constant.  The second parameter must be a compile time
+   constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aeskeygenassist_si128 (__m128i __X, const int __C)
+{
+  return (__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)__X, __C);
+}
+#else
+#define _mm_aeskeygenassist_si128(X, C)					\
+  ((__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)(__m128i)(X),	\
+						(int)(C)))
+#endif
+#endif  /* __AES__ */
+
+/* PCLMUL */
+
+#ifdef __PCLMUL__
+/* Performs carry-less integer multiplication of 64-bit halves of
+   128-bit input operands.  The third parameter indicates which 64-bit
+   halves of the input parameters __X and __Y should be used.  It must
+   be a compile-time constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
+{
+  return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+						(__v2di)__Y, __I);
+}
+#else
+#define _mm_clmulepi64_si128(X, Y, I)					\
+  ((__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)(__m128i)(X),		\
+					  (__v2di)(__m128i)(Y), (int)(I)))
+#endif
+#endif  /* __PCLMUL__  */
+
+#endif /* __AES__/__PCLMUL__ */
+
+#endif /* _WMMINTRIN_H_INCLUDED */
--- gcc/doc/extend.texi.aes	2008-03-25 11:40:12.000000000 -0700
+++ gcc/doc/extend.texi	2008-04-04 07:54:00.000000000 -0700
@@ -8013,6 +8013,27 @@ depending on the size of @code{unsigned 
 Generates the @code{popcntq} machine instruction.
 @end table
 
+The following built-in functions are available when @option{-maes} is
+used and SSE2 is enabled.  All of them generate the machine instruction
+that is part of the name.
+
+@smallexample
+v2di __builtin_ia32_aesenc128 (v2di, v2di)
+v2di __builtin_ia32_aesenclast128 (v2di, v2di)
+v2di __builtin_ia32_aesdec128 (v2di, v2di)
+v2di __builtin_ia32_aesdeclast128 (v2di, v2di)
+v2di __builtin_ia32_aeskeygenassist128 (v2di, const int)
+v2di __builtin_ia32_aesimc128 (v2di)
+@end smallexample
+
+The following built-in function is available when @option{-mpclmul} is
+used and SSE2 is enabled.
+
+@table @code
+@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)
+Generates the @code{pclmulqdq} machine instruction.
+@end table
+
 The following built-in functions are available when @option{-msse4a} is used.
 All of them generate the machine instruction that is part of the name.
 
--- gcc/doc/invoke.texi.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/doc/invoke.texi	2008-04-04 06:45:47.000000000 -0700
@@ -555,6 +555,7 @@ Objective-C and Objective-C++ Dialects}.
 -mno-wide-multiply  -mrtd  -malign-double @gol
 -mpreferred-stack-boundary=@var{num} -mcx16 -msahf -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
+-maes -mpclmul @gol
 -msse4a -m3dnow -mpopcnt -mabm -msse5 @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -10720,6 +10721,10 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-sse4.2
 @itemx -msse4
 @itemx -mno-sse4
+@itemx -maes
+@itemx -mno-aes
+@itemx -mpclmul
+@itemx -mno-pclmul
 @itemx -msse4a
 @itemx -mno-sse4a
 @itemx -msse5
@@ -10737,8 +10742,8 @@ preferred alignment to @option{-mpreferr
 @opindex m3dnow
 @opindex mno-3dnow
 These switches enable or disable the use of instructions in the MMX,
-SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4A, SSE5, ABM or 3DNow!@: extended
-instruction sets.
+SSE, SSE2, SSE3, SSSE3, SSE4.1, AES, PCLMUL, SSE4A, SSE5, ABM or
+3DNow!@: extended instruction sets.
 These extensions are also available as built-in functions: see
 @ref{X86 Built-in Functions}, for details of the functions enabled and
 disabled by these switches.
--- gcc/testsuite/g++.dg/other/i386-2.C.aes	2007-12-15 15:49:16.000000000 -0800
+++ gcc/testsuite/g++.dg/other/i386-2.C	2008-04-04 06:37:21.000000000 -0700
@@ -1,8 +1,9 @@
-/* Test that {,x,e,p,t,s,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
+/* Test that {,x,e,p,t,s,w,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
    usable with -O -pedantic-errors.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -msse4 -msse5 -maes -mpclmul" } */
 
+#include <wmmintrin.h>
 #include <bmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
--- gcc/testsuite/g++.dg/other/i386-3.C.aes	2008-03-13 09:04:41.000000000 -0700
+++ gcc/testsuite/g++.dg/other/i386-3.C	2008-04-04 06:37:56.000000000 -0700
@@ -1,8 +1,9 @@
-/* Test that {,x,e,p,t,s,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
+/* Test that {,x,e,p,t,s,w,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
    usable with -O -fkeep-inline-functions.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -maes -mpclmul -msse4 -msse5" } */
 
+#include <wmmintrin.h>
 #include <bmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
--- gcc/testsuite/gcc.target/i386/aes-check.h.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aes-check.h	2008-04-03 22:14:44.000000000 -0700
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void aes_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AES test only if host has AES support.  */
+  if (ecx & bit_AES)
+    {
+      aes_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/aesdec.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesdec.c	2008-04-04 07:44:49.000000000 -0700
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -msse2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i]  = _mm_setr_epi32 (0xb730392a, 0xb58eb95e,
+			      0xfaea2787, 0x138ac342);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdec_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdec_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdec_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdec_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdec_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdec_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdec_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdec_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdec_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdec_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdec_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdec_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdec_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdec_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdec_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdec_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesdeclast.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesdeclast.c	2008-04-04 07:45:21.000000000 -0700
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -msse2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set of
+   input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x72a593d0, 0xd410637b,
+			     0x6b317f95, 0xc5a391ef);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdeclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdeclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdeclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdeclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdeclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdeclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdeclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdeclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdeclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdeclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdeclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdeclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdeclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdeclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdeclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdeclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesenc.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesenc.c	2008-04-04 07:45:26.000000000 -0700
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -msse2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0xded7e595, 0x8b104b58,
+			     0x9fdba3c5, 0xa8311c2f);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenc_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenc_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenc_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenc_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenc_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenc_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenc_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenc_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenc_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenc_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenc_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenc_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenc_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenc_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenc_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenc_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesenclast.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesenclast.c	2008-04-04 07:45:37.000000000 -0700
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -msse2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one
+   set of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x53fdc611, 0x177ec425,
+			     0x938c5964, 0xc7fb881e);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesimc.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesimc.c	2008-04-04 07:46:11.000000000 -0700
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -msse2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      d[i] = _mm_setr_epi32 (0x81c3b3e5, 0x2b18330a,
+			     0x44b109c8, 0x627a6f66);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesimc_si128 (src1[i]);
+      resdst[i + 1] = _mm_aesimc_si128 (src1[i + 1]);
+      resdst[i + 2] = _mm_aesimc_si128 (src1[i + 2]);
+      resdst[i + 3] = _mm_aesimc_si128 (src1[i + 3]);
+      resdst[i + 4] = _mm_aesimc_si128 (src1[i + 4]);
+      resdst[i + 5] = _mm_aesimc_si128 (src1[i + 5]);
+      resdst[i + 6] = _mm_aesimc_si128 (src1[i + 6]);
+      resdst[i + 7] = _mm_aesimc_si128 (src1[i + 7]);
+      resdst[i + 8] = _mm_aesimc_si128 (src1[i + 8]);
+      resdst[i + 9] = _mm_aesimc_si128 (src1[i + 9]);
+      resdst[i + 10] = _mm_aesimc_si128 (src1[i + 10]);
+      resdst[i + 11] = _mm_aesimc_si128 (src1[i + 11]);
+      resdst[i + 12] = _mm_aesimc_si128 (src1[i + 12]);
+      resdst[i + 13] = _mm_aesimc_si128 (src1[i + 13]);
+      resdst[i + 14] = _mm_aesimc_si128 (src1[i + 14]);
+      resdst[i + 15] = _mm_aesimc_si128 (src1[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aeskeygenassist.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aeskeygenassist.c	2008-04-04 07:46:19.000000000 -0700
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -msse2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+#define IMM8 1
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x16157e2b, 0xa6d2ae28,
+			      0x8815f7ab, 0x3c4fcf09);
+      d[i] = _mm_setr_epi32 (0x24b5e434, 0x3424b5e5,
+			     0xeb848a01, 0x01eb848b);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aeskeygenassist_si128 (src1[i], IMM8);
+      resdst[i + 1] = _mm_aeskeygenassist_si128 (src1[i + 1], IMM8);
+      resdst[i + 2] = _mm_aeskeygenassist_si128 (src1[i + 2], IMM8);
+      resdst[i + 3] = _mm_aeskeygenassist_si128 (src1[i + 3], IMM8);
+      resdst[i + 4] = _mm_aeskeygenassist_si128 (src1[i + 4], IMM8);
+      resdst[i + 5] = _mm_aeskeygenassist_si128 (src1[i + 5], IMM8);
+      resdst[i + 6] = _mm_aeskeygenassist_si128 (src1[i + 6], IMM8);
+      resdst[i + 7] = _mm_aeskeygenassist_si128 (src1[i + 7], IMM8);
+      resdst[i + 8] = _mm_aeskeygenassist_si128 (src1[i + 8], IMM8);
+      resdst[i + 9] = _mm_aeskeygenassist_si128 (src1[i + 9], IMM8);
+      resdst[i + 10] = _mm_aeskeygenassist_si128 (src1[i + 10], IMM8);
+      resdst[i + 11] = _mm_aeskeygenassist_si128 (src1[i + 11], IMM8);
+      resdst[i + 12] = _mm_aeskeygenassist_si128 (src1[i + 12], IMM8);
+      resdst[i + 13] = _mm_aeskeygenassist_si128 (src1[i + 13], IMM8);
+      resdst[i + 14] = _mm_aeskeygenassist_si128 (src1[i + 14], IMM8);
+      resdst[i + 15] = _mm_aeskeygenassist_si128 (src1[i + 15], IMM8);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/i386.exp.aes	2008-01-05 10:52:44.000000000 -0800
+++ gcc/testsuite/gcc.target/i386/i386.exp	2008-04-04 07:46:53.000000000 -0700
@@ -51,6 +51,34 @@ proc check_effective_target_sse4 { } {
     } "-O2 -msse4.1" ]
 }
 
+# Return 1 if aes instructions can be compiled.
+proc check_effective_target_aes { } {
+    return [check_no_compiler_messages aes object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i _mm_aesimc_si128 (__m128i __X)
+	{
+	    return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+	}
+    } "-O2 -msse2 -maes" ]
+}
+
+# Return 1 if pclmul instructions can be compiled.
+proc check_effective_target_pclmul { } {
+    return [check_no_compiler_messages pclmul object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i pclmulqdq_test (__m128i __X, __m128i __Y)
+	{
+	    return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+							  (__v2di)__Y,
+							  1);
+	}
+    } "-O2 -msse2 -mpclmul" ]
+}
+
 # Return 1 if sse4a instructions can be compiled.
 proc check_effective_target_sse4a { } {
     return [check_no_compiler_messages sse4a object {
--- gcc/testsuite/gcc.target/i386/pclmul-check.h.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/pclmul-check.h	2008-04-04 06:54:12.000000000 -0700
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void pclmul_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run PCLMULQDQ test only if host has PCLMULQDQ support.  */
+  if (ecx & bit_PCLMUL)
+    {
+      pclmul_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/pclmulqdq.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/pclmulqdq.c	2008-04-04 07:46:31.000000000 -0700
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+/* { dg-require-effective-target pclmul } */
+/* { dg-options "-O2 -msse2 -mpclmul" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "pclmul-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i s1[NUM];
+static __m128i s2[NUM];
+/* We need this array to generate the memory-operand form of the instruction.  */
+static __m128i s2m[NUM];
+
+static __m128i e_00[NUM];
+static __m128i e_01[NUM];
+static __m128i e_10[NUM];
+static __m128i e_11[NUM];
+
+static __m128i d_00[NUM];
+static __m128i d_01[NUM];
+static __m128i d_10[NUM];
+static __m128i d_11[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *ls1,   __m128i *ls2, __m128i *le_00, __m128i *le_01,
+	   __m128i *le_10, __m128i *le_11)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      ls1[i] = _mm_set_epi32 (0x7B5B5465, 0x73745665,
+			      0x63746F72, 0x5D53475D);
+      ls2[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      s2m[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      le_00[i] = _mm_set_epi32 (0x1D4D84C8, 0x5C3440C0,
+				0x929633D5, 0xD36F0451);
+      le_01[i] = _mm_set_epi32 (0x1A2BF6DB, 0x3A30862F,
+				0xBABF262D, 0xF4B7D5C9);
+      le_10[i] = _mm_set_epi32 (0x1BD17C8D, 0x556AB5A1,
+				0x7FA540AC, 0x2A281315);
+      le_11[i] = _mm_set_epi32 (0x1D1E1F2C, 0x592E7C45,
+				0xD66EE03E, 0x410FD4ED);
+    }
+}
+
+static void
+pclmul_test (void)
+{
+  int i;
+
+  init_data (s1, s2, e_00, e_01, e_10, e_11);
+
+  for (i = 0; i < NUM; i += 2)
+    {
+      d_00[i] = _mm_clmulepi64_si128 (s1[i], s2m[i], 0x00);
+      d_01[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x01);
+      d_10[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x10);
+      d_11[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x11);
+
+      d_11[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x11);
+      d_00[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x00);
+      d_10[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2m[i + 1], 0x10);
+      d_01[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x01);
+    }
+
+  for (i = 0; i < NUM; i++)
+    {
+      if (memcmp (d_00 + i, e_00 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_01 + i, e_01 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_10 + i, e_10 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_11 + i, e_11 + i, sizeof (__m128i)))
+	abort ();
+    }
+}
--- gcc/testsuite/gcc.target/i386/sse-13.c.aes	2008-03-26 06:32:52.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/sse-13.c	2008-04-04 06:38:43.000000000 -0700
@@ -1,10 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5 -maes -mpclmul" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile with optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,a,b}mmintrin.h and mm3dnow.h
+   defined as inline functions in {,x,e,p,t,s,w,a,b}mmintrin.h and mm3dnow.h
    that reference the proper builtin functions.  Defining away "extern" and
    "__inline" results in all of them being compiled as proper functions.  */
 
@@ -17,6 +17,10 @@
 #define __builtin_ia32_extrqi(X, I, L)  __builtin_ia32_extrqi(X, 1, 1)
 #define __builtin_ia32_insertqi(X, Y, I, L) __builtin_ia32_insertqi(X, Y, 1, 1)
 
+/* wmmintrin.h */
+#define __builtin_ia32_aeskeygenassist128(X, C) __builtin_ia32_aeskeygenassist128(X, 1)
+#define __builtin_ia32_pclmulqdq128(X, Y, I) __builtin_ia32_pclmulqdq128(X, Y, 1)
+
 /* smmintrin.h */
 #define __builtin_ia32_pblendw128(X, Y, M) __builtin_ia32_pblendw128 (X, Y, 1)
 #define __builtin_ia32_blendps(X, Y, M) __builtin_ia32_blendps(X, Y, 1)
@@ -94,6 +98,7 @@
 #define __builtin_ia32_protdi(A, B) __builtin_ia32_protdi(A,1)
 #define __builtin_ia32_protqi(A, B) __builtin_ia32_protqi(A,1)
 
+#include <wmmintrin.h>
 #include <bmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
--- gcc/testsuite/gcc.target/i386/sse-14.c.aes	2008-03-26 06:32:52.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/sse-14.c	2008-04-04 06:38:31.000000000 -0700
@@ -1,16 +1,17 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5 -maes -mpclmul" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile without optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,a,b}mmintrin.h  and mm3dnow.h
+   defined as inline functions in {,x,e,p,t,s,w,a,b}mmintrin.h  and mm3dnow.h
    that reference the proper builtin functions.  Defining away "extern" and
    "__inline" results in all of them being compiled as proper functions.  */
 
 #define extern
 #define __inline
 
+#include <wmmintrin.h>
 #include <bmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
@@ -46,6 +47,10 @@
 test_1x (_mm_extracti_si64, __m128i, __m128i, 1, 1)
 test_2x (_mm_inserti_si64, __m128i, __m128i, __m128i, 1, 1)
 
+/* wmmintrin.h */
+test_1 (_mm_aeskeygenassist_si128, __m128i, __m128i, 1)
+test_2 (_mm_clmulepi64_si128, __m128i, __m128i, __m128i, 1)
+
 /* smmintrin.h */
 test_2 (_mm_blend_epi16, __m128i, __m128i, __m128i, 1)
 test_2 (_mm_blend_ps, __m128, __m128, __m128, 1)


* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 14:56               ` Uros Bizjak
@ 2008-04-04 15:00                 ` Jakub Jelinek
  2008-04-04 15:58                   ` H.J. Lu
  2008-04-04 16:33                   ` Uros Bizjak
  2008-04-04 15:31                 ` H.J. Lu
  1 sibling, 2 replies; 23+ messages in thread
From: Jakub Jelinek @ 2008-04-04 15:00 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, GCC Patches

On Fri, Apr 04, 2008 at 04:33:51PM +0200, Uros Bizjak wrote:
> This will work just fine even without SSE (although a warning about
> changed ABI will be issued). We can even add AES functionality to the
> library this way (hint, hint ;)
> 
> And using intrinsic, the situation will be actually just reversed to
> the situation you described below.

This doesn't make sense.  The aes instructions use SSE2 registers,
so IMNSHO you really have to enable sse2 to be able to emit aes/pclmul
instructions.  So it makes perfect sense for -maes to enable -msse2.

(define_insn "aesenc"
  [(set (match_operand:V2DI 0 "register_operand" "=x")
       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
                      (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
                     UNSPEC_AESENC))]
  "TARGET_AES"
  "aesenc\t{%2, %0|%0, %2}"
  [(set_attr "type" "sselog1")
   (set_attr "prefix_extra" "1")
   (set_attr "mode" "TI")])

With -mno-sse "x" constraint is actually NO_REGS, how could that ever match?
And V2DI mode for SSE regs is SSE2+ only.
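
To illustrate the point from the user's side: with TARGET_AES defined as
(TARGET_SSE2 && x86_aes) in this patch, __AES__ should never be defined
without __SSE2__, so guarding on __AES__ alone is enough. A minimal
sketch; the names aes_one_round and aes_isa_enabled are made up for this
example and are not part of the patch:

```c
#if defined (__AES__)
# if !defined (__SSE2__)
#  error "__AES__ defined without __SSE2__"
# endif
# include <wmmintrin.h>

/* One AES encryption round on STATE with round key KEY.
   Only compiled when the AES ISA is enabled.  */
static inline __m128i
aes_one_round (__m128i state, __m128i key)
{
  return _mm_aesenc_si128 (state, key);
}
#endif

/* Return 1 if this translation unit was compiled with AES enabled,
   0 otherwise.  */
static int
aes_isa_enabled (void)
{
#if defined (__AES__)
  return 1;
#else
  return 0;
#endif
}
```

Built without -maes (or on a non-x86 target) the intrinsic wrapper simply
drops out and aes_isa_enabled () returns 0.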

	Jakub


* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 14:51             ` H.J. Lu
@ 2008-04-04 14:56               ` Uros Bizjak
  2008-04-04 15:00                 ` Jakub Jelinek
  2008-04-04 15:31                 ` H.J. Lu
  0 siblings, 2 replies; 23+ messages in thread
From: Uros Bizjak @ 2008-04-04 14:56 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

On Fri, Apr 4, 2008 at 4:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>  > OK. So, we require user to pass at least -msse2 in addition to
>  > -maes/-mclmul then? The errors will be informative, so there will be
>  > no confusion.
>  >
>
>  Here is the updated patch. I don't think we should require adding
>  -msse2 explicitly for -maes/-mclmul since they won't work at all
>  without SSE2. However, if -mno-sse2 is used, the user will get an error:

I really don't like the idea of switching on SSE2 without the user
explicitly requesting it. We already have something similar with -mcx16;
that flag doesn't switch on -m64, although it applies only there.
Another rationale: when the __builtin version is used (although this is
not recommended), adding -maes/-mclmul without -msse2 will result in a
function call that passes its __v2di parameters on the stack
(this also opens the door for a backup implementation for
targets without HW support or even without SSE):

typedef long long __v2di __attribute__ ((__vector_size__ (16)));

__v2di __builtin_ia32_aes...(__v2di A, __v2di B)
{
...
}

This will work just fine even without SSE (although a warning about
changed ABI will be issued). We can even add AES functionality to the
library this way (hint, hint ;)

And using intrinsic, the situation will be actually just reversed to
the situation you described below.
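
As an illustration of what a software backup for the carry-less multiply
could look like, here is a portable sketch. It is not part of the patch;
struct u128, clmul64 and pclmulqdq_soft are hypothetical names, and the
selector semantics (bit 0 of the immediate picks the qword of the first
operand, bit 4 the qword of the second) are assumed from the test vectors
in pclmulqdq.c:

```c
#include <stdint.h>

/* A 128-bit value as two 64-bit halves.  */
struct u128 { uint64_t hi, lo; };

/* Carry-less (GF(2) polynomial) multiply of two 64-bit values.  */
static struct u128
clmul64 (uint64_t a, uint64_t b)
{
  struct u128 r = { 0, 0 };
  int i;

  for (i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
        r.lo ^= a << i;
        /* Bits shifted out of the low half land in the high half.
           Skip i == 0: a shift by 64 is undefined in C.  */
        if (i != 0)
          r.hi ^= a >> (64 - i);
      }
  return r;
}

/* Software model of PCLMULQDQ: bit 0 of IMM8 selects the qword of X,
   bit 4 selects the qword of Y.  */
static struct u128
pclmulqdq_soft (struct u128 x, struct u128 y, int imm8)
{
  uint64_t a = (imm8 & 0x01) ? x.hi : x.lo;
  uint64_t b = (imm8 & 0x10) ? y.hi : y.lo;

  return clmul64 (a, b);
}
```

Feeding it the s1/s2 inputs from pclmulqdq.c with immediates 0x00, 0x01,
0x10 and 0x11 should give the e_00, e_01, e_10 and e_11 values that the
test expects from the hardware instruction.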

>
>  bash-3.2$ cat x.c
>  #include <wmmintrin.h>
>  bash-3.2$ ./xgcc -B./ -m32 -c x.c
>  In file included from ./include/wmmintrin.h:34,
>                  from x.c:1:
>  ./include/emmintrin.h:34:3: error: #error "SSE2 instruction set not
>  enabled"
>  In file included from x.c:1:
>  ./include/wmmintrin.h:37:3: error: #error "AES/PCLMUL instructions not
>  enabled"
>  bash-3.2$ ./xgcc -B./ -m32 -c x.c -maes
>  bash-3.2$ ./xgcc -B./ -m32 -c x.c -maes -mno-sse2
>  In file included from ./include/wmmintrin.h:34,
>                  from x.c:1:
>  ./include/emmintrin.h:34:3: error: #error "SSE2 instruction set not
>  enabled"
>  In file included from x.c:1:
>  ./include/wmmintrin.h:37:3: error: #error "AES/PCLMUL instructions not
>  enabled"
>  bash-3.2$
>
>  I also changed clmul/CLMUL to pclmul/PCLMUL. OK to install?

No problems with the name change.

The patch is OK, but could you please reconsider the SSE2 requirement?

Thanks,
Uros.


* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 13:56           ` Uros Bizjak
  2008-04-04 14:08             ` Uros Bizjak
@ 2008-04-04 14:51             ` H.J. Lu
  2008-04-04 14:56               ` Uros Bizjak
  1 sibling, 1 reply; 23+ messages in thread
From: H.J. Lu @ 2008-04-04 14:51 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches

On Fri, Apr 04, 2008 at 03:36:16PM +0200, Uros Bizjak wrote:
> On Fri, Apr 4, 2008 at 3:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> >  > >  SSE doesn't support V2DI. You need at least SSE2.
> >  > >
> >  >
> >  > OK.
> 
> >  > This practically forces usage of -msse4.1 for -maes/-mclmul, since
> >  > these instructions are used only through intrinsics, defined in their
> >  > corresponding header file. wmmintrin.h includes smmintrin.h that will
> >  > error out by #ifndef __SSE4_1__.
> >  >
> >  > I think that user should explicitly enable sse4.1 manually in the
> >  > command line together with -maes/-mclmul to access these new
> >  > instructions. Including wmmintrin.h will error out when neither
> >  > -maes/-mclmul is enabled, and smmintrin.h will error out with a
> >  > message that SSE4.1 instruction set is not enabled.
> >
> >  There is a proposal to get rid of all *mmintrin.h.  Users should
> >  include one header file, something like <ia32intrin.h>, for all
> >  current and future intrinsics.  The name of the meta intrinsic
> >  header file hasn't been decided. Do we have any preferences/suggestions?
> >
> >  With that in mind, I will change it to include emmintrin.h.
> 
> OK. So, we require user to pass at least -msse2 in addition to
> -maes/-mclmul then? The errors will be informative, so there will be
> no confusion.
> 

Here is the updated patch. I don't think we should require adding
-msse2 explicitly for -maes/-mclmul since they won't work at all
without SSE2. However, if -mno-sse2 is used, the user will get an error:

bash-3.2$ cat x.c
#include <wmmintrin.h>
bash-3.2$ ./xgcc -B./ -m32 -c x.c 
In file included from ./include/wmmintrin.h:34,
                 from x.c:1:
./include/emmintrin.h:34:3: error: #error "SSE2 instruction set not
enabled"
In file included from x.c:1:
./include/wmmintrin.h:37:3: error: #error "AES/PCLMUL instructions not
enabled"
bash-3.2$ ./xgcc -B./ -m32 -c x.c -maes
bash-3.2$ ./xgcc -B./ -m32 -c x.c -maes -mno-sse2
In file included from ./include/wmmintrin.h:34,
                 from x.c:1:
./include/emmintrin.h:34:3: error: #error "SSE2 instruction set not
enabled"
In file included from x.c:1:
./include/wmmintrin.h:37:3: error: #error "AES/PCLMUL instructions not
enabled"
bash-3.2$ 
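
The three cases in the transcript follow from two pieces of the patch: the
SSE2 auto-enable in override_options and the definition of TARGET_AES as
(TARGET_SSE2 && x86_aes). A simplified model of that logic, assuming a
single hypothetical ISA_SSE2 bit standing in for OPTION_MASK_ISA_SSE2_SET
and made-up struct and function names:

```c
#include <stdbool.h>

#define ISA_SSE2 (1u << 0)      /* hypothetical stand-in bit */

struct isa_state
{
  unsigned flags;               /* models ix86_isa_flags */
  unsigned explicit_flags;      /* models ix86_isa_flags_explicit */
  bool aes;                     /* models x86_aes (-maes) */
  bool pclmul;                  /* models x86_pclmul (-mpclmul) */
};

/* Enable SSE2 if AES or PCLMUL is requested, unless the user already
   made an explicit choice about SSE2 (e.g. -mno-sse2).  */
static void
resolve_sse2 (struct isa_state *s)
{
  if ((s->aes || s->pclmul) && !(s->explicit_flags & ISA_SSE2))
    {
      s->flags |= ISA_SSE2;
      s->explicit_flags |= ISA_SSE2;
    }
}

/* Models TARGET_AES: AES is usable only when SSE2 is on as well.  */
static bool
target_aes (const struct isa_state *s)
{
  return (s->flags & ISA_SSE2) != 0 && s->aes;
}
```

With no options, neither SSE2 nor AES is on; with -maes alone, SSE2 is
switched on implicitly; with -maes -mno-sse2 the explicit bit blocks the
auto-enable, so target_aes () is false and both #error lines fire.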

I also changed clmul/CLMUL to pclmul/PCLMUL. OK to install?

Thanks.

H.J.
---
gcc/

2008-04-04  H.J. Lu  <hongjiu.lu@intel.com>

	* config.gcc (extra_headers): Add wmmintrin.h for x86 and x86-64.

	* config/i386/cpuid.h (bit_AES): New.
	(bit_PCLMUL): Likewise.

	* config/i386/i386.c (pta_flags): Add PTA_AES and PTA_PCLMUL.
	(override_options): Handle PTA_AES and PTA_PCLMUL.  Enable
	SSE2 if AES or PCLMUL is enabled.
	(ix86_builtins): Add IX86_BUILTIN_AESENC128,
	IX86_BUILTIN_AESENCLAST128, IX86_BUILTIN_AESDEC128,
	IX86_BUILTIN_AESDECLAST128, IX86_BUILTIN_AESIMC128,
	IX86_BUILTIN_AESKEYGENASSIST128 and IX86_BUILTIN_PCLMULQDQ128.
	(bdesc_sse_3arg): Add IX86_BUILTIN_PCLMULQDQ128.
	(bdesc_2arg): Add IX86_BUILTIN_AESENC128,
	IX86_BUILTIN_AESENCLAST128, IX86_BUILTIN_AESDEC128,
	IX86_BUILTIN_AESDECLAST128 and IX86_BUILTIN_AESKEYGENASSIST128.
	(bdesc_1arg): Add IX86_BUILTIN_AESIMC128.
	(ix86_init_mmx_sse_builtins): Define __builtin_ia32_aesenc128,
	__builtin_ia32_aesenclast128, __builtin_ia32_aesdec128,
	__builtin_ia32_aesdeclast128, __builtin_ia32_aesimc128,
	__builtin_ia32_aeskeygenassist128 and
	__builtin_ia32_pclmulqdq128.
	* config/i386/i386.c (ix86_expand_binop_imm_builtin): New.
	(ix86_expand_builtin): Use it for IX86_BUILTIN_PSLLDQI128 and
	IX86_BUILTIN_PSRLDQI128.  Handle IX86_BUILTIN_AESKEYGENASSIST128.

	* config/i386/i386.h (TARGET_AES): New.
	(TARGET_PCLMUL): Likewise.
	(TARGET_CPU_CPP_BUILTINS): Handle TARGET_AES and TARGET_PCLMUL.

	* config/i386/i386.md (UNSPEC_AESENC): New.
	(UNSPEC_AESENCLAST): Likewise.
	(UNSPEC_AESDEC): Likewise.
	(UNSPEC_AESDECLAST): Likewise.
	(UNSPEC_AESIMC): Likewise.
	(UNSPEC_AESKEYGENASSIST): Likewise.
	(UNSPEC_PCLMUL): Likewise.

	* config/i386/i386.opt (maes): New.
	(mpclmul): Likewise.

	* config/i386/sse.md (aesenc): New pattern.
	(aesenclast): Likewise.
	(aesdec): Likewise.
	(aesdeclast): Likewise.
	(aesimc): Likewise.
	(aeskeygenassist): Likewise.
	(pclmulqdq): Likewise.

	* config/i386/wmmintrin.h: New.

	* doc/extend.texi: Document AES and PCLMUL built-in functions.

	* doc/invoke.texi: Document -maes and -mpclmul.

gcc/testsuite/

2008-04-04  H.J. Lu  <hongjiu.lu@intel.com>

	* g++.dg/other/i386-2.C: Include <wmmintrin.h>.
	* g++.dg/other/i386-3.C: Likewise.
	* gcc.target/i386/sse-13.c: Likewise.
	* gcc.target/i386/sse-14.c: Likewise.

	* gcc.target/i386/aes-check.h: New.
	* gcc.target/i386/aesdec.c: Likewise.
	* gcc.target/i386/aesdeclast.c: Likewise.
	* gcc.target/i386/aesenc.c: Likewise.
	* gcc.target/i386/aesenclast.c: Likewise.
	* gcc.target/i386/aesimc.c: Likewise.
	* gcc.target/i386/aeskeygenassist.c: Likewise.
	* gcc.target/i386/pclmulqdq.c: Likewise.
	* gcc.target/i386/pclmul-check.h: Likewise.

	* gcc.target/i386/i386.exp (check_effective_target_aes): New.
	(check_effective_target_pclmul): Likewise.

--- gcc/config.gcc.aes	2008-04-03 22:14:41.000000000 -0700
+++ gcc/config.gcc	2008-04-03 22:14:44.000000000 -0700
@@ -309,13 +309,15 @@ i[34567]86-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	need_64bit_hwint=yes
 	;;
 ia64-*-*)
--- gcc/config/i386/cpuid.h.aes	2008-02-19 20:52:26.000000000 -0800
+++ gcc/config/i386/cpuid.h	2008-04-04 06:40:30.000000000 -0700
@@ -33,11 +33,13 @@
 
 /* %ecx */
 #define bit_SSE3	(1 << 0)
+#define bit_PCLMUL	(1 << 1)
 #define bit_SSSE3	(1 << 9)
 #define bit_CMPXCHG16B	(1 << 13)
 #define bit_SSE4_1	(1 << 19)
 #define bit_SSE4_2	(1 << 20)
 #define bit_POPCNT	(1 << 23)
+#define bit_AES		(1 << 25)
 
 /* %edx */
 #define bit_CMPXCHG8B	(1 << 8)
--- gcc/config/i386/i386.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/config/i386/i386.c	2008-04-04 06:34:20.000000000 -0700
@@ -2078,7 +2078,9 @@ override_options (void)
       PTA_NO_SAHF = 1 << 13,
       PTA_SSE4_1 = 1 << 14,
       PTA_SSE4_2 = 1 << 15,
-      PTA_SSE5 = 1 << 16
+      PTA_SSE5 = 1 << 16,
+      PTA_AES = 1 << 17,
+      PTA_PCLMUL = 1 << 18
     };
 
   static struct pta
@@ -2385,6 +2387,10 @@ override_options (void)
 	  x86_prefetch_sse = true;
 	if (!(TARGET_64BIT && (processor_alias_table[i].flags & PTA_NO_SAHF)))
 	  x86_sahf = true;
+	if (processor_alias_table[i].flags & PTA_AES)
+	  x86_aes = true;
+	if (processor_alias_table[i].flags & PTA_PCLMUL)
+	  x86_pclmul = true;
 
 	break;
       }
@@ -2428,6 +2434,14 @@ override_options (void)
   if (i == pta_size)
     error ("bad value (%s) for -mtune= switch", ix86_tune_string);
 
+  /* Enable SSE2 if AES or PCLMUL is enabled.  */
+  if ((x86_aes || x86_pclmul)
+      && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE2))
+    {
+      ix86_isa_flags |= OPTION_MASK_ISA_SSE2_SET;
+      ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE2_SET;
+    }
+
   ix86_tune_mask = 1u << ix86_tune;
   for (i = 0; i < X86_TUNE_LAST; ++i)
     ix86_tune_features[i] &= ix86_tune_mask;
@@ -17646,6 +17660,17 @@ enum ix86_builtins
 
   IX86_BUILTIN_PCMPGTQ,
 
+  /* AES instructions */
+  IX86_BUILTIN_AESENC128,
+  IX86_BUILTIN_AESENCLAST128,
+  IX86_BUILTIN_AESDEC128,
+  IX86_BUILTIN_AESDECLAST128,
+  IX86_BUILTIN_AESIMC128,
+  IX86_BUILTIN_AESKEYGENASSIST128,
+
+  /* PCLMUL instruction */
+  IX86_BUILTIN_PCLMULQDQ128,
+
   /* TFmode support builtins.  */
   IX86_BUILTIN_INFQ,
   IX86_BUILTIN_FABSQ,
@@ -18007,6 +18032,9 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, "__builtin_ia32_roundsd", IX86_BUILTIN_ROUNDSD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, "__builtin_ia32_roundss", IX86_BUILTIN_ROUNDSS, UNKNOWN, 0 },
+
+  /* PCLMUL */
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_pclmulqdq, 0, IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_2arg[] =
@@ -18287,6 +18315,13 @@ static const struct builtin_description 
 
   /* SSE4.2 */
   { OPTION_MASK_ISA_SSE4_2, CODE_FOR_sse4_2_gtv2di3, "__builtin_ia32_pcmpgtq", IX86_BUILTIN_PCMPGTQ, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesenc, 0, IX86_BUILTIN_AESENC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesenclast, 0, IX86_BUILTIN_AESENCLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesdec, 0, IX86_BUILTIN_AESDEC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesdeclast, 0, IX86_BUILTIN_AESDECLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -18364,6 +18399,9 @@ static const struct builtin_description 
   /* Fake 1 arg builtins with a constant smaller than 8 bits as the 2nd arg.  */
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundpd, 0, IX86_BUILTIN_ROUNDPD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundps, 0, IX86_BUILTIN_ROUNDPS, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN, 0 },
 };
 
 /* SSE5 */
@@ -19600,6 +19638,25 @@ ix86_init_mmx_sse_builtins (void)
 				    NULL_TREE);
   def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_crc32di", ftype, IX86_BUILTIN_CRC32DI);
 
+  /* AES */
+  if (TARGET_AES)
+    {
+      /* Define AES built-in functions only if AES is enabled.  */
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aesenc128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESENC128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aesenclast128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESENCLAST128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aesdec128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESDEC128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aesdeclast128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESDECLAST128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aesimc128", v2di_ftype_v2di, IX86_BUILTIN_AESIMC128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aeskeygenassist128", v2di_ftype_v2di_int, IX86_BUILTIN_AESKEYGENASSIST128);
+    }
+
+  /* PCLMUL */
+  if (TARGET_PCLMUL)
+    {
+      /* Define PCLMUL built-in function only if PCLMUL is enabled.  */
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_pclmulqdq128", v2di_ftype_v2di_v2di_int, IX86_BUILTIN_PCLMULQDQ128);
+    }
+
   /* AMDFAM10 SSE4A New built-ins  */
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntsd", void_ftype_pdouble_v2df, IX86_BUILTIN_MOVNTSD);
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntss", void_ftype_pfloat_v4sf, IX86_BUILTIN_MOVNTSS);
@@ -19880,6 +19937,44 @@ ix86_expand_crc32 (enum insn_code icode,
   return target;
 }
 
+/* Subroutine of ix86_expand_builtin to take care of binop insns
+   with an immediate.  */
+
+static rtx
+ix86_expand_binop_imm_builtin (enum insn_code icode, tree exp,
+				rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  enum machine_mode tmode = insn_data[icode].operand[0].mode;
+  enum machine_mode mode0 = insn_data[icode].operand[1].mode;
+  enum machine_mode mode1 = insn_data[icode].operand[2].mode;
+
+  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+    {
+      op0 = copy_to_reg (op0);
+      op0 = simplify_gen_subreg (mode0, op0, GET_MODE (op0), 0);
+    }
+
+  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+    {
+      error ("the last operand must be an immediate");
+      return const0_rtx;
+    }
+
+  target = gen_reg_rtx (V2DImode);
+  pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target,
+					      V2DImode, 0),
+			 op0, op1);
+  if (! pat)
+    return 0;
+  emit_insn (pat);
+  return target;
+}
+
 /* Subroutine of ix86_expand_builtin to take care of binop insns.  */
 
 static rtx
@@ -20972,34 +21067,18 @@ ix86_expand_builtin (tree exp, rtx targe
       return target;
 
     case IX86_BUILTIN_PSLLDQI128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_ashlti3,
+					     exp, target);
+      break;
+
     case IX86_BUILTIN_PSRLDQI128:
-      icode = (fcode == IX86_BUILTIN_PSLLDQI128 ? CODE_FOR_sse2_ashlti3
-	       : CODE_FOR_sse2_lshrti3);
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      op0 = expand_normal (arg0);
-      op1 = expand_normal (arg1);
-      tmode = insn_data[icode].operand[0].mode;
-      mode1 = insn_data[icode].operand[1].mode;
-      mode2 = insn_data[icode].operand[2].mode;
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_lshrti3,
+					     exp, target);
+      break;
 
-      if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
-	{
-	  op0 = copy_to_reg (op0);
-	  op0 = simplify_gen_subreg (mode1, op0, GET_MODE (op0), 0);
-	}
-      if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
-	{
-	  error ("shift must be an immediate");
-	  return const0_rtx;
-	}
-      target = gen_reg_rtx (V2DImode);
-      pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target, V2DImode, 0),
-			     op0, op1);
-      if (! pat)
-	return 0;
-      emit_insn (pat);
-      return target;
+    case IX86_BUILTIN_AESKEYGENASSIST128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_aeskeygenassist,
+					     exp, target);
 
     case IX86_BUILTIN_FEMMS:
       emit_insn (gen_mmx_femms ());
--- gcc/config/i386/i386.h.aes	2008-04-03 22:14:41.000000000 -0700
+++ gcc/config/i386/i386.h	2008-04-04 06:31:48.000000000 -0700
@@ -395,6 +395,8 @@ extern int x86_prefetch_sse;
 #define TARGET_SAHF		x86_sahf
 #define TARGET_RECIP		x86_recip
 #define TARGET_FUSED_MADD	x86_fused_muladd
+#define TARGET_AES		(TARGET_SSE2 && x86_aes)
+#define TARGET_PCLMUL		(TARGET_SSE2 && x86_pclmul)
 
 #define ASSEMBLER_DIALECT	(ix86_asm_dialect)
 
@@ -683,6 +685,10 @@ extern const char *host_detect_local_cpu
 	builtin_define ("__SSE4_1__");				\
       if (TARGET_SSE4_2)					\
 	builtin_define ("__SSE4_2__");				\
+      if (TARGET_AES)						\
+	builtin_define ("__AES__");				\
+      if (TARGET_PCLMUL)					\
+	builtin_define ("__PCLMUL__");				\
       if (TARGET_SSE4A)						\
  	builtin_define ("__SSE4A__");		                \
       if (TARGET_SSE5)						\
--- gcc/config/i386/i386.md.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/config/i386/i386.md	2008-04-04 06:35:25.000000000 -0700
@@ -186,6 +186,17 @@
    (UNSPEC_FRCZ			156)
    (UNSPEC_CVTPH2PS		157)
    (UNSPEC_CVTPS2PH		158)
+
+   ; For AES support
+   (UNSPEC_AESENC		159)
+   (UNSPEC_AESENCLAST		160)
+   (UNSPEC_AESDEC		161)
+   (UNSPEC_AESDECLAST		162)
+   (UNSPEC_AESIMC		163)
+   (UNSPEC_AESKEYGENASSIST	164)
+
+   ; For PCLMUL support
+   (UNSPEC_PCLMUL		165)
   ])
 
 (define_constants
--- gcc/config/i386/i386.opt.aes	2007-09-13 06:25:13.000000000 -0700
+++ gcc/config/i386/i386.opt	2008-04-04 06:32:30.000000000 -0700
@@ -275,3 +275,11 @@ Target Report Var(x86_fused_muladd) Init
 Enable automatic generation of fused floating point multiply-add instructions
 if the ISA supports such instructions.  The -mfused-madd option is on by
 default.
+
+maes
+Target Report RejectNegative Var(x86_aes)
+Support AES built-in functions and code generation
+
+mpclmul
+Target Report RejectNegative Var(x86_pclmul)
+Support PCLMUL built-in functions and code generation
--- gcc/config/i386/sse.md.aes	2008-04-02 20:36:24.000000000 -0700
+++ gcc/config/i386/sse.md	2008-04-04 06:35:17.000000000 -0700
@@ -7897,3 +7897,80 @@
 }
   [(set_attr "type" "ssecmp")
    (set_attr "mode" "TI")])
+
+(define_insn "aesenc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENC))]
+  "TARGET_AES"
+  "aesenc\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesenclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENCLAST))]
+  "TARGET_AES"
+  "aesenclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdec"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDEC))]
+  "TARGET_AES"
+  "aesdec\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdeclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDECLAST))]
+  "TARGET_AES"
+  "aesdeclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesimc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESIMC))]
+  "TARGET_AES"
+  "aesimc\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aeskeygenassist"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")
+		      (match_operand:SI 2 "const_0_to_255_operand" "n")]
+		     UNSPEC_AESKEYGENASSIST))]
+  "TARGET_AES"
+  "aeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "pclmulqdq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		      (match_operand:V2DI 2 "nonimmediate_operand" "xm")
+		      (match_operand:SI 3 "const_0_to_255_operand" "n")]
+		     UNSPEC_PCLMUL))]
+  "TARGET_PCLMUL"
+  "pclmulqdq\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
--- gcc/config/i386/wmmintrin.h.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/config/i386/wmmintrin.h	2008-04-04 07:05:03.000000000 -0700
@@ -0,0 +1,123 @@
+/* Copyright (C) 2008 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to
+   the Free Software Foundation, 59 Temple Place - Suite 330,
+   Boston, MA 02111-1307, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 10.1.  */
+
+#ifndef _WMMINTRIN_H_INCLUDED
+#define _WMMINTRIN_H_INCLUDED
+
+/* We need definitions from the SSE2 header file.  */
+#include <emmintrin.h>
+
+#if !defined (__AES__) && !defined (__PCLMUL__)
+# error "AES/PCLMUL instructions not enabled"
+#else
+
+/* AES */
+
+#ifdef __AES__
+/* Performs 1 round of AES decryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdec_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES decryption of the first m128i 
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdeclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X,
+						 (__v2di)__Y);
+}
+
+/* Performs 1 round of AES encryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenc_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES encryption of the first m128i
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the InverseMixColumn operation on the source m128i 
+   and stores the result into m128i destination.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesimc_si128 (__m128i __X)
+{
+  return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+}
+
+/* Generates a m128i round key for the input m128i AES cipher key and
+   byte round constant.  The second parameter must be a compile time
+   constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aeskeygenassist_si128 (__m128i __X, const int __C)
+{
+  return (__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)__X, __C);
+}
+#else
+#define _mm_aeskeygenassist_si128(X, C)					\
+  ((__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)(__m128i)(X),	\
+						(int)(C)))
+#endif
+#endif  /* __AES__ */
+
+/* PCLMUL */
+
+#ifdef __PCLMUL__
+/* Performs carry-less integer multiplication of 64-bit halves of
+   128-bit input operands.  The third parameter indicates which 64-bit
+   halves of the input parameters v1 and v2 should be used.  It must be
+   a compile time constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
+{
+  return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+						(__v2di)__Y, __I);
+}
+#else
+#define _mm_clmulepi64_si128(X, Y, I)					\
+  ((__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)(__m128i)(X),		\
+					  (__v2di)(__m128i)(Y), (int)(I)))
+#endif
+#endif  /* __PCLMUL__  */
+
+#endif /* __AES__/__PCLMUL__ */
+
+#endif /* _WMMINTRIN_H_INCLUDED */
--- gcc/doc/extend.texi.aes	2008-03-25 11:40:12.000000000 -0700
+++ gcc/doc/extend.texi	2008-04-04 06:40:00.000000000 -0700
@@ -8013,6 +8013,27 @@ depending on the size of @code{unsigned 
 Generates the @code{popcntq} machine instruction.
 @end table
 
+The following built-in functions are available when @option{-maes} is
+used.  All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v2di __builtin_ia32_aesenc128 (v2di, v2di)
+v2di __builtin_ia32_aesenclast128 (v2di, v2di)
+v2di __builtin_ia32_aesdec128 (v2di, v2di)
+v2di __builtin_ia32_aesdeclast128 (v2di, v2di)
+v2di __builtin_ia32_aeskeygenassist128 (v2di, const int)
+v2di __builtin_ia32_aesimc128 (v2di)
+@end smallexample
+
+The following built-in function is available when @option{-mpclmul} is
+used.
+
+@table @code
+@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)
+Generates the @code{pclmulqdq} machine instruction.
+@end table
+
 The following built-in functions are available when @option{-msse4a} is used.
 All of them generate the machine instruction that is part of the name.
 
--- gcc/doc/invoke.texi.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/doc/invoke.texi	2008-04-04 06:45:47.000000000 -0700
@@ -555,6 +555,7 @@ Objective-C and Objective-C++ Dialects}.
 -mno-wide-multiply  -mrtd  -malign-double @gol
 -mpreferred-stack-boundary=@var{num} -mcx16 -msahf -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
+-maes -mpclmul @gol
 -msse4a -m3dnow -mpopcnt -mabm -msse5 @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -10720,6 +10721,10 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-sse4.2
 @itemx -msse4
 @itemx -mno-sse4
+@itemx -maes
+@itemx -mno-aes
+@itemx -mpclmul
+@itemx -mno-pclmul
 @itemx -msse4a
 @itemx -mno-sse4a
 @itemx -msse5
@@ -10737,8 +10742,8 @@ preferred alignment to @option{-mpreferr
 @opindex m3dnow
 @opindex mno-3dnow
 These switches enable or disable the use of instructions in the MMX,
-SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4A, SSE5, ABM or 3DNow!@: extended
-instruction sets.
+SSE, SSE2, SSE3, SSSE3, SSE4.1, AES, PCLMUL, SSE4A, SSE5, ABM or
+3DNow!@: extended instruction sets.
 These extensions are also available as built-in functions: see
 @ref{X86 Built-in Functions}, for details of the functions enabled and
 disabled by these switches.
--- gcc/testsuite/g++.dg/other/i386-2.C.aes	2007-12-15 15:49:16.000000000 -0800
+++ gcc/testsuite/g++.dg/other/i386-2.C	2008-04-04 06:37:21.000000000 -0700
@@ -1,8 +1,9 @@
-/* Test that {,x,e,p,t,s,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
+/* Test that {,x,e,p,t,s,w,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
    usable with -O -pedantic-errors.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -msse4 -msse5 -maes -mpclmul" } */
 
+#include <wmmintrin.h>
 #include <bmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
--- gcc/testsuite/g++.dg/other/i386-3.C.aes	2008-03-13 09:04:41.000000000 -0700
+++ gcc/testsuite/g++.dg/other/i386-3.C	2008-04-04 06:37:56.000000000 -0700
@@ -1,8 +1,9 @@
-/* Test that {,x,e,p,t,s,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
+/* Test that {,x,e,p,t,s,w,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
    usable with -O -fkeep-inline-functions.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -maes -mpclmul -msse4 -msse5" } */
 
+#include <wmmintrin.h>
 #include <bmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
--- gcc/testsuite/gcc.target/i386/aes-check.h.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aes-check.h	2008-04-03 22:14:44.000000000 -0700
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void aes_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AES test only if host has AES support.  */
+  if (ecx & bit_AES)
+    {
+      aes_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/aesdec.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesdec.c	2008-04-03 22:14:44.000000000 -0700
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i]  = _mm_setr_epi32 (0xb730392a, 0xb58eb95e,
+			      0xfaea2787, 0x138ac342);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdec_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdec_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdec_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdec_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdec_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdec_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdec_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdec_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdec_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdec_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdec_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdec_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdec_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdec_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdec_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdec_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesdeclast.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesdeclast.c	2008-04-03 22:14:44.000000000 -0700
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set of
+   input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x72a593d0, 0xd410637b,
+			     0x6b317f95, 0xc5a391ef);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdeclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdeclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdeclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdeclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdeclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdeclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdeclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdeclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdeclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdeclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdeclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdeclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdeclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdeclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdeclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdeclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesenc.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesenc.c	2008-04-03 22:14:44.000000000 -0700
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0xded7e595, 0x8b104b58,
+			     0x9fdba3c5, 0xa8311c2f);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenc_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenc_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenc_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenc_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenc_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenc_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenc_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenc_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenc_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenc_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenc_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenc_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenc_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenc_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenc_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenc_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesenclast.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesenclast.c	2008-04-03 22:14:44.000000000 -0700
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one
+   set of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x53fdc611, 0x177ec425,
+			     0x938c5964, 0xc7fb881e);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesimc.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesimc.c	2008-04-03 22:14:44.000000000 -0700
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).   */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      d[i] = _mm_setr_epi32 (0x81c3b3e5, 0x2b18330a,
+			     0x44b109c8, 0x627a6f66);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesimc_si128 (src1[i]);
+      resdst[i + 1] = _mm_aesimc_si128 (src1[i + 1]);
+      resdst[i + 2] = _mm_aesimc_si128 (src1[i + 2]);
+      resdst[i + 3] = _mm_aesimc_si128 (src1[i + 3]);
+      resdst[i + 4] = _mm_aesimc_si128 (src1[i + 4]);
+      resdst[i + 5] = _mm_aesimc_si128 (src1[i + 5]);
+      resdst[i + 6] = _mm_aesimc_si128 (src1[i + 6]);
+      resdst[i + 7] = _mm_aesimc_si128 (src1[i + 7]);
+      resdst[i + 8] = _mm_aesimc_si128 (src1[i + 8]);
+      resdst[i + 9] = _mm_aesimc_si128 (src1[i + 9]);
+      resdst[i + 10] = _mm_aesimc_si128 (src1[i + 10]);
+      resdst[i + 11] = _mm_aesimc_si128 (src1[i + 11]);
+      resdst[i + 12] = _mm_aesimc_si128 (src1[i + 12]);
+      resdst[i + 13] = _mm_aesimc_si128 (src1[i + 13]);
+      resdst[i + 14] = _mm_aesimc_si128 (src1[i + 14]);
+      resdst[i + 15] = _mm_aesimc_si128 (src1[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aeskeygenassist.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aeskeygenassist.c	2008-04-03 22:14:44.000000000 -0700
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+#define IMM8 1
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x16157e2b, 0xa6d2ae28,
+			      0x8815f7ab, 0x3c4fcf09);
+      d[i] = _mm_setr_epi32 (0x24b5e434, 0x3424b5e5,
+			     0xeb848a01, 0x01eb848b);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i]  = _mm_aeskeygenassist_si128 (src1[i], IMM8);
+      resdst[i + 1] = _mm_aeskeygenassist_si128 (src1[i + 1], IMM8);
+      resdst[i + 2] = _mm_aeskeygenassist_si128 (src1[i + 2], IMM8);
+      resdst[i + 3] = _mm_aeskeygenassist_si128 (src1[i + 3], IMM8);
+      resdst[i + 4] = _mm_aeskeygenassist_si128 (src1[i + 4], IMM8);
+      resdst[i + 5] = _mm_aeskeygenassist_si128 (src1[i + 5], IMM8);
+      resdst[i + 6] = _mm_aeskeygenassist_si128 (src1[i + 6], IMM8);
+      resdst[i + 7] = _mm_aeskeygenassist_si128 (src1[i + 7], IMM8);
+      resdst[i + 8] = _mm_aeskeygenassist_si128 (src1[i + 8], IMM8);
+      resdst[i + 9] = _mm_aeskeygenassist_si128 (src1[i + 9], IMM8);
+      resdst[i + 10] = _mm_aeskeygenassist_si128 (src1[i + 10], IMM8);
+      resdst[i + 11] = _mm_aeskeygenassist_si128 (src1[i + 11], IMM8);
+      resdst[i + 12] = _mm_aeskeygenassist_si128 (src1[i + 12], IMM8);
+      resdst[i + 13] = _mm_aeskeygenassist_si128 (src1[i + 13], IMM8);
+      resdst[i + 14] = _mm_aeskeygenassist_si128 (src1[i + 14], IMM8);
+      resdst[i + 15] = _mm_aeskeygenassist_si128 (src1[i + 15], IMM8);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/i386.exp.aes	2008-01-05 10:52:44.000000000 -0800
+++ gcc/testsuite/gcc.target/i386/i386.exp	2008-04-04 06:48:42.000000000 -0700
@@ -51,6 +51,34 @@ proc check_effective_target_sse4 { } {
     } "-O2 -msse4.1" ]
 }
 
+# Return 1 if aes instructions can be compiled.
+proc check_effective_target_aes { } {
+    return [check_no_compiler_messages aes object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i _mm_aesimc_si128 (__m128i __X)
+	{
+	    return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+	}
+    } "-O2 -maes" ]
+}
+
+# Return 1 if pclmul instructions can be compiled.
+proc check_effective_target_pclmul { } {
+    return [check_no_compiler_messages pclmul object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i pclmulqdq_test (__m128i __X, __m128i __Y)
+	{
+	    return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+							  (__v2di)__Y,
+							  1);
+	}
+    } "-O2 -mpclmul" ]
+}
+
 # Return 1 if sse4a instructions can be compiled.
 proc check_effective_target_sse4a { } {
     return [check_no_compiler_messages sse4a object {
--- gcc/testsuite/gcc.target/i386/pclmul-check.h.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/pclmul-check.h	2008-04-04 06:54:12.000000000 -0700
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void pclmul_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run PCLMULQDQ test only if host has PCLMULQDQ support.  */
+  if (ecx & bit_PCLMUL)
+    {
+      pclmul_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/pclmulqdq.c.aes	2008-04-03 22:14:44.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/pclmulqdq.c	2008-04-04 06:48:56.000000000 -0700
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+/* { dg-require-effective-target pclmul } */
+/* { dg-options "-O2 -mpclmul" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "pclmul-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i s1[NUM];
+static __m128i s2[NUM];
+/* We need this array to generate the memory form of the instruction.  */
+static __m128i s2m[NUM];
+
+static __m128i e_00[NUM];
+static __m128i e_01[NUM];
+static __m128i e_10[NUM];
+static __m128i e_11[NUM];
+
+static __m128i d_00[NUM];
+static __m128i d_01[NUM];
+static __m128i d_10[NUM];
+static __m128i d_11[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *ls1, __m128i *ls2, __m128i *le_00, __m128i *le_01,
+	   __m128i *le_10, __m128i *le_11)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      ls1[i] = _mm_set_epi32 (0x7B5B5465, 0x73745665,
+			      0x63746F72, 0x5D53475D);
+      ls2[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      s2m[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      le_00[i] = _mm_set_epi32 (0x1D4D84C8, 0x5C3440C0,
+				0x929633D5, 0xD36F0451);
+      le_01[i] = _mm_set_epi32 (0x1A2BF6DB, 0x3A30862F,
+				0xBABF262D, 0xF4B7D5C9);
+      le_10[i] = _mm_set_epi32 (0x1BD17C8D, 0x556AB5A1,
+				0x7FA540AC, 0x2A281315);
+      le_11[i] = _mm_set_epi32 (0x1D1E1F2C, 0x592E7C45,
+				0xD66EE03E, 0x410FD4ED);
+    }
+}
+
+static void
+pclmul_test (void)
+{
+  int i;
+
+  init_data (s1, s2, e_00, e_01, e_10, e_11);
+
+  for (i = 0; i < NUM; i += 2)
+    {
+      d_00[i] = _mm_clmulepi64_si128 (s1[i], s2m[i], 0x00);
+      d_01[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x01);
+      d_10[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x10);
+      d_11[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x11);
+
+      d_11[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x11);
+      d_00[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x00);
+      d_10[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2m[i + 1], 0x10);
+      d_01[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x01);
+    }
+
+  for (i = 0; i < NUM; i++)
+    {
+      if (memcmp (d_00 + i, e_00 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_01 + i, e_01 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_10 + i, e_10 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_11 + i, e_11 + i, sizeof (__m128i)))
+	abort ();
+    }
+}
--- gcc/testsuite/gcc.target/i386/sse-13.c.aes	2008-03-26 06:32:52.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/sse-13.c	2008-04-04 06:38:43.000000000 -0700
@@ -1,10 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5 -maes -mpclmul" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile with optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,a,b}mmintrin.h and mm3dnow.h
+   defined as inline functions in {,x,e,p,t,s,w,a,b}mmintrin.h and mm3dnow.h
    that reference the proper builtin functions.  Defining away "extern" and
    "__inline" results in all of them being compiled as proper functions.  */
 
@@ -17,6 +17,10 @@
 #define __builtin_ia32_extrqi(X, I, L)  __builtin_ia32_extrqi(X, 1, 1)
 #define __builtin_ia32_insertqi(X, Y, I, L) __builtin_ia32_insertqi(X, Y, 1, 1)
 
+/* wmmintrin.h */
+#define __builtin_ia32_aeskeygenassist128(X, C) __builtin_ia32_aeskeygenassist128(X, 1)
+#define __builtin_ia32_pclmulqdq128(X, Y, I) __builtin_ia32_pclmulqdq128(X, Y, 1)
+
 /* smmintrin.h */
 #define __builtin_ia32_pblendw128(X, Y, M) __builtin_ia32_pblendw128 (X, Y, 1)
 #define __builtin_ia32_blendps(X, Y, M) __builtin_ia32_blendps(X, Y, 1)
@@ -94,6 +98,7 @@
 #define __builtin_ia32_protdi(A, B) __builtin_ia32_protdi(A,1)
 #define __builtin_ia32_protqi(A, B) __builtin_ia32_protqi(A,1)
 
+#include <wmmintrin.h>
 #include <bmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
--- gcc/testsuite/gcc.target/i386/sse-14.c.aes	2008-03-26 06:32:52.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/sse-14.c	2008-04-04 06:38:31.000000000 -0700
@@ -1,16 +1,17 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5 -maes -mpclmul" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile without optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,a,b}mmintrin.h  and mm3dnow.h
+   defined as inline functions in {,x,e,p,t,s,w,a,b}mmintrin.h  and mm3dnow.h
    that reference the proper builtin functions.  Defining away "extern" and
    "__inline" results in all of them being compiled as proper functions.  */
 
 #define extern
 #define __inline
 
+#include <wmmintrin.h>
 #include <bmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
@@ -46,6 +47,10 @@
 test_1x (_mm_extracti_si64, __m128i, __m128i, 1, 1)
 test_2x (_mm_inserti_si64, __m128i, __m128i, __m128i, 1, 1)
 
+/* wmmintrin.h */
+test_1 (_mm_aeskeygenassist_si128, __m128i, __m128i, 1)
+test_2 (_mm_clmulepi64_si128, __m128i, __m128i, __m128i, 1)
+
 /* smmintrin.h */
 test_2 (_mm_blend_epi16, __m128i, __m128i, __m128i, 1)
 test_2 (_mm_blend_ps, __m128, __m128, __m128, 1)
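
For readers checking the expected values in pclmulqdq.c, a portable
software model of the selector argument may be useful.  This is only a
sketch and not part of the patch: struct clmul128, clmul64 and
clmul_select are hypothetical names, and the model assumes the usual
PCLMULQDQ convention that bit 0 of the immediate picks the half of the
first operand and bit 4 picks the half of the second.

```c
#include <stdint.h>

/* 128-bit carry-less product, split into two 64-bit halves.
   (Hypothetical type, used only for this illustration.)  */
struct clmul128
{
  uint64_t lo;
  uint64_t hi;
};

/* Carry-less multiplication of two 64-bit values: like schoolbook
   multiplication, but partial products are combined with XOR instead
   of addition, so no carries propagate.  */
static struct clmul128
clmul64 (uint64_t a, uint64_t b)
{
  struct clmul128 r = { 0, 0 };
  int i;

  for (i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
        r.lo ^= a << i;
        /* Shifting by 64 is undefined in C, so skip i == 0.  */
        if (i != 0)
          r.hi ^= a >> (64 - i);
      }

  return r;
}

/* Model of the immediate: bit 0 selects the low (0) or high (1)
   half of X, bit 4 selects the low (0) or high (1) half of Y.  */
static struct clmul128
clmul_select (uint64_t x_lo, uint64_t x_hi,
              uint64_t y_lo, uint64_t y_hi, int imm8)
{
  uint64_t a = (imm8 & 0x01) ? x_hi : x_lo;
  uint64_t b = (imm8 & 0x10) ? y_hi : y_lo;

  return clmul64 (a, b);
}
```

With the inputs from init_data split into 64-bit halves (s1 low
0x63746F725D53475D, s1 high 0x7B5B546573745665, s2 low
0x5B477565726F6E5D, s2 high 0x4869285368617929), selector 0x00 in this
model should reproduce the e_00 vector used by the test.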

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 13:56           ` Uros Bizjak
@ 2008-04-04 14:08             ` Uros Bizjak
  2008-04-04 14:51             ` H.J. Lu
  1 sibling, 0 replies; 23+ messages in thread
From: Uros Bizjak @ 2008-04-04 14:08 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

On Fri, Apr 4, 2008 at 3:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>  > >  SSE doesn't support V2DI. You need at least SSE2.
>  > >
>  >
>  > OK.

>  > This practically forces usage of -msse4.1 for -maes/-mclmul, since
>  > these instructions are used only through intrinsics, defined in their
>  > corresponding header file. wmmintrin.h includes smmintrin.h that will
>  > error out by #ifndef __SSE4_1__.
>  >
>  > I think that user should explicitly enable sse4.1 manually in the
>  > command line together with -maes/-mclmul to access these new
>  > instructions. Including wmmintrin.h will error out when neither
>  > -maes/-mclmul is enabled, and smmintrin.h will error out with a
>  > message that SSE4.1 instruction set is not enabled.
>
>  There is a proposal to get rid of all *mmintrin.h.  Users should
>  include one header file, something like <ia32intrin.h>, for all
>  current and future intrinsics.  The name of the meta intrinsic
>  header file hasn't been decided. Do we have any preferences/suggestions?
>
>  With that in mind, I will change it to include emmintrin.h.

OK. So, we require the user to pass at least -msse2 in addition to
-maes/-mclmul then? The errors will be informative, so there will be
no confusion.
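
To make the header guards under discussion concrete, here is a rough
sketch of which predefined macros the headers look at.  It assumes
that emmintrin.h rejects compilation without __SSE2__ and that
wmmintrin.h rejects it when neither __AES__ nor __PCLMUL__ is defined;
whether -maes turns on SSE2 by itself is exactly the open question
above, so the sketch does not assume it.  The enum and both function
names are made up for illustration.

```c
/* Hypothetical flags recording which ISA macros the compiler
   predefined for this translation unit.  */
enum
{
  HAVE_SSE2 = 1,
  HAVE_AES = 2,
  HAVE_PCLMUL = 4
};

/* Collect the predefined macros into a bit mask.  */
static int
enabled_isa_macros (void)
{
  int mask = 0;

#ifdef __SSE2__
  mask |= HAVE_SSE2;
#endif
#ifdef __AES__
  mask |= HAVE_AES;
#endif
#ifdef __PCLMUL__
  mask |= HAVE_PCLMUL;
#endif

  return mask;
}

/* Under the assumptions above, wmmintrin.h is includable only when
   SSE2 is on and at least one of AES/PCLMUL is on.  */
static int
wmmintrin_usable (int mask)
{
  return (mask & HAVE_SSE2) && (mask & (HAVE_AES | HAVE_PCLMUL));
}
```

Calling wmmintrin_usable (enabled_isa_macros ()) in a build without
-msse2 would then report 0 even if -maes was given, which corresponds
to the case where the emmintrin.h error fires.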

>  > "instruction set" or "instructions" in the error message?
>
>  I can use instructions.

This is up to your taste, "instruction set" just sounds too fancy to
me for a pack of instructions ;)

Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 13:32         ` H.J. Lu
@ 2008-04-04 13:56           ` Uros Bizjak
  2008-04-04 14:08             ` Uros Bizjak
  2008-04-04 14:51             ` H.J. Lu
  2008-04-04 20:27           ` Michael Meissner
  1 sibling, 2 replies; 23+ messages in thread
From: Uros Bizjak @ 2008-04-04 13:56 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

On Fri, Apr 4, 2008 at 3:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>  > >  SSE doesn't support V2DI. You need at least SSE2.
>  > >
>  >
>  > OK.

>  > This practically forces usage of -msse4.1 for -maes/-mclmul, since
>  > these instructions are used only through intrinsics, defined in their
>  > corresponding header file. wmmintrin.h includes smmintrin.h that will
>  > error out by #ifndef __SSE4_1__.
>  >
>  > I think that user should explicitly enable sse4.1 manually in the
>  > command line together with -maes/-mclmul to access these new
>  > instructions. Including wmmintrin.h will error out when neither
>  > -maes/-mclmul is enabled, and smmintrin.h will error out with a
>  > message that SSE4.1 instruction set is not enabled.
>
>  There is a proposal to get rid of all *mmintrin.h.  Users should
>  include one header file, something like <ia32intrin.h>, for all
>  current and future intrinsics.  The name of the meta intrinsic
>  header file hasn't been decided. Do we have any preferences/suggestions?
>
>  With that in mind, I will change it to include emmintrin.h.

OK. So, we require the user to pass at least -msse2 in addition to
-maes/-mclmul then? The errors will be informative, so there will be
no confusion.

>  > "instruction set" or "instructions" in the error message?
>
>  I can use instructions.

This is up to your taste, "instruction set" just sounds too fancy to
me for a pack of instructions ;)

Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 13:29       ` Uros Bizjak
@ 2008-04-04 13:32         ` H.J. Lu
  2008-04-04 13:56           ` Uros Bizjak
  2008-04-04 20:27           ` Michael Meissner
  0 siblings, 2 replies; 23+ messages in thread
From: H.J. Lu @ 2008-04-04 13:32 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches

On Fri, Apr 04, 2008 at 03:16:06PM +0200, Uros Bizjak wrote:
> On Fri, Apr 4, 2008 at 2:48 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Fri, Apr 04, 2008 at 08:30:40AM +0200, Uros Bizjak wrote:
> >  > On Thu, Apr 3, 2008 at 11:51 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> >  >
> >  > >  +  /* Enable SSE 4.2 if AES or CLMUL is enabled.  */
> >  > >  +  if ((x86_aes || x86_clmul)
> >  > >  +      && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4_2))
> >  > >  +    {
> >  > >  +      ix86_isa_flags |= OPTION_MASK_ISA_SSE4_2_SET;
> >  > >  +      ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_2_SET;
> >  > >  +    }
> >  > >  +
> >  >
> >  > Do we really want to enable all SSE builtins if only AES functionality
> >  > is required? I think that -msse -maes is more appropriate, since -msse
> >  > instructs the compiler to enable support for SSE registers and -maes
> >  > enables special AES instructions that depend on SSE registers.
> >  > Unfortunately, we don't have separate option to enable only SSE
> >  > registers without SSE builtins, but IMO we can live with that. Similar
> >  > situation is with -mclmul. At minimum, -msse -mclmul should be
> >  > required, I see no reason to enable everything up to sse4_2 for this
> >  > insn.
> >
> >  SSE doesn't support V2DI. You need at least SSE2.
> >
> 
> OK.
> 
> >
> >  >
> >  > >  +/* We need definitions from the SSE4, SSSE3, SSE3, SSE2 and SSE header
> >  > >  +   files.  */
> >  > >  +#include <smmintrin.h>
> >  > >  +
> >  >
> >  > Actually we only need __v2di and __m128i typedefs here. Perhaps we can
> >  > copy these two definitions here, or we can define
> >  >
> >  > typedef long long __m128a __attribute__ ((vector_size (16), __may_alias__));
> >  >
> >  > and rewrite these new intrinsics to use this type?
> >
> >  <wmmintrin.h> from icc 10.1 includes SSE4 intrinsic header file. If
> >  <wmmintrin.h> from gcc doesn't support SSE4 intrinsics, it will be
> >  incompatible with icc. I can imagine other companies may implement
> >  AES/CLMUL. I think it should be handled similar to SSE4 intrinsics.
> >  That is we put AES/CLMUL intrinsics in a common intrinsic header file
> >  and 2 different options, one of them is -maes/-mclmul, will enable
> >  them.  -maes/-mclmul will still imply SSE4.
> 
> This practically forces usage of -msse4.1 for -maes/-mclmul, since
> these instructions are used only through intrinsics, defined in their
> corresponding header file. wmmintrin.h includes smmintrin.h that will
> error out by #ifndef __SSE4_1__.
> 
> I think that user should explicitly enable sse4.1 manually in the
> command line together with -maes/-mclmul to access these new
> instructions. Including wmmintrin.h will error out when neither
> -maes/-mclmul is enabled, and smmintrin.h will error out with a
> message that SSE4.1 instruction set is not enabled.

There is a proposal to get rid of all *mmintrin.h.  Users should
include one header file, something like <ia32intrin.h>, for all
current and future intrinsics.  The name of the meta intrinsic
header file hasn't been decided. Do we have any preferences/suggestions?

With that in mind, I will change it to include emmintrin.h.

> 
> Does icc automatically enable all SSE intrinsics for -maes/-mclmul?

Icc has a different switch to enable SSE 4 and AES. -maes/-mclmul
is for gcc emulation.

> 
> >  I can change
> >
> >
> >  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN , 0 },
> >
> >  to
> >
> >  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN , 0 },
> >
> 
> Since there is no other way to generate these insns than through
> intrinsics (__builtin_X should not be used by user as this is not
> considered stable interface), we can put OPTION_MASK_ISA_SSE4_1 here.
> At least SSE4_1 should be active due to constraint forced by
> wmmintrin.h/smmintrin.h
> 
> > +#if !defined (__AES__) && !defined (__CLMUL__)
> > +# error "AES/CLMUL instruction set not enabled"
> > +#else
> 
> "instruction set" or "instructions" in the error message?

I can use instructions.

H.J.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04 12:53     ` H.J. Lu
@ 2008-04-04 13:29       ` Uros Bizjak
  2008-04-04 13:32         ` H.J. Lu
  0 siblings, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2008-04-04 13:29 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

On Fri, Apr 4, 2008 at 2:48 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Apr 04, 2008 at 08:30:40AM +0200, Uros Bizjak wrote:
>  > On Thu, Apr 3, 2008 at 11:51 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>  >
>  > >  +  /* Enable SSE 4.2 if AES or CLMUL is enabled.  */
>  > >  +  if ((x86_aes || x86_clmul)
>  > >  +      && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4_2))
>  > >  +    {
>  > >  +      ix86_isa_flags |= OPTION_MASK_ISA_SSE4_2_SET;
>  > >  +      ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_2_SET;
>  > >  +    }
>  > >  +
>  >
>  > Do we really want to enable all SSE builtins if only AES functionality
>  > is required? I think that -msse -maes is more appropriate, since -msse
>  > instructs the compiler to enable support for SSE registers and -maes
>  > enables special AES instructions that depend on SSE registers.
>  > Unfortunately, we don't have separate option to enable only SSE
>  > registers without SSE builtins, but IMO we can live with that. Similar
>  > situation is with -mclmul. At minimum, -msse -mclmul should be
>  > required, I see no reason to enable everything up to sse4_2 for this
>  > insn.
>
>  SSE doesn't support V2DI. You need at least SSE2.
>

OK.

>
>  >
>  > >  +/* We need definitions from the SSE4, SSSE3, SSE3, SSE2 and SSE header
>  > >  +   files.  */
>  > >  +#include <smmintrin.h>
>  > >  +
>  >
>  > Actually we only need __v2di and __m128i typedefs here. Perhaps we can
>  > copy these two definitions here, or we can define
>  >
>  > typedef long long __m128a __attribute__ ((vector_size (16), __may_alias__));
>  >
>  > and rewrite these new intrinsics to use this type?
>
>  <wmmintrin.h> from icc 10.1 includes SSE4 intrinsic header file. If
>  <wmmintrin.h> from gcc doesn't support SSE4 intrinsics, it will be
>  incompatible with icc. I can imagine other companies may implement
>  AES/CLMUL. I think it should be handled similar to SSE4 intrinsics.
>  That is we put AES/CLMUL intrinsics in a common intrinsic header file
>  and 2 different options, one of them is -maes/-mclmul, will enable
>  them.  -maes/-mclmul will still imply SSE4.

This practically forces usage of -msse4.1 for -maes/-mclmul, since
these instructions are used only through intrinsics, defined in their
corresponding header file. wmmintrin.h includes smmintrin.h that will
error out by #ifndef __SSE4_1__.

I think that user should explicitly enable sse4.1 manually in the
command line together with -maes/-mclmul to access these new
instructions. Including wmmintrin.h will error out when neither
-maes/-mclmul is enabled, and smmintrin.h will error out with a
message that SSE4.1 instruction set is not enabled.
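
To make the two failure modes concrete, here is a rough C sketch of
which diagnostic a user would hit under a given set of flags.  It is
illustrative only and not part of the patch: the enum, the function
names and the order of the checks (AES/CLMUL guard first, then the
SSE4.1 guard from smmintrin.h) are assumptions; only the macro names
__AES__, __CLMUL__ and __SSE4_1__ come from this thread.

```c
/* Outcome of including the sketched header (hypothetical names).  */
enum wmm_diag
{
  WMM_OK,            /* header would compile */
  WMM_NO_AES_CLMUL,  /* would hit the AES/CLMUL #error */
  WMM_NO_SSE4_1      /* would hit the SSE4.1 #error in smmintrin.h */
};

/* Decide the outcome from explicit flags.  Assumed order: the
   AES/CLMUL guard is evaluated before the SSE4.1 guard.  */
enum wmm_diag
wmm_include_outcome (int have_sse4_1, int have_aes, int have_clmul)
{
  if (!have_aes && !have_clmul)
    return WMM_NO_AES_CLMUL;
  if (!have_sse4_1)
    return WMM_NO_SSE4_1;
  return WMM_OK;
}

/* Same decision for the current translation unit, driven by the
   predefined macros discussed in this thread.  */
enum wmm_diag
wmm_outcome_for_this_tu (void)
{
  int sse4_1 = 0, aes = 0, clmul = 0;
#ifdef __SSE4_1__
  sse4_1 = 1;
#endif
#ifdef __AES__
  aes = 1;
#endif
#ifdef __CLMUL__
  clmul = 1;
#endif
  return wmm_include_outcome (sse4_1, aes, clmul);
}
```

With plain -maes and no SSE4.1 the sketch reports WMM_NO_SSE4_1, which
is the "informative error" case described above.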

Does icc automatically enable all SSE intrinsics for -maes/-mclmul?

>  I can change
>
>
>  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN , 0 },
>
>  to
>
>  { OPTION_MASK_ISA_SSE2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN , 0 },
>

Since there is no other way to generate these insns than through
intrinsics (__builtin_X should not be used by user as this is not
considered stable interface), we can put OPTION_MASK_ISA_SSE4_1 here.
At least SSE4_1 should be active due to constraint forced by
wmmintrin.h/smmintrin.h

> +#if !defined (__AES__) && !defined (__CLMUL__)
> +# error "AES/CLMUL instruction set not enabled"
> +#else

"instruction set" or "instructions" in the error message?

Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-04  6:35   ` Uros Bizjak
@ 2008-04-04 12:53     ` H.J. Lu
  2008-04-04 13:29       ` Uros Bizjak
  0 siblings, 1 reply; 23+ messages in thread
From: H.J. Lu @ 2008-04-04 12:53 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches

On Fri, Apr 04, 2008 at 08:30:40AM +0200, Uros Bizjak wrote:
> On Thu, Apr 3, 2008 at 11:51 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> >  +  /* Enable SSE 4.2 if AES or CLMUL is enabled.  */
> >  +  if ((x86_aes || x86_clmul)
> >  +      && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4_2))
> >  +    {
> >  +      ix86_isa_flags |= OPTION_MASK_ISA_SSE4_2_SET;
> >  +      ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_2_SET;
> >  +    }
> >  +
> 
> Do we really want to enable all SSE builtins if only AES functionality
> is required? I think that -msse -maes is more appropriate, since -msse
> instructs the compiler to enable support for SSE registers and -maes
> enables special AES instructions that depend on SSE registers.
> Unfortunately, we don't have separate option to enable only SSE
> registers without SSE builtins, but IMO we can live with that. Similar
> situation is with -mclmul. At minimum, -msse -mclmul should be
> required, I see no reason to enable everything up to sse4_2 for this
> insn.

SSE doesn't support V2DI. You need at least SSE2.

I can change

{ OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN , 0 },

to

{ OPTION_MASK_ISA_SSE2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN , 0 },
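
As a rough illustration of what that change means, assuming a table
entry is considered usable when its mask bit is present in the active
ISA flags (my simplification, not a quote of i386.c), a minimal C
sketch follows.  The SK_* bit values, the struct and the function name
are hypothetical.

```c
/* Hypothetical ISA bits, for illustration; not GCC's real values.  */
enum { SK_ISA_SSE2 = 1 << 0, SK_ISA_SSE4_2 = 1 << 1 };

/* Cut-down stand-in for a builtin_description entry.  */
struct bdesc_sketch
{
  unsigned int mask;   /* ISA bit that must be enabled */
  const char *insn;    /* instruction the entry expands to */
};

/* The entry as posted, and as proposed above.  */
const struct bdesc_sketch aesimc_sse4_2 = { SK_ISA_SSE4_2, "aesimc" };
const struct bdesc_sketch aesimc_sse2 = { SK_ISA_SSE2, "aesimc" };

/* An entry is usable when its mask bit is set in the ISA flags.  */
int
bdesc_sketch_enabled (const struct bdesc_sketch *d, unsigned int isa_flags)
{
  return (d->mask & isa_flags) != 0;
}
```

With only SK_ISA_SSE2 active, the first entry is rejected and the
second is accepted, which is the practical difference between the two
masks.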

> 
> >  +/* We need definitions from the SSE4, SSSE3, SSE3, SSE2 and SSE header
> >  +   files.  */
> >  +#include <smmintrin.h>
> >  +
> 
> Actually we only need __v2di and __m128i typedefs here. Perhaps we can
> copy these two definitions here, or we can define
> 
> typedef long long __m128a __attribute__ ((vector_size (16), __may_alias__));
> 
> and rewrite these new intrinsics to use this type?

<wmmintrin.h> from icc 10.1 includes SSE4 intrinsic header file. If
<wmmintrin.h> from gcc doesn't support SSE4 intrinsics, it will be
incompatible with icc. I can imagine other companies may implement
AES/CLMUL. I think it should be handled similar to SSE4 intrinsics.
That is we put AES/CLMUL intrinsics in a common intrinsic header file
and 2 different options, one of them is -maes/-mclmul, will enable
them.  -maes/-mclmul will still imply SSE4.


H.J.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-03 22:00 ` H.J. Lu
  2008-04-03 23:17   ` H.J. Lu
@ 2008-04-04  6:35   ` Uros Bizjak
  2008-04-04 12:53     ` H.J. Lu
  1 sibling, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2008-04-04  6:35 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

On Thu, Apr 3, 2008 at 11:51 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>  +  /* Enable SSE 4.2 if AES or CLMUL is enabled.  */
>  +  if ((x86_aes || x86_clmul)
>  +      && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4_2))
>  +    {
>  +      ix86_isa_flags |= OPTION_MASK_ISA_SSE4_2_SET;
>  +      ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_2_SET;
>  +    }
>  +

Do we really want to enable all SSE builtins if only AES functionality
is required? I think that -msse -maes is more appropriate, since -msse
instructs the compiler to enable support for SSE registers and -maes
enables special AES instructions that depend on SSE registers.
Unfortunately, we don't have separate option to enable only SSE
registers without SSE builtins, but IMO we can live with that. Similar
situation is with -mclmul. At minimum, -msse -mclmul should be
required, I see no reason to enable everything up to sse4_2 for this
insn.
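
The difference between the two approaches can be sketched in plain C.
This is only an illustration under stated assumptions: the SKB_* bit
values and both function names are made up, and the real
OPTION_MASK_ISA_* masks are generated from i386.opt.  The base level
that -maes should require is left as a parameter.

```c
#include <stdbool.h>

/* Hypothetical bit assignments, for illustration only.  */
enum
{
  SKB_SSE    = 1 << 0,
  SKB_SSE2   = 1 << 1,
  SKB_SSE3   = 1 << 2,
  SKB_SSSE3  = 1 << 3,
  SKB_SSE4_1 = 1 << 4,
  SKB_SSE4_2 = 1 << 5
};

/* A *_SET mask enables a level and everything below it.  */
#define SKB_SSE2_SET   (SKB_SSE2 | SKB_SSE)
#define SKB_SSE4_2_SET \
  (SKB_SSE4_2 | SKB_SSE4_1 | SKB_SSSE3 | SKB_SSE3 | SKB_SSE2_SET)

/* Posted patch: AES pulls in everything up to SSE4.2 unless the
   user said something explicit about SSE4.2.  */
unsigned int
flags_when_aes_implies_sse4_2 (unsigned int flags,
                               unsigned int explicit_flags, bool aes)
{
  if (aes && !(explicit_flags & SKB_SSE4_2))
    flags |= SKB_SSE4_2_SET;
  return flags;
}

/* Alternative: AES enables nothing else and is usable only when
   the required base level is already on.  */
bool
aes_usable (unsigned int flags, bool aes, unsigned int required)
{
  return aes && (flags & required) == required;
}
```

In the first variant a bare -maes changes code generation for the
whole file; in the second it only unlocks the AES insns on top of
whatever the user already asked for.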

>  +/* We need definitions from the SSE4, SSSE3, SSE3, SSE2 and SSE header
>  +   files.  */
>  +#include <smmintrin.h>
>  +

Actually we only need __v2di and __m128i typedefs here. Perhaps we can
copy these two definitions here, or we can define

typedef long long __m128a __attribute__ ((vector_size (16), __may_alias__));

and rewrite these new intrinsics to use this type?
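
For reference, a small self-contained sketch of what such a typedef
buys us, using the GCC vector extensions.  The type and function names
are hypothetical and this is not proposed header content; it only
shows that a may_alias vector type can legally read storage declared
as another type, provided the storage is 16-byte aligned.

```c
/* 128-bit vector of two long longs that may alias any other type.
   Hypothetical name, mirroring the suggested typedef.  */
typedef long long m128a_sketch
  __attribute__ ((__vector_size__ (16), __may_alias__));

/* Byte buffers with the alignment a 16-byte vector load needs.  */
static unsigned char buf_a[16] __attribute__ ((aligned (16)));
static unsigned char buf_b[16] __attribute__ ((aligned (16)));

/* Fill the buffers bytewise, then XOR them through the vector type.
   Without may_alias, reading char storage through a long long
   vector lvalue would violate the aliasing rules.  */
long long
lane_xor_demo (void)
{
  int i;
  for (i = 0; i < 16; i++)
    {
      buf_a[i] = 0x0f;
      buf_b[i] = 0xf0;
    }
  const m128a_sketch *a = (const m128a_sketch *) buf_a;
  const m128a_sketch *b = (const m128a_sketch *) buf_b;
  m128a_sketch r = *a ^ *b;
  return r[0];   /* every byte is 0xff, so the lane is all ones */
}
```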

Uros.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-03 22:00 ` H.J. Lu
@ 2008-04-03 23:17   ` H.J. Lu
  2008-04-04  6:35   ` Uros Bizjak
  1 sibling, 0 replies; 23+ messages in thread
From: H.J. Lu @ 2008-04-03 23:17 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5031 bytes --]

Hi,

I should use def_builtin_const instead of def_builtin. Here is the
updated patch.
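
As I understand it, the point of def_builtin_const is that the
built-in is treated as having no side effects, with a result that
depends only on its arguments, so repeated calls can be merged.  The
user-level analogue is the GCC "const" function attribute; a minimal
sketch follows.  The function names are hypothetical and the GF(2^8)
doubling step is just a convenient pure function, not anything taken
from the patch.

```c
/* Multiply by 2 in GF(2^8) with the AES reduction polynomial.
   Marked const: no side effects, result depends only on B.  */
__attribute__ ((const)) unsigned char
xtime_sketch (unsigned char b)
{
  return (unsigned char) ((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00));
}

/* Because xtime_sketch is const, the compiler is allowed to
   evaluate the call once and reuse the result.  */
int
xtime_twice_sum (unsigned char b)
{
  return xtime_sketch (b) + xtime_sketch (b);
}
```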


H.J.
On Thu, Apr 3, 2008 at 2:51 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Apr 03, 2008 at 07:41:49PM +0200, Uros Bizjak wrote:
>  > Hello!
>  >
>  >>      * config/i386/i386.c (OPTION_MASK_ISA_AES_SET): New.
>  >>      (OPTION_MASK_ISA_CLMUL_SET): Likewise.
>  >>      (OPTION_MASK_ISA_AES_UNSET): Likewise.
>  >>      (OPTION_MASK_ISA_CLMUL_UNSET): Likewise.
>  >>      (OPTION_MASK_ISA_SSE4_2_UNSET): Add OPTION_MASK_ISA_AES_UNSET
>  >>      and OPTION_MASK_ISA_CLMUL_UNSET.
>  >
>  > I don't think that MASK_ISA is correct approach here, since MASK_ISA is
>  > reserved for different SSE levels that switch on or off whole pack of
>  > instructions. Perhaps we should do with:
>  >
>  > i386.h:
>  > #define TARGET_AES        x86_aes
>  >
>  > where
>  >
>  > i386.opt:
>  > maes
>  > Target Report RejectNegative Var(x86_aes)
>  >
>  > We can even #define TARGET_AES  (TARGET_SSE4_2 && x86_aes), depending on
>  > how we want to enable generation of AES instructions. Please also note,
>  > that we will use PTA_* flags in future for certain targets and we can
>  > switch this flag depending on target features. So:
>  >
>  > gcc -msse -maes
>  >
>  > or
>  >
>  > gcc -mfancy_future_target
>  >
>  > will both switch AES instruction supports on.
>  >
>  > Regarding the tests, all new *intrin.h should also be added to
>  > gcc.target/i386/sse-[13,14] to check that intrinsics compile with and
>  > without optimization (I would also recommend to add the header to
>  > g++.dg/other/i386-3.C)
>  >
>  >
>
>  Here is the updated patch. OK to install?
>
>
>  Thanks.
>
>
>  H.J.
>  -----
>  gcc/
>
>  2008-04-03  H.J. Lu  <hongjiu.lu@intel.com>
>
>         * config.gcc (extra_headers): Add wmmintrin.h for x86 and x86-64.
>
>         * config/i386/cpuid.h (bit_AES): New.
>         (bit_CLMUL): Likewise.
>
>         * config/i386/i386.c (pta_flags): Add PTA_AES and PTA_CLMUL.
>         (override_options): Handle PTA_AES and PTA_CLMUL.  Enable
>         SSE 4.2 if AES or CLMUL is enabled.
>
>         (ix86_builtins): Add IX86_BUILTIN_AESENC128,
>         IX86_BUILTIN_AESENCLAST128, IX86_BUILTIN_AESDEC128,
>         IX86_BUILTIN_AESDECLAST128, IX86_BUILTIN_AESIMC128,
>         IX86_BUILTIN_AESKEYGENASSIST128 and IX86_BUILTIN_PCLMULQDQ128.
>         (bdesc_sse_3arg): Add IX86_BUILTIN_PCLMULQDQ128.
>         (bdesc_2arg): Add IX86_BUILTIN_AESENC128,
>
>         IX86_BUILTIN_AESENCLAST128, IX86_BUILTIN_AESDEC128,
>         IX86_BUILTIN_AESDECLAST128 and IX86_BUILTIN_AESKEYGENASSIST128.
>         (bdesc_1arg): Add IX86_BUILTIN_AESIMC128.
>         (ix86_init_mmx_sse_builtins): Define __builtin_ia32_aesenc128,
>
>         __builtin_ia32_aesenclast128, __builtin_ia32_aesdec128,
>         __builtin_ia32_aesdeclast128,__builtin_ia32_aesimc128,
>         __builtin_ia32_aeskeygenassist128 and
>         __builtin_ia32_pclmulqdq128.
>
>
>         * config/i386/i386.c (ix86_expand_binop_imm_builtin): New.
>         (ix86_expand_builtin): Use it for IX86_BUILTIN_PSLLDQI128 and
>         IX86_BUILTIN_PSRLDQI128.  Handle IX86_BUILTIN_AESKEYGENASSIST128.
>
>         * config/i386/i386.h (TARGET_AES): New.
>         (TARGET_CLMUL): Likewise.
>         (TARGET_CPU_CPP_BUILTINS): Handle TARGET_AES and TARGET_CLMUL.
>
>         * config/i386/i386.md (UNSPEC_AESENC): New.
>         (UNSPEC_AESENCLAST): Likewise.
>         (UNSPEC_AESDEC): Likewise.
>         (UNSPEC_AESDECLAST): Likewise.
>         (UNSPEC_AESIMC): Likewise.
>         (UNSPEC_AESKEYGENASSIST): Likewise.
>         (UNSPEC_PCLMULQDQ): Likewise.
>
>         * config/i386/i386.opt (maes): New.
>         (mclmul): Likewise.
>
>         * config/i386/sse.md (aesenc): New pattern.
>         (aesenclast): Likewise.
>         (aesdec): Likewise.
>         (aesdeclast): Likewise.
>         (aesimc): Likewise.
>         (aeskeygenassist): Likewise.
>         (pclmulqdq): Likewise.
>
>         * config/i386/wmmintrin.h: New.
>
>         * doc/extend.texi: Document AES and CLMUL built-in function.
>
>         * doc/invoke.texi: Document -maes and -mclmul.
>
>  gcc/testsuite/
>
>  2008-04-03  H.J. Lu  <hongjiu.lu@intel.com>
>
>         * g++.dg/other/i386-2.C: Include <wmmintrin.h> instead of
>         <smmintrin.h>.
>         * g++.dg/other/i386-3.C: Likewise.
>         * gcc.target/i386/sse-13.c: Likewise.
>         * gcc.target/i386/sse-14.c: Likewise.
>
>
>         * gcc.target/i386/aes-check.h: New.
>         * gcc.target/i386/aesdec.c: Likewise.
>         * gcc.target/i386/aesdeclast.c: Likewise.
>         * gcc.target/i386/aesenc.c: Likewise.
>         * gcc.target/i386/aesenclast.c: Likewise.
>         * gcc.target/i386/aesimc.c: Likewise.
>         * gcc.target/i386/aeskeygenassist.c: Likewise.
>         * gcc.target/i386/pclmulqdq.c: Likewise.
>         * gcc.target/i386/clmul-check.h: Likewise.
>
>         * gcc.target/i386/i386.exp (check_effective_target_aes): New.
>         (check_effective_target_clmul): Likewise.
>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: gcc-aes-6.patch --]
[-- Type: text/x-patch; name=gcc-aes-6.patch, Size: 44558 bytes --]

--- gcc/config.gcc.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/config.gcc	2008-04-03 12:20:45.000000000 -0700
@@ -309,13 +309,15 @@ i[34567]86-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	need_64bit_hwint=yes
 	;;
 ia64-*-*)
--- gcc/config/i386/cpuid.h.aes	2008-02-25 09:57:37.000000000 -0800
+++ gcc/config/i386/cpuid.h	2008-04-03 12:20:45.000000000 -0700
@@ -33,11 +33,13 @@
 
 /* %ecx */
 #define bit_SSE3	(1 << 0)
+#define bit_CLMUL	(1 << 1)
 #define bit_SSSE3	(1 << 9)
 #define bit_CMPXCHG16B	(1 << 13)
 #define bit_SSE4_1	(1 << 19)
 #define bit_SSE4_2	(1 << 20)
 #define bit_POPCNT	(1 << 23)
+#define bit_AES		(1 << 25)
 
 /* %edx */
 #define bit_CMPXCHG8B	(1 << 8)
--- gcc/config/i386/i386.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/config/i386/i386.c	2008-04-03 16:07:38.000000000 -0700
@@ -2078,7 +2078,9 @@ override_options (void)
       PTA_NO_SAHF = 1 << 13,
       PTA_SSE4_1 = 1 << 14,
       PTA_SSE4_2 = 1 << 15,
-      PTA_SSE5 = 1 << 16
+      PTA_SSE5 = 1 << 16,
+      PTA_AES = 1 << 17,
+      PTA_CLMUL = 1 << 18
     };
 
   static struct pta
@@ -2385,6 +2387,10 @@ override_options (void)
 	  x86_prefetch_sse = true;
 	if (!(TARGET_64BIT && (processor_alias_table[i].flags & PTA_NO_SAHF)))
 	  x86_sahf = true;
+	if (processor_alias_table[i].flags & PTA_AES)
+	  x86_aes = true;
+	if (processor_alias_table[i].flags & PTA_CLMUL)
+	  x86_clmul = true;
 
 	break;
       }
@@ -2428,6 +2434,14 @@ override_options (void)
   if (i == pta_size)
     error ("bad value (%s) for -mtune= switch", ix86_tune_string);
 
+  /* Enable SSE 4.2 if AES or CLMUL is enabled.  */
+  if ((x86_aes || x86_clmul)
+      && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4_2))
+    {
+      ix86_isa_flags |= OPTION_MASK_ISA_SSE4_2_SET;
+      ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_2_SET;
+    }
+
   ix86_tune_mask = 1u << ix86_tune;
   for (i = 0; i < X86_TUNE_LAST; ++i)
     ix86_tune_features[i] &= ix86_tune_mask;
@@ -17626,6 +17640,17 @@ enum ix86_builtins
 
   IX86_BUILTIN_PCMPGTQ,
 
+  /* AES instructions */
+  IX86_BUILTIN_AESENC128,
+  IX86_BUILTIN_AESENCLAST128,
+  IX86_BUILTIN_AESDEC128,
+  IX86_BUILTIN_AESDECLAST128,
+  IX86_BUILTIN_AESIMC128,
+  IX86_BUILTIN_AESKEYGENASSIST128,
+
+  /* CLMUL instruction */
+  IX86_BUILTIN_PCLMULQDQ128,
+
   /* TFmode support builtins.  */
   IX86_BUILTIN_INFQ,
   IX86_BUILTIN_FABSQ,
@@ -17987,6 +18012,9 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, "__builtin_ia32_roundsd", IX86_BUILTIN_ROUNDSD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, "__builtin_ia32_roundss", IX86_BUILTIN_ROUNDSS, UNKNOWN, 0 },
+
+  /* CLMUL */
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_pclmulqdq, 0, IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_2arg[] =
@@ -18267,6 +18295,13 @@ static const struct builtin_description 
 
   /* SSE4.2 */
   { OPTION_MASK_ISA_SSE4_2, CODE_FOR_sse4_2_gtv2di3, "__builtin_ia32_pcmpgtq", IX86_BUILTIN_PCMPGTQ, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesenc, 0, IX86_BUILTIN_AESENC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesenclast, 0, IX86_BUILTIN_AESENCLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesdec, 0, IX86_BUILTIN_AESDEC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesdeclast, 0, IX86_BUILTIN_AESDECLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -18344,6 +18379,9 @@ static const struct builtin_description 
   /* Fake 1 arg builtins with a constant smaller than 8 bits as the 2nd arg.  */
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundpd, 0, IX86_BUILTIN_ROUNDPD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundps, 0, IX86_BUILTIN_ROUNDPS, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN, 0 },
 };
 
 /* SSE5 */
@@ -19580,6 +19618,25 @@ ix86_init_mmx_sse_builtins (void)
 				    NULL_TREE);
   def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_crc32di", ftype, IX86_BUILTIN_CRC32DI);
 
+  /* AES */
+  if (TARGET_AES)
+    {
+      /* Define AES built-in functions only if AES is enabled.  */
+      def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_aesenc128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESENC128);
+      def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_aesenclast128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESENCLAST128);
+      def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_aesdec128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESDEC128);
+      def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_aesdeclast128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESDECLAST128);
+      def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_aesimc128", v2di_ftype_v2di, IX86_BUILTIN_AESIMC128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aeskeygenassist128", v2di_ftype_v2di_int, IX86_BUILTIN_AESKEYGENASSIST128);
+    }
+
+  /* CLMUL */
+  if (TARGET_CLMUL)
+    {
+      /* Define CLMUL built-in function only if CLMUL is enabled.  */
+      def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_pclmulqdq128", v2di_ftype_v2di_v2di_int, IX86_BUILTIN_PCLMULQDQ128);
+    }
+
   /* AMDFAM10 SSE4A New built-ins  */
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntsd", void_ftype_pdouble_v2df, IX86_BUILTIN_MOVNTSD);
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntss", void_ftype_pfloat_v4sf, IX86_BUILTIN_MOVNTSS);
@@ -19860,6 +19917,44 @@ ix86_expand_crc32 (enum insn_code icode,
   return target;
 }
 
+/* Subroutine of ix86_expand_builtin to take care of binop insns
+   with an immediate.  */
+
+static rtx
+ix86_expand_binop_imm_builtin (enum insn_code icode, tree exp,
+				rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  enum machine_mode tmode = insn_data[icode].operand[0].mode;
+  enum machine_mode mode0 = insn_data[icode].operand[1].mode;
+  enum machine_mode mode1 = insn_data[icode].operand[2].mode;
+
+  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+    {
+      op0 = copy_to_reg (op0);
+      op0 = simplify_gen_subreg (mode0, op0, GET_MODE (op0), 0);
+    }
+
+  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+    {
+      error ("the last operand must be an immediate");
+      return const0_rtx;
+    }
+
+  target = gen_reg_rtx (V2DImode);
+  pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target,
+					      V2DImode, 0),
+			 op0, op1);
+  if (! pat)
+    return 0;
+  emit_insn (pat);
+  return target;
+}
+
 /* Subroutine of ix86_expand_builtin to take care of binop insns.  */
 
 static rtx
@@ -20952,34 +21047,18 @@ ix86_expand_builtin (tree exp, rtx targe
       return target;
 
     case IX86_BUILTIN_PSLLDQI128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_ashlti3,
+					     exp, target);
+      break;
+
     case IX86_BUILTIN_PSRLDQI128:
-      icode = (fcode == IX86_BUILTIN_PSLLDQI128 ? CODE_FOR_sse2_ashlti3
-	       : CODE_FOR_sse2_lshrti3);
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      op0 = expand_normal (arg0);
-      op1 = expand_normal (arg1);
-      tmode = insn_data[icode].operand[0].mode;
-      mode1 = insn_data[icode].operand[1].mode;
-      mode2 = insn_data[icode].operand[2].mode;
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_lshrti3,
+					     exp, target);
+      break;
 
-      if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
-	{
-	  op0 = copy_to_reg (op0);
-	  op0 = simplify_gen_subreg (mode1, op0, GET_MODE (op0), 0);
-	}
-      if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
-	{
-	  error ("shift must be an immediate");
-	  return const0_rtx;
-	}
-      target = gen_reg_rtx (V2DImode);
-      pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target, V2DImode, 0),
-			     op0, op1);
-      if (! pat)
-	return 0;
-      emit_insn (pat);
-      return target;
+    case IX86_BUILTIN_AESKEYGENASSIST128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_aeskeygenassist,
+					     exp, target);
 
     case IX86_BUILTIN_FEMMS:
       emit_insn (gen_mmx_femms ());
--- gcc/config/i386/i386.h.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/config/i386/i386.h	2008-04-03 13:13:28.000000000 -0700
@@ -395,6 +395,8 @@ extern int x86_prefetch_sse;
 #define TARGET_SAHF		x86_sahf
 #define TARGET_RECIP		x86_recip
 #define TARGET_FUSED_MADD	x86_fused_muladd
+#define TARGET_AES		(TARGET_SSE4_2 && x86_aes)
+#define TARGET_CLMUL		(TARGET_SSE4_2 && x86_clmul)
 
 #define ASSEMBLER_DIALECT	(ix86_asm_dialect)
 
@@ -683,6 +685,10 @@ extern const char *host_detect_local_cpu
 	builtin_define ("__SSE4_1__");				\
       if (TARGET_SSE4_2)					\
 	builtin_define ("__SSE4_2__");				\
+      if (TARGET_AES)						\
+	builtin_define ("__AES__");				\
+      if (TARGET_CLMUL)						\
+	builtin_define ("__CLMUL__");				\
       if (TARGET_SSE4A)						\
  	builtin_define ("__SSE4A__");		                \
       if (TARGET_SSE5)						\
--- gcc/config/i386/i386.md.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/config/i386/i386.md	2008-04-03 12:20:45.000000000 -0700
@@ -186,6 +186,17 @@
    (UNSPEC_FRCZ			156)
    (UNSPEC_CVTPH2PS		157)
    (UNSPEC_CVTPS2PH		158)
+
+   ; For AES support
+   (UNSPEC_AESENC		159)
+   (UNSPEC_AESENCLAST		160)
+   (UNSPEC_AESDEC		161)
+   (UNSPEC_AESDECLAST		162)
+   (UNSPEC_AESIMC		163)
+   (UNSPEC_AESKEYGENASSIST	164)
+
+   ; For CLMUL support
+   (UNSPEC_CLMUL		165)
   ])
 
 (define_constants
--- gcc/config/i386/i386.opt.aes	2007-09-12 21:44:32.000000000 -0700
+++ gcc/config/i386/i386.opt	2008-04-03 13:14:09.000000000 -0700
@@ -275,3 +275,11 @@ Target Report Var(x86_fused_muladd) Init
 Enable automatic generation of fused floating point multiply-add instructions
 if the ISA supports such instructions.  The -mfused-madd option is on by
 default.
+
+maes
+Target Report RejectNegative Var(x86_aes)
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES built-in functions and code generation
+
+mclmul
+Target Report RejectNegative Var(x86_clmul)
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and CLMUL built-in functions and code generation
--- gcc/config/i386/sse.md.aes	2008-04-03 11:49:09.000000000 -0700
+++ gcc/config/i386/sse.md	2008-04-03 12:20:45.000000000 -0700
@@ -7897,3 +7897,80 @@
 }
   [(set_attr "type" "ssecmp")
    (set_attr "mode" "TI")])
+
+(define_insn "aesenc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENC))]
+  "TARGET_AES"
+  "aesenc\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesenclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENCLAST))]
+  "TARGET_AES"
+  "aesenclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdec"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDEC))]
+  "TARGET_AES"
+  "aesdec\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdeclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDECLAST))]
+  "TARGET_AES"
+  "aesdeclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesimc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESIMC))]
+  "TARGET_AES"
+  "aesimc\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aeskeygenassist"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")
+		      (match_operand:SI 2 "const_0_to_255_operand" "n")]
+		     UNSPEC_AESKEYGENASSIST))]
+  "TARGET_AES"
+  "aeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "pclmulqdq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		      (match_operand:V2DI 2 "nonimmediate_operand" "xm")
+		      (match_operand:SI 3 "const_0_to_255_operand" "n")]
+		     UNSPEC_CLMUL))]
+  "TARGET_CLMUL"
+  "pclmulqdq\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
--- gcc/config/i386/wmmintrin.h.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/config/i386/wmmintrin.h	2008-04-03 14:39:31.000000000 -0700
@@ -0,0 +1,124 @@
+/* Copyright (C) 2008 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to
+   the Free Software Foundation, 59 Temple Place - Suite 330,
+   Boston, MA 02111-1307, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 10.1.  */
+
+#ifndef _WMMINTRIN_H_INCLUDED
+#define _WMMINTRIN_H_INCLUDED
+
+#if !defined (__AES__) && !defined (__CLMUL__)
+# error "AES/CLMUL instruction set not enabled"
+#else
+
+/* We need definitions from the SSE4, SSSE3, SSE3, SSE2 and SSE header
+   files.  */
+#include <smmintrin.h>
+
+/* AES */
+
+#ifdef __AES__
+/* Performs 1 round of AES decryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdec_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES decryption of the first m128i 
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdeclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X,
+						 (__v2di)__Y);
+}
+
+/* Performs 1 round of AES encryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenc_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES encryption of the first m128i
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the InverseMixColumn operation on the source m128i 
+   and stores the result into m128i destination.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesimc_si128 (__m128i __X)
+{
+  return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+}
+
+/* Generates a m128i round key for the input m128i AES cipher key and
+   byte round constant.  The second parameter must be a compile time
+   constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aeskeygenassist_si128 (__m128i __X, const int __C)
+{
+  return (__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)__X, __C);
+}
+#else
+#define _mm_aeskeygenassist_si128(X, C)					\
+  ((__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)(__m128i)(X),	\
+						(int)(C)))
+#endif
+#endif  /* __AES__ */
+
+/* CLMUL */
+
+#ifdef __CLMUL__
+/* Performs carry-less integer multiplication of 64-bit halves of
+   128-bit input operands.  The third parameter indicates which 64-bit
+   halves of the first two parameters should be used.  It must be
+   a compile time constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
+{
+  return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+						(__v2di)__Y, __I);
+}
+#else
+#define _mm_clmulepi64_si128(X, Y, I)					\
+  ((__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)(__m128i)(X),		\
+					  (__v2di)(__m128i)(Y), (int)(I)))
+#endif
+#endif  /* __CLMUL__  */
+
+#endif /* __AES__/__CLMUL__ */
+
+#endif /* _WMMINTRIN_H_INCLUDED */
--- gcc/doc/extend.texi.aes	2008-03-28 13:03:17.000000000 -0700
+++ gcc/doc/extend.texi	2008-04-03 12:20:45.000000000 -0700
@@ -8013,6 +8013,27 @@ depending on the size of @code{unsigned 
 Generates the @code{popcntq} machine instruction.
 @end table
 
+The following built-in functions are available when @option{-maes} is
+used.  All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v2di __builtin_ia32_aesenc128 (v2di, v2di)
+v2di __builtin_ia32_aesenclast128 (v2di, v2di)
+v2di __builtin_ia32_aesdec128 (v2di, v2di)
+v2di __builtin_ia32_aesdeclast128 (v2di, v2di)
+v2di __builtin_ia32_aeskeygenassist128 (v2di, const int)
+v2di __builtin_ia32_aesimc128 (v2di)
+@end smallexample
+
+The following built-in function is available when @option{-mclmul} is
+used.
+
+@table @code
+@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)
+Generates the @code{pclmulqdq} machine instruction.
+@end table
+
 The following built-in functions are available when @option{-msse4a} is used.
 All of them generate the machine instruction that is part of the name.
 
--- gcc/doc/invoke.texi.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/doc/invoke.texi	2008-04-03 12:20:45.000000000 -0700
@@ -555,6 +555,7 @@ Objective-C and Objective-C++ Dialects}.
 -mno-wide-multiply  -mrtd  -malign-double @gol
 -mpreferred-stack-boundary=@var{num} -mcx16 -msahf -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
+-maes -mclmul @gol
 -msse4a -m3dnow -mpopcnt -mabm -msse5 @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -10720,6 +10721,10 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-sse4.2
 @itemx -msse4
 @itemx -mno-sse4
+@itemx -maes
+@itemx -mno-aes
+@itemx -mclmul
+@itemx -mno-clmul
 @itemx -msse4a
 @itemx -mno-sse4a
 @itemx -msse5
@@ -10737,8 +10742,8 @@ preferred alignment to @option{-mpreferr
 @opindex m3dnow
 @opindex mno-3dnow
 These switches enable or disable the use of instructions in the MMX,
-SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4A, SSE5, ABM or 3DNow!@: extended
-instruction sets.
+SSE, SSE2, SSE3, SSSE3, SSE4.1, AES, CLMUL, SSE4A, SSE5, ABM or
+3DNow!@: extended instruction sets.
 These extensions are also available as built-in functions: see
 @ref{X86 Built-in Functions}, for details of the functions enabled and
 disabled by these switches.
--- gcc/testsuite/g++.dg/other/i386-2.C.aes	2007-12-15 14:50:08.000000000 -0800
+++ gcc/testsuite/g++.dg/other/i386-2.C	2008-04-03 14:21:06.000000000 -0700
@@ -1,10 +1,10 @@
-/* Test that {,x,e,p,t,s,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
+/* Test that {,x,e,p,t,s,w,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
    usable with -O -pedantic-errors.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -msse5 -maes -mclmul" } */
 
 #include <bmmintrin.h>
-#include <smmintrin.h>
+#include <wmmintrin.h>
 #include <mm3dnow.h>
 
 int dummy;
--- gcc/testsuite/g++.dg/other/i386-3.C.aes	2008-03-17 06:44:39.000000000 -0700
+++ gcc/testsuite/g++.dg/other/i386-3.C	2008-04-03 14:21:19.000000000 -0700
@@ -1,8 +1,8 @@
-/* Test that {,x,e,p,t,s,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
+/* Test that {,x,e,p,t,s,w,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
    usable with -O -fkeep-inline-functions.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -maes -mclmul -msse5" } */
 
 #include <bmmintrin.h>
-#include <smmintrin.h>
+#include <wmmintrin.h>
 #include <mm3dnow.h>
--- gcc/testsuite/gcc.target/i386/aes-check.h.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aes-check.h	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void aes_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AES test only if host has AES support.  */
+  if (ecx & bit_AES)
+    {
+      aes_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/aesdec.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesdec.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i]  = _mm_setr_epi32 (0xb730392a, 0xb58eb95e,
+			      0xfaea2787, 0x138ac342);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdec_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdec_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdec_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdec_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdec_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdec_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdec_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdec_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdec_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdec_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdec_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdec_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdec_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdec_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdec_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdec_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesdeclast.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesdeclast.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set of
+   input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x72a593d0, 0xd410637b,
+			     0x6b317f95, 0xc5a391ef);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdeclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdeclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdeclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdeclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdeclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdeclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdeclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdeclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdeclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdeclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdeclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdeclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdeclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdeclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdeclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdeclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesenc.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesenc.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0xded7e595, 0x8b104b58,
+			     0x9fdba3c5, 0xa8311c2f);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenc_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenc_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenc_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenc_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenc_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenc_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenc_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenc_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenc_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenc_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenc_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenc_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenc_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenc_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenc_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenc_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesenclast.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesenclast.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one
+   set of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x53fdc611, 0x177ec425,
+			     0x938c5964, 0xc7fb881e);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesimc.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesimc.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).   */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      d[i] = _mm_setr_epi32 (0x81c3b3e5, 0x2b18330a,
+			     0x44b109c8, 0x627a6f66);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesimc_si128 (src1[i]);
+      resdst[i + 1] = _mm_aesimc_si128 (src1[i + 1]);
+      resdst[i + 2] = _mm_aesimc_si128 (src1[i + 2]);
+      resdst[i + 3] = _mm_aesimc_si128 (src1[i + 3]);
+      resdst[i + 4] = _mm_aesimc_si128 (src1[i + 4]);
+      resdst[i + 5] = _mm_aesimc_si128 (src1[i + 5]);
+      resdst[i + 6] = _mm_aesimc_si128 (src1[i + 6]);
+      resdst[i + 7] = _mm_aesimc_si128 (src1[i + 7]);
+      resdst[i + 8] = _mm_aesimc_si128 (src1[i + 8]);
+      resdst[i + 9] = _mm_aesimc_si128 (src1[i + 9]);
+      resdst[i + 10] = _mm_aesimc_si128 (src1[i + 10]);
+      resdst[i + 11] = _mm_aesimc_si128 (src1[i + 11]);
+      resdst[i + 12] = _mm_aesimc_si128 (src1[i + 12]);
+      resdst[i + 13] = _mm_aesimc_si128 (src1[i + 13]);
+      resdst[i + 14] = _mm_aesimc_si128 (src1[i + 14]);
+      resdst[i + 15] = _mm_aesimc_si128 (src1[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aeskeygenassist.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aeskeygenassist.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+#define IMM8 1
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x16157e2b, 0xa6d2ae28,
+			      0x8815f7ab, 0x3c4fcf09);
+      d[i] = _mm_setr_epi32 (0x24b5e434, 0x3424b5e5,
+			     0xeb848a01, 0x01eb848b);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aeskeygenassist_si128 (src1[i], IMM8);
+      resdst[i + 1] = _mm_aeskeygenassist_si128 (src1[i + 1], IMM8);
+      resdst[i + 2] = _mm_aeskeygenassist_si128 (src1[i + 2], IMM8);
+      resdst[i + 3] = _mm_aeskeygenassist_si128 (src1[i + 3], IMM8);
+      resdst[i + 4] = _mm_aeskeygenassist_si128 (src1[i + 4], IMM8);
+      resdst[i + 5] = _mm_aeskeygenassist_si128 (src1[i + 5], IMM8);
+      resdst[i + 6] = _mm_aeskeygenassist_si128 (src1[i + 6], IMM8);
+      resdst[i + 7] = _mm_aeskeygenassist_si128 (src1[i + 7], IMM8);
+      resdst[i + 8] = _mm_aeskeygenassist_si128 (src1[i + 8], IMM8);
+      resdst[i + 9] = _mm_aeskeygenassist_si128 (src1[i + 9], IMM8);
+      resdst[i + 10] = _mm_aeskeygenassist_si128 (src1[i + 10], IMM8);
+      resdst[i + 11] = _mm_aeskeygenassist_si128 (src1[i + 11], IMM8);
+      resdst[i + 12] = _mm_aeskeygenassist_si128 (src1[i + 12], IMM8);
+      resdst[i + 13] = _mm_aeskeygenassist_si128 (src1[i + 13], IMM8);
+      resdst[i + 14] = _mm_aeskeygenassist_si128 (src1[i + 14], IMM8);
+      resdst[i + 15] = _mm_aeskeygenassist_si128 (src1[i + 15], IMM8);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/clmul-check.h.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/clmul-check.h	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void clmul_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run PCLMULQDQ test only if host has PCLMULQDQ support.  */
+  if (ecx & bit_CLMUL)
+    {
+      clmul_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/i386.exp.aes	2008-01-07 17:31:32.000000000 -0800
+++ gcc/testsuite/gcc.target/i386/i386.exp	2008-04-03 12:20:45.000000000 -0700
@@ -51,6 +51,34 @@ proc check_effective_target_sse4 { } {
     } "-O2 -msse4.1" ]
 }
 
+# Return 1 if aes instructions can be compiled.
+proc check_effective_target_aes { } {
+    return [check_no_compiler_messages aes object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i _mm_aesimc_si128 (__m128i __X)
+	{
+	    return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+	}
+    } "-O2 -maes" ]
+}
+
+# Return 1 if clmul instructions can be compiled.
+proc check_effective_target_clmul { } {
+    return [check_no_compiler_messages clmul object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i pclmulqdq_test (__m128i __X, __m128i __Y)
+	{
+	    return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+							  (__v2di)__Y,
+							  1);
+	}
+    } "-O2 -mclmul" ]
+}
+
 # Return 1 if sse4a instructions can be compiled.
 proc check_effective_target_sse4a { } {
     return [check_no_compiler_messages sse4a object {
--- gcc/testsuite/gcc.target/i386/pclmulqdq.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/pclmulqdq.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+/* { dg-require-effective-target clmul } */
+/* { dg-options "-O2 -mclmul" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "clmul-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i s1[NUM];
+static __m128i s2[NUM];
+/* We need this array to generate the memory form of the instruction.  */
+static __m128i s2m[NUM];
+
+static __m128i e_00[NUM];
+static __m128i e_01[NUM];
+static __m128i e_10[NUM];
+static __m128i e_11[NUM];
+
+static __m128i d_00[NUM];
+static __m128i d_01[NUM];
+static __m128i d_10[NUM];
+static __m128i d_11[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *ls1,   __m128i *ls2, __m128i *le_00, __m128i *le_01,
+	   __m128i *le_10, __m128i *le_11)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      ls1[i] = _mm_set_epi32 (0x7B5B5465, 0x73745665,
+			      0x63746F72, 0x5D53475D);
+      ls2[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      s2m[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      le_00[i] = _mm_set_epi32 (0x1D4D84C8, 0x5C3440C0,
+				0x929633D5, 0xD36F0451);
+      le_01[i] = _mm_set_epi32 (0x1A2BF6DB, 0x3A30862F,
+				0xBABF262D, 0xF4B7D5C9);
+      le_10[i] = _mm_set_epi32 (0x1BD17C8D, 0x556AB5A1,
+				0x7FA540AC, 0x2A281315);
+      le_11[i] = _mm_set_epi32 (0x1D1E1F2C, 0x592E7C45,
+				0xD66EE03E, 0x410FD4ED);
+    }
+}
+
+static void
+clmul_test (void)
+{
+  int i;
+
+  init_data (s1, s2, e_00, e_01, e_10, e_11);
+
+  for (i = 0; i < NUM; i += 2)
+    {
+      d_00[i] = _mm_clmulepi64_si128 (s1[i], s2m[i], 0x00);
+      d_01[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x01);
+      d_10[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x10);
+      d_11[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x11);
+
+      d_11[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x11);
+      d_00[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x00);
+      d_10[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2m[i + 1], 0x10);
+      d_01[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x01);
+    }
+
+  for (i = 0; i < NUM; i++)
+    {
+      if (memcmp (d_00 + i, e_00 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_01 + i, e_01 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_10 + i, e_10 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_11 + i, e_11 + i, sizeof (__m128i)))
+	abort ();
+    }
+}
--- gcc/testsuite/gcc.target/i386/sse-13.c.aes	2008-03-28 13:03:18.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/sse-13.c	2008-04-03 14:21:42.000000000 -0700
@@ -1,10 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse5 -maes -mclmul" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile with optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,a,b}mmintrin.h and mm3dnow.h
+   defined as inline functions in {,x,e,p,t,s,w,a,b}mmintrin.h and mm3dnow.h
    that reference the proper builtin functions.  Defining away "extern" and
    "__inline" results in all of them being compiled as proper functions.  */
 
@@ -17,6 +17,10 @@
 #define __builtin_ia32_extrqi(X, I, L)  __builtin_ia32_extrqi(X, 1, 1)
 #define __builtin_ia32_insertqi(X, Y, I, L) __builtin_ia32_insertqi(X, Y, 1, 1)
 
+/* wmmintrin.h */
+#define __builtin_ia32_aeskeygenassist128(X, C) __builtin_ia32_aeskeygenassist128(X, 1)
+#define __builtin_ia32_pclmulqdq128(X, Y, I) __builtin_ia32_pclmulqdq128(X, Y, 1)
+
 /* smmintrin.h */
 #define __builtin_ia32_pblendw128(X, Y, M) __builtin_ia32_pblendw128 (X, Y, 1)
 #define __builtin_ia32_blendps(X, Y, M) __builtin_ia32_blendps(X, Y, 1)
@@ -95,5 +99,5 @@
 #define __builtin_ia32_protqi(A, B) __builtin_ia32_protqi(A,1)
 
 #include <bmmintrin.h>
-#include <smmintrin.h>
+#include <wmmintrin.h>
 #include <mm3dnow.h>
--- gcc/testsuite/gcc.target/i386/sse-14.c.aes	2008-03-28 13:03:18.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/sse-14.c	2008-04-03 14:21:53.000000000 -0700
@@ -1,10 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse5 -maes -mclmul" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile without optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,a,b}mmintrin.h  and mm3dnow.h
+   defined as inline functions in {,x,e,p,t,s,w,a,b}mmintrin.h  and mm3dnow.h
    that reference the proper builtin functions.  Defining away "extern" and
    "__inline" results in all of them being compiled as proper functions.  */
 
@@ -12,7 +12,7 @@
 #define __inline
 
 #include <bmmintrin.h>
-#include <smmintrin.h>
+#include <wmmintrin.h>
 #include <mm3dnow.h>
 
 #define _CONCAT(x,y) x ## y
@@ -46,6 +46,10 @@
 test_1x (_mm_extracti_si64, __m128i, __m128i, 1, 1)
 test_2x (_mm_inserti_si64, __m128i, __m128i, __m128i, 1, 1)
 
+/* wmmintrin.h */
+test_1 (_mm_aeskeygenassist_si128, __m128i, __m128i, 1)
+test_2 (_mm_clmulepi64_si128, __m128i, __m128i, __m128i, 1)
+
 /* smmintrin.h */
 test_2 (_mm_blend_epi16, __m128i, __m128i, __m128i, 1)
 test_2 (_mm_blend_ps, __m128, __m128, __m128, 1)


* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-03 18:53 PATCH: Enable Intel AES/CLMUL Uros Bizjak
@ 2008-04-03 22:00 ` H.J. Lu
  2008-04-03 23:17   ` H.J. Lu
  2008-04-04  6:35   ` Uros Bizjak
  0 siblings, 2 replies; 23+ messages in thread
From: H.J. Lu @ 2008-04-03 22:00 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches

On Thu, Apr 03, 2008 at 07:41:49PM +0200, Uros Bizjak wrote:
> Hello!
>
>> 	* config/i386/i386.c (OPTION_MASK_ISA_AES_SET): New.
>> 	(OPTION_MASK_ISA_CLMUL_SET): Likewise.
>> 	(OPTION_MASK_ISA_AES_UNSET): Likewise.
>> 	(OPTION_MASK_ISA_CLMUL_UNSET): Likewise.
>> 	(OPTION_MASK_ISA_SSE4_2_UNSET): Add OPTION_MASK_ISA_AES_UNSET
>> 	and OPTION_MASK_ISA_CLMUL_UNSET.
>
> I don't think that MASK_ISA is the correct approach here, since MASK_ISA is
> reserved for the different SSE levels that switch a whole pack of
> instructions on or off. Perhaps we should go with:
>
> i386.h:
> #define TARGET_AES        x86_aes
>
> where
>
> i386.opt:
> maes
> Target Report RejectNegative Var(x86_aes)
>
> We can even #define TARGET_AES  (TARGET_SSE4_2 && x86_aes), depending on
> how we want to enable generation of AES instructions. Please also note
> that we will use PTA_* flags in the future for certain targets and we can
> switch this flag depending on target features. So:
>
> gcc -msse -maes
>
> or
>
> gcc -mfancy_future_target
>
> will both switch AES instruction support on.
>
> Regarding the tests, all new *intrin.h headers should also be added to
> gcc.target/i386/sse-[13,14] to check that the intrinsics compile with and
> without optimization (I would also recommend adding the header to
> g++.dg/other/i386-3.C).
>
>
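
As a rough illustration of the gating described above, the following is a
minimal sketch in plain C of how the option variable and the SSE4.2 state
might combine. The struct and function names are hypothetical stand-ins,
not the real i386.c variables, and the logic is only one reading of the
override_options hunk in the patch below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the option state; these are not the real
   i386.c variables.  */
struct isa_state
{
  bool aes;              /* models x86_aes (-maes or a PTA_AES target) */
  bool clmul;            /* models x86_clmul (-mclmul or a PTA_CLMUL target) */
  bool sse4_2;           /* models the SSE4.2 bit in ix86_isa_flags */
  bool sse4_2_explicit;  /* models the same bit in ix86_isa_flags_explicit */
};

/* Turn on SSE4.2 when AES or CLMUL is requested, unless an explicit
   SSE4.2 choice (either way) has already been recorded.  */
static void
resolve_isa (struct isa_state *s)
{
  if ((s->aes || s->clmul) && !s->sse4_2_explicit)
    {
      s->sse4_2 = true;
      s->sse4_2_explicit = true;
    }
}

/* Model of TARGET_AES: (TARGET_SSE4_2 && x86_aes).  */
static bool
target_aes (const struct isa_state *s)
{
  return s->sse4_2 && s->aes;
}

/* Model of TARGET_CLMUL: (TARGET_SSE4_2 && x86_clmul).  */
static bool
target_clmul (const struct isa_state *s)
{
  return s->sse4_2 && s->clmul;
}

/* Self-check covering the two interesting cases.  */
static void
isa_selfcheck (void)
{
  struct isa_state plain = { true, false, false, false };    /* -maes */
  struct isa_state no_sse42 = { true, false, false, true };  /* -mno-sse4.2 -maes */

  resolve_isa (&plain);
  resolve_isa (&no_sse42);

  /* -maes alone pulls in SSE4.2, so AES is usable; CLMUL is not.  */
  assert (target_aes (&plain) && !target_clmul (&plain));
  /* An explicit SSE4.2 choice is respected, so AES stays off.  */
  assert (!target_aes (&no_sse42));
}
```

In this model, requesting AES alone is enough to make target_aes () true,
while an explicit request to disable SSE4.2 keeps it false.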

Here is the updated patch. OK to install?
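
For readers unfamiliar with the CLMUL operation, here is a small portable
reference model of a 64x64 -> 128-bit carry-less multiply, plus the
half-selection done by the immediate (bit 0 picks the half of the first
operand, bit 4 the half of the second). It is only a sketch: struct u128,
clmul64 and clmul_select are made-up names for illustration, and nothing
here is part of the patch or of wmmintrin.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves.  */
struct u128
{
  uint64_t lo;
  uint64_t hi;
};

/* Carry-less multiply: like schoolbook binary multiplication, but the
   partial products are combined with XOR instead of addition.  */
static struct u128
clmul64 (uint64_t a, uint64_t b)
{
  struct u128 r = { 0, 0 };
  int i;

  for (i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
	r.lo ^= a << i;
	/* Avoid an undefined shift by 64 when i == 0.  */
	if (i > 0)
	  r.hi ^= a >> (64 - i);
      }
  return r;
}

/* Select one 64-bit half from each input according to the immediate,
   then multiply the selected halves.  */
static struct u128
clmul_select (struct u128 x, struct u128 y, int imm)
{
  uint64_t a = (imm & 0x01) ? x.hi : x.lo;
  uint64_t b = (imm & 0x10) ? y.hi : y.lo;

  return clmul64 (a, b);
}

/* Self-check: (x + 1) * (x + 1) = x^2 + 1 over GF(2), i.e. 3 * 3 = 5.  */
static void
clmul_selfcheck (void)
{
  struct u128 r = clmul64 (3, 3);

  assert (r.lo == 5 && r.hi == 0);
}
```

A model like this could serve as a cross-check for the expected vectors in
a run-time test, under the assumption that the immediate encoding above
matches the instruction specification being implemented.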

Thanks.


H.J.
-----
gcc/

2008-04-03  H.J. Lu  <hongjiu.lu@intel.com>

	* config.gcc (extra_headers): Add wmmintrin.h for x86 and x86-64.

	* config/i386/cpuid.h (bit_AES): New.
	(bit_CLMUL): Likewise.

	* config/i386/i386.c (pta_flags): Add PTA_AES and PTA_CLMUL.
	(override_options): Handle PTA_AES and PTA_CLMUL.  Enable
	SSE 4.2 if AES or CLMUL is enabled.
	(ix86_builtins): Add IX86_BUILTIN_AESENC128,
	IX86_BUILTIN_AESENCLAST128, IX86_BUILTIN_AESDEC128,
	IX86_BUILTIN_AESDECLAST128, IX86_BUILTIN_AESIMC128,
	IX86_BUILTIN_AESKEYGENASSIST128 and IX86_BUILTIN_PCLMULQDQ128.
	(bdesc_sse_3arg): Add IX86_BUILTIN_PCLMULQDQ128.
	(bdesc_2arg): Add IX86_BUILTIN_AESENC128,
	IX86_BUILTIN_AESENCLAST128, IX86_BUILTIN_AESDEC128,
	IX86_BUILTIN_AESDECLAST128 and IX86_BUILTIN_AESKEYGENASSIST128.
	(bdesc_1arg): Add IX86_BUILTIN_AESIMC128.
	(ix86_init_mmx_sse_builtins): Define __builtin_ia32_aesenc128,
	__builtin_ia32_aesenclast128, __builtin_ia32_aesdec128,
	__builtin_ia32_aesdeclast128,__builtin_ia32_aesimc128,
	__builtin_ia32_aeskeygenassist128 and
	__builtin_ia32_pclmulqdq128.
	* config/i386/i386.c (ix86_expand_binop_imm_builtin): New.
	(ix86_expand_builtin): Use it for IX86_BUILTIN_PSLLDQI128 and
	IX86_BUILTIN_PSRLDQI128.  Handle IX86_BUILTIN_AESKEYGENASSIST128.

	* config/i386/i386.h (TARGET_AES): New.
	(TARGET_CLMUL): Likewise.
	(TARGET_CPU_CPP_BUILTINS): Handle TARGET_AES and TARGET_CLMUL.

	* config/i386/i386.md (UNSPEC_AESENC): New.
	(UNSPEC_AESENCLAST): Likewise.
	(UNSPEC_AESDEC): Likewise.
	(UNSPEC_AESDECLAST): Likewise.
	(UNSPEC_AESIMC): Likewise.
	(UNSPEC_AESKEYGENASSIST): Likewise.
	(UNSPEC_CLMUL): Likewise.

	* config/i386/i386.opt (maes): New.
	(mclmul): Likewise.

	* config/i386/sse.md (aesenc): New pattern.
	(aesenclast): Likewise.
	(aesdec): Likewise.
	(aesdeclast): Likewise.
	(aesimc): Likewise.
	(aeskeygenassist): Likewise.
	(pclmulqdq): Likewise.

	* config/i386/wmmintrin.h: New.

	* doc/extend.texi: Document AES and CLMUL built-in functions.

	* doc/invoke.texi: Document -maes and -mclmul.

gcc/testsuite/

2008-04-03  H.J. Lu  <hongjiu.lu@intel.com>

	* g++.dg/other/i386-2.C: Include <wmmintrin.h> instead of
	<smmintrin.h>.
	* g++.dg/other/i386-3.C: Likewise.
	* gcc.target/i386/sse-13.c: Likewise.
	* gcc.target/i386/sse-14.c: Likewise.

	* gcc.target/i386/aes-check.h: New.
	* gcc.target/i386/aesdec.c: Likewise.
	* gcc.target/i386/aesdeclast.c: Likewise.
	* gcc.target/i386/aesenc.c: Likewise.
	* gcc.target/i386/aesenclast.c: Likewise.
	* gcc.target/i386/aesimc.c: Likewise.
	* gcc.target/i386/aeskeygenassist.c: Likewise.
	* gcc.target/i386/pclmulqdq.c: Likewise.
	* gcc.target/i386/clmul-check.h: Likewise.

	* gcc.target/i386/i386.exp (check_effective_target_aes): New.
	(check_effective_target_clmul): Likewise.

--- gcc/config.gcc.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/config.gcc	2008-04-03 12:20:45.000000000 -0700
@@ -309,13 +309,15 @@ i[34567]86-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	need_64bit_hwint=yes
 	;;
 ia64-*-*)
--- gcc/config/i386/cpuid.h.aes	2008-02-25 09:57:37.000000000 -0800
+++ gcc/config/i386/cpuid.h	2008-04-03 12:20:45.000000000 -0700
@@ -33,11 +33,13 @@
 
 /* %ecx */
 #define bit_SSE3	(1 << 0)
+#define bit_CLMUL	(1 << 1)
 #define bit_SSSE3	(1 << 9)
 #define bit_CMPXCHG16B	(1 << 13)
 #define bit_SSE4_1	(1 << 19)
 #define bit_SSE4_2	(1 << 20)
 #define bit_POPCNT	(1 << 23)
+#define bit_AES		(1 << 25)
 
 /* %edx */
 #define bit_CMPXCHG8B	(1 << 8)
--- gcc/config/i386/i386.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/config/i386/i386.c	2008-04-03 14:35:19.000000000 -0700
@@ -2078,7 +2078,9 @@ override_options (void)
       PTA_NO_SAHF = 1 << 13,
       PTA_SSE4_1 = 1 << 14,
       PTA_SSE4_2 = 1 << 15,
-      PTA_SSE5 = 1 << 16
+      PTA_SSE5 = 1 << 16,
+      PTA_AES = 1 << 17,
+      PTA_CLMUL = 1 << 18
     };
 
   static struct pta
@@ -2385,6 +2387,10 @@ override_options (void)
 	  x86_prefetch_sse = true;
 	if (!(TARGET_64BIT && (processor_alias_table[i].flags & PTA_NO_SAHF)))
 	  x86_sahf = true;
+	if (processor_alias_table[i].flags & PTA_AES)
+	  x86_aes = true;
+	if (processor_alias_table[i].flags & PTA_CLMUL)
+	  x86_clmul = true;
 
 	break;
       }
@@ -2428,6 +2434,14 @@ override_options (void)
   if (i == pta_size)
     error ("bad value (%s) for -mtune= switch", ix86_tune_string);
 
+  /* Enable SSE 4.2 if AES or CLMUL is enabled.  */
+  if ((x86_aes || x86_clmul)
+      && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4_2))
+    {
+      ix86_isa_flags |= OPTION_MASK_ISA_SSE4_2_SET;
+      ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_2_SET;
+    }
+
   ix86_tune_mask = 1u << ix86_tune;
   for (i = 0; i < X86_TUNE_LAST; ++i)
     ix86_tune_features[i] &= ix86_tune_mask;
@@ -17626,6 +17640,17 @@ enum ix86_builtins
 
   IX86_BUILTIN_PCMPGTQ,
 
+  /* AES instructions */
+  IX86_BUILTIN_AESENC128,
+  IX86_BUILTIN_AESENCLAST128,
+  IX86_BUILTIN_AESDEC128,
+  IX86_BUILTIN_AESDECLAST128,
+  IX86_BUILTIN_AESIMC128,
+  IX86_BUILTIN_AESKEYGENASSIST128,
+
+  /* CLMUL instruction */
+  IX86_BUILTIN_PCLMULQDQ128,
+
   /* TFmode support builtins.  */
   IX86_BUILTIN_INFQ,
   IX86_BUILTIN_FABSQ,
@@ -17987,6 +18012,9 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, "__builtin_ia32_roundsd", IX86_BUILTIN_ROUNDSD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, "__builtin_ia32_roundss", IX86_BUILTIN_ROUNDSS, UNKNOWN, 0 },
+
+  /* CLMUL */
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_pclmulqdq, 0, IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_2arg[] =
@@ -18267,6 +18295,13 @@ static const struct builtin_description 
 
   /* SSE4.2 */
   { OPTION_MASK_ISA_SSE4_2, CODE_FOR_sse4_2_gtv2di3, "__builtin_ia32_pcmpgtq", IX86_BUILTIN_PCMPGTQ, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesenc, 0, IX86_BUILTIN_AESENC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesenclast, 0, IX86_BUILTIN_AESENCLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesdec, 0, IX86_BUILTIN_AESDEC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesdeclast, 0, IX86_BUILTIN_AESDECLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -18344,6 +18379,9 @@ static const struct builtin_description 
   /* Fake 1 arg builtins with a constant smaller than 8 bits as the 2nd arg.  */
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundpd, 0, IX86_BUILTIN_ROUNDPD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundps, 0, IX86_BUILTIN_ROUNDPS, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_SSE4_2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN, 0 },
 };
 
 /* SSE5 */
@@ -19580,6 +19618,25 @@ ix86_init_mmx_sse_builtins (void)
 				    NULL_TREE);
   def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_crc32di", ftype, IX86_BUILTIN_CRC32DI);
 
+  /* AES */
+  if (TARGET_AES)
+    {
+      /* Define AES built-in functions only if AES is enabled.  */
+      def_builtin (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_aesenc128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESENC128);
+      def_builtin (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_aesenclast128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESENCLAST128);
+      def_builtin (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_aesdec128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESDEC128);
+      def_builtin (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_aesdeclast128", v2di_ftype_v2di_v2di, IX86_BUILTIN_AESDECLAST128);
+      def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_aesimc128", v2di_ftype_v2di, IX86_BUILTIN_AESIMC128);
+      def_builtin_const (OPTION_MASK_ISA_SSE2, "__builtin_ia32_aeskeygenassist128", v2di_ftype_v2di_int, IX86_BUILTIN_AESKEYGENASSIST128);
+    }
+
+  /* CLMUL */
+  if (TARGET_CLMUL)
+    {
+      /* Define CLMUL built-in function only if CLMUL is enabled.  */
+      def_builtin (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_pclmulqdq128", v2di_ftype_v2di_v2di_int, IX86_BUILTIN_PCLMULQDQ128);
+    }
+
   /* AMDFAM10 SSE4A New built-ins  */
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntsd", void_ftype_pdouble_v2df, IX86_BUILTIN_MOVNTSD);
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntss", void_ftype_pfloat_v4sf, IX86_BUILTIN_MOVNTSS);
@@ -19860,6 +19917,44 @@ ix86_expand_crc32 (enum insn_code icode,
   return target;
 }
 
+/* Subroutine of ix86_expand_builtin to take care of binop insns
+   with an immediate.  */
+
+static rtx
+ix86_expand_binop_imm_builtin (enum insn_code icode, tree exp,
+				rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  enum machine_mode tmode = insn_data[icode].operand[0].mode;
+  enum machine_mode mode0 = insn_data[icode].operand[1].mode;
+  enum machine_mode mode1 = insn_data[icode].operand[2].mode;
+
+  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+    {
+      op0 = copy_to_reg (op0);
+      op0 = simplify_gen_subreg (mode0, op0, GET_MODE (op0), 0);
+    }
+
+  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+    {
+      error ("the last operand must be an immediate");
+      return const0_rtx;
+    }
+
+  target = gen_reg_rtx (V2DImode);
+  pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target,
+					      V2DImode, 0),
+			 op0, op1);
+  if (! pat)
+    return 0;
+  emit_insn (pat);
+  return target;
+}
+
 /* Subroutine of ix86_expand_builtin to take care of binop insns.  */
 
 static rtx
@@ -20952,34 +21047,18 @@ ix86_expand_builtin (tree exp, rtx targe
       return target;
 
     case IX86_BUILTIN_PSLLDQI128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_ashlti3,
+					     exp, target);
+      break;
+
     case IX86_BUILTIN_PSRLDQI128:
-      icode = (fcode == IX86_BUILTIN_PSLLDQI128 ? CODE_FOR_sse2_ashlti3
-	       : CODE_FOR_sse2_lshrti3);
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      op0 = expand_normal (arg0);
-      op1 = expand_normal (arg1);
-      tmode = insn_data[icode].operand[0].mode;
-      mode1 = insn_data[icode].operand[1].mode;
-      mode2 = insn_data[icode].operand[2].mode;
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_lshrti3,
+					     exp, target);
+      break;
 
-      if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
-	{
-	  op0 = copy_to_reg (op0);
-	  op0 = simplify_gen_subreg (mode1, op0, GET_MODE (op0), 0);
-	}
-      if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
-	{
-	  error ("shift must be an immediate");
-	  return const0_rtx;
-	}
-      target = gen_reg_rtx (V2DImode);
-      pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target, V2DImode, 0),
-			     op0, op1);
-      if (! pat)
-	return 0;
-      emit_insn (pat);
-      return target;
+    case IX86_BUILTIN_AESKEYGENASSIST128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_aeskeygenassist,
+					     exp, target);
 
     case IX86_BUILTIN_FEMMS:
       emit_insn (gen_mmx_femms ());
--- gcc/config/i386/i386.h.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/config/i386/i386.h	2008-04-03 13:13:28.000000000 -0700
@@ -395,6 +395,8 @@ extern int x86_prefetch_sse;
 #define TARGET_SAHF		x86_sahf
 #define TARGET_RECIP		x86_recip
 #define TARGET_FUSED_MADD	x86_fused_muladd
+#define TARGET_AES		(TARGET_SSE4_2 && x86_aes)
+#define TARGET_CLMUL		(TARGET_SSE4_2 && x86_clmul)
 
 #define ASSEMBLER_DIALECT	(ix86_asm_dialect)
 
@@ -683,6 +685,10 @@ extern const char *host_detect_local_cpu
 	builtin_define ("__SSE4_1__");				\
       if (TARGET_SSE4_2)					\
 	builtin_define ("__SSE4_2__");				\
+      if (TARGET_AES)						\
+	builtin_define ("__AES__");				\
+      if (TARGET_CLMUL)						\
+	builtin_define ("__CLMUL__");				\
       if (TARGET_SSE4A)						\
  	builtin_define ("__SSE4A__");		                \
       if (TARGET_SSE5)						\
--- gcc/config/i386/i386.md.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/config/i386/i386.md	2008-04-03 12:20:45.000000000 -0700
@@ -186,6 +186,17 @@
    (UNSPEC_FRCZ			156)
    (UNSPEC_CVTPH2PS		157)
    (UNSPEC_CVTPS2PH		158)
+
+   ; For AES support
+   (UNSPEC_AESENC		159)
+   (UNSPEC_AESENCLAST		160)
+   (UNSPEC_AESDEC		161)
+   (UNSPEC_AESDECLAST		162)
+   (UNSPEC_AESIMC		163)
+   (UNSPEC_AESKEYGENASSIST	164)
+
+   ; For CLMUL support
+   (UNSPEC_CLMUL		165)
   ])
 
 (define_constants
--- gcc/config/i386/i386.opt.aes	2007-09-12 21:44:32.000000000 -0700
+++ gcc/config/i386/i386.opt	2008-04-03 13:14:09.000000000 -0700
@@ -275,3 +275,11 @@ Target Report Var(x86_fused_muladd) Init
 Enable automatic generation of fused floating point multiply-add instructions
 if the ISA supports such instructions.  The -mfused-madd option is on by
 default.
+
+maes
+Target Report RejectNegative Var(x86_aes)
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES built-in functions and code generation
+
+mclmul
+Target Report RejectNegative Var(x86_clmul)
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and CLMUL built-in functions and code generation
--- gcc/config/i386/sse.md.aes	2008-04-03 11:49:09.000000000 -0700
+++ gcc/config/i386/sse.md	2008-04-03 12:20:45.000000000 -0700
@@ -7897,3 +7897,80 @@
 }
   [(set_attr "type" "ssecmp")
    (set_attr "mode" "TI")])
+
+(define_insn "aesenc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENC))]
+  "TARGET_AES"
+  "aesenc\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesenclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENCLAST))]
+  "TARGET_AES"
+  "aesenclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdec"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDEC))]
+  "TARGET_AES"
+  "aesdec\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdeclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDECLAST))]
+  "TARGET_AES"
+  "aesdeclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesimc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESIMC))]
+  "TARGET_AES"
+  "aesimc\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aeskeygenassist"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")
+		      (match_operand:SI 2 "const_0_to_255_operand" "n")]
+		     UNSPEC_AESKEYGENASSIST))]
+  "TARGET_AES"
+  "aeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "pclmulqdq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		      (match_operand:V2DI 2 "nonimmediate_operand" "xm")
+		      (match_operand:SI 3 "const_0_to_255_operand" "n")]
+		     UNSPEC_CLMUL))]
+  "TARGET_CLMUL"
+  "pclmulqdq\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
--- gcc/config/i386/wmmintrin.h.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/config/i386/wmmintrin.h	2008-04-03 14:39:31.000000000 -0700
@@ -0,0 +1,124 @@
+/* Copyright (C) 2008 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to
+   the Free Software Foundation, 59 Temple Place - Suite 330,
+   Boston, MA 02111-1307, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 10.1.  */
+
+#ifndef _WMMINTRIN_H_INCLUDED
+#define _WMMINTRIN_H_INCLUDED
+
+#if !defined (__AES__) && !defined (__CLMUL__)
+# error "AES/CLMUL instruction set not enabled"
+#else
+
+/* We need definitions from the SSE4, SSSE3, SSE3, SSE2 and SSE header
+   files.  */
+#include <smmintrin.h>
+
+/* AES */
+
+#ifdef __AES__
+/* Performs 1 round of AES decryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdec_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES decryption of the first m128i 
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdeclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X,
+						 (__v2di)__Y);
+}
+
+/* Performs 1 round of AES encryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenc_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES encryption of the first m128i
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the InverseMixColumn operation on the source m128i 
+   and stores the result into m128i destination.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesimc_si128 (__m128i __X)
+{
+  return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+}
+
+/* Generates a m128i round key for the input m128i AES cipher key and
+   byte round constant.  The second parameter must be a compile time
+   constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aeskeygenassist_si128 (__m128i __X, const int __C)
+{
+  return (__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)__X, __C);
+}
+#else
+#define _mm_aeskeygenassist_si128(X, C)					\
+  ((__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)(__m128i)(X),	\
+						(int)(C)))
+#endif
+#endif  /* __AES__ */
+
+/* CLMUL */
+
+#ifdef __CLMUL__
+/* Performs carry-less integer multiplication of 64-bit halves of
+   128-bit input operands.  The third parameter indicates which 64-bit
+   halves of the input parameters v1 and v2 should be used. It must be
+   a compile time constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
+{
+  return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+						(__v2di)__Y, __I);
+}
+#else
+#define _mm_clmulepi64_si128(X, Y, I)					\
+  ((__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)(__m128i)(X),		\
+					  (__v2di)(__m128i)(Y), (int)(I)))
+#endif
+#endif  /* __CLMUL__  */
+
+#endif /* __AES__/__CLMUL__ */
+
+#endif /* _WMMINTRIN_H_INCLUDED */
--- gcc/doc/extend.texi.aes	2008-03-28 13:03:17.000000000 -0700
+++ gcc/doc/extend.texi	2008-04-03 12:20:45.000000000 -0700
@@ -8013,6 +8013,27 @@ depending on the size of @code{unsigned 
 Generates the @code{popcntq} machine instruction.
 @end table
 
+The following built-in functions are available when @option{-maes} is
+used.  All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v2di __builtin_ia32_aesenc128 (v2di, v2di)
+v2di __builtin_ia32_aesenclast128 (v2di, v2di)
+v2di __builtin_ia32_aesdec128 (v2di, v2di)
+v2di __builtin_ia32_aesdeclast128 (v2di, v2di)
+v2di __builtin_ia32_aeskeygenassist128 (v2di, const int)
+v2di __builtin_ia32_aesimc128 (v2di)
+@end smallexample
+
+The following built-in function is available when @option{-mclmul} is
+used.
+
+@table @code
+@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)
+Generates the @code{pclmulqdq} machine instruction.
+@end table
+
 The following built-in functions are available when @option{-msse4a} is used.
 All of them generate the machine instruction that is part of the name.
 
--- gcc/doc/invoke.texi.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/doc/invoke.texi	2008-04-03 12:20:45.000000000 -0700
@@ -555,6 +555,7 @@ Objective-C and Objective-C++ Dialects}.
 -mno-wide-multiply  -mrtd  -malign-double @gol
 -mpreferred-stack-boundary=@var{num} -mcx16 -msahf -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
+-maes -mclmul @gol
 -msse4a -m3dnow -mpopcnt -mabm -msse5 @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -10720,6 +10721,10 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-sse4.2
 @itemx -msse4
 @itemx -mno-sse4
+@itemx -maes
+@itemx -mno-aes
+@itemx -mclmul
+@itemx -mno-clmul
 @itemx -msse4a
 @itemx -mno-sse4a
 @itemx -msse5
@@ -10737,8 +10742,8 @@ preferred alignment to @option{-mpreferr
 @opindex m3dnow
 @opindex mno-3dnow
 These switches enable or disable the use of instructions in the MMX,
-SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4A, SSE5, ABM or 3DNow!@: extended
-instruction sets.
+SSE, SSE2, SSE3, SSSE3, SSE4.1, AES, CLMUL, SSE4A, SSE5, ABM or
+3DNow!@: extended instruction sets.
 These extensions are also available as built-in functions: see
 @ref{X86 Built-in Functions}, for details of the functions enabled and
 disabled by these switches.
--- gcc/testsuite/g++.dg/other/i386-2.C.aes	2007-12-15 14:50:08.000000000 -0800
+++ gcc/testsuite/g++.dg/other/i386-2.C	2008-04-03 14:21:06.000000000 -0700
@@ -1,10 +1,10 @@
-/* Test that {,x,e,p,t,s,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
+/* Test that {,x,e,p,t,s,w,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
    usable with -O -pedantic-errors.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -msse5 -maes -mclmul" } */
 
 #include <bmmintrin.h>
-#include <smmintrin.h>
+#include <wmmintrin.h>
 #include <mm3dnow.h>
 
 int dummy;
--- gcc/testsuite/g++.dg/other/i386-3.C.aes	2008-03-17 06:44:39.000000000 -0700
+++ gcc/testsuite/g++.dg/other/i386-3.C	2008-04-03 14:21:19.000000000 -0700
@@ -1,8 +1,8 @@
-/* Test that {,x,e,p,t,s,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
+/* Test that {,x,e,p,t,s,w,a,b}mmintrin.h, mm3dnow.h and mm_malloc.h are
    usable with -O -fkeep-inline-functions.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -maes -mclmul -msse5" } */
 
 #include <bmmintrin.h>
-#include <smmintrin.h>
+#include <wmmintrin.h>
 #include <mm3dnow.h>
--- gcc/testsuite/gcc.target/i386/aes-check.h.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aes-check.h	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void aes_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AES test only if host has AES support.  */
+  if (ecx & bit_AES)
+    {
+      aes_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/aesdec.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesdec.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i]  = _mm_setr_epi32 (0xb730392a, 0xb58eb95e,
+			      0xfaea2787, 0x138ac342);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdec_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdec_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdec_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdec_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdec_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdec_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdec_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdec_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdec_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdec_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdec_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdec_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdec_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdec_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdec_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdec_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesdeclast.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesdeclast.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set of
+   input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x72a593d0, 0xd410637b,
+			     0x6b317f95, 0xc5a391ef);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdeclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdeclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdeclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdeclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdeclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdeclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdeclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdeclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdeclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdeclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdeclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdeclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdeclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdeclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdeclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdeclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesenc.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesenc.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0xded7e595, 0x8b104b58,
+			     0x9fdba3c5, 0xa8311c2f);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenc_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenc_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenc_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenc_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenc_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenc_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenc_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenc_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenc_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenc_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenc_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenc_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenc_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenc_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenc_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenc_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesenclast.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesenclast.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one
+   set of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x53fdc611, 0x177ec425,
+			     0x938c5964, 0xc7fb881e);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aesimc.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aesimc.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      d[i] = _mm_setr_epi32 (0x81c3b3e5, 0x2b18330a,
+			     0x44b109c8, 0x627a6f66);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesimc_si128 (src1[i]);
+      resdst[i + 1] = _mm_aesimc_si128 (src1[i + 1]);
+      resdst[i + 2] = _mm_aesimc_si128 (src1[i + 2]);
+      resdst[i + 3] = _mm_aesimc_si128 (src1[i + 3]);
+      resdst[i + 4] = _mm_aesimc_si128 (src1[i + 4]);
+      resdst[i + 5] = _mm_aesimc_si128 (src1[i + 5]);
+      resdst[i + 6] = _mm_aesimc_si128 (src1[i + 6]);
+      resdst[i + 7] = _mm_aesimc_si128 (src1[i + 7]);
+      resdst[i + 8] = _mm_aesimc_si128 (src1[i + 8]);
+      resdst[i + 9] = _mm_aesimc_si128 (src1[i + 9]);
+      resdst[i + 10] = _mm_aesimc_si128 (src1[i + 10]);
+      resdst[i + 11] = _mm_aesimc_si128 (src1[i + 11]);
+      resdst[i + 12] = _mm_aesimc_si128 (src1[i + 12]);
+      resdst[i + 13] = _mm_aesimc_si128 (src1[i + 13]);
+      resdst[i + 14] = _mm_aesimc_si128 (src1[i + 14]);
+      resdst[i + 15] = _mm_aesimc_si128 (src1[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/aeskeygenassist.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/aeskeygenassist.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+#define IMM8 1
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x16157e2b, 0xa6d2ae28,
+			      0x8815f7ab, 0x3c4fcf09);
+      d[i] = _mm_setr_epi32 (0x24b5e434, 0x3424b5e5,
+			     0xeb848a01, 0x01eb848b);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aeskeygenassist_si128 (src1[i], IMM8);
+      resdst[i + 1] = _mm_aeskeygenassist_si128 (src1[i + 1], IMM8);
+      resdst[i + 2] = _mm_aeskeygenassist_si128 (src1[i + 2], IMM8);
+      resdst[i + 3] = _mm_aeskeygenassist_si128 (src1[i + 3], IMM8);
+      resdst[i + 4] = _mm_aeskeygenassist_si128 (src1[i + 4], IMM8);
+      resdst[i + 5] = _mm_aeskeygenassist_si128 (src1[i + 5], IMM8);
+      resdst[i + 6] = _mm_aeskeygenassist_si128 (src1[i + 6], IMM8);
+      resdst[i + 7] = _mm_aeskeygenassist_si128 (src1[i + 7], IMM8);
+      resdst[i + 8] = _mm_aeskeygenassist_si128 (src1[i + 8], IMM8);
+      resdst[i + 9] = _mm_aeskeygenassist_si128 (src1[i + 9], IMM8);
+      resdst[i + 10] = _mm_aeskeygenassist_si128 (src1[i + 10], IMM8);
+      resdst[i + 11] = _mm_aeskeygenassist_si128 (src1[i + 11], IMM8);
+      resdst[i + 12] = _mm_aeskeygenassist_si128 (src1[i + 12], IMM8);
+      resdst[i + 13] = _mm_aeskeygenassist_si128 (src1[i + 13], IMM8);
+      resdst[i + 14] = _mm_aeskeygenassist_si128 (src1[i + 14], IMM8);
+      resdst[i + 15] = _mm_aeskeygenassist_si128 (src1[i + 15], IMM8);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/clmul-check.h.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/clmul-check.h	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void clmul_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run PCLMULQDQ test only if host has PCLMULQDQ support.  */
+  if (ecx & bit_CLMUL)
+    {
+      clmul_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/i386.exp.aes	2008-01-07 17:31:32.000000000 -0800
+++ gcc/testsuite/gcc.target/i386/i386.exp	2008-04-03 12:20:45.000000000 -0700
@@ -51,6 +51,34 @@ proc check_effective_target_sse4 { } {
     } "-O2 -msse4.1" ]
 }
 
+# Return 1 if aes instructions can be compiled.
+proc check_effective_target_aes { } {
+    return [check_no_compiler_messages aes object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i _mm_aesimc_si128 (__m128i __X)
+	{
+	    return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+	}
+    } "-O2 -maes" ]
+}
+
+# Return 1 if clmul instructions can be compiled.
+proc check_effective_target_clmul { } {
+    return [check_no_compiler_messages clmul object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i pclmulqdq_test (__m128i __X, __m128i __Y)
+	{
+	    return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+							  (__v2di)__Y,
+							  1);
+	}
+    } "-O2 -mclmul" ]
+}
+
 # Return 1 if sse4a instructions can be compiled.
 proc check_effective_target_sse4a { } {
     return [check_no_compiler_messages sse4a object {
--- gcc/testsuite/gcc.target/i386/pclmulqdq.c.aes	2008-04-03 12:20:45.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/pclmulqdq.c	2008-04-03 12:20:45.000000000 -0700
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+/* { dg-require-effective-target clmul } */
+/* { dg-options "-O2 -mclmul" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "clmul-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i s1[NUM];
+static __m128i s2[NUM];
+/* We need this array to generate the memory form of the instruction.  */
+static __m128i s2m[NUM];
+
+static __m128i e_00[NUM];
+static __m128i e_01[NUM];
+static __m128i e_10[NUM];
+static __m128i e_11[NUM];
+
+static __m128i d_00[NUM];
+static __m128i d_01[NUM];
+static __m128i d_10[NUM];
+static __m128i d_11[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *ls1,   __m128i *ls2, __m128i *le_00, __m128i *le_01,
+	   __m128i *le_10, __m128i *le_11)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      ls1[i] = _mm_set_epi32 (0x7B5B5465, 0x73745665,
+			      0x63746F72, 0x5D53475D);
+      ls2[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      s2m[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      le_00[i] = _mm_set_epi32 (0x1D4D84C8, 0x5C3440C0,
+				0x929633D5, 0xD36F0451);
+      le_01[i] = _mm_set_epi32 (0x1A2BF6DB, 0x3A30862F,
+				0xBABF262D, 0xF4B7D5C9);
+      le_10[i] = _mm_set_epi32 (0x1BD17C8D, 0x556AB5A1,
+				0x7FA540AC, 0x2A281315);
+      le_11[i] = _mm_set_epi32 (0x1D1E1F2C, 0x592E7C45,
+				0xD66EE03E, 0x410FD4ED);
+    }
+}
+
+static void
+clmul_test (void)
+{
+  int i;
+
+  init_data (s1, s2, e_00, e_01, e_10, e_11);
+
+  for (i = 0; i < NUM; i += 2)
+    {
+      d_00[i] = _mm_clmulepi64_si128 (s1[i], s2m[i], 0x00);
+      d_01[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x01);
+      d_10[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x10);
+      d_11[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x11);
+
+      d_11[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x11);
+      d_00[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x00);
+      d_10[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2m[i + 1], 0x10);
+      d_01[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x01);
+    }
+
+  for (i = 0; i < NUM; i++)
+    {
+      if (memcmp (d_00 + i, e_00 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_01 + i, e_01 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_10 + i, e_10 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_11 + i, e_11 + i, sizeof (__m128i)))
+	abort ();
+    }
+}
--- gcc/testsuite/gcc.target/i386/sse-13.c.aes	2008-03-28 13:03:18.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/sse-13.c	2008-04-03 14:21:42.000000000 -0700
@@ -1,10 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse5 -maes -mclmul" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile with optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,a,b}mmintrin.h and mm3dnow.h
+   defined as inline functions in {,x,e,p,t,s,w,a,b}mmintrin.h and mm3dnow.h
    that reference the proper builtin functions.  Defining away "extern" and
    "__inline" results in all of them being compiled as proper functions.  */
 
@@ -17,6 +17,10 @@
 #define __builtin_ia32_extrqi(X, I, L)  __builtin_ia32_extrqi(X, 1, 1)
 #define __builtin_ia32_insertqi(X, Y, I, L) __builtin_ia32_insertqi(X, Y, 1, 1)
 
+/* wmmintrin.h */
+#define __builtin_ia32_aeskeygenassist128(X, C) __builtin_ia32_aeskeygenassist128(X, 1)
+#define __builtin_ia32_pclmulqdq128(X, Y, I) __builtin_ia32_pclmulqdq128(X, Y, 1)
+
 /* smmintrin.h */
 #define __builtin_ia32_pblendw128(X, Y, M) __builtin_ia32_pblendw128 (X, Y, 1)
 #define __builtin_ia32_blendps(X, Y, M) __builtin_ia32_blendps(X, Y, 1)
@@ -95,5 +99,5 @@
 #define __builtin_ia32_protqi(A, B) __builtin_ia32_protqi(A,1)
 
 #include <bmmintrin.h>
-#include <smmintrin.h>
+#include <wmmintrin.h>
 #include <mm3dnow.h>
--- gcc/testsuite/gcc.target/i386/sse-14.c.aes	2008-03-28 13:03:18.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/sse-14.c	2008-04-03 14:21:53.000000000 -0700
@@ -1,10 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse4 -msse5" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -msse5 -maes -mclmul" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile without optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,a,b}mmintrin.h  and mm3dnow.h
+   defined as inline functions in {,x,e,p,t,s,w,a,b}mmintrin.h  and mm3dnow.h
    that reference the proper builtin functions.  Defining away "extern" and
    "__inline" results in all of them being compiled as proper functions.  */
 
@@ -12,7 +12,7 @@
 #define __inline
 
 #include <bmmintrin.h>
-#include <smmintrin.h>
+#include <wmmintrin.h>
 #include <mm3dnow.h>
 
 #define _CONCAT(x,y) x ## y
@@ -46,6 +46,10 @@
 test_1x (_mm_extracti_si64, __m128i, __m128i, 1, 1)
 test_2x (_mm_inserti_si64, __m128i, __m128i, __m128i, 1, 1)
 
+/* wmmintrin.h */
+test_1 (_mm_aeskeygenassist_si128, __m128i, __m128i, 1)
+test_2 (_mm_clmulepi64_si128, __m128i, __m128i, __m128i, 1)
+
 /* smmintrin.h */
 test_2 (_mm_blend_epi16, __m128i, __m128i, __m128i, 1)
 test_2 (_mm_blend_ps, __m128, __m128, __m128, 1)

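For anyone checking the pclmulqdq.c vectors by hand: a minimal portable
sketch of the 64x64 -> 128 bit carry-less multiply that PCLMULQDQ performs
on the two selected quadwords.  The names clmul64 and struct clmul128 are
hypothetical helpers for illustration, not part of the patch or of
wmmintrin.h, and this is a bit-at-a-time reference, not how the hardware
or the builtin works.

```c
#include <stdint.h>

/* Hypothetical helper type: the 128-bit product as two 64-bit halves.  */
struct clmul128
{
  uint64_t lo;
  uint64_t hi;
};

/* Carry-less (polynomial over GF(2)) multiply of A and B.  For every set
   bit I of B, XOR in A shifted left by I; XOR replaces addition, so no
   carries propagate between bit positions.  */
static struct clmul128
clmul64 (uint64_t a, uint64_t b)
{
  struct clmul128 r = { 0, 0 };
  int i;

  for (i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
	r.lo ^= a << i;
	/* Bits shifted out of the low half land in the high half.
	   Skip I == 0, where the shift count would be 64.  */
	if (i > 0)
	  r.hi ^= a >> (64 - i);
      }

  return r;
}
```

With imm8 = 0x00 the instruction multiplies the low quadwords of both
operands, so clmul64 (0x63746F725D53475D, 0x5B477565726F6E5D) should
reproduce the e_00 vector from the test (high half 0x1D4D84C85C3440C0,
low half 0x929633D5D36F0451).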

* Re: PATCH: Enable Intel AES/CLMUL
@ 2008-04-03 18:53 Uros Bizjak
  2008-04-03 22:00 ` H.J. Lu
  0 siblings, 1 reply; 23+ messages in thread
From: Uros Bizjak @ 2008-04-03 18:53 UTC (permalink / raw)
  To: GCC Patches; +Cc: H.J. Lu

Hello!

> 	* config/i386/i386.c (OPTION_MASK_ISA_AES_SET): New.
> 	(OPTION_MASK_ISA_CLMUL_SET): Likewise.
> 	(OPTION_MASK_ISA_AES_UNSET): Likewise.
> 	(OPTION_MASK_ISA_CLMUL_UNSET): Likewise.
> 	(OPTION_MASK_ISA_SSE4_2_UNSET): Add OPTION_MASK_ISA_AES_UNSET
> 	and OPTION_MASK_ISA_CLMUL_UNSET.

I don't think that MASK_ISA is the correct approach here, since MASK_ISA is 
reserved for the different SSE levels, each of which switches a whole pack 
of instructions on or off. Perhaps we should go with:

i386.h:
#define TARGET_CLMUL      x86_aes

where

i386.opt:
maes
Target Report RejectNegative Var(x86_aes)

We can even #define TARGET_CLMUL  (TARGET_SSE4_2 && x86_aes), depending 
on how we want to enable generation of AES instructions. Please also 
note that we will use PTA_* flags in the future for certain targets, and 
we can switch this flag depending on target features. So:

gcc -msse -maes

or

gcc -mfancy_future_target

will both switch AES instruction support on.
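
To make the gating idea concrete, here is a minimal standalone sketch,
assuming the flag-variable approach described above.  The variables
isa_sse4_2 and x86_aes and the helper aes_insns_enabled are hypothetical
stand-ins for compiler-internal state; in GCC they would be set by option
handling, not by hand.  The macro is called TARGET_AES here purely for
illustration.

```c
/* Hypothetical stand-ins for state that option processing would set.  */
static int isa_sse4_2;	/* Models -msse4.2 (or a PTA_* flag) being active.  */
static int x86_aes;	/* Models Var(x86_aes), set by -maes.  */

#define TARGET_SSE4_2 (isa_sse4_2 != 0)

/* AES instructions are generated only when both the base ISA level
   and the separate AES flag are enabled.  */
#define TARGET_AES (TARGET_SSE4_2 && x86_aes != 0)

/* Hypothetical helper: set both flags, then evaluate the gate.  */
static int
aes_insns_enabled (int have_sse4_2, int have_maes)
{
  isa_sse4_2 = have_sse4_2;
  x86_aes = have_maes;
  return TARGET_AES;
}
```

In this model -maes alone is not enough: aes_insns_enabled (0, 1) is
false, and the gate opens only once the SSE4.2 level is enabled as well.
Whether that coupling is wanted is exactly the open question above.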

Regarding the tests, all new *intrin.h headers should also be added to 
gcc.target/i386/sse-[13,14] to check that the intrinsics compile with and 
without optimization (I would also recommend adding the header to 
g++.dg/other/i386-3.C).


Uros.


end of thread, other threads:[~2008-04-06 21:05 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-03 14:31 PATCH: Enable Intel AES/CLMUL H.J. Lu
2008-04-03 16:21 ` Daniel Berlin
2008-04-03 16:23   ` H.J. Lu
     [not found] ` <alpine.LSU.1.00.0804062013110.22304@acrux.dbai.tuwien.ac.at>
2008-04-06 21:06   ` PATCH: Mention Intel AES/PCLMUL H.J. Lu
2008-04-06 21:17     ` Gerald Pfeifer
2008-04-03 18:53 PATCH: Enable Intel AES/CLMUL Uros Bizjak
2008-04-03 22:00 ` H.J. Lu
2008-04-03 23:17   ` H.J. Lu
2008-04-04  6:35   ` Uros Bizjak
2008-04-04 12:53     ` H.J. Lu
2008-04-04 13:29       ` Uros Bizjak
2008-04-04 13:32         ` H.J. Lu
2008-04-04 13:56           ` Uros Bizjak
2008-04-04 14:08             ` Uros Bizjak
2008-04-04 14:51             ` H.J. Lu
2008-04-04 14:56               ` Uros Bizjak
2008-04-04 15:00                 ` Jakub Jelinek
2008-04-04 15:58                   ` H.J. Lu
2008-04-04 16:33                   ` Uros Bizjak
2008-04-04 15:31                 ` H.J. Lu
2008-04-04 16:08                   ` Uros Bizjak
2008-04-04 20:27           ` Michael Meissner
2008-04-04 20:43             ` H.J. Lu
