public inbox for gcc-patches@gcc.gnu.org
* PATCH: Enable Intel AES/CLMUL
@ 2008-04-03 14:31 H.J. Lu
From: H.J. Lu @ 2008-04-03 14:31 UTC
  To: gcc-patches; +Cc: ubizjak

Hi,

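This patch adds support for the Intel AES and CLMUL (PCLMULQDQ)
instructions, with the new -maes and -mclmul options, built-in
functions and the wmmintrin.h intrinsics header: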

http://softwareprojects.intel.com/avx/
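
For readers unfamiliar with PCLMULQDQ, here is a minimal software sketch
of what _mm_clmulepi64_si128 is expected to compute.  It is only a
model, not code from the patch: struct u128 and the *_model names are
hypothetical, and the immediate decoding (bit 0 selects the half of the
first operand, bit 4 the half of the second) is my reading of the test
vectors in pclmulqdq.c.

```c
#include <stdint.h>

/* Hypothetical 128-bit value type for this sketch; not part of the patch.  */
struct u128
{
  uint64_t hi;
  uint64_t lo;
};

/* Software model of a 64x64 -> 128 bit carry-less multiply: the partial
   products are combined with XOR instead of addition.  */
struct u128
clmul64_model (uint64_t a, uint64_t b)
{
  struct u128 r = { 0, 0 };
  int i;

  for (i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
	r.lo ^= a << i;
	/* Shifting by 64 is undefined in C, so skip the i == 0 case.  */
	if (i != 0)
	  r.hi ^= a >> (64 - i);
      }

  return r;
}

/* Model of the immediate operand (assumption, see above): bit 0 picks
   the 64-bit half of X, bit 4 picks the 64-bit half of Y.  */
struct u128
pclmulqdq_model (struct u128 x, struct u128 y, int imm8)
{
  uint64_t a = (imm8 & 0x01) ? x.hi : x.lo;
  uint64_t b = (imm8 & 0x10) ? y.hi : y.lo;

  return clmul64_model (a, b);
}

/* Return 1 if the model reproduces the four expected vectors that
   pclmulqdq.c in this patch checks, 0 otherwise.  */
int
pclmulqdq_model_self_check (void)
{
  const struct u128 x = { 0x7B5B546573745665ULL, 0x63746F725D53475DULL };
  const struct u128 y = { 0x4869285368617929ULL, 0x5B477565726F6E5DULL };
  static const struct
  {
    int imm8;
    struct u128 want;
  } v[] = {
    { 0x00, { 0x1D4D84C85C3440C0ULL, 0x929633D5D36F0451ULL } },
    { 0x01, { 0x1A2BF6DB3A30862FULL, 0xBABF262DF4B7D5C9ULL } },
    { 0x10, { 0x1BD17C8D556AB5A1ULL, 0x7FA540AC2A281315ULL } },
    { 0x11, { 0x1D1E1F2C592E7C45ULL, 0xD66EE03E410FD4EDULL } }
  };
  unsigned int i;

  for (i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    {
      struct u128 got = pclmulqdq_model (x, y, v[i].imm8);

      if (got.hi != v[i].want.hi || got.lo != v[i].want.lo)
	return 0;
    }

  return 1;
}
```

The model needs neither -mclmul nor hardware support, so it can be used
to cross-check expected values on any host.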

OK for mainline?
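
The run-time tests only execute the new instructions when CPUID reports
them.  A minimal sketch of that check follows, assuming the leaf 1 %ecx
bit positions this patch adds to cpuid.h; the SKETCH_ and sketch_ names
are hypothetical and exist only to keep the example self-contained.

```c
#include <stdio.h>

#if defined (__i386__) || defined (__x86_64__)
# include <cpuid.h>
#endif

/* CPUID leaf 1 %ecx bits, same positions as bit_CLMUL and bit_AES in
   this patch.  Hypothetical names, so they cannot clash with cpuid.h.  */
#define SKETCH_BIT_CLMUL	(1u << 1)
#define SKETCH_BIT_AES		(1u << 25)

/* Return 1 if the given leaf 1 %ecx value reports PCLMULQDQ support.  */
int
sketch_has_clmul (unsigned int ecx)
{
  return (ecx & SKETCH_BIT_CLMUL) != 0;
}

/* Return 1 if the given leaf 1 %ecx value reports AES support.  */
int
sketch_has_aes (unsigned int ecx)
{
  return (ecx & SKETCH_BIT_AES) != 0;
}

/* Return CPUID leaf 1 %ecx, or 0 if it cannot be read (non-x86 host or
   CPUID leaf 1 unavailable).  */
unsigned int
sketch_cpuid_leaf1_ecx (void)
{
#if defined (__i386__) || defined (__x86_64__)
  unsigned int eax, ebx, ecx, edx;

  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return ecx;
#endif
  return 0;
}

/* Print which of the two features the host reports.  */
void
sketch_report (void)
{
  unsigned int ecx = sketch_cpuid_leaf1_ecx ();

  printf ("AES: %s, CLMUL: %s\n",
	  sketch_has_aes (ecx) ? "yes" : "no",
	  sketch_has_clmul (ecx) ? "yes" : "no");
}
```

Keeping the bit tests separate from the CPUID query lets them be
checked on any host, including ones without either feature.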

Thanks.

H.J.
----
gcc/

2008-04-03  H.J. Lu  <hongjiu.lu@intel.com>

	* config.gcc (extra_headers): Add wmmintrin.h for x86 and x86-64.

	* config/i386/cpuid.h (bit_AES): New.
	(bit_CLMUL): Likewise.

	* config/i386/i386.c (OPTION_MASK_ISA_AES_SET): New.
	(OPTION_MASK_ISA_CLMUL_SET): Likewise.
	(OPTION_MASK_ISA_AES_UNSET): Likewise.
	(OPTION_MASK_ISA_CLMUL_UNSET): Likewise.
	(OPTION_MASK_ISA_SSE4_2_UNSET): Add OPTION_MASK_ISA_AES_UNSET
	and OPTION_MASK_ISA_CLMUL_UNSET.
	(ix86_handle_option): Handle OPT_maes and OPT_mclmul.
	(pta_flags): Add PTA_AES and PTA_PCLMULQDQ.
	(override_options): Handle PTA_AES and PTA_PCLMULQDQ.
	(ix86_builtins): Add IX86_BUILTIN_AESENC128,
	IX86_BUILTIN_AESENCLAST128, IX86_BUILTIN_AESDEC128,
	IX86_BUILTIN_AESDECLAST128, IX86_BUILTIN_AESIMC128,
	IX86_BUILTIN_AESKEYGENASSIST128 and IX86_BUILTIN_PCLMULQDQ128.
	(bdesc_sse_3arg): Add __builtin_ia32_pclmulqdq128.
	(bdesc_2arg): Add __builtin_ia32_aesenc128,
	__builtin_ia32_aesenclast128, __builtin_ia32_aesdec128,
	__builtin_ia32_aesdeclast128, __builtin_ia32_aesimc128 and
	IX86_BUILTIN_AESKEYGENASSIST128.
	(bdesc_1arg): Add __builtin_ia32_aesimc128.
	(ix86_init_mmx_sse_builtins): Handle V2DImode for bdesc_1arg.
	Define __builtin_ia32_aeskeygenassist128.
	* config/i386/i386.c (ix86_expand_binop_imm_builtin): New.
	(ix86_expand_builtin): Use it for IX86_BUILTIN_PSLLDQI128 and
	IX86_BUILTIN_PSRLDQI128.  Handle IX86_BUILTIN_AESKEYGENASSIST128.

	* config/i386/i386.h (TARGET_AES): New.
	(TARGET_CLMUL): Likewise.
	(TARGET_CPU_CPP_BUILTINS): Handle TARGET_AES and TARGET_CLMUL.

	* config/i386/i386.md (UNSPEC_AESENC): New.
	(UNSPEC_AESENCLAST): Likewise.
	(UNSPEC_AESDEC): Likewise.
	(UNSPEC_AESDECLAST): Likewise.
	(UNSPEC_AESIMC): Likewise.
	(UNSPEC_AESKEYGENASSIST): Likewise.
	(UNSPEC_CLMUL): Likewise.

	* config/i386/i386.opt (maes): New.
	(mclmul): Likewise.

	* config/i386/sse.md (aesenc): New pattern.
	(aesenclast): Likewise.
	(aesdec): Likewise.
	(aesdeclast): Likewise.
	(aesimc): Likewise.
	(aeskeygenassist): Likewise.
	(pclmulqdq): Likewise.

	* config/i386/wmmintrin.h: New.

	* doc/extend.texi: Document AES and CLMUL built-in functions.

	* doc/invoke.texi: Document -maes and -mclmul.

gcc/testsuite/

2008-04-03  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/aes-check.h: New.
	* gcc.target/i386/aesdec.c: Likewise.
	* gcc.target/i386/aesdeclast.c: Likewise.
	* gcc.target/i386/aesenc.c: Likewise.
	* gcc.target/i386/aesenclast.c: Likewise.
	* gcc.target/i386/aesimc.c: Likewise.
	* gcc.target/i386/aeskeygenassist.c: Likewise.
	* gcc.target/i386/pclmulqdq.c: Likewise.
	* gcc.target/i386/clmul-check.h: Likewise.

	* gcc.target/i386/i386.exp (check_effective_target_aes): New.
	(check_effective_target_clmul): Likewise.

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(.../fsf/trunk)	(revision 2007)
+++ gcc/doc/extend.texi	(.../branches/aes)	(revision 2007)
@@ -8013,6 +8013,27 @@ depending on the size of @code{unsigned 
 Generates the @code{popcntq} machine instruction.
 @end table
 
+The following built-in functions are available when @option{-maes} is
+used.  All of them generate the machine instruction that is part of the
+name.
+
+@smallexample
+v2di __builtin_ia32_aesenc128 (v2di, v2di)
+v2di __builtin_ia32_aesenclast128 (v2di, v2di)
+v2di __builtin_ia32_aesdec128 (v2di, v2di)
+v2di __builtin_ia32_aesdeclast128 (v2di, v2di)
+v2di __builtin_ia32_aeskeygenassist128 (v2di, const int)
+v2di __builtin_ia32_aesimc128 (v2di)
+@end smallexample
+
+The following built-in function is available when @option{-mclmul} is
+used.
+
+@table @code
+@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int)
+Generates the @code{pclmulqdq} machine instruction.
+@end table
+
 The following built-in functions are available when @option{-msse4a} is used.
 All of them generate the machine instruction that is part of the name.
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(.../fsf/trunk)	(revision 2007)
+++ gcc/doc/invoke.texi	(.../branches/aes)	(revision 2007)
@@ -555,6 +555,7 @@ Objective-C and Objective-C++ Dialects}.
 -mno-wide-multiply  -mrtd  -malign-double @gol
 -mpreferred-stack-boundary=@var{num} -mcx16 -msahf -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
+-maes -mclmul @gol
 -msse4a -m3dnow -mpopcnt -mabm -msse5 @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -10720,6 +10721,10 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-sse4.2
 @itemx -msse4
 @itemx -mno-sse4
+@itemx -maes
+@itemx -mno-aes
+@itemx -mclmul
+@itemx -mno-clmul
 @itemx -msse4a
 @itemx -mno-sse4a
 @itemx -msse5
@@ -10737,8 +10742,8 @@ preferred alignment to @option{-mpreferr
 @opindex m3dnow
 @opindex mno-3dnow
 These switches enable or disable the use of instructions in the MMX,
-SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4A, SSE5, ABM or 3DNow!@: extended
-instruction sets.
+SSE, SSE2, SSE3, SSSE3, SSE4.1, AES, CLMUL, SSE4A, SSE5, ABM or
+3DNow!@: extended instruction sets.
 These extensions are also available as built-in functions: see
 @ref{X86 Built-in Functions}, for details of the functions enabled and
 disabled by these switches.
Index: gcc/testsuite/gcc.target/i386/i386.exp
===================================================================
--- gcc/testsuite/gcc.target/i386/i386.exp	(.../fsf/trunk)	(revision 2007)
+++ gcc/testsuite/gcc.target/i386/i386.exp	(.../branches/aes)	(revision 2007)
@@ -51,6 +51,34 @@ proc check_effective_target_sse4 { } {
     } "-O2 -msse4.1" ]
 }
 
+# Return 1 if aes instructions can be compiled.
+proc check_effective_target_aes { } {
+    return [check_no_compiler_messages aes object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i _mm_aesimc_si128 (__m128i __X)
+	{
+	    return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+	}
+    } "-O2 -maes" ]
+}
+
+# Return 1 if clmul instructions can be compiled.
+proc check_effective_target_clmul { } {
+    return [check_no_compiler_messages clmul object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+	typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+
+	__m128i pclmulqdq_test (__m128i __X, __m128i __Y)
+	{
+	    return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+							  (__v2di)__Y,
+							  1);
+	}
+    } "-O2 -mclmul" ]
+}
+
 # Return 1 if sse4a instructions can be compiled.
 proc check_effective_target_sse4a { } {
     return [check_no_compiler_messages sse4a object {
Index: gcc/testsuite/gcc.target/i386/aesdeclast.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aesdeclast.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aesdeclast.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set of
+   input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x72a593d0, 0xd410637b,
+			     0x6b317f95, 0xc5a391ef);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdeclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdeclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdeclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdeclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdeclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdeclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdeclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdeclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdeclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdeclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdeclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdeclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdeclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdeclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdeclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdeclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/testsuite/gcc.target/i386/pclmulqdq.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pclmulqdq.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/pclmulqdq.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+/* { dg-require-effective-target clmul } */
+/* { dg-options "-O2 -mclmul" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "clmul-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i s1[NUM];
+static __m128i s2[NUM];
+/* We need this array to generate the memory form of the instruction.  */
+static __m128i s2m[NUM];
+
+static __m128i e_00[NUM];
+static __m128i e_01[NUM];
+static __m128i e_10[NUM];
+static __m128i e_11[NUM];
+
+static __m128i d_00[NUM];
+static __m128i d_01[NUM];
+static __m128i d_10[NUM];
+static __m128i d_11[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *ls1, __m128i *ls2, __m128i *le_00, __m128i *le_01,
+	   __m128i *le_10, __m128i *le_11)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      ls1[i] = _mm_set_epi32 (0x7B5B5465, 0x73745665,
+			      0x63746F72, 0x5D53475D);
+      ls2[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      s2m[i] = _mm_set_epi32 (0x48692853, 0x68617929,
+			      0x5B477565, 0x726F6E5D);
+      le_00[i] = _mm_set_epi32 (0x1D4D84C8, 0x5C3440C0,
+				0x929633D5, 0xD36F0451);
+      le_01[i] = _mm_set_epi32 (0x1A2BF6DB, 0x3A30862F,
+				0xBABF262D, 0xF4B7D5C9);
+      le_10[i] = _mm_set_epi32 (0x1BD17C8D, 0x556AB5A1,
+				0x7FA540AC, 0x2A281315);
+      le_11[i] = _mm_set_epi32 (0x1D1E1F2C, 0x592E7C45,
+				0xD66EE03E, 0x410FD4ED);
+    }
+}
+
+static void
+clmul_test (void)
+{
+  int i;
+
+  init_data (s1, s2, e_00, e_01, e_10, e_11);
+
+  for (i = 0; i < NUM; i += 2)
+    {
+      d_00[i] = _mm_clmulepi64_si128 (s1[i], s2m[i], 0x00);
+      d_01[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x01);
+      d_10[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x10);
+      d_11[i] = _mm_clmulepi64_si128 (s1[i], s2[i], 0x11);
+
+      d_11[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x11);
+      d_00[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x00);
+      d_10[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2m[i + 1], 0x10);
+      d_01[i + 1] = _mm_clmulepi64_si128 (s1[i + 1], s2[i + 1], 0x01);
+    }
+
+  for (i = 0; i < NUM; i++)
+    {
+      if (memcmp (d_00 + i, e_00 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_01 + i, e_01 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_10 + i, e_10 + i, sizeof (__m128i)))
+	abort ();
+      if (memcmp (d_11 + i, e_11 + i, sizeof (__m128i)))
+	abort ();
+    }
+}

Property changes on: gcc/testsuite/gcc.target/i386/pclmulqdq.c
___________________________________________________________________
Name: svn:executable
   + *

Index: gcc/testsuite/gcc.target/i386/clmul-check.h
===================================================================
--- gcc/testsuite/gcc.target/i386/clmul-check.h	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/clmul-check.h	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void clmul_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run PCLMULQDQ test only if host has PCLMULQDQ support.  */
+  if (ecx & bit_CLMUL)
+    {
+      clmul_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/i386/aes-check.h
===================================================================
--- gcc/testsuite/gcc.target/i386/aes-check.h	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aes-check.h	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void aes_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run AES test only if host has AES support.  */
+  if (ecx & bit_AES)
+    {
+      aes_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/i386/aeskeygenassist.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aeskeygenassist.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aeskeygenassist.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+#define IMM8 1
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x16157e2b, 0xa6d2ae28,
+			      0x8815f7ab, 0x3c4fcf09);
+      d[i] = _mm_setr_epi32 (0x24b5e434, 0x3424b5e5,
+			     0xeb848a01, 0x01eb848b);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aeskeygenassist_si128 (src1[i], IMM8);
+      resdst[i + 1] = _mm_aeskeygenassist_si128 (src1[i + 1], IMM8);
+      resdst[i + 2] = _mm_aeskeygenassist_si128 (src1[i + 2], IMM8);
+      resdst[i + 3] = _mm_aeskeygenassist_si128 (src1[i + 3], IMM8);
+      resdst[i + 4] = _mm_aeskeygenassist_si128 (src1[i + 4], IMM8);
+      resdst[i + 5] = _mm_aeskeygenassist_si128 (src1[i + 5], IMM8);
+      resdst[i + 6] = _mm_aeskeygenassist_si128 (src1[i + 6], IMM8);
+      resdst[i + 7] = _mm_aeskeygenassist_si128 (src1[i + 7], IMM8);
+      resdst[i + 8] = _mm_aeskeygenassist_si128 (src1[i + 8], IMM8);
+      resdst[i + 9] = _mm_aeskeygenassist_si128 (src1[i + 9], IMM8);
+      resdst[i + 10] = _mm_aeskeygenassist_si128 (src1[i + 10], IMM8);
+      resdst[i + 11] = _mm_aeskeygenassist_si128 (src1[i + 11], IMM8);
+      resdst[i + 12] = _mm_aeskeygenassist_si128 (src1[i + 12], IMM8);
+      resdst[i + 13] = _mm_aeskeygenassist_si128 (src1[i + 13], IMM8);
+      resdst[i + 14] = _mm_aeskeygenassist_si128 (src1[i + 14], IMM8);
+      resdst[i + 15] = _mm_aeskeygenassist_si128 (src1[i + 15], IMM8);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/testsuite/gcc.target/i386/aesenclast.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aesenclast.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aesenclast.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one
+   set of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0x53fdc611, 0x177ec425,
+			     0x938c5964, 0xc7fb881e);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenclast_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenclast_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenclast_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenclast_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenclast_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenclast_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenclast_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenclast_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenclast_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenclast_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenclast_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenclast_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenclast_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenclast_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenclast_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenclast_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/testsuite/gcc.target/i386/aesimc.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aesimc.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aesimc.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *d)
+{
+  int i;
+
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      d[i] = _mm_setr_epi32 (0x81c3b3e5, 0x2b18330a,
+			     0x44b109c8, 0x627a6f66);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesimc_si128 (src1[i]);
+      resdst[i + 1] = _mm_aesimc_si128 (src1[i + 1]);
+      resdst[i + 2] = _mm_aesimc_si128 (src1[i + 2]);
+      resdst[i + 3] = _mm_aesimc_si128 (src1[i + 3]);
+      resdst[i + 4] = _mm_aesimc_si128 (src1[i + 4]);
+      resdst[i + 5] = _mm_aesimc_si128 (src1[i + 5]);
+      resdst[i + 6] = _mm_aesimc_si128 (src1[i + 6]);
+      resdst[i + 7] = _mm_aesimc_si128 (src1[i + 7]);
+      resdst[i + 8] = _mm_aesimc_si128 (src1[i + 8]);
+      resdst[i + 9] = _mm_aesimc_si128 (src1[i + 9]);
+      resdst[i + 10] = _mm_aesimc_si128 (src1[i + 10]);
+      resdst[i + 11] = _mm_aesimc_si128 (src1[i + 11]);
+      resdst[i + 12] = _mm_aesimc_si128 (src1[i + 12]);
+      resdst[i + 13] = _mm_aesimc_si128 (src1[i + 13]);
+      resdst[i + 14] = _mm_aesimc_si128 (src1[i + 14]);
+      resdst[i + 15] = _mm_aesimc_si128 (src1[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/testsuite/gcc.target/i386/aesenc.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aesenc.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aesenc.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i] = _mm_setr_epi32 (0xded7e595, 0x8b104b58,
+			     0x9fdba3c5, 0xa8311c2f);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesenc_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesenc_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesenc_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesenc_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesenc_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesenc_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesenc_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesenc_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesenc_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesenc_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesenc_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesenc_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesenc_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesenc_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesenc_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesenc_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/testsuite/gcc.target/i386/aesdec.c
===================================================================
--- gcc/testsuite/gcc.target/i386/aesdec.c	(.../fsf/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/i386/aesdec.c	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-require-effective-target aes } */
+/* { dg-options "-O2 -maes" } */
+
+#include <wmmintrin.h>
+#include <string.h>
+
+#include "aes-check.h"
+
+extern void abort (void);
+
+#define NUM 1024
+
+static __m128i src1[NUM];
+static __m128i src2[NUM];
+static __m128i edst[NUM];
+
+static __m128i resdst[NUM];
+
+/* Initialize input/output vectors.  (Currently, there is only one set
+   of input/output vectors).  */
+static void
+init_data (__m128i *s1, __m128i *s2, __m128i *d)
+{
+  int i;
+  for (i = 0; i < NUM; i++)
+    {
+      s1[i] = _mm_setr_epi32 (0x5d53475d, 0x63746f72,
+			      0x73745665, 0x7b5b5465);
+      s2[i] = _mm_setr_epi32 (0x726f6e5d, 0x5b477565,
+			      0x68617929, 0x48692853);
+      d[i]  = _mm_setr_epi32 (0xb730392a, 0xb58eb95e,
+			      0xfaea2787, 0x138ac342);
+    }
+}
+
+static void
+aes_test (void)
+{
+  int i;
+
+  init_data (src1, src2, edst);
+
+  for (i = 0; i < NUM; i += 16)
+    {
+      resdst[i] = _mm_aesdec_si128 (src1[i], src2[i]);
+      resdst[i + 1] = _mm_aesdec_si128 (src1[i + 1], src2[i + 1]);
+      resdst[i + 2] = _mm_aesdec_si128 (src1[i + 2], src2[i + 2]);
+      resdst[i + 3] = _mm_aesdec_si128 (src1[i + 3], src2[i + 3]);
+      resdst[i + 4] = _mm_aesdec_si128 (src1[i + 4], src2[i + 4]);
+      resdst[i + 5] = _mm_aesdec_si128 (src1[i + 5], src2[i + 5]);
+      resdst[i + 6] = _mm_aesdec_si128 (src1[i + 6], src2[i + 6]);
+      resdst[i + 7] = _mm_aesdec_si128 (src1[i + 7], src2[i + 7]);
+      resdst[i + 8] = _mm_aesdec_si128 (src1[i + 8], src2[i + 8]);
+      resdst[i + 9] = _mm_aesdec_si128 (src1[i + 9], src2[i + 9]);
+      resdst[i + 10] = _mm_aesdec_si128 (src1[i + 10], src2[i + 10]);
+      resdst[i + 11] = _mm_aesdec_si128 (src1[i + 11], src2[i + 11]);
+      resdst[i + 12] = _mm_aesdec_si128 (src1[i + 12], src2[i + 12]);
+      resdst[i + 13] = _mm_aesdec_si128 (src1[i + 13], src2[i + 13]);
+      resdst[i + 14] = _mm_aesdec_si128 (src1[i + 14], src2[i + 14]);
+      resdst[i + 15] = _mm_aesdec_si128 (src1[i + 15], src2[i + 15]);
+    }
+
+  for (i = 0; i < NUM; i++)
+    if (memcmp (edst + i, resdst + i, sizeof (__m128i)))
+      abort ();
+}
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(.../fsf/trunk)	(revision 2007)
+++ gcc/config.gcc	(.../branches/aes)	(revision 2007)
@@ -309,13 +309,15 @@ i[34567]86-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
-		       nmmintrin.h bmmintrin.h mmintrin-common.h"
+		       nmmintrin.h bmmintrin.h mmintrin-common.h
+		       wmmintrin.h"
 	need_64bit_hwint=yes
 	;;
 ia64-*-*)
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/i386.h	(.../branches/aes)	(revision 2007)
@@ -46,6 +46,8 @@ along with GCC; see the file COPYING3.  
 #define TARGET_SSSE3	OPTION_ISA_SSSE3
 #define TARGET_SSE4_1	OPTION_ISA_SSE4_1
 #define TARGET_SSE4_2	OPTION_ISA_SSE4_2
+#define TARGET_AES	OPTION_ISA_AES
+#define TARGET_CLMUL	OPTION_ISA_CLMUL
 #define TARGET_SSE4A	OPTION_ISA_SSE4A
 #define TARGET_SSE5	OPTION_ISA_SSE5
 #define TARGET_ROUND	OPTION_ISA_ROUND
@@ -683,6 +685,10 @@ extern const char *host_detect_local_cpu
 	builtin_define ("__SSE4_1__");				\
       if (TARGET_SSE4_2)					\
 	builtin_define ("__SSE4_2__");				\
+      if (TARGET_AES)						\
+	builtin_define ("__AES__");				\
+      if (TARGET_CLMUL)						\
+	builtin_define ("__CLMUL__");				\
       if (TARGET_SSE4A)						\
  	builtin_define ("__SSE4A__");		                \
       if (TARGET_SSE5)						\
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/i386.md	(.../branches/aes)	(revision 2007)
@@ -186,6 +186,17 @@
    (UNSPEC_FRCZ			156)
    (UNSPEC_CVTPH2PS		157)
    (UNSPEC_CVTPS2PH		158)
+
+   ; For AES support
+   (UNSPEC_AESENC		159)
+   (UNSPEC_AESENCLAST		160)
+   (UNSPEC_AESDEC		161)
+   (UNSPEC_AESDECLAST		162)
+   (UNSPEC_AESIMC		163)
+   (UNSPEC_AESKEYGENASSIST	164)
+
+   ; For CLMUL support
+   (UNSPEC_CLMUL		165)
   ])
 
 (define_constants
Index: gcc/config/i386/wmmintrin.h
===================================================================
--- gcc/config/i386/wmmintrin.h	(.../fsf/trunk)	(revision 0)
+++ gcc/config/i386/wmmintrin.h	(.../branches/aes)	(revision 2007)
@@ -0,0 +1,124 @@
+/* Copyright (C) 2008 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING.  If not, write to
+   the Free Software Foundation, 59 Temple Place - Suite 330,
+   Boston, MA 02111-1307, USA.  */
+
+/* As a special exception, if you include this header file into source
+   files compiled by GCC, this header file does not by itself cause
+   the resulting executable to be covered by the GNU General Public
+   License.  This exception does not however invalidate any other
+   reasons why the executable file might be covered by the GNU General
+   Public License.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 11.0.  */
+
+#ifndef _WMMINTRIN_H_INCLUDED
+#define _WMMINTRIN_H_INCLUDED
+
+#if !defined (__AES__) && !defined (__CLMUL__)
+# error "AES/CLMUL instruction set not enabled"
+#else
+
+/* We need definitions from the SSE4, SSSE3, SSE3, SSE2 and SSE header
+   files.  */
+#include <smmintrin.h>
+
+/* AES */
+
+#ifdef __AES__
+/* Performs 1 round of AES decryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdec_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES decryption of the first m128i 
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesdeclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X,
+						 (__v2di)__Y);
+}
+
+/* Performs 1 round of AES encryption of the first m128i using 
+   the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenc_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the last round of AES encryption of the first m128i
+   using the second m128i as a round key.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesenclast_si128 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X, (__v2di)__Y);
+}
+
+/* Performs the InverseMixColumn operation on the source m128i 
+   and stores the result into m128i destination.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aesimc_si128 (__m128i __X)
+{
+  return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X);
+}
+
+/* Generates a m128i round key for the input m128i AES cipher key and
+   byte round constant.  The second parameter must be a compile time
+   constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_aeskeygenassist_si128 (__m128i __X, const int __C)
+{
+  return (__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)__X, __C);
+}
+#else
+#define _mm_aeskeygenassist_si128(X, C)					\
+  ((__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)(__m128i)(X),	\
+						(int)(C)))
+#endif
+#endif  /* __AES__ */
+
+/* CLMUL */
+
+#ifdef __CLMUL__
+/* Performs carry-less integer multiplication of 64-bit halves of
+   128-bit input operands.  The third parameter indicates which 64-bit
+   halves of the input parameters __X and __Y should be used.  It must
+   be a compile time constant.  */
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
+{
+  return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X,
+						(__v2di)__Y, __I);
+}
+#else
+#define _mm_clmulepi64_si128(X, Y, I)					\
+  ((__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)(__m128i)(X),		\
+					  (__v2di)(__m128i)(Y), (int)(I)))
+#endif
+#endif  /* __CLMUL__  */
+
+#endif /* __AES__/__CLMUL__ */
+
+#endif /* _WMMINTRIN_H_INCLUDED */
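
For readers who want to check results without CLMUL hardware, the operation behind _mm_clmulepi64_si128 can be modeled in portable C. This is a minimal sketch and not part of the patch: the type u128 and the helpers clmul64 and clmul_select are hypothetical names introduced here. The selector decoding (bit 0 picks the 64-bit half of the first operand, bit 4 picks the half of the second) follows Intel's description of PCLMULQDQ.

```c
#include <stdint.h>

/* 128-bit value as two 64-bit halves; q[0] is the low half.
   Hypothetical type, used only for this illustration.  */
typedef struct { uint64_t q[2]; } u128;

/* Carry-less multiply of two 64-bit values: like schoolbook binary
   multiplication, but the partial products are combined with XOR
   instead of addition, so no carries propagate.  */
static u128
clmul64 (uint64_t a, uint64_t b)
{
  u128 r = { { 0, 0 } };
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
        r.q[0] ^= a << i;
        /* Shifting by 64 is undefined in C, so skip i == 0.  */
        if (i != 0)
          r.q[1] ^= a >> (64 - i);
      }
  return r;
}

/* Model of the immediate: bit 0 of IMM selects the half of X,
   bit 4 of IMM selects the half of Y.  */
static u128
clmul_select (u128 x, u128 y, int imm)
{
  return clmul64 (x.q[imm & 1], y.q[(imm >> 4) & 1]);
}
```

As a quick sanity check, clmul64 (3, 3) yields 5 rather than 9, because (x + 1)^2 = x^2 + 1 when coefficients are reduced modulo 2.
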
Index: gcc/config/i386/cpuid.h
===================================================================
--- gcc/config/i386/cpuid.h	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/cpuid.h	(.../branches/aes)	(revision 2007)
@@ -33,11 +33,13 @@
 
 /* %ecx */
 #define bit_SSE3	(1 << 0)
+#define bit_CLMUL	(1 << 1)
 #define bit_SSSE3	(1 << 9)
 #define bit_CMPXCHG16B	(1 << 13)
 #define bit_SSE4_1	(1 << 19)
 #define bit_SSE4_2	(1 << 20)
 #define bit_POPCNT	(1 << 23)
+#define bit_AES		(1 << 25)
 
 /* %edx */
 #define bit_CMPXCHG8B	(1 << 8)
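
The two new %ecx bits can be checked without executing CPUID at all. The sketch below is not part of the patch: it repeats the two definitions and tests them against a caller-supplied %ecx value from CPUID leaf 1. The helper names cpu_has_clmul and cpu_has_aes are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as added to cpuid.h by this patch (CPUID leaf 1, %ecx).  */
#define bit_CLMUL	(1 << 1)
#define bit_AES		(1 << 25)

/* Return true if the given %ecx value advertises PCLMULQDQ.  */
static bool
cpu_has_clmul (uint32_t ecx)
{
  return (ecx & bit_CLMUL) != 0;
}

/* Return true if the given %ecx value advertises the AES instructions.  */
static bool
cpu_has_aes (uint32_t ecx)
{
  return (ecx & bit_AES) != 0;
}
```

On an x86 host the ecx argument would typically come from __get_cpuid (1, &eax, &ebx, &ecx, &edx), which cpuid.h already provides.
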
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/sse.md	(.../branches/aes)	(revision 2007)
@@ -8047,3 +8047,80 @@
 }
   [(set_attr "type" "ssecmp")
    (set_attr "mode" "TI")])
+
+(define_insn "aesenc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENC))]
+  "TARGET_AES"
+  "aesenc\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesenclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESENCLAST))]
+  "TARGET_AES"
+  "aesenclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdec"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDEC))]
+  "TARGET_AES"
+  "aesdec\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesdeclast"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		       (match_operand:V2DI 2 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESDECLAST))]
+  "TARGET_AES"
+  "aesdeclast\t{%2, %0|%0, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aesimc"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")]
+		      UNSPEC_AESIMC))]
+  "TARGET_AES"
+  "aesimc\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "aeskeygenassist"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "nonimmediate_operand" "xm")
+		      (match_operand:SI 2 "const_0_to_255_operand" "n")]
+		     UNSPEC_AESKEYGENASSIST))]
+  "TARGET_AES"
+  "aeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "pclmulqdq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
+		      (match_operand:V2DI 2 "nonimmediate_operand" "xm")
+		      (match_operand:SI 3 "const_0_to_255_operand" "n")]
+		     UNSPEC_CLMUL))]
+  "TARGET_CLMUL"
+  "pclmulqdq\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "mode" "TI")])
Index: gcc/config/i386/i386.opt
===================================================================
--- gcc/config/i386/i386.opt	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/i386.opt	(.../branches/aes)	(revision 2007)
@@ -236,6 +236,14 @@ msse4
 Target RejectNegative Report Mask(ISA_SSE4_2) MaskExists Var(ix86_isa_flags) VarExists
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and code generation
 
+maes
+Target Report Mask(ISA_AES) Var(ix86_isa_flags) VarExists
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES built-in functions and code generation
+
+mclmul
+Target Report Mask(ISA_CLMUL) Var(ix86_isa_flags) VarExists
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and CLMUL built-in functions and code generation
+
 mno-sse4
 Target RejectNegative Report InverseMask(ISA_SSE4_1) MaskExists Var(ix86_isa_flags) VarExists
 Do not support SSE4.1 and SSE4.2 built-in functions and code generation
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(.../fsf/trunk)	(revision 2007)
+++ gcc/config/i386/i386.c	(.../branches/aes)	(revision 2007)
@@ -1786,6 +1786,10 @@ static int ix86_isa_flags_explicit;
   (OPTION_MASK_ISA_SSE4_1 | OPTION_MASK_ISA_SSSE3_SET)
 #define OPTION_MASK_ISA_SSE4_2_SET \
   (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_SSE4_1_SET)
+#define OPTION_MASK_ISA_AES_SET \
+  (OPTION_MASK_ISA_AES | OPTION_MASK_ISA_SSE4_2_SET)
+#define OPTION_MASK_ISA_CLMUL_SET \
+  (OPTION_MASK_ISA_CLMUL | OPTION_MASK_ISA_SSE4_2_SET)
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -1817,7 +1821,12 @@ static int ix86_isa_flags_explicit;
   (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_SSE4_1_UNSET)
 #define OPTION_MASK_ISA_SSE4_1_UNSET \
   (OPTION_MASK_ISA_SSE4_1 | OPTION_MASK_ISA_SSE4_2_UNSET)
-#define OPTION_MASK_ISA_SSE4_2_UNSET OPTION_MASK_ISA_SSE4_2
+#define OPTION_MASK_ISA_SSE4_2_UNSET \
+  (OPTION_MASK_ISA_SSE4_2 \
+   | OPTION_MASK_ISA_AES_UNSET \
+   | OPTION_MASK_ISA_CLMUL_UNSET)
+#define OPTION_MASK_ISA_AES_UNSET OPTION_MASK_ISA_AES
+#define OPTION_MASK_ISA_CLMUL_UNSET OPTION_MASK_ISA_CLMUL
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -1947,6 +1956,32 @@ ix86_handle_option (size_t code, const c
 	}
       return true;
 
+    case OPT_maes:
+      if (value)
+	{
+	  ix86_isa_flags |= OPTION_MASK_ISA_AES_SET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_AES_SET;
+	}
+      else
+	{
+	  ix86_isa_flags &= ~OPTION_MASK_ISA_AES_UNSET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_AES_UNSET;
+	}
+      return true;
+
+    case OPT_mclmul:
+      if (value)
+	{
+	  ix86_isa_flags |= OPTION_MASK_ISA_CLMUL_SET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_CLMUL_SET;
+	}
+      else
+	{
+	  ix86_isa_flags &= ~OPTION_MASK_ISA_CLMUL_UNSET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_CLMUL_UNSET;
+	}
+      return true;
+
     case OPT_msse4:
       ix86_isa_flags |= OPTION_MASK_ISA_SSE4_SET;
       ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_SET;
@@ -2078,7 +2113,9 @@ override_options (void)
       PTA_NO_SAHF = 1 << 13,
       PTA_SSE4_1 = 1 << 14,
       PTA_SSE4_2 = 1 << 15,
-      PTA_SSE5 = 1 << 16
+      PTA_SSE5 = 1 << 16,
+      PTA_AES = 1 << 17,
+      PTA_CLMUL = 1 << 18
     };
 
   static struct pta
@@ -2368,6 +2405,12 @@ override_options (void)
 	if (processor_alias_table[i].flags & PTA_SSE4_2
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4_2))
 	  ix86_isa_flags |= OPTION_MASK_ISA_SSE4_2;
+	if (processor_alias_table[i].flags & PTA_AES
+	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_AES))
+	  ix86_isa_flags |= OPTION_MASK_ISA_AES;
+	if (processor_alias_table[i].flags & PTA_CLMUL
+	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_CLMUL))
+	  ix86_isa_flags |= OPTION_MASK_ISA_CLMUL;
 	if (processor_alias_table[i].flags & PTA_SSE4A
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_SSE4A))
 	  ix86_isa_flags |= OPTION_MASK_ISA_SSE4A;
@@ -17559,6 +17602,17 @@ enum ix86_builtins
 
   IX86_BUILTIN_PCMPGTQ,
 
+  /* AES instructions */
+  IX86_BUILTIN_AESENC128,
+  IX86_BUILTIN_AESENCLAST128,
+  IX86_BUILTIN_AESDEC128,
+  IX86_BUILTIN_AESDECLAST128,
+  IX86_BUILTIN_AESIMC128,
+  IX86_BUILTIN_AESKEYGENASSIST128,
+
+  /* CLMUL instruction */
+  IX86_BUILTIN_PCLMULQDQ128,
+
   /* TFmode support builtins.  */
   IX86_BUILTIN_INFQ,
   IX86_BUILTIN_FABSQ,
@@ -17914,6 +17968,9 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, 0, IX86_BUILTIN_ROUNDSD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, 0, IX86_BUILTIN_ROUNDSS, UNKNOWN, 0 },
+
+  /* CLMUL */
+  { OPTION_MASK_ISA_CLMUL, CODE_FOR_pclmulqdq, "__builtin_ia32_pclmulqdq128", IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_2arg[] =
@@ -18194,6 +18251,13 @@ static const struct builtin_description 
 
   /* SSE4.2 */
   { OPTION_MASK_ISA_SSE4_2, CODE_FOR_sse4_2_gtv2di3, "__builtin_ia32_pcmpgtq", IX86_BUILTIN_PCMPGTQ, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_AES, CODE_FOR_aesenc, "__builtin_ia32_aesenc128", IX86_BUILTIN_AESENC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_AES, CODE_FOR_aesenclast, "__builtin_ia32_aesenclast128", IX86_BUILTIN_AESENCLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_AES, CODE_FOR_aesdec, "__builtin_ia32_aesdec128", IX86_BUILTIN_AESDEC128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_AES, CODE_FOR_aesdeclast, "__builtin_ia32_aesdeclast128", IX86_BUILTIN_AESDECLAST128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_AES, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -18271,6 +18335,9 @@ static const struct builtin_description 
   /* Fake 1 arg builtins with a constant smaller than 8 bits as the 2nd arg.  */
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundpd, 0, IX86_BUILTIN_ROUNDPD, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundps, 0, IX86_BUILTIN_ROUNDPS, UNKNOWN, 0 },
+
+  /* AES */
+  { OPTION_MASK_ISA_AES, CODE_FOR_aesimc, "__builtin_ia32_aesimc128", IX86_BUILTIN_AESIMC128, UNKNOWN, 0 },
 };
 
 /* SSE5 */
@@ -19214,6 +19281,9 @@ ix86_init_mmx_sse_builtins (void)
 	case V4SImode:
 	  type = v4si_ftype_v4si;
 	  break;
+	case V2DImode:
+	  type = v2di_ftype_v2di;
+	  break;
 	case V2DFmode:
 	  type = v2df_ftype_v2df;
 	  break;
@@ -19513,6 +19583,9 @@ ix86_init_mmx_sse_builtins (void)
 				    NULL_TREE);
   def_builtin_const (OPTION_MASK_ISA_SSE4_2, "__builtin_ia32_crc32di", ftype, IX86_BUILTIN_CRC32DI);
 
+  /* AES */
+  def_builtin_const (OPTION_MASK_ISA_AES, "__builtin_ia32_aeskeygenassist128", v2di_ftype_v2di_int, IX86_BUILTIN_AESKEYGENASSIST128);
+
   /* AMDFAM10 SSE4A New built-ins  */
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntsd", void_ftype_pdouble_v2df, IX86_BUILTIN_MOVNTSD);
   def_builtin (OPTION_MASK_ISA_SSE4A, "__builtin_ia32_movntss", void_ftype_pfloat_v4sf, IX86_BUILTIN_MOVNTSS);
@@ -19793,6 +19866,44 @@ ix86_expand_crc32 (enum insn_code icode,
   return target;
 }
 
+/* Subroutine of ix86_expand_builtin to take care of binop insns
+   with an immediate.  */
+
+static rtx
+ix86_expand_binop_imm_builtin (enum insn_code icode, tree exp,
+				rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  enum machine_mode tmode = insn_data[icode].operand[0].mode;
+  enum machine_mode mode0 = insn_data[icode].operand[1].mode;
+  enum machine_mode mode1 = insn_data[icode].operand[2].mode;
+
+  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+    {
+      op0 = copy_to_reg (op0);
+      op0 = simplify_gen_subreg (mode0, op0, GET_MODE (op0), 0);
+    }
+
+  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+    {
+      error ("the last operand must be an immediate");
+      return const0_rtx;
+    }
+
+  target = gen_reg_rtx (V2DImode);
+  pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target,
+					      V2DImode, 0),
+			 op0, op1);
+  if (! pat)
+    return 0;
+  emit_insn (pat);
+  return target;
+}
+
 /* Subroutine of ix86_expand_builtin to take care of binop insns.  */
 
 static rtx
@@ -20922,34 +21033,18 @@ ix86_expand_builtin (tree exp, rtx targe
       return target;
 
     case IX86_BUILTIN_PSLLDQI128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_ashlti3,
+					     exp, target);
+      break;
+
     case IX86_BUILTIN_PSRLDQI128:
-      icode = (fcode == IX86_BUILTIN_PSLLDQI128 ? CODE_FOR_sse2_ashlti3
-	       : CODE_FOR_sse2_lshrti3);
-      arg0 = CALL_EXPR_ARG (exp, 0);
-      arg1 = CALL_EXPR_ARG (exp, 1);
-      op0 = expand_normal (arg0);
-      op1 = expand_normal (arg1);
-      tmode = insn_data[icode].operand[0].mode;
-      mode1 = insn_data[icode].operand[1].mode;
-      mode2 = insn_data[icode].operand[2].mode;
+      return ix86_expand_binop_imm_builtin (CODE_FOR_sse2_lshrti3,
+					     exp, target);
+      break;
 
-      if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
-	{
-	  op0 = copy_to_reg (op0);
-	  op0 = simplify_gen_subreg (mode1, op0, GET_MODE (op0), 0);
-	}
-      if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
-	{
-	  error ("shift must be an immediate");
-	  return const0_rtx;
-	}
-      target = gen_reg_rtx (V2DImode);
-      pat = GEN_FCN (icode) (simplify_gen_subreg (tmode, target, V2DImode, 0),
-			     op0, op1);
-      if (! pat)
-	return 0;
-      emit_insn (pat);
-      return target;
+    case IX86_BUILTIN_AESKEYGENASSIST128:
+      return ix86_expand_binop_imm_builtin (CODE_FOR_aeskeygenassist,
+					     exp, target);
 
     case IX86_BUILTIN_FEMMS:
       emit_insn (gen_mmx_femms ());
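
The option handling in the i386.c hunks above comes down to two rules: enabling an ISA also enables everything it depends on, and disabling an ISA also disables everything that depends on it. A stand-alone sketch of that logic follows. The numeric mask values, struct isa_state, and the helper handle_option are made up for illustration; only the SET/UNSET relationships mirror the patch, and the dependency chain below SSE4.2 is left out for brevity.

```c
#include <stdbool.h>

/* Hypothetical bit assignments; the real masks are generated from
   i386.opt.  */
enum
{
  ISA_SSE4_2 = 1 << 0,
  ISA_AES    = 1 << 1,
  ISA_CLMUL  = 1 << 2
};

/* Mirrors of the patch's macros, truncated at SSE4.2.  */
#define ISA_SSE4_2_SET    ISA_SSE4_2
#define ISA_AES_SET       (ISA_AES | ISA_SSE4_2_SET)
#define ISA_CLMUL_SET     (ISA_CLMUL | ISA_SSE4_2_SET)
#define ISA_AES_UNSET     ISA_AES
#define ISA_CLMUL_UNSET   ISA_CLMUL
#define ISA_SSE4_2_UNSET  (ISA_SSE4_2 | ISA_AES_UNSET | ISA_CLMUL_UNSET)

struct isa_state
{
  int flags;     /* Models ix86_isa_flags.  */
  int explicit_; /* Models ix86_isa_flags_explicit.  */
};

/* Apply -mfoo (VALUE true) or -mno-foo (VALUE false), following the
   same pattern as the OPT_maes and OPT_mclmul cases.  */
static void
handle_option (struct isa_state *s, int set_mask, int unset_mask,
               bool value)
{
  if (value)
    {
      s->flags |= set_mask;
      s->explicit_ |= set_mask;
    }
  else
    {
      s->flags &= ~unset_mask;
      s->explicit_ |= unset_mask;
    }
}
```

In this model, -maes followed by -mno-sse4.2 leaves AES disabled, because the SSE4.2 UNSET mask now includes the AES and CLMUL bits.
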

* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-03 14:31 PATCH: Enable Intel AES/CLMUL H.J. Lu
@ 2008-04-03 16:21 ` Daniel Berlin
  2008-04-03 16:23   ` H.J. Lu
       [not found] ` <alpine.LSU.1.00.0804062013110.22304@acrux.dbai.tuwien.ac.at>
  1 sibling, 1 reply; 5+ messages in thread
From: Daniel Berlin @ 2008-04-03 16:21 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, ubizjak

So, when are you going to teach the tree level to auto-transform aes
encryption/etc into these instructions?
:)


On Thu, Apr 3, 2008 at 9:50 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> Hi,
>
>  This patch enables Intel AES/CLMUL:
>
>  http://softwareprojects.intel.com/avx/
>
>  OK for mainline?
>
>  Thanks.
>
>  H.J.

* Re: PATCH: Enable Intel AES/CLMUL
  2008-04-03 16:21 ` Daniel Berlin
@ 2008-04-03 16:23   ` H.J. Lu
  0 siblings, 0 replies; 5+ messages in thread
From: H.J. Lu @ 2008-04-03 16:23 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc-patches, ubizjak

On Thu, Apr 3, 2008 at 9:13 AM, Daniel Berlin <dberlin@dberlin.org> wrote:
> So, when are you going to teach the tree level to auto-transform aes
>  encryption/etc into these instructions?
>  :)
>

That will be a fun project. Are there any examples?


H.J.
>
>
>  On Thu, Apr 3, 2008 at 9:50 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>  > Hi,
>  >
>  >  This patch enables Intel AES/CLMUL:
>  >
>  >  http://softwareprojects.intel.com/avx/
>  >
>  >  OK for mainline?
>  >
>  >  Thanks.
>  >
>  >  H.J.
>

* PATCH: Mention Intel AES/PCLMUL
       [not found] ` <alpine.LSU.1.00.0804062013110.22304@acrux.dbai.tuwien.ac.at>
@ 2008-04-06 21:06   ` H.J. Lu
  2008-04-06 21:17     ` Gerald Pfeifer
  0 siblings, 1 reply; 5+ messages in thread
From: H.J. Lu @ 2008-04-06 21:06 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: gcc-patches

On Sun, Apr 06, 2008 at 08:13:38PM +0200, Gerald Pfeifer wrote:
> Hi HJ,
> 
> On Thu, 3 Apr 2008, H.J. Lu wrote:
> > This patch enables Intel AES/CLMUL:
> > 
> > http://softwareprojects.intel.com/avx/
> 
> would you mind also adding a note to gcc-4.4/changes.html?

Like this?

Thanks.


H.J.
---
Index: gcc-4.4/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.4/changes.html,v
retrieving revision 1.6
diff -r1.6 changes.html
54a55,62
> <h3>IA-32/x86-64</h3>
>   <ul>
>     <li>Support for Intel AES built-in functions and code generation is
> 	available via <code>-maes</code>.</li>
>     <li>Support for the Intel PCLMUL built-in function and code generation is
> 	available via <code>-mpclmul</code>.</li>
>   </ul>
> 

* Re: PATCH: Mention Intel AES/PCLMUL
  2008-04-06 21:06   ` PATCH: Mention Intel AES/PCLMUL H.J. Lu
@ 2008-04-06 21:17     ` Gerald Pfeifer
  0 siblings, 0 replies; 5+ messages in thread
From: Gerald Pfeifer @ 2008-04-06 21:17 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches

On Sun, 6 Apr 2008, H.J. Lu wrote:
> Like this?

Yep. :-)

Gerald
