public inbox for gcc-patches@gcc.gnu.org
* PATCH: Add LWP support for upcoming AMD Orochi processor.
@ 2009-10-09  1:13 Harsha Jagasia
  2009-10-09 10:07 ` Jakub Jelinek
  2009-11-05  9:32 ` Jakub Jelinek
  0 siblings, 2 replies; 32+ messages in thread
From: Harsha Jagasia @ 2009-10-09  1:13 UTC
  To: gcc-patches, hubicka, rth, dwarak.rajagopal, christophe.harle,
	ubizjak, jakub, Harsha Jagasia
  Cc: Harsha Jagasia

Hello,

> > > I think the easiest would be to use (unspec_volatile ...
> > > UNSPECV_LWPVAL...) instead.  Otherwise the insn that doesn't set any
> > > register may be eliminated as unneeded.
> >
> > Then these instructions should be defined as unspec_volatile. OTOH,
> > perhaps you should introduce a new fixed register to hold the LWP state
> > and change all instructions to correctly depend on this register. Since
> > the LWP state won't be hidden from the compiler, you can define "normal"
> > insn patterns using "set". This will also benefit the scheduler and will
> > increase general happiness of the compiler ;)
> 
> Well, if the LWP state were modeled as a register, one would need to add
> an explicit USE to every instruction whose behaviour depends on the LWP
> state.  From a quick glance at the specs, that seems to be nearly every
> instruction.
> 
> I guess LWP should act as a full scheduling barrier (so that the code
> we want to profile is not moved before profiling starts or after it
> finishes), so unspec_volatile is the preferred variant.

I have changed the patterns to use unspec_volatile/UNSPECV_LWPVAL...
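
For readers less familiar with RTL, a rough user-level analogy (GNU C
extended asm, not the machine-description patterns in this patch) may
help: a plain asm whose only output is unused can be deleted, while a
volatile asm is kept, and a "memory" clobber keeps memory accesses from
being moved across it.  The function names below are hypothetical and
the markers are empty stand-ins for the real LWP instructions.

```c
/* Analogy only: GNU C extended asm, not the RTL from the patch.
   All function names here are hypothetical.  */

/* A non-volatile asm whose only output is unused may be deleted by the
   optimizer, much like an insn pattern that sets no register.  */
static inline void
marker_may_vanish (unsigned int x)
{
  unsigned int unused;
  __asm__ ("" : "=r" (unused) : "0" (x));
  (void) unused;
}

/* A volatile asm is not deleted, and the "memory" clobber keeps the
   compiler from moving memory accesses across it.  */
static inline void
marker_is_kept (unsigned int x)
{
  __asm__ volatile ("" : : "r" (x) : "memory");
}

/* The loads from V stay between the two kept markers, which is the
   property wanted for code that is being profiled.  */
unsigned int
profiled_sum (const unsigned int *v, unsigned int n)
{
  unsigned int i, sum = 0;

  marker_is_kept (0);
  for (i = 0; i < n; i++)
    sum += v[i];
  marker_is_kept (1);
  return sum;
}
```

unspec_volatile gives a comparable guarantee inside the compiler: the
insn is never treated as dead and other insns are not scheduled across
it.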

> > 	* config/i386/sse.md (lwp_llwpcbhi1): New lwp pattern.
> >	...
> 
> There is nothing SSE specific in these patterns, so I think they
> should go in i386.md.

Done.

Thanks to Honza, Uros and Jakub for the input.
I will check in the patch below (once the XOP patch has been revised and
accepted), unless there is further review.

Thanks,
Harsha

-----------------
2009-10-8  Harsha Jagasia  <harsha.jagasia@amd.com>

	* doc/invoke.texi (-mlwp): Add documentation.
	* doc/extend.texi (x86 intrinsics): Add LWP intrinsics.

	* config.gcc (i[34567]86-*-*): Include lwpintrin.h.
	(x86_64-*-*): Ditto.
	* config/i386/lwpintrin.h: New file, provide x86 compiler
	intrinsics for LWP.
	* config/i386/cpuid.h (bit_LWP): Define LWP bit.
	* config/i386/x86intrin.h: Add LWP check and lwpintrin.h.
	* config/i386/i386-c.c(ix86_target_macros_internal): Check
	ISA_FLAG for LWP. 
	* config/i386/i386.h(TARGET_LWP): New macro for LWP.
	* config/i386/i386.opt (-mlwp): New switch for LWP support.

	* config/i386/i386.c (OPTION_MASK_ISA_LWP_SET): New.
	(OPTION_MASK_ISA_LWP_UNSET): New.	
	(ix86_handle_option): Handle -mlwp.
	(isa_opts): Handle -mlwp.
	(enum pta_flags): Add PTA_LWP.
	(override_options): Add LWP support.

	(IX86_BUILTIN_LLWPCB16): New for LWP intrinsic.
	(IX86_BUILTIN_LLWPCB32): Ditto
	(IX86_BUILTIN_LLWPCB64): Ditto
	(IX86_BUILTIN_SLWPCB16): Ditto
	(IX86_BUILTIN_SLWPCB32): Ditto
	(IX86_BUILTIN_SLWPCB64): Ditto
	(IX86_BUILTIN_LWPVAL16): Ditto
	(IX86_BUILTIN_LWPVAL32): Ditto
	(IX86_BUILTIN_LWPVAL64): Ditto
	(IX86_BUILTIN_LWPINS16): Ditto
	(IX86_BUILTIN_LWPINS32): Ditto
	(IX86_BUILTIN_LWPINS64): Ditto

	(enum  ix86_special_builtin_type): Add LWP intrinsic support.
	(builtin_description): Ditto.
	(ix86_init_mmx_sse_builtins): Ditto.
	(ix86_expand_special_args_builtin): Ditto.

	* config/i386/i386.md (UNSPEC_LLWP_INTRINSIC):
	(UNSPEC_SLWP_INTRINSIC):
	(UNSPECV_LWPVAL_INTRINSIC):
	(UNSPECV_LWPINS_INTRINSIC): Add new UNSPEC for LWP support.
	(lwp_llwpcbhi1): New lwp pattern.
	(lwp_llwpcbsi1): Ditto.
	(lwp_llwpcbdi1): Ditto.
	(lwp_slwpcbhi1): Ditto.
	(lwp_slwpcbsi1): Ditto.
	(lwp_slwpcbdi1): Ditto.
	(lwp_lwpvalhi3): Ditto.
	(lwp_lwpvalsi3): Ditto.
	(lwp_lwpvaldi3): Ditto.
	(lwp_lwpinshi3): Ditto.
	(lwp_lwpinssi3): Ditto.
	(lwp_lwpinsdi3): Ditto.


diff -upNw gcc-xop-2/gcc/config.gcc gcc-lwp/gcc/config.gcc
--- gcc-xop-2/gcc/config.gcc	2009-09-30 22:37:58.000000000 -0500
+++ gcc-lwp/gcc/config.gcc	2009-09-30 16:33:28.000000000 -0500
@@ -288,7 +288,7 @@ i[34567]86-*-*)
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
 		       immintrin.h x86intrin.h avxintrin.h xopintrin.h
-		       ia32intrin.h cross-stdarg.h"
+		       ia32intrin.h cross-stdarg.h lwpintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -298,7 +298,7 @@ x86_64-*-*)
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
 		       immintrin.h x86intrin.h avxintrin.h xopintrin.h 
-		       ia32intrin.h cross-stdarg.h"
+		       ia32intrin.h cross-stdarg.h lwpintrin.h"
 	need_64bit_hwint=yes
 	;;
 ia64-*-*)
diff -upNw gcc-xop-2/gcc/doc/extend.texi gcc-lwp/gcc/doc/extend.texi
--- gcc-xop-2/gcc/doc/extend.texi	2009-09-30 22:37:58.000000000 -0500
+++ gcc-lwp/gcc/doc/extend.texi	2009-10-03 11:49:19.000000000 -0500
@@ -3178,6 +3178,11 @@ Enable/disable the generation of the FMA
 @cindex @code{target("xop")} attribute
 Enable/disable the generation of the XOP instructions.
 
+@item lwp
+@itemx no-lwp
+@cindex @code{target("lwp")} attribute
+Enable/disable the generation of the LWP instructions.
+
 @item ssse3
 @itemx no-ssse3
 @cindex @code{target("ssse3")} attribute
@@ -9066,5 +9071,22 @@ v8sf __builtin_ia32_fmsubaddps256 (v8sf,
 
 @end smallexample
 
+The following built-in functions are available when @option{-mlwp} is used.
+
+@smallexample
+void __builtin_ia32_llwpcb16 (void *);
+void __builtin_ia32_llwpcb32 (void *);
+void __builtin_ia32_llwpcb64 (void *);
+void * __builtin_ia32_slwpcb16 (void);
+void * __builtin_ia32_slwpcb32 (void);
+void * __builtin_ia32_slwpcb64 (void);
+void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short)
+void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int)
+void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int)
+unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short)
+unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int)
+unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int)
+@end smallexample
+
 The following built-in functions are available when @option{-m3dnow} is used.
 All of them generate the machine instruction that is part of the name.

diff -upNw gcc-xop-2/gcc/doc/invoke.texi gcc-lwp/gcc/doc/invoke.texi
--- gcc-xop-2/gcc/doc/invoke.texi	2009-09-30 22:37:58.000000000 -0500
+++ gcc-lwp/gcc/doc/invoke.texi	2009-09-30 16:33:28.000000000 -0500
@@ -592,7 +592,7 @@ Objective-C and Objective-C++ Dialects}.
 -mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
 -maes -mpclmul @gol
--msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop @gol
+-msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop -mlwp @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -11731,6 +11731,8 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-fma4
 @itemx -mxop
 @itemx -mno-xop
+@itemx -mlwp
+@itemx -mno-lwp
 @itemx -m3dnow
 @itemx -mno-3dnow
 @itemx -mpopcnt
@@ -11745,7 +11747,7 @@ preferred alignment to @option{-mpreferr
 @opindex mno-3dnow
 These switches enable or disable the use of instructions in the MMX,
 SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, SSE4A, FMA4, XOP,
-ABM or 3DNow!@: extended instruction sets.
+LWP, ABM or 3DNow!@: extended instruction sets.
 These extensions are also available as built-in functions: see
 @ref{X86 Built-in Functions}, for details of the functions enabled and
 disabled by these switches.
diff -upNw gcc-xop-2/gcc/config/i386/cpuid.h gcc-lwp/gcc/config/i386/cpuid.h
--- gcc-xop-2/gcc/config/i386/cpuid.h	2009-09-30 22:37:58.000000000 -0500
+++ gcc-lwp/gcc/config/i386/cpuid.h	2009-09-30 16:33:28.000000000 -0500
@@ -49,6 +49,7 @@
 #define bit_LAHF_LM	(1 << 0)
 #define bit_SSE4a	(1 << 6)
 #define bit_FMA4	(1 << 16)
+#define bit_LWP 	(1 << 15)
 #define bit_XOP         (1 << 11)
 
 /* %edx */
diff -upNw gcc-xop-2/gcc/config/i386/i386.c gcc-lwp/gcc/config/i386/i386.c
--- gcc-xop-2/gcc/config/i386/i386.c	2009-10-07 15:51:22.000000000 -0500
+++ gcc-lwp/gcc/config/i386/i386.c	2009-10-07 14:33:43.000000000 -0500
@@ -1960,6 +1960,8 @@ static int ix86_isa_flags_explicit;
    | OPTION_MASK_ISA_AVX_SET)
 #define OPTION_MASK_ISA_XOP_SET \
   (OPTION_MASK_ISA_XOP | OPTION_MASK_ISA_FMA4_SET)
+#define OPTION_MASK_ISA_LWP_SET \
+  OPTION_MASK_ISA_LWP
 
 /* AES and PCLMUL need SSE2 because they use xmm registers */
 #define OPTION_MASK_ISA_AES_SET \
@@ -2014,6 +2016,7 @@ static int ix86_isa_flags_explicit;
 #define OPTION_MASK_ISA_FMA4_UNSET \
   (OPTION_MASK_ISA_FMA4 | OPTION_MASK_ISA_XOP_UNSET)
 #define OPTION_MASK_ISA_XOP_UNSET OPTION_MASK_ISA_XOP
+#define OPTION_MASK_ISA_LWP_UNSET OPTION_MASK_ISA_LWP
 
 #define OPTION_MASK_ISA_AES_UNSET OPTION_MASK_ISA_AES
 #define OPTION_MASK_ISA_PCLMUL_UNSET OPTION_MASK_ISA_PCLMUL
@@ -2274,6 +2277,19 @@ ix86_handle_option (size_t code, const c
 	}
       return true;
 
+    case OPT_mlwp:
+      if (value)
+	{
+	  ix86_isa_flags |= OPTION_MASK_ISA_LWP_SET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_LWP_SET;
+	}
+      else
+	{
+	  ix86_isa_flags &= ~OPTION_MASK_ISA_LWP_UNSET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_LWP_UNSET;
+	}
+      return true;
+
     case OPT_mabm:
       if (value)
 	{
@@ -2403,6 +2419,7 @@ ix86_target_string (int isa, int flags, 
     { "-m64",		OPTION_MASK_ISA_64BIT },
     { "-mfma4",		OPTION_MASK_ISA_FMA4 },
     { "-mxop",		OPTION_MASK_ISA_XOP },
+    { "-mlwp",		OPTION_MASK_ISA_LWP },
     { "-msse4a",	OPTION_MASK_ISA_SSE4A },
     { "-msse4.2",	OPTION_MASK_ISA_SSE4_2 },
     { "-msse4.1",	OPTION_MASK_ISA_SSE4_1 },
@@ -2634,7 +2651,8 @@ override_options (bool main_args_p)
       PTA_FMA = 1 << 19,
       PTA_MOVBE = 1 << 20,
       PTA_FMA4 = 1 << 21,
-      PTA_XOP = 1 << 22
+      PTA_XOP = 1 << 22,
+      PTA_LWP = 1 << 23
     };
 
   static struct pta
@@ -2983,6 +3001,9 @@ override_options (bool main_args_p)
 	if (processor_alias_table[i].flags & PTA_XOP
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_XOP))
 	  ix86_isa_flags |= OPTION_MASK_ISA_XOP;
+	if (processor_alias_table[i].flags & PTA_LWP
+	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_LWP))
+	  ix86_isa_flags |= OPTION_MASK_ISA_LWP;
 	if (processor_alias_table[i].flags & PTA_ABM
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_ABM))
 	  ix86_isa_flags |= OPTION_MASK_ISA_ABM;
@@ -3668,6 +3689,7 @@ ix86_valid_target_attribute_inner_p (tre
     IX86_ATTR_ISA ("ssse3",	OPT_mssse3),
     IX86_ATTR_ISA ("fma4",	OPT_mfma4),
     IX86_ATTR_ISA ("xop",	OPT_mxop),
+    IX86_ATTR_ISA ("lwp",	OPT_mlwp),
 
     /* string options */
     IX86_ATTR_STR ("arch=",	IX86_FUNCTION_SPECIFIC_ARCH),
@@ -20987,7 +21009,7 @@ enum ix86_builtins
 
   IX86_BUILTIN_CVTUDQ2PS,
 
-  /* FMA4 instructions.  */
+  /* FMA4 and XOP instructions.  */
   IX86_BUILTIN_VFMADDSS,
   IX86_BUILTIN_VFMADDSD,
   IX86_BUILTIN_VFMADDPS,
@@ -21164,6 +21186,20 @@ enum ix86_builtins
   IX86_BUILTIN_VPCOMFALSEQ,
   IX86_BUILTIN_VPCOMTRUEQ,
 
+  /* LWP instructions.  */
+  IX86_BUILTIN_LLWPCB16,
+  IX86_BUILTIN_LLWPCB32,
+  IX86_BUILTIN_LLWPCB64,
+  IX86_BUILTIN_SLWPCB16,
+  IX86_BUILTIN_SLWPCB32,
+  IX86_BUILTIN_SLWPCB64,
+  IX86_BUILTIN_LWPVAL16,
+  IX86_BUILTIN_LWPVAL32,
+  IX86_BUILTIN_LWPVAL64,
+  IX86_BUILTIN_LWPINS16,
+  IX86_BUILTIN_LWPINS32,
+  IX86_BUILTIN_LWPINS64,
+
   IX86_BUILTIN_MAX
 };
 
@@ -21377,7 +21413,13 @@ enum ix86_special_builtin_type
   VOID_FTYPE_PV8SF_V8SF_V8SF,
   VOID_FTYPE_PV4DF_V4DF_V4DF,
   VOID_FTYPE_PV4SF_V4SF_V4SF,
-  VOID_FTYPE_PV2DF_V2DF_V2DF
+  VOID_FTYPE_PV2DF_V2DF_V2DF,
+  VOID_FTYPE_USHORT_UINT_USHORT,
+  VOID_FTYPE_UINT_UINT_UINT,
+  VOID_FTYPE_UINT64_UINT_UINT,
+  UCHAR_FTYPE_USHORT_UINT_USHORT,
+  UCHAR_FTYPE_UINT_UINT_UINT,
+  UCHAR_FTYPE_UINT64_UINT_UINT
 };
 
 /* Builtin types */
@@ -21624,6 +21666,22 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstoreps, "__builtin_ia32_maskstoreps", IX86_BUILTIN_MASKSTOREPS, UNKNOWN, (int) VOID_FTYPE_PV4SF_V4SF_V4SF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstorepd256, "__builtin_ia32_maskstorepd256", IX86_BUILTIN_MASKSTOREPD256, UNKNOWN, (int) VOID_FTYPE_PV4DF_V4DF_V4DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstoreps256, "__builtin_ia32_maskstoreps256", IX86_BUILTIN_MASKSTOREPS256, UNKNOWN, (int) VOID_FTYPE_PV8SF_V8SF_V8SF },
+
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbhi1,   "__builtin_ia32_llwpcb16",   IX86_BUILTIN_LLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbsi1,   "__builtin_ia32_llwpcb32",   IX86_BUILTIN_LLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbdi1,   "__builtin_ia32_llwpcb64",   IX86_BUILTIN_LLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbhi1,   "__builtin_ia32_slwpcb16",   IX86_BUILTIN_SLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbsi1,   "__builtin_ia32_slwpcb32",   IX86_BUILTIN_SLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbdi1,   "__builtin_ia32_slwpcb64",   IX86_BUILTIN_SLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalhi3,   "__builtin_ia32_lwpval16", IX86_BUILTIN_LWPVAL16,  UNKNOWN,     (int) VOID_FTYPE_USHORT_UINT_USHORT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3,   "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL32,  UNKNOWN,     (int) VOID_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3,   "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT64_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinshi3,   "__builtin_ia32_lwpins16", IX86_BUILTIN_LWPINS16,  UNKNOWN,     (int) UCHAR_FTYPE_USHORT_UINT_USHORT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3,   "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS32,  UNKNOWN,     (int) UCHAR_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinsdi3,   "__builtin_ia32_lwpins64", IX86_BUILTIN_LWPINS64,  UNKNOWN,     (int) UCHAR_FTYPE_UINT64_UINT_UINT },
+
 };
 
 /* Builtins with variable number of arguments.  */
@@ -23282,6 +23340,50 @@ ix86_init_mmx_sse_builtins (void)
 				integer_type_node,
 				NULL_TREE);
 
+  /* LWP instructions.  */
+
+  tree void_ftype_ushort_unsigned_ushort
+    = build_function_type_list (void_type_node,
+				short_unsigned_type_node,
+				unsigned_type_node,
+				short_unsigned_type_node,
+				NULL_TREE);
+
+  tree void_ftype_unsigned_unsigned_unsigned
+    = build_function_type_list (void_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
+  tree void_ftype_uint64_unsigned_unsigned
+    = build_function_type_list (void_type_node,
+				long_long_unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
+  tree uchar_ftype_ushort_unsigned_ushort
+    = build_function_type_list (unsigned_char_type_node,
+				short_unsigned_type_node,
+				unsigned_type_node,
+				short_unsigned_type_node,
+				NULL_TREE);
+
+  tree uchar_ftype_unsigned_unsigned_unsigned
+    = build_function_type_list (unsigned_char_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
+  tree uchar_ftype_uint64_unsigned_unsigned
+    = build_function_type_list (unsigned_char_type_node,
+				long_long_unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
   tree ftype;
 
   /* Add all special builtins with variable number of operands.  */
@@ -23395,6 +23497,25 @@ ix86_init_mmx_sse_builtins (void)
 	case VOID_FTYPE_PV2DF_V2DF_V2DF:
 	  type = void_ftype_pv2df_v2df_v2df;
 	  break;
+	case VOID_FTYPE_USHORT_UINT_USHORT:
+	  type = void_ftype_ushort_unsigned_ushort;
+	  break;
+	case VOID_FTYPE_UINT_UINT_UINT:
+	  type = void_ftype_unsigned_unsigned_unsigned;
+	  break;
+	case VOID_FTYPE_UINT64_UINT_UINT:
+	  type = void_ftype_uint64_unsigned_unsigned;
+	  break;
+	case UCHAR_FTYPE_USHORT_UINT_USHORT:
+	  type = uchar_ftype_ushort_unsigned_ushort;
+	  break;
+	case UCHAR_FTYPE_UINT_UINT_UINT:
+	  type = uchar_ftype_unsigned_unsigned_unsigned;
+	  break;
+	case UCHAR_FTYPE_UINT64_UINT_UINT:
+	  type = uchar_ftype_uint64_unsigned_unsigned;
+	  break;
+
 	default:
 	  gcc_unreachable ();
 	}
@@ -25275,6 +25396,16 @@ ix86_expand_special_args_builtin (const 
       /* Reserve memory operand for target.  */
       memory = ARRAY_SIZE (args);
       break;
+    case VOID_FTYPE_USHORT_UINT_USHORT:
+    case VOID_FTYPE_UINT_UINT_UINT:
+    case VOID_FTYPE_UINT64_UINT_UINT:
+    case UCHAR_FTYPE_USHORT_UINT_USHORT:
+    case UCHAR_FTYPE_UINT_UINT_UINT:
+    case UCHAR_FTYPE_UINT64_UINT_UINT:
+      nargs = 3;
+      klass = store;
+      memory = 0;
+      break;
     default:
       gcc_unreachable ();
     }
diff -upNw gcc-xop-2/gcc/config/i386/i386-c.c gcc-lwp/gcc/config/i386/i386-c.c
--- gcc-xop-2/gcc/config/i386/i386-c.c	2009-09-30 22:37:58.000000000 -0500
+++ gcc-lwp/gcc/config/i386/i386-c.c	2009-09-30 16:33:28.000000000 -0500
@@ -234,6 +234,8 @@ ix86_target_macros_internal (int isa_fla
     def_or_undef (parse_in, "__FMA4__");
   if (isa_flag & OPTION_MASK_ISA_XOP)
     def_or_undef (parse_in, "__XOP__");
+  if (isa_flag & OPTION_MASK_ISA_LWP)
+    def_or_undef (parse_in, "__LWP__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE))
     def_or_undef (parse_in, "__SSE_MATH__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE2))
diff -upNw gcc-xop-2/gcc/config/i386/i386.h gcc-lwp/gcc/config/i386/i386.h
--- gcc-xop-2/gcc/config/i386/i386.h	2009-09-30 22:37:58.000000000 -0500
+++ gcc-lwp/gcc/config/i386/i386.h	2009-09-30 16:33:28.000000000 -0500
@@ -56,6 +56,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define TARGET_SSE4A	OPTION_ISA_SSE4A
 #define TARGET_FMA4	OPTION_ISA_FMA4
 #define TARGET_XOP	OPTION_ISA_XOP
+#define TARGET_LWP	OPTION_ISA_LWP
 #define TARGET_ROUND	OPTION_ISA_ROUND
 #define TARGET_ABM	OPTION_ISA_ABM
 #define TARGET_POPCNT	OPTION_ISA_POPCNT
diff -upNw gcc-xop-2/gcc/config/i386/i386.md gcc-lwp/gcc/config/i386/i386.md
--- gcc-xop-2/gcc/config/i386/i386.md	2009-10-07 15:48:45.000000000 -0500
+++ gcc-lwp/gcc/config/i386/i386.md	2009-10-07 12:42:09.000000000 -0500
@@ -204,6 +204,10 @@
    (UNSPEC_XOP_TRUEFALSE	152)
    (UNSPEC_XOP_PERMUTE		153)
    (UNSPEC_FRCZ			154)
+   (UNSPEC_LLWP_INTRINSIC	155)
+   (UNSPEC_SLWP_INTRINSIC	156)
+   (UNSPECV_LWPVAL_INTRINSIC	157)
+   (UNSPECV_LWPINS_INTRINSIC	158)
 
    ; For AES support
    (UNSPEC_AESENC		159)
@@ -352,7 +356,7 @@
    fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
    sselog,sselog1,sseiadd,sseiadd1,sseishft,sseimul,
    sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
-   ssemuladd,sse4arg,
+   ssemuladd,sse4arg,lwp,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -22731,6 +22735,120 @@
   [(set_attr "type" "other")
    (set_attr "length" "3")])
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
+;; LWP instructions
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_insn "lwp_llwpcbhi1"
+  [(unspec [(match_operand:HI 0 "register_operand" "r")]
+  	   UNSPEC_LLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "llwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "HI")])
+
+(define_insn "lwp_llwpcbsi1"
+  [(unspec [(match_operand:SI 0 "register_operand" "r")]
+  	   UNSPEC_LLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "llwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "SI")])
+
+(define_insn "lwp_llwpcbdi1"
+  [(unspec [(match_operand:DI 0 "register_operand" "r")]
+  	   UNSPEC_LLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "llwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "DI")])
+
+(define_insn "lwp_slwpcbhi1"
+  [(unspec [(match_operand:HI 0 "register_operand" "r")]
+  	   UNSPEC_SLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "slwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "HI")])
+
+(define_insn "lwp_slwpcbsi1"
+  [(unspec [(match_operand:SI 0 "register_operand" "r")]
+  	   UNSPEC_SLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "slwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "SI")])
+
+(define_insn "lwp_slwpcbdi1"
+  [(unspec [(match_operand:DI 0 "register_operand" "r")]
+  	   UNSPEC_SLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "slwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "DI")])
+
+(define_insn "lwp_lwpvalhi3"
+  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
+  	   	     (match_operand:SI 1 "nonimmediate_operand" "rm")
+	   	     (match_operand:HI 2 "const_int_operand" "")]
+  	   	    UNSPECV_LWPVAL_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpval\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "HI")])
+
+(define_insn "lwp_lwpvalsi3"
+  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
+    	    	     (match_operand:SI 1 "nonimmediate_operand" "rm")
+	    	     (match_operand:SI 2 "const_int_operand" "")]
+		    UNSPECV_LWPVAL_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpval\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "SI")])
+
+(define_insn "lwp_lwpvaldi3"
+  [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
+  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
+		     (match_operand:SI 2 "const_int_operand" "")]
+		    UNSPECV_LWPVAL_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpval\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "DI")])
+
+(define_insn "lwp_lwpinshi3"
+  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
+  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
+		     (match_operand:HI 2 "const_int_operand" "")]
+		    UNSPECV_LWPINS_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpins\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "HI")])
+
+(define_insn "lwp_lwpinssi3"
+  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
+  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
+		     (match_operand:SI 2 "const_int_operand" "")]
+		    UNSPECV_LWPINS_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpins\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "SI")])
+
+(define_insn "lwp_lwpinsdi3"
+  [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
+  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
+		     (match_operand:SI 2 "const_int_operand" "")]
+		    UNSPECV_LWPINS_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpins\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "DI")])
+
 (include "mmx.md")
 (include "sse.md")
 (include "sync.md")
diff -upNw gcc-xop-2/gcc/config/i386/i386.opt gcc-lwp/gcc/config/i386/i386.opt
--- gcc-xop-2/gcc/config/i386/i386.opt	2009-09-30 22:37:58.000000000 -0500
+++ gcc-lwp/gcc/config/i386/i386.opt	2009-09-30 16:33:28.000000000 -0500
@@ -318,6 +318,10 @@ mxop
 Target Report Mask(ISA_XOP) Var(ix86_isa_flags) VarExists Save
 Support XOP built-in functions and code generation 
 
+mlwp
+Target Report Mask(ISA_LWP) Var(ix86_isa_flags) VarExists Save
+Support LWP built-in functions and code generation 
+
 mabm
 Target Report Mask(ISA_ABM) Var(ix86_isa_flags) VarExists Save
 Support code generation of Advanced Bit Manipulation (ABM) instructions.
diff -upNw gcc-xop-2/gcc/config/i386/lwpintrin.h gcc-lwp/gcc/config/i386/lwpintrin.h
--- gcc-xop-2/gcc/config/i386/lwpintrin.h	1969-12-31 18:00:00.000000000 -0600
+++ gcc-lwp/gcc/config/i386/lwpintrin.h	2009-10-02 18:30:09.000000000 -0500
@@ -0,0 +1,109 @@
+/* Copyright (C) 2007, 2008, 2009 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _X86INTRIN_H_INCLUDED
+# error "Never use <lwpintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _LWPINTRIN_H_INCLUDED
+#define _LWPINTRIN_H_INCLUDED
+
+#ifndef __LWP__
+# error "LWP instruction set not enabled"
+#else
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__llwpcb16 (void *pcbAddress)
+{
+  __builtin_ia32_llwpcb16 (pcbAddress);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__llwpcb32 (void *pcbAddress)
+{
+  __builtin_ia32_llwpcb32 (pcbAddress);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__llwpcb64 (void *pcbAddress)
+{
+  __builtin_ia32_llwpcb64 (pcbAddress);
+}
+
+extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__slwpcb16 (void)
+{
+  return __builtin_ia32_slwpcb16 ();
+}
+
+extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__slwpcb32 (void)
+{
+  return __builtin_ia32_slwpcb32 ();
+}
+
+extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__slwpcb64 (void)
+{
+  return __builtin_ia32_slwpcb64 ();
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpval16 (unsigned short data2, unsigned int data1, unsigned short flags)
+{
+  __builtin_ia32_lwpval16 (data2, data1, flags);
+}
+/*
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpval32 (unsigned int data2, unsigned int data1, unsigned int flags)
+{
+  __builtin_ia32_lwpval32 (data2, data1, flags);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpval64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+{
+  __builtin_ia32_lwpval64 (data2, data1, flags);
+}
+
+extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpins16 (unsigned short data2, unsigned int data1, unsigned short flags)
+{
+  return __builtin_ia32_lwpins16 (data2, data1, flags);
+}
+
+extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpins32 (unsigned int data2, unsigned int data1, unsigned int flags)
+{
+  return __builtin_ia32_lwpins32 (data2, data1, flags);
+}
+
+extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpins64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+{
+  return __builtin_ia32_lwpins64 (data2, data1, flags);
+}
+*/
+#endif /* __LWP__ */
+
+#endif /* _LWPINTRIN_H_INCLUDED */
diff -upNw gcc-xop-2/gcc/config/i386/x86intrin.h gcc-lwp/gcc/config/i386/x86intrin.h
--- gcc-xop-2/gcc/config/i386/x86intrin.h	2009-09-30 22:37:58.000000000 -0500
+++ gcc-lwp/gcc/config/i386/x86intrin.h	2009-09-30 16:33:28.000000000 -0500
@@ -62,6 +62,10 @@
 #include <xopintrin.h>
 #endif
 
+#ifdef __LWP__
+#include <lwpintrin.h>
+#endif
+
 #if defined (__AES__) || defined (__PCLMUL__)
 #include <wmmintrin.h>
 #endif

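The flag bookkeeping in the i386.c hunks can be read in isolation.  The
following is a simplified standalone sketch, not GCC code: the variable
and function names are hypothetical, and the value of MASK_ISA_LWP is
made up (only PTA_LWP = 1 << 23 is taken from the patch).  It shows why
both -mlwp and -mno-lwp record the bit as explicit, and how that stops a
-march= default from overriding the user's choice.

```c
/* Standalone sketch of the -mlwp flag bookkeeping.  The names and the
   MASK_ISA_LWP value are illustrative assumptions, not GCC's.  */

#define MASK_ISA_LWP (1u << 0)	/* made-up bit position */
#define PTA_LWP      (1u << 23)	/* as in the patch's enum pta_flags */

unsigned int isa_flags;			/* enabled ISA extensions */
unsigned int isa_flags_explicit;	/* ISAs the user named explicitly */

/* Mirrors the OPT_mlwp case: VALUE is nonzero for -mlwp and zero for
   -mno-lwp.  Either way the bit is recorded as explicitly chosen.  */
void
handle_mlwp (int value)
{
  if (value)
    isa_flags |= MASK_ISA_LWP;
  else
    isa_flags &= ~MASK_ISA_LWP;
  isa_flags_explicit |= MASK_ISA_LWP;
}

/* Mirrors the override_options hunk: a processor entry carrying
   PTA_LWP enables LWP only if the user made no explicit choice.  */
void
apply_arch_default (unsigned int pta_flags)
{
  if ((pta_flags & PTA_LWP) && !(isa_flags_explicit & MASK_ISA_LWP))
    isa_flags |= MASK_ISA_LWP;
}

/* Reset both words, e.g. before simulating another command line.  */
void
reset_flags (void)
{
  isa_flags = 0;
  isa_flags_explicit = 0;
}
```

With this model, an architecture default alone turns LWP on, whereas
-mno-lwp followed by the same architecture default leaves it off.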

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-10-09  1:13 PATCH: Add LWP support for upcoming AMD Orochi processor Harsha Jagasia
@ 2009-10-09 10:07 ` Jakub Jelinek
  2009-10-22 21:07   ` rajagopal, dwarak
  2009-11-05  9:32 ` Jakub Jelinek
  1 sibling, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2009-10-09 10:07 UTC
  To: Harsha Jagasia; +Cc: gcc-patches

Hi!

On Thu, Oct 08, 2009 at 07:43:43PM -0500, Harsha Jagasia wrote:

Just minor comments regarding CL formatting:
1) 2009-10-08, not 2009-10-8
2) missing spaces between filename and ( in several places
3) empty lines in CL separate unrelated changes, these are all related, so
there shouldn't be any.  Especially not between lines for the same filename,
unless you write the filename again.
4) All entries end with ., you have e.g. many with Ditto without full stop.
5) I don't believe it is correct to write no comment at all and let it
fall through to some later comment.  You should write the comment on the
first line and for the rest write Likewise. or Ditto., or accumulate them
together with commas.
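
Points 1) and 2) are mechanical enough to check automatically.  The
sketch below is only an illustration: the helper names are hypothetical
and it is not part of any GCC tooling.  It tests a ChangeLog header for
a YYYY-MM-DD date and an entry line for whitespace before the first
opening parenthesis.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helpers for illustration, not existing GCC tooling.  */

/* Return 1 if S starts with a date of the form YYYY-MM-DD followed by
   whitespace, as expected on a ChangeLog header line.  */
int
cl_date_ok (const char *s)
{
  static const char pattern[] = "dddd-dd-dd";
  size_t i;

  for (i = 0; pattern[i] != '\0'; i++)
    {
      /* 'd' stands for a digit; '-' must match literally.  A short
	 string fails here at its terminating NUL.  */
      if (pattern[i] == 'd'
	  ? !isdigit ((unsigned char) s[i])
	  : s[i] != '-')
	return 0;
    }
  return s[i] == ' ' || s[i] == '\t';
}

/* Return 1 if the first '(' on LINE, if any, is preceded by
   whitespace, e.g. "* file.c (function): Text."  */
int
cl_paren_spacing_ok (const char *line)
{
  const char *p = strchr (line, '(');

  return p == NULL || p == line || isspace ((unsigned char) p[-1]);
}
```

Applied to the ChangeLog quoted below, the first helper rejects the
"2009-10-8" header and the second rejects the i386-c.c and i386.h
entries.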

> 2009-10-8  Harsha Jagasia  <harsha.jagasia@amd.com>
> 
> 	* doc/invoke.texi (-mlwp): Add documentation.
> 	* doc/extend.texi (x86 intrinsics): Add LWP intrinsics.
> 
> 	* config.gcc (i[34567]86-*-*): Include lwpintrin.h.
> 	(x86_64-*-*): Ditto.
> 	* config/i386/lwpintrin.h: New file, provide x86 compiler
> 	intrinsics for LWP.
> 	* config/i386/cpuid.h (bit_LWP): Define LWP bit.
> 	* config/i386/x86intrin.h: Add LWP check and lwpintrin.h.
> 	* config/i386/i386-c.c(ix86_target_macros_internal): Check
> 	ISA_FLAG for LWP. 
> 	* config/i386/i386.h(TARGET_LWP): New macro for LWP.
> 	* config/i386/i386.opt (-mlwp): New switch for LWP support.
> 
> 	* config/i386/i386.c (OPTION_MASK_ISA_LWP_SET): New.
> 	(OPTION_MASK_ISA_LWP_UNSET): New.	
> 	(ix86_handle_option): Handle -mlwp.
> 	(isa_opts): Handle -mlwp.
> 	(enum pta_flags): Add PTA_LWP.
> 	(override_options): Add LWP support.
> 
> 	(IX86_BUILTIN_LLWPCB16): New for LWP intrinsic.
> 	(IX86_BUILTIN_LLWPCB32): Ditto
> 	(IX86_BUILTIN_LLWPCB64): Ditto
> 	(IX86_BUILTIN_SLWPCB16): Ditto
> 	(IX86_BUILTIN_SLWPCB32): Ditto
> 	(IX86_BUILTIN_SLWPCB64): Ditto
> 	(IX86_BUILTIN_LWPVAL16): Ditto
> 	(IX86_BUILTIN_LWPVAL32): Ditto
> 	(IX86_BUILTIN_LWPVAL64): Ditto
> 	(IX86_BUILTIN_LWPINS16): Ditto
> 	(IX86_BUILTIN_LWPINS32): Ditto
> 	(IX86_BUILTIN_LWPINS64): Ditto
> 
> 	(enum  ix86_special_builtin_type): Add LWP intrinsic support.
> 	(builtin_description): Ditto.
> 	(ix86_init_mmx_sse_builtins): Ditto.
> 	(ix86_expand_special_args_builtin): Ditto.
> 
> 	* config/i386/i386.md (UNSPEC_LLWP_INTRINSIC):
> 	(UNSPEC_SLWP_INTRINSIC):
> 	(UNSPECV_LWPVAL_INTRINSIC):
> 	(UNSPECV_LWPINS_INTRINSIC): Add new UNSPEC for LWP support.
> 	(lwp_llwpcbhi1): New lwp pattern.
> 	(lwp_llwpcbsi1): Ditto.
> 	(lwp_llwpcbdi1): Ditto.
> 	(lwp_slwpcbhi1): Ditto.
> 	(lwp_slwpcbsi1): Ditto.
> 	(lwp_slwpcbdi1): Ditto.
> 	(lwp_lwpvalhi3): Ditto.
> 	(lwp_lwpvalsi3): Ditto.
> 	(lwp_lwpvaldi3): Ditto.
> 	(lwp_lwpinshi3): Ditto.
> 	(lwp_lwpinssi3): Ditto.
> 	(lwp_lwpinsdi3): Ditto.

	Jakub


* RE: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-10-09 10:07 ` Jakub Jelinek
@ 2009-10-22 21:07   ` rajagopal, dwarak
  0 siblings, 0 replies; 32+ messages in thread
From: rajagopal, dwarak @ 2009-10-22 21:07 UTC
  To: Jakub Jelinek, Jagasia, Harsha; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 869 bytes --]

Hi Jakub,

> Just minor comments regarding CL formatting:
> 1) 2009-10-08, not 2009-10-8
> 2) missing spaces between filename and ( in several places
> 3) empty lines in CL separate unrelated changes, these are all related, so
> there shouldn't be any.  Especially not between lines for the same
> filename,
> unless you write the filename again.
> 4) All entries end with ., you have e.g. many with Ditto without full
> stop.
> 5) I don't believe it is correct to write no comment at all and let it
> fall through to some later comment.  You should write the comment on the
> first line and for the rest write Likewise. or Ditto., or accumulate them
> together with commas.

Thanks for pointing out these issues. I have fixed them and will check in the patch (after I check in the XOP patch) unless there are further comments.

Thanks,
Dwarak


 

[-- Attachment #2: lwp-gcc.patch --]
[-- Type: application/octet-stream, Size: 23914 bytes --]

diff -purNw xop/gcc/config/i386/cpuid.h lwp/gcc/config/i386/cpuid.h
--- xop/gcc/config/i386/cpuid.h	2009-10-21 15:49:24.000000000 -0500
+++ lwp/gcc/config/i386/cpuid.h	2009-10-21 15:39:03.000000000 -0500
@@ -49,6 +49,7 @@
 #define bit_LAHF_LM	(1 << 0)
 #define bit_SSE4a	(1 << 6)
 #define bit_FMA4	(1 << 16)
+#define bit_LWP 	(1 << 15)
 #define bit_XOP         (1 << 11)
 
 /* %edx */
diff -purNw xop/gcc/config/i386/i386.c lwp/gcc/config/i386/i386.c
--- xop/gcc/config/i386/i386.c	2009-10-21 15:49:24.000000000 -0500
+++ lwp/gcc/config/i386/i386.c	2009-10-21 15:39:03.000000000 -0500
@@ -1960,6 +1960,8 @@ static int ix86_isa_flags_explicit;
    | OPTION_MASK_ISA_AVX_SET)
 #define OPTION_MASK_ISA_XOP_SET \
   (OPTION_MASK_ISA_XOP | OPTION_MASK_ISA_FMA4_SET)
+#define OPTION_MASK_ISA_LWP_SET \
+  OPTION_MASK_ISA_LWP
 
 /* AES and PCLMUL need SSE2 because they use xmm registers */
 #define OPTION_MASK_ISA_AES_SET \
@@ -2014,6 +2016,7 @@ static int ix86_isa_flags_explicit;
 #define OPTION_MASK_ISA_FMA4_UNSET \
   (OPTION_MASK_ISA_FMA4 | OPTION_MASK_ISA_XOP_UNSET)
 #define OPTION_MASK_ISA_XOP_UNSET OPTION_MASK_ISA_XOP
+#define OPTION_MASK_ISA_LWP_UNSET OPTION_MASK_ISA_LWP
 
 #define OPTION_MASK_ISA_AES_UNSET OPTION_MASK_ISA_AES
 #define OPTION_MASK_ISA_PCLMUL_UNSET OPTION_MASK_ISA_PCLMUL
@@ -2274,6 +2277,19 @@ ix86_handle_option (size_t code, const c
 	}
       return true;
 
+   case OPT_mlwp:
+      if (value)
+	{
+	  ix86_isa_flags |= OPTION_MASK_ISA_LWP_SET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_LWP_SET;
+	}
+      else
+	{
+	  ix86_isa_flags &= ~OPTION_MASK_ISA_LWP_UNSET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_LWP_UNSET;
+	}
+      return true;
+
     case OPT_mabm:
       if (value)
 	{
@@ -2403,6 +2419,7 @@ ix86_target_string (int isa, int flags,
     { "-m64",		OPTION_MASK_ISA_64BIT },
     { "-mfma4",		OPTION_MASK_ISA_FMA4 },
     { "-mxop",		OPTION_MASK_ISA_XOP },
+    { "-mlwp",		OPTION_MASK_ISA_LWP },
     { "-msse4a",	OPTION_MASK_ISA_SSE4A },
     { "-msse4.2",	OPTION_MASK_ISA_SSE4_2 },
     { "-msse4.1",	OPTION_MASK_ISA_SSE4_1 },
@@ -2634,7 +2651,8 @@ override_options (bool main_args_p)
       PTA_FMA = 1 << 19,
       PTA_MOVBE = 1 << 20,
       PTA_FMA4 = 1 << 21,
-      PTA_XOP = 1 << 22
+      PTA_XOP = 1 << 22,
+      PTA_LWP = 1 << 23
     };
 
   static struct pta
@@ -2983,6 +3001,9 @@ override_options (bool main_args_p)
 	if (processor_alias_table[i].flags & PTA_XOP
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_XOP))
 	  ix86_isa_flags |= OPTION_MASK_ISA_XOP;
+	if (processor_alias_table[i].flags & PTA_LWP
+	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_LWP))
+	  ix86_isa_flags |= OPTION_MASK_ISA_LWP;
 	if (processor_alias_table[i].flags & PTA_ABM
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_ABM))
 	  ix86_isa_flags |= OPTION_MASK_ISA_ABM;
@@ -3668,6 +3689,7 @@ ix86_valid_target_attribute_inner_p (tre
     IX86_ATTR_ISA ("ssse3",	OPT_mssse3),
     IX86_ATTR_ISA ("fma4",	OPT_mfma4),
     IX86_ATTR_ISA ("xop",	OPT_mxop),
+    IX86_ATTR_ISA ("lwp",	OPT_mlwp),
 
     /* string options */
     IX86_ATTR_STR ("arch=",	IX86_FUNCTION_SPECIFIC_ARCH),
@@ -20868,7 +20890,7 @@ enum ix86_builtins
 
   IX86_BUILTIN_CVTUDQ2PS,
 
-  /* FMA4 instructions.  */
+  /* FMA4 and XOP instructions.  */
   IX86_BUILTIN_VFMADDSS,
   IX86_BUILTIN_VFMADDSD,
   IX86_BUILTIN_VFMADDPS,
@@ -21045,6 +21067,20 @@ enum ix86_builtins
   IX86_BUILTIN_VPCOMFALSEQ,
   IX86_BUILTIN_VPCOMTRUEQ,
 
+  /* LWP instructions.  */
+  IX86_BUILTIN_LLWPCB16,
+  IX86_BUILTIN_LLWPCB32,
+  IX86_BUILTIN_LLWPCB64,
+  IX86_BUILTIN_SLWPCB16,
+  IX86_BUILTIN_SLWPCB32,
+  IX86_BUILTIN_SLWPCB64,
+  IX86_BUILTIN_LWPVAL16,
+  IX86_BUILTIN_LWPVAL32,
+  IX86_BUILTIN_LWPVAL64,
+  IX86_BUILTIN_LWPINS16,
+  IX86_BUILTIN_LWPINS32,
+  IX86_BUILTIN_LWPINS64,
+
   IX86_BUILTIN_MAX
 };
 
@@ -21258,7 +21294,13 @@ enum ix86_special_builtin_type
   VOID_FTYPE_PV8SF_V8SF_V8SF,
   VOID_FTYPE_PV4DF_V4DF_V4DF,
   VOID_FTYPE_PV4SF_V4SF_V4SF,
-  VOID_FTYPE_PV2DF_V2DF_V2DF
+  VOID_FTYPE_PV2DF_V2DF_V2DF,
+  VOID_FTYPE_USHORT_UINT_USHORT,
+  VOID_FTYPE_UINT_UINT_UINT,
+  VOID_FTYPE_UINT64_UINT_UINT,
+  UCHAR_FTYPE_USHORT_UINT_USHORT,
+  UCHAR_FTYPE_UINT_UINT_UINT,
+  UCHAR_FTYPE_UINT64_UINT_UINT
 };
 
 /* Builtin types */
@@ -21505,6 +21547,22 @@ static const struct builtin_description
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstoreps, "__builtin_ia32_maskstoreps", IX86_BUILTIN_MASKSTOREPS, UNKNOWN, (int) VOID_FTYPE_PV4SF_V4SF_V4SF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstorepd256, "__builtin_ia32_maskstorepd256", IX86_BUILTIN_MASKSTOREPD256, UNKNOWN, (int) VOID_FTYPE_PV4DF_V4DF_V4DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstoreps256, "__builtin_ia32_maskstoreps256", IX86_BUILTIN_MASKSTOREPS256, UNKNOWN, (int) VOID_FTYPE_PV8SF_V8SF_V8SF },
+
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbhi1,   "__builtin_ia32_llwpcb16",   IX86_BUILTIN_LLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbsi1,   "__builtin_ia32_llwpcb32",   IX86_BUILTIN_LLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbdi1,   "__builtin_ia32_llwpcb64",   IX86_BUILTIN_LLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbhi1,   "__builtin_ia32_slwpcb16",   IX86_BUILTIN_SLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbsi1,   "__builtin_ia32_slwpcb32",   IX86_BUILTIN_SLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbdi1,   "__builtin_ia32_slwpcb64",   IX86_BUILTIN_SLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalhi3,   "__builtin_ia32_lwpval16", IX86_BUILTIN_LWPVAL16,  UNKNOWN,     (int) VOID_FTYPE_USHORT_UINT_USHORT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3,   "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3,   "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT64_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinshi3,   "__builtin_ia32_lwpins16", IX86_BUILTIN_LWPINS16,  UNKNOWN,     (int) UCHAR_FTYPE_USHORT_UINT_USHORT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3,   "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS64,  UNKNOWN,     (int) UCHAR_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinsdi3,   "__builtin_ia32_lwpins64", IX86_BUILTIN_LWPINS64,  UNKNOWN,     (int) UCHAR_FTYPE_UINT64_UINT_UINT },
+
 };
 
 /* Builtins with variable number of arguments.  */
@@ -23163,6 +23221,50 @@ ix86_init_mmx_sse_builtins (void)
 				integer_type_node,
 				NULL_TREE);
 
+  /* LWP instructions.  */
+
+  tree void_ftype_ushort_unsigned_ushort
+    = build_function_type_list (void_type_node,
+				short_unsigned_type_node,
+				unsigned_type_node,
+				short_unsigned_type_node,
+				NULL_TREE);
+
+  tree void_ftype_unsigned_unsigned_unsigned
+    = build_function_type_list (void_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
+  tree void_ftype_uint64_unsigned_unsigned
+    = build_function_type_list (void_type_node,
+				long_long_unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
+  tree uchar_ftype_ushort_unsigned_ushort
+    = build_function_type_list (unsigned_char_type_node,
+				short_unsigned_type_node,
+				unsigned_type_node,
+				short_unsigned_type_node,
+				NULL_TREE);
+
+  tree uchar_ftype_unsigned_unsigned_unsigned
+    = build_function_type_list (unsigned_char_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
+  tree uchar_ftype_uint64_unsigned_unsigned
+    = build_function_type_list (unsigned_char_type_node,
+				long_long_unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
   tree ftype;
 
   /* Add all special builtins with variable number of operands.  */
@@ -23276,6 +23378,25 @@ ix86_init_mmx_sse_builtins (void)
 	case VOID_FTYPE_PV2DF_V2DF_V2DF:
 	  type = void_ftype_pv2df_v2df_v2df;
 	  break;
+	case VOID_FTYPE_USHORT_UINT_USHORT:
+	  type = void_ftype_ushort_unsigned_ushort;
+	  break;
+	case VOID_FTYPE_UINT_UINT_UINT:
+	  type = void_ftype_unsigned_unsigned_unsigned;
+	  break;
+	case VOID_FTYPE_UINT64_UINT_UINT:
+	  type = void_ftype_uint64_unsigned_unsigned;
+	  break;
+	case UCHAR_FTYPE_USHORT_UINT_USHORT:
+	  type = uchar_ftype_ushort_unsigned_ushort;
+	  break;
+	case UCHAR_FTYPE_UINT_UINT_UINT:
+	  type = uchar_ftype_unsigned_unsigned_unsigned;
+	  break;
+	case UCHAR_FTYPE_UINT64_UINT_UINT:
+	  type = uchar_ftype_uint64_unsigned_unsigned;
+	  break;
+
 	default:
 	  gcc_unreachable ();
 	}
@@ -25167,6 +25288,16 @@ ix86_expand_special_args_builtin (const
       /* Reserve memory operand for target.  */
       memory = ARRAY_SIZE (args);
       break;
+    case VOID_FTYPE_USHORT_UINT_USHORT:
+    case VOID_FTYPE_UINT_UINT_UINT:
+    case VOID_FTYPE_UINT64_UINT_UINT:
+    case UCHAR_FTYPE_USHORT_UINT_USHORT:
+    case UCHAR_FTYPE_UINT_UINT_UINT:
+    case UCHAR_FTYPE_UINT64_UINT_UINT:
+      nargs = 3;
+      klass = store;
+      memory = 0;
+      break;
     default:
       gcc_unreachable ();
     }
diff -purNw xop/gcc/config/i386/i386-c.c lwp/gcc/config/i386/i386-c.c
--- xop/gcc/config/i386/i386-c.c	2009-10-21 15:49:24.000000000 -0500
+++ lwp/gcc/config/i386/i386-c.c	2009-10-21 15:39:03.000000000 -0500
@@ -234,6 +234,8 @@ ix86_target_macros_internal (int isa_fla
     def_or_undef (parse_in, "__FMA4__");
   if (isa_flag & OPTION_MASK_ISA_XOP)
     def_or_undef (parse_in, "__XOP__");
+  if (isa_flag & OPTION_MASK_ISA_LWP)
+    def_or_undef (parse_in, "__LWP__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE))
     def_or_undef (parse_in, "__SSE_MATH__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE2))
diff -purNw xop/gcc/config/i386/i386.h lwp/gcc/config/i386/i386.h
--- xop/gcc/config/i386/i386.h	2009-10-21 15:49:24.000000000 -0500
+++ lwp/gcc/config/i386/i386.h	2009-10-21 15:39:03.000000000 -0500
@@ -56,6 +56,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define TARGET_SSE4A	OPTION_ISA_SSE4A
 #define TARGET_FMA4	OPTION_ISA_FMA4
 #define TARGET_XOP	OPTION_ISA_XOP
+#define TARGET_LWP	OPTION_ISA_LWP
 #define TARGET_ROUND	OPTION_ISA_ROUND
 #define TARGET_ABM	OPTION_ISA_ABM
 #define TARGET_POPCNT	OPTION_ISA_POPCNT
diff -purNw xop/gcc/config/i386/i386.md lwp/gcc/config/i386/i386.md
--- xop/gcc/config/i386/i386.md	2009-10-21 15:49:24.000000000 -0500
+++ lwp/gcc/config/i386/i386.md	2009-10-21 15:39:03.000000000 -0500
@@ -204,6 +204,10 @@
    (UNSPEC_XOP_TRUEFALSE	152)
    (UNSPEC_XOP_PERMUTE		153)
    (UNSPEC_FRCZ			154)
+   (UNSPEC_LLWP_INTRINSIC	155)
+   (UNSPEC_SLWP_INTRINSIC	156)
+   (UNSPECV_LWPVAL_INTRINSIC	157)
+   (UNSPECV_LWPINS_INTRINSIC	158)
 
    ; For AES support
    (UNSPEC_AESENC		159)
@@ -353,7 +357,7 @@
    fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
    sselog,sselog1,sseiadd,sseiadd1,sseishft,sseimul,
    sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
-   ssemuladd,sse4arg,
+   ssemuladd,sse4arg,lwp,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
@@ -21788,6 +21792,120 @@
   [(set_attr "type" "other")
    (set_attr "length" "3")])
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
+;; LWP instructions
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_insn "lwp_llwpcbhi1"
+  [(unspec [(match_operand:HI 0 "register_operand" "r")]
+  	   UNSPEC_LLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "llwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "HI")])
+
+(define_insn "lwp_llwpcbsi1"
+  [(unspec [(match_operand:SI 0 "register_operand" "r")]
+  	   UNSPEC_LLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "llwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "SI")])
+
+(define_insn "lwp_llwpcbdi1"
+  [(unspec [(match_operand:DI 0 "register_operand" "r")]
+  	   UNSPEC_LLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "llwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "DI")])
+
+(define_insn "lwp_slwpcbhi1"
+  [(unspec [(match_operand:HI 0 "register_operand" "r")]
+  	   UNSPEC_SLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "slwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "HI")])
+
+(define_insn "lwp_slwpcbsi1"
+  [(unspec [(match_operand:SI 0 "register_operand" "r")]
+  	   UNSPEC_SLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "slwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "SI")])
+
+(define_insn "lwp_slwpcbdi1"
+  [(unspec [(match_operand:DI 0 "register_operand" "r")]
+  	   UNSPEC_SLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "slwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "DI")])
+
+(define_insn "lwp_lwpvalhi3"
+  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
+  	   	     (match_operand:SI 1 "nonimmediate_operand" "rm")
+	   	     (match_operand:HI 2 "const_int_operand" "")]
+  	   	    UNSPECV_LWPVAL_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpval\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "HI")])
+
+(define_insn "lwp_lwpvalsi3"
+  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
+    	    	     (match_operand:SI 1 "nonimmediate_operand" "rm")
+	    	     (match_operand:SI 2 "const_int_operand" "")]
+		    UNSPECV_LWPVAL_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpval\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "SI")])
+
+(define_insn "lwp_lwpvaldi3"
+  [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
+  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
+		     (match_operand:SI 2 "const_int_operand" "")]
+		    UNSPECV_LWPVAL_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpval\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "DI")])
+
+(define_insn "lwp_lwpinshi3"
+  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
+  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
+		     (match_operand:HI 2 "const_int_operand" "")]
+		    UNSPECV_LWPINS_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpins\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "HI")])
+
+(define_insn "lwp_lwpinssi3"
+  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
+  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
+		     (match_operand:SI 2 "const_int_operand" "")]
+		    UNSPECV_LWPINS_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpins\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "SI")])
+
+(define_insn "lwp_lwpinsdi3"
+  [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
+  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
+		     (match_operand:SI 2 "const_int_operand" "")]
+		    UNSPECV_LWPINS_INTRINSIC)]
+  "TARGET_LWP"
+  "lwpins\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "DI")])
+
 (include "mmx.md")
 (include "sse.md")
 (include "sync.md")
diff -purNw xop/gcc/config/i386/i386.opt lwp/gcc/config/i386/i386.opt
--- xop/gcc/config/i386/i386.opt	2009-10-21 15:49:24.000000000 -0500
+++ lwp/gcc/config/i386/i386.opt	2009-10-21 15:39:03.000000000 -0500
@@ -318,6 +318,10 @@ mxop
 Target Report Mask(ISA_XOP) Var(ix86_isa_flags) VarExists Save
 Support XOP built-in functions and code generation 
 
+mlwp
+Target Report Mask(ISA_LWP) Var(ix86_isa_flags) VarExists Save
+Support LWP built-in functions and code generation 
+
 mabm
 Target Report Mask(ISA_ABM) Var(ix86_isa_flags) VarExists Save
 Support code generation of Advanced Bit Manipulation (ABM) instructions.
diff -purNw xop/gcc/config/i386/lwpintrin.h lwp/gcc/config/i386/lwpintrin.h
--- xop/gcc/config/i386/lwpintrin.h	1969-12-31 18:00:00.000000000 -0600
+++ lwp/gcc/config/i386/lwpintrin.h	2009-10-21 15:39:03.000000000 -0500
@@ -0,0 +1,109 @@
+/* Copyright (C) 2007, 2008, 2009 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _X86INTRIN_H_INCLUDED
+# error "Never use <lwpintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _LWPINTRIN_H_INCLUDED
+#define _LWPINTRIN_H_INCLUDED
+
+#ifndef __LWP__
+# error "LWP instruction set not enabled"
+#else
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__llwpcb16 (void *pcbAddress)
+{
+  __builtin_ia32_llwpcb16 (pcbAddress);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__llwpcb32 (void *pcbAddress)
+{
+  __builtin_ia32_llwpcb32 (pcbAddress);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__llwpcb64 (void *pcbAddress)
+{
+  __builtin_ia32_llwpcb64 (pcbAddress);
+}
+
+extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__slwpcb16 (void)
+{
+  return __builtin_ia32_slwpcb16 ();
+}
+
+extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__slwpcb32 (void)
+{
+  return __builtin_ia32_slwpcb32 ();
+}
+
+extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__slwpcb64 (void)
+{
+  return __builtin_ia32_slwpcb64 ();
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpval16 (unsigned short data2, unsigned int data1, unsigned short flags)
+{
+  __builtin_ia32_lwpval16 (data2, data1, flags);
+}
+/*
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpval32 (unsigned int data2, unsigned int data1, unsigned int flags)
+{
+  __builtin_ia32_lwpval32 (data2, data1, flags);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpval64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+{
+  __builtin_ia32_lwpval64 (data2, data1, flags);
+}
+
+extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpins16 (unsigned short data2, unsigned int data1, unsigned short flags)
+{
+  return __builtin_ia32_lwpins16 (data2, data1, flags);
+}
+
+extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpins32 (unsigned int data2, unsigned int data1, unsigned int flags)
+{
+  return __builtin_ia32_lwpins32 (data2, data1, flags);
+}
+
+extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpins64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+{
+  return __builtin_ia32_lwpins64 (data2, data1, flags);
+}
+*/
+#endif /* __LWP__ */
+
+#endif /* _LWPINTRIN_H_INCLUDED */
diff -purNw xop/gcc/config/i386/x86intrin.h lwp/gcc/config/i386/x86intrin.h
--- xop/gcc/config/i386/x86intrin.h	2009-10-21 15:49:24.000000000 -0500
+++ lwp/gcc/config/i386/x86intrin.h	2009-10-21 15:39:03.000000000 -0500
@@ -62,6 +62,10 @@
 #include <xopintrin.h>
 #endif
 
+#ifdef __LWP__
+#include <lwpintrin.h>
+#endif
+
 #if defined (__AES__) || defined (__PCLMUL__)
 #include <wmmintrin.h>
 #endif
diff -purNw xop/gcc/config.gcc lwp/gcc/config.gcc
--- xop/gcc/config.gcc	2009-10-21 15:49:16.000000000 -0500
+++ lwp/gcc/config.gcc	2009-10-21 15:39:03.000000000 -0500
@@ -288,7 +288,7 @@ i[34567]86-*-*)
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
 		       immintrin.h x86intrin.h avxintrin.h xopintrin.h
-		       ia32intrin.h cross-stdarg.h"
+		       ia32intrin.h cross-stdarg.h lwpintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -298,7 +298,7 @@ x86_64-*-*)
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
 		       immintrin.h x86intrin.h avxintrin.h xopintrin.h 
-		       ia32intrin.h cross-stdarg.h"
+		       ia32intrin.h cross-stdarg.h lwpintrin.h"
 	need_64bit_hwint=yes
 	;;
 ia64-*-*)
diff -purNw xop/gcc/doc/extend.texi lwp/gcc/doc/extend.texi
--- xop/gcc/doc/extend.texi	2009-10-21 15:49:28.000000000 -0500
+++ lwp/gcc/doc/extend.texi	2009-10-21 15:39:03.000000000 -0500
@@ -3186,6 +3186,11 @@ Enable/disable the generation of the FMA
 @cindex @code{target("xop")} attribute
 Enable/disable the generation of the XOP instructions.
 
+@item lwp
+@itemx no-lwp
+@cindex @code{target("lwp")} attribute
+Enable/disable the generation of the LWP instructions.
+
 @item ssse3
 @itemx no-ssse3
 @cindex @code{target("ssse3")} attribute
@@ -9074,6 +9079,23 @@ v8sf __builtin_ia32_fmsubaddps256 (v8sf,
 
 @end smallexample
 
+The following built-in functions are available when @option{-mlwp} is used.
+
+@smallexample
+void __builtin_ia32_llwpcb16 (void *);
+void __builtin_ia32_llwpcb32 (void *);
+void __builtin_ia32_llwpcb64 (void *);
+void * __builtin_ia32_llwpcb16 (void);
+void * __builtin_ia32_llwpcb32 (void);
+void * __builtin_ia32_llwpcb64 (void);
+void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short)
+void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int)
+void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int)
+unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short)
+unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int)
+unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int)
+@end smallexample
+
 The following built-in functions are available when @option{-m3dnow} is used.
 All of them generate the machine instruction that is part of the name.
 
diff -purNw xop/gcc/doc/invoke.texi lwp/gcc/doc/invoke.texi
--- xop/gcc/doc/invoke.texi	2009-10-21 15:49:27.000000000 -0500
+++ lwp/gcc/doc/invoke.texi	2009-10-21 15:39:03.000000000 -0500
@@ -594,7 +594,7 @@ Objective-C and Objective-C++ Dialects}.
 -mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
 -maes -mpclmul @gol
--msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop @gol
+-msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop -mlwp @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -11979,6 +11979,8 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-fma4
 @itemx -mxop
 @itemx -mno-xop
+@itemx -mlwp
+@itemx -mno-lwp
 @itemx -m3dnow
 @itemx -mno-3dnow
 @itemx -mpopcnt
@@ -11993,7 +11995,7 @@ preferred alignment to @option{-mpreferr
 @opindex mno-3dnow
 These switches enable or disable the use of instructions in the MMX,
 SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, SSE4A, FMA4, XOP,
-ABM or 3DNow!@: extended instruction sets.
+LWP, ABM or 3DNow!@: extended instruction sets.
 These extensions are also available as built-in functions: see
 @ref{X86 Built-in Functions}, for details of the functions enabled and
 disabled by these switches.

[-- Attachment #3: lwp.ChangeLog --]
[-- Type: application/octet-stream, Size: 1984 bytes --]

2009-10-21  Harsha Jagasia  <harsha.jagasia@amd.com>
	    Dwarakanath Rajagopal  <dwarak.rajagopal@amd.com>

	* doc/invoke.texi (-mlwp): Add documentation.
	* doc/extend.texi (x86 intrinsics): Add LWP intrinsics.
	* config.gcc (i[34567]86-*-*): Include lwpintrin.h.
	(x86_64-*-*): Ditto.
	* config/i386/lwpintrin.h: New file, provide x86 compiler
	intrinsics for LWP.
	* config/i386/cpuid.h (bit_LWP): Define LWP bit.
	* config/i386/x86intrin.h: Add LWP check and lwpintrin.h.
	* config/i386/i386-c.c (ix86_target_macros_internal): Check
	ISA_FLAG for LWP. 
	* config/i386/i386.h (TARGET_LWP): New macro for LWP.
	* config/i386/i386.opt (-mlwp): New switch for LWP support.
	* config/i386/i386.c (OPTION_MASK_ISA_LWP_SET): New.
	(OPTION_MASK_ISA_LWP_UNSET): New.	
	(ix86_handle_option): Handle -mlwp.
	(isa_opts): Handle -mlwp.
	(enum pta_flags): Add PTA_LWP.
	(override_options): Add LWP support.
	(IX86_BUILTIN_LLWPCB16): New for LWP intrinsic.
	(IX86_BUILTIN_LLWPCB32): Ditto.
	(IX86_BUILTIN_LLWPCB64): Ditto.
	(IX86_BUILTIN_SLWPCB16): Ditto.
	(IX86_BUILTIN_SLWPCB32): Ditto.
	(IX86_BUILTIN_SLWPCB64): Ditto.
	(IX86_BUILTIN_LWPVAL16): Ditto.
	(IX86_BUILTIN_LWPVAL32): Ditto.
	(IX86_BUILTIN_LWPVAL64): Ditto.
	(IX86_BUILTIN_LWPINS16): Ditto.
	(IX86_BUILTIN_LWPINS32): Ditto.
	(IX86_BUILTIN_LWPINS64): Ditto.
	(enum  ix86_special_builtin_type): Add LWP intrinsic support.
	(builtin_description): Ditto.
	(ix86_init_mmx_sse_builtins): Ditto.
	(ix86_expand_special_args_builtin): Ditto.
	* config/i386/i386.md (UNSPEC_LLWP_INTRINSIC): Add new UNSPEC for 
	LWP support.
	(UNSPEC_SLWP_INTRINSIC): Ditto.
	(UNSPECV_LWPVAL_INTRINSIC): Ditto.
	(UNSPECV_LWPINS_INTRINSIC): Ditto.
	(lwp_llwpcbhi1): New lwp pattern.
	(lwp_llwpcbsi1): Ditto.
	(lwp_llwpcbdi1): Ditto.
	(lwp_slwpcbhi1): Ditto.
	(lwp_slwpcbsi1): Ditto.
	(lwp_slwpcbdi1): Ditto.
	(lwp_lwpvalhi3): Ditto.
	(lwp_lwpvalsi3): Ditto.
	(lwp_lwpvaldi3): Ditto.
	(lwp_lwpinshi3): Ditto.
	(lwp_lwpinssi3): Ditto.
	(lwp_lwpinsdi3): Ditto.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-10-09  1:13 PATCH: Add LWP support for upcoming AMD Orochi processor Harsha Jagasia
  2009-10-09 10:07 ` Jakub Jelinek
@ 2009-11-05  9:32 ` Jakub Jelinek
  2009-11-05 16:21   ` Jakub Jelinek
  1 sibling, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2009-11-05  9:32 UTC (permalink / raw)
  To: Harsha Jagasia; +Cc: gcc-patches, dwarak.rajagopal

On Thu, Oct 08, 2009 at 07:43:43PM -0500, Harsha Jagasia wrote:
> 	* doc/invoke.texi (-mlwp): Add documentation.
> 	* doc/extend.texi (x86 intrinsics): Add LWP intrinsics.
...

I see this has been committed now, but the testsuite hasn't been adjusted
for it.  Please at least change gcc.target/i386/sse-{12,13,14,22,23}.c so
that they also mention testing lwpintrin.h (search for xopintrin.h) and,
where they are compiled with -mxop, are also compiled with -mlwp, etc.
BTW, sse-23.c mentions xopintrin.h only in comments, but doesn't actually
include xopintrin.h or x86intrin.h, so it probably needs tweaking for
xop as well.  Also, if possible, please add some lwp testcases.

	Jakub

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-11-05  9:32 ` Jakub Jelinek
@ 2009-11-05 16:21   ` Jakub Jelinek
  2009-11-05 16:58     ` Sebastian Pop
  2009-11-06 10:15     ` Jakub Jelinek
  0 siblings, 2 replies; 32+ messages in thread
From: Jakub Jelinek @ 2009-11-05 16:21 UTC (permalink / raw)
  To: dwarak.rajagopal; +Cc: Harsha Jagasia, gcc-patches, Uros Bizjak, Sebastian Pop

On Thu, Nov 05, 2009 at 10:32:07AM +0100, Jakub Jelinek wrote:
> On Thu, Oct 08, 2009 at 07:43:43PM -0500, Harsha Jagasia wrote:
> > 	* doc/invoke.texi (-mlwp): Add documentation.
> > 	* doc/extend.texi (x86 intrinsics): Add LWP intrinsics.
> ...
> 
> I see this has been committed now, but the testsuite hasn't been adjusted
> for it.  Please at least change gcc.target/i386/sse-{12,13,14,22,23}.c so
> that they also mention testing lwpintrin.h (search for xopintrin.h) and,
> where they are compiled with -mxop, are also compiled with -mlwp, etc.
> BTW, sse-23.c mentions xopintrin.h only in comments, but doesn't actually
> include xopintrin.h or x86intrin.h, so it probably needs tweaking for
> xop as well.  Also, if possible, please add some lwp testcases.

As the following patch proves, lwpintrin.h and the whole lwp support are
quite severely broken.

1) llwpcb* - the builtins are declared void __builtin_ia32_llwpcb* (void),
but lwpintrin.h expects them to take a void * argument.  If I understand
right, the insn in reality has 3 address sizes to support
16-bit/32-bit/64-bit code, so I fail to see why we'd need 3 different
intrinsics instead of just one intrinsic and one builtin that takes a
void * address and uses the insn matching Pmode.
Furthermore, as the insn has no output, I believe you have to use UNSPECV
instead of UNSPEC.

2) slwpcb* - similarly, the builtins are VOID_FTYPE_VOID, when they are
expected to be void *__builtin_ia32_slwpcb* (void).  Again, I fail to see
why 3 intrinsics are needed; just one would be enough.  In i386.md they have
wrong patterns; as they set a register, it should be something like:
(define_insn "lwp_slwpcb<mode>1"
  [(set (match_operand:P 0 "register_operand" "=r")
	(unspec [(const_int 0)] UNSPEC_SLWP_INTRINSIC))]
  "TARGET_LWP"
  "slwpcb\t%0"
  [(set_attr "type" "lwp")
   (set_attr "mode" "<MODE>")])

3) lwpval*
  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalhi3, "__builtin_ia32_lwpval16", IX86_BUILTIN_LWPVAL16,  UNKNOWN,     (int) VOID_FTYPE_USHORT_UINT_USHORT },
  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3, "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT_UINT_UINT },
  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3, "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT64_UINT_UINT },
There is a typo on the second line:
s/IX86_BUILTIN_LWPVAL64/IX86_BUILTIN_LWPVAL32/ there.
Also, I don't think we have anything like an unsigned __int64
type on Linux; I guess you want to use int __attribute__((__mode__(__DI__)))
instead.

4) lwpins* is written to return unsigned char; what is the return value?
The rFlags.CF value after the insn?  The insn pattern needs to be rewritten
to make it clear that it sets the (reg:CC FLAGS_REG) to some unspec
value.  i386.c has a similar typo (LWPINS64 instead of LWPINS32 for
lwpins32).  And the return value needs to be set somehow (perhaps a
define_expand that expands to the actual lwpins insn plus a setc insn?).
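
The table typos in 3) and 4) have the same shape: two rows of a builtin
table end up carrying the same enum code.  As a rough sketch only, assuming
a heavily simplified table (struct builtin_row, enum lwp_code and
find_duplicate_code are hypothetical names for illustration, not anything
in i386.c), a check for that kind of mistake could look like:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for IX86_BUILTIN_LWPVAL{16,32,64}.  */
enum lwp_code { LWPVAL16, LWPVAL32, LWPVAL64 };

/* Hypothetical, simplified row: only the builtin name and its code.  */
struct builtin_row
{
  const char *name;
  enum lwp_code code;
};

/* The lwpval rows as posted: the lwpval32 row carries the 64-bit code.  */
const struct builtin_row lwpval_as_posted[] = {
  { "__builtin_ia32_lwpval16", LWPVAL16 },
  { "__builtin_ia32_lwpval32", LWPVAL64 },
  { "__builtin_ia32_lwpval64", LWPVAL64 },
};

/* The same rows with the suggested s/LWPVAL64/LWPVAL32/ applied.  */
const struct builtin_row lwpval_fixed[] = {
  { "__builtin_ia32_lwpval16", LWPVAL16 },
  { "__builtin_ia32_lwpval32", LWPVAL32 },
  { "__builtin_ia32_lwpval64", LWPVAL64 },
};

/* Return 1 and store the indices of the first pair of rows that share a
   code in *FIRST and *SECOND (FIRST < SECOND); return 0 if every code in
   ROWS[0..N-1] is distinct, leaving *FIRST and *SECOND untouched.  */
int
find_duplicate_code (const struct builtin_row *rows, size_t n,
                     size_t *first, size_t *second)
{
  size_t i, j;

  assert (rows != NULL && first != NULL && second != NULL);

  for (i = 1; i < n; i++)
    for (j = 0; j < i; j++)
      if (rows[i].code == rows[j].code)
        {
          *first = j;
          *second = i;
          return 1;
        }
  return 0;
}
```

Calling find_duplicate_code on lwpval_as_posted reports the lwpval32 and
lwpval64 rows as a clashing pair, while lwpval_fixed passes.  The quadratic
scan is deliberate: the table is tiny, and the pair of indices points
straight at the rows to compare by eye.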

2009-11-05  Jakub Jelinek  <jakub@redhat.com>

	* gcc.target/i386/sse-12.c: Compile also with -mlwp.
	* gcc.target/i386/sse-13.c: Likewise.
	* gcc.target/i386/sse-14.c: Likewise.
	* gcc.target/i386/sse-23.c: Add lwp to GCC target pragma,
	include also x86intrin.h.

--- gcc/testsuite/gcc.target/i386/sse-12.c.jj	2009-11-04 18:31:22.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/sse-12.c	2009-11-05 16:20:54.000000000 +0100
@@ -1,7 +1,7 @@
-/* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, xopintrin.h, mm3dnow.h and mm_malloc.h are
-   usable with -O -std=c89 -pedantic-errors.  */
+/* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, xopintrin.h, lwpintrin.h, mm3dnow.h
+   and mm_malloc.h are usable with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -m3dnow -mavx -mfma4 -mxop -maes -mpclmul" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -m3dnow -mavx -mfma4 -mxop -mlwp -maes -mpclmul" } */
 
 #include <x86intrin.h>
 
--- gcc/testsuite/gcc.target/i386/sse-13.c.jj	2009-11-04 18:31:22.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/sse-13.c	2009-11-05 16:21:48.000000000 +0100
@@ -1,12 +1,13 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -maes -mpclmul" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -mlwp -maes -mpclmul" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile with optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h, xopintrin.h and mm3dnow.h
-   that reference the proper builtin functions.  Defining away "extern" and
-   "__inline" results in all of them being compiled as proper functions.  */
+   defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h, xopintrin.h,
+   lwpintrin.h and mm3dnow.h that reference the proper builtin functions.
+   Defining away "extern" and "__inline" results in all of them being compiled
+   as proper functions.  */
 
 #define extern
 #define __inline
--- gcc/testsuite/gcc.target/i386/sse-14.c.jj	2009-11-04 18:31:22.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/sse-14.c	2009-11-05 16:26:39.000000000 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -msse4a -maes -mpclmul" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -mlwp -msse4a -maes -mpclmul" } */
 
 #include <mm_malloc.h>
 
--- gcc/testsuite/gcc.target/i386/sse-23.c.jj	2009-11-04 18:31:22.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/sse-23.c	2009-11-05 16:30:40.000000000 +0100
@@ -4,9 +4,10 @@
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile with optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h and mm3dnow.h
-   that reference the proper builtin functions.  Defining away "extern" and
-   "__inline" results in all of them being compiled as proper functions.  */
+   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h,
+   lwpintrin.h and mm3dnow.h that reference the proper builtin functions.
+   Defining away "extern" and "__inline" results in all of them being compiled
+   as proper functions.  */
 
 #define extern
 #define __inline
@@ -99,7 +100,8 @@
 #define __builtin_ia32_vprotdi(A, B) __builtin_ia32_vprotdi(A,1)
 #define __builtin_ia32_vprotqi(A, B) __builtin_ia32_vprotqi(A,1)
 
-#pragma GCC target ("3dnow,sse4,sse4a,aes,pclmul,xop")
+#pragma GCC target ("3dnow,sse4,sse4a,aes,pclmul,xop,lwp")
 #include <wmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
+#include <x86intrin.h>


	Jakub

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-11-05 16:21   ` Jakub Jelinek
@ 2009-11-05 16:58     ` Sebastian Pop
  2009-11-05 17:03       ` Richard Guenther
  2009-11-05 17:21       ` Uros Bizjak
  2009-11-06 10:15     ` Jakub Jelinek
  1 sibling, 2 replies; 32+ messages in thread
From: Sebastian Pop @ 2009-11-05 16:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: dwarak.rajagopal, Harsha Jagasia, gcc-patches, Uros Bizjak

Hi,

On Thu, Nov 5, 2009 at 10:21, Jakub Jelinek <jakub@redhat.com> wrote:
> As the following patch proves, lwpintrin.h and the whole lwp support is
> quite severely broken.
>

Should we revert the LWP patch and fix the issues that you raised,
or should we submit patches on top of the current trunk?

Thanks,
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-11-05 16:58     ` Sebastian Pop
@ 2009-11-05 17:03       ` Richard Guenther
  2009-11-05 17:21       ` Uros Bizjak
  1 sibling, 0 replies; 32+ messages in thread
From: Richard Guenther @ 2009-11-05 17:03 UTC (permalink / raw)
  To: Sebastian Pop
  Cc: Jakub Jelinek, dwarak.rajagopal, Harsha Jagasia, gcc-patches,
	Uros Bizjak

On Thu, Nov 5, 2009 at 5:58 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> Hi,
>
> On Thu, Nov 5, 2009 at 10:21, Jakub Jelinek <jakub@redhat.com> wrote:
>> As the following patch proves, lwpintrin.h and the whole lwp support is
>> quite severely broken.
>>
>
> Should we revert the LWP patch and fix the issues that you raised,
> or should we submit patches on top of the current trunk?

You should submit patches on top of trunk.

Richard.

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-11-05 16:58     ` Sebastian Pop
  2009-11-05 17:03       ` Richard Guenther
@ 2009-11-05 17:21       ` Uros Bizjak
  1 sibling, 0 replies; 32+ messages in thread
From: Uros Bizjak @ 2009-11-05 17:21 UTC (permalink / raw)
  To: Sebastian Pop
  Cc: Jakub Jelinek, dwarak.rajagopal, Harsha Jagasia, gcc-patches

On 11/05/2009 05:58 PM, Sebastian Pop wrote:

> On Thu, Nov 5, 2009 at 10:21, Jakub Jelinek <jakub@redhat.com> wrote:
>> As the following patch proves, lwpintrin.h and the whole lwp support is
>> quite severely broken.
>>
> Should we revert the LWP patch and fix the issues that you raised,
> or should we submit patches on top of the current trunk?

Intrinsics should at least pass the sse-X tests that were mentioned in
this thread many times. These tests ensure that all intrinsics can be
used together, won't interfere with each other, and won't break with
various compile-time options.

BTW: I don't think that the LWP patch was approved by any x86 maintainer.

Uros.

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-11-05 16:21   ` Jakub Jelinek
  2009-11-05 16:58     ` Sebastian Pop
@ 2009-11-06 10:15     ` Jakub Jelinek
  2009-12-10 19:58       ` Sebastian Pop
  1 sibling, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2009-11-06 10:15 UTC (permalink / raw)
  To: dwarak.rajagopal; +Cc: Harsha Jagasia, gcc-patches, Uros Bizjak, Sebastian Pop

On Thu, Nov 05, 2009 at 05:21:24PM +0100, Jakub Jelinek wrote:
> As the following patch proves, lwpintrin.h and the whole lwp support is
> quite severely broken.

I forgot:

5) you shouldn't provide DI insns for !TARGET_64BIT, I don't think they are
valid in 32-bit mode

6) for insns that you keep multiple versions of and don't use :P iterator,
you should probably use :SWI248 mode iterator instead of duplicating
the pattern 3 times.

Once the lwpval/lwpins stuff in lwpintrin.h is uncommented, you'll also need
to further adjust my testsuite patch - you want to actually test the
intrinsics that require immediate arguments, like they are tested in other
intrin headers.
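
For reference, the technique is the one behind the existing
"#define __builtin_ia32_vprotdi(A, B) __builtin_ia32_vprotdi(A,1)" lines
in sse-23.c.  A minimal sketch of it in plain C, with made-up names
(fake_builtin_rotate, fake_intrinsic_rotate) instead of real builtins:

```c
/* Hypothetical builtin whose second argument would have to be a
   compile-time constant in the real compiler; here it is an ordinary
   function so the example builds anywhere.  */
static int
fake_builtin_rotate (int value, int imm)
{
  return value * 16 + imm;
}

/* Redefine the builtin as a macro that drops the caller's second
   argument and passes a literal instead.  A macro is not re-expanded
   inside its own replacement, so the inner name still refers to the
   function above.  */
#define fake_builtin_rotate(A, B) fake_builtin_rotate (A, 1)

/* Stand-in for an intrinsic from a header.  Compiled as a real function,
   "count" is no longer a constant, but the macro has already replaced it
   with 1 by the time the builtin is called.  */
static int
fake_intrinsic_rotate (int value, int count)
{
  (void) count;			/* unused after macro substitution */
  return fake_builtin_rotate (value, count);
}
```

The lwpval/lwpins builtins would presumably need the same kind of
#define for their flags argument once they are enabled in the header.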

And as Uros mentioned, I think the patch hasn't been acked by anyone, just
some random comments have been posted about the patch by various people,
but never a real review.

	Jakub

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-11-06 10:15     ` Jakub Jelinek
@ 2009-12-10 19:58       ` Sebastian Pop
  2009-12-10 21:01         ` Jakub Jelinek
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2009-12-10 19:58 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Henderson; +Cc: gcc-patches, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 4038 bytes --]

On Fri, Nov 6, 2009 at 04:11, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Nov 05, 2009 at 05:21:24PM +0100, Jakub Jelinek wrote:
>> As the following patch proves, lwpintrin.h and the whole lwp support is
>> quite severely broken.
>

I realized the brokenness of the LWP support when I had to work on
correcting the following points.  Sorry.

First I am sending out a preliminary patch that does fix some of the
points below, but that still does not pass the testsuite with your
patch applied.

> As the following patch proves, lwpintrin.h and the whole lwp support is
> quite severely broken.
>
> 1) llwpcb* - the builtins are declared void __builtin_ia32_llwpcb* (void),
> but lwpintrin.h expects them to take void * argument.

Fixed.

> If I understand right, the
> insn in reality has 3 address sizes to support 16-bit/32-bit/64-bit code,
> I fail to see why we'd need 3 different intrinsics, instead of just one and
> one builtin that takes void * address and uses the insn matching Pmode.

Unless I am doing something wrong, I remarked that the HI mode is not
generated when I factor it in the :P mode.  In the attached patch I
merged only the 32 and 64 bit modes into one pattern for the llwpcb
and slwpcb insns.

> Furthermore, as the insn has no output, I believe you have to use UNSPECV
> instead of UNSPEC.
>

Fixed.

> 2) slwpcb* - similarly, the builtins are VOID_FTYPE_VOID, when it is
> expected to be void *__builtin_ia32_slwpcb* (void).

Fixed.

> Again, I fail to see
> why 3 intrinsics are needed, just one would be enough.  In i386.md they have
> wrong patterns, as they set the registers it should be something like:
> (define_insn "lwp_slwpcb<mode>1"
>  [(set (match_operand:P 0 "register_operand" "=r")

I fixed the =r part.

>        (unspec [(const_int 0)] UNSPEC_SLWP_INTRINSIC))]
>  "TARGET_LWP"
>  "slwpcb\t%0"
>  [(set_attr "type" "lwp")
>   (set_attr "mode" "<MODE>")])
>

Same remark as above, about HI not generated.

> 3) lwpval*
>  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalhi3, "__builtin_ia32_lwpval16", IX86_BUILTIN_LWPVAL16,  UNKNOWN,     (int) VOID_FTYPE_USHORT_UINT_USHORT },
>  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3, "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT_UINT_UINT },
>  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3, "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT64_UINT_UINT },
> typo on the second line, s/IX86_BUILTIN_LWPVAL64/IX86_BUILTIN_LWPVAL32/
> there.

Fixed.

> Also, I don't think we have anything like unsigned __int64
> type on Linux, guess you want to use int __attribute__((__mode__(__DI__)))
> instead.

I fixed this with what other insns are using in their intrinsics:
unsigned long long.
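
A minimal sketch of the two spellings side by side, assuming GCC (or a
compiler that accepts the __mode__ attribute) and a target where
unsigned long long is 64 bits wide.  The typedef and function names are
made up for the example:

```c
/* Integer type with DImode (64-bit) width, spelled with the GCC mode
   attribute as suggested in the review.  */
typedef unsigned int lwp_u64_di __attribute__ ((__mode__ (__DI__)));

/* Compile-time checks without _Static_assert: a negative array size is
   an error if either type is not exactly 8 bytes wide.  */
typedef char lwp_check_di[sizeof (lwp_u64_di) == 8 ? 1 : -1];
typedef char lwp_check_ull[sizeof (unsigned long long) == 8 ? 1 : -1];

/* Returns 1 when both spellings have the same size on this target.  */
static int
lwp_u64_sizes_match (void)
{
  return sizeof (lwp_u64_di) == sizeof (unsigned long long);
}
```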

> 4) lwpins* is written to return char, what is the return value?
> rFlags.CF value after the insn?

Yes, rFlags.CF is what the lwpins* intrinsics are supposed to return.

> The insn pattern needs to be rewritten
> to make it clear that it sets the (reg:CC FLAGS_REG) to some unspec
> value.

I am proposing to use (clobber (reg:CC FLAGS_REG)) as some other insns
are doing.

> i386.c has similar typo (LWPINS64 instead of LWPINS32 for
> lwpins32).

Fixed.

> And it needs to arrange for the return value to be somehow set
> (define_expand that expands it to the actual lwpins insn plus setc insn?).
>

I am struggling with this one, could somebody help?

> 5) you shouldn't provide DI insns for !TARGET_64BIT, I don't think they are
> valid in 32-bit mode
>

Fixed.

> 6) for insns that you keep multiple versions of and don't use :P iterator,
> you should probably use :SWI248 mode iterator instead of duplicating
> the pattern 3 times.
>

Not done yet.

> Once the lwpval/lwpins stuff in lwpintrin.h is uncommented, you'll also need
> to further adjust my testsuite patch - you want to actually test the
> intrinsics that require immediate arguments, like they are tested in other
> intrin headers.
>

Not done yet.

Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

[-- Attachment #2: 0001-Add-LWP-ABM-and-POPCNT-to-the-testsuite.patch --]
[-- Type: text/x-patch, Size: 5653 bytes --]

From e9b255f81714796ee3bf5b1fcbea63e767920d3e Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Mon, 7 Dec 2009 14:50:15 -0600
Subject: [PATCH] Add LWP, ABM, and POPCNT to the testsuite.

---
 gcc/testsuite/gcc.target/i386/sse-12.c |    6 +++---
 gcc/testsuite/gcc.target/i386/sse-13.c |    9 +++++----
 gcc/testsuite/gcc.target/i386/sse-14.c |    2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c |   12 +++++++-----
 gcc/testsuite/gcc.target/i386/sse-23.c |   13 ++++++++-----
 5 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index 4a314e8..77baff0 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -1,8 +1,8 @@
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, xopintrin.h, mm3dnow.h,
-   abmintrin.h and mm_malloc.h are usable with -O -std=c89
-   -pedantic-errors.  */
+   abmintrin.h, lwpintrin.h, popcntintrin.h and mm_malloc.h are usable
+   with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -m3dnow -mavx -mfma4 -mxop -maes -mpclmul -mabm" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -m3dnow -mavx -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */
 
 #include <x86intrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 546a99f..24853f4 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,13 +1,14 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mabm" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile with optimization.  All of them
    are defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h,
-   xopintrin.h, abmintrin.h and mm3dnow.h that reference the proper
-   builtin functions.  Defining away "extern" and "__inline" results
-   in all of them being compiled as proper functions.  */
+   xopintrin.h, abmintrin.h, lwpintrin.h, popcntintrin.h and mm3dnow.h
+   that reference the proper builtin functions.  Defining away
+   "extern" and "__inline" results in all of them being compiled as
+   proper functions.  */
 
 #define extern
 #define __inline
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 783cd0a..80ceffb 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -msse4a -maes -mpclmul" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -msse4a -maes -mpclmul -mpopcnt -mabm -mlwp" } */
 
 #include <mm_malloc.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 541cad4..3c4c1d9 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -4,10 +4,12 @@
 
 #include <mm_malloc.h>
 
-/* Test that the intrinsics compile without optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h and mm3dnow.h
-   that reference the proper builtin functions.  Defining away "extern" and
-   "__inline" results in all of them being compiled as proper functions.  */
+/* Test that the intrinsics compile without optimization.  All of them
+   are defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h,
+   xopintrin.h, lwpintrin.h, popcntintrin.h and mm3dnow.h that
+   reference the proper builtin functions.  Defining away "extern" and
+   "__inline" results in all of them being compiled as proper
+   functions.  */
 
 #define extern
 #define __inline
@@ -37,7 +39,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("mmx,3dnow,sse,sse2,sse3,ssse3,sse4.1,sse4.2,sse4a,aes,pclmul,xop")
+#pragma GCC target ("mmx,3dnow,sse,sse2,sse3,ssse3,sse4.1,sse4.2,sse4a,aes,pclmul,xop,popcnt,abm,lwp")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 3e0fa1f..5390ac0 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -3,10 +3,12 @@
 
 #include <mm_malloc.h>
 
-/* Test that the intrinsics compile with optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h and mm3dnow.h
-   that reference the proper builtin functions.  Defining away "extern" and
-   "__inline" results in all of them being compiled as proper functions.  */
+/* Test that the intrinsics compile with optimization.  All of them
+   are defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h,
+   xopintrin.h, lwpintrin.h, popcntintrin.h and mm3dnow.h that
+   reference the proper builtin functions.  Defining away "extern" and
+   "__inline" results in all of them being compiled as proper
+   functions.  */
 
 #define extern
 #define __inline
@@ -99,7 +101,8 @@
 #define __builtin_ia32_vprotdi(A, B) __builtin_ia32_vprotdi(A,1)
 #define __builtin_ia32_vprotqi(A, B) __builtin_ia32_vprotqi(A,1)
 
-#pragma GCC target ("3dnow,sse4,sse4a,aes,pclmul,xop")
+#pragma GCC target ("3dnow,sse4,sse4a,aes,pclmul,xop,abm,popcnt,lwp")
 #include <wmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
+#include <x86intrin.h>
-- 
1.6.0.4


[-- Attachment #3: 0002-wip-LWP.patch --]
[-- Type: text/x-patch, Size: 12971 bytes --]

From 4be242f7557d886edc6c4df9cfcecb486ecb69af Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Thu, 10 Dec 2009 12:47:57 -0600
Subject: [PATCH] wip LWP

---
 gcc/config/i386/i386-builtin-types.def |    1 +
 gcc/config/i386/i386.c                 |   18 +++++----
 gcc/config/i386/i386.md                |   59 ++++++++++++-------------------
 gcc/config/i386/lwpintrin.h            |   27 ++++++++++-----
 4 files changed, 52 insertions(+), 53 deletions(-)

diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index e9e4d0c..f0d25f3 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -195,6 +195,7 @@ DEF_FUNCTION_TYPE (V8SF, V8SI)
 DEF_FUNCTION_TYPE (V8SI, V4SI)
 DEF_FUNCTION_TYPE (V8SI, V8SF)
 DEF_FUNCTION_TYPE (VOID, PCVOID)
+DEF_FUNCTION_TYPE (PCVOID, VOID)
 DEF_FUNCTION_TYPE (VOID, UNSIGNED)
 
 DEF_FUNCTION_TYPE (DI, V2DI, INT)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0e58a17..f78cfa7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21532,19 +21532,19 @@ static const struct builtin_description bdesc_special_args[] =
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstorepd256, "__builtin_ia32_maskstorepd256", IX86_BUILTIN_MASKSTOREPD256, UNKNOWN, (int) VOID_FTYPE_PV4DF_V4DF_V4DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstoreps256, "__builtin_ia32_maskstoreps256", IX86_BUILTIN_MASKSTOREPS256, UNKNOWN, (int) VOID_FTYPE_PV8SF_V8SF_V8SF },
 
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbhi1,   "__builtin_ia32_llwpcb16",   IX86_BUILTIN_LLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbsi1,   "__builtin_ia32_llwpcb32",   IX86_BUILTIN_LLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbdi1,   "__builtin_ia32_llwpcb64",   IX86_BUILTIN_LLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbhi1,   "__builtin_ia32_llwpcb16",   IX86_BUILTIN_LLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_PCVOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbsi1,   "__builtin_ia32_llwpcb32",   IX86_BUILTIN_LLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_PCVOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbdi1,   "__builtin_ia32_llwpcb64",   IX86_BUILTIN_LLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_PCVOID },
 
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbhi1,   "__builtin_ia32_slwpcb16",   IX86_BUILTIN_SLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbsi1,   "__builtin_ia32_slwpcb32",   IX86_BUILTIN_SLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbdi1,   "__builtin_ia32_slwpcb64",   IX86_BUILTIN_SLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbhi1,   "__builtin_ia32_slwpcb16",   IX86_BUILTIN_SLWPCB16,    UNKNOWN,     (int) PCVOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbsi1,   "__builtin_ia32_slwpcb32",   IX86_BUILTIN_SLWPCB32,    UNKNOWN,     (int) PCVOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbdi1,   "__builtin_ia32_slwpcb64",   IX86_BUILTIN_SLWPCB64,    UNKNOWN,     (int) PCVOID_FTYPE_VOID },
 
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalhi3,   "__builtin_ia32_lwpval16", IX86_BUILTIN_LWPVAL16,  UNKNOWN,     (int) VOID_FTYPE_USHORT_UINT_USHORT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3,   "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3,   "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL32,  UNKNOWN,     (int) VOID_FTYPE_UINT_UINT_UINT },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3,   "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT64_UINT_UINT },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinshi3,   "__builtin_ia32_lwpins16", IX86_BUILTIN_LWPINS16,  UNKNOWN,     (int) UCHAR_FTYPE_USHORT_UINT_USHORT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3,   "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS64,  UNKNOWN,     (int) UCHAR_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3,   "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS32,  UNKNOWN,     (int) UCHAR_FTYPE_UINT_UINT_UINT },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinsdi3,   "__builtin_ia32_lwpins64", IX86_BUILTIN_LWPINS64,  UNKNOWN,     (int) UCHAR_FTYPE_UINT64_UINT_UINT },
 
 };
@@ -23747,6 +23747,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
       emit_insn (GEN_FCN (icode) (target));
       return 0;
     case UINT64_FTYPE_VOID:
+    case PCVOID_FTYPE_VOID:
       nargs = 0;
       klass = load;
       memory = 0;
@@ -23761,6 +23762,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
     case V4DF_FTYPE_PCV2DF:
     case V4DF_FTYPE_PCDOUBLE:
     case V2DF_FTYPE_PCDOUBLE:
+    case VOID_FTYPE_PCVOID:
       nargs = 1;
       klass = load;
       memory = 0;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a4e688a..0db4cbb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -204,7 +204,7 @@
    (UNSPEC_XOP_TRUEFALSE	152)
    (UNSPEC_XOP_PERMUTE		153)
    (UNSPEC_FRCZ			154)
-   (UNSPEC_LLWP_INTRINSIC	155)
+   (UNSPECV_LLWP_INTRINSIC	155)
    (UNSPEC_SLWP_INTRINSIC	156)
    (UNSPECV_LWPVAL_INTRINSIC	157)
    (UNSPECV_LWPINS_INTRINSIC	158)
@@ -20836,57 +20836,41 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define_insn "lwp_llwpcbhi1"
-  [(unspec [(match_operand:HI 0 "register_operand" "r")]
-  	   UNSPEC_LLWP_INTRINSIC)]
+  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")]
+  	   	    UNSPECV_LLWP_INTRINSIC)]
   "TARGET_LWP"
   "llwpcb\t%0"
   [(set_attr "type" "lwp")
    (set_attr "mode" "HI")])
 
-(define_insn "lwp_llwpcbsi1"
-  [(unspec [(match_operand:SI 0 "register_operand" "r")]
-  	   UNSPEC_LLWP_INTRINSIC)]
+(define_insn "lwp_llwpcb<mode>1"
+  [(unspec_volatile [(match_operand:P 0 "register_operand" "r")]
+  	   	    UNSPECV_LLWP_INTRINSIC)]
   "TARGET_LWP"
   "llwpcb\t%0"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
-
-(define_insn "lwp_llwpcbdi1"
-  [(unspec [(match_operand:DI 0 "register_operand" "r")]
-  	   UNSPEC_LLWP_INTRINSIC)]
-  "TARGET_LWP"
-  "llwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "<MODE>")])
 
 (define_insn "lwp_slwpcbhi1"
-  [(unspec [(match_operand:HI 0 "register_operand" "r")]
+  [(unspec [(match_operand:HI 0 "register_operand" "=r")]
   	   UNSPEC_SLWP_INTRINSIC)]
   "TARGET_LWP"
   "slwpcb\t%0"
   [(set_attr "type" "lwp")
    (set_attr "mode" "HI")])
 
-(define_insn "lwp_slwpcbsi1"
-  [(unspec [(match_operand:SI 0 "register_operand" "r")]
-  	   UNSPEC_SLWP_INTRINSIC)]
-  "TARGET_LWP"
-  "slwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
-
-(define_insn "lwp_slwpcbdi1"
-  [(unspec [(match_operand:DI 0 "register_operand" "r")]
+(define_insn "lwp_slwpcb<mode>1"
+  [(unspec [(match_operand:P 0 "register_operand" "=r")]
   	   UNSPEC_SLWP_INTRINSIC)]
   "TARGET_LWP"
   "slwpcb\t%0"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "<MODE>")])
 
 (define_insn "lwp_lwpvalhi3"
   [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
   	   	     (match_operand:SI 1 "nonimmediate_operand" "rm")
-	   	     (match_operand:HI 2 "const_int_operand" "")]
+		     (match_operand:HI 2 "const_int_operand" "i")]
   	   	    UNSPECV_LWPVAL_INTRINSIC)]
   "TARGET_LWP"
   "lwpval\t{%2, %1, %0|%0, %1, %2}"
@@ -20896,7 +20880,7 @@
 (define_insn "lwp_lwpvalsi3"
   [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
     	    	     (match_operand:SI 1 "nonimmediate_operand" "rm")
-	    	     (match_operand:SI 2 "const_int_operand" "")]
+		     (match_operand:SI 2 "const_int_operand" "i")]
 		    UNSPECV_LWPVAL_INTRINSIC)]
   "TARGET_LWP"
   "lwpval\t{%2, %1, %0|%0, %1, %2}"
@@ -20906,7 +20890,7 @@
 (define_insn "lwp_lwpvaldi3"
   [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
   		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:SI 2 "const_int_operand" "")]
+		     (match_operand:SI 2 "const_int_operand" "i")]
 		    UNSPECV_LWPVAL_INTRINSIC)]
   "TARGET_LWP"
   "lwpval\t{%2, %1, %0|%0, %1, %2}"
@@ -20916,8 +20900,9 @@
 (define_insn "lwp_lwpinshi3"
   [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
   		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:HI 2 "const_int_operand" "")]
-		    UNSPECV_LWPINS_INTRINSIC)]
+		     (match_operand:HI 2 "const_int_operand" "i")]
+		    UNSPECV_LWPINS_INTRINSIC)
+   (clobber (reg:CC FLAGS_REG))]
   "TARGET_LWP"
   "lwpins\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "lwp")
@@ -20926,8 +20911,9 @@
 (define_insn "lwp_lwpinssi3"
   [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
   		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:SI 2 "const_int_operand" "")]
-		    UNSPECV_LWPINS_INTRINSIC)]
+		     (match_operand:SI 2 "const_int_operand" "i")]
+		    UNSPECV_LWPINS_INTRINSIC)
+   (clobber (reg:CC FLAGS_REG))]
   "TARGET_LWP"
   "lwpins\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "lwp")
@@ -20936,8 +20922,9 @@
 (define_insn "lwp_lwpinsdi3"
   [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
   		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:SI 2 "const_int_operand" "")]
-		    UNSPECV_LWPINS_INTRINSIC)]
+		     (match_operand:SI 2 "const_int_operand" "i")]
+		    UNSPECV_LWPINS_INTRINSIC)
+   (clobber (reg:CC FLAGS_REG))]
   "TARGET_LWP"
   "lwpins\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "lwp")
diff --git a/gcc/config/i386/lwpintrin.h b/gcc/config/i386/lwpintrin.h
index e5137ec..50ce2ff 100644
--- a/gcc/config/i386/lwpintrin.h
+++ b/gcc/config/i386/lwpintrin.h
@@ -33,59 +33,65 @@
 #else
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb16 (void *pcbAddress)
+__llwpcb16 (void const *pcbAddress)
 {
   __builtin_ia32_llwpcb16 (pcbAddress);
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb32 (void *pcbAddress)
+__llwpcb32 (void const *pcbAddress)
 {
   __builtin_ia32_llwpcb32 (pcbAddress);
 }
 
+#ifdef __x86_64__
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb64 (void *pcbAddress)
+__llwpcb64 (void const *pcbAddress)
 {
   __builtin_ia32_llwpcb64 (pcbAddress);
 }
+#endif
 
-extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+extern __inline void const * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __slwpcb16 (void)
 {
   return __builtin_ia32_slwpcb16 ();
 }
 
-extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+extern __inline void const * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __slwpcb32 (void)
 {
   return __builtin_ia32_slwpcb32 ();
 }
 
-extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+#ifdef __x86_64__
+extern __inline void const * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __slwpcb64 (void)
 {
   return __builtin_ia32_slwpcb64 ();
 }
+#endif
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __lwpval16 (unsigned short data2, unsigned int data1, unsigned short flags)
 {
   __builtin_ia32_lwpval16 (data2, data1, flags);
 }
-/*
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __lwpval32 (unsigned int data2, unsigned int data1, unsigned int flags)
 {
   __builtin_ia32_lwpval32 (data2, data1, flags);
 }
 
+#ifdef __x86_64__
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpval64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+__lwpval64 (unsigned long long data2, unsigned int data1, unsigned int flags)
 {
   __builtin_ia32_lwpval64 (data2, data1, flags);
 }
+#endif
 
+/*
 extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __lwpins16 (unsigned short data2, unsigned int data1, unsigned short flags)
 {
@@ -98,12 +104,15 @@ __lwpins32 (unsigned int data2, unsigned int data1, unsigned int flags)
   return __builtin_ia32_lwpins32 (data2, data1, flags);
 }
 
+#ifdef __x86_64__
 extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpins64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+__lwpins64 (unsigned long long data2, unsigned int data1, unsigned int flags)
 {
   return __builtin_ia32_lwpins64 (data2, data1, flags);
 }
+#endif
 */
+
 #endif /* __LWP__ */
 
 #endif /* _LWPINTRIN_H_INCLUDED */
-- 
1.6.0.4


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-10 19:58       ` Sebastian Pop
@ 2009-12-10 21:01         ` Jakub Jelinek
  2009-12-10 21:04           ` Sebastian Pop
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2009-12-10 21:01 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Henderson, gcc-patches, Uros Bizjak

On Thu, Dec 10, 2009 at 01:42:41PM -0600, Sebastian Pop wrote:
> > 1) llwpcb* - the builtins are declared void __builtin_ia32_llwpcb* (void),
> > but lwpintrin.h expects them to take void * argument.
> 
> Fixed.
> 
> > If I understand right, the
> > insn in reality has 3 address sizes to support 16-bit/32-bit/64-bit code,
> > I fail to see why we'd need 3 different intrinsics, instead of just one and
> > one builtin that takes void * address and uses the insn matching Pmode.
> 
> Unless I am doing something wrong, I remarked that the HI mode is not
> generated when I factor it in the :P mode.  In the attached patch I
> merged only the 32 and 64 bit modes into one pattern for the llwpcb
> and slwpcb insns.

Why do you want the HI mode version generated, ever (except for 16-bit code which
gcc doesn't emit)?  IMNSHO you don't want to ever use even the 32-bit one
for -m64 code.  I believe you want lwpintrin.h to provide one intrinsic,
not three, it takes a void * argument anyway in all 3 cases.  What would be
an intrinsic that just uses some lower bits from the pointer good for?
Why should a user care whether the pointer is 32-bit or 64-bit or 16-bit?

What could make some very limited sense is when that void * pointer is
initialized through say movl symbol, %edx use the 32-bit insn even for -m64
code to save one byte, but 1) I doubt it is worth writing the peepholes
2) you don't want to let the user make this decision, instead you want the
compiler to decide (if at all).  And 16-bit addresses aren't really useful
at all.

	Jakub
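
The point about an intrinsic that "just uses some lower bits from the
pointer" can be illustrated with a minimal C sketch.  It is not part of
any patch in this thread; the helper names are hypothetical, and the
only assumption is that a narrower operand sees just the low 16 or 32
bits of the address.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the patch): the part of ADDR that an
   llwpcb form with a WIDTH-bit operand would see.  WIDTH is 16, 32 or 64.  */
static uint64_t
lwpcb_visible_address (uint64_t addr, unsigned int width)
{
  if (width >= 64)
    return addr;
  return addr & ((UINT64_C (1) << width) - 1);
}

/* Nonzero if a WIDTH-bit operand still identifies the same LWPCB as ADDR.  */
static int
lwpcb_address_survives (uint64_t addr, unsigned int width)
{
  return lwpcb_visible_address (addr, width) == addr;
}
```

For an address such as 0x00007f12345678a0 only the 64-bit form keeps
the full address, which is why a single intrinsic taking void * and
using the Pmode-sized insn is enough.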

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-10 21:01         ` Jakub Jelinek
@ 2009-12-10 21:04           ` Sebastian Pop
  2009-12-10 21:52             ` Jakub Jelinek
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2009-12-10 21:04 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Henderson, gcc-patches, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 1035 bytes --]

On Thu, Dec 10, 2009 at 14:06, Jakub Jelinek <jakub@redhat.com> wrote:
> Why do you want the HI mode version generated, ever (except for 16-bit code which
> gcc doesn't emit)?  IMNSHO you don't want to ever use even the 32-bit one
> for -m64 code.  I believe you want lwpintrin.h to provide one intrinsic,
> not three, it takes a void * argument anyway in all 3 cases.  What would be
> an intrinsic that just uses some lower bits from the pointer good for?
> Why should a user care whether the pointer is 32-bit or 64-bit or 16-bit?
>
> What could make some very limited sense is when that void * pointer is
> initialized through say movl symbol, %edx use the 32-bit insn even for -m64
> code to save one byte, but 1) I doubt it is worth writing the peepholes
> 2) you don't want to let the user make this decision, instead you want the
> compiler to decide (if at all).  And 16-bit addresses aren't really useful
> at all.

Ok.  I simplified the lwpintrin.h file and the LWP insns patterns like this.

Sebastian

[-- Attachment #2: 0003-LWP-factor-insn-patterns.patch --]
[-- Type: text/x-patch, Size: 9010 bytes --]

From 51d25a33c341f310f67c04d27ef69e0036bcae80 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Thu, 10 Dec 2009 14:59:51 -0600
Subject: [PATCH] LWP factor insn patterns.

---
 gcc/config/i386/i386.c      |    4 --
 gcc/config/i386/i386.md     |   70 ++++---------------------------------------
 gcc/config/i386/lwpintrin.h |   46 ++++------------------------
 3 files changed, 13 insertions(+), 107 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f78cfa7..5002654 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21532,18 +21532,14 @@ static const struct builtin_description bdesc_special_args[] =
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstorepd256, "__builtin_ia32_maskstorepd256", IX86_BUILTIN_MASKSTOREPD256, UNKNOWN, (int) VOID_FTYPE_PV4DF_V4DF_V4DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstoreps256, "__builtin_ia32_maskstoreps256", IX86_BUILTIN_MASKSTOREPS256, UNKNOWN, (int) VOID_FTYPE_PV8SF_V8SF_V8SF },
 
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbhi1,   "__builtin_ia32_llwpcb16",   IX86_BUILTIN_LLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_PCVOID },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbsi1,   "__builtin_ia32_llwpcb32",   IX86_BUILTIN_LLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_PCVOID },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbdi1,   "__builtin_ia32_llwpcb64",   IX86_BUILTIN_LLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_PCVOID },
 
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbhi1,   "__builtin_ia32_slwpcb16",   IX86_BUILTIN_SLWPCB16,    UNKNOWN,     (int) PCVOID_FTYPE_VOID },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbsi1,   "__builtin_ia32_slwpcb32",   IX86_BUILTIN_SLWPCB32,    UNKNOWN,     (int) PCVOID_FTYPE_VOID },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbdi1,   "__builtin_ia32_slwpcb64",   IX86_BUILTIN_SLWPCB64,    UNKNOWN,     (int) PCVOID_FTYPE_VOID },
 
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalhi3,   "__builtin_ia32_lwpval16", IX86_BUILTIN_LWPVAL16,  UNKNOWN,     (int) VOID_FTYPE_USHORT_UINT_USHORT },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3,   "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL32,  UNKNOWN,     (int) VOID_FTYPE_UINT_UINT_UINT },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3,   "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT64_UINT_UINT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinshi3,   "__builtin_ia32_lwpins16", IX86_BUILTIN_LWPINS16,  UNKNOWN,     (int) UCHAR_FTYPE_USHORT_UINT_USHORT },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3,   "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS32,  UNKNOWN,     (int) UCHAR_FTYPE_UINT_UINT_UINT },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinsdi3,   "__builtin_ia32_lwpins64", IX86_BUILTIN_LWPINS64,  UNKNOWN,     (int) UCHAR_FTYPE_UINT64_UINT_UINT },
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 0db4cbb..a8cbafa 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -20835,14 +20835,6 @@
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
-(define_insn "lwp_llwpcbhi1"
-  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")]
-  	   	    UNSPECV_LLWP_INTRINSIC)]
-  "TARGET_LWP"
-  "llwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
-
 (define_insn "lwp_llwpcb<mode>1"
   [(unspec_volatile [(match_operand:P 0 "register_operand" "r")]
   	   	    UNSPECV_LLWP_INTRINSIC)]
@@ -20851,14 +20843,6 @@
   [(set_attr "type" "lwp")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lwp_slwpcbhi1"
-  [(unspec [(match_operand:HI 0 "register_operand" "=r")]
-  	   UNSPEC_SLWP_INTRINSIC)]
-  "TARGET_LWP"
-  "slwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
-
 (define_insn "lwp_slwpcb<mode>1"
   [(unspec [(match_operand:P 0 "register_operand" "=r")]
   	   UNSPEC_SLWP_INTRINSIC)]
@@ -20867,60 +20851,18 @@
   [(set_attr "type" "lwp")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "lwp_lwpvalhi3"
-  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
-  	   	     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:HI 2 "const_int_operand" "i")]
-  	   	    UNSPECV_LWPVAL_INTRINSIC)]
-  "TARGET_LWP"
-  "lwpval\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
-
-(define_insn "lwp_lwpvalsi3"
-  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
+(define_insn "lwp_lwpval<mode>3"
+  [(unspec_volatile [(match_operand:SWI48 0 "register_operand" "r")
     	    	     (match_operand:SI 1 "nonimmediate_operand" "rm")
 		     (match_operand:SI 2 "const_int_operand" "i")]
 		    UNSPECV_LWPVAL_INTRINSIC)]
   "TARGET_LWP"
   "lwpval\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
-
-(define_insn "lwp_lwpvaldi3"
-  [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
-  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:SI 2 "const_int_operand" "i")]
-		    UNSPECV_LWPVAL_INTRINSIC)]
-  "TARGET_LWP"
-  "lwpval\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
-
-(define_insn "lwp_lwpinshi3"
-  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
-  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:HI 2 "const_int_operand" "i")]
-		    UNSPECV_LWPINS_INTRINSIC)
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_LWP"
-  "lwpins\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
-
-(define_insn "lwp_lwpinssi3"
-  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
-  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:SI 2 "const_int_operand" "i")]
-		    UNSPECV_LWPINS_INTRINSIC)
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_LWP"
-  "lwpins\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
+   (set_attr "mode" "<MODE>")])
 
-(define_insn "lwp_lwpinsdi3"
-  [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
+(define_insn "lwp_lwpins<mode>3"
+  [(unspec_volatile [(match_operand:SWI48 0 "register_operand" "r")
   		     (match_operand:SI 1 "nonimmediate_operand" "rm")
 		     (match_operand:SI 2 "const_int_operand" "i")]
 		    UNSPECV_LWPINS_INTRINSIC)
@@ -20928,7 +20870,7 @@
   "TARGET_LWP"
   "lwpins\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "<MODE>")])
 
 (include "mmx.md")
 (include "sse.md")
diff --git a/gcc/config/i386/lwpintrin.h b/gcc/config/i386/lwpintrin.h
index 50ce2ff..7f3ea8f 100644
--- a/gcc/config/i386/lwpintrin.h
+++ b/gcc/config/i386/lwpintrin.h
@@ -33,50 +33,25 @@
 #else
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb16 (void const *pcbAddress)
+__llwpcb (void const *pcbAddress)
 {
-  __builtin_ia32_llwpcb16 (pcbAddress);
-}
-
-extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb32 (void const *pcbAddress)
-{
-  __builtin_ia32_llwpcb32 (pcbAddress);
-}
-
 #ifdef __x86_64__
-extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb64 (void const *pcbAddress)
-{
   __builtin_ia32_llwpcb64 (pcbAddress);
-}
+#else
+  __builtin_ia32_llwpcb32 (pcbAddress);
 #endif
-
-extern __inline void const * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__slwpcb16 (void)
-{
-  return __builtin_ia32_slwpcb16 ();
 }
 
 extern __inline void const * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__slwpcb32 (void)
+__slwpcb (void)
 {
-  return __builtin_ia32_slwpcb32 ();
-}
-
 #ifdef __x86_64__
-extern __inline void const * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__slwpcb64 (void)
-{
   return __builtin_ia32_slwpcb64 ();
-}
+#else
+  return __builtin_ia32_slwpcb32 ();
 #endif
-
-extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpval16 (unsigned short data2, unsigned int data1, unsigned short flags)
-{
-  __builtin_ia32_lwpval16 (data2, data1, flags);
 }
+
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __lwpval32 (unsigned int data2, unsigned int data1, unsigned int flags)
 {
@@ -91,12 +66,6 @@ __lwpval64 (unsigned long long data2, unsigned int data1, unsigned int flags)
 }
 #endif
 
-/*
-extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpins16 (unsigned short data2, unsigned int data1, unsigned short flags)
-{
-  return __builtin_ia32_lwpins16 (data2, data1, flags);
-}
 
 extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __lwpins32 (unsigned int data2, unsigned int data1, unsigned int flags)
@@ -111,7 +80,6 @@ __lwpins64 (unsigned long long data2, unsigned int data1, unsigned int flags)
   return __builtin_ia32_lwpins64 (data2, data1, flags);
 }
 #endif
-*/
 
 #endif /* __LWP__ */
 
-- 
1.6.0.4


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-10 21:04           ` Sebastian Pop
@ 2009-12-10 21:52             ` Jakub Jelinek
  2009-12-11 14:51               ` Jakub Jelinek
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2009-12-10 21:52 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Henderson, gcc-patches, Uros Bizjak

On Thu, Dec 10, 2009 at 03:00:54PM -0600, Sebastian Pop wrote:
> On Thu, Dec 10, 2009 at 14:06, Jakub Jelinek <jakub@redhat.com> wrote:
> > Why do you want the HI mode version generated, ever (except for 16-bit code which
> > gcc doesn't emit)?  IMNSHO you don't want to ever use even the 32-bit one
> > for -m64 code.  I believe you want lwpintrin.h to provide one intrinsic,
> > not three, it takes a void * argument anyway in all 3 cases.  What would be
> > an intrinsic that just uses some lower bits from the pointer good for?
> > Why should a user care whether the pointer is 32-bit or 64-bit or 16-bit?
> >
> > What could make some very limited sense is when that void * pointer is
> > initialized through say movl symbol, %edx use the 32-bit insn even for -m64
> > code to save one byte, but 1) I doubt it is worth writing the peepholes
> > 2) you don't want to let the user make this decision, instead you want the
> > compiler to decide (if at all).  And 16-bit addresses aren't really useful
> > at all.
> 
> Ok.  I simplified the lwpintrin.h file and the LWP insns patterns like this.

Looks better to me, though I'd say you don't even need to have separate *32
and *64 builtins, just have one * builtin, with an expander that will choose
to return gen_*32 resp. gen_*64 depending on TARGET_64BIT, and then :P insn.

	Jakub
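
The "one builtin plus an expander" idea can be modelled in plain C.
This is only a sketch under stated assumptions, not GCC code: the two
gen_* functions are hypothetical stand-ins for the per-mode generators
and simply return the name of the mode they would emit, and the
target_64bit parameter stands in for TARGET_64BIT.

```c
/* Hypothetical stand-ins for the per-mode generators; each returns the
   name of the insn mode it would emit.  */
static const char *
gen_slwpcb_si (void)
{
  return "SI";
}

static const char *
gen_slwpcb_di (void)
{
  return "DI";
}

/* Model of the expander: a single entry point that picks the generator
   from the target word size.  TARGET_64BIT stands in for the GCC macro
   of the same name.  */
static const char *
expand_slwpcb (int target_64bit)
{
  return target_64bit ? gen_slwpcb_di () : gen_slwpcb_si ();
}
```

The user then only ever sees one builtin; which insn mode is used is
decided from the target, not by the caller.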

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-10 21:52             ` Jakub Jelinek
@ 2009-12-11 14:51               ` Jakub Jelinek
  2009-12-11 16:54                 ` Richard Henderson
  2009-12-11 21:00                 ` Sebastian Pop
  0 siblings, 2 replies; 32+ messages in thread
From: Jakub Jelinek @ 2009-12-11 14:51 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Henderson, gcc-patches, Uros Bizjak

On Thu, Dec 10, 2009 at 04:09:45PM -0500, Jakub Jelinek wrote:
> On Thu, Dec 10, 2009 at 03:00:54PM -0600, Sebastian Pop wrote:
> > On Thu, Dec 10, 2009 at 14:06, Jakub Jelinek <jakub@redhat.com> wrote:
> > > Why do you want the HI mode version generated, ever (except for 16-bit code which
> > > gcc doesn't emit)?  IMNSHO you don't want to ever use even the 32-bit one
> > > for -m64 code.  I believe you want lwpintrin.h to provide one intrinsic,
> > > not three, it takes a void * argument anyway in all 3 cases.  What would be
> > > an intrinsic that just uses some lower bits from the pointer good for?
> > > Why should a user care whether the pointer is 32-bit or 64-bit or 16-bit?
> > >
> > > What could make some very limited sense is when that void * pointer is
> > > initialized through say movl symbol, %edx use the 32-bit insn even for -m64
> > > code to save one byte, but 1) I doubt it is worth writing the peepholes
> > > 2) you don't want to let the user make this decision, instead you want the
> > > compiler to decide (if at all).  And 16-bit addresses aren't really useful
> > > at all.
> > 
> > Ok.  I simplified the lwpintrin.h file and the LWP insns patterns like this.
> 
> Looks better to me, though I'd say you don't even need to have separate *32
> and *64 builtins, just have one * builtin, with an expander that will choose
> to return gen_*32 resp. gen_*64 depending on TARGET_64BIT, and then :P insn.

Here is roughly what I meant (your 3 patches combined, adjusted by me).

The "length" attribute is wrong for lwpval/lwpins; I didn't have time to
write it properly (but without a definition gcc ICEs, because the new lwp
type isn't handled in many default attribute definitions).  As these insns
can have memory operands, it won't be very simple though.  Perhaps other
attributes should still handle the lwp case too.
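
Why a constant length is not enough can be sketched in C.  The layout
used here is an assumption about the encoding, not something stated in
this thread: a 3-byte XOP prefix, one opcode byte, one ModRM byte and a
4-byte immediate, which gives the 9 bytes used for the register form,
plus an optional SIB byte and a 0-, 1- or 4-byte displacement for a
memory operand.  The function and its parameters are hypothetical, and
other prefixes are not modelled.

```c
/* Hypothetical model, not GCC code: byte length of lwpval/lwpins.
   Assumed layout: 3-byte XOP prefix + opcode + ModRM + imm32, plus an
   optional SIB byte and a displacement of 0, 1 or 4 bytes when operand 1
   is in memory.  */
static int
lwp_insn_length (int has_sib, int disp_bytes)
{
  const int xop_prefix = 3, opcode = 1, modrm = 1, imm32 = 4;

  return xop_prefix + opcode + modrm + imm32
	 + (has_sib ? 1 : 0) + disp_bytes;
}
```

Under these assumptions the register form is 9 bytes, while a memory
operand with a SIB byte and a 32-bit displacement would be 14.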

Tested with make check RUNTESTFLAGS=i386.exp=sse[0-9].c and by eyeballing
the output (and RTL dumps) of -mlwp -O{0,2} -m{32,64} on this testcase:

#include <x86intrin.h>

void const *q;

int foo (void *p, unsigned int data1, unsigned int data2,
	 unsigned long long data3)
{
  int i = 0;
  __llwpcb (p);
  q = __slwpcb ();
  __lwpval32 (data1, data2, 16);
  i |= __lwpins32 (data1, data2, 16);
#ifdef __x86_64__
  __lwpval64 (data1, data2, 16);
  __lwpval64 (data3, data2, 16);
  i |= __lwpins64 (data1, data2, 16);
  i |= __lwpins64 (data3, data2, 16);
#endif
  return i;
}

Of course, I'm not an i386 BE maintainer, so Uros/Richard/Honza might have
different opinions.

--- gcc/config/i386/i386-builtin-types.def.jj	2009-12-02 08:56:00.000000000 +0100
+++ gcc/config/i386/i386-builtin-types.def	2009-12-11 12:31:17.000000000 +0100
@@ -195,6 +195,7 @@ DEF_FUNCTION_TYPE (V8SF, V8SI)
 DEF_FUNCTION_TYPE (V8SI, V4SI)
 DEF_FUNCTION_TYPE (V8SI, V8SF)
 DEF_FUNCTION_TYPE (VOID, PCVOID)
+DEF_FUNCTION_TYPE (PCVOID, VOID)
 DEF_FUNCTION_TYPE (VOID, UNSIGNED)
 
 DEF_FUNCTION_TYPE (DI, V2DI, INT)
--- gcc/config/i386/i386.md.jj	2009-12-07 23:13:50.000000000 +0100
+++ gcc/config/i386/i386.md	2009-12-11 15:38:16.000000000 +0100
@@ -204,10 +204,6 @@ (define_constants
    (UNSPEC_XOP_TRUEFALSE	152)
    (UNSPEC_XOP_PERMUTE		153)
    (UNSPEC_FRCZ			154)
-   (UNSPEC_LLWP_INTRINSIC	155)
-   (UNSPEC_SLWP_INTRINSIC	156)
-   (UNSPECV_LWPVAL_INTRINSIC	157)
-   (UNSPECV_LWPINS_INTRINSIC	158)
 
    ; For AES support
    (UNSPEC_AESENC		159)
@@ -251,7 +247,11 @@ (define_constants
    (UNSPECV_RDTSC		18)
    (UNSPECV_RDTSCP		19)
    (UNSPECV_RDPMC		20)
-   (UNSPECV_VSWAPMOV	21)
+   (UNSPECV_VSWAPMOV		21)
+   (UNSPECV_LLWP_INTRINSIC	22)
+   (UNSPECV_SLWP_INTRINSIC	23)
+   (UNSPECV_LWPVAL_INTRINSIC	24)
+   (UNSPECV_LWPINS_INTRINSIC	25)
   ])
 
 ;; Constants to represent pcomtrue/pcomfalse variants
@@ -578,7 +578,7 @@ (define_attr "length" ""
 ;; if the instruction is complex.
 
 (define_attr "memory" "none,load,store,both,unknown"
-  (cond [(eq_attr "type" "other,multi,str")
+  (cond [(eq_attr "type" "other,multi,str,lwp")
 	   (const_string "unknown")
 	 (eq_attr "type" "lea,fcmov,fpspc")
 	   (const_string "none")
@@ -20835,113 +20835,84 @@ (define_insn "*rdtscp_rex64"
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
-(define_insn "lwp_llwpcbhi1"
-  [(unspec [(match_operand:HI 0 "register_operand" "r")]
-  	   UNSPEC_LLWP_INTRINSIC)]
+(define_expand "lwp_llwpcb"
+  [(unspec_volatile [(match_operand 0 "register_operand" "r")]
+		    UNSPECV_LLWP_INTRINSIC)]
   "TARGET_LWP"
-  "llwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
-
-(define_insn "lwp_llwpcbsi1"
-  [(unspec [(match_operand:SI 0 "register_operand" "r")]
-  	   UNSPEC_LLWP_INTRINSIC)]
-  "TARGET_LWP"
-  "llwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
+  "")
 
-(define_insn "lwp_llwpcbdi1"
-  [(unspec [(match_operand:DI 0 "register_operand" "r")]
-  	   UNSPEC_LLWP_INTRINSIC)]
+(define_insn "*lwp_llwpcb<mode>1"
+  [(unspec_volatile [(match_operand:P 0 "register_operand" "r")]
+		    UNSPECV_LLWP_INTRINSIC)]
   "TARGET_LWP"
   "llwpcb\t%0"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
-
-(define_insn "lwp_slwpcbhi1"
-  [(unspec [(match_operand:HI 0 "register_operand" "r")]
-  	   UNSPEC_SLWP_INTRINSIC)]
-  "TARGET_LWP"
-  "slwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
+   (set_attr "mode" "<MODE>")
+   (set_attr "length" "5")])
 
-(define_insn "lwp_slwpcbsi1"
-  [(unspec [(match_operand:SI 0 "register_operand" "r")]
-  	   UNSPEC_SLWP_INTRINSIC)]
+(define_expand "lwp_slwpcb"
+  [(set (match_operand 0 "register_operand" "=r")
+	(unspec_volatile [(const_int 0)] UNSPECV_SLWP_INTRINSIC))]
   "TARGET_LWP"
-  "slwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
-
-(define_insn "lwp_slwpcbdi1"
-  [(unspec [(match_operand:DI 0 "register_operand" "r")]
-  	   UNSPEC_SLWP_INTRINSIC)]
+  {
+    if (TARGET_64BIT)
+      emit_insn (gen_lwp_slwpcbdi (operands[0]));
+    else
+      emit_insn (gen_lwp_slwpcbsi (operands[0]));
+    DONE;
+  })
+
+(define_insn "lwp_slwpcb<mode>"
+  [(set (match_operand:P 0 "register_operand" "=r")
+	(unspec_volatile:P [(const_int 0)] UNSPECV_SLWP_INTRINSIC))]
   "TARGET_LWP"
   "slwpcb\t%0"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
-
-(define_insn "lwp_lwpvalhi3"
-  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
-  	   	     (match_operand:SI 1 "nonimmediate_operand" "rm")
-	   	     (match_operand:HI 2 "const_int_operand" "")]
-  	   	    UNSPECV_LWPVAL_INTRINSIC)]
-  "TARGET_LWP"
-  "lwpval\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
+   (set_attr "mode" "<MODE>")
+   (set_attr "length" "5")])
 
-(define_insn "lwp_lwpvalsi3"
-  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
-    	    	     (match_operand:SI 1 "nonimmediate_operand" "rm")
-	    	     (match_operand:SI 2 "const_int_operand" "")]
+(define_expand "lwp_lwpval<mode>3"
+  [(unspec_volatile [(match_operand:SWI48 1 "register_operand" "r")
+    	    	     (match_operand:SI 2 "nonimmediate_operand" "rm")
+		     (match_operand:SI 3 "const_int_operand" "i")]
 		    UNSPECV_LWPVAL_INTRINSIC)]
   "TARGET_LWP"
-  "lwpval\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
+  "/* Avoid unused variable warning.  */
+   (void) operand0;")
 
-(define_insn "lwp_lwpvaldi3"
-  [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
-  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:SI 2 "const_int_operand" "")]
+(define_insn "*lwp_lwpval<mode>3_1"
+  [(unspec_volatile [(match_operand:SWI48 0 "register_operand" "r")
+    	    	     (match_operand:SI 1 "nonimmediate_operand" "rm")
+		     (match_operand:SI 2 "const_int_operand" "i")]
 		    UNSPECV_LWPVAL_INTRINSIC)]
   "TARGET_LWP"
   "lwpval\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
-
-(define_insn "lwp_lwpinshi3"
-  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
-  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:HI 2 "const_int_operand" "")]
-		    UNSPECV_LWPINS_INTRINSIC)]
-  "TARGET_LWP"
-  "lwpins\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
+   (set_attr "mode" "<MODE>")
+   (set_attr "length" "9")])
 
-(define_insn "lwp_lwpinssi3"
-  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
-  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:SI 2 "const_int_operand" "")]
-		    UNSPECV_LWPINS_INTRINSIC)]
+(define_expand "lwp_lwpins<mode>3"
+  [(set (reg:CCC FLAGS_REG)
+	(unspec_volatile:CCC [(match_operand:SWI48 1 "register_operand" "r")
+			      (match_operand:SI 2 "nonimmediate_operand" "rm")
+			      (match_operand:SI 3 "const_int_operand" "i")]
+			     UNSPECV_LWPINS_INTRINSIC))
+   (set (match_operand:QI 0 "nonimmediate_operand" "=qm")
+	(eq:QI (reg:CCC FLAGS_REG) (const_int 0)))]
   "TARGET_LWP"
-  "lwpins\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
+  "")
 
-(define_insn "lwp_lwpinsdi3"
-  [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
-  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:SI 2 "const_int_operand" "")]
-		    UNSPECV_LWPINS_INTRINSIC)]
+(define_insn "*lwp_lwpins<mode>3_1"
+  [(set (reg:CCC FLAGS_REG)
+	(unspec_volatile:CCC [(match_operand:SWI48 0 "register_operand" "r")
+			      (match_operand:SI 1 "nonimmediate_operand" "rm")
+			      (match_operand:SI 2 "const_int_operand" "i")]
+			     UNSPECV_LWPINS_INTRINSIC))]
   "TARGET_LWP"
   "lwpins\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "<MODE>")
+   (set_attr "length" "9")])
 
 (include "mmx.md")
 (include "sse.md")
--- gcc/config/i386/i386.c.jj	2009-12-07 23:32:53.000000000 +0100
+++ gcc/config/i386/i386.c	2009-12-11 15:05:25.000000000 +0100
@@ -21251,16 +21251,10 @@ enum ix86_builtins
   IX86_BUILTIN_VPCOMTRUEQ,
 
   /* LWP instructions.  */
-  IX86_BUILTIN_LLWPCB16,
-  IX86_BUILTIN_LLWPCB32,
-  IX86_BUILTIN_LLWPCB64,
-  IX86_BUILTIN_SLWPCB16,
-  IX86_BUILTIN_SLWPCB32,
-  IX86_BUILTIN_SLWPCB64,
-  IX86_BUILTIN_LWPVAL16,
+  IX86_BUILTIN_LLWPCB,
+  IX86_BUILTIN_SLWPCB,
   IX86_BUILTIN_LWPVAL32,
   IX86_BUILTIN_LWPVAL64,
-  IX86_BUILTIN_LWPINS16,
   IX86_BUILTIN_LWPINS32,
   IX86_BUILTIN_LWPINS64,
 
@@ -21532,20 +21526,12 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstorepd256, "__builtin_ia32_maskstorepd256", IX86_BUILTIN_MASKSTOREPD256, UNKNOWN, (int) VOID_FTYPE_PV4DF_V4DF_V4DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstoreps256, "__builtin_ia32_maskstoreps256", IX86_BUILTIN_MASKSTOREPS256, UNKNOWN, (int) VOID_FTYPE_PV8SF_V8SF_V8SF },
 
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbhi1,   "__builtin_ia32_llwpcb16",   IX86_BUILTIN_LLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbsi1,   "__builtin_ia32_llwpcb32",   IX86_BUILTIN_LLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbdi1,   "__builtin_ia32_llwpcb64",   IX86_BUILTIN_LLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbhi1,   "__builtin_ia32_slwpcb16",   IX86_BUILTIN_SLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbsi1,   "__builtin_ia32_slwpcb32",   IX86_BUILTIN_SLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbdi1,   "__builtin_ia32_slwpcb64",   IX86_BUILTIN_SLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalhi3,   "__builtin_ia32_lwpval16", IX86_BUILTIN_LWPVAL16,  UNKNOWN,     (int) VOID_FTYPE_USHORT_UINT_USHORT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3,   "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT_UINT_UINT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3,   "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT64_UINT_UINT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinshi3,   "__builtin_ia32_lwpins16", IX86_BUILTIN_LWPINS16,  UNKNOWN,     (int) UCHAR_FTYPE_USHORT_UINT_USHORT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3,   "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS64,  UNKNOWN,     (int) UCHAR_FTYPE_UINT_UINT_UINT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinsdi3,   "__builtin_ia32_lwpins64", IX86_BUILTIN_LWPINS64,  UNKNOWN,     (int) UCHAR_FTYPE_UINT64_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcb,   "__builtin_ia32_llwpcb", IX86_BUILTIN_LLWPCB, UNKNOWN, (int) VOID_FTYPE_PCVOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcb,   "__builtin_ia32_slwpcb",   IX86_BUILTIN_SLWPCB, UNKNOWN, (int) PCVOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3, "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL32, UNKNOWN, (int) VOID_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3, "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64, UNKNOWN, (int) VOID_FTYPE_UINT64_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3, "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS32, UNKNOWN, (int) UCHAR_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinsdi3, "__builtin_ia32_lwpins64", IX86_BUILTIN_LWPINS64, UNKNOWN, (int) UCHAR_FTYPE_UINT64_UINT_UINT },
 
 };
 
@@ -23734,7 +23721,7 @@ ix86_expand_special_args_builtin (const 
     {
       rtx op;
       enum machine_mode mode;
-    } args[2];
+    } args[3];
   enum insn_code icode = d->icode;
   bool last_arg_constant = false;
   const struct insn_data *insn_p = &insn_data[icode];
@@ -23761,6 +23748,7 @@ ix86_expand_special_args_builtin (const 
     case V4DF_FTYPE_PCV2DF:
     case V4DF_FTYPE_PCDOUBLE:
     case V2DF_FTYPE_PCDOUBLE:
+    case VOID_FTYPE_PCVOID:
       nargs = 1;
       klass = load;
       memory = 0;
@@ -23804,15 +23792,14 @@ ix86_expand_special_args_builtin (const 
       /* Reserve memory operand for target.  */
       memory = ARRAY_SIZE (args);
       break;
-    case VOID_FTYPE_USHORT_UINT_USHORT:
     case VOID_FTYPE_UINT_UINT_UINT:
     case VOID_FTYPE_UINT64_UINT_UINT:
-    case UCHAR_FTYPE_USHORT_UINT_USHORT:
     case UCHAR_FTYPE_UINT_UINT_UINT:
     case UCHAR_FTYPE_UINT64_UINT_UINT:
       nargs = 3;
-      klass = store;
-      memory = 0;
+      klass = load;
+      memory = ARRAY_SIZE (args);
+      last_arg_constant = true;
       break;
     default:
       gcc_unreachable ();
@@ -23852,7 +23839,14 @@ ix86_expand_special_args_builtin (const 
 	  if (!match)
 	    switch (icode)
 	      {
-	     default:
+	      case CODE_FOR_lwp_lwpvalsi3:
+	      case CODE_FOR_lwp_lwpvaldi3:
+	      case CODE_FOR_lwp_lwpinssi3:
+	      case CODE_FOR_lwp_lwpinsdi3:
+		error ("the last argument must be a 32-bit immediate");
+		return const0_rtx;
+
+	      default:
 		error ("the last argument must be an 8-bit immediate");
 		return const0_rtx;
 	      }
@@ -23893,6 +23887,9 @@ ix86_expand_special_args_builtin (const 
     case 2:
       pat = GEN_FCN (icode) (target, args[0].op, args[1].op);
       break;
+    case 3:
+      pat = GEN_FCN (icode) (target, args[0].op, args[1].op, args[2].op);
+      break;
     default:
       gcc_unreachable ();
     }
@@ -24205,6 +24202,23 @@ ix86_expand_builtin (tree exp, rtx targe
 	return target;
       }
 
+    case IX86_BUILTIN_LLWPCB:
+      arg0 = CALL_EXPR_ARG (exp, 0);
+      op0 = expand_normal (arg0);
+      icode = CODE_FOR_lwp_llwpcb;
+      if (! (*insn_data[icode].operand[0].predicate) (op0, Pmode))
+	op0 = copy_to_mode_reg (Pmode, op0);
+      emit_insn (gen_lwp_llwpcb (op0));
+      return 0;
+
+    case IX86_BUILTIN_SLWPCB:
+      icode = CODE_FOR_lwp_slwpcb;
+      if (!target
+	  || ! (*insn_data[icode].operand[0].predicate) (target, Pmode))
+	target = gen_reg_rtx (Pmode);
+      emit_insn (gen_lwp_slwpcb (target));
+      return target;
+
     default:
       break;
     }
--- gcc/config/i386/lwpintrin.h.jj	2009-11-05 16:22:25.000000000 +0100
+++ gcc/config/i386/lwpintrin.h	2009-12-11 15:23:56.000000000 +0100
@@ -33,77 +33,68 @@
 #else
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb16 (void *pcbAddress)
+__llwpcb (void const *pcbAddress)
 {
-  __builtin_ia32_llwpcb16 (pcbAddress);
+  __builtin_ia32_llwpcb (pcbAddress);
 }
 
-extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb32 (void *pcbAddress)
+extern __inline void const * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__slwpcb (void)
 {
-  __builtin_ia32_llwpcb32 (pcbAddress);
+  return __builtin_ia32_slwpcb ();
 }
 
-extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb64 (void *pcbAddress)
-{
-  __builtin_ia32_llwpcb64 (pcbAddress);
-}
-
-extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__slwpcb16 (void)
-{
-  return __builtin_ia32_slwpcb16 ();
-}
-
-extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__slwpcb32 (void)
-{
-  return __builtin_ia32_slwpcb32 ();
-}
-
-extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__slwpcb64 (void)
-{
-  return __builtin_ia32_slwpcb64 ();
-}
-
-extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpval16 (unsigned short data2, unsigned int data1, unsigned short flags)
-{
-  __builtin_ia32_lwpval16 (data2, data1, flags);
-}
-/*
+#ifdef __OPTIMIZE__
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __lwpval32 (unsigned int data2, unsigned int data1, unsigned int flags)
 {
   __builtin_ia32_lwpval32 (data2, data1, flags);
 }
 
+#ifdef __x86_64__
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpval64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+__lwpval64 (unsigned long long data2, unsigned int data1, unsigned int flags)
 {
   __builtin_ia32_lwpval64 (data2, data1, flags);
 }
+#endif
+#else
+#define __lwpval32(D2, D1, F) \
+  (__builtin_ia32_lwpval32 ((unsigned int) (D2), (unsigned int) (D1), \
+			    (unsigned int) (F)))
+#ifdef __x86_64__
+#define __lwpval64(D2, D1, F) \
+  (__builtin_ia32_lwpval64 ((unsigned long long) (D2), (unsigned int) (D1), \
+			    (unsigned int) (F)))
+#endif
+#endif
 
-extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpins16 (unsigned short data2, unsigned int data1, unsigned short flags)
-{
-  return __builtin_ia32_lwpins16 (data2, data1, flags);
-}
 
+#ifdef __OPTIMIZE__
 extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __lwpins32 (unsigned int data2, unsigned int data1, unsigned int flags)
 {
   return __builtin_ia32_lwpins32 (data2, data1, flags);
 }
 
+#ifdef __x86_64__
 extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpins64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+__lwpins64 (unsigned long long data2, unsigned int data1, unsigned int flags)
 {
   return __builtin_ia32_lwpins64 (data2, data1, flags);
 }
-*/
+#endif
+#else
+#define __lwpins32(D2, D1, F) \
+  (__builtin_ia32_lwpins32 ((unsigned int) (D2), (unsigned int) (D1), \
+			    (unsigned int) (F)))
+#ifdef __x86_64__
+#define __lwpins64(D2, D1, F) \
+  (__builtin_ia32_lwpins64 ((unsigned long long) (D2), (unsigned int) (D1), \
+			    (unsigned int) (F)))
+#endif
+#endif
+
 #endif /* __LWP__ */
 
 #endif /* _LWPINTRIN_H_INCLUDED */
--- gcc/testsuite/gcc.target/i386/sse-22.c.jj	2009-11-04 18:36:12.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/sse-22.c	2009-12-11 15:31:58.000000000 +0100
@@ -4,10 +4,12 @@
 
 #include <mm_malloc.h>
 
-/* Test that the intrinsics compile without optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h and mm3dnow.h
-   that reference the proper builtin functions.  Defining away "extern" and
-   "__inline" results in all of them being compiled as proper functions.  */
+/* Test that the intrinsics compile without optimization.  All of them
+   are defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h,
+   xopintrin.h, lwpintrin.h, popcntintrin.h and mm3dnow.h that
+   reference the proper builtin functions.  Defining away "extern" and
+   "__inline" results in all of them being compiled as proper
+   functions.  */
 
 #define extern
 #define __inline
@@ -37,7 +39,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("mmx,3dnow,sse,sse2,sse3,ssse3,sse4.1,sse4.2,sse4a,aes,pclmul,xop")
+#pragma GCC target ("mmx,3dnow,sse,sse2,sse3,ssse3,sse4.1,sse4.2,sse4a,aes,pclmul,xop,popcnt,abm,lwp")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
@@ -162,10 +164,18 @@ test_2 (_mm_round_ss, __m128, __m128, __
 
 /* xopintrin.h (XOP). */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("xop")
+#pragma GCC target ("xop,lwp")
 #endif
 #include <x86intrin.h>
 test_1 ( _mm_roti_epi8, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi16, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi32, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi64, __m128i, __m128i, 1)
+
+/* lwpintrin.h (LWP). */
+test_2 ( __lwpval32, void, unsigned int, unsigned int, 1)
+test_2 ( __lwpins32, unsigned char, unsigned int, unsigned int, 1)
+#ifdef __x86_64__
+test_2 ( __lwpval64, void, unsigned long long, unsigned int, 1)
+test_2 ( __lwpins64, unsigned char, unsigned long long, unsigned int, 1)
+#endif
--- gcc/testsuite/gcc.target/i386/sse-13.c.jj	2009-12-07 23:32:49.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/sse-13.c	2009-12-11 15:27:36.000000000 +0100
@@ -1,13 +1,14 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mabm" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile with optimization.  All of them
    are defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h,
-   xopintrin.h, abmintrin.h and mm3dnow.h that reference the proper
-   builtin functions.  Defining away "extern" and "__inline" results
-   in all of them being compiled as proper functions.  */
+   xopintrin.h, abmintrin.h, lwpintrin.h, popcntintrin.h and mm3dnow.h
+   that reference the proper builtin functions.  Defining away
+   "extern" and "__inline" results in all of them being compiled as
+   proper functions.  */
 
 #define extern
 #define __inline
@@ -127,9 +128,15 @@
 #define __builtin_ia32_shufps(A, B, N) __builtin_ia32_shufps(A, B, 0)
 
 /* xopintrin.h */
-#define  __builtin_ia32_vprotbi(A, N) __builtin_ia32_vprotbi (A,1)
-#define  __builtin_ia32_vprotwi(A, N) __builtin_ia32_vprotwi (A,1)
-#define  __builtin_ia32_vprotdi(A, N) __builtin_ia32_vprotdi (A,1)
-#define  __builtin_ia32_vprotqi(A, N) __builtin_ia32_vprotqi (A,1)
+#define __builtin_ia32_vprotbi(A, N) __builtin_ia32_vprotbi (A,1)
+#define __builtin_ia32_vprotwi(A, N) __builtin_ia32_vprotwi (A,1)
+#define __builtin_ia32_vprotdi(A, N) __builtin_ia32_vprotdi (A,1)
+#define __builtin_ia32_vprotqi(A, N) __builtin_ia32_vprotqi (A,1)
+
+/* lwpintrin.h */
+#define __builtin_ia32_lwpval32(D2, D1, F) __builtin_ia32_lwpval32 (D2, D1, 1)
+#define __builtin_ia32_lwpval64(D2, D1, F) __builtin_ia32_lwpval64 (D2, D1, 1)
+#define __builtin_ia32_lwpins32(D2, D1, F) __builtin_ia32_lwpins32 (D2, D1, 1)
+#define __builtin_ia32_lwpins64(D2, D1, F) __builtin_ia32_lwpins64 (D2, D1, 1)
 
 #include <x86intrin.h>
--- gcc/testsuite/gcc.target/i386/sse-23.c.jj	2009-11-04 18:36:12.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/sse-23.c	2009-12-11 12:27:37.000000000 +0100
@@ -3,10 +3,12 @@
 
 #include <mm_malloc.h>
 
-/* Test that the intrinsics compile with optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h and mm3dnow.h
-   that reference the proper builtin functions.  Defining away "extern" and
-   "__inline" results in all of them being compiled as proper functions.  */
+/* Test that the intrinsics compile with optimization.  All of them
+   are defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h,
+   xopintrin.h, lwpintrin.h, popcntintrin.h and mm3dnow.h that
+   reference the proper builtin functions.  Defining away "extern" and
+   "__inline" results in all of them being compiled as proper
+   functions.  */
 
 #define extern
 #define __inline
@@ -99,7 +101,8 @@
 #define __builtin_ia32_vprotdi(A, B) __builtin_ia32_vprotdi(A,1)
 #define __builtin_ia32_vprotqi(A, B) __builtin_ia32_vprotqi(A,1)
 
-#pragma GCC target ("3dnow,sse4,sse4a,aes,pclmul,xop")
+#pragma GCC target ("3dnow,sse4,sse4a,aes,pclmul,xop,abm,popcnt,lwp")
 #include <wmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
+#include <x86intrin.h>
--- gcc/testsuite/gcc.target/i386/sse-14.c.jj	2009-11-04 18:36:12.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/sse-14.c	2009-12-11 15:30:26.000000000 +0100
@@ -1,12 +1,13 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -msse4a -maes -mpclmul" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -msse4a -maes -mpclmul -mpopcnt -mabm -mlwp" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile without optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h  and mm3dnow.h
-   that reference the proper builtin functions.  Defining away "extern" and
-   "__inline" results in all of them being compiled as proper functions.  */
+   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h,
+   lwpintrin.h and mm3dnow.h that reference the proper builtin functions.
+   Defining away "extern" and "__inline" results in all of them being compiled
+   as proper functions.  */
 
 #define extern
 #define __inline
@@ -162,3 +163,10 @@ test_1 ( _mm_roti_epi16, __m128i, __m128
 test_1 ( _mm_roti_epi32, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi64, __m128i, __m128i, 1)
 
+/* lwpintrin.h */
+test_2 ( __lwpval32, void, unsigned int, unsigned int, 1)
+test_2 ( __lwpins32, unsigned char, unsigned int, unsigned int, 1)
+#ifdef __x86_64__
+test_2 ( __lwpval64, void, unsigned long long, unsigned int, 1)
+test_2 ( __lwpins64, unsigned char, unsigned long long, unsigned int, 1)
+#endif
--- gcc/testsuite/gcc.target/i386/sse-12.c.jj	2009-12-07 23:32:49.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/sse-12.c	2009-12-11 12:27:37.000000000 +0100
@@ -1,8 +1,8 @@
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, xopintrin.h, mm3dnow.h,
-   abmintrin.h and mm_malloc.h are usable with -O -std=c89
-   -pedantic-errors.  */
+   abmintrin.h, lwpintrin.h, popcntintrin.h and mm_malloc.h are usable
+   with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -m3dnow -mavx -mfma4 -mxop -maes -mpclmul -mabm" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -m3dnow -mavx -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */
 
 #include <x86intrin.h>
 


	Jakub
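
For readers following the lwpintrin.h hunk: the #ifdef __OPTIMIZE__ split
is the usual pattern for intrinsics whose last argument has to be an
immediate.  With optimization, the always-inline wrapper lets the constant
reach the builtin; without it, the argument would arrive as an ordinary
variable, so a macro is used instead.  A minimal sketch in plain C, where
fake_lwp_builtin and my_lwpval32 are hypothetical stand-ins (no LWP hardware
or -mlwp needed):

```c
/* Hypothetical stand-in for a builtin such as __builtin_ia32_lwpval32.
   The real builtin emits an instruction; this one just combines its
   arguments so that the result can be inspected.  */
static unsigned int
fake_lwp_builtin (unsigned int data2, unsigned int data1, unsigned int flags)
{
  return (data2 ^ data1) + flags;
}

#ifdef __OPTIMIZE__
/* Optimizing build: an inline wrapper gives real parameter types, and
   inlining lets a constant FLAGS argument reach the builtin.  */
static inline unsigned int
my_lwpval32 (unsigned int data2, unsigned int data1, unsigned int flags)
{
  return fake_lwp_builtin (data2, data1, flags);
}
#else
/* Non-optimizing build: a macro passes the caller's expression through
   textually, so a literal stays a literal at the builtin call.  */
#define my_lwpval32(D2, D1, F) \
  (fake_lwp_builtin ((unsigned int) (D2), (unsigned int) (D1), \
		     (unsigned int) (F)))
#endif
```

Both variants are called the same way, e.g. my_lwpval32 (6u, 3u, 1u), which
is why sse-13.c (-O2) and sse-14.c (-O0) can exercise the same names.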

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-11 14:51               ` Jakub Jelinek
@ 2009-12-11 16:54                 ` Richard Henderson
  2009-12-11 21:00                 ` Sebastian Pop
  1 sibling, 0 replies; 32+ messages in thread
From: Richard Henderson @ 2009-12-11 16:54 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Sebastian Pop, gcc-patches, Uros Bizjak

On 12/11/2009 06:50 AM, Jakub Jelinek wrote:
> Here is roughly what I meant (your 3 patches combined, adjusted by me).

This looks much better.


r~

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-11 14:51               ` Jakub Jelinek
  2009-12-11 16:54                 ` Richard Henderson
@ 2009-12-11 21:00                 ` Sebastian Pop
  2009-12-11 21:43                   ` Jakub Jelinek
  1 sibling, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2009-12-11 21:00 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Henderson, gcc-patches, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 459 bytes --]

On Fri, Dec 11, 2009 at 08:50, Jakub Jelinek <jakub@redhat.com> wrote:
> Here is roughly what I meant (your 3 patches combined, adjusted by me).

Thanks for the patch.

I have discussed with other people internally and they highly
recommended that we do not define the intrinsics for llwpcb and slwpcb
as "void const *", but just "void *" as the address of the PCB is
susceptible to be changed by the hardware.  See attached patch on top
of yours.
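
To illustrate what the pointer type means for callers, here is a small
sketch.  It assumes a made-up control block layout (struct fake_lwpcb) and
stand-in functions (fake_slwpcb, fake_slwpcb_const); none of these are the
real intrinsics or the real LWPCB layout.

```c
/* Hypothetical stand-in for the LWP control block; the real layout is
   defined by the AMD LWP specification and is not reproduced here.  */
struct fake_lwpcb
{
  unsigned int flags;
  unsigned int buffer_size;
};

static struct fake_lwpcb current_pcb;

/* Stand-in for __slwpcb with the non-const return type.  */
static void *
fake_slwpcb (void)
{
  return &current_pcb;
}

/* Stand-in for the earlier "void const *" variant.  */
static void const *
fake_slwpcb_const (void)
{
  return &current_pcb;
}

/* With "void *", the block can be updated through the returned pointer.  */
static unsigned int
set_flags (unsigned int flags)
{
  struct fake_lwpcb *pcb = fake_slwpcb ();
  pcb->flags = flags;
  return current_pcb.flags;
}

/* With "void const *", the same update needs a cast that drops const.  */
static unsigned int
set_flags_via_const (unsigned int flags)
{
  struct fake_lwpcb *pcb = (struct fake_lwpcb *) fake_slwpcb_const ();
  pcb->flags = flags;
  return current_pcb.flags;
}
```

A const-qualified pointee also suggests to the reader that the block is
never modified, which does not match a structure the hardware may update.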

Sebastian

[-- Attachment #2: 0002-Make-void-pointers-non-const.patch --]
[-- Type: text/x-patch, Size: 3429 bytes --]

From e0ae807cf025d425faecd8bc8c789b304fc76648 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Fri, 11 Dec 2009 14:50:22 -0600
Subject: [PATCH] Make void pointers non-const.

---
 gcc/config/i386/i386-builtin-types.def |    4 +++-
 gcc/config/i386/i386.c                 |    4 ++--
 gcc/config/i386/lwpintrin.h            |    4 ++--
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index f0d25f3..820f854 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -104,6 +104,7 @@ DEF_POINTER_TYPE (PCDOUBLE, DOUBLE, CONST)
 DEF_POINTER_TYPE (PCFLOAT, FLOAT, CONST)
 DEF_POINTER_TYPE (PCHAR, CHAR)
 DEF_POINTER_TYPE (PCVOID, VOID, CONST)
+DEF_POINTER_TYPE (PVOID, VOID)
 DEF_POINTER_TYPE (PDOUBLE, DOUBLE)
 DEF_POINTER_TYPE (PFLOAT, FLOAT)
 DEF_POINTER_TYPE (PINT, INT)
@@ -195,7 +196,8 @@ DEF_FUNCTION_TYPE (V8SF, V8SI)
 DEF_FUNCTION_TYPE (V8SI, V4SI)
 DEF_FUNCTION_TYPE (V8SI, V8SF)
 DEF_FUNCTION_TYPE (VOID, PCVOID)
-DEF_FUNCTION_TYPE (PCVOID, VOID)
+DEF_FUNCTION_TYPE (VOID, PVOID)
+DEF_FUNCTION_TYPE (PVOID, VOID)
 DEF_FUNCTION_TYPE (VOID, UNSIGNED)
 
 DEF_FUNCTION_TYPE (DI, V2DI, INT)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cdf84fe..367f5ab 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21526,8 +21526,8 @@ static const struct builtin_description bdesc_special_args[] =
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstorepd256, "__builtin_ia32_maskstorepd256", IX86_BUILTIN_MASKSTOREPD256, UNKNOWN, (int) VOID_FTYPE_PV4DF_V4DF_V4DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstoreps256, "__builtin_ia32_maskstoreps256", IX86_BUILTIN_MASKSTOREPS256, UNKNOWN, (int) VOID_FTYPE_PV8SF_V8SF_V8SF },
 
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcb,   "__builtin_ia32_llwpcb", IX86_BUILTIN_LLWPCB, UNKNOWN, (int) VOID_FTYPE_PCVOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcb,   "__builtin_ia32_slwpcb",   IX86_BUILTIN_SLWPCB, UNKNOWN, (int) PCVOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcb,   "__builtin_ia32_llwpcb", IX86_BUILTIN_LLWPCB, UNKNOWN, (int) VOID_FTYPE_PVOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcb,   "__builtin_ia32_slwpcb",   IX86_BUILTIN_SLWPCB, UNKNOWN, (int) PVOID_FTYPE_VOID },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3, "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL32, UNKNOWN, (int) VOID_FTYPE_UINT_UINT_UINT },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3, "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64, UNKNOWN, (int) VOID_FTYPE_UINT64_UINT_UINT },
   { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3, "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS32, UNKNOWN, (int) UCHAR_FTYPE_UINT_UINT_UINT },
diff --git a/gcc/config/i386/lwpintrin.h b/gcc/config/i386/lwpintrin.h
index 48f4875..954b039 100644
--- a/gcc/config/i386/lwpintrin.h
+++ b/gcc/config/i386/lwpintrin.h
@@ -33,12 +33,12 @@
 #else
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb (void const *pcbAddress)
+__llwpcb (void *pcbAddress)
 {
   __builtin_ia32_llwpcb (pcbAddress);
 }
 
-extern __inline void const * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __slwpcb (void)
 {
   return __builtin_ia32_slwpcb ();
-- 
1.6.0.4


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-11 21:00                 ` Sebastian Pop
@ 2009-12-11 21:43                   ` Jakub Jelinek
  2009-12-11 22:27                     ` Sebastian Pop
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2009-12-11 21:43 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Henderson, gcc-patches, Uros Bizjak

On Fri, Dec 11, 2009 at 02:51:14PM -0600, Sebastian Pop wrote:
> On Fri, Dec 11, 2009 at 08:50, Jakub Jelinek <jakub@redhat.com> wrote:
> > Here is roughly what I meant (your 3 patches combined, adjusted by me).
> 
> Thanks for the patch.
> 
> I have discussed with other people internally and they highly
> recommended that we do not define the intrinsics for llwpcb and slwpcb
> as "void const *", but just "void *" as the address of the PCB is
> susceptible to be changed by the hardware.  See attached patch on top
> of yours.

Sure, I've been wondering about that myself too.  That change looks good to
me.

Are those other people ok with the lwpintrin.h changes (primarily having
only one intrinsic without a numeric suffix instead of 3 for each of
llwpcb/slwpcb)?

Is the return value from __lwpins the expected one (i.e. setc %al rather
than say setnc %al)?

I've briefly looked at the lengths of lwpins and lwpval insns and
the following worked well in all cases I've tried.

If you are ok with these, can you combine the 3 patches posted today,
write ChangeLog, test it and submit?

OT, wonder why x86-64-lwp.[sd] in gas testsuite only tests addr32 modes,
which are very unlikely to occur in 64-bit code, and not normal addresses
with 64-bit base/index registers.

--- gcc/config/i386/i386.md.jj	2009-12-11 15:38:16.000000000 +0100
+++ gcc/config/i386/i386.md	2009-12-11 22:26:42.000000000 +0100
@@ -20889,7 +20889,8 @@ (define_insn "*lwp_lwpval<mode>3_1"
   "lwpval\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "lwp")
    (set_attr "mode" "<MODE>")
-   (set_attr "length" "9")])
+   (set (attr "length")
+	(symbol_ref "ix86_attr_length_address_default (insn) + 9"))])
 
 (define_expand "lwp_lwpins<mode>3"
   [(set (reg:CCC FLAGS_REG)
@@ -20912,7 +20913,8 @@ (define_insn "*lwp_lwpins<mode>3_1"
   "lwpins\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "lwp")
    (set_attr "mode" "<MODE>")
-   (set_attr "length" "9")])
+   (set (attr "length")
+	(symbol_ref "ix86_attr_length_address_default (insn) + 9"))])
 
 (include "mmx.md")
 (include "sse.md")
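
A rough model of the length computation above, for illustration only.  It
assumes the constant 9 covers a 3-byte XOP prefix, the opcode byte, the
ModRM byte and a 4-byte immediate, and it replaces
ix86_attr_length_address_default with a simplified hypothetical helper
(address_extra_bytes) that ignores special cases such as %ebp/%r13 bases.

```c
/* Fixed part of an lwpval/lwpins encoding as assumed in this sketch:
   3-byte XOP prefix + 1 opcode byte + 1 ModRM byte + 4-byte immediate.  */
enum { LWP_FIXED_LENGTH = 3 + 1 + 1 + 4 };

/* Simplified, hypothetical stand-in for the address length helper:
   bytes a memory operand adds beyond the ModRM byte.  */
static int
address_extra_bytes (int is_memory, int needs_sib, long disp)
{
  int len = 0;

  if (!is_memory)
    return 0;			/* Register operand: nothing extra.  */
  if (needs_sib)
    len += 1;			/* SIB byte for an index or %esp-like base.  */
  if (disp != 0)
    len += (disp >= -128 && disp <= 127) ? 1 : 4;	/* disp8 or disp32.  */
  return len;
}

/* Total length: fixed part plus whatever the address needs.  */
static int
lwp_insn_length (int is_memory, int needs_sib, long disp)
{
  return LWP_FIXED_LENGTH + address_extra_bytes (is_memory, needs_sib, disp);
}
```

Under these assumptions a register operand gives 9 bytes, and a memory
operand with a SIB byte and a 32-bit displacement gives 14.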


	Jakub

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-11 21:43                   ` Jakub Jelinek
@ 2009-12-11 22:27                     ` Sebastian Pop
  2009-12-12  9:27                       ` Sebastian Pop
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2009-12-11 22:27 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Henderson, gcc-patches, Uros Bizjak

On Fri, Dec 11, 2009 at 15:34, Jakub Jelinek <jakub@redhat.com> wrote:
> Are those other people ok with the lwpintrin.h changes (primarily having
> only one intrinsic without a numeric suffix instead of 3 for each of
> llwpcb/slwpcb)?

Yes, I got the ok for the current llwpcb/slwpcb intrinsics.

> Is the return value from __lwpins the expected one (i.e. setc %al rather
> than say setnc %al)?
>

They recommended to use "setb %al", but I am seeing from the manual
that this is similar to "setc %al" as you implemented it.
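
For reference, setb and setc are two mnemonics for the same condition
(carry flag set), so either spelling gives __lwpins the same 0-or-1 result
in an unsigned char.  A plain-C sketch of what materialising a carry as a
byte looks like; carry_as_byte is a hypothetical helper that models the
carry of a 32-bit add and has nothing LWP-specific in it:

```c
#include <stdint.h>

/* Return 1 if A + B carries out of 32 bits, 0 otherwise.  This mirrors
   the kind of value "setc %al" leaves in the low byte after an add.  */
static unsigned char
carry_as_byte (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;			/* Wraps modulo 2^32.  */
  return (unsigned char) (sum < a);	/* 1 exactly when the add carried.  */
}
```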

> I've briefly looked at the lengths of lwpins and lwpval insns and
> the following worked well in all cases I've tried.
>
> If you are ok with these, can you combine the 3 patches posted today,
> write ChangeLog, test it and submit?
>

Yes, I will do this.

> OT, wonder why x86-64-lwp.[sd] in gas testsuite only tests addr32 modes,

Hmm... I don't know.  I will have a look at the testsuite and add more
testcases as needed.

> which are very unlikely to occur in 64-bit code, and not normal addresses
> with 64-bit base/index registers.
>
> --- gcc/config/i386/i386.md.jj  2009-12-11 15:38:16.000000000 +0100
> +++ gcc/config/i386/i386.md     2009-12-11 22:26:42.000000000 +0100
> @@ -20889,7 +20889,8 @@ (define_insn "*lwp_lwpval<mode>3_1"
>   "lwpval\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "type" "lwp")
>    (set_attr "mode" "<MODE>")
> -   (set_attr "length" "9")])
> +   (set (attr "length")
> +       (symbol_ref "ix86_attr_length_address_default (insn) + 9"))])
>
>  (define_expand "lwp_lwpins<mode>3"
>   [(set (reg:CCC FLAGS_REG)
> @@ -20912,7 +20913,8 @@ (define_insn "*lwp_lwpins<mode>3_1"
>   "lwpins\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "type" "lwp")
>    (set_attr "mode" "<MODE>")
> -   (set_attr "length" "9")])
> +   (set (attr "length")
> +       (symbol_ref "ix86_attr_length_address_default (insn) + 9"))])
>
>  (include "mmx.md")
>  (include "sse.md")
>
>
>        Jakub
>

Many thanks for your help,
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-11 22:27                     ` Sebastian Pop
@ 2009-12-12  9:27                       ` Sebastian Pop
  2009-12-14 16:35                         ` Richard Henderson
                                           ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Sebastian Pop @ 2009-12-12  9:27 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Henderson, gcc-patches, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 2386 bytes --]

Hi,

On Fri, Dec 11, 2009 at 16:26, Sebastian Pop <sebpop@gmail.com> wrote:
> On Fri, Dec 11, 2009 at 15:34, Jakub Jelinek <jakub@redhat.com> wrote:
>> If you are ok with these, can you combine the 3 patches posted today,
>> write ChangeLog, test it and submit?
>>
>
> Yes, I will do this.

Here is the patch that fixes the support for LWP.

2009-12-11  Jakub Jelinek  <jakub@redhat.com>
	    Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/i386-builtin-types.def (PVOID): Declared.
	(VOID_FTYPE_PVOID): Declared.
	(PVOID_FTYPE_VOID): Declared.
	(UCHAR_FTYPE_USHORT_UINT_USHORT): Removed.
	(VOID_FTYPE_USHORT_UINT_USHORT): Removed.
	* config/i386/i386.c (IX86_BUILTIN_LLWPCB16, IX86_BUILTIN_LLWPCB32,
	IX86_BUILTIN_LLWPCB64, IX86_BUILTIN_SLWPCB16, IX86_BUILTIN_SLWPCB32,
	IX86_BUILTIN_SLWPCB64, IX86_BUILTIN_LWPVAL16, IX86_BUILTIN_LWPINS16):
	Removed.
	(IX86_BUILTIN_LLWPCB, IX86_BUILTIN_SLWPCB): New.
	(bdesc_special_args): Adjust declaration of __builtin_ia32_llwpcb,
	__builtin_ia32_slwpcb, __builtin_ia32_lwpval32,
	__builtin_ia32_lwpval64, __builtin_ia32_lwpins32, and
	__builtin_ia32_lwpins64.
	(ix86_expand_special_args_builtin): Handle VOID_FTYPE_PVOID.
	Do not handle VOID_FTYPE_USHORT_UINT_USHORT and
	UCHAR_FTYPE_USHORT_UINT_USHORT.  Warn when the third operand is
	not an immediate.  Also handle builtin functions with 3 arguments.
	(ix86_expand_builtin): Handle IX86_BUILTIN_LLWPCB and
	IX86_BUILTIN_SLWPCB.
	* config/i386/i386.md (UNSPEC_LLWP_INTRINSIC, UNSPEC_SLWP_INTRINSIC):
	Renamed to UNSPECV_LLWP_INTRINSIC and UNSPECV_SLWP_INTRINSIC.
	(memory attribute): Handle lwp.
	(lwp*): Rewrite all the insn patterns for LWP.
	* config/i386/lwpintrin.h (__llwpcb16, __llwpcb32, __llwpcb64,
	__slwpcb16, __slwpcb32, __slwpcb64, __lwpval16, __lwpins16): Removed.
	(__llwpcb, __slwpcb): New.

	testsuite/
	* gcc.target/i386/sse-12.c: Add -mpopcnt and -mlwp.
	* gcc.target/i386/sse-13.c: Same.
	(__builtin_ia32_lwpval32, __builtin_ia32_lwpval64,
	__builtin_ia32_lwpins32, __builtin_ia32_lwpins64): Added testcases.
	* gcc.target/i386/sse-14.c: Add -mpopcnt -mabm -mlwp.
	Added tests for __lwpval32, __lwpins32, __lwpval64, and __lwpins64.
	* gcc.target/i386/sse-22.c: Added tests for popcnt, abm, and lwp.
	* gcc.target/i386/sse-23.c: Same.

Passed bootstrap and test on amd64-linux.  Ok for trunk?

Thanks,
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

[-- Attachment #2: 0001-Fix-LWP.patch --]
[-- Type: text/x-patch, Size: 32279 bytes --]

From 84b601e62a4c23852ab89e48d78e3b66c49d8f23 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Fri, 11 Dec 2009 14:16:18 -0600
Subject: [PATCH] Fix LWP.

2009-12-11  Jakub Jelinek  <jakub@redhat.com>
	    Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/i386-builtin-types.def (PVOID): Declared.
	(VOID_FTYPE_PVOID): Declared.
	(PVOID_FTYPE_VOID): Declared.
	(UCHAR_FTYPE_USHORT_UINT_USHORT): Removed.
	(VOID_FTYPE_USHORT_UINT_USHORT): Removed.
	* config/i386/i386.c (IX86_BUILTIN_LLWPCB16, IX86_BUILTIN_LLWPCB32,
	IX86_BUILTIN_LLWPCB64, IX86_BUILTIN_SLWPCB16, IX86_BUILTIN_SLWPCB32,
	IX86_BUILTIN_SLWPCB64, IX86_BUILTIN_LWPVAL16, IX86_BUILTIN_LWPINS16):
	Removed.
	(IX86_BUILTIN_LLWPCB, IX86_BUILTIN_SLWPCB): New.
	(bdesc_special_args): Adjust declaration of __builtin_ia32_llwpcb,
	__builtin_ia32_slwpcb, __builtin_ia32_lwpval32,
	__builtin_ia32_lwpval64, __builtin_ia32_lwpins32, and
	__builtin_ia32_lwpins64.
	(ix86_expand_special_args_builtin): Handle VOID_FTYPE_PVOID.
	Do not handle VOID_FTYPE_USHORT_UINT_USHORT and
	UCHAR_FTYPE_USHORT_UINT_USHORT.  Warn when the third operand is
	not an immediate.  Also handle builtin functions with 3 arguments.
	(ix86_expand_builtin): Handle IX86_BUILTIN_LLWPCB and
	IX86_BUILTIN_SLWPCB.
	* config/i386/i386.md (UNSPEC_LLWP_INTRINSIC, UNSPEC_SLWP_INTRINSIC):
	Renamed to UNSPECV_LLWP_INTRINSIC and UNSPECV_SLWP_INTRINSIC.
	(memory attribute): Handle lwp.
	(lwp*): Rewrite all the insn patterns for LWP.
	* config/i386/lwpintrin.h (__llwpcb16, __llwpcb32, __llwpcb64,
	__slwpcb16, __slwpcb32, __slwpcb64, __lwpval16, __lwpins16): Removed.
	(__llwpcb, __slwpcb): New.

	testsuite/
	* gcc.target/i386/sse-12.c: Add -mpopcnt and -mlwp.
	* gcc.target/i386/sse-13.c: Same.
	(__builtin_ia32_lwpval32, __builtin_ia32_lwpval64,
	__builtin_ia32_lwpins32, __builtin_ia32_lwpins64): Added testcases.
	* gcc.target/i386/sse-14.c: Add -mpopcnt -mabm -mlwp.
	Added tests for __lwpval32, __lwpins32, __lwpval64, and __lwpins64.
	* gcc.target/i386/sse-22.c: Added tests for popcnt, abm, and lwp.
	* gcc.target/i386/sse-23.c: Same.
---
 gcc/config/i386/i386-builtin-types.def |    5 +-
 gcc/config/i386/i386.c                 |   69 +++++++++------
 gcc/config/i386/i386.md                |  151 +++++++++++++-------------------
 gcc/config/i386/lwpintrin.h            |   75 +++++++---------
 gcc/testsuite/gcc.target/i386/sse-12.c |    6 +-
 gcc/testsuite/gcc.target/i386/sse-13.c |   23 +++--
 gcc/testsuite/gcc.target/i386/sse-14.c |   16 +++-
 gcc/testsuite/gcc.target/i386/sse-22.c |   22 ++++--
 gcc/testsuite/gcc.target/i386/sse-23.c |   51 ++++++++++-
 9 files changed, 231 insertions(+), 187 deletions(-)

diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index e9e4d0c..1fad60f 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -104,6 +104,7 @@ DEF_POINTER_TYPE (PCDOUBLE, DOUBLE, CONST)
 DEF_POINTER_TYPE (PCFLOAT, FLOAT, CONST)
 DEF_POINTER_TYPE (PCHAR, CHAR)
 DEF_POINTER_TYPE (PCVOID, VOID, CONST)
+DEF_POINTER_TYPE (PVOID, VOID)
 DEF_POINTER_TYPE (PDOUBLE, DOUBLE)
 DEF_POINTER_TYPE (PFLOAT, FLOAT)
 DEF_POINTER_TYPE (PINT, INT)
@@ -195,6 +196,8 @@ DEF_FUNCTION_TYPE (V8SF, V8SI)
 DEF_FUNCTION_TYPE (V8SI, V4SI)
 DEF_FUNCTION_TYPE (V8SI, V8SF)
 DEF_FUNCTION_TYPE (VOID, PCVOID)
+DEF_FUNCTION_TYPE (VOID, PVOID)
+DEF_FUNCTION_TYPE (PVOID, VOID)
 DEF_FUNCTION_TYPE (VOID, UNSIGNED)
 
 DEF_FUNCTION_TYPE (DI, V2DI, INT)
@@ -301,7 +304,6 @@ DEF_FUNCTION_TYPE (VOID, UNSIGNED, UNSIGNED)
 DEF_FUNCTION_TYPE (INT, V16QI, V16QI, INT)
 DEF_FUNCTION_TYPE (UCHAR, UINT, UINT, UINT)
 DEF_FUNCTION_TYPE (UCHAR, UINT64, UINT, UINT)
-DEF_FUNCTION_TYPE (UCHAR, USHORT, UINT, USHORT)
 DEF_FUNCTION_TYPE (V16HI, V16HI, V16HI, V16HI)
 DEF_FUNCTION_TYPE (V16QI, V16QI, QI, INT)
 DEF_FUNCTION_TYPE (V16QI, V16QI, V16QI, INT)
@@ -343,7 +345,6 @@ DEF_FUNCTION_TYPE (VOID, PV4SF, V4SF, V4SF)
 DEF_FUNCTION_TYPE (VOID, PV8SF, V8SF, V8SF)
 DEF_FUNCTION_TYPE (VOID, UINT, UINT, UINT)
 DEF_FUNCTION_TYPE (VOID, UINT64, UINT, UINT)
-DEF_FUNCTION_TYPE (VOID, USHORT, UINT, USHORT)
 DEF_FUNCTION_TYPE (VOID, V16QI, V16QI, PCHAR)
 DEF_FUNCTION_TYPE (VOID, V8QI, V8QI, PCHAR)
 DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, V2DI)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0e58a17..9b739a6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21251,16 +21251,10 @@ enum ix86_builtins
   IX86_BUILTIN_VPCOMTRUEQ,
 
   /* LWP instructions.  */
-  IX86_BUILTIN_LLWPCB16,
-  IX86_BUILTIN_LLWPCB32,
-  IX86_BUILTIN_LLWPCB64,
-  IX86_BUILTIN_SLWPCB16,
-  IX86_BUILTIN_SLWPCB32,
-  IX86_BUILTIN_SLWPCB64,
-  IX86_BUILTIN_LWPVAL16,
+  IX86_BUILTIN_LLWPCB,
+  IX86_BUILTIN_SLWPCB,
   IX86_BUILTIN_LWPVAL32,
   IX86_BUILTIN_LWPVAL64,
-  IX86_BUILTIN_LWPINS16,
   IX86_BUILTIN_LWPINS32,
   IX86_BUILTIN_LWPINS64,
 
@@ -21532,20 +21526,12 @@ static const struct builtin_description bdesc_special_args[] =
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstorepd256, "__builtin_ia32_maskstorepd256", IX86_BUILTIN_MASKSTOREPD256, UNKNOWN, (int) VOID_FTYPE_PV4DF_V4DF_V4DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_maskstoreps256, "__builtin_ia32_maskstoreps256", IX86_BUILTIN_MASKSTOREPS256, UNKNOWN, (int) VOID_FTYPE_PV8SF_V8SF_V8SF },
 
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbhi1,   "__builtin_ia32_llwpcb16",   IX86_BUILTIN_LLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbsi1,   "__builtin_ia32_llwpcb32",   IX86_BUILTIN_LLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbdi1,   "__builtin_ia32_llwpcb64",   IX86_BUILTIN_LLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbhi1,   "__builtin_ia32_slwpcb16",   IX86_BUILTIN_SLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbsi1,   "__builtin_ia32_slwpcb32",   IX86_BUILTIN_SLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbdi1,   "__builtin_ia32_slwpcb64",   IX86_BUILTIN_SLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
-
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalhi3,   "__builtin_ia32_lwpval16", IX86_BUILTIN_LWPVAL16,  UNKNOWN,     (int) VOID_FTYPE_USHORT_UINT_USHORT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3,   "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT_UINT_UINT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3,   "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT64_UINT_UINT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinshi3,   "__builtin_ia32_lwpins16", IX86_BUILTIN_LWPINS16,  UNKNOWN,     (int) UCHAR_FTYPE_USHORT_UINT_USHORT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3,   "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS64,  UNKNOWN,     (int) UCHAR_FTYPE_UINT_UINT_UINT },
-  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinsdi3,   "__builtin_ia32_lwpins64", IX86_BUILTIN_LWPINS64,  UNKNOWN,     (int) UCHAR_FTYPE_UINT64_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcb, "__builtin_ia32_llwpcb", IX86_BUILTIN_LLWPCB, UNKNOWN, (int) VOID_FTYPE_PVOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcb, "__builtin_ia32_slwpcb", IX86_BUILTIN_SLWPCB, UNKNOWN, (int) PVOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3, "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL32, UNKNOWN, (int) VOID_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3, "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64, UNKNOWN, (int) VOID_FTYPE_UINT64_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3, "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS32, UNKNOWN, (int) UCHAR_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinsdi3, "__builtin_ia32_lwpins64", IX86_BUILTIN_LWPINS64, UNKNOWN, (int) UCHAR_FTYPE_UINT64_UINT_UINT },
 
 };
 
@@ -23734,7 +23720,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
     {
       rtx op;
       enum machine_mode mode;
-    } args[2];
+    } args[3];
   enum insn_code icode = d->icode;
   bool last_arg_constant = false;
   const struct insn_data *insn_p = &insn_data[icode];
@@ -23761,6 +23747,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
     case V4DF_FTYPE_PCV2DF:
     case V4DF_FTYPE_PCDOUBLE:
     case V2DF_FTYPE_PCDOUBLE:
+    case VOID_FTYPE_PVOID:
       nargs = 1;
       klass = load;
       memory = 0;
@@ -23804,15 +23791,14 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
       /* Reserve memory operand for target.  */
       memory = ARRAY_SIZE (args);
       break;
-    case VOID_FTYPE_USHORT_UINT_USHORT:
     case VOID_FTYPE_UINT_UINT_UINT:
     case VOID_FTYPE_UINT64_UINT_UINT:
-    case UCHAR_FTYPE_USHORT_UINT_USHORT:
     case UCHAR_FTYPE_UINT_UINT_UINT:
     case UCHAR_FTYPE_UINT64_UINT_UINT:
       nargs = 3;
-      klass = store;
-      memory = 0;
+      klass = load;
+      memory = ARRAY_SIZE (args);
+      last_arg_constant = true;
       break;
     default:
       gcc_unreachable ();
@@ -23852,7 +23838,14 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
 	  if (!match)
 	    switch (icode)
 	      {
-	     default:
+	      case CODE_FOR_lwp_lwpvalsi3:
+	      case CODE_FOR_lwp_lwpvaldi3:
+	      case CODE_FOR_lwp_lwpinssi3:
+	      case CODE_FOR_lwp_lwpinsdi3:
+		error ("the last argument must be a 32-bit immediate");
+		return const0_rtx;
+
+	      default:
 		error ("the last argument must be an 8-bit immediate");
 		return const0_rtx;
 	      }
@@ -23893,6 +23886,9 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
     case 2:
       pat = GEN_FCN (icode) (target, args[0].op, args[1].op);
       break;
+    case 3:
+      pat = GEN_FCN (icode) (target, args[0].op, args[1].op, args[2].op);
+      break;
     default:
       gcc_unreachable ();
     }
@@ -24205,6 +24201,23 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
 	return target;
       }
 
+    case IX86_BUILTIN_LLWPCB:
+      arg0 = CALL_EXPR_ARG (exp, 0);
+      op0 = expand_normal (arg0);
+      icode = CODE_FOR_lwp_llwpcb;
+      if (! (*insn_data[icode].operand[0].predicate) (op0, Pmode))
+	op0 = copy_to_mode_reg (Pmode, op0);
+      emit_insn (gen_lwp_llwpcb (op0));
+      return 0;
+
+    case IX86_BUILTIN_SLWPCB:
+      icode = CODE_FOR_lwp_slwpcb;
+      if (!target
+	  || ! (*insn_data[icode].operand[0].predicate) (target, Pmode))
+	target = gen_reg_rtx (Pmode);
+      emit_insn (gen_lwp_slwpcb (target));
+      return target;
+
     default:
       break;
     }
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a4e688a..22e6049 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -204,10 +204,6 @@
    (UNSPEC_XOP_TRUEFALSE	152)
    (UNSPEC_XOP_PERMUTE		153)
    (UNSPEC_FRCZ			154)
-   (UNSPEC_LLWP_INTRINSIC	155)
-   (UNSPEC_SLWP_INTRINSIC	156)
-   (UNSPECV_LWPVAL_INTRINSIC	157)
-   (UNSPECV_LWPINS_INTRINSIC	158)
 
    ; For AES support
    (UNSPEC_AESENC		159)
@@ -251,7 +247,11 @@
    (UNSPECV_RDTSC		18)
    (UNSPECV_RDTSCP		19)
    (UNSPECV_RDPMC		20)
-   (UNSPECV_VSWAPMOV	21)
+   (UNSPECV_VSWAPMOV		21)
+   (UNSPECV_LLWP_INTRINSIC	22)
+   (UNSPECV_SLWP_INTRINSIC	23)
+   (UNSPECV_LWPVAL_INTRINSIC	24)
+   (UNSPECV_LWPINS_INTRINSIC	25)
   ])
 
 ;; Constants to represent pcomtrue/pcomfalse variants
@@ -578,7 +578,7 @@
 ;; if the instruction is complex.
 
 (define_attr "memory" "none,load,store,both,unknown"
-  (cond [(eq_attr "type" "other,multi,str")
+  (cond [(eq_attr "type" "other,multi,str,lwp")
 	   (const_string "unknown")
 	 (eq_attr "type" "lea,fcmov,fpspc")
 	   (const_string "none")
@@ -20835,113 +20835,86 @@
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
-(define_insn "lwp_llwpcbhi1"
-  [(unspec [(match_operand:HI 0 "register_operand" "r")]
-  	   UNSPEC_LLWP_INTRINSIC)]
+(define_expand "lwp_llwpcb"
+  [(unspec_volatile [(match_operand 0 "register_operand" "r")]
+		    UNSPECV_LLWP_INTRINSIC)]
   "TARGET_LWP"
-  "llwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
-
-(define_insn "lwp_llwpcbsi1"
-  [(unspec [(match_operand:SI 0 "register_operand" "r")]
-  	   UNSPEC_LLWP_INTRINSIC)]
-  "TARGET_LWP"
-  "llwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
+  "")
 
-(define_insn "lwp_llwpcbdi1"
-  [(unspec [(match_operand:DI 0 "register_operand" "r")]
-  	   UNSPEC_LLWP_INTRINSIC)]
+(define_insn "*lwp_llwpcb<mode>1"
+  [(unspec_volatile [(match_operand:P 0 "register_operand" "r")]
+		    UNSPECV_LLWP_INTRINSIC)]
   "TARGET_LWP"
   "llwpcb\t%0"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
-
-(define_insn "lwp_slwpcbhi1"
-  [(unspec [(match_operand:HI 0 "register_operand" "r")]
-  	   UNSPEC_SLWP_INTRINSIC)]
-  "TARGET_LWP"
-  "slwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
+   (set_attr "mode" "<MODE>")
+   (set_attr "length" "5")])
 
-(define_insn "lwp_slwpcbsi1"
-  [(unspec [(match_operand:SI 0 "register_operand" "r")]
-  	   UNSPEC_SLWP_INTRINSIC)]
+(define_expand "lwp_slwpcb"
+  [(set (match_operand 0 "register_operand" "=r")
+	(unspec_volatile [(const_int 0)] UNSPECV_SLWP_INTRINSIC))]
   "TARGET_LWP"
-  "slwpcb\t%0"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
-
-(define_insn "lwp_slwpcbdi1"
-  [(unspec [(match_operand:DI 0 "register_operand" "r")]
-  	   UNSPEC_SLWP_INTRINSIC)]
+  {
+    if (TARGET_64BIT)
+      emit_insn (gen_lwp_slwpcbdi (operands[0]));
+    else
+      emit_insn (gen_lwp_slwpcbsi (operands[0]));
+    DONE;
+  })
+
+(define_insn "lwp_slwpcb<mode>"
+  [(set (match_operand:P 0 "register_operand" "=r")
+	(unspec_volatile:P [(const_int 0)] UNSPECV_SLWP_INTRINSIC))]
   "TARGET_LWP"
   "slwpcb\t%0"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
-
-(define_insn "lwp_lwpvalhi3"
-  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
-  	   	     (match_operand:SI 1 "nonimmediate_operand" "rm")
-	   	     (match_operand:HI 2 "const_int_operand" "")]
-  	   	    UNSPECV_LWPVAL_INTRINSIC)]
-  "TARGET_LWP"
-  "lwpval\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
+   (set_attr "mode" "<MODE>")
+   (set_attr "length" "5")])
 
-(define_insn "lwp_lwpvalsi3"
-  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
-    	    	     (match_operand:SI 1 "nonimmediate_operand" "rm")
-	    	     (match_operand:SI 2 "const_int_operand" "")]
+(define_expand "lwp_lwpval<mode>3"
+  [(unspec_volatile [(match_operand:SWI48 1 "register_operand" "r")
+    	    	     (match_operand:SI 2 "nonimmediate_operand" "rm")
+		     (match_operand:SI 3 "const_int_operand" "i")]
 		    UNSPECV_LWPVAL_INTRINSIC)]
   "TARGET_LWP"
-  "lwpval\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
+  "/* Avoid unused variable warning.  */
+   (void) operand0;")
 
-(define_insn "lwp_lwpvaldi3"
-  [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
-  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:SI 2 "const_int_operand" "")]
+(define_insn "*lwp_lwpval<mode>3_1"
+  [(unspec_volatile [(match_operand:SWI48 0 "register_operand" "r")
+    	    	     (match_operand:SI 1 "nonimmediate_operand" "rm")
+		     (match_operand:SI 2 "const_int_operand" "i")]
 		    UNSPECV_LWPVAL_INTRINSIC)]
   "TARGET_LWP"
   "lwpval\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
-
-(define_insn "lwp_lwpinshi3"
-  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
-  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:HI 2 "const_int_operand" "")]
-		    UNSPECV_LWPINS_INTRINSIC)]
-  "TARGET_LWP"
-  "lwpins\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "HI")])
+   (set_attr "mode" "<MODE>")
+   (set (attr "length")
+        (symbol_ref "ix86_attr_length_address_default (insn) + 9"))])
 
-(define_insn "lwp_lwpinssi3"
-  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")
-  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:SI 2 "const_int_operand" "")]
-		    UNSPECV_LWPINS_INTRINSIC)]
+(define_expand "lwp_lwpins<mode>3"
+  [(set (reg:CCC FLAGS_REG)
+	(unspec_volatile:CCC [(match_operand:SWI48 1 "register_operand" "r")
+			      (match_operand:SI 2 "nonimmediate_operand" "rm")
+			      (match_operand:SI 3 "const_int_operand" "i")]
+			     UNSPECV_LWPINS_INTRINSIC))
+   (set (match_operand:QI 0 "nonimmediate_operand" "=qm")
+	(eq:QI (reg:CCC FLAGS_REG) (const_int 0)))]
   "TARGET_LWP"
-  "lwpins\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "lwp")
-   (set_attr "mode" "SI")])
+  "")
 
-(define_insn "lwp_lwpinsdi3"
-  [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")
-  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
-		     (match_operand:SI 2 "const_int_operand" "")]
-		    UNSPECV_LWPINS_INTRINSIC)]
+(define_insn "*lwp_lwpins<mode>3_1"
+  [(set (reg:CCC FLAGS_REG)
+	(unspec_volatile:CCC [(match_operand:SWI48 0 "register_operand" "r")
+			      (match_operand:SI 1 "nonimmediate_operand" "rm")
+			      (match_operand:SI 2 "const_int_operand" "i")]
+			     UNSPECV_LWPINS_INTRINSIC))]
   "TARGET_LWP"
   "lwpins\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "lwp")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "<MODE>")
+   (set (attr "length")
+        (symbol_ref "ix86_attr_length_address_default (insn) + 9"))])
 
 (include "mmx.md")
 (include "sse.md")
diff --git a/gcc/config/i386/lwpintrin.h b/gcc/config/i386/lwpintrin.h
index e5137ec..954b039 100644
--- a/gcc/config/i386/lwpintrin.h
+++ b/gcc/config/i386/lwpintrin.h
@@ -33,77 +33,68 @@
 #else
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb16 (void *pcbAddress)
+__llwpcb (void *pcbAddress)
 {
-  __builtin_ia32_llwpcb16 (pcbAddress);
-}
-
-extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb32 (void *pcbAddress)
-{
-  __builtin_ia32_llwpcb32 (pcbAddress);
-}
-
-extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__llwpcb64 (void *pcbAddress)
-{
-  __builtin_ia32_llwpcb64 (pcbAddress);
-}
-
-extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__slwpcb16 (void)
-{
-  return __builtin_ia32_slwpcb16 ();
-}
-
-extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__slwpcb32 (void)
-{
-  return __builtin_ia32_slwpcb32 ();
+  __builtin_ia32_llwpcb (pcbAddress);
 }
 
 extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__slwpcb64 (void)
+__slwpcb (void)
 {
-  return __builtin_ia32_slwpcb64 ();
+  return __builtin_ia32_slwpcb ();
 }
 
-extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpval16 (unsigned short data2, unsigned int data1, unsigned short flags)
-{
-  __builtin_ia32_lwpval16 (data2, data1, flags);
-}
-/*
+#ifdef __OPTIMIZE__
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __lwpval32 (unsigned int data2, unsigned int data1, unsigned int flags)
 {
   __builtin_ia32_lwpval32 (data2, data1, flags);
 }
 
+#ifdef __x86_64__
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpval64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+__lwpval64 (unsigned long long data2, unsigned int data1, unsigned int flags)
 {
   __builtin_ia32_lwpval64 (data2, data1, flags);
 }
+#endif
+#else
+#define __lwpval32(D2, D1, F) \
+  (__builtin_ia32_lwpval32 ((unsigned int) (D2), (unsigned int) (D1), \
+			    (unsigned int) (F)))
+#ifdef __x86_64__
+#define __lwpval64(D2, D1, F) \
+  (__builtin_ia32_lwpval64 ((unsigned long long) (D2), (unsigned int) (D1), \
+			    (unsigned int) (F)))
+#endif
+#endif
 
-extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpins16 (unsigned short data2, unsigned int data1, unsigned short flags)
-{
-  return __builtin_ia32_lwpins16 (data2, data1, flags);
-}
 
+#ifdef __OPTIMIZE__
 extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __lwpins32 (unsigned int data2, unsigned int data1, unsigned int flags)
 {
   return __builtin_ia32_lwpins32 (data2, data1, flags);
 }
 
+#ifdef __x86_64__
 extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-__lwpins64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+__lwpins64 (unsigned long long data2, unsigned int data1, unsigned int flags)
 {
   return __builtin_ia32_lwpins64 (data2, data1, flags);
 }
-*/
+#endif
+#else
+#define __lwpins32(D2, D1, F) \
+  (__builtin_ia32_lwpins32 ((unsigned int) (D2), (unsigned int) (D1), \
+			    (unsigned int) (F)))
+#ifdef __x86_64__
+#define __lwpins64(D2, D1, F) \
+  (__builtin_ia32_lwpins64 ((unsigned long long) (D2), (unsigned int) (D1), \
+			    (unsigned int) (F)))
+#endif
+#endif
+
 #endif /* __LWP__ */
 
 #endif /* _LWPINTRIN_H_INCLUDED */
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index 4a314e8..77baff0 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -1,8 +1,8 @@
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, xopintrin.h, mm3dnow.h,
-   abmintrin.h and mm_malloc.h are usable with -O -std=c89
-   -pedantic-errors.  */
+   abmintrin.h, lwpintrin.h, popcntintrin.h and mm_malloc.h are usable
+   with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -m3dnow -mavx -mfma4 -mxop -maes -mpclmul -mabm" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -m3dnow -mavx -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */
 
 #include <x86intrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 546a99f..96214e0 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,13 +1,14 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mabm" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile with optimization.  All of them
    are defined as inline functions in {,x,e,p,t,s,w,a,b,i}mmintrin.h,
-   xopintrin.h, abmintrin.h and mm3dnow.h that reference the proper
-   builtin functions.  Defining away "extern" and "__inline" results
-   in all of them being compiled as proper functions.  */
+   xopintrin.h, abmintrin.h, lwpintrin.h, popcntintrin.h and mm3dnow.h
+   that reference the proper builtin functions.  Defining away
+   "extern" and "__inline" results in all of them being compiled as
+   proper functions.  */
 
 #define extern
 #define __inline
@@ -127,9 +128,15 @@
 #define __builtin_ia32_shufps(A, B, N) __builtin_ia32_shufps(A, B, 0)
 
 /* xopintrin.h */
-#define  __builtin_ia32_vprotbi(A, N) __builtin_ia32_vprotbi (A,1)
-#define  __builtin_ia32_vprotwi(A, N) __builtin_ia32_vprotwi (A,1)
-#define  __builtin_ia32_vprotdi(A, N) __builtin_ia32_vprotdi (A,1)
-#define  __builtin_ia32_vprotqi(A, N) __builtin_ia32_vprotqi (A,1)
+#define __builtin_ia32_vprotbi(A, N) __builtin_ia32_vprotbi (A,1)
+#define __builtin_ia32_vprotwi(A, N) __builtin_ia32_vprotwi (A,1)
+#define __builtin_ia32_vprotdi(A, N) __builtin_ia32_vprotdi (A,1)
+#define __builtin_ia32_vprotqi(A, N) __builtin_ia32_vprotqi (A,1)
+
+/* lwpintrin.h */
+#define __builtin_ia32_lwpval32(D2, D1, F) __builtin_ia32_lwpval32 (D2, D1, 1)
+#define __builtin_ia32_lwpval64(D2, D1, F) __builtin_ia32_lwpval64 (D2, D1, 1)
+#define __builtin_ia32_lwpins32(D2, D1, F) __builtin_ia32_lwpins32 (D2, D1, 1)
+#define __builtin_ia32_lwpins64(D2, D1, F) __builtin_ia32_lwpins64 (D2, D1, 1)
 
 #include <x86intrin.h>
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 783cd0a..c3f72e4 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,12 +1,13 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -msse4a -maes -mpclmul" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -msse4a -maes -mpclmul -mpopcnt -mabm -mlwp" } */
 
 #include <mm_malloc.h>
 
 /* Test that the intrinsics compile without optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h  and mm3dnow.h
-   that reference the proper builtin functions.  Defining away "extern" and
-   "__inline" results in all of them being compiled as proper functions.  */
+   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h,
+   lwpintrin.h and mm3dnow.h that reference the proper builtin functions.
+   Defining away "extern" and "__inline" results in all of them being compiled
+   as proper functions.  */
 
 #define extern
 #define __inline
@@ -162,3 +163,10 @@ test_1 ( _mm_roti_epi16, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi32, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi64, __m128i, __m128i, 1)
 
+/* lwpintrin.h */
+test_2 ( __lwpval32, void, unsigned int, unsigned int, 1)
+test_2 ( __lwpins32, unsigned char, unsigned int, unsigned int, 1)
+#ifdef __x86_64__
+test_2 ( __lwpval64, void, unsigned long long, unsigned int, 1)
+test_2 ( __lwpins64, unsigned char, unsigned long long, unsigned int, 1)
+#endif
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 541cad4..6d97697 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -4,10 +4,12 @@
 
 #include <mm_malloc.h>
 
-/* Test that the intrinsics compile without optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h and mm3dnow.h
-   that reference the proper builtin functions.  Defining away "extern" and
-   "__inline" results in all of them being compiled as proper functions.  */
+/* Test that the intrinsics compile without optimization.  All of them
+   are defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h,
+   xopintrin.h, lwpintrin.h, popcntintrin.h and mm3dnow.h that
+   reference the proper builtin functions.  Defining away "extern" and
+   "__inline" results in all of them being compiled as proper
+   functions.  */
 
 #define extern
 #define __inline
@@ -37,7 +39,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("mmx,3dnow,sse,sse2,sse3,ssse3,sse4.1,sse4.2,sse4a,aes,pclmul,xop")
+#pragma GCC target ("mmx,3dnow,sse,sse2,sse3,ssse3,sse4.1,sse4.2,sse4a,aes,pclmul,xop,popcnt,abm,lwp")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
@@ -162,10 +164,18 @@ test_2 (_mm_round_ss, __m128, __m128, __m128, 1)
 
 /* xopintrin.h (XOP). */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("xop")
+#pragma GCC target ("xop,lwp")
 #endif
 #include <x86intrin.h>
 test_1 ( _mm_roti_epi8, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi16, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi32, __m128i, __m128i, 1)
 test_1 ( _mm_roti_epi64, __m128i, __m128i, 1)
+
+/* lwpintrin.h (LWP). */
+test_2 ( __lwpval32, void, unsigned int, unsigned int, 1)
+test_2 ( __lwpins32, unsigned char, unsigned int, unsigned int, 1)
+#ifdef __x86_64__
+test_2 ( __lwpval64, void, unsigned long long, unsigned int, 1)
+test_2 ( __lwpins64, unsigned char, unsigned long long, unsigned int, 1)
+#endif
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 3e0fa1f..f74d3a7 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -3,10 +3,12 @@
 
 #include <mm_malloc.h>
 
-/* Test that the intrinsics compile with optimization.  All of them are
-   defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h, xopintrin.h and mm3dnow.h
-   that reference the proper builtin functions.  Defining away "extern" and
-   "__inline" results in all of them being compiled as proper functions.  */
+/* Test that the intrinsics compile with optimization.  All of them
+   are defined as inline functions in {,x,e,p,t,s,w,a}mmintrin.h,
+   xopintrin.h, lwpintrin.h, popcntintrin.h and mm3dnow.h that
+   reference the proper builtin functions.  Defining away "extern" and
+   "__inline" results in all of them being compiled as proper
+   functions.  */
 
 #define extern
 #define __inline
@@ -93,13 +95,52 @@
 #define __builtin_ia32_vec_ext_v4hi(A, N) __builtin_ia32_vec_ext_v4hi(A, 0)
 #define __builtin_ia32_shufps(A, B, N) __builtin_ia32_shufps(A, B, 0)
 
+/* immintrin.h */
+#define __builtin_ia32_blendpd256(X, Y, M) __builtin_ia32_blendpd256(X, Y, 1)
+#define __builtin_ia32_blendps256(X, Y, M) __builtin_ia32_blendps256(X, Y, 1)
+#define __builtin_ia32_dpps256(X, Y, M) __builtin_ia32_dpps256(X, Y, 1)
+#define __builtin_ia32_shufpd256(X, Y, M) __builtin_ia32_shufpd256(X, Y, 1)
+#define __builtin_ia32_shufps256(X, Y, M) __builtin_ia32_shufps256(X, Y, 1)
+#define __builtin_ia32_cmpsd(X, Y, O) __builtin_ia32_cmpsd(X, Y, 1)
+#define __builtin_ia32_cmpss(X, Y, O) __builtin_ia32_cmpss(X, Y, 1)
+#define __builtin_ia32_cmppd(X, Y, O) __builtin_ia32_cmppd(X, Y, 1)
+#define __builtin_ia32_cmpps(X, Y, O) __builtin_ia32_cmpps(X, Y, 1)
+#define __builtin_ia32_cmppd256(X, Y, O) __builtin_ia32_cmppd256(X, Y, 1)
+#define __builtin_ia32_cmpps256(X, Y, O) __builtin_ia32_cmpps256(X, Y, 1)
+#define __builtin_ia32_vextractf128_pd256(X, N) __builtin_ia32_vextractf128_pd256(X, 1)
+#define __builtin_ia32_vextractf128_ps256(X, N) __builtin_ia32_vextractf128_ps256(X, 1)
+#define __builtin_ia32_vextractf128_si256(X, N) __builtin_ia32_vextractf128_si256(X, 1)
+#define __builtin_ia32_vpermilpd(X, N) __builtin_ia32_vpermilpd(X, 1)
+#define __builtin_ia32_vpermilpd256(X, N) __builtin_ia32_vpermilpd256(X, 1)
+#define __builtin_ia32_vpermilps(X, N) __builtin_ia32_vpermilps(X, 1)
+#define __builtin_ia32_vpermilps256(X, N) __builtin_ia32_vpermilps256(X, 1)
+#define __builtin_ia32_vpermil2pd(X, Y, C, I) __builtin_ia32_vpermil2pd(X, Y, C, 1)
+#define __builtin_ia32_vpermil2pd256(X, Y, C, I) __builtin_ia32_vpermil2pd256(X, Y, C, 1)
+#define __builtin_ia32_vpermil2ps(X, Y, C, I) __builtin_ia32_vpermil2ps(X, Y, C, 1)
+#define __builtin_ia32_vpermil2ps256(X, Y, C, I) __builtin_ia32_vpermil2ps256(X, Y, C, 1)
+#define __builtin_ia32_vperm2f128_pd256(X, Y, C) __builtin_ia32_vperm2f128_pd256(X, Y, 1)
+#define __builtin_ia32_vperm2f128_ps256(X, Y, C) __builtin_ia32_vperm2f128_ps256(X, Y, 1)
+#define __builtin_ia32_vperm2f128_si256(X, Y, C) __builtin_ia32_vperm2f128_si256(X, Y, 1)
+#define __builtin_ia32_vinsertf128_pd256(X, Y, C) __builtin_ia32_vinsertf128_pd256(X, Y, 1)
+#define __builtin_ia32_vinsertf128_ps256(X, Y, C) __builtin_ia32_vinsertf128_ps256(X, Y, 1)
+#define __builtin_ia32_vinsertf128_si256(X, Y, C) __builtin_ia32_vinsertf128_si256(X, Y, 1)
+#define __builtin_ia32_roundpd256(V, M) __builtin_ia32_roundpd256(V, 1)
+#define __builtin_ia32_roundps256(V, M) __builtin_ia32_roundps256(V, 1)
+
 /* xopintrin.h */
 #define __builtin_ia32_vprotbi(A, B) __builtin_ia32_vprotbi(A,1)
 #define __builtin_ia32_vprotwi(A, B) __builtin_ia32_vprotwi(A,1)
 #define __builtin_ia32_vprotdi(A, B) __builtin_ia32_vprotdi(A,1)
 #define __builtin_ia32_vprotqi(A, B) __builtin_ia32_vprotqi(A,1)
 
-#pragma GCC target ("3dnow,sse4,sse4a,aes,pclmul,xop")
+/* lwpintrin.h */
+#define __builtin_ia32_lwpval32(D2, D1, F) __builtin_ia32_lwpval32 (D2, D1, 1)
+#define __builtin_ia32_lwpval64(D2, D1, F) __builtin_ia32_lwpval64 (D2, D1, 1)
+#define __builtin_ia32_lwpins32(D2, D1, F) __builtin_ia32_lwpins32 (D2, D1, 1)
+#define __builtin_ia32_lwpins64(D2, D1, F) __builtin_ia32_lwpins64 (D2, D1, 1)
+
+#pragma GCC target ("3dnow,sse4,sse4a,aes,pclmul,xop,abm,popcnt,lwp")
 #include <wmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>
+#include <x86intrin.h>
-- 
1.6.0.4



* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-12  9:27                       ` Sebastian Pop
@ 2009-12-14 16:35                         ` Richard Henderson
  2009-12-14 19:15                         ` H.J. Lu
  2009-12-14 20:14                         ` Uros Bizjak
  2 siblings, 0 replies; 32+ messages in thread
From: Richard Henderson @ 2009-12-14 16:35 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Jakub Jelinek, gcc-patches, Uros Bizjak

Ok.


r~


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-12  9:27                       ` Sebastian Pop
  2009-12-14 16:35                         ` Richard Henderson
@ 2009-12-14 19:15                         ` H.J. Lu
  2009-12-14 19:21                           ` Jakub Jelinek
  2009-12-14 20:14                         ` Uros Bizjak
  2 siblings, 1 reply; 32+ messages in thread
From: H.J. Lu @ 2009-12-14 19:15 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Jakub Jelinek, Richard Henderson, gcc-patches, Uros Bizjak

On Fri, Dec 11, 2009 at 11:24 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> Hi,
>
> On Fri, Dec 11, 2009 at 16:26, Sebastian Pop <sebpop@gmail.com> wrote:
>> On Fri, Dec 11, 2009 at 15:34, Jakub Jelinek <jakub@redhat.com> wrote:
>>> If you are ok with these, can you combine the 3 patches posted today,
>>> write ChangeLog, test it and submit?
>>>
>>
>> Yes, I will do this.
>
> Here is the patch that fixes the support for LWP.
>
> 2009-12-11  Jakub Jelinek  <jakub@redhat.com>
>            Sebastian Pop  <sebastian.pop@amd.com>
>
>        * config/i386/i386-builtin-types.def (PVOID): Declared.
>        (VOID_FTYPE_PVOID): Declared.
>        (PVOID_FTYPE_VOID): Declared.
>        (UCHAR_FTYPE_USHORT_UINT_USHORT): Removed.
>        (VOID_FTYPE_USHORT_UINT_USHORT): Removed.
>        * config/i386/i386.c (IX86_BUILTIN_LLWPCB16, IX86_BUILTIN_LLWPCB32,
>        IX86_BUILTIN_LLWPCB64, IX86_BUILTIN_SLWPCB16, IX86_BUILTIN_SLWPCB32,
>        IX86_BUILTIN_SLWPCB64, IX86_BUILTIN_LWPVAL16, IX86_BUILTIN_LWPINS16):
>        Removed.
>        (IX86_BUILTIN_LLWPCB, IX86_BUILTIN_SLWPCB): New.
>        (bdesc_special_args): Adjust declaration of __builtin_ia32_llwpcb,
>        __builtin_ia32_slwpcb, __builtin_ia32_lwpval32,
>        __builtin_ia32_lwpval64, __builtin_ia32_lwpins32, and
>        __builtin_ia32_lwpins64.
>        (ix86_expand_special_args_builtin): Handle VOID_FTYPE_PVOID.
>        Do not handle VOID_FTYPE_USHORT_UINT_USHORT and
>        UCHAR_FTYPE_USHORT_UINT_USHORT.  Warn when the third operand is
>        not an immediate.  Also handle builtin functions with 3 arguments.
>        (ix86_expand_builtin): Handle IX86_BUILTIN_LLWPCB and
>        IX86_BUILTIN_SLWPCB.
>        * config/i386/i386.md (UNSPEC_LLWP_INTRINSIC, UNSPEC_SLWP_INTRINSIC):
>        Renamed UNSPECV_LLWP_INTRINSIC and UNSPECV_SLWP_INTRINSIC.
>        (memory attribute): Handle lwp.
>        (lwp*): Rewrite all the insn patterns for LWP.
>        * config/i386/lwpintrin.h (__llwpcb16, __llwpcb32, __llwpcb64,
>        __slwpcb16, __slwpcb32, __slwpcb64, __lwpval16, __lwpins16): Removed.
>        (__llwpcb, __slwpcb): New.
>
>        testsuite/
>        * gcc.target/i386/sse-12.c: Add -mpopcnt and -mlwp.
>        * gcc.target/i386/sse-13.c: Same.
>        (__builtin_ia32_lwpval32, __builtin_ia32_lwpval64,
>        __builtin_ia32_lwpins32, __builtin_ia32_lwpins64): Added testcases.
>        * gcc.target/i386/sse-14.c: Add -mpopcnt -mabm -mlwp.
>        Added tests for __lwpval32, __lwpins32, __lwpval64, and __lwpins64.
>        * gcc.target/i386/sse-22.c: Added tests for popcnt, abm, and lwp.
>        * gcc.target/i386/sse-23.c: Same.
>

This patch breaks gcc bootstrap on Linux/ia32:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42369

The problem is

;; Single word integer modes without QImode and HImode.
(define_mode_iterator SWI48 [SI (DI "TARGET_64BIT")])

with

              case CODE_FOR_lwp_lwpvalsi3:
              case CODE_FOR_lwp_lwpvaldi3:
              case CODE_FOR_lwp_lwpinssi3:
              case CODE_FOR_lwp_lwpinsdi3:


Both CODE_FOR_lwp_lwpvaldi3 and CODE_FOR_lwp_lwpinsdi3
are CODE_FOR_nothing for ia32.

I don't think you can use (DI "TARGET_64BIT") in define_mode_iterator.


-- 
H.J.


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-14 19:15                         ` H.J. Lu
@ 2009-12-14 19:21                           ` Jakub Jelinek
  2009-12-14 19:38                             ` Richard Henderson
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2009-12-14 19:21 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Sebastian Pop, Richard Henderson, gcc-patches, Uros Bizjak

On Mon, Dec 14, 2009 at 11:00:05AM -0800, H.J. Lu wrote:
> This patch breaks gcc bootstrap on Linux/ia32:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42369
> 
> The problem is
> 
> ;; Single word integer modes without QImode and HImode.
> (define_mode_iterator SWI48 [SI (DI "TARGET_64BIT")])
> 
> with
> 
>               case CODE_FOR_lwp_lwpvalsi3:
>               case CODE_FOR_lwp_lwpvaldi3:
>               case CODE_FOR_lwp_lwpinssi3:
>               case CODE_FOR_lwp_lwpinsdi3:
> 
> 
> Both CODE_FOR_lwp_lwpvaldi3 and CODE_FOR_lwp_lwpinsdi3
> are CODE_FOR_nothing for ia32.
> 
> I don't think you can use (DI "TARGET_64BIT") in define_mode_iterator.

That's of course fine; what is not fine is using those CODE_FOR_*
values in case labels when they aren't guaranteed to differ from
CODE_FOR_nothing.
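
To illustrate the constraint in isolation, here is a minimal standalone
sketch. All SKETCH_* names are hypothetical stand-ins, not GCC's generated
insn-codes.h, and the 32-bit-only build is modelled by leaving
SKETCH_HAVE_64BIT undefined. It shows why two codes that collapse to the
same value cannot both be case labels, and how a guarded comparison in the
default branch avoids that:

```c
/* Standalone sketch; every SKETCH_* name is a hypothetical stand-in,
   not part of GCC.  */
enum sketch_code
{
  SKETCH_NOTHING = 0,
  SKETCH_LWPVALSI,
  SKETCH_LWPINSSI,
#ifdef SKETCH_HAVE_64BIT
  SKETCH_LWPVALDI,
  SKETCH_LWPINSDI,
#endif
  SKETCH_OTHER
};

#ifdef SKETCH_HAVE_64BIT
#define SKETCH_TARGET_64BIT 1
#else
/* In a 32-bit-only build the DI patterns do not exist, so both names
   collapse to the same "nothing" value.  */
#define SKETCH_TARGET_64BIT 0
#define SKETCH_LWPVALDI SKETCH_NOTHING
#define SKETCH_LWPINSDI SKETCH_NOTHING
#endif

/* Return the immediate width, in bits, that the diagnostic should name.  */
static int
sketch_imm_bits (enum sketch_code icode)
{
  switch (icode)
    {
    case SKETCH_LWPVALSI:
    case SKETCH_LWPINSSI:
      return 32;

    default:
      /* Writing "case SKETCH_LWPVALDI: case SKETCH_LWPINSDI:" above
	 would be a duplicate case value when both expand to
	 SKETCH_NOTHING, so compare them here instead.  */
      if (SKETCH_TARGET_64BIT
	  && (icode == SKETCH_LWPVALDI || icode == SKETCH_LWPINSDI))
	return 32;
      return 8;
    }
}
```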

This works for me:

2009-12-14  Jakub Jelinek  <jakub@redhat.com>

	PR bootstrap/42369
	* config/i386/i386.c (ix86_expand_special_args_builtin): Don't use
	CODE_FOR_lwp_lwp{val,ins}di3 in case labels.

--- gcc/config/i386/i386.c.jj	2009-12-14 17:50:14.000000000 +0100
+++ gcc/config/i386/i386.c	2009-12-14 20:04:48.000000000 +0100
@@ -23839,13 +23839,21 @@ ix86_expand_special_args_builtin (const 
 	    switch (icode)
 	      {
 	      case CODE_FOR_lwp_lwpvalsi3:
-	      case CODE_FOR_lwp_lwpvaldi3:
 	      case CODE_FOR_lwp_lwpinssi3:
-	      case CODE_FOR_lwp_lwpinsdi3:
 		error ("the last argument must be a 32-bit immediate");
 		return const0_rtx;
 
 	      default:
+		/* These 2 codes can't use case labels, as
+		   in 32-bit only target builds they are both
+		   CODE_FOR_nothing.  */
+		if (TARGET_64BIT
+		    && (icode == CODE_FOR_lwp_lwpvaldi3
+			|| icode == CODE_FOR_lwp_lwpinsdi3))
+		  {
+		    error ("the last argument must be a 32-bit immediate");
+		    return const0_rtx;
+		  }
 		error ("the last argument must be an 8-bit immediate");
 		return const0_rtx;
 	      }


	Jakub


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-14 19:21                           ` Jakub Jelinek
@ 2009-12-14 19:38                             ` Richard Henderson
  2009-12-14 20:15                               ` Jakub Jelinek
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Henderson @ 2009-12-14 19:38 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, Sebastian Pop, gcc-patches, Uros Bizjak

On 12/14/2009 11:08 AM, Jakub Jelinek wrote:
> +		/* These 2 codes can't use case labels, as
> +		   in 32-bit only target builds they are both
> +		   CODE_FOR_nothing.  */
> +		if (TARGET_64BIT
> +		&&  (icode == CODE_FOR_lwp_lwpvaldi3
> +			|| icode == CODE_FOR_lwp_lwpinsdi3))

I wonder if it just wouldn't be easier to use an IF here.
You don't need the TARGET_64BIT since we know icode != CODE_FOR_nothing.
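
A rough sketch of that shape, assuming hypothetical SKETCH_* stand-ins
rather than the real CODE_FOR_* values and modelling a 32-bit-only build
where the DI codes alias the "nothing" value. Since the caller never passes
the "nothing" code, the aliased comparisons are simply false and no target
guard is needed:

```c
#include <stdbool.h>

/* Hypothetical stand-ins modelling a 32-bit-only build; not GCC code.  */
enum sketch_code
{
  SKETCH_NOTHING = 0,
  SKETCH_LWPVALSI,
  SKETCH_LWPINSSI,
  SKETCH_OTHER
};

/* The DI variants do not exist in this build, so they alias "nothing".  */
#define SKETCH_LWPVALDI SKETCH_NOTHING
#define SKETCH_LWPINSDI SKETCH_NOTHING

/* Precondition: icode != SKETCH_NOTHING.  Under that assumption the
   comparisons against the aliased DI codes can never succeed, so a plain
   chain of comparisons is enough and duplicate values are harmless.  */
static bool
sketch_wants_32bit_imm (enum sketch_code icode)
{
  return (icode == SKETCH_LWPVALSI
	  || icode == SKETCH_LWPINSSI
	  || icode == SKETCH_LWPVALDI
	  || icode == SKETCH_LWPINSDI);
}
```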


r~


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-12  9:27                       ` Sebastian Pop
  2009-12-14 16:35                         ` Richard Henderson
  2009-12-14 19:15                         ` H.J. Lu
@ 2009-12-14 20:14                         ` Uros Bizjak
  2009-12-14 20:38                           ` Jakub Jelinek
  2 siblings, 1 reply; 32+ messages in thread
From: Uros Bizjak @ 2009-12-14 20:14 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Jakub Jelinek, Richard Henderson, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3103 bytes --]

On 12/12/2009 08:24 AM, Sebastian Pop wrote:
>
>>> If you are ok with these, can you combine the 3 patches posted today,
>>> write ChangeLog, test it and submit?
>>>
>>>        
>> Yes, I will do this.
>>      
> Here is the patch that fixes the support for LWP.
>
> 2009-12-11  Jakub Jelinek<jakub@redhat.com>
> 	Sebastian Pop<sebastian.pop@amd.com>
>    
...
>
> 	testsuite/
> 	* gcc.target/i386/sse-12.c: Add -mpopcnt and -mlwp.
> 	* gcc.target/i386/sse-13.c: Same.
> 	(__builtin_ia32_lwpval32, __builtin_ia32_lwpval64,
> 	__builtin_ia32_lwpins32, __builtin_ia32_lwpins64): Added testcases.
> 	* gcc.target/i386/sse-14.c: Add -mpopcnt -mabm -mlwp.
> 	Added tests for __lwpval32, __lwpins32, __lwpval64, and __lwpins64.
> 	* gcc.target/i386/sse-22.c: Added tests for popcnt, abm, and lwp.
> 	* gcc.target/i386/sse-23.c: Same.
>
> Passed bootstrap and test on amd64-linux.  Ok for trunk?
>    

You forgot to change/add intrinsics tests to g++.dg/other/sse-{2,3}.C,
as in the attached patch (sse-5.C and sse-6.C can be deleted, since all
headers play nicely with each other).

Strangely, with attached patch, I got a couple of errors:

In file included from 
/home/uros/gcc-build/gcc/testsuite/g++/../../include/x86intrin.h:77:0,
                  from 
/home/uros/gcc-svn/trunk/gcc/testsuite/g++.dg/other/i386-2.C:8:
<built-in>: In function 'void* __slwpcb()':
<built-in>:0:0: error: too few arguments to function 'void* 
__builtin_ia32_slwpcb(void)'
/home/uros/gcc-build/gcc/testsuite/g++/../../include/lwpintrin.h:44:33: 
error: at this point in file
compiler exited with status 1
output is:
In file included from 
/home/uros/gcc-build/gcc/testsuite/g++/../../include/x86intrin.h:77:0,
                  from 
/home/uros/gcc-svn/trunk/gcc/testsuite/g++.dg/other/i386-2.C:8:
<built-in>: In function 'void* __slwpcb()':
<built-in>:0:0: error: too few arguments to function 'void* 
__builtin_ia32_slwpcb(void)'
/home/uros/gcc-build/gcc/testsuite/g++/../../include/lwpintrin.h:44:33: 
error: at this point in file

FAIL: g++.dg/other/i386-2.C (test for excess errors)

and:

In file included from 
/home/uros/gcc-build/gcc/testsuite/g++/../../include/x86intrin.h:77:0,
                  from 
/home/uros/gcc-svn/trunk/gcc/testsuite/g++.dg/other/i386-3.C:8:
<built-in>: In function 'void* __slwpcb()':
<built-in>:0:0: error: too few arguments to function 'void* 
__builtin_ia32_slwpcb(void)'
/home/uros/gcc-build/gcc/testsuite/g++/../../include/lwpintrin.h:44:33: 
error: at this point in file
compiler exited with status 1
output is:
In file included from 
/home/uros/gcc-build/gcc/testsuite/g++/../../include/x86intrin.h:77:0,
                  from 
/home/uros/gcc-svn/trunk/gcc/testsuite/g++.dg/other/i386-3.C:8:
<built-in>: In function 'void* __slwpcb()':
<built-in>:0:0: error: too few arguments to function 'void* 
__builtin_ia32_slwpcb(void)'
/home/uros/gcc-build/gcc/testsuite/g++/../../include/lwpintrin.h:44:33: 
error: at this point in file

FAIL: g++.dg/other/i386-3.C (test for excess errors)

Uros.
> Thanks,
> Sebastian Pop
> --
> AMD / Open Source Compiler Engineering / GNU Tools
>    


[-- Attachment #2: s.diff.txt --]
[-- Type: text/plain, Size: 2726 bytes --]

Index: g++.dg/other/i386-2.C
===================================================================
--- g++.dg/other/i386-2.C	(revision 155226)
+++ g++.dg/other/i386-2.C	(working copy)
@@ -1,8 +1,10 @@
-/* Test that {,x,e,p,t,s,w,a,i}mmintrin.h, fma4intrin.h, xopintrin.h, mm3dnow.h and
-   mm_malloc.h are usable with -O -pedantic-errors.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -mavx -msse4a -mfma4 -mxop -maes -mpclmul" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */
 
+/* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, xopintrin.h, abmintrin.h,
+   lwpintrin.h, popcntintrin.h and mm3dnow.h are usable with
+   -O -pedantic-errors.  */
+
 #include <x86intrin.h>
 
 int dummy;
Index: g++.dg/other/i386-6.C
===================================================================
--- g++.dg/other/i386-6.C	(revision 155226)
+++ g++.dg/other/i386-6.C	(working copy)
@@ -1,8 +0,0 @@
-/* Test that {,x,e,p,t,s,w,a,i}mmintrin.h, fma4intrin.h, xopintrin.h, mm3dnow.h and
-   mm_malloc.h are usable with -O -pedantic-errors.  */
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -mavx -msse4a -mfma4 -mxop -maes -mpclmul" } */
-
-#include <x86intrin.h>
-
-int dummy;
Index: g++.dg/other/i386-3.C
===================================================================
--- g++.dg/other/i386-3.C	(revision 155226)
+++ g++.dg/other/i386-3.C	(working copy)
@@ -1,6 +1,8 @@
-/* Test that {,x,e,p,t,s,w,a,i}mmintrin.h, fma4intrin.h, mm3dnow.h, xopintrin.h and
-    mm_malloc.h are usable with -O -fkeep-inline-functions.  */
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -mavx -msse4a -mfma4 -mxop -maes -mpclmul" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */
 
+/* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, xopintrin.h, abmintrin.h,
+   lwpintrin.h, popcntintrin.h and mm3dnow.h are usable with
+   -O -fkeep-inline-functions.  */
+
 #include <x86intrin.h>
Index: g++.dg/other/i386-5.C
===================================================================
--- g++.dg/other/i386-5.C	(revision 155226)
+++ g++.dg/other/i386-5.C	(working copy)
@@ -1,6 +0,0 @@
-/* Test that {,x,e,p,t,s,w,a,i}mmintrin.h, fma4intrin.h, xopintrin.h, mm3dnow.h and
-   mm_malloc.h are usable with -O -fkeep-inline-functions.  */
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -mavx -msse4a -mfma4 -mxop -maes -mpclmul" } */
-
-#include <x86intrin.h>


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-14 19:38                             ` Richard Henderson
@ 2009-12-14 20:15                               ` Jakub Jelinek
  0 siblings, 0 replies; 32+ messages in thread
From: Jakub Jelinek @ 2009-12-14 20:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: H.J. Lu, Sebastian Pop, gcc-patches, Uros Bizjak

On Mon, Dec 14, 2009 at 11:12:01AM -0800, Richard Henderson wrote:
> On 12/14/2009 11:08 AM, Jakub Jelinek wrote:
> >+		/* These 2 codes can't use case labels, as
> >+		   in 32-bit only target builds they are both
> >+		   CODE_FOR_nothing.  */
> >+		if (TARGET_64BIT
> >+		&&  (icode == CODE_FOR_lwp_lwpvaldi3
> >+			|| icode == CODE_FOR_lwp_lwpinsdi3))
> 
> I wonder if it just wouldn't be easier to use an IF here.
> You don't need the TARGET_64BIT since we know icode != CODE_FOR_nothing.

This works of course too.  I used the switch just because there was one
already (with just default: label and no other cases).

Whatever you prefer...
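
To illustrate the point about case labels, here is a minimal toy sketch in
plain C.  All TOY_* names are hypothetical and are not GCC identifiers; the
#define lines are only an assumed way of modelling "both codes are
CODE_FOR_nothing" on a 32-bit-only build.  With a switch, the two DImode
labels would have the same value and the compiler would reject them as
duplicate cases; an if chain has no such restriction.

```c
/* Toy model: all TOY_* names are hypothetical, not GCC identifiers.  */
enum toy_icode
{
  TOY_CODE_FOR_nothing,
  TOY_CODE_FOR_other,
  TOY_CODE_FOR_lwpvalsi3,
  TOY_CODE_FOR_lwpinssi3
#ifdef TOY_HAVE_64BIT
  , TOY_CODE_FOR_lwpvaldi3
  , TOY_CODE_FOR_lwpinsdi3
#endif
};

#ifndef TOY_HAVE_64BIT
/* Model a 32-bit-only build: both DImode codes collapse to the same
   value, so "case" labels for them would be duplicates.  */
#define TOY_CODE_FOR_lwpvaldi3 TOY_CODE_FOR_nothing
#define TOY_CODE_FOR_lwpinsdi3 TOY_CODE_FOR_nothing
#endif

/* Return the immediate width the diagnostic should mention.  The caller
   is assumed never to pass TOY_CODE_FOR_nothing, which is why no
   separate 64-bit check is needed.  */
static int
toy_last_arg_bits (enum toy_icode icode)
{
  if (icode == TOY_CODE_FOR_lwpvalsi3
      || icode == TOY_CODE_FOR_lwpinssi3
      || icode == TOY_CODE_FOR_lwpvaldi3
      || icode == TOY_CODE_FOR_lwpinsdi3)
    return 32;
  return 8;
}
```

Calling toy_last_arg_bits (TOY_CODE_FOR_lwpvalsi3) selects the 32-bit
message; any unrelated code falls through to the 8-bit one.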

2009-12-14  Jakub Jelinek  <jakub@redhat.com>

	PR bootstrap/42369
	* config/i386/i386.c (ix86_expand_special_args_builtin): Avoid
	using switch with CODE_FOR_lwp_lwp* cases.

--- gcc/config/i386/i386.c.jj	2009-12-14 17:50:14.000000000 +0100
+++ gcc/config/i386/i386.c	2009-12-14 20:34:03.000000000 +0100
@@ -23836,19 +23836,16 @@ ix86_expand_special_args_builtin (const 
       if (last_arg_constant && (i + 1) == nargs)
 	{
 	  if (!match)
-	    switch (icode)
-	      {
-	      case CODE_FOR_lwp_lwpvalsi3:
-	      case CODE_FOR_lwp_lwpvaldi3:
-	      case CODE_FOR_lwp_lwpinssi3:
-	      case CODE_FOR_lwp_lwpinsdi3:
+	    {
+	      if (icode == CODE_FOR_lwp_lwpvalsi3
+		  || icode == CODE_FOR_lwp_lwpinssi3
+		  || icode == CODE_FOR_lwp_lwpvaldi3
+		  || icode == CODE_FOR_lwp_lwpinsdi3)
 		error ("the last argument must be a 32-bit immediate");
-		return const0_rtx;
-
-	      default:
+	      else
 		error ("the last argument must be an 8-bit immediate");
-		return const0_rtx;
-	      }
+	      return const0_rtx;
+	    }
 	}
       else
 	{


	Jakub


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-14 20:14                         ` Uros Bizjak
@ 2009-12-14 20:38                           ` Jakub Jelinek
  2009-12-14 20:52                             ` Uros Bizjak
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2009-12-14 20:38 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Sebastian Pop, Richard Henderson, gcc-patches

On Mon, Dec 14, 2009 at 08:38:20PM +0100, Uros Bizjak wrote:
> Strangely, with attached patch, I got a couple of errors:
> 
> In file included from 
> /home/uros/gcc-build/gcc/testsuite/g++/../../include/x86intrin.h:77:0,
>                  from 
> /home/uros/gcc-svn/trunk/gcc/testsuite/g++.dg/other/i386-2.C:8:
> <built-in>: In function 'void* __slwpcb()':
> <built-in>:0:0: error: too few arguments to function 'void* 
> __builtin_ia32_slwpcb(void)'
> /home/uros/gcc-build/gcc/testsuite/g++/../../include/lwpintrin.h:44:33: 
> error: at this point in file

Here is a fix for that.  Committing as obvious...

2009-12-14  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/i386-builtin-types.def (PVOID_FTYPE_VOID): Use
	DEF_FUNCTION_TYPE (PVOID) instead of DEF_FUNCTION_TYPE (PVOID, VOID).
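
As a rough illustration of why the extra VOID matters, here is a toy
sketch in plain C.  The toy_* and TOY_* names are hypothetical; the only
thing assumed about the real .def file is what the ChangeLog entry above
implies, namely that the first macro argument is the return type and any
further arguments are parameter types.  Under that reading, listing VOID
after PVOID describes a function taking one argument, which matches the
"too few arguments" error from the C++ front end.

```c
/* Toy model: all toy_* / TOY_* names are hypothetical.  */
enum toy_type { TOY_VOID, TOY_PVOID };

struct toy_ftype
{
  enum toy_type ret;  /* return type: the first macro argument */
  int nargs;          /* number of argument types that follow it */
};

/* Build a descriptor from a list whose first element is the return
   type; everything after it counts as a parameter.  */
static struct toy_ftype
toy_make_ftype (const enum toy_type *types, int count)
{
  struct toy_ftype f = { types[0], count - 1 };
  return f;
}

#define TOY_FUNCTION_TYPE(...)                                            \
  toy_make_ftype ((const enum toy_type[]) { __VA_ARGS__ },                \
                  (int) (sizeof ((const enum toy_type[]) { __VA_ARGS__ }) \
                         / sizeof (enum toy_type)))
```

With this model, TOY_FUNCTION_TYPE (TOY_PVOID) yields nargs == 0, while
TOY_FUNCTION_TYPE (TOY_PVOID, TOY_VOID) yields nargs == 1.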

--- gcc/config/i386/i386-builtin-types.def.jj	2009-12-14 17:50:14.000000000 +0100
+++ gcc/config/i386/i386-builtin-types.def	2009-12-14 20:48:07.000000000 +0100
@@ -129,6 +129,7 @@ DEF_FUNCTION_TYPE (FLOAT128)
 DEF_FUNCTION_TYPE (UINT64)
 DEF_FUNCTION_TYPE (UNSIGNED)
 DEF_FUNCTION_TYPE (VOID)
+DEF_FUNCTION_TYPE (PVOID)
 
 DEF_FUNCTION_TYPE (FLOAT, FLOAT)
 DEF_FUNCTION_TYPE (FLOAT128, FLOAT128)
@@ -197,7 +198,6 @@ DEF_FUNCTION_TYPE (V8SI, V4SI)
 DEF_FUNCTION_TYPE (V8SI, V8SF)
 DEF_FUNCTION_TYPE (VOID, PCVOID)
 DEF_FUNCTION_TYPE (VOID, PVOID)
-DEF_FUNCTION_TYPE (PVOID, VOID)
 DEF_FUNCTION_TYPE (VOID, UNSIGNED)
 
 DEF_FUNCTION_TYPE (DI, V2DI, INT)


	Jakub


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-12-14 20:38                           ` Jakub Jelinek
@ 2009-12-14 20:52                             ` Uros Bizjak
  0 siblings, 0 replies; 32+ messages in thread
From: Uros Bizjak @ 2009-12-14 20:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Sebastian Pop, Richard Henderson, gcc-patches

On 12/14/2009 08:51 PM, Jakub Jelinek wrote:
> On Mon, Dec 14, 2009 at 08:38:20PM +0100, Uros Bizjak wrote:
>    
>> Strangely, with attached patch, I got a couple of errors:
>>
>> In file included from
>> /home/uros/gcc-build/gcc/testsuite/g++/../../include/x86intrin.h:77:0,
>>                   from
>> /home/uros/gcc-svn/trunk/gcc/testsuite/g++.dg/other/i386-2.C:8:
>> <built-in>: In function 'void* __slwpcb()':
>> <built-in>:0:0: error: too few arguments to function 'void*
>> __builtin_ia32_slwpcb(void)'
>> /home/uros/gcc-build/gcc/testsuite/g++/../../include/lwpintrin.h:44:33:
>> error: at this point in file
>>      
> Here is a fix for that.  Committing as obvious...
>    

Thanks, works for me.

I have committed my testsuite patch with the following ChangeLog:

2009-12-14  Uros Bizjak <ubizjak@gmail.com>

     * g++.dg/other/i386-2.C: Add -mpopcnt -mabm -mlwp to dg-options.
     * g++.dg/other/i386-3.C: Ditto.
     * g++.dg/other/i386-5.C: Remove duplicated test.
     * g++.dg/other/i386-6.C: Ditto.

The patch was tested on x86_64-pc-linux-gnu {,-m32}.

Uros.


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
@ 2009-10-09  2:12 Ross Ridge
  0 siblings, 0 replies; 32+ messages in thread
From: Ross Ridge @ 2009-10-09  2:12 UTC (permalink / raw)
  To: gcc-patches

Harsha Jagasia writes:
>+(define_insn "lwp_lwpinshi3"
>+  [(unspec_volatile [(match_operand:HI 0 "register_operand" "r")
>+  		     (match_operand:SI 1 "nonimmediate_operand" "rm")
>+		     (match_operand:HI 2 "const_int_operand" "")]
>+		    UNSPECV_LWPINS_INTRINSIC)]
>+  "TARGET_LWP"
>+  "lwpins\t{%2, %1, %0|%0, %1, %2}"
>+  [(set_attr "type" "lwp")
>+   (set_attr "mode" "HI")])
>+

The LWPINS instruction is documented as setting the carry flag (CF),
and I think this value is supposed to be returned to the caller, given
the return type of the intrinsics.

					Ross Ridge


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-10-01  7:51 Uros Bizjak
@ 2009-10-01 10:09 ` Jan Hubicka
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Hubicka @ 2009-10-01 10:09 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: gcc-patches, Harsha Jagasia, hubicka, rth, dwarak.rajagopal,
	christophe.harle

> 
> Then these instructions should be defined as unspec_volatile. OTOH,
> perhaps you should introduce a new fixed register to hold the LWP state
> and change all instructions to correctly depend on this register. Since
> the LWP state won't be hidden from the compiler, you can define "normal"
> insn patterns using "set". This will also benefit the scheduler and will
> increase general happiness of the compiler ;)

Well, modeling LWP as a register would require adding an explicit USE
pattern to every instruction whose behaviour depends on the LWP state.
From a quick glance at the specs, that seems to be nearly every
instruction.

I guess LWP should act as a full scheduling barrier (so that code we want
to profile is not moved before profiling starts or after it finishes), so
unspec_volatile is the preferred variant.

Honza


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
@ 2009-10-01  7:51 Uros Bizjak
  2009-10-01 10:09 ` Jan Hubicka
  0 siblings, 1 reply; 32+ messages in thread
From: Uros Bizjak @ 2009-10-01  7:51 UTC (permalink / raw)
  To: gcc-patches
  Cc: Harsha Jagasia, hubicka, rth, dwarak.rajagopal, christophe.harle

Hello!

> This patch is for LWP instruction set support for gcc 4.5 for the
> upcoming AMD Orochi processor. Please see the AMD spec for the LWP
> ISA at http://support.amd.com/us/Processor_TechDocs/43724.pdf
>
> One of the issues I am hoping the maintainers can give guidance on:
>
> - Currently the code for the lwpval and lwpins instructions is
> commented out. These instructions are different from typical
> instructions in that they have no destination register
> (please see the spec). I am not sure how to represent the patterns
> for the same and would appreciate some input.

Then these instructions should be defined as unspec_volatile. OTOH,
perhaps you should introduce a new fixed register to hold the LWP state
and change all instructions to correctly depend on this register. Since
the LWP state won't be hidden from the compiler, you can define "normal"
insn patterns using "set". This will also benefit the scheduler and will
increase general happiness of the compiler ;)

You can just look at FPCR and FPSR handling in i386.md and their
definition in i386.h.

> 	* config/i386/sse.md (lwp_llwpcbhi1): New lwp pattern.
>	...

There is nothing SSE specific in these patterns, so I think they
should go in i386.md.

Uros.


* Re: PATCH: Add LWP support for upcoming AMD Orochi processor.
  2009-10-01  4:06 Harsha Jagasia
@ 2009-10-01  6:30 ` Jakub Jelinek
  0 siblings, 0 replies; 32+ messages in thread
From: Jakub Jelinek @ 2009-10-01  6:30 UTC (permalink / raw)
  To: Harsha Jagasia
  Cc: gcc-patches, hubicka, rth, dwarak.rajagopal, christophe.harle

On Wed, Sep 30, 2009 at 11:06:23PM -0500, Harsha Jagasia wrote:
> - Currently the code for the lwpval and lwpins instructions is
> commented out. These instructions are different from typical
> instructions in that they have no destination register
> (please see the spec). I am not sure how to represent the patterns
> for the same and would appreciate some input.

> +;;(define_insn "lwp_lwpvalhi3"
> +;;  [(unspec [(match_operand:HI 0 "register_operand" "r")
> +;;  	   (match_operand:SI 1 "nonimmediate_operand" "rm")
> +;;	   (match_operand:HI 2 "const_int_operand" "")]
> +;;  	   UNSPEC_LWPVAL_INTRINSIC)]
> +;;  "TARGET_LWP"
> +;;  "lwpval\t{%2, %1, %0|%0, %1, %2}"
> +;;  [(set_attr "type" "lwp")
> +;;   (set_attr "mode" "HI")])

I think the easiest would be to use (unspec_volatile ... UNSPECV_LWPVAL...)
instead.  Otherwise the insn that doesn't set any register may be eliminated
as unneeded.

	Jakub


* PATCH: Add LWP support for upcoming AMD Orochi processor.
@ 2009-10-01  4:06 Harsha Jagasia
  2009-10-01  6:30 ` Jakub Jelinek
  0 siblings, 1 reply; 32+ messages in thread
From: Harsha Jagasia @ 2009-10-01  4:06 UTC (permalink / raw)
  To: Harsha Jagasia, gcc-patches, hubicka, rth, dwarak.rajagopal,
	christophe.harle
  Cc: Harsha Jagasia

This patch is for LWP instruction set support for gcc 4.5 for the
upcoming AMD Orochi processor. Please see the AMD spec for the LWP
ISA at http://support.amd.com/us/Processor_TechDocs/43724.pdf

We are still in the process of wrapping up the LWP binutils work
and expect it to be checked in during stage 3.

The attached patch is based on the latest trunk and bootstrap
and target tests pass. A full make check is still running.
I will update the list with the results of make check, but I
wanted to send the patch out so that the reviewers can look at it.

One of the issues I am hoping the maintainers can give guidance
on:

- Currently the code for the lwpval and lwpins instructions is
commented out. These instructions are different from typical
instructions in that they have no destination register
(please see the spec). I am not sure how to represent the patterns
for the same and would appreciate some input.

Thanks in advance.


2009-09-29  Harsha Jagasia  <harsha.jagasia@amd.com>

	* doc/invoke.texi (-mlwp): Add documentation.
	* doc/extend.texi (x86 intrinsics): Add LWP intrinsics.

	* config.gcc (i[34567]86-*-*): Include lwpintrin.h.
	(x86_64-*-*): Ditto.
	* config/i386/lwpintrin.h: New file, provide x86 compiler
	intrinsics for LWP.
	* config/i386/cpuid.h (bit_LWP): Define LWP bit.
	* config/i386/x86intrin.h: Add LWP check and lwpintrin.h.
	* config/i386/i386-c.c (ix86_target_macros_internal): Check
	ISA_FLAG for LWP.
	* config/i386/i386.h (TARGET_LWP): New macro for LWP.
	* config/i386/i386.opt (-mlwp): New switch for LWP support.

	* config/i386/i386.c (OPTION_MASK_ISA_LWP_SET): New.
	(OPTION_MASK_ISA_LWP_UNSET): New.	
	(ix86_handle_option): Handle -mlwp.
	(isa_opts): Handle -mlwp.
	(enum pta_flags): Add PTA_LWP.
	(override_options): Add LWP support.

	(IX86_BUILTIN_LLWPCB16): New for LWP intrinsic.
	(IX86_BUILTIN_LLWPCB32): Ditto
	(IX86_BUILTIN_LLWPCB64): Ditto
	(IX86_BUILTIN_SLWPCB16): Ditto
	(IX86_BUILTIN_SLWPCB32): Ditto
	(IX86_BUILTIN_SLWPCB64): Ditto
	(IX86_BUILTIN_LWPVAL16): Ditto
	(IX86_BUILTIN_LWPVAL32): Ditto
	(IX86_BUILTIN_LWPVAL64): Ditto
	(IX86_BUILTIN_LWPINS16): Ditto
	(IX86_BUILTIN_LWPINS32): Ditto
	(IX86_BUILTIN_LWPINS64): Ditto

	(enum  ix86_builtin_type): Add LWP intrinsic support.
	(builtin_description): Ditto.
	(ix86_init_mmx_sse_builtins): Ditto.
	(ix86_expand_args_builtin): Ditto.

	* config/i386/i386.md (UNSPEC_LLWP_INTRINSIC):
	(UNSPEC_SLWP_INTRINSIC):
	(UNSPEC_LWPVAL_INTRINSIC):
	(UNSPEC_LWPINS_INTRINSIC): Add new UNSPEC for LWP support.

	* config/i386/sse.md (lwp_llwpcbhi1): New lwp pattern.
	(lwp_llwpcbsi1): Ditto.
	(lwp_llwpcbdi1): Ditto.
	(lwp_slwpcbhi1): Ditto.
	(lwp_slwpcbsi1): Ditto.
	(lwp_slwpcbdi1): Ditto.
	(lwp_lwpvalhi3): Ditto.
	(lwp_lwpvalsi3): Ditto.
	(lwp_lwpvaldi3): Ditto.
	(lwp_lwpinshi3): Ditto.
	(lwp_lwpinssi3): Ditto.
	(lwp_lwpinsdi3): Ditto.



diff -upNw gcc-xop-2/gcc/config.gcc gcc-lwp/gcc/config.gcc
--- gcc-xop-2/gcc/config.gcc	2009-09-30 14:12:36.000000000 -0500
+++ gcc-lwp/gcc/config.gcc	2009-09-30 16:33:28.000000000 -0500
@@ -288,7 +288,7 @@ i[34567]86-*-*)
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
 		       immintrin.h x86intrin.h avxintrin.h xopintrin.h
-		       ia32intrin.h cross-stdarg.h"
+		       ia32intrin.h cross-stdarg.h lwpintrin.h"
 	;;
 x86_64-*-*)
 	cpu_type=i386
@@ -298,7 +298,7 @@ x86_64-*-*)
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
 		       immintrin.h x86intrin.h avxintrin.h xopintrin.h 
-		       ia32intrin.h cross-stdarg.h"
+		       ia32intrin.h cross-stdarg.h lwpintrin.h"
 	need_64bit_hwint=yes
 	;;
 ia64-*-*)
diff -upNw gcc-xop-2/gcc/doc/extend.texi gcc-lwp/gcc/doc/extend.texi
--- gcc-xop-2/gcc/doc/extend.texi	2009-09-29 19:41:02.000000000 -0500
+++ gcc-lwp/gcc/doc/extend.texi	2009-09-30 16:33:28.000000000 -0500
@@ -3178,6 +3178,11 @@ Enable/disable the generation of the FMA
 @cindex @code{target("xop")} attribute
 Enable/disable the generation of the XOP instructions.
 
+@item lwp
+@itemx no-lwp
+@cindex @code{target("lwp")} attribute
+Enable/disable the generation of the LWP instructions.
+
 @item ssse3
 @itemx no-ssse3
 @cindex @code{target("ssse3")} attribute
@@ -9066,5 +9071,22 @@ v8sf __builtin_ia32_fmsubaddps256 (v8sf,
 
 @end smallexample
 
+The following built-in functions are available when @option{-mlwp} is used.
+
+@smallexample
+void __builtin_ia32_llwpcb16 (void *);
+void __builtin_ia32_llwpcb32 (void *);
+void __builtin_ia32_llwpcb64 (void *);
+void * __builtin_ia32_slwpcb16 (void);
+void * __builtin_ia32_slwpcb32 (void);
+void * __builtin_ia32_slwpcb64 (void);
+@c void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short)
+@c void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int)
+@c void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int)
+@c unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short)
+@c unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int)
+@c unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int)
+@end smallexample
+
 The following built-in functions are available when @option{-m3dnow} is used.
 All of them generate the machine instruction that is part of the name.

diff -upNw gcc-xop-2/gcc/doc/invoke.texi gcc-lwp/gcc/doc/invoke.texi
--- gcc-xop-2/gcc/doc/invoke.texi	2009-09-29 19:41:02.000000000 -0500
+++ gcc-lwp/gcc/doc/invoke.texi	2009-09-30 16:33:28.000000000 -0500
@@ -592,7 +592,7 @@ Objective-C and Objective-C++ Dialects}.
 -mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
 -maes -mpclmul @gol
--msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop @gol
+-msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop -mlwp @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -11731,6 +11731,8 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-fma4
 @itemx -mxop
 @itemx -mno-xop
+@itemx -mlwp
+@itemx -mno-lwp
 @itemx -m3dnow
 @itemx -mno-3dnow
 @itemx -mpopcnt
@@ -11745,7 +11747,7 @@ preferred alignment to @option{-mpreferr
 @opindex mno-3dnow
 These switches enable or disable the use of instructions in the MMX,
 SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, SSE4A, FMA4, XOP,
-ABM or 3DNow!@: extended instruction sets.
+LWP, ABM or 3DNow!@: extended instruction sets.
 These extensions are also available as built-in functions: see
 @ref{X86 Built-in Functions}, for details of the functions enabled and
 disabled by these switches.
diff -upNw gcc-xop-2/gcc/config/i386/cpuid.h gcc-lwp/gcc/config/i386/cpuid.h
--- gcc-xop-2/gcc/config/i386/cpuid.h	2009-09-29 19:41:02.000000000 -0500
+++ gcc-lwp/gcc/config/i386/cpuid.h	2009-09-30 16:33:28.000000000 -0500
@@ -49,6 +49,7 @@
 #define bit_LAHF_LM	(1 << 0)
 #define bit_SSE4a	(1 << 6)
 #define bit_FMA4	(1 << 16)
+#define bit_LWP 	(1 << 15)
 #define bit_XOP         (1 << 11)
 
 /* %edx */
diff -upNw gcc-xop-2/gcc/config/i386/i386.c gcc-lwp/gcc/config/i386/i386.c
--- gcc-xop-2/gcc/config/i386/i386.c	2009-09-29 19:41:03.000000000 -0500
+++ gcc-lwp/gcc/config/i386/i386.c	2009-09-30 16:33:28.000000000 -0500
@@ -1960,6 +1960,8 @@ static int ix86_isa_flags_explicit;
    | OPTION_MASK_ISA_AVX_SET)
 #define OPTION_MASK_ISA_XOP_SET \
   (OPTION_MASK_ISA_XOP | OPTION_MASK_ISA_FMA4_SET)
+#define OPTION_MASK_ISA_LWP_SET \
+  OPTION_MASK_ISA_LWP
 
 /* AES and PCLMUL need SSE2 because they use xmm registers */
 #define OPTION_MASK_ISA_AES_SET \
@@ -2014,6 +2016,7 @@ static int ix86_isa_flags_explicit;
 #define OPTION_MASK_ISA_FMA4_UNSET \
   (OPTION_MASK_ISA_FMA4 | OPTION_MASK_ISA_XOP_UNSET)
 #define OPTION_MASK_ISA_XOP_UNSET OPTION_MASK_ISA_XOP
+#define OPTION_MASK_ISA_LWP_UNSET OPTION_MASK_ISA_LWP
 
 #define OPTION_MASK_ISA_AES_UNSET OPTION_MASK_ISA_AES
 #define OPTION_MASK_ISA_PCLMUL_UNSET OPTION_MASK_ISA_PCLMUL
@@ -2274,6 +2277,19 @@ ix86_handle_option (size_t code, const c
 	}
       return true;
 
+    case OPT_mlwp:
+      if (value)
+	{
+	  ix86_isa_flags |= OPTION_MASK_ISA_LWP_SET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_LWP_SET;
+	}
+      else
+	{
+	  ix86_isa_flags &= ~OPTION_MASK_ISA_LWP_UNSET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_LWP_UNSET;
+	}
+      return true;
+
     case OPT_mabm:
       if (value)
 	{
@@ -2403,6 +2419,7 @@ ix86_target_string (int isa, int flags, 
     { "-m64",		OPTION_MASK_ISA_64BIT },
     { "-mfma4",		OPTION_MASK_ISA_FMA4 },
     { "-mxop",		OPTION_MASK_ISA_XOP },
+    { "-mlwp",		OPTION_MASK_ISA_LWP },
     { "-msse4a",	OPTION_MASK_ISA_SSE4A },
     { "-msse4.2",	OPTION_MASK_ISA_SSE4_2 },
     { "-msse4.1",	OPTION_MASK_ISA_SSE4_1 },
@@ -2634,7 +2651,8 @@ override_options (bool main_args_p)
       PTA_FMA = 1 << 19,
       PTA_MOVBE = 1 << 20,
       PTA_FMA4 = 1 << 21,
-      PTA_XOP = 1 << 22
+      PTA_XOP = 1 << 22,
+      PTA_LWP = 1 << 23
     };
 
   static struct pta
@@ -2983,6 +3001,9 @@ override_options (bool main_args_p)
 	if (processor_alias_table[i].flags & PTA_XOP
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_XOP))
 	  ix86_isa_flags |= OPTION_MASK_ISA_XOP;
+	if (processor_alias_table[i].flags & PTA_LWP
+	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_LWP))
+	  ix86_isa_flags |= OPTION_MASK_ISA_LWP;
 	if (processor_alias_table[i].flags & PTA_ABM
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_ABM))
 	  ix86_isa_flags |= OPTION_MASK_ISA_ABM;
@@ -3668,6 +3689,7 @@ ix86_valid_target_attribute_inner_p (tre
     IX86_ATTR_ISA ("ssse3",	OPT_mssse3),
     IX86_ATTR_ISA ("fma4",	OPT_mfma4),
     IX86_ATTR_ISA ("xop",	OPT_mxop),
+    IX86_ATTR_ISA ("lwp",	OPT_mlwp),
 
     /* string options */
     IX86_ATTR_STR ("arch=",	IX86_FUNCTION_SPECIFIC_ARCH),
@@ -20987,7 +21009,7 @@ enum ix86_builtins
 
   IX86_BUILTIN_CVTUDQ2PS,
 
-  /* FMA4 instructions.  */
+  /* FMA4 and XOP instructions.  */
   IX86_BUILTIN_VFMADDSS,
   IX86_BUILTIN_VFMADDSD,
   IX86_BUILTIN_VFMADDPS,
@@ -21164,6 +21186,23 @@ enum ix86_builtins
   IX86_BUILTIN_VPCOMFALSEQ,
   IX86_BUILTIN_VPCOMTRUEQ,
 
+  /* LWP instructions.  */
+  IX86_BUILTIN_LLWPCB16,
+  IX86_BUILTIN_LLWPCB32,
+  IX86_BUILTIN_LLWPCB64,
+  IX86_BUILTIN_SLWPCB16,
+  IX86_BUILTIN_SLWPCB32,
+  IX86_BUILTIN_SLWPCB64,
+
+  /*
+  IX86_BUILTIN_LWPVAL16,
+  IX86_BUILTIN_LWPVAL32,
+  IX86_BUILTIN_LWPVAL64,
+  IX86_BUILTIN_LWPINS16,
+  IX86_BUILTIN_LWPINS32,
+  IX86_BUILTIN_LWPINS64,
+  */
+
   IX86_BUILTIN_MAX
 };
 
@@ -21540,7 +21579,13 @@ enum ix86_builtin_type
   V1DI2DI_FTYPE_V1DI_V1DI_INT,
   V2DF_FTYPE_V2DF_V2DF_INT,
   V2DI_FTYPE_V2DI_UINT_UINT,
-  V2DI_FTYPE_V2DI_V2DI_UINT_UINT
+  V2DI_FTYPE_V2DI_V2DI_UINT_UINT,
+  VOID_FTYPE_USHORT_UINT_USHORT,
+  VOID_FTYPE_UINT_UINT_UINT,
+  VOID_FTYPE_UINT64_UINT_UINT,
+  UCHAR_FTYPE_USHORT_UINT_USHORT,
+  UCHAR_FTYPE_UINT_UINT_UINT,
+  UCHAR_FTYPE_UINT64_UINT_UINT
 };
 
 /* Special builtins with variable number of arguments.  */
@@ -22237,7 +22282,7 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_movmskps256, "__builtin_ia32_movmskps256", IX86_BUILTIN_MOVMSKPS256, UNKNOWN, (int) INT_FTYPE_V8SF },
 };
 
-/* FMA4 and XOP.  */
+/* FMA4, XOP.  */
 enum multi_arg_type {
   MULTI_ARG_UNKNOWN,
   MULTI_ARG_3_SF,
@@ -22484,6 +22529,23 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv4si3,      "__builtin_ia32_vpcomtrueud", IX86_BUILTIN_VPCOMTRUEUD, (enum rtx_code) PCOM_TRUE,    (int)MULTI_ARG_2_SI_TF },
   { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv2di3,      "__builtin_ia32_vpcomtrueuq", IX86_BUILTIN_VPCOMTRUEUQ, (enum rtx_code) PCOM_TRUE,    (int)MULTI_ARG_2_DI_TF },
 
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbhi1,            "__builtin_ia32_llwpcb16",   IX86_BUILTIN_LLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbsi1,            "__builtin_ia32_llwpcb32",   IX86_BUILTIN_LLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_llwpcbdi1,            "__builtin_ia32_llwpcb64",   IX86_BUILTIN_LLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbhi1,            "__builtin_ia32_slwpcb16",   IX86_BUILTIN_SLWPCB16,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbsi1,            "__builtin_ia32_slwpcb32",   IX86_BUILTIN_SLWPCB32,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_slwpcbdi1,            "__builtin_ia32_slwpcb64",   IX86_BUILTIN_SLWPCB64,    UNKNOWN,     (int) VOID_FTYPE_VOID },
+
+  /*
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalhi3,          "__builtin_ia32_lwpval16", IX86_BUILTIN_LWPVAL16,  UNKNOWN,     (int) VOID_FTYPE_USHORT_UINT_USHORT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvalsi3,          "__builtin_ia32_lwpval32", IX86_BUILTIN_LWPVAL32,  UNKNOWN,     (int) VOID_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpvaldi3,          "__builtin_ia32_lwpval64", IX86_BUILTIN_LWPVAL64,  UNKNOWN,     (int) VOID_FTYPE_UINT64_UINT_UINT },
+
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinshi3,          "__builtin_ia32_lwpins16", IX86_BUILTIN_LWPINS16,  UNKNOWN,     (int) UCHAR_FTYPE_USHORT_UINT_USHORT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3,          "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS32,  UNKNOWN,     (int) UCHAR_FTYPE_UINT_UINT_UINT },
+  { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinsdi3,          "__builtin_ia32_lwpins64", IX86_BUILTIN_LWPINS64,  UNKNOWN,     (int) UCHAR_FTYPE_UINT64_UINT_UINT },
+  */
 };
 
 /* Set up all the MMX/SSE builtins, even builtins for instructions that are not
@@ -23253,6 +23315,50 @@ ix86_init_mmx_sse_builtins (void)
 				float_type_node,
 				NULL_TREE);
 
+  /* LWP instructions.  */
+
+  tree void_ftype_ushort_unsigned_ushort
+    = build_function_type_list (void_type_node,
+				short_unsigned_type_node,
+				unsigned_type_node,
+				short_unsigned_type_node,
+				NULL_TREE);
+
+  tree void_ftype_unsigned_unsigned_unsigned
+    = build_function_type_list (void_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
+  tree void_ftype_uint64_unsigned_unsigned
+    = build_function_type_list (void_type_node,
+				long_long_unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
+  tree uchar_ftype_ushort_unsigned_ushort
+    = build_function_type_list (unsigned_char_type_node,
+				short_unsigned_type_node,
+				unsigned_type_node,
+				short_unsigned_type_node,
+				NULL_TREE);
+
+  tree uchar_ftype_unsigned_unsigned_unsigned
+    = build_function_type_list (unsigned_char_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
+  tree uchar_ftype_uint64_unsigned_unsigned
+    = build_function_type_list (unsigned_char_type_node,
+				long_long_unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+
   /* Integer intrinsics.  */
   tree uint64_ftype_void
     = build_function_type (long_long_unsigned_type_node,
@@ -23855,6 +23961,25 @@ ix86_init_mmx_sse_builtins (void)
 	case V1DI2DI_FTYPE_V1DI_V1DI_INT:
 	  type = v1di_ftype_v1di_v1di_int;
 	  break;
+	case VOID_FTYPE_USHORT_UINT_USHORT:
+	  type = void_ftype_ushort_unsigned_ushort;
+	  break;
+	case VOID_FTYPE_UINT_UINT_UINT:
+	  type = void_ftype_unsigned_unsigned_unsigned;
+	  break;
+	case VOID_FTYPE_UINT64_UINT_UINT:
+	  type = void_ftype_uint64_unsigned_unsigned;
+	  break;
+	case UCHAR_FTYPE_USHORT_UINT_USHORT:
+	  type = uchar_ftype_ushort_unsigned_ushort;
+	  break;
+	case UCHAR_FTYPE_UINT_UINT_UINT:
+	  type = uchar_ftype_unsigned_unsigned_unsigned;
+	  break;
+	case UCHAR_FTYPE_UINT64_UINT_UINT:
+	  type = uchar_ftype_uint64_unsigned_unsigned;
+	  break;
+
 	default:
 	  gcc_unreachable ();
 	}
@@ -25034,6 +25159,15 @@ ix86_expand_args_builtin (const struct b
       nargs = 4;
       nargs_constant = 2;
       break;
+    case VOID_FTYPE_USHORT_UINT_USHORT:
+    case VOID_FTYPE_UINT_UINT_UINT:
+    case VOID_FTYPE_UINT64_UINT_UINT:
+    case UCHAR_FTYPE_USHORT_UINT_USHORT:
+    case UCHAR_FTYPE_UINT_UINT_UINT:
+    case UCHAR_FTYPE_UINT64_UINT_UINT:
+      nargs = 3;
+      nargs_constant = 3;
+      break;
     default:
       gcc_unreachable ();
     }
diff -upNw gcc-xop-2/gcc/config/i386/i386-c.c gcc-lwp/gcc/config/i386/i386-c.c
--- gcc-xop-2/gcc/config/i386/i386-c.c	2009-09-29 19:41:03.000000000 -0500
+++ gcc-lwp/gcc/config/i386/i386-c.c	2009-09-30 16:33:28.000000000 -0500
@@ -234,6 +234,8 @@ ix86_target_macros_internal (int isa_fla
     def_or_undef (parse_in, "__FMA4__");
   if (isa_flag & OPTION_MASK_ISA_XOP)
     def_or_undef (parse_in, "__XOP__");
+  if (isa_flag & OPTION_MASK_ISA_LWP)
+    def_or_undef (parse_in, "__LWP__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE))
     def_or_undef (parse_in, "__SSE_MATH__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE2))
diff -upNw gcc-xop-2/gcc/config/i386/i386.h gcc-lwp/gcc/config/i386/i386.h
--- gcc-xop-2/gcc/config/i386/i386.h	2009-09-29 19:41:03.000000000 -0500
+++ gcc-lwp/gcc/config/i386/i386.h	2009-09-30 16:33:28.000000000 -0500
@@ -56,6 +56,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define TARGET_SSE4A	OPTION_ISA_SSE4A
 #define TARGET_FMA4	OPTION_ISA_FMA4
 #define TARGET_XOP	OPTION_ISA_XOP
+#define TARGET_LWP	OPTION_ISA_LWP
 #define TARGET_ROUND	OPTION_ISA_ROUND
 #define TARGET_ABM	OPTION_ISA_ABM
 #define TARGET_POPCNT	OPTION_ISA_POPCNT
diff -upNw gcc-xop-2/gcc/config/i386/i386.md gcc-lwp/gcc/config/i386/i386.md
--- gcc-xop-2/gcc/config/i386/i386.md	2009-09-29 19:41:03.000000000 -0500
+++ gcc-lwp/gcc/config/i386/i386.md	2009-09-30 16:33:28.000000000 -0500
@@ -204,6 +204,10 @@
    (UNSPEC_XOP_TRUEFALSE	152)
    (UNSPEC_XOP_PERMUTE		153)
    (UNSPEC_FRCZ			154)
+   (UNSPEC_LLWP_INTRINSIC	155)
+   (UNSPEC_SLWP_INTRINSIC	156)
+   (UNSPEC_LWPVAL_INTRINSIC	157)
+   (UNSPEC_LWPINS_INTRINSIC	158)
 
    ; For AES support
    (UNSPEC_AESENC		159)
@@ -352,7 +356,7 @@
    fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
    sselog,sselog1,sseiadd,sseiadd1,sseishft,sseimul,
    sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
-   ssemuladd,sse4arg,
+   ssemuladd,sse4arg,lwp,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
diff -upNw gcc-xop-2/gcc/config/i386/i386.opt gcc-lwp/gcc/config/i386/i386.opt
--- gcc-xop-2/gcc/config/i386/i386.opt	2009-09-29 19:41:03.000000000 -0500
+++ gcc-lwp/gcc/config/i386/i386.opt	2009-09-30 16:33:28.000000000 -0500
@@ -318,6 +318,10 @@ mxop
 Target Report Mask(ISA_XOP) Var(ix86_isa_flags) VarExists Save
 Support XOP built-in functions and code generation 
 
+mlwp
+Target Report Mask(ISA_LWP) Var(ix86_isa_flags) VarExists Save
+Support LWP built-in functions and code generation 
+
 mabm
 Target Report Mask(ISA_ABM) Var(ix86_isa_flags) VarExists Save
 Support code generation of Advanced Bit Manipulation (ABM) instructions.
diff -upNw gcc-xop-2/gcc/config/i386/lwpintrin.h gcc-lwp/gcc/config/i386/lwpintrin.h
--- gcc-xop-2/gcc/config/i386/lwpintrin.h	1969-12-31 18:00:00.000000000 -0600
+++ gcc-lwp/gcc/config/i386/lwpintrin.h	2009-09-30 16:33:28.000000000 -0500
@@ -0,0 +1,111 @@
+/* Copyright (C) 2007, 2008, 2009 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _X86INTRIN_H_INCLUDED
+# error "Never use <lwpintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _LWPINTRIN_H_INCLUDED
+#define _LWPINTRIN_H_INCLUDED
+
+#ifndef __LWP__
+# error "LWP instruction set not enabled"
+#else
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__llwpcb16 (void *pcbAddress)
+{
+  __builtin_ia32_llwpcb16 (pcbAddress);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__llwpcb32 (void *pcbAddress)
+{
+  __builtin_ia32_llwpcb32 (pcbAddress);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__llwpcb64 (void *pcbAddress)
+{
+  __builtin_ia32_llwpcb64 (pcbAddress);
+}
+
+extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__slwpcb16 (void)
+{
+  return __builtin_ia32_slwpcb16 ();
+}
+
+extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__slwpcb32 (void)
+{
+  return __builtin_ia32_slwpcb32 ();
+}
+
+extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__slwpcb64 (void)
+{
+  return __builtin_ia32_slwpcb64 ();
+}
+
+/*
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpval16 (unsigned short data2, unsigned int data1, unsigned short flags)
+{
+  __builtin_ia32_lwpval16 (data2, data1, flags);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpval32 (unsigned int data2, unsigned int data1, unsigned int flags)
+{
+  __builtin_ia32_lwpval32 (data2, data1, flags);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpval64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+{
+  __builtin_ia32_lwpval64 (data2, data1, flags);
+}
+
+extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpins16 (unsigned short data2, unsigned int data1, unsigned short flags)
+{
+  return __builtin_ia32_lwpins16 (data2, data1, flags);
+}
+
+extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpins32 (unsigned int data2, unsigned int data1, unsigned int flags)
+{
+  return __builtin_ia32_lwpins32 (data2, data1, flags);
+}
+
+extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__lwpins64 (unsigned __int64 data2, unsigned int data1, unsigned int flags)
+{
+  return __builtin_ia32_lwpins64 (data2, data1, flags);
+}
+*/
+
+#endif /* __LWP__ */
+
+#endif /* _LWPINTRIN_H_INCLUDED */
diff -upNw gcc-xop-2/gcc/config/i386/sse.md gcc-lwp/gcc/config/i386/sse.md
--- gcc-xop-2/gcc/config/i386/sse.md	2009-09-29 19:41:03.000000000 -0500
+++ gcc-lwp/gcc/config/i386/sse.md	2009-09-30 16:33:28.000000000 -0500
@@ -12092,6 +12092,121 @@
    (set_attr "length_immediate" "1")
    (set_attr "mode" "TI")])
 
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
+;; LWP instructions
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_insn "lwp_llwpcbhi1"
+  [(unspec [(match_operand:HI 0 "register_operand" "r")]
+  	   UNSPEC_LLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "llwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "HI")])
+
+(define_insn "lwp_llwpcbsi1"
+  [(unspec [(match_operand:SI 0 "register_operand" "r")]
+  	   UNSPEC_LLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "llwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "SI")])
+
+(define_insn "lwp_llwpcbdi1"
+  [(unspec [(match_operand:DI 0 "register_operand" "r")]
+  	   UNSPEC_LLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "llwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "DI")])
+
+(define_insn "lwp_slwpcbhi1"
+  [(unspec [(match_operand:HI 0 "register_operand" "r")]
+  	   UNSPEC_SLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "slwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "HI")])
+
+(define_insn "lwp_slwpcbsi1"
+  [(unspec [(match_operand:SI 0 "register_operand" "r")]
+  	   UNSPEC_SLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "slwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "SI")])
+
+(define_insn "lwp_slwpcbdi1"
+  [(unspec [(match_operand:DI 0 "register_operand" "r")]
+  	   UNSPEC_SLWP_INTRINSIC)]
+  "TARGET_LWP"
+  "slwpcb\t%0"
+  [(set_attr "type" "lwp")
+   (set_attr "mode" "DI")])
+
+;;(define_insn "lwp_lwpvalhi3"
+;;  [(unspec [(match_operand:HI 0 "register_operand" "r")
+;;  	   (match_operand:SI 1 "nonimmediate_operand" "rm")
+;;	   (match_operand:HI 2 "const_int_operand" "")]
+;;  	   UNSPEC_LWPVAL_INTRINSIC)]
+;;  "TARGET_LWP"
+;;  "lwpval\t{%2, %1, %0|%0, %1, %2}"
+;;  [(set_attr "type" "lwp")
+;;   (set_attr "mode" "HI")])
+
+;;(define_insn "lwp_lwpvalsi3"
+;;  [(unspec [(match_operand:SI 0 "register_operand" "r")
+;;  	   (match_operand:SI 1 "nonimmediate_operand" "rm")
+;;	   (match_operand:SI 2 "const_int_operand" "")]
+;;  	   UNSPEC_LWPVAL_INTRINSIC)]
+;;  "TARGET_LWP"
+;;  "lwpval\t{%2, %1, %0|%0, %1, %2}"
+;;  [(set_attr "type" "lwp")
+;;   (set_attr "mode" "SI")])
+
+;;(define_insn "lwp_lwpvaldi3"
+;;  [(unspec [(match_operand:DI 0 "register_operand" "r")
+;;  	   (match_operand:SI 1 "nonimmediate_operand" "rm")
+;;	   (match_operand:SI 2 "const_int_operand" "")]
+;;  	   UNSPEC_LWPVAL_INTRINSIC)]
+;;  "TARGET_LWP"
+;;  "lwpval\t{%2, %1, %0|%0, %1, %2}"
+;;  [(set_attr "type" "lwp")
+;;   (set_attr "mode" "DI")])
+
+;;(define_insn "lwp_lwpinshi3"
+;;  [(unspec [(match_operand:HI 0 "register_operand" "r")
+;;  	   (match_operand:SI 1 "nonimmediate_operand" "rm")
+;;	   (match_operand:HI 2 "const_int_operand" "")]
+;;  	   UNSPEC_LWPINS_INTRINSIC)]
+;;  "TARGET_LWP"
+;;  "lwpins\t{%2, %1, %0|%0, %1, %2}"
+;;  [(set_attr "type" "lwp")
+;;   (set_attr "mode" "HI")])
+
+;;(define_insn "lwp_lwpinssi3"
+;;  [(unspec [(match_operand:SI 0 "register_operand" "r")
+;;  	   (match_operand:SI 1 "nonimmediate_operand" "rm")
+;;	   (match_operand:SI 2 "const_int_operand" "")]
+;;  	   UNSPEC_LWPINS_INTRINSIC)]
+;;  "TARGET_LWP"
+;;  "lwpins\t{%2, %1, %0|%0, %1, %2}"
+;;  [(set_attr "type" "lwp")
+;;   (set_attr "mode" "SI")])
+
+;;(define_insn "lwp_lwpinsdi3"
+;;  [(unspec [(match_operand:DI 0 "register_operand" "r")
+;;  	   (match_operand:SI 1 "nonimmediate_operand" "rm")
+;;	   (match_operand:SI 2 "const_int_operand" "")]
+;;  	   UNSPEC_LWPINS_INTRINSIC)]
+;;  "TARGET_LWP"
+;;  "lwpins\t{%2, %1, %0|%0, %1, %2}"
+;;  [(set_attr "type" "lwp")
+;;   (set_attr "mode" "DI")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (define_insn "*avx_aesenc"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
diff -upNw gcc-xop-2/gcc/config/i386/x86intrin.h gcc-lwp/gcc/config/i386/x86intrin.h
--- gcc-xop-2/gcc/config/i386/x86intrin.h	2009-09-29 19:41:03.000000000 -0500
+++ gcc-lwp/gcc/config/i386/x86intrin.h	2009-09-30 16:33:28.000000000 -0500
@@ -62,6 +62,10 @@
 #include <xopintrin.h>
 #endif
 
+#ifdef __LWP__
+#include <lwpintrin.h>
+#endif
+
 #if defined (__AES__) || defined (__PCLMUL__)
 #include <wmmintrin.h>
 #endif
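
For anyone trying the patch: the i386-c.c hunk defines __LWP__ when -mlwp
is in effect, and the x86intrin.h hunk includes lwpintrin.h only under that
macro. A minimal sketch of how user code could guard on it follows. The
helper name lwp_compiled_in is hypothetical and not part of the patch.

```c
#ifdef __LWP__
/* lwpintrin.h rejects direct inclusion, so go through x86intrin.h.  */
#include <x86intrin.h>
#endif

/* Hypothetical helper (not in the patch): returns 1 when this translation
   unit was built with LWP enabled (e.g. -mlwp), 0 otherwise.  */
static int
lwp_compiled_in (void)
{
#ifdef __LWP__
  return 1;
#else
  return 0;
#endif
}
```

Built without -mlwp the helper returns 0 and no LWP header is pulled in,
so the same source compiles either way. This only reports what the compiler
was told to target; it says nothing about whether the running CPU or OS
actually supports LWP.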


Thread overview: 32+ messages
2009-10-09  1:13 PATCH: Add LWP support for upcoming AMD Orochi processor Harsha Jagasia
2009-10-09 10:07 ` Jakub Jelinek
2009-10-22 21:07   ` rajagopal, dwarak
2009-11-05  9:32 ` Jakub Jelinek
2009-11-05 16:21   ` Jakub Jelinek
2009-11-05 16:58     ` Sebastian Pop
2009-11-05 17:03       ` Richard Guenther
2009-11-05 17:21       ` Uros Bizjak
2009-11-06 10:15     ` Jakub Jelinek
2009-12-10 19:58       ` Sebastian Pop
2009-12-10 21:01         ` Jakub Jelinek
2009-12-10 21:04           ` Sebastian Pop
2009-12-10 21:52             ` Jakub Jelinek
2009-12-11 14:51               ` Jakub Jelinek
2009-12-11 16:54                 ` Richard Henderson
2009-12-11 21:00                 ` Sebastian Pop
2009-12-11 21:43                   ` Jakub Jelinek
2009-12-11 22:27                     ` Sebastian Pop
2009-12-12  9:27                       ` Sebastian Pop
2009-12-14 16:35                         ` Richard Henderson
2009-12-14 19:15                         ` H.J. Lu
2009-12-14 19:21                           ` Jakub Jelinek
2009-12-14 19:38                             ` Richard Henderson
2009-12-14 20:15                               ` Jakub Jelinek
2009-12-14 20:14                         ` Uros Bizjak
2009-12-14 20:38                           ` Jakub Jelinek
2009-12-14 20:52                             ` Uros Bizjak
  -- strict thread matches above, loose matches on Subject: below --
2009-10-09  2:12 Ross Ridge
2009-10-01  7:51 Uros Bizjak
2009-10-01 10:09 ` Jan Hubicka
2009-10-01  4:06 Harsha Jagasia
2009-10-01  6:30 ` Jakub Jelinek