Subject: PATCH: Add XOP 128-bit and 256-bit support for upcoming AMD Orochi processor.
From: Harsha Jagasia @ 2009-09-30  6:52 UTC
To: gcc-patches, hubicka, rth, dwarak.rajagopal, christophe.harle
Cc: Harsha Jagasia

This patch adds 128-bit and 256-bit XOP support to gcc 4.5
for the upcoming AMD Orochi processor.

We are still in the process of wrapping up the XOP binutils work
and expect it to be checked in during stage 3.

A slightly older version of this patch, based on gcc 4.5 rev 152230,
passed bootstrap and make check on my old x86 machine, including the
XOP-specific target tests. Only the xop-hadd/sub*.c target tests fail,
with an "unsupported" message, due to the absence of an XOP-capable
assembler.

The attached patch is based on the latest trunk; bootstrap and make
check are still running. Since there have been no changes to the patch,
I do not anticipate any issues. I will update the list with the
results, but I wanted to send the patch out now so that reviewers
can start looking at it.

For the most part the changes are minimal (a short sketch of the new
intrinsics in use follows the list):
- Add a "v" prefix to the instruction names.
- Remove the restriction that the destination register be the same as
one of the source registers.
- Add 256-bit pcmov and frcz XOP instructions, which did not exist
before.
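
As an illustration of the user-visible interface (a sketch only; the
full set of intrinsics lives in the new xopintrin.h, and
_mm_maccs_epi16 below is the same wrapper the i386.exp
effective-target check in this patch uses):

  /* Compile with -O2 -mxop; this is expected to assemble to vpmacssww,
     with the destination no longer tied to a source register.  */
  #include <x86intrin.h>

  __m128i
  mac16 (__m128i a, __m128i b, __m128i c)
  {
    /* Per 16-bit lane: signed (a * b) + c with signed saturation.  */
    return _mm_maccs_epi16 (a, b, c);
  }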

Ok for check-in?

2009-09-29  Harsha Jagasia  <harsha.jagasia@amd.com>

	* config.gcc (i[34567]86-*-*): Include xopintrin.h.
	(x86_64-*-*): Ditto.
	* config/i386/xopintrin.h: New file, provide common x86 compiler
	intrinsics for XOP.
	* config/i386/cpuid.h (bit_XOP): Define XOP bit.
	* config/i386/x86intrin.h: Add XOP check and xopintrin.h.
	* config/i386/i386-c.c (ix86_target_macros_internal): Check
	ISA_FLAG for XOP.
	* config/i386/i386.h (TARGET_XOP): New macro for XOP.
	* config/i386/i386.opt (-mxop): New switch for XOP support.
	* config/i386/i386-protos.h (ix86_expand_xop_unpack)
	(ix86_expand_xop_pack): Add declaration.
	* config/i386/i386.md (UNSPEC_XOP_UNSIGNED_CMP)
	(UNSPEC_XOP_TRUEFALSE)
	(UNSPEC_XOP_PERMUTE)
	(UNSPEC_FRCZ): Add new UNSPECs for XOP support.
	(PPERM_*): New constants for vpperm instruction.
	(xop_pcmov_<mode>): Add XOP conditional mov instructions.
	* config/i386/i386.c (OPTION_MASK_ISA_XOP_SET): New.
	(OPTION_MASK_ISA_XOP_UNSET): New.
	(OPTION_MASK_ISA_SSE4A_UNSET): Change definition to
	depend on XOP.
	(ix86_handle_option): Handle -mxop.
	(isa_opts): Handle -mxop.
	(enum pta_flags): Add PTA_XOP.
	(override_options): Add XOP support.
	(print_operand): Add code for XOP compare instructions.
	(ix86_expand_sse_movcc): Extend for XOP conditional move instruction.
	(ix86_expand_int_vcond): Extend for XOP compare instruction.
	(ix86_expand_xop_unpack)
	(ix86_expand_xop_pack): New.

	(IX86_BUILTIN_VPCMOV): New for XOP intrinsic.
	(IX86_BUILTIN_VPCMOV_V2DI): Ditto.
	(IX86_BUILTIN_VPCMOV_V4SI): Ditto.
	(IX86_BUILTIN_VPCMOV_V8HI): Ditto.
	(IX86_BUILTIN_VPCMOV_V16QI): Ditto.
	(IX86_BUILTIN_VPCMOV_V4SF): Ditto.
	(IX86_BUILTIN_VPCMOV_V2DF): Ditto.

	(IX86_BUILTIN_VPCMOV256): Ditto.
	(IX86_BUILTIN_VPCMOV_V4DI256): Ditto.
	(IX86_BUILTIN_VPCMOV_V8SI256): Ditto.
	(IX86_BUILTIN_VPCMOV_V16HI256): Ditto.
	(IX86_BUILTIN_VPCMOV_V32QI256): Ditto.
	(IX86_BUILTIN_VPCMOV_V8SF256): Ditto.
	(IX86_BUILTIN_VPCMOV_V4DF256): Ditto.

	(IX86_BUILTIN_VPPERM): Ditto.

	(IX86_BUILTIN_VPMACSSWW): Ditto.
	(IX86_BUILTIN_VPMACSWW): Ditto.
	(IX86_BUILTIN_VPMACSSWD): Ditto.
	(IX86_BUILTIN_VPMACSWD): Ditto.
	(IX86_BUILTIN_VPMACSSDD): Ditto.
	(IX86_BUILTIN_VPMACSDD): Ditto.
	(IX86_BUILTIN_VPMACSSDQL): Ditto.
	(IX86_BUILTIN_VPMACSSDQH): Ditto.
	(IX86_BUILTIN_VPMACSDQL): Ditto.
	(IX86_BUILTIN_VPMACSDQH): Ditto.
	(IX86_BUILTIN_VPMADCSSWD): Ditto.
	(IX86_BUILTIN_VPMADCSWD): Ditto.

	(IX86_BUILTIN_VPHADDBW): Ditto.
	(IX86_BUILTIN_VPHADDBD): Ditto.
	(IX86_BUILTIN_VPHADDBQ): Ditto.
	(IX86_BUILTIN_VPHADDWD): Ditto.
	(IX86_BUILTIN_VPHADDWQ): Ditto.
	(IX86_BUILTIN_VPHADDDQ): Ditto.
	(IX86_BUILTIN_VPHADDUBW): Ditto.
	(IX86_BUILTIN_VPHADDUBD): Ditto.
	(IX86_BUILTIN_VPHADDUBQ): Ditto.
	(IX86_BUILTIN_VPHADDUWD): Ditto.
	(IX86_BUILTIN_VPHADDUWQ): Ditto.
	(IX86_BUILTIN_VPHADDUDQ): Ditto.
	(IX86_BUILTIN_VPHSUBBW): Ditto.
	(IX86_BUILTIN_VPHSUBWD): Ditto.
	(IX86_BUILTIN_VPHSUBDQ): Ditto.

	(IX86_BUILTIN_VPROTB): Ditto.
	(IX86_BUILTIN_VPROTW): Ditto.
	(IX86_BUILTIN_VPROTD): Ditto.
	(IX86_BUILTIN_VPROTQ): Ditto.
	(IX86_BUILTIN_VPROTB_IMM): Ditto.
	(IX86_BUILTIN_VPROTW_IMM): Ditto.
	(IX86_BUILTIN_VPROTD_IMM): Ditto.
	(IX86_BUILTIN_VPROTQ_IMM): Ditto.

	(IX86_BUILTIN_VPSHLB): Ditto.
	(IX86_BUILTIN_VPSHLW): Ditto.
	(IX86_BUILTIN_VPSHLD): Ditto.
	(IX86_BUILTIN_VPSHLQ): Ditto.
	(IX86_BUILTIN_VPSHAB): Ditto.
	(IX86_BUILTIN_VPSHAW): Ditto.
	(IX86_BUILTIN_VPSHAD): Ditto.
	(IX86_BUILTIN_VPSHAQ): Ditto.

	(IX86_BUILTIN_VFRCZSS): Ditto.
	(IX86_BUILTIN_VFRCZSD): Ditto.
	(IX86_BUILTIN_VFRCZPS): Ditto.
	(IX86_BUILTIN_VFRCZPD): Ditto.
	(IX86_BUILTIN_VFRCZPS256): Ditto.
	(IX86_BUILTIN_VFRCZPD256): Ditto.

	(IX86_BUILTIN_VPCOMEQUB): Ditto.
	(IX86_BUILTIN_VPCOMNEUB): Ditto.
	(IX86_BUILTIN_VPCOMLTUB): Ditto.
	(IX86_BUILTIN_VPCOMLEUB): Ditto.
	(IX86_BUILTIN_VPCOMGTUB): Ditto.
	(IX86_BUILTIN_VPCOMGEUB): Ditto.
	(IX86_BUILTIN_VPCOMFALSEUB): Ditto.
	(IX86_BUILTIN_VPCOMTRUEUB): Ditto.

	(IX86_BUILTIN_VPCOMEQUW): Ditto.
	(IX86_BUILTIN_VPCOMNEUW): Ditto.
	(IX86_BUILTIN_VPCOMLTUW): Ditto.
	(IX86_BUILTIN_VPCOMLEUW): Ditto.
	(IX86_BUILTIN_VPCOMGTUW): Ditto.
	(IX86_BUILTIN_VPCOMGEUW): Ditto.
	(IX86_BUILTIN_VPCOMFALSEUW): Ditto.
	(IX86_BUILTIN_VPCOMTRUEUW): Ditto.

	(IX86_BUILTIN_VPCOMEQUD): Ditto.
	(IX86_BUILTIN_VPCOMNEUD): Ditto.
	(IX86_BUILTIN_VPCOMLTUD): Ditto.
	(IX86_BUILTIN_VPCOMLEUD): Ditto.
	(IX86_BUILTIN_VPCOMGTUD): Ditto.
	(IX86_BUILTIN_VPCOMGEUD): Ditto.
	(IX86_BUILTIN_VPCOMFALSEUD): Ditto.
	(IX86_BUILTIN_VPCOMTRUEUD): Ditto.

	(IX86_BUILTIN_VPCOMEQUQ): Ditto.
	(IX86_BUILTIN_VPCOMNEUQ): Ditto.
	(IX86_BUILTIN_VPCOMLTUQ): Ditto.
	(IX86_BUILTIN_VPCOMLEUQ): Ditto.
	(IX86_BUILTIN_VPCOMGTUQ): Ditto.
	(IX86_BUILTIN_VPCOMGEUQ): Ditto.
	(IX86_BUILTIN_VPCOMFALSEUQ): Ditto.
	(IX86_BUILTIN_VPCOMTRUEUQ): Ditto.

	(IX86_BUILTIN_VPCOMEQB): Ditto.
	(IX86_BUILTIN_VPCOMNEB): Ditto.
	(IX86_BUILTIN_VPCOMLTB): Ditto.
	(IX86_BUILTIN_VPCOMLEB): Ditto.
	(IX86_BUILTIN_VPCOMGTB): Ditto.
	(IX86_BUILTIN_VPCOMGEB): Ditto.
	(IX86_BUILTIN_VPCOMFALSEB): Ditto.
	(IX86_BUILTIN_VPCOMTRUEB): Ditto.

	(IX86_BUILTIN_VPCOMEQW): Ditto.
	(IX86_BUILTIN_VPCOMNEW): Ditto.
	(IX86_BUILTIN_VPCOMLTW): Ditto.
	(IX86_BUILTIN_VPCOMLEW): Ditto.
	(IX86_BUILTIN_VPCOMGTW): Ditto.
	(IX86_BUILTIN_VPCOMGEW): Ditto.
	(IX86_BUILTIN_VPCOMFALSEW): Ditto.
	(IX86_BUILTIN_VPCOMTRUEW): Ditto.

	(IX86_BUILTIN_VPCOMEQD): Ditto.
	(IX86_BUILTIN_VPCOMNED): Ditto.
	(IX86_BUILTIN_VPCOMLTD): Ditto.
	(IX86_BUILTIN_VPCOMLED): Ditto.
	(IX86_BUILTIN_VPCOMGTD): Ditto.
	(IX86_BUILTIN_VPCOMGED): Ditto.
	(IX86_BUILTIN_VPCOMFALSED): Ditto.
	(IX86_BUILTIN_VPCOMTRUED): Ditto.

	(IX86_BUILTIN_VPCOMEQQ): Ditto.
	(IX86_BUILTIN_VPCOMNEQ): Ditto.
	(IX86_BUILTIN_VPCOMLTQ): Ditto.
	(IX86_BUILTIN_VPCOMLEQ): Ditto.
	(IX86_BUILTIN_VPCOMGTQ): Ditto.
	(IX86_BUILTIN_VPCOMGEQ): Ditto.
	(IX86_BUILTIN_VPCOMFALSEQ): Ditto.
	(IX86_BUILTIN_VPCOMTRUEQ): Ditto.

	(enum multi_arg_type): New enum for describing the various XOP
	intrinsic argument types.
	(bdesc_multi_arg): New table for XOP intrinsics.
	(ix86_init_mmx_sse_builtins): Add XOP intrinsic support.
	(ix86_expand_multi_arg_builtin): New function for creating XOP
	intrinsics.

	* config/i386/sse.md (sserotatemax): New mode attribute for XOP.
	(xop_pmacsww): Ditto.
	(xop_pmacssww): Ditto.
	(xop_pmacsdd): Ditto.
	(xop_pmacssdd): Ditto.
	(xop_pmacssdql): Ditto.
	(xop_pmacssdqh): Ditto.
	(xop_pmacsdql): Ditto.
	(xop_pmacsdql_mem): Ditto.
	(xop_mulv2div2di3_low): Ditto.
	(xop_pmacsdqh): Ditto.
	(xop_pmacsdqh_mem): Ditto.
	(xop_mulv2div2di3_high): Ditto.
	(xop_pmacsswd): Ditto.
	(xop_pmacswd): Ditto.
	(xop_pmadcsswd): Ditto.
	(xop_pmadcswd): Ditto.
	(xop_pcmov_<mode>): Ditto.
	(xop_pcmov_<mode>256): Ditto.
	(xop_phaddbw): Ditto.
	(xop_phaddbd): Ditto.
	(xop_phaddbq): Ditto.
	(xop_phaddwd): Ditto.
	(xop_phaddwq): Ditto.
	(xop_phadddq): Ditto.
	(xop_phaddubw): Ditto.
	(xop_phaddubd): Ditto.
	(xop_phaddubq): Ditto.
	(xop_phadduwd): Ditto.
	(xop_phadduwq): Ditto.
	(xop_phaddudq): Ditto.
	(xop_phsubbw): Ditto.
	(xop_phsubwd): Ditto.
	(xop_phsubdq): Ditto.
	(xop_pperm): Ditto.
	(xop_pperm_zero_v16qi_v8hi): Ditto.
	(xop_pperm_sign_v16qi_v8hi): Ditto.
	(xop_pperm_zero_v8hi_v4si): Ditto.
	(xop_pperm_sign_v8hi_v4si): Ditto.
	(xop_pperm_zero_v4si_v2di): Ditto.
	(xop_pperm_sign_v4si_v2di): Ditto.
	(xop_pperm_pack_v2di_v4si): Ditto.
	(xop_pperm_pack_v4si_v8hi): Ditto.
	(xop_pperm_pack_v8hi_v16qi): Ditto.
	(rotl<mode>3): Ditto.
	(rotr<mode>3): Ditto.
	(xop_rotl<mode>3): Ditto.
	(xop_rotr<mode>3): Ditto.
	(vrotr<mode>3): Ditto.
	(vrotl<mode>3): Ditto.
	(xop_vrotl<mode>3): Ditto.
	(vlshr<mode>3): Ditto.
	(vashr<mode>3): Ditto.
	(vashl<mode>3): Ditto.
	(xop_ashl<mode>3): Ditto.
	(xop_lshl<mode>3): Ditto.
	(ashlv16qi3): Ditto.
	(lshlv16qi3): Ditto.
	(ashrv16qi3): Ditto.
	(ashrv2di3): Ditto.
	(xop_frcz<mode>2): Ditto.
	(xop_vmfrcz<mode>2): Ditto.
	(xop_frcz<mode>2256): Ditto.	
	(xop_maskcmp<mode>3): Ditto.
	(xop_maskcmp_uns<mode>3): Ditto.
	(xop_maskcmp_uns2<mode>3): Ditto.
	(xop_pcom_tf<mode>3): Ditto.

	* doc/invoke.texi (-mxop): Add documentation.
	* doc/extend.texi (x86 intrinsics): Add XOP intrinsics.

	* gcc.target/i386/xop-check.h
	* gcc.target/i386/xop-hadduX.c
	* gcc.target/i386/xop-haddX.c
	* gcc.target/i386/xop-hsubX.c
	* gcc.target/i386/xop-ima-vector.c
	* gcc.target/i386/xop-imul32widen-vector.c
	* gcc.target/i386/xop-imul64-vector.c
	* gcc.target/i386/xop-pcmov2.c
	* gcc.target/i386/xop-pcmov.c
	* gcc.target/i386/xop-rotate1-vector.c
	* gcc.target/i386/xop-rotate2-vector.c
	* gcc.target/i386/xop-rotate3-vector.c
	* gcc.target/i386/xop-shift1-vector.c
	* gcc.target/i386/xop-shift2-vector.c
	* gcc.target/i386/xop-shift3-vector.c: New file.
	* gcc.target/i386/i386.exp: Add check_effective_target_xop.
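
To give a feel for the code generation change, a loop like the one
below (in the spirit of the new xop-rotate1-vector.c test) should now
be vectorized to use vprotd when compiled with -O2 -mxop
-ftree-vectorize:

  /* Sketch: rotate each 32-bit element left by 28 (i.e. right by 4);
     the vectorizer is expected to match this to vprotd.  */
  void
  left_rotate32 (unsigned *a, unsigned *b, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      a[i] = (b[i] << 28) | (b[i] >> 4);
  }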


Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(revision 152317)
+++ doc/extend.texi	(working copy)
@@ -3173,6 +3173,11 @@ Enable/disable the generation of the SSE
 @cindex @code{target("fma4")} attribute
 Enable/disable the generation of the FMA4 instructions.
 
+@item xop
+@itemx no-xop
+@cindex @code{target("xop")} attribute
+Enable/disable the generation of the XOP instructions.
+
 @item ssse3
 @itemx no-ssse3
 @cindex @code{target("ssse3")} attribute
@@ -8893,6 +8898,134 @@ v2di __builtin_ia32_insertq (v2di, v2di)
 v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int)
 @end smallexample
 
+The following built-in functions are available when @option{-mxop} is used.
+@smallexample
+v2df __builtin_ia32_vfrczpd (v2df)
+v4sf __builtin_ia32_vfrczps (v4sf)
+v2df __builtin_ia32_vfrczsd (v2df, v2df)
+v4sf __builtin_ia32_vfrczss (v4sf, v4sf)
+v4df __builtin_ia32_vfrczpd256 (v4df)
+v8sf __builtin_ia32_vfrczps256 (v8sf)
+v2di __builtin_ia32_vpcmov (v2di, v2di, v2di)
+v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di)
+v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si)
+v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi)
+v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi)
+v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df)
+v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf)
+v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di)
+v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si)
+v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi)
+v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi)
+v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df)
+v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf)
+v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi)
+v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi)
+v4si __builtin_ia32_vpcomeqd (v4si, v4si)
+v2di __builtin_ia32_vpcomeqq (v2di, v2di)
+v16qi __builtin_ia32_vpcomequb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomequd (v4si, v4si)
+v2di __builtin_ia32_vpcomequq (v2di, v2di)
+v8hi __builtin_ia32_vpcomequw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomfalsed (v4si, v4si)
+v2di __builtin_ia32_vpcomfalseq (v2di, v2di)
+v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomfalseud (v4si, v4si)
+v2di __builtin_ia32_vpcomfalseuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomged (v4si, v4si)
+v2di __builtin_ia32_vpcomgeq (v2di, v2di)
+v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomgeud (v4si, v4si)
+v2di __builtin_ia32_vpcomgeuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomgew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomgtd (v4si, v4si)
+v2di __builtin_ia32_vpcomgtq (v2di, v2di)
+v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomgtud (v4si, v4si)
+v2di __builtin_ia32_vpcomgtuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomleb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomled (v4si, v4si)
+v2di __builtin_ia32_vpcomleq (v2di, v2di)
+v16qi __builtin_ia32_vpcomleub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomleud (v4si, v4si)
+v2di __builtin_ia32_vpcomleuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomlew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomltb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomltd (v4si, v4si)
+v2di __builtin_ia32_vpcomltq (v2di, v2di)
+v16qi __builtin_ia32_vpcomltub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomltud (v4si, v4si)
+v2di __builtin_ia32_vpcomltuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomltw (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomneb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomned (v4si, v4si)
+v2di __builtin_ia32_vpcomneq (v2di, v2di)
+v16qi __builtin_ia32_vpcomneub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomneud (v4si, v4si)
+v2di __builtin_ia32_vpcomneuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomnew (v8hi, v8hi)
+v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi)
+v4si __builtin_ia32_vpcomtrued (v4si, v4si)
+v2di __builtin_ia32_vpcomtrueq (v2di, v2di)
+v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi)
+v4si __builtin_ia32_vpcomtrueud (v4si, v4si)
+v2di __builtin_ia32_vpcomtrueuq (v2di, v2di)
+v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi)
+v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi)
+v4si __builtin_ia32_vphaddbd (v16qi)
+v2di __builtin_ia32_vphaddbq (v16qi)
+v8hi __builtin_ia32_vphaddbw (v16qi)
+v2di __builtin_ia32_vphadddq (v4si)
+v4si __builtin_ia32_vphaddubd (v16qi)
+v2di __builtin_ia32_vphaddubq (v16qi)
+v8hi __builtin_ia32_vphaddubw (v16qi)
+v2di __builtin_ia32_vphaddudq (v4si)
+v4si __builtin_ia32_vphadduwd (v8hi)
+v2di __builtin_ia32_vphadduwq (v8hi)
+v4si __builtin_ia32_vphaddwd (v8hi)
+v2di __builtin_ia32_vphaddwq (v8hi)
+v8hi __builtin_ia32_vphsubbw (v16qi)
+v2di __builtin_ia32_vphsubdq (v4si)
+v4si __builtin_ia32_vphsubwd (v8hi)
+v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si)
+v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di)
+v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di)
+v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si)
+v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di)
+v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di)
+v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si)
+v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi)
+v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si)
+v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi)
+v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si)
+v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si)
+v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi)
+v16qi __builtin_ia32_vprotb (v16qi, v16qi)
+v4si __builtin_ia32_vprotd (v4si, v4si)
+v2di __builtin_ia32_vprotq (v2di, v2di)
+v8hi __builtin_ia32_vprotw (v8hi, v8hi)
+v16qi __builtin_ia32_vpshab (v16qi, v16qi)
+v4si __builtin_ia32_vpshad (v4si, v4si)
+v2di __builtin_ia32_vpshaq (v2di, v2di)
+v8hi __builtin_ia32_vpshaw (v8hi, v8hi)
+v16qi __builtin_ia32_vpshlb (v16qi, v16qi)
+v4si __builtin_ia32_vpshld (v4si, v4si)
+v2di __builtin_ia32_vpshlq (v2di, v2di)
+v8hi __builtin_ia32_vpshlw (v8hi, v8hi)
+@end smallexample
+
 The following built-in functions are available when @option{-mfma4} is used.
 All of them generate the machine instruction that is part of the name
 with MMX registers.
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 152317)
+++ doc/invoke.texi	(working copy)
@@ -592,7 +592,7 @@ Objective-C and Objective-C++ Dialects}.
 -mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
 -maes -mpclmul @gol
--msse4a -m3dnow -mpopcnt -mabm -mfma4 @gol
+-msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
 -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
@@ -11729,6 +11729,8 @@ preferred alignment to @option{-mpreferr
 @itemx -mno-sse4a
 @itemx -mfma4
 @itemx -mno-fma4
+@itemx -mxop
+@itemx -mno-xop
 @itemx -m3dnow
 @itemx -mno-3dnow
 @itemx -mpopcnt
@@ -11742,8 +11744,8 @@ preferred alignment to @option{-mpreferr
 @opindex m3dnow
 @opindex mno-3dnow
 These switches enable or disable the use of instructions in the MMX,
-SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, SSE4A, FMA4, ABM or
-3DNow!@: extended instruction sets.
+SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, SSE4A, FMA4, XOP,
+ABM or 3DNow!@: extended instruction sets.
 These extensions are also available as built-in functions: see
 @ref{X86 Built-in Functions}, for details of the functions enabled and
 disabled by these switches.
Index: testsuite/gcc.target/i386/xop-haddX.c
===================================================================
--- testsuite/gcc.target/i386/xop-haddX.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-haddX.c	(revision 0)
@@ -0,0 +1,209 @@
+/* { dg-do run } */
+/* { dg-require-effective-target xop } */
+/* { dg-options "-O2 -mxop" } */
+
+#include "xop-check.h"
+
+#include <x86intrin.h>
+#include <string.h>
+
+#define NUM 10
+
+union
+{
+  __m128i x[NUM];
+  signed char ssi[NUM * 16];
+  short si[NUM * 8];
+  int li[NUM * 4];
+  long long lli[NUM * 2];
+} dst, res, src1;
+
+static void
+init_sbyte ()
+{
+  int i;
+  for (i=0; i < NUM * 16; i++)
+    src1.ssi[i] = i;
+}
+
+static void
+init_sword ()
+{
+  int i;
+  for (i=0; i < NUM * 8; i++)
+    src1.si[i] = i;
+}
+
+
+static void
+init_sdword ()
+{
+  int i;
+  for (i=0; i < NUM * 4; i++)
+    src1.li[i] = i;
+}
+
+static int 
+check_sbyte2word ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < NUM * 16; i = i + 16)
+    {
+      for (j = 0; j < 8; j++)
+	{
+	  t = i + (2 * j);
+	  s = (i / 2) + j;
+	  res.si[s] = src1.ssi[t] + src1.ssi[t + 1] ;
+	  if (res.si[s] != dst.si[s]) 
+	    check_fails++;	
+	}
+    }
+  return check_fails;
+}
+
+static int 
+check_sbyte2dword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < NUM * 16; i = i + 16)
+    {
+      for (j = 0; j < 4; j++)
+	{
+	  t = i + (4 * j);
+	  s = (i / 4) + j;
+	  res.li[s] = (src1.ssi[t] + src1.ssi[t + 1]) + (src1.ssi[t + 2]
+	              + src1.ssi[t + 3]); 
+	  if (res.li[s] != dst.li[s]) 
+	    check_fails++;
+	}
+    }
+  return check_fails;
+}
+
+static int
+check_sbyte2qword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < NUM * 16; i = i + 16)
+    {
+      for (j = 0; j < 2; j++)
+	{
+	  t = i + (8 * j);
+	  s = (i / 8) + j;
+	  res.lli[s] = ((src1.ssi[t] + src1.ssi[t + 1]) + (src1.ssi[t + 2] 
+		       + src1.ssi[t + 3])) + ((src1.ssi[t + 4] + src1.ssi[t +5])
+	               + (src1.ssi[t + 6] + src1.ssi[t + 7])); 
+	  if (res.lli[s] != dst.lli[s]) 
+	    check_fails++;
+	}
+    }
+  return check_fails;
+}
+
+static int
+check_sword2dword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < (NUM * 8); i = i + 8)
+    {
+      for (j = 0; j < 4; j++)
+	{
+	  t = i + (2 * j);
+	  s = (i / 2) + j;
+	  res.li[s] = src1.si[t] + src1.si[t + 1] ;
+	  if (res.li[s] != dst.li[s]) 
+	    check_fails++;	
+	}
+    }
+  return check_fails;
+}
+
+static int 
+check_sword2qword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < NUM * 8; i = i + 8)
+    {
+      for (j = 0; j < 2; j++)
+	{
+	  t = i + (4 * j);
+	  s = (i / 4) + j;
+	  res.lli[s] = (src1.si[t] + src1.si[t + 1]) + (src1.si[t + 2]
+	               + src1.si[t + 3]); 
+	  if (res.lli[s] != dst.lli[s]) 
+	    check_fails++;
+	}
+    }
+  return check_fails;
+}
+
+static int
+check_dword2qword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < (NUM * 4); i = i + 4)
+    {
+      for (j = 0; j < 2; j++)
+	{
+	  t = i + (2 * j);
+	  s = (i / 2) + j;
+	  res.lli[s] = src1.li[t] + src1.li[t + 1] ;
+	  if (res.lli[s] != dst.lli[s]) 
+	    check_fails++;	
+	}
+    }
+  return check_fails;
+}
+
+static void
+xop_test (void)
+{
+  int i;
+
+  init_sbyte ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_haddw_epi8 (src1.x[i]);
+  
+  if (check_sbyte2word())
+    abort ();
+  
+
+  for (i = 0; i < (NUM ); i++)
+    dst.x[i] = _mm_haddd_epi8 (src1.x[i]);
+  
+  if (check_sbyte2dword())
+    abort (); 
+  
+
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_haddq_epi8 (src1.x[i]);
+  
+  if (check_sbyte2qword())
+    abort ();
+
+
+  init_sword ();
+
+  for (i = 0; i < (NUM ); i++)
+    dst.x[i] = _mm_haddd_epi16 (src1.x[i]);
+  
+  if (check_sword2dword())
+    abort (); 
+
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_haddq_epi16 (src1.x[i]);
+  
+  if (check_sword2qword())
+    abort ();
+ 
+
+  init_sdword ();
+
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_haddq_epi32 (src1.x[i]);
+  
+  if (check_dword2qword())
+    abort ();
+
+}
Index: testsuite/gcc.target/i386/i386.exp
===================================================================
--- testsuite/gcc.target/i386/i386.exp	(revision 152317)
+++ testsuite/gcc.target/i386/i386.exp	(working copy)
@@ -134,6 +134,20 @@ proc check_effective_target_fma4 { } {
     } "-O2 -mfma4" ]
 }
 
+# Return 1 if xop instructions can be compiled.
+proc check_effective_target_xop { } {
+    return [check_no_compiler_messages xop object {
+	typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
+	typedef short __v8hi __attribute__ ((__vector_size__ (16)));
+	__m128i _mm_maccs_epi16(__m128i __A, __m128i __B, __m128i __C)
+	{
+	    return (__m128i) __builtin_ia32_pmacssww ((__v8hi)__A,
+						      (__v8hi)__B,
+						      (__v8hi)__C);
+	}
+    } "-O2 -mxop" ]
+}
+
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
 if ![info exists DEFAULT_CFLAGS] then {
Index: testsuite/gcc.target/i386/xop-ima-vector.c
===================================================================
--- testsuite/gcc.target/i386/xop-ima-vector.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-ima-vector.c	(revision 0)
@@ -0,0 +1,34 @@
+/* Test that the compiler properly optimizes vector 32-bit integer multiply
+   and add instructions into pmacsdd on XOP systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mxop -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m128i align;
+  int i[SIZE];
+} a, b, c, d;
+
+void
+int_mul_add (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.i[i] = (b.i[i] * c.i[i]) + d.i[i];
+}
+
+int main ()
+{
+  int_mul_add ();
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vpmacsdd" } } */
Index: testsuite/gcc.target/i386/xop-rotate1-vector.c
===================================================================
--- testsuite/gcc.target/i386/xop-rotate1-vector.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-rotate1-vector.c	(revision 0)
@@ -0,0 +1,35 @@
+/* Test that the compiler properly optimizes vector rotate instructions
+   into prot on XOP systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mxop -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef long __m128i  __attribute__ ((__vector_size__ (16), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m128i i_align;
+  unsigned u32[SIZE];
+} a, b, c;
+
+void
+left_rotate32 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.u32[i] = (b.u32[i] << ((sizeof (int) * 8) - 4)) | (b.u32[i] >> 4);
+}
+
+int
+main ()
+{
+  left_rotate32 ();
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vprotd" } } */
Index: testsuite/gcc.target/i386/xop-check.h
===================================================================
--- testsuite/gcc.target/i386/xop-check.h	(revision 0)
+++ testsuite/gcc.target/i386/xop-check.h	(revision 0)
@@ -0,0 +1,20 @@
+#include <stdlib.h>
+
+#include "cpuid.h"
+
+static void xop_test (void);
+
+int
+main ()
+{
+  unsigned int eax, ebx, ecx, edx;
+ 
+  if (!__get_cpuid (0x80000001, &eax, &ebx, &ecx, &edx))
+    return 0;
+
+  /* Run XOP test only if host has XOP support.  */
+  if (ecx & bit_XOP)
+    xop_test ();
+
+  exit (0);
+}
Index: testsuite/gcc.target/i386/xop-rotate2-vector.c
===================================================================
--- testsuite/gcc.target/i386/xop-rotate2-vector.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-rotate2-vector.c	(revision 0)
@@ -0,0 +1,35 @@
+/* Test that the compiler properly optimizes vector rotate instructions
+   into prot on XOP systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mxop -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef long __m128i  __attribute__ ((__vector_size__ (16), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m128i i_align;
+  unsigned u32[SIZE];
+} a, b, c;
+
+void
+right_rotate32_b (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.u32[i] = (b.u32[i] >> ((sizeof (int) * 8) - 4)) | (b.u32[i] << 4);
+}
+
+int
+main ()
+{
+  right_rotate32_b ();
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vprot" } } */
Index: testsuite/gcc.target/i386/xop-imul32widen-vector.c
===================================================================
--- testsuite/gcc.target/i386/xop-imul32widen-vector.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-imul32widen-vector.c	(revision 0)
@@ -0,0 +1,36 @@
+/* Test that the compiler properly optimizes a widening vector 32-bit to
+   64-bit integer multiply into pmacsdql/pmacsdqh on XOP systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mxop -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef long __m128i  __attribute__ ((__vector_size__ (16), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m128i i_align;
+  int i32[SIZE];
+  long i64[SIZE];
+} a, b, c, d;
+
+void
+imul32_to_64 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.i64[i] = ((long)b.i32[i]) * ((long)c.i32[i]);
+}
+
+int main ()
+{
+  imul32_to_64 ();
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vpmacsdql" } } */
+/* { dg-final { scan-assembler "vpmacsdqh" } } */
Index: testsuite/gcc.target/i386/xop-rotate3-vector.c
===================================================================
--- testsuite/gcc.target/i386/xop-rotate3-vector.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-rotate3-vector.c	(revision 0)
@@ -0,0 +1,34 @@
+/* Test that the compiler properly optimizes vector rotate instructions
+   into prot on XOP systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mxop -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef long __m128i  __attribute__ ((__vector_size__ (16), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m128i i_align;
+  unsigned u32[SIZE];
+} a, b, c;
+
+void
+vector_rotate32 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.u32[i] = (b.u32[i] >> ((sizeof (int) * 8) - c.u32[i])) | (b.u32[i] << c.u32[i]);
+}
+
+int main ()
+{
+  vector_rotate32 ();
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vprotd" } } */
Index: testsuite/gcc.target/i386/xop-hsubX.c
===================================================================
--- testsuite/gcc.target/i386/xop-hsubX.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-hsubX.c	(revision 0)
@@ -0,0 +1,131 @@
+/* { dg-do run } */
+/* { dg-require-effective-target xop } */
+/* { dg-options "-O2 -mxop" } */
+
+#include "xop-check.h"
+
+#include <x86intrin.h>
+#include <string.h>
+
+#define NUM 10
+
+union
+{
+  __m128i x[NUM];
+  signed char ssi[NUM * 16];
+  short si[NUM * 8];
+  int li[NUM * 4];
+  long long lli[NUM * 2];
+} dst, res, src1;
+
+static void
+init_sbyte ()
+{
+  int i;
+  for (i=0; i < NUM * 16; i++)
+    src1.ssi[i] = i;
+}
+
+static void
+init_sword ()
+{
+  int i;
+  for (i=0; i < NUM * 8; i++)
+    src1.si[i] = i;
+}
+
+
+static void
+init_sdword ()
+{
+  int i;
+  for (i=0; i < NUM * 4; i++)
+    src1.li[i] = i;
+}
+
+static int 
+check_sbyte2word ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < NUM * 16; i = i + 16)
+    {
+      for (j = 0; j < 8; j++)
+	{
+	  t = i + (2 * j);
+	  s = (i / 2) + j;
+	  res.si[s] = src1.ssi[t] - src1.ssi[t + 1] ;
+	  if (res.si[s] != dst.si[s]) 
+	    check_fails++;	
+	}
+    }
+  return check_fails;
+}
+
+static int
+check_sword2dword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < (NUM * 8); i = i + 8)
+    {
+      for (j = 0; j < 4; j++)
+	{
+	  t = i + (2 * j);
+	  s = (i / 2) + j;
+	  res.li[s] = src1.si[t] - src1.si[t + 1] ;
+	  if (res.li[s] != dst.li[s]) 
+	    check_fails++;	
+	}
+    }
+  return check_fails;
+}
+
+static int
+check_dword2qword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < (NUM * 4); i = i + 4)
+    {
+      for (j = 0; j < 2; j++)
+	{
+	  t = i + (2 * j);
+	  s = (i / 2) + j;
+	  res.lli[s] = src1.li[t] - src1.li[t + 1] ;
+	  if (res.lli[s] != dst.lli[s]) 
+	    check_fails++;	
+	}
+    }
+  return check_fails;
+}
+
+static void
+xop_test (void)
+{
+  int i;
+  
+  /* Check hsubbw */
+  init_sbyte ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_hsubw_epi8 (src1.x[i]);
+  
+  if (check_sbyte2word())
+    abort ();
+  
+
+  /* Check hsubwd */
+  init_sword ();
+
+  for (i = 0; i < (NUM ); i++)
+    dst.x[i] = _mm_hsubd_epi16 (src1.x[i]);
+  
+  if (check_sword2dword())
+    abort (); 
+   
+   /* Check hsubdq */
+  init_sdword ();
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_hsubq_epi32 (src1.x[i]);
+  
+  if (check_dword2qword())
+    abort ();
+}
Index: testsuite/gcc.target/i386/xop-pcmov.c
===================================================================
--- testsuite/gcc.target/i386/xop-pcmov.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-pcmov.c	(revision 0)
@@ -0,0 +1,23 @@
+/* Test that the compiler properly optimizes conditional floating point moves
+   into the pcmov instruction on XOP systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mxop" } */
+
+extern void exit (int);
+
+double dbl_test (double a, double b, double c, double d)
+{
+  return (a > b) ? c : d;
+}
+
+double dbl_a = 1, dbl_b = 2, dbl_c = 3, dbl_d = 4, dbl_e;
+
+int main()
+{
+  dbl_e = dbl_test (dbl_a, dbl_b, dbl_c, dbl_d);
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vpcmov" } } */
Index: testsuite/gcc.target/i386/xop-imul64-vector.c
===================================================================
--- testsuite/gcc.target/i386/xop-imul64-vector.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-imul64-vector.c	(revision 0)
@@ -0,0 +1,36 @@
+/* Test that the compiler properly optimizes a vector 64-bit integer
+   multiply into a pmacsdd/phadddq/pmacsdql sequence on XOP systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mxop -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef long __m128i  __attribute__ ((__vector_size__ (16), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m128i i_align;
+  long i64[SIZE];
+} a, b, c, d;
+
+void
+imul64 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.i64[i] = b.i64[i] * c.i64[i];
+}
+
+int main ()
+{
+  imul64 ();
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vpmacsdd" } } */
+/* { dg-final { scan-assembler "vphadddq" } } */
+/* { dg-final { scan-assembler "vpmacsdql" } } */
Index: testsuite/gcc.target/i386/xop-shift1-vector.c
===================================================================
--- testsuite/gcc.target/i386/xop-shift1-vector.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-shift1-vector.c	(revision 0)
@@ -0,0 +1,35 @@
+/* Test that the compiler properly optimizes vector shift instructions into
+   psha/pshl on XOP systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mxop -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef long __m128i  __attribute__ ((__vector_size__ (16), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m128i i_align;
+  int i32[SIZE];
+  unsigned u32[SIZE];
+} a, b, c;
+
+void
+left_shift32 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.i32[i] = b.i32[i] << c.i32[i];
+}
+
+int main ()
+{
+  left_shift32 ();
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vpshad" } } */
Index: testsuite/gcc.target/i386/xop-pcmov2.c
===================================================================
--- testsuite/gcc.target/i386/xop-pcmov2.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-pcmov2.c	(revision 0)
@@ -0,0 +1,23 @@
+/* Test that the compiler properly optimizes conditional floating point moves
+   into the pcmov instruction on XOP systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mxop" } */
+
+extern void exit (int);
+
+float flt_test (float a, float b, float c, float d)
+{
+  return (a > b) ? c : d;
+}
+
+float flt_a = 1, flt_b = 2, flt_c = 3, flt_d = 4, flt_e;
+
+int main()
+{
+  flt_e = flt_test (flt_a, flt_b, flt_c, flt_d);
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vpcmov" } } */
Index: testsuite/gcc.target/i386/xop-shift2-vector.c
===================================================================
--- testsuite/gcc.target/i386/xop-shift2-vector.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-shift2-vector.c	(revision 0)
@@ -0,0 +1,35 @@
+/* Test that the compiler properly optimizes vector shift instructions into
+   psha/pshl on XOP systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mxop -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef long __m128i  __attribute__ ((__vector_size__ (16), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m128i i_align;
+  int i32[SIZE];
+  unsigned u32[SIZE];
+} a, b, c;
+
+void
+right_sign_shift32 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.i32[i] = b.i32[i] >> c.i32[i];
+}
+
+int main ()
+{
+  right_sign_shift32 ();
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vpshad" } } */
Index: testsuite/gcc.target/i386/xop-hadduX.c
===================================================================
--- testsuite/gcc.target/i386/xop-hadduX.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-hadduX.c	(revision 0)
@@ -0,0 +1,210 @@
+/* { dg-do run } */
+/* { dg-require-effective-target xop } */
+/* { dg-options "-O2 -mxop" } */
+
+#include "xop-check.h"
+
+#include <x86intrin.h>
+#include <string.h>
+
+#define NUM 10
+
+union
+{
+  __m128i x[NUM];
+  unsigned char  ssi[NUM * 16];
+  unsigned short si[NUM * 8];
+  unsigned int li[NUM * 4];
+  unsigned long long  lli[NUM * 2];
+} dst, res, src1;
+
+static void
+init_byte ()
+{
+  int i;
+  for (i=0; i < NUM * 16; i++)
+    src1.ssi[i] = i;
+}
+
+static void
+init_word ()
+{
+  int i;
+  for (i=0; i < NUM * 8; i++)
+    src1.si[i] = i;
+}
+
+
+static void
+init_dword ()
+{
+  int i;
+  for (i=0; i < NUM * 4; i++)
+    src1.li[i] = i;
+}
+
+static int 
+check_byte2word ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < NUM * 16; i = i + 16)
+    {
+      for (j = 0; j < 8; j++)
+	{
+	  t = i + (2 * j);
+	  s = (i / 2) + j;
+	  res.si[s] = src1.ssi[t] + src1.ssi[t + 1] ;
+	  if (res.si[s] != dst.si[s]) 
+	    check_fails++;	
+	}
+    }
+  return check_fails;
+}
+
+static int 
+check_byte2dword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < NUM * 16; i = i + 16)
+    {
+      for (j = 0; j < 4; j++)
+	{
+	  t = i + (4 * j);
+	  s = (i / 4) + j;
+	  res.li[s] = (src1.ssi[t] + src1.ssi[t + 1]) + (src1.ssi[t + 2]
+	              + src1.ssi[t + 3]); 
+	  if (res.li[s] != dst.li[s]) 
+	    check_fails++;
+	}
+    }
+  return check_fails;
+}
+
+static int
+check_byte2qword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < NUM * 16; i = i + 16)
+    {
+      for (j = 0; j < 2; j++)
+	{
+	  t = i + (8 * j);
+	  s = (i / 8) + j;
+	  res.lli[s] = ((src1.ssi[t] + src1.ssi[t + 1]) + (src1.ssi[t + 2] 
+		       + src1.ssi[t + 3])) + ((src1.ssi[t + 4] + src1.ssi[t +5])
+	               + (src1.ssi[t + 6] + src1.ssi[t + 7])); 
+	  if (res.lli[s] != dst.lli[s]) 
+	    check_fails++;
+	}
+    }
+  return check_fails;
+}
+
+static int
+check_word2dword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < (NUM * 8); i = i + 8)
+    {
+      for (j = 0; j < 4; j++)
+	{
+	  t = i + (2 * j);
+	  s = (i / 2) + j;
+	  res.li[s] = src1.si[t] + src1.si[t + 1] ;
+	  if (res.li[s] != dst.li[s]) 
+	    check_fails++;	
+	}
+    }
+  return check_fails;
+}
+
+static int 
+check_word2qword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < NUM * 8; i = i + 8)
+    {
+      for (j = 0; j < 2; j++)
+	{
+	  t = i + (4 * j);
+	  s = (i / 4) + j;
+	  res.lli[s] = (src1.si[t] + src1.si[t + 1]) + (src1.si[t + 2]
+	               + src1.si[t + 3]); 
+	  if (res.lli[s] != dst.lli[s]) 
+	    check_fails++;
+	}
+    }
+  return check_fails;
+}
+
+static int
+check_dword2qword ()
+{
+  int i, j, s, t, check_fails = 0;
+  for (i = 0; i < (NUM * 4); i = i + 4)
+    {
+      for (j = 0; j < 2; j++)
+	{
+	  t = i + (2 * j);
+	  s = (i / 2) + j;
+	  res.lli[s] = src1.li[t] + src1.li[t + 1] ;
+	  if (res.lli[s] != dst.lli[s]) 
+	    check_fails++;	
+	}
+    }
+  return check_fails;
+}
+
+static void
+xop_test (void)
+{
+  int i;
+  
+  /* Check haddubw */
+  init_byte ();
+  
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_haddw_epu8 (src1.x[i]);
+  
+  if (check_byte2word())
+    abort ();
+  
+  /* Check haddubd */
+  for (i = 0; i < (NUM ); i++)
+    dst.x[i] = _mm_haddd_epu8 (src1.x[i]);
+  
+  if (check_byte2dword())
+    abort (); 
+  
+  /* Check haddubq */
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_haddq_epu8 (src1.x[i]);
+  
+  if (check_byte2qword())
+    abort ();
+
+  /* Check hadduwd */
+  init_word ();
+
+  for (i = 0; i < (NUM ); i++)
+    dst.x[i] = _mm_haddd_epu16 (src1.x[i]);
+  
+  if (check_word2dword())
+    abort (); 
+   
+  /* Check haddbuwq */
+ 
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_haddq_epu16 (src1.x[i]);
+  
+  if (check_word2qword())
+    abort ();
+ 
+  /* Check hadudq */
+  init_dword ();
+  for (i = 0; i < NUM; i++)
+    dst.x[i] = _mm_haddq_epu32 (src1.x[i]);
+  
+  if (check_dword2qword())
+    abort ();
+}
Index: testsuite/gcc.target/i386/xop-shift3-vector.c
===================================================================
--- testsuite/gcc.target/i386/xop-shift3-vector.c	(revision 0)
+++ testsuite/gcc.target/i386/xop-shift3-vector.c	(revision 0)
@@ -0,0 +1,35 @@
+/* Test that the compiler properly optimizes vector shift instructions into
+   psha/pshl on XOP systems.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mxop -ftree-vectorize" } */
+
+extern void exit (int);
+
+typedef long __m128i  __attribute__ ((__vector_size__ (16), __may_alias__));
+
+#define SIZE 10240
+
+union {
+  __m128i i_align;
+  int i32[SIZE];
+  unsigned u32[SIZE];
+} a, b, c;
+
+void
+right_uns_shift32 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    a.u32[i] = b.u32[i] >> c.i32[i];
+}
+
+int main ()
+{
+  right_uns_shift32 ();
+  exit (0);
+}
+
+/* { dg-final { scan-assembler "vpshld" } } */
Index: config.gcc
===================================================================
--- config.gcc	(revision 152317)
+++ config.gcc	(working copy)
@@ -287,7 +287,7 @@ i[34567]86-*-*)
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
-		       immintrin.h x86intrin.h avxintrin.h 
+		       immintrin.h x86intrin.h avxintrin.h xopintrin.h
 		       ia32intrin.h cross-stdarg.h"
 	;;
 x86_64-*-*)
@@ -297,7 +297,7 @@ x86_64-*-*)
 	extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
 		       pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
 		       nmmintrin.h bmmintrin.h fma4intrin.h wmmintrin.h
-		       immintrin.h x86intrin.h avxintrin.h 
+		       immintrin.h x86intrin.h avxintrin.h xopintrin.h 
 		       ia32intrin.h cross-stdarg.h"
 	need_64bit_hwint=yes
 	;;
Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 152317)
+++ config/i386/i386.h	(working copy)
@@ -55,6 +55,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define TARGET_FMA	OPTION_ISA_FMA
 #define TARGET_SSE4A	OPTION_ISA_SSE4A
 #define TARGET_FMA4	OPTION_ISA_FMA4
+#define TARGET_XOP	OPTION_ISA_XOP
 #define TARGET_ROUND	OPTION_ISA_ROUND
 #define TARGET_ABM	OPTION_ISA_ABM
 #define TARGET_POPCNT	OPTION_ISA_POPCNT
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 152317)
+++ config/i386/i386.md	(working copy)
@@ -57,6 +57,7 @@
 ;; X -- don't print any sort of PIC '@' suffix for a symbol.
 ;; & -- print some in-use local-dynamic symbol name.
 ;; H -- print a memory address offset by 8; used for sse high-parts
+;; Y -- print condition for XOP pcom* instruction.
 ;; + -- print a branch hint as 'cs' or 'ds' prefix
 ;; ; -- print a semicolon (after prefixes due to bug in older gas).
 
@@ -199,6 +200,11 @@
    (UNSPEC_FMA4_INTRINSIC	150)
    (UNSPEC_FMA4_FMADDSUB	151)
    (UNSPEC_FMA4_FMSUBADD	152)
+   (UNSPEC_XOP_UNSIGNED_CMP	153)
+   (UNSPEC_XOP_TRUEFALSE	154)
+   (UNSPEC_XOP_PERMUTE		155)
+   (UNSPEC_FRCZ			156)
+
    ; For AES support
    (UNSPEC_AESENC		159)
    (UNSPEC_AESENCLAST		160)
@@ -253,6 +259,20 @@
    (COM_TRUE_P			5)
   ])
 
+;; Constants used in the XOP pperm instruction
+(define_constants
+  [(PPERM_SRC			0x00)	/* copy source */
+   (PPERM_INVERT		0x20)	/* invert source */
+   (PPERM_REVERSE		0x40)	/* bit reverse source */
+   (PPERM_REV_INV		0x60)	/* bit reverse & invert src */
+   (PPERM_ZERO			0x80)	/* all 0's */
+   (PPERM_ONES			0xa0)	/* all 1's */
+   (PPERM_SIGN			0xc0)	/* propagate sign bit */
+   (PPERM_INV_SIGN		0xe0)	/* invert & propagate sign */
+   (PPERM_SRC1			0x00)	/* use first source byte */
+   (PPERM_SRC2			0x10)	/* use second source byte */
+   ])
+
 ;; Registers by name.
 (define_constants
   [(AX_REG			 0)
@@ -20609,6 +20629,20 @@
   [(set_attr "type" "fcmov")
    (set_attr "mode" "XF")])
 
+;; All moves in XOP pcmov instructions are 128 bits and hence we restrict
+;; the scalar versions to have only XMM registers as operands.
+
+;; XOP conditional move
+(define_insn "*xop_pcmov_<mode>"
+  [(set (match_operand:MODEF 0 "register_operand" "=x,x")
+	(if_then_else:MODEF
+	  (match_operand:MODEF 1 "register_operand" "x,x")
+	  (match_operand:MODEF 2 "register_operand" "x,x")
+	  (match_operand:MODEF 3 "register_operand" "x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "vpcmov\t{%1, %3, %2, %0|%0, %2, %3, %1}"
+  [(set_attr "type" "sse4arg")])
+
 ;; These versions of the min/max patterns are intentionally ignorant of
 ;; their behavior wrt -0.0 and NaN (via the commutative operand mark).
 ;; Since both the tree-level MAX_EXPR and the rtl-level SMAX operator
Index: config/i386/cpuid.h
===================================================================
--- config/i386/cpuid.h	(revision 152317)
+++ config/i386/cpuid.h	(working copy)
@@ -49,6 +49,7 @@
 #define bit_LAHF_LM	(1 << 0)
 #define bit_SSE4a	(1 << 6)
 #define bit_FMA4	(1 << 16)
+#define bit_XOP		(1 << 11)
 
 /* %edx */
 #define bit_LM		(1 << 29)
Index: config/i386/x86intrin.h
===================================================================
--- config/i386/x86intrin.h	(revision 152317)
+++ config/i386/x86intrin.h	(working copy)
@@ -58,6 +58,10 @@
 #include <fma4intrin.h>
 #endif
 
+#ifdef __XOP__
+#include <xopintrin.h>
+#endif
+
 #if defined (__AES__) || defined (__PCLMUL__)
 #include <wmmintrin.h>
 #endif
Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md	(revision 152317)
+++ config/i386/sse.md	(working copy)
@@ -86,6 +86,9 @@
 
 (define_mode_attr ssemodesuffixf2c [(V4SF "s") (V2DF "d")])
 
+;; Mapping of the max integer size for xop rotate immediate constraint
+(define_mode_attr sserotatemax [(V16QI "7") (V8HI "15") (V4SI "31") (V2DI "63")])
+
 ;; Mapping of vector modes back to the scalar modes
 (define_mode_attr ssescalarmode [(V4SF "SF") (V2DF "DF")
 				 (V16QI "QI") (V8HI "HI")
@@ -1455,7 +1458,8 @@
 	(match_operator:SSEMODEF4 3 "sse_comparison_operator"
 		[(match_operand:SSEMODEF4 1 "register_operand" "0")
 		 (match_operand:SSEMODEF4 2 "nonimmediate_operand" "xm")]))]
-  "(SSE_FLOAT_MODE_P (<MODE>mode) || SSE_VEC_FLOAT_MODE_P (<MODE>mode))"
+  "(SSE_FLOAT_MODE_P (<MODE>mode) || SSE_VEC_FLOAT_MODE_P (<MODE>mode))
+   && !TARGET_XOP"
   "cmp%D3<ssemodesuffixf4>\t{%2, %0|%0, %2}"
   [(set_attr "type" "ssecmp")
    (set_attr "length_immediate" "1")
@@ -5248,9 +5252,40 @@
   "&& 1"
   [(const_int 0)]
 {
-  rtx t[12];
+  rtx t[12], op[3];
   int i;
 
+  if (TARGET_XOP)
+    {
+      /* On XOP, we can take advantage of the vpperm instruction to pack and
+	 unpack the bytes.  Unpack data such that we've got a source byte in
+	 each low byte of each word.  We don't care what goes into the high
+	 byte, so put 0 there.  */
+      for (i = 0; i < 6; ++i)
+        t[i] = gen_reg_rtx (V8HImode);
+
+      for (i = 0; i < 2; i++)
+        {
+          op[0] = t[i];
+          op[1] = operands[i+1];
+          ix86_expand_xop_unpack (op, true, true);		/* high bytes */
+
+          op[0] = t[i+2];
+          ix86_expand_xop_unpack (op, true, false);		/* low bytes */
+        }
+
+      /* Multiply words.  */
+      emit_insn (gen_mulv8hi3 (t[4], t[0], t[1]));		/* high bytes */
+      emit_insn (gen_mulv8hi3 (t[5], t[2], t[3]));		/* low  bytes */
+
+      /* Pack the low byte of each word back into a single xmm */
+      op[0] = operands[0];
+      op[1] = t[5];
+      op[2] = t[4];
+      ix86_expand_xop_pack (op);
+      DONE;
+    }
+
   for (i = 0; i < 12; ++i)
     t[i] = gen_reg_rtx (V16QImode);
 
@@ -5614,7 +5649,7 @@
 		   (match_operand:V4SI 2 "register_operand" "")))]
   "TARGET_SSE2"
 {
-  if (TARGET_SSE4_1)
+  if (TARGET_SSE4_1 || TARGET_XOP)
     ix86_fixup_binary_operands_no_copy (MULT, V4SImode, operands);
 })
 
@@ -5639,11 +5674,36 @@
    (set_attr "prefix_extra" "1")
    (set_attr "mode" "TI")])
 
+;; We don't have a straight 32-bit parallel multiply on XOP, so fake it with a
+;; multiply/add.  In general, we expect the define_split to occur before
+;; register allocation, so we have to handle the corner case where the target
+;; is the same as one of the inputs.
+(define_insn_and_split "*xop_mulv4si3"
+  [(set (match_operand:V4SI 0 "register_operand" "=&x")
+	(mult:V4SI (match_operand:V4SI 1 "register_operand" "%x")
+		   (match_operand:V4SI 2 "nonimmediate_operand" "xm")))]
+  "TARGET_XOP"
+  "#"
+  "&& (reload_completed
+       || (!reg_mentioned_p (operands[0], operands[1])
+	   && !reg_mentioned_p (operands[0], operands[2])))"
+  [(set (match_dup 0)
+	(match_dup 3))
+   (set (match_dup 0)
+	(plus:V4SI (mult:V4SI (match_dup 1)
+			      (match_dup 2))
+		   (match_dup 0)))]
+{
+  operands[3] = CONST0_RTX (V4SImode);
+}
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
 (define_insn_and_split "*sse2_mulv4si3"
   [(set (match_operand:V4SI 0 "register_operand" "")
 	(mult:V4SI (match_operand:V4SI 1 "register_operand" "")
 		   (match_operand:V4SI 2 "register_operand" "")))]
-  "TARGET_SSE2 && !TARGET_SSE4_1
+  "TARGET_SSE2 && !TARGET_SSE4_1 && !TARGET_XOP
    && can_create_pseudo_p ()"
   "#"
   "&& 1"
@@ -5705,6 +5765,42 @@
   rtx t1, t2, t3, t4, t5, t6, thirtytwo;
   rtx op0, op1, op2;
 
+  if (TARGET_XOP)
+    {
+      /* op1: A,B,C,D, op2: E,F,G,H */
+      op0 = operands[0];
+      op1 = gen_lowpart (V4SImode, operands[1]);
+      op2 = gen_lowpart (V4SImode, operands[2]);
+      t1 = gen_reg_rtx (V4SImode);
+      t2 = gen_reg_rtx (V4SImode);
+      t3 = gen_reg_rtx (V4SImode);
+      t4 = gen_reg_rtx (V2DImode);
+      t5 = gen_reg_rtx (V2DImode);
+
+      /* t1: B,A,D,C */
+      emit_insn (gen_sse2_pshufd_1 (t1, op1,
+				    GEN_INT (1),
+				    GEN_INT (0),
+				    GEN_INT (3),
+				    GEN_INT (2)));
+
+      /* t2: 0 */
+      emit_move_insn (t2, CONST0_RTX (V4SImode));
+
+      /* t3: (B*E),(A*F),(D*G),(C*H) */
+      emit_insn (gen_xop_pmacsdd (t3, t1, op2, t2));
+
+      /* t4: (B*E)+(A*F), (D*G)+(C*H) */
+      emit_insn (gen_xop_phadddq (t4, t3));
+
+      /* t5: ((B*E)+(A*F))<<32, ((D*G)+(C*H))<<32 */
+      emit_insn (gen_ashlv2di3 (t5, t4, GEN_INT (32)));
+
+      /* op0: (((B*E)+(A*F))<<32)+(B*F), (((D*G)+(C*H))<<32)+(D*H) */
+      emit_insn (gen_xop_pmacsdql (op0, op1, op2, t5));
+      DONE;
+    }
+
   op0 = operands[0];
   op1 = operands[1];
   op2 = operands[2];
@@ -5820,6 +5916,56 @@
   DONE;
 })
 
+(define_expand "vec_widen_smult_hi_v4si"
+  [(match_operand:V2DI 0 "register_operand" "")
+   (match_operand:V4SI 1 "register_operand" "")
+   (match_operand:V4SI 2 "register_operand" "")]
+  "TARGET_XOP"
+{
+  rtx t1, t2;
+
+  t1 = gen_reg_rtx (V4SImode);
+  t2 = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_sse2_pshufd_1 (t1, operands[1],
+				GEN_INT (0),
+				GEN_INT (2),
+				GEN_INT (1),
+				GEN_INT (3)));
+  emit_insn (gen_sse2_pshufd_1 (t2, operands[2],
+				GEN_INT (0),
+				GEN_INT (2),
+				GEN_INT (1),
+				GEN_INT (3)));
+  emit_insn (gen_xop_mulv2div2di3_high (operands[0], t1, t2));
+  DONE;
+})
+
+(define_expand "vec_widen_smult_lo_v4si"
+  [(match_operand:V2DI 0 "register_operand" "")
+   (match_operand:V4SI 1 "register_operand" "")
+   (match_operand:V4SI 2 "register_operand" "")]
+  "TARGET_XOP"
+{
+  rtx t1, t2;
+
+  t1 = gen_reg_rtx (V4SImode);
+  t2 = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_sse2_pshufd_1 (t1, operands[1],
+				GEN_INT (0),
+				GEN_INT (2),
+				GEN_INT (1),
+				GEN_INT (3)));
+  emit_insn (gen_sse2_pshufd_1 (t2, operands[2],
+				GEN_INT (0),
+				GEN_INT (2),
+				GEN_INT (1),
+				GEN_INT (3)));
+  emit_insn (gen_xop_mulv2div2di3_low (operands[0], t1, t2));
+  DONE;
+})
+
 (define_expand "vec_widen_umult_hi_v4si"
   [(match_operand:V2DI 0 "register_operand" "")
    (match_operand:V4SI 1 "register_operand" "")
@@ -6217,7 +6364,7 @@
 	(eq:SSEMODE124
 	  (match_operand:SSEMODE124 1 "nonimmediate_operand" "")
 	  (match_operand:SSEMODE124 2 "nonimmediate_operand" "")))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 && !TARGET_XOP "
   "ix86_fixup_binary_operands_no_copy (EQ, <MODE>mode, operands);")
 
 (define_insn "*avx_eq<mode>3"
@@ -6240,7 +6387,7 @@
 	(eq:SSEMODE124
 	  (match_operand:SSEMODE124 1 "nonimmediate_operand" "%0")
 	  (match_operand:SSEMODE124 2 "nonimmediate_operand" "xm")))]
-  "TARGET_SSE2
+  "TARGET_SSE2 && !TARGET_XOP
    && ix86_binary_operator_ok (EQ, <MODE>mode, operands)"
   "pcmpeq<ssevecsize>\t{%2, %0|%0, %2}"
   [(set_attr "type" "ssecmp")
@@ -6286,7 +6433,7 @@
 	(gt:SSEMODE124
 	  (match_operand:SSEMODE124 1 "register_operand" "0")
 	  (match_operand:SSEMODE124 2 "nonimmediate_operand" "xm")))]
-  "TARGET_SSE2"
+  "TARGET_SSE2 && !TARGET_XOP"
   "pcmpgt<ssevecsize>\t{%2, %0|%0, %2}"
   [(set_attr "type" "ssecmp")
    (set_attr "prefix_data16" "1")
@@ -6505,6 +6652,12 @@
 {
   rtx op1, op2, h1, l1, h2, l2, h3, l3;
 
+  if (TARGET_XOP)
+    {
+      ix86_expand_xop_pack (operands);
+      DONE;	
+    }	
+ 
   op1 = gen_lowpart (V16QImode, operands[1]);
   op2 = gen_lowpart (V16QImode, operands[2]);
   h1 = gen_reg_rtx (V16QImode);
@@ -6540,6 +6693,12 @@
 {
   rtx op1, op2, h1, l1, h2, l2;
 
+  if (TARGET_XOP)
+    {
+      ix86_expand_xop_pack (operands);
+      DONE;	
+    }	
+
   op1 = gen_lowpart (V8HImode, operands[1]);
   op2 = gen_lowpart (V8HImode, operands[2]);
   h1 = gen_reg_rtx (V8HImode);
@@ -6569,6 +6728,12 @@
 {
   rtx op1, op2, h1, l1;
 
+  if (TARGET_XOP)
+    {
+      ix86_expand_xop_pack (operands);
+      DONE;	
+    }	
+
   op1 = gen_lowpart (V4SImode, operands[1]);
   op2 = gen_lowpart (V4SImode, operands[2]);
   h1 = gen_reg_rtx (V4SImode);
@@ -7780,6 +7945,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, true, true);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, true, true);
   else
     ix86_expand_sse_unpack (operands, true, true);
   DONE;
@@ -7792,6 +7959,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, false, true);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, false, true);
   else
     ix86_expand_sse_unpack (operands, false, true);
   DONE;
@@ -7804,6 +7973,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, true, false);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, true, false);
   else
     ix86_expand_sse_unpack (operands, true, false);
   DONE;
@@ -7816,6 +7987,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, false, false);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, false, false);
   else
     ix86_expand_sse_unpack (operands, false, false);
   DONE;
@@ -7828,6 +8001,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, true, true);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, true, true);
   else
     ix86_expand_sse_unpack (operands, true, true);
   DONE;
@@ -7840,6 +8015,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, false, true);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, false, true);
   else
     ix86_expand_sse_unpack (operands, false, true);
   DONE;
@@ -7852,6 +8029,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, true, false);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, true, false);
   else
     ix86_expand_sse_unpack (operands, true, false);
   DONE;
@@ -7864,6 +8043,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, false, false);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, false, false);
   else
     ix86_expand_sse_unpack (operands, false, false);
   DONE;
@@ -7876,6 +8057,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, true, true);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, true, true);
   else
     ix86_expand_sse_unpack (operands, true, true);
   DONE;
@@ -7888,6 +8071,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, false, true);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, false, true);
   else
     ix86_expand_sse_unpack (operands, false, true);
   DONE;
@@ -7900,6 +8085,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, true, false);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, true, false);
   else
     ix86_expand_sse_unpack (operands, true, false);
   DONE;
@@ -7912,6 +8099,8 @@
 {
   if (TARGET_SSE4_1)
     ix86_expand_sse4_unpack (operands, false, false);
+  else if (TARGET_XOP)
+    ix86_expand_xop_unpack (operands, false, false);
   else
     ix86_expand_sse_unpack (operands, false, false);
   DONE;
@@ -10364,6 +10553,1546 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
+;; XOP instructions
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+;; XOP parallel integer multiply/add instructions.
+;; Note the instruction does not allow the value being added to be a memory
+;; operation.  However by pretending via the nonimmediate_operand predicate
+;; that it does and splitting it later allows the following to be recognized:
+;;	a[i] = b[i] * c[i] + d[i];
+(define_insn "xop_pmacsww"
+  [(set (match_operand:V8HI 0 "register_operand" "=x,x,x")
+        (plus:V8HI
+	 (mult:V8HI
+	  (match_operand:V8HI 1 "nonimmediate_operand" "%x,x,xm")
+	  (match_operand:V8HI 2 "nonimmediate_operand" "x,xm,x"))
+	 (match_operand:V8HI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)"
+  "@
+   vpmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacsww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+;; Split pmacsww with two memory operands into a load and the pmacsww.
+(define_split
+  [(set (match_operand:V8HI 0 "register_operand" "")
+	(plus:V8HI
+	 (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "")
+		    (match_operand:V8HI 2 "nonimmediate_operand" ""))
+	 (match_operand:V8HI 3 "nonimmediate_operand" "")))]
+  "TARGET_XOP
+   && !ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)
+   && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)
+   && !reg_mentioned_p (operands[0], operands[1])
+   && !reg_mentioned_p (operands[0], operands[2])
+   && !reg_mentioned_p (operands[0], operands[3])"
+  [(const_int 0)]
+{
+  ix86_expand_fma4_multiple_memory (operands, 4, V8HImode);
+  emit_insn (gen_xop_pmacsww (operands[0], operands[1], operands[2],
+			      operands[3]));
+  DONE;
+})
+
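+;; For illustration, a loop such as the following is the kind of source
+;; that should end up as a single vpmacsww per eight elements once the
+;; pattern and split above match:
+;;
+;;	short a[256], b[256], c[256], d[256];
+;;	void f (void)
+;;	{
+;;	  int i;
+;;	  for (i = 0; i < 256; i++)
+;;	    a[i] = b[i] * c[i] + d[i];
+;;	}
+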
+(define_insn "xop_pmacssww"
+  [(set (match_operand:V8HI 0 "register_operand" "=x,x,x")
+        (ss_plus:V8HI
+	 (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "%x,x,m")
+		    (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x"))
+	 (match_operand:V8HI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
+  "@
+   vpmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacssww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+;; Note the instruction does not allow the value being added to be a memory
+;; operand.  However, by pretending via the nonimmediate_operand predicate
+;; that it does, and splitting the insn later, the following can be recognized:
+;;	a[i] = b[i] * c[i] + d[i];
+(define_insn "xop_pmacsdd"
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+        (plus:V4SI
+	 (mult:V4SI
+	  (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+	  (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x"))
+	 (match_operand:V4SI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)"
+  "@
+   vpmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacsdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+;; Split pmacsdd with two memory operands into a load and the pmacsdd.
+(define_split
+  [(set (match_operand:V4SI 0 "register_operand" "")
+	(plus:V4SI
+	 (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "")
+		    (match_operand:V4SI 2 "nonimmediate_operand" ""))
+	 (match_operand:V4SI 3 "nonimmediate_operand" "")))]
+  "TARGET_XOP
+   && !ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)
+   && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)
+   && !reg_mentioned_p (operands[0], operands[1])
+   && !reg_mentioned_p (operands[0], operands[2])
+   && !reg_mentioned_p (operands[0], operands[3])"
+  [(const_int 0)]
+{
+  ix86_expand_fma4_multiple_memory (operands, 4, V4SImode);
+  emit_insn (gen_xop_pmacsdd (operands[0], operands[1], operands[2],
+			      operands[3]));
+  DONE;
+})
+
+(define_insn "xop_pmacssdd"
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+        (ss_plus:V4SI
+	 (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+		    (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x"))
+	 (match_operand:V4SI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
+  "@
+   vpmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacssdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pmacssdql"
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+	(ss_plus:V2DI
+	 (mult:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+	    (parallel [(const_int 1)
+		       (const_int 3)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x")
+	    (parallel [(const_int 1)
+		       (const_int 3)]))))
+	 (match_operand:V2DI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
+  "@
+   vpmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacssdql\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pmacssdqh"
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+	(ss_plus:V2DI
+	 (mult:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+	    (parallel [(const_int 0)
+		       (const_int 2)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x")
+	    (parallel [(const_int 0)
+		       (const_int 2)]))))
+	 (match_operand:V2DI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
+  "@
+   vpmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacssdqh\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pmacsdql"
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+	(plus:V2DI
+	 (mult:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+	    (parallel [(const_int 1)
+		       (const_int 3)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x")
+	    (parallel [(const_int 1)
+		       (const_int 3)]))))
+	 (match_operand:V2DI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
+  "@
+   vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacsdql\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+(define_insn_and_split "*xop_pmacsdql_mem"
+  [(set (match_operand:V2DI 0 "register_operand" "=&x,&x,&x")
+	(plus:V2DI
+	 (mult:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+	    (parallel [(const_int 1)
+		       (const_int 3)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x")
+	    (parallel [(const_int 1)
+		       (const_int 3)]))))
+	 (match_operand:V2DI 3 "memory_operand" "m,m,m")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, -1, true)"
+  "#"
+  "&& (reload_completed
+       || (!reg_mentioned_p (operands[0], operands[1])
+	   && !reg_mentioned_p (operands[0], operands[2])))"
+  [(set (match_dup 0)
+	(match_dup 3))
+   (set (match_dup 0)
+	(plus:V2DI
+	 (mult:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_dup 1)
+	    (parallel [(const_int 1)
+		       (const_int 3)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_dup 2)
+	    (parallel [(const_int 1)
+		       (const_int 3)]))))
+	 (match_dup 0)))])
+
+;; We don't have a straight 32-bit parallel multiply and extend on XOP, so
+;; fake it with a multiply/add.  In general, we expect the define_split to
+;; occur before register allocation, so we have to handle the corner case
+;; where the target is the same as either operands[1] or operands[2].
+(define_insn_and_split "xop_mulv2div2di3_low"
+  [(set (match_operand:V2DI 0 "register_operand" "=&x")
+	(mult:V2DI
+	  (sign_extend:V2DI
+	    (vec_select:V2SI
+	      (match_operand:V4SI 1 "nonimmediate_operand" "%x")
+	      (parallel [(const_int 1)
+			 (const_int 3)])))
+	  (sign_extend:V2DI
+	    (vec_select:V2SI
+	      (match_operand:V4SI 2 "nonimmediate_operand" "xm")
+	      (parallel [(const_int 1)
+			 (const_int 3)])))))]
+  "TARGET_XOP"
+  "#"
+  "&& (reload_completed
+       || (!reg_mentioned_p (operands[0], operands[1])
+	   && !reg_mentioned_p (operands[0], operands[2])))"
+  [(set (match_dup 0)
+	(match_dup 3))
+   (set (match_dup 0)
+	(plus:V2DI
+	 (mult:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_dup 1)
+	    (parallel [(const_int 1)
+		       (const_int 3)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_dup 2)
+	    (parallel [(const_int 1)
+		       (const_int 3)]))))
+	 (match_dup 0)))]
+{
+  operands[3] = CONST0_RTX (V2DImode);
+}
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
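+;; A sketch of the trick used above: since there is no plain widening
+;; 32x32->64 multiply, the split rewrites
+;;	dest = sext (a) * sext (b)
+;; as
+;;	dest = 0;  dest = sext (a) * sext (b) + dest;
+;; i.e. a clear of the destination followed by the multiply/add insn.
+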
+(define_insn "xop_pmacsdqh"
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x")
+	(plus:V2DI
+	 (mult:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+	    (parallel [(const_int 0)
+		       (const_int 2)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x")
+	    (parallel [(const_int 0)
+		       (const_int 2)]))))
+	 (match_operand:V2DI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
+  "@
+   vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacsdqh\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+(define_insn_and_split "*xop_pmacsdqh_mem"
+  [(set (match_operand:V2DI 0 "register_operand" "=&x,&x,&x")
+	(plus:V2DI
+	 (mult:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,x,m")
+	    (parallel [(const_int 0)
+		       (const_int 2)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x")
+	    (parallel [(const_int 0)
+		       (const_int 2)]))))
+	 (match_operand:V2DI 3 "memory_operand" "m,m,m")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, -1, true)"
+  "#"
+  "&& (reload_completed
+       || (!reg_mentioned_p (operands[0], operands[1])
+	   && !reg_mentioned_p (operands[0], operands[2])))"
+  [(set (match_dup 0)
+	(match_dup 3))
+   (set (match_dup 0)
+	(plus:V2DI
+	 (mult:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_dup 1)
+	    (parallel [(const_int 0)
+		       (const_int 2)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_dup 2)
+	    (parallel [(const_int 0)
+		       (const_int 2)]))))
+	 (match_dup 0)))])
+
+;; We don't have a straight 32-bit parallel multiply and extend on XOP, so
+;; fake it with a multiply/add.  In general, we expect the define_split to
+;; occur before register allocation, so we have to handle the corner case where
+;; the target is the same as either operands[1] or operands[2].
+(define_insn_and_split "xop_mulv2div2di3_high"
+  [(set (match_operand:V2DI 0 "register_operand" "=&x")
+	(mult:V2DI
+	  (sign_extend:V2DI
+	    (vec_select:V2SI
+	      (match_operand:V4SI 1 "nonimmediate_operand" "%x")
+	      (parallel [(const_int 0)
+			 (const_int 2)])))
+	  (sign_extend:V2DI
+	    (vec_select:V2SI
+	      (match_operand:V4SI 2 "nonimmediate_operand" "xm")
+	      (parallel [(const_int 0)
+			 (const_int 2)])))))]
+  "TARGET_XOP"
+  "#"
+  "&& (reload_completed
+       || (!reg_mentioned_p (operands[0], operands[1])
+	   && !reg_mentioned_p (operands[0], operands[2])))"
+  [(set (match_dup 0)
+	(match_dup 3))
+   (set (match_dup 0)
+	(plus:V2DI
+	 (mult:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_dup 1)
+	    (parallel [(const_int 0)
+		       (const_int 2)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2SI
+	    (match_dup 2)
+	    (parallel [(const_int 0)
+		       (const_int 2)]))))
+	 (match_dup 0)))]
+{
+  operands[3] = CONST0_RTX (V2DImode);
+}
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+;; XOP parallel integer multiply/add instructions for the intrinsics
+(define_insn "xop_pmacsswd"
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+	(ss_plus:V4SI
+	 (mult:V4SI
+	  (sign_extend:V4SI
+	   (vec_select:V4HI
+	    (match_operand:V8HI 1 "nonimmediate_operand" "%x,x,m")
+	    (parallel [(const_int 1)
+		       (const_int 3)
+		       (const_int 5)
+		       (const_int 7)])))
+	  (sign_extend:V4SI
+	   (vec_select:V4HI
+	    (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x")
+	    (parallel [(const_int 1)
+		       (const_int 3)
+		       (const_int 5)
+		       (const_int 7)]))))
+	 (match_operand:V4SI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
+  "@
+   vpmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacsswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pmacswd"
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+	(plus:V4SI
+	 (mult:V4SI
+	  (sign_extend:V4SI
+	   (vec_select:V4HI
+	    (match_operand:V8HI 1 "nonimmediate_operand" "%x,x,m")
+	    (parallel [(const_int 1)
+		       (const_int 3)
+		       (const_int 5)
+		       (const_int 7)])))
+	  (sign_extend:V4SI
+	   (vec_select:V4HI
+	    (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x")
+	    (parallel [(const_int 1)
+		       (const_int 3)
+		       (const_int 5)
+		       (const_int 7)]))))
+	 (match_operand:V4SI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
+  "@
+   vpmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmacswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pmadcsswd"
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+	(ss_plus:V4SI
+	 (plus:V4SI
+	  (mult:V4SI
+	   (sign_extend:V4SI
+	    (vec_select:V4HI
+	     (match_operand:V8HI 1 "nonimmediate_operand" "%x,x,m")
+	     (parallel [(const_int 0)
+			(const_int 2)
+			(const_int 4)
+			(const_int 6)])))
+	   (sign_extend:V4SI
+	    (vec_select:V4HI
+	     (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x")
+	     (parallel [(const_int 0)
+			(const_int 2)
+			(const_int 4)
+			(const_int 6)]))))
+	  (mult:V4SI
+	   (sign_extend:V4SI
+	    (vec_select:V4HI
+	     (match_dup 1)
+	     (parallel [(const_int 1)
+			(const_int 3)
+			(const_int 5)
+			(const_int 7)])))
+	   (sign_extend:V4SI
+	    (vec_select:V4HI
+	     (match_dup 2)
+	     (parallel [(const_int 1)
+			(const_int 3)
+			(const_int 5)
+			(const_int 7)])))))
+	 (match_operand:V4SI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
+  "@
+   vpmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmadcsswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pmadcswd"
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+	(plus:V4SI
+	 (plus:V4SI
+	  (mult:V4SI
+	   (sign_extend:V4SI
+	    (vec_select:V4HI
+	     (match_operand:V8HI 1 "nonimmediate_operand" "%x,x,m")
+	     (parallel [(const_int 0)
+			(const_int 2)
+			(const_int 4)
+			(const_int 6)])))
+	   (sign_extend:V4SI
+	    (vec_select:V4HI
+	     (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x")
+	     (parallel [(const_int 0)
+			(const_int 2)
+			(const_int 4)
+			(const_int 6)]))))
+	  (mult:V4SI
+	   (sign_extend:V4SI
+	    (vec_select:V4HI
+	     (match_dup 1)
+	     (parallel [(const_int 1)
+			(const_int 3)
+			(const_int 5)
+			(const_int 7)])))
+	   (sign_extend:V4SI
+	    (vec_select:V4HI
+	     (match_dup 2)
+	     (parallel [(const_int 1)
+			(const_int 3)
+			(const_int 5)
+			(const_int 7)])))))
+	 (match_operand:V4SI 3 "register_operand" "x,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
+  "@
+   vpmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpmadcswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "TI")])
+
+;; XOP parallel XMM conditional moves
+(define_insn "xop_pcmov_<mode>"
+  [(set (match_operand:SSEMODE 0 "register_operand" "=x,x,x")
+	(if_then_else:SSEMODE
+	  (match_operand:SSEMODE 3 "nonimmediate_operand" "x,x,xm")
+	  (match_operand:SSEMODE 1 "vector_move_operand" "x,xm,x")
+	  (match_operand:SSEMODE 2 "vector_move_operand" "xm,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "@
+   vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "sse4arg")])
+
+(define_insn "xop_pcmov_<mode>256"
+  [(set (match_operand:AVX256MODE 0 "register_operand" "=x,x,x")
+	(if_then_else:AVX256MODE
+	  (match_operand:AVX256MODE 3 "nonimmediate_operand" "x,x,xm")
+	  (match_operand:AVX256MODE 1 "vector_move_operand" "x,xm,x")
+	  (match_operand:AVX256MODE 2 "vector_move_operand" "xm,x,x")))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "@
+   vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "sse4arg")])
+
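+;; vpcmov is a bitwise select: each result bit comes from operand 1 where
+;; the corresponding bit of operand 3 is set, and from operand 2 where it
+;; is clear.  A scalar C sketch of the same operation:
+;;
+;;	unsigned long long
+;;	pcmov (unsigned long long a, unsigned long long b,
+;;	       unsigned long long sel)
+;;	{
+;;	  return (a & sel) | (b & ~sel);
+;;	}
+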
+;; XOP horizontal add/subtract instructions
+(define_insn "xop_phaddbw"
+  [(set (match_operand:V8HI 0 "register_operand" "=x")
+	(plus:V8HI
+	 (sign_extend:V8HI
+	  (vec_select:V8QI
+	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	   (parallel [(const_int 0)
+		      (const_int 2)
+		      (const_int 4)
+		      (const_int 6)
+		      (const_int 8)
+		      (const_int 10)
+		      (const_int 12)
+		      (const_int 14)])))
+	 (sign_extend:V8HI
+	  (vec_select:V8QI
+	   (match_dup 1)
+	   (parallel [(const_int 1)
+		      (const_int 3)
+		      (const_int 5)
+		      (const_int 7)
+		      (const_int 9)
+		      (const_int 11)
+		      (const_int 13)
+		      (const_int 15)])))))]
+  "TARGET_XOP"
+  "vphaddbw\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
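+;; Per the RTL above, vphaddbw adds adjacent signed byte pairs, i.e. each
+;; result word i is (int16) src[2*i] + (int16) src[2*i+1]; the remaining
+;; vphadd/vphsub patterns below follow the same even/odd selection scheme.
+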
+(define_insn "xop_phaddbd"
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
+	(plus:V4SI
+	 (plus:V4SI
+	  (sign_extend:V4SI
+	   (vec_select:V4QI
+	    (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	    (parallel [(const_int 0)
+		       (const_int 4)
+		       (const_int 8)
+		       (const_int 12)])))
+	  (sign_extend:V4SI
+	   (vec_select:V4QI
+	    (match_dup 1)
+	    (parallel [(const_int 1)
+		       (const_int 5)
+		       (const_int 9)
+		       (const_int 13)]))))
+	 (plus:V4SI
+	  (sign_extend:V4SI
+	   (vec_select:V4QI
+	    (match_dup 1)
+	    (parallel [(const_int 2)
+		       (const_int 6)
+		       (const_int 10)
+		       (const_int 14)])))
+	  (sign_extend:V4SI
+	   (vec_select:V4QI
+	    (match_dup 1)
+	    (parallel [(const_int 3)
+		       (const_int 7)
+		       (const_int 11)
+		       (const_int 15)]))))))]
+  "TARGET_XOP"
+  "vphaddbd\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phaddbq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(plus:V2DI
+	 (plus:V2DI
+	  (plus:V2DI
+	   (sign_extend:V2DI
+	    (vec_select:V2QI
+	     (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	     (parallel [(const_int 0)
+			(const_int 4)])))
+	   (sign_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 1)
+			(const_int 5)]))))
+	  (plus:V2DI
+	   (sign_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 2)
+			(const_int 6)])))
+	   (sign_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 3)
+			(const_int 7)])))))
+	 (plus:V2DI
+	  (plus:V2DI
+	   (sign_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 8)
+			(const_int 12)])))
+	   (sign_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 9)
+			(const_int 13)]))))
+	  (plus:V2DI
+	   (sign_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 10)
+			(const_int 14)])))
+	   (sign_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 11)
+			(const_int 15)])))))))]
+  "TARGET_XOP"
+  "vphaddbq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phaddwd"
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
+	(plus:V4SI
+	 (sign_extend:V4SI
+	  (vec_select:V4HI
+	   (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+	   (parallel [(const_int 0)
+		      (const_int 2)
+		      (const_int 4)
+		      (const_int 6)])))
+	 (sign_extend:V4SI
+	  (vec_select:V4HI
+	   (match_dup 1)
+	   (parallel [(const_int 1)
+		      (const_int 3)
+		      (const_int 5)
+		      (const_int 7)])))))]
+  "TARGET_XOP"
+  "vphaddwd\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phaddwq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(plus:V2DI
+	 (plus:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2HI
+	    (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+	    (parallel [(const_int 0)
+		       (const_int 4)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2HI
+	    (match_dup 1)
+	    (parallel [(const_int 1)
+		       (const_int 5)]))))
+	 (plus:V2DI
+	  (sign_extend:V2DI
+	   (vec_select:V2HI
+	    (match_dup 1)
+	    (parallel [(const_int 2)
+		       (const_int 6)])))
+	  (sign_extend:V2DI
+	   (vec_select:V2HI
+	    (match_dup 1)
+	    (parallel [(const_int 3)
+		       (const_int 7)]))))))]
+  "TARGET_XOP"
+  "vphaddwq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phadddq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(plus:V2DI
+	 (sign_extend:V2DI
+	  (vec_select:V2SI
+	   (match_operand:V4SI 1 "nonimmediate_operand" "xm")
+	   (parallel [(const_int 0)
+		      (const_int 2)])))
+	 (sign_extend:V2DI
+	  (vec_select:V2SI
+	   (match_dup 1)
+	   (parallel [(const_int 1)
+		      (const_int 3)])))))]
+  "TARGET_XOP"
+  "vphadddq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phaddubw"
+  [(set (match_operand:V8HI 0 "register_operand" "=x")
+	(plus:V8HI
+	 (zero_extend:V8HI
+	  (vec_select:V8QI
+	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	   (parallel [(const_int 0)
+		      (const_int 2)
+		      (const_int 4)
+		      (const_int 6)
+		      (const_int 8)
+		      (const_int 10)
+		      (const_int 12)
+		      (const_int 14)])))
+	 (zero_extend:V8HI
+	  (vec_select:V8QI
+	   (match_dup 1)
+	   (parallel [(const_int 1)
+		      (const_int 3)
+		      (const_int 5)
+		      (const_int 7)
+		      (const_int 9)
+		      (const_int 11)
+		      (const_int 13)
+		      (const_int 15)])))))]
+  "TARGET_XOP"
+  "vphaddubw\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phaddubd"
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
+	(plus:V4SI
+	 (plus:V4SI
+	  (zero_extend:V4SI
+	   (vec_select:V4QI
+	    (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	    (parallel [(const_int 0)
+		       (const_int 4)
+		       (const_int 8)
+		       (const_int 12)])))
+	  (zero_extend:V4SI
+	   (vec_select:V4QI
+	    (match_dup 1)
+	    (parallel [(const_int 1)
+		       (const_int 5)
+		       (const_int 9)
+		       (const_int 13)]))))
+	 (plus:V4SI
+	  (zero_extend:V4SI
+	   (vec_select:V4QI
+	    (match_dup 1)
+	    (parallel [(const_int 2)
+		       (const_int 6)
+		       (const_int 10)
+		       (const_int 14)])))
+	  (zero_extend:V4SI
+	   (vec_select:V4QI
+	    (match_dup 1)
+	    (parallel [(const_int 3)
+		       (const_int 7)
+		       (const_int 11)
+		       (const_int 15)]))))))]
+  "TARGET_XOP"
+  "vphaddubd\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phaddubq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(plus:V2DI
+	 (plus:V2DI
+	  (plus:V2DI
+	   (zero_extend:V2DI
+	    (vec_select:V2QI
+	     (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	     (parallel [(const_int 0)
+			(const_int 4)])))
+	   (zero_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 1)
+			(const_int 5)]))))
+	  (plus:V2DI
+	   (zero_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 2)
+			(const_int 6)])))
+	   (zero_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 3)
+			(const_int 7)])))))
+	 (plus:V2DI
+	  (plus:V2DI
+	   (zero_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 8)
+			(const_int 12)])))
+	   (zero_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 9)
+			(const_int 13)]))))
+	  (plus:V2DI
+	   (zero_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 10)
+			(const_int 14)])))
+	   (zero_extend:V2DI
+	    (vec_select:V2QI
+	     (match_dup 1)
+	     (parallel [(const_int 11)
+			(const_int 15)])))))))]
+  "TARGET_XOP"
+  "vphaddubq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phadduwd"
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
+	(plus:V4SI
+	 (zero_extend:V4SI
+	  (vec_select:V4HI
+	   (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+	   (parallel [(const_int 0)
+		      (const_int 2)
+		      (const_int 4)
+		      (const_int 6)])))
+	 (zero_extend:V4SI
+	  (vec_select:V4HI
+	   (match_dup 1)
+	   (parallel [(const_int 1)
+		      (const_int 3)
+		      (const_int 5)
+		      (const_int 7)])))))]
+  "TARGET_XOP"
+  "vphadduwd\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phadduwq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(plus:V2DI
+	 (plus:V2DI
+	  (zero_extend:V2DI
+	   (vec_select:V2HI
+	    (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+	    (parallel [(const_int 0)
+		       (const_int 4)])))
+	  (zero_extend:V2DI
+	   (vec_select:V2HI
+	    (match_dup 1)
+	    (parallel [(const_int 1)
+		       (const_int 5)]))))
+	 (plus:V2DI
+	  (zero_extend:V2DI
+	   (vec_select:V2HI
+	    (match_dup 1)
+	    (parallel [(const_int 2)
+		       (const_int 6)])))
+	  (zero_extend:V2DI
+	   (vec_select:V2HI
+	    (match_dup 1)
+	    (parallel [(const_int 3)
+		       (const_int 7)]))))))]
+  "TARGET_XOP"
+  "vphadduwq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phaddudq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(plus:V2DI
+	 (zero_extend:V2DI
+	  (vec_select:V2SI
+	   (match_operand:V4SI 1 "nonimmediate_operand" "xm")
+	   (parallel [(const_int 0)
+		      (const_int 2)])))
+	 (zero_extend:V2DI
+	  (vec_select:V2SI
+	   (match_dup 1)
+	   (parallel [(const_int 1)
+		      (const_int 3)])))))]
+  "TARGET_XOP"
+  "vphaddudq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phsubbw"
+  [(set (match_operand:V8HI 0 "register_operand" "=x")
+	(minus:V8HI
+	 (sign_extend:V8HI
+	  (vec_select:V8QI
+	   (match_operand:V16QI 1 "nonimmediate_operand" "xm")
+	   (parallel [(const_int 0)
+		      (const_int 2)
+		      (const_int 4)
+		      (const_int 6)
+		      (const_int 8)
+		      (const_int 10)
+		      (const_int 12)
+		      (const_int 14)])))
+	 (sign_extend:V8HI
+	  (vec_select:V8QI
+	   (match_dup 1)
+	   (parallel [(const_int 1)
+		      (const_int 3)
+		      (const_int 5)
+		      (const_int 7)
+		      (const_int 9)
+		      (const_int 11)
+		      (const_int 13)
+		      (const_int 15)])))))]
+  "TARGET_XOP"
+  "vphsubbw\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phsubwd"
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
+	(minus:V4SI
+	 (sign_extend:V4SI
+	  (vec_select:V4HI
+	   (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+	   (parallel [(const_int 0)
+		      (const_int 2)
+		      (const_int 4)
+		      (const_int 6)])))
+	 (sign_extend:V4SI
+	  (vec_select:V4HI
+	   (match_dup 1)
+	   (parallel [(const_int 1)
+		      (const_int 3)
+		      (const_int 5)
+		      (const_int 7)])))))]
+  "TARGET_XOP"
+  "vphsubwd\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+(define_insn "xop_phsubdq"
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
+	(minus:V2DI
+	 (sign_extend:V2DI
+	  (vec_select:V2SI
+	   (match_operand:V4SI 1 "nonimmediate_operand" "xm")
+	   (parallel [(const_int 0)
+		      (const_int 2)])))
+	 (sign_extend:V2DI
+	  (vec_select:V2SI
+	   (match_dup 1)
+	   (parallel [(const_int 1)
+		      (const_int 3)])))))]
+  "TARGET_XOP"
+  "vphsubdq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sseiadd1")])
+
+;; XOP permute instructions
+(define_insn "xop_pperm"
+  [(set (match_operand:V16QI 0 "register_operand" "=x,x,x")
+	(unspec:V16QI
+	  [(match_operand:V16QI 1 "nonimmediate_operand" "x,x,xm")
+	   (match_operand:V16QI 2 "nonimmediate_operand" "x,xm,x")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x")]
+	  UNSPEC_XOP_PERMUTE))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "sse4arg")
+   (set_attr "mode" "TI")])
+
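+;; vpperm builds each destination byte from the 32-byte concatenation of
+;; the two sources: the low five bits of the corresponding selector byte
+;; pick the source byte (0-15 first source, 16-31 second source), and the
+;; high bits select an optional per-byte operation.  The unpack and pack
+;; patterns below rely on a precomputed selector in this format.
+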
+;; The following are for the various unpack insns which don't need the first
+;; source operand, so we can just use the output operand for the first operand.
+;; This allows either of the other two operands to be a memory operand.  We
+;; can't just use the first operand as an argument to the normal pperm because
+;; then an output-only argument would suddenly become an input operand.
+(define_insn "xop_pperm_zero_v16qi_v8hi"
+  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+	(zero_extend:V8HI
+	 (vec_select:V8QI
+	  (match_operand:V16QI 1 "nonimmediate_operand" "xm,x")
+	  (match_operand 2 "" ""))))	;; parallel with const_int's
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "x,xm"))]
+  "TARGET_XOP
+   && (register_operand (operands[1], V16QImode)
+       || register_operand (operands[3], V16QImode))"
+  "vpperm\t{%3, %1, %0, %0|%0, %0, %1, %3}"
+  [(set_attr "type" "sseadd")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pperm_sign_v16qi_v8hi"
+  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+	(sign_extend:V8HI
+	 (vec_select:V8QI
+	  (match_operand:V16QI 1 "nonimmediate_operand" "xm,x")
+	  (match_operand 2 "" ""))))	;; parallel with const_int's
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "x,xm"))]
+  "TARGET_XOP
+   && (register_operand (operands[1], V16QImode)
+       || register_operand (operands[3], V16QImode))"
+  "vpperm\t{%3, %1, %0, %0|%0, %0, %1, %3}"
+  [(set_attr "type" "sseadd")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pperm_zero_v8hi_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+	(zero_extend:V4SI
+	 (vec_select:V4HI
+	  (match_operand:V8HI 1 "nonimmediate_operand" "xm,x")
+	  (match_operand 2 "" ""))))	;; parallel with const_int's
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "x,xm"))]
+  "TARGET_XOP
+   && (register_operand (operands[1], V8HImode)
+       || register_operand (operands[3], V16QImode))"
+  "vpperm\t{%3, %1, %0, %0|%0, %0, %1, %3}"
+  [(set_attr "type" "sseadd")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pperm_sign_v8hi_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+	(sign_extend:V4SI
+	 (vec_select:V4HI
+	  (match_operand:V8HI 1 "nonimmediate_operand" "xm,x")
+	  (match_operand 2 "" ""))))	;; parallel with const_int's
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "x,xm"))]
+  "TARGET_XOP
+   && (register_operand (operands[1], V8HImode)
+       || register_operand (operands[3], V16QImode))"
+  "vpperm\t{%3, %1, %0, %0|%0, %0, %1, %3}"
+  [(set_attr "type" "sseadd")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pperm_zero_v4si_v2di"
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+	(zero_extend:V2DI
+	 (vec_select:V2SI
+	  (match_operand:V4SI 1 "nonimmediate_operand" "xm,x")
+	  (match_operand 2 "" ""))))	;; parallel with const_int's
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "x,xm"))]
+  "TARGET_XOP
+   && (register_operand (operands[1], V4SImode)
+       || register_operand (operands[3], V16QImode))"
+  "vpperm\t{%3, %1, %0, %0|%0, %0, %1, %3}"
+  [(set_attr "type" "sseadd")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pperm_sign_v4si_v2di"
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+	(sign_extend:V2DI
+	 (vec_select:V2SI
+	  (match_operand:V4SI 1 "nonimmediate_operand" "xm,x")
+	  (match_operand 2 "" ""))))	;; parallel with const_int's
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "x,xm"))]
+  "TARGET_XOP
+   && (register_operand (operands[1], V4SImode)
+       || register_operand (operands[3], V16QImode))"
+  "vpperm\t{%3, %1, %0, %0|%0, %0, %1, %3}"
+  [(set_attr "type" "sseadd")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "mode" "TI")])
+
+;; XOP pack instructions that combine two vectors into a smaller vector
+(define_insn "xop_pperm_pack_v2di_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+	(vec_concat:V4SI
+	 (truncate:V2SI
+	  (match_operand:V2DI 1 "nonimmediate_operand" "x,x,xm"))
+	 (truncate:V2SI
+	  (match_operand:V2DI 2 "nonimmediate_operand" "x,xm,x"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "sse4arg")
+   (set_attr "mode" "TI")])
+
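+;; For the V2DI -> V4SI pack above, a selector keeping the low 32 bits of
+;; each 64-bit element would be, assuming the byte numbering described
+;; earlier (0-15 first source, 16-31 second source):
+;;
+;;	{ 0, 1, 2, 3,  8, 9, 10, 11,  16, 17, 18, 19,  24, 25, 26, 27 }
+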
+(define_insn "xop_pperm_pack_v4si_v8hi"
+  [(set (match_operand:V8HI 0 "register_operand" "=x,x,x")
+	(vec_concat:V8HI
+	 (truncate:V4HI
+	  (match_operand:V4SI 1 "nonimmediate_operand" "x,x,xm"))
+	 (truncate:V4HI
+	  (match_operand:V4SI 2 "nonimmediate_operand" "x,xm,x"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "sse4arg")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_pperm_pack_v8hi_v16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=x,x,x")
+	(vec_concat:V16QI
+	 (truncate:V8QI
+	  (match_operand:V8HI 1 "nonimmediate_operand" "x,x,xm"))
+	 (truncate:V8QI
+	  (match_operand:V8HI 2 "nonimmediate_operand" "x,xm,x"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "sse4arg")
+   (set_attr "mode" "TI")])
+
+;; XOP packed rotate instructions
+(define_expand "rotl<mode>3"
+  [(set (match_operand:SSEMODE1248 0 "register_operand" "")
+	(rotate:SSEMODE1248
+	 (match_operand:SSEMODE1248 1 "nonimmediate_operand" "")
+	 (match_operand:SI 2 "general_operand")))]
+  "TARGET_XOP"
+{
+  /* If we were given a scalar, convert it to a parallel.  */
+  if (! const_0_to_<sserotatemax>_operand (operands[2], SImode))
+    {
+      rtvec vs = rtvec_alloc (<ssescalarnum>);
+      rtx par = gen_rtx_PARALLEL (<MODE>mode, vs);
+      rtx reg = gen_reg_rtx (<MODE>mode);
+      rtx op2 = operands[2];
+      int i;
+
+      if (GET_MODE (op2) != <ssescalarmode>mode)
+        {
+	  op2 = gen_reg_rtx (<ssescalarmode>mode);
+	  convert_move (op2, operands[2], false);
+	}
+
+      for (i = 0; i < <ssescalarnum>; i++)
+	RTVEC_ELT (vs, i) = op2;
+
+      emit_insn (gen_vec_init<mode> (reg, par));
+      emit_insn (gen_xop_vrotl<mode>3 (operands[0], operands[1], reg));
+      DONE;
+    }
+})
+
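+;; When the rotate count is not a usable immediate, the expanders above
+;; broadcast the scalar count into every element of a temporary vector,
+;; because the non-immediate vprot form takes a per-element count.
+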
+(define_expand "rotr<mode>3"
+  [(set (match_operand:SSEMODE1248 0 "register_operand" "")
+	(rotatert:SSEMODE1248
+	 (match_operand:SSEMODE1248 1 "nonimmediate_operand" "")
+	 (match_operand:SI 2 "general_operand")))]
+  "TARGET_XOP"
+{
+  /* If we were given a scalar, convert it to a parallel.  */
+  if (! const_0_to_<sserotatemax>_operand (operands[2], SImode))
+    {
+      rtvec vs = rtvec_alloc (<ssescalarnum>);
+      rtx par = gen_rtx_PARALLEL (<MODE>mode, vs);
+      rtx neg = gen_reg_rtx (<MODE>mode);
+      rtx reg = gen_reg_rtx (<MODE>mode);
+      rtx op2 = operands[2];
+      int i;
+
+      if (GET_MODE (op2) != <ssescalarmode>mode)
+        {
+	  op2 = gen_reg_rtx (<ssescalarmode>mode);
+	  convert_move (op2, operands[2], false);
+	}
+
+      for (i = 0; i < <ssescalarnum>; i++)
+	RTVEC_ELT (vs, i) = op2;
+
+      emit_insn (gen_vec_init<mode> (reg, par));
+      emit_insn (gen_neg<mode>2 (neg, reg));
+      emit_insn (gen_xop_vrotl<mode>3 (operands[0], operands[1], neg));
+      DONE;
+    }
+})
+
+(define_insn "xop_rotl<mode>3"
+  [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
+	(rotate:SSEMODE1248
+	 (match_operand:SSEMODE1248 1 "nonimmediate_operand" "xm")
+	 (match_operand:SI 2 "const_0_to_<sserotatemax>_operand" "n")))]
+  "TARGET_XOP"
+  "vprot<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sseishft")
+   (set_attr "length_immediate" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_rotr<mode>3"
+  [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
+	(rotatert:SSEMODE1248
+	 (match_operand:SSEMODE1248 1 "nonimmediate_operand" "xm")
+	 (match_operand:SI 2 "const_0_to_<sserotatemax>_operand" "n")))]
+  "TARGET_XOP"
+{
+  operands[3] = GEN_INT (GET_MODE_BITSIZE (<ssescalarmode>mode)
+			 - INTVAL (operands[2]));
+  return "vprot<ssevecsize>\t{%3, %1, %0|%0, %1, %3}";
+}
+  [(set_attr "type" "sseishft")
+   (set_attr "length_immediate" "1")
+   (set_attr "mode" "TI")])
+
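+;; A right rotate by N is emitted as vprot with an immediate of
+;; (element width in bits) - N: rotating an M-bit element right by N is
+;; the same as rotating it left by M - N.
+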
+(define_expand "vrotr<mode>3"
+  [(match_operand:SSEMODE1248 0 "register_operand" "")
+   (match_operand:SSEMODE1248 1 "register_operand" "")
+   (match_operand:SSEMODE1248 2 "register_operand" "")]
+  "TARGET_XOP"
+{
+  rtx reg = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_neg<mode>2 (reg, operands[2]));
+  emit_insn (gen_xop_vrotl<mode>3 (operands[0], operands[1], reg));
+  DONE;
+})
+
+(define_expand "vrotl<mode>3"
+  [(match_operand:SSEMODE1248 0 "register_operand" "")
+   (match_operand:SSEMODE1248 1 "register_operand" "")
+   (match_operand:SSEMODE1248 2 "register_operand" "")]
+  "TARGET_XOP"
+{
+  emit_insn (gen_xop_vrotl<mode>3 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "xop_vrotl<mode>3"
+  [(set (match_operand:SSEMODE1248 0 "register_operand" "=x,x")
+	(if_then_else:SSEMODE1248
+	 (ge:SSEMODE1248
+	  (match_operand:SSEMODE1248 2 "nonimmediate_operand" "xm,x")
+	  (const_int 0))
+	 (rotate:SSEMODE1248
+	  (match_operand:SSEMODE1248 1 "nonimmediate_operand" "x,xm")
+	  (match_dup 2))
+	 (rotatert:SSEMODE1248
+	  (match_dup 1)
+	  (neg:SSEMODE1248 (match_dup 2)))))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "vprot<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sseishft")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "mode" "TI")])
+
+;; XOP packed shift instructions.
+;; FIXME: add V2DI back in
+(define_expand "vlshr<mode>3"
+  [(match_operand:SSEMODE124 0 "register_operand" "")
+   (match_operand:SSEMODE124 1 "register_operand" "")
+   (match_operand:SSEMODE124 2 "register_operand" "")]
+  "TARGET_XOP"
+{
+  rtx neg = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_neg<mode>2 (neg, operands[2]));
+  emit_insn (gen_xop_lshl<mode>3 (operands[0], operands[1], neg));
+  DONE;
+})
+
+(define_expand "vashr<mode>3"
+  [(match_operand:SSEMODE124 0 "register_operand" "")
+   (match_operand:SSEMODE124 1 "register_operand" "")
+   (match_operand:SSEMODE124 2 "register_operand" "")]
+  "TARGET_XOP"
+{
+  rtx neg = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_neg<mode>2 (neg, operands[2]));
+  emit_insn (gen_xop_ashl<mode>3 (operands[0], operands[1], neg));
+  DONE;
+})
+
+(define_expand "vashl<mode>3"
+  [(match_operand:SSEMODE124 0 "register_operand" "")
+   (match_operand:SSEMODE124 1 "register_operand" "")
+   (match_operand:SSEMODE124 2 "register_operand" "")]
+  "TARGET_XOP"
+{
+  emit_insn (gen_xop_ashl<mode>3 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "xop_ashl<mode>3"
+  [(set (match_operand:SSEMODE1248 0 "register_operand" "=x,x")
+	(if_then_else:SSEMODE1248
+	 (ge:SSEMODE1248
+	  (match_operand:SSEMODE1248 2 "nonimmediate_operand" "xm,x")
+	  (const_int 0))
+	 (ashift:SSEMODE1248
+	  (match_operand:SSEMODE1248 1 "nonimmediate_operand" "x,xm")
+	  (match_dup 2))
+	 (ashiftrt:SSEMODE1248
+	  (match_dup 1)
+	  (neg:SSEMODE1248 (match_dup 2)))))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "vpsha<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sseishft")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "mode" "TI")])
+
+(define_insn "xop_lshl<mode>3"
+  [(set (match_operand:SSEMODE1248 0 "register_operand" "=x,x")
+	(if_then_else:SSEMODE1248
+	 (ge:SSEMODE1248
+	  (match_operand:SSEMODE1248 2 "nonimmediate_operand" "xm,x")
+	  (const_int 0))
+	 (ashift:SSEMODE1248
+	  (match_operand:SSEMODE1248 1 "nonimmediate_operand" "x,xm")
+	  (match_dup 2))
+	 (lshiftrt:SSEMODE1248
+	  (match_dup 1)
+	  (neg:SSEMODE1248 (match_dup 2)))))]
+  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "vpshl<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sseishft")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "mode" "TI")])
+
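+;; vpsha (arithmetic) and vpshl (logical) take a per-element signed shift
+;; count: positive counts shift left, negative counts shift right.  That is
+;; why the vlshr and vashr expanders above negate operand 2 before calling
+;; the shift patterns.
+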
+;; SSE2 doesn't have some shift variants, so define versions for XOP
+(define_expand "ashlv16qi3"
+  [(match_operand:V16QI 0 "register_operand" "")
+   (match_operand:V16QI 1 "register_operand" "")
+   (match_operand:SI 2 "nonmemory_operand" "")]
+  "TARGET_XOP"
+{
+  rtvec vs = rtvec_alloc (16);
+  rtx par = gen_rtx_PARALLEL (V16QImode, vs);
+  rtx reg = gen_reg_rtx (V16QImode);
+  int i;
+  for (i = 0; i < 16; i++)
+    RTVEC_ELT (vs, i) = operands[2];
+
+  emit_insn (gen_vec_initv16qi (reg, par));
+  emit_insn (gen_xop_ashlv16qi3 (operands[0], operands[1], reg));
+  DONE;
+})
+
+(define_expand "lshlv16qi3"
+  [(match_operand:V16QI 0 "register_operand" "")
+   (match_operand:V16QI 1 "register_operand" "")
+   (match_operand:SI 2 "nonmemory_operand" "")]
+  "TARGET_XOP"
+{
+  rtvec vs = rtvec_alloc (16);
+  rtx par = gen_rtx_PARALLEL (V16QImode, vs);
+  rtx reg = gen_reg_rtx (V16QImode);
+  int i;
+  for (i = 0; i < 16; i++)
+    RTVEC_ELT (vs, i) = operands[2];
+
+  emit_insn (gen_vec_initv16qi (reg, par));
+  emit_insn (gen_xop_lshlv16qi3 (operands[0], operands[1], reg));
+  DONE;
+})
+
+(define_expand "ashrv16qi3"
+  [(match_operand:V16QI 0 "register_operand" "")
+   (match_operand:V16QI 1 "register_operand" "")
+   (match_operand:SI 2 "nonmemory_operand" "")]
+  "TARGET_XOP"
+{
+  rtvec vs = rtvec_alloc (16);
+  rtx par = gen_rtx_PARALLEL (V16QImode, vs);
+  rtx reg = gen_reg_rtx (V16QImode);
+  int i;
+  rtx ele = ((CONST_INT_P (operands[2]))
+	     ? GEN_INT (- INTVAL (operands[2]))
+	     : operands[2]);
+
+  for (i = 0; i < 16; i++)
+    RTVEC_ELT (vs, i) = ele;
+
+  emit_insn (gen_vec_initv16qi (reg, par));
+
+  if (!CONST_INT_P (operands[2]))
+    {
+      rtx neg = gen_reg_rtx (V16QImode);
+      emit_insn (gen_negv16qi2 (neg, reg));
+      emit_insn (gen_xop_ashlv16qi3 (operands[0], operands[1], neg));
+    }
+  else
+    emit_insn (gen_xop_ashlv16qi3 (operands[0], operands[1], reg));
+
+  DONE;
+})
+
+(define_expand "ashrv2di3"
+  [(match_operand:V2DI 0 "register_operand" "")
+   (match_operand:V2DI 1 "register_operand" "")
+   (match_operand:DI 2 "nonmemory_operand" "")]
+  "TARGET_XOP"
+{
+  rtvec vs = rtvec_alloc (2);
+  rtx par = gen_rtx_PARALLEL (V2DImode, vs);
+  rtx reg = gen_reg_rtx (V2DImode);
+  rtx ele;
+
+  if (CONST_INT_P (operands[2]))
+    ele = GEN_INT (- INTVAL (operands[2]));
+  else if (GET_MODE (operands[2]) != DImode)
+    {
+      rtx move = gen_reg_rtx (DImode);
+      ele = gen_reg_rtx (DImode);
+      convert_move (move, operands[2], false);
+      emit_insn (gen_negdi2 (ele, move));
+    }
+  else
+    {
+      ele = gen_reg_rtx (DImode);
+      emit_insn (gen_negdi2 (ele, operands[2]));
+    }
+
+  RTVEC_ELT (vs, 0) = ele;
+  RTVEC_ELT (vs, 1) = ele;
+  emit_insn (gen_vec_initv2di (reg, par));
+  emit_insn (gen_xop_ashlv2di3 (operands[0], operands[1], reg));
+  DONE;
+})
+
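+;; SSE2 has no 64-bit arithmetic right shift (there is no psraq), so the
+;; expander above supplies one by broadcasting the negated count and using
+;; the variable arithmetic shift.
+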
+;; XOP FRCZ support
+;; parallel insns
+(define_insn "xop_frcz<mode>2"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x")
+	(unspec:SSEMODEF2P
+	 [(match_operand:SSEMODEF2P 1 "nonimmediate_operand" "xm")]
+	 UNSPEC_FRCZ))]
+  "TARGET_XOP"
+  "vfrcz<ssemodesuffixf4>\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecvt1")
+   (set_attr "mode" "<MODE>")])
+
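+;; vfrcz extracts the fractional part of each element (for example,
+;; frcz (2.5) gives 0.5).
+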
+;; scalar insns
+(define_insn "xop_vmfrcz<mode>2"
+  [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x")
+	(vec_merge:SSEMODEF2P
+	  (unspec:SSEMODEF2P
+	   [(match_operand:SSEMODEF2P 2 "nonimmediate_operand" "xm")]
+	   UNSPEC_FRCZ)
+	  (match_operand:SSEMODEF2P 1 "register_operand" "0")
+	  (const_int 1)))]
+  "TARGET_XOP"
+  "vfrcz<ssemodesuffixf2s>\t{%2, %0|%0, %2}"
+  [(set_attr "type" "ssecvt1")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "xop_frcz<mode>2256"
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x")
+	(unspec:FMA4MODEF4
+	 [(match_operand:FMA4MODEF4 1 "nonimmediate_operand" "xm")]
+	 UNSPEC_FRCZ))]
+  "TARGET_XOP"
+  "vfrcz<fma4modesuffixf4>\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecvt1")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "xop_maskcmp<mode>3"
+  [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
+	(match_operator:SSEMODE1248 1 "ix86_comparison_int_operator"
+	 [(match_operand:SSEMODE1248 2 "register_operand" "x")
+	  (match_operand:SSEMODE1248 3 "nonimmediate_operand" "xm")]))]
+  "TARGET_XOP"
+  "vpcom%Y1<ssevecsize>\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "type" "sse4arg")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_rep" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "length_immediate" "1")
+   (set_attr "mode" "TI")])
+
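+;; The %Y1 operand modifier prints the comparison code of operand 1 as its
+;; textual suffix, so e.g. an EQ compare in V16QImode should assemble to
+;; vpcomeqb.
+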
+(define_insn "xop_maskcmp_uns<mode>3"
+  [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
+	(match_operator:SSEMODE1248 1 "ix86_comparison_uns_operator"
+	 [(match_operand:SSEMODE1248 2 "register_operand" "x")
+	  (match_operand:SSEMODE1248 3 "nonimmediate_operand" "xm")]))]
+  "TARGET_XOP"
+  "vpcom%Y1u<ssevecsize>\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "type" "ssecmp")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_rep" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "length_immediate" "1")
+   (set_attr "mode" "TI")])
+
+;; Version of pcom*u* that is called from the intrinsics.  It keeps pcomequ*
+;; and pcomneu* from being converted to the signed forms, in case somebody
+;; needs the exact instruction generated for the intrinsic.
+(define_insn "xop_maskcmp_uns2<mode>3"
+  [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
+	(unspec:SSEMODE1248
+	 [(match_operator:SSEMODE1248 1 "ix86_comparison_uns_operator"
+	  [(match_operand:SSEMODE1248 2 "register_operand" "x")
+	   (match_operand:SSEMODE1248 3 "nonimmediate_operand" "xm")])]
+	 UNSPEC_XOP_UNSIGNED_CMP))]
+  "TARGET_XOP"
+  "vpcom%Y1u<ssevecsize>\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "type" "ssecmp")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "length_immediate" "1")
+   (set_attr "mode" "TI")])
+
+;; Pcomtrue and pcomfalse support.  These are useless instructions, but are
+;; included here for completeness.
+(define_insn "xop_pcom_tf<mode>3"
+  [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
+	(unspec:SSEMODE1248
+	  [(match_operand:SSEMODE1248 1 "register_operand" "x")
+	   (match_operand:SSEMODE1248 2 "nonimmediate_operand" "xm")
+	   (match_operand:SI 3 "const_int_operand" "n")]
+	  UNSPEC_XOP_TRUEFALSE))]
+  "TARGET_XOP"
+{
+  return ((INTVAL (operands[3]) != 0)
+	  ? "vpcomtrue<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
+	  : "vpcomfalse<ssevecsize>\t{%2, %1, %0|%0, %1, %2}");
+}
+  [(set_attr "type" "ssecmp")
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix_extra" "2")
+   (set_attr "length_immediate" "1")
+   (set_attr "mode" "TI")])
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (define_insn "*avx_aesenc"
   [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "x")
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 152317)
+++ config/i386/i386.opt	(working copy)
@@ -314,6 +314,10 @@ mfma4
 Target Report Mask(ISA_FMA4) Var(ix86_isa_flags) VarExists Save
 Support FMA4 built-in functions and code generation 
 
+mxop
+Target Report Mask(ISA_XOP) Var(ix86_isa_flags) VarExists Save
+Support XOP built-in functions and code generation 
+
 mabm
 Target Report Mask(ISA_ABM) Var(ix86_isa_flags) VarExists Save
 Support code generation of Advanced Bit Manipulation (ABM) instructions.
Index: config/i386/i386-c.c
===================================================================
--- config/i386/i386-c.c	(revision 152317)
+++ config/i386/i386-c.c	(working copy)
@@ -232,6 +232,8 @@ ix86_target_macros_internal (int isa_fla
     def_or_undef (parse_in, "__SSE4A__");
   if (isa_flag & OPTION_MASK_ISA_FMA4)
     def_or_undef (parse_in, "__FMA4__");
+  if (isa_flag & OPTION_MASK_ISA_XOP)
+    def_or_undef (parse_in, "__XOP__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE))
     def_or_undef (parse_in, "__SSE_MATH__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE2))
Index: config/i386/i386-protos.h
===================================================================
--- config/i386/i386-protos.h	(revision 152317)
+++ config/i386/i386-protos.h	(working copy)
@@ -113,6 +113,8 @@ extern bool ix86_expand_fp_vcond (rtx[])
 extern bool ix86_expand_int_vcond (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern void ix86_expand_sse4_unpack (rtx[], bool, bool);
+extern void ix86_expand_xop_unpack (rtx[], bool, bool);
+extern void ix86_expand_xop_pack (rtx[]);
 extern int ix86_expand_int_addcc (rtx[]);
 extern void ix86_expand_call (rtx, rtx, rtx, rtx, rtx, int);
 extern void x86_initialize_trampoline (rtx, rtx, rtx);
Index: config/i386/xopintrin.h
===================================================================
--- config/i386/xopintrin.h	(revision 0)
+++ config/i386/xopintrin.h	(revision 0)
@@ -0,0 +1,777 @@
+/* Copyright (C) 2007, 2008, 2009 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _X86INTRIN_H_INCLUDED
+# error "Never use <xopintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _XOPMMINTRIN_H_INCLUDED
+#define _XOPMMINTRIN_H_INCLUDED
+
+#ifndef __XOP__
+# error "XOP instruction set not enabled"
+#else
+
+#include <fma4intrin.h>
+
+/* Integer multiply/add instructions.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maccs_epi16(__m128i __A, __m128i __B, __m128i __C)
+{
+  return (__m128i) __builtin_ia32_vpmacssww ((__v8hi)__A, (__v8hi)__B, (__v8hi)__C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_macc_epi16(__m128i __A, __m128i __B, __m128i __C)
+{
+  return (__m128i) __builtin_ia32_vpmacsww ((__v8hi)__A, (__v8hi)__B, (__v8hi)__C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maccsd_epi16(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpmacsswd ((__v8hi)__A, (__v8hi)__B, (__v4si)__C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maccd_epi16(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpmacswd ((__v8hi)__A, (__v8hi)__B, (__v4si)__C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maccs_epi32(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpmacssdd ((__v4si)__A, (__v4si)__B, (__v4si)__C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_macc_epi32(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpmacsdd ((__v4si)__A, (__v4si)__B, (__v4si)__C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maccslo_epi32(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpmacssdql ((__v4si)__A, (__v4si)__B, (__v2di)__C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_macclo_epi32(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpmacsdql ((__v4si)__A, (__v4si)__B, (__v2di)__C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maccshi_epi32(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpmacssdqh ((__v4si)__A, (__v4si)__B, (__v2di)__C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_macchi_epi32(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpmacsdqh ((__v4si)__A, (__v4si)__B, (__v2di)__C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maddsd_epi16(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpmadcsswd ((__v8hi)__A, (__v8hi)__B, (__v4si)__C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maddd_epi16(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpmadcswd ((__v8hi)__A, (__v8hi)__B, (__v4si)__C);
+}
+
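+/* As a usage sketch (the helper name is illustrative only), a lane-wise
+   signed 16-bit multiply-accumulate computing B * C + D in each of the
+   eight lanes:
+
+     static __inline __m128i
+     madd16 (__m128i __B, __m128i __C, __m128i __D)
+     {
+       return _mm_macc_epi16 (__B, __C, __D);
+     }  */
+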
+/* Packed Integer Horizontal Add and Subtract */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddw_epi8(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphaddbw ((__v16qi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddd_epi8(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphaddbd ((__v16qi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddq_epi8(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphaddbq ((__v16qi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddd_epi16(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphaddwd ((__v8hi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddq_epi16(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphaddwq ((__v8hi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddq_epi32(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphadddq ((__v4si)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddw_epu8(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphaddubw ((__v16qi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddd_epu8(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphaddubd ((__v16qi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddq_epu8(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphaddubq ((__v16qi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddd_epu16(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphadduwd ((__v8hi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddq_epu16(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphadduwq ((__v8hi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_haddq_epu32(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphaddudq ((__v4si)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_hsubw_epi8(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphsubbw ((__v16qi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_hsubd_epi16(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphsubwd ((__v8hi)__A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_hsubq_epi32(__m128i __A)
+{
+  return  (__m128i) __builtin_ia32_vphsubdq ((__v4si)__A);
+}
+
+/* Vector conditional move and permute */
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmov_si128(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpcmov (__A, __B, __C);
+}
+
+extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cmov_si256(__m256i __A, __m256i __B, __m256i __C)
+{
+  return  (__m256i) __builtin_ia32_vpcmov256 (__A, __B, __C);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C)
+{
+  return  (__m128i) __builtin_ia32_vpperm ((__v16qi)__A, (__v16qi)__B, (__v16qi)__C);
+}
+
+/* Packed Integer Rotates and Shifts
+   Rotates - Non-Immediate form */
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rot_epi8(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vprotb ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rot_epi16(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vprotw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rot_epi32(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vprotd ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_rot_epi64(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vprotq ((__v2di)__A, (__v2di)__B);
+}
+
+/* Rotates - Immediate form */
+
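+/* The rotate count is expected to be a compile-time constant.  Under
+   __OPTIMIZE__ the always_inline wrappers below fold it into the
+   builtin; otherwise the equivalent macro forms are used so that the
+   immediate reaches the builtin directly.  */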
+#ifdef __OPTIMIZE__
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roti_epi8(__m128i __A, const int __B)
+{
+  return  (__m128i) __builtin_ia32_vprotbi ((__v16qi)__A, __B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roti_epi16(__m128i __A, const int __B)
+{
+  return  (__m128i) __builtin_ia32_vprotwi ((__v8hi)__A, __B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roti_epi32(__m128i __A, const int __B)
+{
+  return  (__m128i) __builtin_ia32_vprotdi ((__v4si)__A, __B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_roti_epi64(__m128i __A, const int __B)
+{
+  return  (__m128i) __builtin_ia32_vprotqi ((__v2di)__A, __B);
+}
+#else
+#define _mm_roti_epi8(A, N) \
+  ((__m128i) __builtin_ia32_vprotbi ((__v16qi)(__m128i)(A), (int)(N)))
+#define _mm_roti_epi16(A, N) \
+  ((__m128i) __builtin_ia32_vprotwi ((__v8hi)(__m128i)(A), (int)(N)))
+#define _mm_roti_epi32(A, N) \
+  ((__m128i) __builtin_ia32_vprotdi ((__v4si)(__m128i)(A), (int)(N)))
+#define _mm_roti_epi64(A, N) \
+  ((__m128i) __builtin_ia32_vprotqi ((__v2di)(__m128i)(A), (int)(N)))
+#endif
+
+/* Shifts */
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_shl_epi8(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vpshlb ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_shl_epi16(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vpshlw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_shl_epi32(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vpshld ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_shl_epi64(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vpshlq ((__v2di)__A, (__v2di)__B);
+}
+
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sha_epi8(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vpshab ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sha_epi16(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vpshaw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sha_epi32(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vpshad ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sha_epi64(__m128i __A,  __m128i __B)
+{
+  return  (__m128i) __builtin_ia32_vpshaq ((__v2di)__A, (__v2di)__B);
+}
+
+/* Compare and Predicate Generation
+   pcom (integer, unsigned bytes) */
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comlt_epu8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomltub ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comle_epu8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomleub ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comgt_epu8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgtub ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comge_epu8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgeub ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comeq_epu8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomequb ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comneq_epu8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomnequb ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comfalse_epu8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomfalseub ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comtrue_epu8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomtrueub ((__v16qi)__A, (__v16qi)__B);
+}
+
+/* pcom (integer, unsigned words) */
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comlt_epu16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomltuw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comle_epu16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomleuw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comgt_epu16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgtuw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comge_epu16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgeuw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comeq_epu16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomequw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comneq_epu16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomnequw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comfalse_epu16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomfalseuw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comtrue_epu16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomtrueuw ((__v8hi)__A, (__v8hi)__B);
+}
+
+/* pcom (integer, unsigned double words) */
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comlt_epu32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomltud ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comle_epu32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomleud ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comgt_epu32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgtud ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comge_epu32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgeud ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comeq_epu32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomequd ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comneq_epu32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomnequd ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comfalse_epu32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomfalseud ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comtrue_epu32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomtrueud ((__v4si)__A, (__v4si)__B);
+}
+
+/* pcom (integer, unsigned quad words) */
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comlt_epu64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomltuq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comle_epu64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomleuq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comgt_epu64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgtuq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comge_epu64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgeuq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comeq_epu64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomequq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comneq_epu64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomnequq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comfalse_epu64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomfalseuq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comtrue_epu64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomtrueuq ((__v2di)__A, (__v2di)__B);
+}
+
+/* pcom (integer, signed bytes) */
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comlt_epi8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomltb ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comle_epi8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomleb ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comgt_epi8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgtb ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comge_epi8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgeb ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comeq_epi8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomeqb ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comneq_epi8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomneqb ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comfalse_epi8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomfalseb ((__v16qi)__A, (__v16qi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comtrue_epi8(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomtrueb ((__v16qi)__A, (__v16qi)__B);
+}
+
+/* pcom (integer, signed words) */
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comlt_epi16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomltw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comle_epi16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomlew ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comgt_epi16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgtw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comge_epi16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgew ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comeq_epi16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomeqw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comneq_epi16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomneqw ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comfalse_epi16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomfalsew ((__v8hi)__A, (__v8hi)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comtrue_epi16(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomtruew ((__v8hi)__A, (__v8hi)__B);
+}
+
+/* pcom (integer, signed double words) */
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comlt_epi32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomltd ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comle_epi32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomled ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comgt_epi32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgtd ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comge_epi32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomged ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comeq_epi32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomeqd ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comneq_epi32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomneqd ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comfalse_epi32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomfalsed ((__v4si)__A, (__v4si)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comtrue_epi32(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomtrued ((__v4si)__A, (__v4si)__B);
+}
+
+/* pcom (integer, signed quad words) */
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comlt_epi64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomltq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comle_epi64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomleq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comgt_epi64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgtq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comge_epi64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomgeq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comeq_epi64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomeqq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comneq_epi64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomneqq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comfalse_epi64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomfalseq ((__v2di)__A, (__v2di)__B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_comtrue_epi64(__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_vpcomtrueq ((__v2di)__A, (__v2di)__B);
+}
+
+/* FRCZ */
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_frcz_ps (__m128 __A)
+{
+  return (__m128) __builtin_ia32_vfrczps ((__v4sf)__A);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_frcz_pd (__m128d __A)
+{
+  return (__m128d) __builtin_ia32_vfrczpd ((__v2df)__A);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_frcz_ss (__m128 __A, __m128 __B)
+{
+  return (__m128) __builtin_ia32_vfrczss ((__v4sf)__A, (__v4sf)__B);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_frcz_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d) __builtin_ia32_vfrczsd ((__v2df)__A, (__v2df)__B);
+}
+
+extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_frcz_ps (__m256 __A)
+{
+  return (__m256) __builtin_ia32_vfrczps256 ((__v8sf)__A);
+}
+
+extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_frcz_pd (__m256d __A)
+{
+  return (__m256d) __builtin_ia32_vfrczpd256 ((__v4df)__A);
+}
+
+#endif /* __XOP__ */
+
+#endif /* _XOPMMINTRIN_H_INCLUDED */
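
A minimal usage sketch for reviewers (illustrative only, not part of the
patch): it assumes the header above is installed and the file is built
with -mxop; the function names are made up, and the comments restate the
documented XOP semantics.

    #include <x86intrin.h>

    /* vpcmov: each result bit comes from A where the corresponding
       selector bit is set, and from B where it is clear.  */
    static __m128i
    select_bits (__m128i a, __m128i b, __m128i sel)
    {
      return _mm_cmov_si128 (a, b, sel);
    }

    /* Immediate-form rotate: rotate each 32-bit element left by 7.  */
    static __m128i
    rotl7_epi32 (__m128i x)
    {
      return _mm_roti_epi32 (x, 7);
    }
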
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 152317)
+++ config/i386/i386.c	(working copy)
@@ -1958,6 +1958,8 @@ static int ix86_isa_flags_explicit;
 #define OPTION_MASK_ISA_FMA4_SET \
   (OPTION_MASK_ISA_FMA4 | OPTION_MASK_ISA_SSE4A_SET \
    | OPTION_MASK_ISA_AVX_SET)
+#define OPTION_MASK_ISA_XOP_SET \
+  (OPTION_MASK_ISA_XOP | OPTION_MASK_ISA_FMA4_SET)
 
 /* AES and PCLMUL need SSE2 because they use xmm registers */
 #define OPTION_MASK_ISA_AES_SET \
@@ -2009,7 +2011,9 @@ static int ix86_isa_flags_explicit;
 #define OPTION_MASK_ISA_SSE4A_UNSET \
   (OPTION_MASK_ISA_SSE4A | OPTION_MASK_ISA_FMA4_UNSET)
 
-#define OPTION_MASK_ISA_FMA4_UNSET OPTION_MASK_ISA_FMA4
+#define OPTION_MASK_ISA_FMA4_UNSET \
+  (OPTION_MASK_ISA_FMA4 | OPTION_MASK_ISA_XOP_UNSET)
+#define OPTION_MASK_ISA_XOP_UNSET OPTION_MASK_ISA_XOP
 
 #define OPTION_MASK_ISA_AES_UNSET OPTION_MASK_ISA_AES
 #define OPTION_MASK_ISA_PCLMUL_UNSET OPTION_MASK_ISA_PCLMUL
@@ -2257,6 +2261,19 @@ ix86_handle_option (size_t code, const c
 	}
       return true;
 
+    case OPT_mxop:
+      if (value)
+	{
+	  ix86_isa_flags |= OPTION_MASK_ISA_XOP_SET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_XOP_SET;
+	}
+      else
+	{
+	  ix86_isa_flags &= ~OPTION_MASK_ISA_XOP_UNSET;
+	  ix86_isa_flags_explicit |= OPTION_MASK_ISA_XOP_UNSET;
+	}
+      return true;
+
     case OPT_mabm:
       if (value)
 	{
@@ -2385,6 +2402,7 @@ ix86_target_string (int isa, int flags, 
   {
     { "-m64",		OPTION_MASK_ISA_64BIT },
     { "-mfma4",		OPTION_MASK_ISA_FMA4 },
+    { "-mxop",		OPTION_MASK_ISA_XOP },
     { "-msse4a",	OPTION_MASK_ISA_SSE4A },
     { "-msse4.2",	OPTION_MASK_ISA_SSE4_2 },
     { "-msse4.1",	OPTION_MASK_ISA_SSE4_1 },
@@ -2615,7 +2633,8 @@ override_options (bool main_args_p)
       PTA_AVX = 1 << 18,
       PTA_FMA = 1 << 19,
       PTA_MOVBE = 1 << 20,
-      PTA_FMA4 = 1 << 21
+      PTA_FMA4 = 1 << 21,
+      PTA_XOP = 1 << 22
     };
 
   static struct pta
@@ -2961,6 +2980,9 @@ override_options (bool main_args_p)
 	if (processor_alias_table[i].flags & PTA_FMA4
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_FMA4))
 	  ix86_isa_flags |= OPTION_MASK_ISA_FMA4;
+	if (processor_alias_table[i].flags & PTA_XOP
+	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_XOP))
+	  ix86_isa_flags |= OPTION_MASK_ISA_XOP;
 	if (processor_alias_table[i].flags & PTA_ABM
 	    && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_ABM))
 	  ix86_isa_flags |= OPTION_MASK_ISA_ABM;
@@ -3645,6 +3667,7 @@ ix86_valid_target_attribute_inner_p (tre
     IX86_ATTR_ISA ("sse4a",	OPT_msse4a),
     IX86_ATTR_ISA ("ssse3",	OPT_mssse3),
     IX86_ATTR_ISA ("fma4",	OPT_mfma4),
+    IX86_ATTR_ISA ("xop",	OPT_mxop),
 
     /* string options */
     IX86_ATTR_STR ("arch=",	IX86_FUNCTION_SPECIFIC_ARCH),
@@ -11203,6 +11226,7 @@ get_some_local_dynamic_name (void)
    X -- don't print any sort of PIC '@' suffix for a symbol.
    & -- print some in-use local-dynamic symbol name.
    H -- print a memory address offset by 8; used for sse high-parts
+   Y -- print condition for XOP pcom* instruction.
    + -- print a branch hint as 'cs' or 'ds' prefix
    ; -- print a semicolon (after prefixes due to bug in older gas).
  */
@@ -11620,6 +11644,61 @@ print_operand (FILE *file, rtx x, int co
 	    return;
 	  }
 
+	case 'Y':
+	  switch (GET_CODE (x))
+	    {
+	    case NE:
+	      fputs ("neq", file);
+	      break;
+	    case EQ:
+	      fputs ("eq", file);
+	      break;
+	    case GE:
+	    case GEU:
+	      fputs (INTEGRAL_MODE_P (GET_MODE (x)) ? "ge" : "unlt", file);
+	      break;
+	    case GT:
+	    case GTU:
+	      fputs (INTEGRAL_MODE_P (GET_MODE (x)) ? "gt" : "unle", file);
+	      break;
+	    case LE:
+	    case LEU:
+	      fputs ("le", file);
+	      break;
+	    case LT:
+	    case LTU:
+	      fputs ("lt", file);
+	      break;
+	    case UNORDERED:
+	      fputs ("unord", file);
+	      break;
+	    case ORDERED:
+	      fputs ("ord", file);
+	      break;
+	    case UNEQ:
+	      fputs ("ueq", file);
+	      break;
+	    case UNGE:
+	      fputs ("nlt", file);
+	      break;
+	    case UNGT:
+	      fputs ("nle", file);
+	      break;
+	    case UNLE:
+	      fputs ("ule", file);
+	      break;
+	    case UNLT:
+	      fputs ("ult", file);
+	      break;
+	    case LTGT:
+	      fputs ("une", file);
+	      break;
+	    default:
+	      output_operand_lossage ("operand is not a condition code, invalid operand code 'Y'");
+	      return;
+	    }
+	  return;
+
 	case ';':
 #if TARGET_MACHO
 	  fputs (" ; ", file);
@@ -15828,6 +15907,14 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
       x = gen_rtx_AND (mode, x, op_false);
       emit_insn (gen_rtx_SET (VOIDmode, dest, x));
     }
+  else if (TARGET_XOP)
+    {
+      rtx pcmov = gen_rtx_SET (mode, dest,
+			       gen_rtx_IF_THEN_ELSE (mode, cmp,
+						     op_true,
+						     op_false));
+      emit_insn (pcmov);
+    }
   else
     {
       op_true = force_reg (mode, op_true);
@@ -15950,6 +16037,9 @@ ix86_expand_int_vcond (rtx operands[])
   cop0 = operands[4];
   cop1 = operands[5];
 
+  /* XOP supports all of the comparisons on all vector int types.  */
+  if (!TARGET_XOP)
+    {
   /* Canonicalize the comparison to EQ, GT, GTU.  */
   switch (code)
     {
@@ -16060,6 +16150,7 @@ ix86_expand_int_vcond (rtx operands[])
       cop0 = x;
       cop1 = CONST0_RTX (mode);
     }
+    }
 
   x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
 			   operands[1+negate], operands[2-negate]);
@@ -16164,6 +16255,190 @@ ix86_expand_sse4_unpack (rtx operands[2]
   emit_insn (unpack (dest, src));
 }
 
+/* This function performs the same task as ix86_expand_sse_unpack,
+   but with the XOP vpperm instruction: each control byte selects a
+   source byte from the concatenated operands or, for the high half of
+   each widened element, a zero (unsigned) or sign (signed) byte.  */
+
+void
+ix86_expand_xop_unpack (rtx operands[2], bool unsigned_p, bool high_p)
+{
+  enum machine_mode imode = GET_MODE (operands[1]);
+  int pperm_bytes[16];
+  int i;
+  int h = (high_p) ? 8 : 0;
+  int h2;
+  int sign_extend;
+  rtvec v = rtvec_alloc (16);
+  rtvec vs;
+  rtx x, p;
+  rtx op0 = operands[0], op1 = operands[1];
+
+  switch (imode)
+    {
+    case V16QImode:
+      vs = rtvec_alloc (8);
+      h2 = (high_p) ? 8 : 0;
+      for (i = 0; i < 8; i++)
+	{
+	  pperm_bytes[2*i+0] = PPERM_SRC | PPERM_SRC2 | i | h;
+	  pperm_bytes[2*i+1] = ((unsigned_p)
+				? PPERM_ZERO
+				: PPERM_SIGN | PPERM_SRC2 | i | h);
+	}
+
+      for (i = 0; i < 16; i++)
+	RTVEC_ELT (v, i) = GEN_INT (pperm_bytes[i]);
+
+      for (i = 0; i < 8; i++)
+	RTVEC_ELT (vs, i) = GEN_INT (i + h2);
+
+      p = gen_rtx_PARALLEL (VOIDmode, vs);
+      x = force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode, v));
+      if (unsigned_p)
+	emit_insn (gen_xop_pperm_zero_v16qi_v8hi (op0, op1, p, x));
+      else
+	emit_insn (gen_xop_pperm_sign_v16qi_v8hi (op0, op1, p, x));
+      break;
+
+    case V8HImode:
+      vs = rtvec_alloc (4);
+      h2 = (high_p) ? 4 : 0;
+      for (i = 0; i < 4; i++)
+	{
+	  sign_extend = ((unsigned_p)
+			 ? PPERM_ZERO
+			 : PPERM_SIGN | PPERM_SRC2 | ((2*i) + 1 + h));
+	  pperm_bytes[4*i+0] = PPERM_SRC | PPERM_SRC2 | ((2*i) + 0 + h);
+	  pperm_bytes[4*i+1] = PPERM_SRC | PPERM_SRC2 | ((2*i) + 1 + h);
+	  pperm_bytes[4*i+2] = sign_extend;
+	  pperm_bytes[4*i+3] = sign_extend;
+	}
+
+      for (i = 0; i < 16; i++)
+	RTVEC_ELT (v, i) = GEN_INT (pperm_bytes[i]);
+
+      for (i = 0; i < 4; i++)
+	RTVEC_ELT (vs, i) = GEN_INT (i + h2);
+
+      p = gen_rtx_PARALLEL (VOIDmode, vs);
+      x = force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode, v));
+      if (unsigned_p)
+	emit_insn (gen_xop_pperm_zero_v8hi_v4si (op0, op1, p, x));
+      else
+	emit_insn (gen_xop_pperm_sign_v8hi_v4si (op0, op1, p, x));
+      break;
+
+    case V4SImode:
+      vs = rtvec_alloc (2);
+      h2 = (high_p) ? 2 : 0;
+      for (i = 0; i < 2; i++)
+	{
+	  sign_extend = ((unsigned_p)
+			 ? PPERM_ZERO
+			 : PPERM_SIGN | PPERM_SRC2 | ((4*i) + 3 + h));
+	  pperm_bytes[8*i+0] = PPERM_SRC | PPERM_SRC2 | ((4*i) + 0 + h);
+	  pperm_bytes[8*i+1] = PPERM_SRC | PPERM_SRC2 | ((4*i) + 1 + h);
+	  pperm_bytes[8*i+2] = PPERM_SRC | PPERM_SRC2 | ((4*i) + 2 + h);
+	  pperm_bytes[8*i+3] = PPERM_SRC | PPERM_SRC2 | ((4*i) + 3 + h);
+	  pperm_bytes[8*i+4] = sign_extend;
+	  pperm_bytes[8*i+5] = sign_extend;
+	  pperm_bytes[8*i+6] = sign_extend;
+	  pperm_bytes[8*i+7] = sign_extend;
+	}
+
+      for (i = 0; i < 16; i++)
+	RTVEC_ELT (v, i) = GEN_INT (pperm_bytes[i]);
+
+      for (i = 0; i < 2; i++)
+	RTVEC_ELT (vs, i) = GEN_INT (i + h2);
+
+      p = gen_rtx_PARALLEL (VOIDmode, vs);
+      x = force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode, v));
+      if (unsigned_p)
+	emit_insn (gen_xop_pperm_zero_v4si_v2di (op0, op1, p, x));
+      else
+	emit_insn (gen_xop_pperm_sign_v4si_v2di (op0, op1, p, x));
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  return;
+}
+
+/* Truncate each element of OPERANDS[1] and OPERANDS[2] to the next
+   narrower integer vector type and pack them into OPERANDS[0];
+   OPERANDS[1] supplies the low half of the result and OPERANDS[2]
+   the high half.  */
+void
+ix86_expand_xop_pack (rtx operands[3])
+{
+  enum machine_mode imode = GET_MODE (operands[0]);
+  int pperm_bytes[16];
+  int i;
+  rtvec v = rtvec_alloc (16);
+  rtx x;
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+
+  switch (imode)
+    {
+    case V16QImode:
+      for (i = 0; i < 8; i++)
+	{
+	  pperm_bytes[i+0] = PPERM_SRC | PPERM_SRC1 | (i*2);
+	  pperm_bytes[i+8] = PPERM_SRC | PPERM_SRC2 | (i*2);
+	}
+
+      for (i = 0; i < 16; i++)
+	RTVEC_ELT (v, i) = GEN_INT (pperm_bytes[i]);
+
+      x = force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode, v));
+      emit_insn (gen_xop_pperm_pack_v8hi_v16qi (op0, op1, op2, x));
+      break;
+
+    case V8HImode:
+      for (i = 0; i < 4; i++)
+	{
+	  pperm_bytes[(2*i)+0] = PPERM_SRC | PPERM_SRC1 | ((i*4) + 0);
+	  pperm_bytes[(2*i)+1] = PPERM_SRC | PPERM_SRC1 | ((i*4) + 1);
+	  pperm_bytes[(2*i)+8] = PPERM_SRC | PPERM_SRC2 | ((i*4) + 0);
+	  pperm_bytes[(2*i)+9] = PPERM_SRC | PPERM_SRC2 | ((i*4) + 1);
+	}
+
+      for (i = 0; i < 16; i++)
+	RTVEC_ELT (v, i) = GEN_INT (pperm_bytes[i]);
+
+      x = force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode, v));
+      emit_insn (gen_xop_pperm_pack_v4si_v8hi (op0, op1, op2, x));
+      break;
+
+    case V4SImode:
+      for (i = 0; i < 2; i++)
+	{
+	  pperm_bytes[(4*i)+0]  = PPERM_SRC | PPERM_SRC1 | ((i*8) + 0);
+	  pperm_bytes[(4*i)+1]  = PPERM_SRC | PPERM_SRC1 | ((i*8) + 1);
+	  pperm_bytes[(4*i)+2]  = PPERM_SRC | PPERM_SRC1 | ((i*8) + 2);
+	  pperm_bytes[(4*i)+3]  = PPERM_SRC | PPERM_SRC1 | ((i*8) + 3);
+	  pperm_bytes[(4*i)+8]  = PPERM_SRC | PPERM_SRC2 | ((i*8) + 0);
+	  pperm_bytes[(4*i)+9]  = PPERM_SRC | PPERM_SRC2 | ((i*8) + 1);
+	  pperm_bytes[(4*i)+10] = PPERM_SRC | PPERM_SRC2 | ((i*8) + 2);
+	  pperm_bytes[(4*i)+11] = PPERM_SRC | PPERM_SRC2 | ((i*8) + 3);
+	}
+
+      for (i = 0; i < 16; i++)
+	RTVEC_ELT (v, i) = GEN_INT (pperm_bytes[i]);
+
+      x = force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode, v));
+      emit_insn (gen_xop_pperm_pack_v2di_v4si (op0, op1, op2, x));
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  return;
+}
+
 /* Expand conditional increment or decrement using adb/sbb instructions.
    The default case using setcc followed by the conditional move can be
    done by generic code.  */
@@ -20745,6 +21020,150 @@ enum ix86_builtins
   IX86_BUILTIN_VFNMADDPD256,
   IX86_BUILTIN_VFNMSUBPS256,
   IX86_BUILTIN_VFNMSUBPD256,
+
+  IX86_BUILTIN_VPCMOV,
+  IX86_BUILTIN_VPCMOV_V2DI,
+  IX86_BUILTIN_VPCMOV_V4SI,
+  IX86_BUILTIN_VPCMOV_V8HI,
+  IX86_BUILTIN_VPCMOV_V16QI,
+  IX86_BUILTIN_VPCMOV_V4SF,
+  IX86_BUILTIN_VPCMOV_V2DF,
+  IX86_BUILTIN_VPCMOV256,
+  IX86_BUILTIN_VPCMOV_V4DI256,
+  IX86_BUILTIN_VPCMOV_V8SI256,
+  IX86_BUILTIN_VPCMOV_V16HI256,
+  IX86_BUILTIN_VPCMOV_V32QI256,
+  IX86_BUILTIN_VPCMOV_V8SF256,
+  IX86_BUILTIN_VPCMOV_V4DF256,
+
+  IX86_BUILTIN_VPPERM,
+
+  IX86_BUILTIN_VPMACSSWW,
+  IX86_BUILTIN_VPMACSWW,
+  IX86_BUILTIN_VPMACSSWD,
+  IX86_BUILTIN_VPMACSWD,
+  IX86_BUILTIN_VPMACSSDD,
+  IX86_BUILTIN_VPMACSDD,
+  IX86_BUILTIN_VPMACSSDQL,
+  IX86_BUILTIN_VPMACSSDQH,
+  IX86_BUILTIN_VPMACSDQL,
+  IX86_BUILTIN_VPMACSDQH,
+  IX86_BUILTIN_VPMADCSSWD,
+  IX86_BUILTIN_VPMADCSWD,
+
+  IX86_BUILTIN_VPHADDBW,
+  IX86_BUILTIN_VPHADDBD,
+  IX86_BUILTIN_VPHADDBQ,
+  IX86_BUILTIN_VPHADDWD,
+  IX86_BUILTIN_VPHADDWQ,
+  IX86_BUILTIN_VPHADDDQ,
+  IX86_BUILTIN_VPHADDUBW,
+  IX86_BUILTIN_VPHADDUBD,
+  IX86_BUILTIN_VPHADDUBQ,
+  IX86_BUILTIN_VPHADDUWD,
+  IX86_BUILTIN_VPHADDUWQ,
+  IX86_BUILTIN_VPHADDUDQ,
+  IX86_BUILTIN_VPHSUBBW,
+  IX86_BUILTIN_VPHSUBWD,
+  IX86_BUILTIN_VPHSUBDQ,
+
+  IX86_BUILTIN_VPROTB,
+  IX86_BUILTIN_VPROTW,
+  IX86_BUILTIN_VPROTD,
+  IX86_BUILTIN_VPROTQ,
+  IX86_BUILTIN_VPROTB_IMM,
+  IX86_BUILTIN_VPROTW_IMM,
+  IX86_BUILTIN_VPROTD_IMM,
+  IX86_BUILTIN_VPROTQ_IMM,
+
+  IX86_BUILTIN_VPSHLB,
+  IX86_BUILTIN_VPSHLW,
+  IX86_BUILTIN_VPSHLD,
+  IX86_BUILTIN_VPSHLQ,
+  IX86_BUILTIN_VPSHAB,
+  IX86_BUILTIN_VPSHAW,
+  IX86_BUILTIN_VPSHAD,
+  IX86_BUILTIN_VPSHAQ,
+
+  IX86_BUILTIN_VFRCZSS,
+  IX86_BUILTIN_VFRCZSD,
+  IX86_BUILTIN_VFRCZPS,
+  IX86_BUILTIN_VFRCZPD,
+  IX86_BUILTIN_VFRCZPS256,
+  IX86_BUILTIN_VFRCZPD256,
+
+  IX86_BUILTIN_VPCOMEQUB,
+  IX86_BUILTIN_VPCOMNEUB,
+  IX86_BUILTIN_VPCOMLTUB,
+  IX86_BUILTIN_VPCOMLEUB,
+  IX86_BUILTIN_VPCOMGTUB,
+  IX86_BUILTIN_VPCOMGEUB,
+  IX86_BUILTIN_VPCOMFALSEUB,
+  IX86_BUILTIN_VPCOMTRUEUB,
+
+  IX86_BUILTIN_VPCOMEQUW,
+  IX86_BUILTIN_VPCOMNEUW,
+  IX86_BUILTIN_VPCOMLTUW,
+  IX86_BUILTIN_VPCOMLEUW,
+  IX86_BUILTIN_VPCOMGTUW,
+  IX86_BUILTIN_VPCOMGEUW,
+  IX86_BUILTIN_VPCOMFALSEUW,
+  IX86_BUILTIN_VPCOMTRUEUW,
+
+  IX86_BUILTIN_VPCOMEQUD,
+  IX86_BUILTIN_VPCOMNEUD,
+  IX86_BUILTIN_VPCOMLTUD,
+  IX86_BUILTIN_VPCOMLEUD,
+  IX86_BUILTIN_VPCOMGTUD,
+  IX86_BUILTIN_VPCOMGEUD,
+  IX86_BUILTIN_VPCOMFALSEUD,
+  IX86_BUILTIN_VPCOMTRUEUD,
+
+  IX86_BUILTIN_VPCOMEQUQ,
+  IX86_BUILTIN_VPCOMNEUQ,
+  IX86_BUILTIN_VPCOMLTUQ,
+  IX86_BUILTIN_VPCOMLEUQ,
+  IX86_BUILTIN_VPCOMGTUQ,
+  IX86_BUILTIN_VPCOMGEUQ,
+  IX86_BUILTIN_VPCOMFALSEUQ,
+  IX86_BUILTIN_VPCOMTRUEUQ,
+
+  IX86_BUILTIN_VPCOMEQB,
+  IX86_BUILTIN_VPCOMNEB,
+  IX86_BUILTIN_VPCOMLTB,
+  IX86_BUILTIN_VPCOMLEB,
+  IX86_BUILTIN_VPCOMGTB,
+  IX86_BUILTIN_VPCOMGEB,
+  IX86_BUILTIN_VPCOMFALSEB,
+  IX86_BUILTIN_VPCOMTRUEB,
+
+  IX86_BUILTIN_VPCOMEQW,
+  IX86_BUILTIN_VPCOMNEW,
+  IX86_BUILTIN_VPCOMLTW,
+  IX86_BUILTIN_VPCOMLEW,
+  IX86_BUILTIN_VPCOMGTW,
+  IX86_BUILTIN_VPCOMGEW,
+  IX86_BUILTIN_VPCOMFALSEW,
+  IX86_BUILTIN_VPCOMTRUEW,
+
+  IX86_BUILTIN_VPCOMEQD,
+  IX86_BUILTIN_VPCOMNED,
+  IX86_BUILTIN_VPCOMLTD,
+  IX86_BUILTIN_VPCOMLED,
+  IX86_BUILTIN_VPCOMGTD,
+  IX86_BUILTIN_VPCOMGED,
+  IX86_BUILTIN_VPCOMFALSED,
+  IX86_BUILTIN_VPCOMTRUED,
+
+  IX86_BUILTIN_VPCOMEQQ,
+  IX86_BUILTIN_VPCOMNEQ,
+  IX86_BUILTIN_VPCOMLTQ,
+  IX86_BUILTIN_VPCOMLEQ,
+  IX86_BUILTIN_VPCOMGTQ,
+  IX86_BUILTIN_VPCOMGEQ,
+  IX86_BUILTIN_VPCOMFALSEQ,
+  IX86_BUILTIN_VPCOMTRUEQ,
+
   IX86_BUILTIN_MAX
 };
 
@@ -21818,13 +22237,58 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_movmskps256, "__builtin_ia32_movmskps256", IX86_BUILTIN_MOVMSKPS256, UNKNOWN, (int) INT_FTYPE_V8SF },
 };
 
-/* FMA4.  */
+/* FMA4 and XOP.  */
 enum multi_arg_type {
   MULTI_ARG_UNKNOWN,
   MULTI_ARG_3_SF,
   MULTI_ARG_3_DF,
   MULTI_ARG_3_SF2,
-  MULTI_ARG_3_DF2
+  MULTI_ARG_3_DF2,
+  MULTI_ARG_3_DI,
+  MULTI_ARG_3_SI,
+  MULTI_ARG_3_SI_DI,
+  MULTI_ARG_3_HI,
+  MULTI_ARG_3_HI_SI,
+  MULTI_ARG_3_QI,
+  MULTI_ARG_3_DI2,
+  MULTI_ARG_3_SI2,
+  MULTI_ARG_3_HI2,
+  MULTI_ARG_3_QI2,
+  MULTI_ARG_2_SF,
+  MULTI_ARG_2_DF,
+  MULTI_ARG_2_DI,
+  MULTI_ARG_2_SI,
+  MULTI_ARG_2_HI,
+  MULTI_ARG_2_QI,
+  MULTI_ARG_2_DI_IMM,
+  MULTI_ARG_2_SI_IMM,
+  MULTI_ARG_2_HI_IMM,
+  MULTI_ARG_2_QI_IMM,
+  MULTI_ARG_2_DI_CMP,
+  MULTI_ARG_2_SI_CMP,
+  MULTI_ARG_2_HI_CMP,
+  MULTI_ARG_2_QI_CMP,
+  MULTI_ARG_2_DI_TF,
+  MULTI_ARG_2_SI_TF,
+  MULTI_ARG_2_HI_TF,
+  MULTI_ARG_2_QI_TF,
+  MULTI_ARG_2_SF_TF,
+  MULTI_ARG_2_DF_TF,
+  MULTI_ARG_1_SF,
+  MULTI_ARG_1_DF,
+  MULTI_ARG_1_SF2,
+  MULTI_ARG_1_DF2,
+  MULTI_ARG_1_DI,
+  MULTI_ARG_1_SI,
+  MULTI_ARG_1_HI,
+  MULTI_ARG_1_QI,
+  MULTI_ARG_1_SI_DI,
+  MULTI_ARG_1_HI_DI,
+  MULTI_ARG_1_HI_SI,
+  MULTI_ARG_1_QI_DI,
+  MULTI_ARG_1_QI_SI,
+  MULTI_ARG_1_QI_HI
 };
 
 static const struct builtin_description bdesc_multi_arg[] =
@@ -21865,7 +22329,160 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmaddsubv8sf4,	   "__builtin_ia32_vfmaddsubps256", IX86_BUILTIN_VFMADDSUBPS256,    UNKNOWN,      (int)MULTI_ARG_3_SF2 },
   { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmaddsubv4df4,	   "__builtin_ia32_vfmaddsubpd256", IX86_BUILTIN_VFMADDSUBPD256,    UNKNOWN,      (int)MULTI_ARG_3_DF2 },
   { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsubaddv8sf4,	   "__builtin_ia32_vfmsubaddps256", IX86_BUILTIN_VFMSUBADDPS256,    UNKNOWN,      (int)MULTI_ARG_3_SF2 },
-  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsubaddv4df4,	   "__builtin_ia32_vfmsubaddpd256", IX86_BUILTIN_VFMSUBADDPD256,    UNKNOWN,      (int)MULTI_ARG_3_DF2 }
+  { OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsubaddv4df4,	   "__builtin_ia32_vfmsubaddpd256", IX86_BUILTIN_VFMSUBADDPD256,    UNKNOWN,      (int)MULTI_ARG_3_DF2 },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v2di,        "__builtin_ia32_vpcmov",      IX86_BUILTIN_VPCMOV,	 UNKNOWN,      (int)MULTI_ARG_3_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v2di,        "__builtin_ia32_vpcmov_v2di", IX86_BUILTIN_VPCMOV_V2DI, UNKNOWN,      (int)MULTI_ARG_3_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v4si,        "__builtin_ia32_vpcmov_v4si", IX86_BUILTIN_VPCMOV_V4SI, UNKNOWN,      (int)MULTI_ARG_3_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v8hi,        "__builtin_ia32_vpcmov_v8hi", IX86_BUILTIN_VPCMOV_V8HI, UNKNOWN,      (int)MULTI_ARG_3_HI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v16qi,       "__builtin_ia32_vpcmov_v16qi",IX86_BUILTIN_VPCMOV_V16QI,UNKNOWN,      (int)MULTI_ARG_3_QI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v2df,        "__builtin_ia32_vpcmov_v2df", IX86_BUILTIN_VPCMOV_V2DF, UNKNOWN,      (int)MULTI_ARG_3_DF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v4sf,        "__builtin_ia32_vpcmov_v4sf", IX86_BUILTIN_VPCMOV_V4SF, UNKNOWN,      (int)MULTI_ARG_3_SF },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v4di256,        "__builtin_ia32_vpcmov256",       IX86_BUILTIN_VPCMOV256,       UNKNOWN,      (int)MULTI_ARG_3_DI2 },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v4di256,        "__builtin_ia32_vpcmov_v4di256",  IX86_BUILTIN_VPCMOV_V4DI256,  UNKNOWN,      (int)MULTI_ARG_3_DI2 },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v8si256,        "__builtin_ia32_vpcmov_v8si256",  IX86_BUILTIN_VPCMOV_V8SI256,  UNKNOWN,      (int)MULTI_ARG_3_SI2 },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v16hi256,       "__builtin_ia32_vpcmov_v16hi256", IX86_BUILTIN_VPCMOV_V16HI256, UNKNOWN,      (int)MULTI_ARG_3_HI2 },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v32qi256,       "__builtin_ia32_vpcmov_v32qi256", IX86_BUILTIN_VPCMOV_V32QI256, UNKNOWN,      (int)MULTI_ARG_3_QI2 },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v4df256,        "__builtin_ia32_vpcmov_v4df256",  IX86_BUILTIN_VPCMOV_V4DF256,  UNKNOWN,      (int)MULTI_ARG_3_DF2 },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcmov_v8sf256,        "__builtin_ia32_vpcmov_v8sf256",  IX86_BUILTIN_VPCMOV_V8SF256,  UNKNOWN,      (int)MULTI_ARG_3_SF2 },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pperm,             "__builtin_ia32_vpperm",      IX86_BUILTIN_VPPERM,      UNKNOWN,      (int)MULTI_ARG_3_QI },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmacssww,          "__builtin_ia32_vpmacssww",   IX86_BUILTIN_VPMACSSWW,   UNKNOWN,      (int)MULTI_ARG_3_HI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmacsww,           "__builtin_ia32_vpmacsww",    IX86_BUILTIN_VPMACSWW,    UNKNOWN,      (int)MULTI_ARG_3_HI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmacsswd,          "__builtin_ia32_vpmacsswd",   IX86_BUILTIN_VPMACSSWD,   UNKNOWN,      (int)MULTI_ARG_3_HI_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmacswd,           "__builtin_ia32_vpmacswd",    IX86_BUILTIN_VPMACSWD,    UNKNOWN,      (int)MULTI_ARG_3_HI_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmacssdd,          "__builtin_ia32_vpmacssdd",   IX86_BUILTIN_VPMACSSDD,   UNKNOWN,      (int)MULTI_ARG_3_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmacsdd,           "__builtin_ia32_vpmacsdd",    IX86_BUILTIN_VPMACSDD,    UNKNOWN,      (int)MULTI_ARG_3_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmacssdql,         "__builtin_ia32_vpmacssdql",  IX86_BUILTIN_VPMACSSDQL,  UNKNOWN,      (int)MULTI_ARG_3_SI_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmacssdqh,         "__builtin_ia32_vpmacssdqh",  IX86_BUILTIN_VPMACSSDQH,  UNKNOWN,      (int)MULTI_ARG_3_SI_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmacsdql,          "__builtin_ia32_vpmacsdql",   IX86_BUILTIN_VPMACSDQL,   UNKNOWN,      (int)MULTI_ARG_3_SI_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmacsdqh,          "__builtin_ia32_vpmacsdqh",   IX86_BUILTIN_VPMACSDQH,   UNKNOWN,      (int)MULTI_ARG_3_SI_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmadcsswd,         "__builtin_ia32_vpmadcsswd",  IX86_BUILTIN_VPMADCSSWD,  UNKNOWN,      (int)MULTI_ARG_3_HI_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pmadcswd,          "__builtin_ia32_vpmadcswd",   IX86_BUILTIN_VPMADCSWD,   UNKNOWN,      (int)MULTI_ARG_3_HI_SI },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_vrotlv2di3,        "__builtin_ia32_vprotq",      IX86_BUILTIN_VPROTQ,      UNKNOWN,      (int)MULTI_ARG_2_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_vrotlv4si3,        "__builtin_ia32_vprotd",      IX86_BUILTIN_VPROTD,      UNKNOWN,      (int)MULTI_ARG_2_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_vrotlv8hi3,        "__builtin_ia32_vprotw",      IX86_BUILTIN_VPROTW,      UNKNOWN,      (int)MULTI_ARG_2_HI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_vrotlv16qi3,       "__builtin_ia32_vprotb",      IX86_BUILTIN_VPROTB,      UNKNOWN,      (int)MULTI_ARG_2_QI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_rotlv2di3,         "__builtin_ia32_vprotqi",     IX86_BUILTIN_VPROTQ_IMM,  UNKNOWN,      (int)MULTI_ARG_2_DI_IMM },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_rotlv4si3,         "__builtin_ia32_vprotdi",     IX86_BUILTIN_VPROTD_IMM,  UNKNOWN,      (int)MULTI_ARG_2_SI_IMM },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_rotlv8hi3,         "__builtin_ia32_vprotwi",     IX86_BUILTIN_VPROTW_IMM,  UNKNOWN,      (int)MULTI_ARG_2_HI_IMM },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_rotlv16qi3,        "__builtin_ia32_vprotbi",     IX86_BUILTIN_VPROTB_IMM,  UNKNOWN,      (int)MULTI_ARG_2_QI_IMM },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_ashlv2di3,         "__builtin_ia32_vpshaq",      IX86_BUILTIN_VPSHAQ,      UNKNOWN,      (int)MULTI_ARG_2_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_ashlv4si3,         "__builtin_ia32_vpshad",      IX86_BUILTIN_VPSHAD,      UNKNOWN,      (int)MULTI_ARG_2_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_ashlv8hi3,         "__builtin_ia32_vpshaw",      IX86_BUILTIN_VPSHAW,      UNKNOWN,      (int)MULTI_ARG_2_HI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_ashlv16qi3,        "__builtin_ia32_vpshab",      IX86_BUILTIN_VPSHAB,      UNKNOWN,      (int)MULTI_ARG_2_QI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_lshlv2di3,         "__builtin_ia32_vpshlq",      IX86_BUILTIN_VPSHLQ,      UNKNOWN,      (int)MULTI_ARG_2_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_lshlv4si3,         "__builtin_ia32_vpshld",      IX86_BUILTIN_VPSHLD,      UNKNOWN,      (int)MULTI_ARG_2_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_lshlv8hi3,         "__builtin_ia32_vpshlw",      IX86_BUILTIN_VPSHLW,      UNKNOWN,      (int)MULTI_ARG_2_HI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_lshlv16qi3,        "__builtin_ia32_vpshlb",      IX86_BUILTIN_VPSHLB,      UNKNOWN,      (int)MULTI_ARG_2_QI },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_vmfrczv4sf2,       "__builtin_ia32_vfrczss",     IX86_BUILTIN_VFRCZSS,     UNKNOWN,      (int)MULTI_ARG_2_SF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_vmfrczv2df2,       "__builtin_ia32_vfrczsd",     IX86_BUILTIN_VFRCZSD,     UNKNOWN,      (int)MULTI_ARG_2_DF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_frczv4sf2,         "__builtin_ia32_vfrczps",     IX86_BUILTIN_VFRCZPS,     UNKNOWN,      (int)MULTI_ARG_1_SF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_frczv2df2,         "__builtin_ia32_vfrczpd",     IX86_BUILTIN_VFRCZPD,     UNKNOWN,      (int)MULTI_ARG_1_DF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_frczv8sf2256,      "__builtin_ia32_vfrczps256",  IX86_BUILTIN_VFRCZPS256,  UNKNOWN,      (int)MULTI_ARG_1_SF2 },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_frczv4df2256,      "__builtin_ia32_vfrczpd256",  IX86_BUILTIN_VFRCZPD256,  UNKNOWN,      (int)MULTI_ARG_1_DF2 },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phaddbw,           "__builtin_ia32_vphaddbw",    IX86_BUILTIN_VPHADDBW,    UNKNOWN,      (int)MULTI_ARG_1_QI_HI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phaddbd,           "__builtin_ia32_vphaddbd",    IX86_BUILTIN_VPHADDBD,    UNKNOWN,      (int)MULTI_ARG_1_QI_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phaddbq,           "__builtin_ia32_vphaddbq",    IX86_BUILTIN_VPHADDBQ,    UNKNOWN,      (int)MULTI_ARG_1_QI_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phaddwd,           "__builtin_ia32_vphaddwd",    IX86_BUILTIN_VPHADDWD,    UNKNOWN,      (int)MULTI_ARG_1_HI_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phaddwq,           "__builtin_ia32_vphaddwq",    IX86_BUILTIN_VPHADDWQ,    UNKNOWN,      (int)MULTI_ARG_1_HI_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phadddq,           "__builtin_ia32_vphadddq",    IX86_BUILTIN_VPHADDDQ,    UNKNOWN,      (int)MULTI_ARG_1_SI_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phaddubw,          "__builtin_ia32_vphaddubw",   IX86_BUILTIN_VPHADDUBW,   UNKNOWN,      (int)MULTI_ARG_1_QI_HI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phaddubd,          "__builtin_ia32_vphaddubd",   IX86_BUILTIN_VPHADDUBD,   UNKNOWN,      (int)MULTI_ARG_1_QI_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phaddubq,          "__builtin_ia32_vphaddubq",   IX86_BUILTIN_VPHADDUBQ,   UNKNOWN,      (int)MULTI_ARG_1_QI_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phadduwd,          "__builtin_ia32_vphadduwd",   IX86_BUILTIN_VPHADDUWD,   UNKNOWN,      (int)MULTI_ARG_1_HI_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phadduwq,          "__builtin_ia32_vphadduwq",   IX86_BUILTIN_VPHADDUWQ,   UNKNOWN,      (int)MULTI_ARG_1_HI_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phaddudq,          "__builtin_ia32_vphaddudq",   IX86_BUILTIN_VPHADDUDQ,   UNKNOWN,      (int)MULTI_ARG_1_SI_DI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phsubbw,           "__builtin_ia32_vphsubbw",    IX86_BUILTIN_VPHSUBBW,    UNKNOWN,      (int)MULTI_ARG_1_QI_HI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phsubwd,           "__builtin_ia32_vphsubwd",    IX86_BUILTIN_VPHSUBWD,    UNKNOWN,      (int)MULTI_ARG_1_HI_SI },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_phsubdq,           "__builtin_ia32_vphsubdq",    IX86_BUILTIN_VPHSUBDQ,    UNKNOWN,      (int)MULTI_ARG_1_SI_DI },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv16qi3,     "__builtin_ia32_vpcomeqb",    IX86_BUILTIN_VPCOMEQB,    EQ,           (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv16qi3,     "__builtin_ia32_vpcomneb",    IX86_BUILTIN_VPCOMNEB,    NE,           (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv16qi3,     "__builtin_ia32_vpcomneqb",   IX86_BUILTIN_VPCOMNEB,    NE,           (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv16qi3,     "__builtin_ia32_vpcomltb",    IX86_BUILTIN_VPCOMLTB,    LT,           (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv16qi3,     "__builtin_ia32_vpcomleb",    IX86_BUILTIN_VPCOMLEB,    LE,           (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv16qi3,     "__builtin_ia32_vpcomgtb",    IX86_BUILTIN_VPCOMGTB,    GT,           (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv16qi3,     "__builtin_ia32_vpcomgeb",    IX86_BUILTIN_VPCOMGEB,    GE,           (int)MULTI_ARG_2_QI_CMP },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv8hi3,      "__builtin_ia32_vpcomeqw",    IX86_BUILTIN_VPCOMEQW,    EQ,           (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv8hi3,      "__builtin_ia32_vpcomnew",    IX86_BUILTIN_VPCOMNEW,    NE,           (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv8hi3,      "__builtin_ia32_vpcomneqw",   IX86_BUILTIN_VPCOMNEW,    NE,           (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv8hi3,      "__builtin_ia32_vpcomltw",    IX86_BUILTIN_VPCOMLTW,    LT,           (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv8hi3,      "__builtin_ia32_vpcomlew",    IX86_BUILTIN_VPCOMLEW,    LE,           (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv8hi3,      "__builtin_ia32_vpcomgtw",    IX86_BUILTIN_VPCOMGTW,    GT,           (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv8hi3,      "__builtin_ia32_vpcomgew",    IX86_BUILTIN_VPCOMGEW,    GE,           (int)MULTI_ARG_2_HI_CMP },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv4si3,      "__builtin_ia32_vpcomeqd",    IX86_BUILTIN_VPCOMEQD,    EQ,           (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv4si3,      "__builtin_ia32_vpcomned",    IX86_BUILTIN_VPCOMNED,    NE,           (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv4si3,      "__builtin_ia32_vpcomneqd",   IX86_BUILTIN_VPCOMNED,    NE,           (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv4si3,      "__builtin_ia32_vpcomltd",    IX86_BUILTIN_VPCOMLTD,    LT,           (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv4si3,      "__builtin_ia32_vpcomled",    IX86_BUILTIN_VPCOMLED,    LE,           (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv4si3,      "__builtin_ia32_vpcomgtd",    IX86_BUILTIN_VPCOMGTD,    GT,           (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv4si3,      "__builtin_ia32_vpcomged",    IX86_BUILTIN_VPCOMGED,    GE,           (int)MULTI_ARG_2_SI_CMP },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv2di3,      "__builtin_ia32_vpcomeqq",    IX86_BUILTIN_VPCOMEQQ,    EQ,           (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv2di3,      "__builtin_ia32_vpcomneq",    IX86_BUILTIN_VPCOMNEQ,    NE,           (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv2di3,      "__builtin_ia32_vpcomneqq",   IX86_BUILTIN_VPCOMNEQ,    NE,           (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv2di3,      "__builtin_ia32_vpcomltq",    IX86_BUILTIN_VPCOMLTQ,    LT,           (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv2di3,      "__builtin_ia32_vpcomleq",    IX86_BUILTIN_VPCOMLEQ,    LE,           (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv2di3,      "__builtin_ia32_vpcomgtq",    IX86_BUILTIN_VPCOMGTQ,    GT,           (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmpv2di3,      "__builtin_ia32_vpcomgeq",    IX86_BUILTIN_VPCOMGEQ,    GE,           (int)MULTI_ARG_2_DI_CMP },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v16qi3,"__builtin_ia32_vpcomequb",   IX86_BUILTIN_VPCOMEQUB,   EQ,           (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v16qi3,"__builtin_ia32_vpcomneub",   IX86_BUILTIN_VPCOMNEUB,   NE,           (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v16qi3,"__builtin_ia32_vpcomnequb",  IX86_BUILTIN_VPCOMNEUB,   NE,           (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv16qi3, "__builtin_ia32_vpcomltub",   IX86_BUILTIN_VPCOMLTUB,   LTU,          (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv16qi3, "__builtin_ia32_vpcomleub",   IX86_BUILTIN_VPCOMLEUB,   LEU,          (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv16qi3, "__builtin_ia32_vpcomgtub",   IX86_BUILTIN_VPCOMGTUB,   GTU,          (int)MULTI_ARG_2_QI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv16qi3, "__builtin_ia32_vpcomgeub",   IX86_BUILTIN_VPCOMGEUB,   GEU,          (int)MULTI_ARG_2_QI_CMP },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v8hi3, "__builtin_ia32_vpcomequw",   IX86_BUILTIN_VPCOMEQUW,   EQ,           (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v8hi3, "__builtin_ia32_vpcomneuw",   IX86_BUILTIN_VPCOMNEUW,   NE,           (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v8hi3, "__builtin_ia32_vpcomnequw",  IX86_BUILTIN_VPCOMNEUW,   NE,           (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv8hi3,  "__builtin_ia32_vpcomltuw",   IX86_BUILTIN_VPCOMLTUW,   LTU,          (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv8hi3,  "__builtin_ia32_vpcomleuw",   IX86_BUILTIN_VPCOMLEUW,   LEU,          (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv8hi3,  "__builtin_ia32_vpcomgtuw",   IX86_BUILTIN_VPCOMGTUW,   GTU,          (int)MULTI_ARG_2_HI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv8hi3,  "__builtin_ia32_vpcomgeuw",   IX86_BUILTIN_VPCOMGEUW,   GEU,          (int)MULTI_ARG_2_HI_CMP },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v4si3, "__builtin_ia32_vpcomequd",   IX86_BUILTIN_VPCOMEQUD,   EQ,           (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v4si3, "__builtin_ia32_vpcomneud",   IX86_BUILTIN_VPCOMNEUD,   NE,           (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v4si3, "__builtin_ia32_vpcomnequd",  IX86_BUILTIN_VPCOMNEUD,   NE,           (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv4si3,  "__builtin_ia32_vpcomltud",   IX86_BUILTIN_VPCOMLTUD,   LTU,          (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv4si3,  "__builtin_ia32_vpcomleud",   IX86_BUILTIN_VPCOMLEUD,   LEU,          (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv4si3,  "__builtin_ia32_vpcomgtud",   IX86_BUILTIN_VPCOMGTUD,   GTU,          (int)MULTI_ARG_2_SI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv4si3,  "__builtin_ia32_vpcomgeud",   IX86_BUILTIN_VPCOMGEUD,   GEU,          (int)MULTI_ARG_2_SI_CMP },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v2di3, "__builtin_ia32_vpcomequq",   IX86_BUILTIN_VPCOMEQUQ,   EQ,           (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v2di3, "__builtin_ia32_vpcomneuq",   IX86_BUILTIN_VPCOMNEUQ,   NE,           (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_uns2v2di3, "__builtin_ia32_vpcomnequq",  IX86_BUILTIN_VPCOMNEUQ,   NE,           (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv2di3,  "__builtin_ia32_vpcomltuq",   IX86_BUILTIN_VPCOMLTUQ,   LTU,          (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv2di3,  "__builtin_ia32_vpcomleuq",   IX86_BUILTIN_VPCOMLEUQ,   LEU,          (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv2di3,  "__builtin_ia32_vpcomgtuq",   IX86_BUILTIN_VPCOMGTUQ,   GTU,          (int)MULTI_ARG_2_DI_CMP },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_maskcmp_unsv2di3,  "__builtin_ia32_vpcomgeuq",   IX86_BUILTIN_VPCOMGEUQ,   GEU,          (int)MULTI_ARG_2_DI_CMP },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv16qi3,     "__builtin_ia32_vpcomfalseb", IX86_BUILTIN_VPCOMFALSEB, (enum rtx_code) PCOM_FALSE,   (int)MULTI_ARG_2_QI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv8hi3,      "__builtin_ia32_vpcomfalsew", IX86_BUILTIN_VPCOMFALSEW, (enum rtx_code) PCOM_FALSE,   (int)MULTI_ARG_2_HI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv4si3,      "__builtin_ia32_vpcomfalsed", IX86_BUILTIN_VPCOMFALSED, (enum rtx_code) PCOM_FALSE,   (int)MULTI_ARG_2_SI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv2di3,      "__builtin_ia32_vpcomfalseq", IX86_BUILTIN_VPCOMFALSEQ, (enum rtx_code) PCOM_FALSE,   (int)MULTI_ARG_2_DI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv16qi3,     "__builtin_ia32_vpcomfalseub",IX86_BUILTIN_VPCOMFALSEUB,(enum rtx_code) PCOM_FALSE,   (int)MULTI_ARG_2_QI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv8hi3,      "__builtin_ia32_vpcomfalseuw",IX86_BUILTIN_VPCOMFALSEUW,(enum rtx_code) PCOM_FALSE,   (int)MULTI_ARG_2_HI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv4si3,      "__builtin_ia32_vpcomfalseud",IX86_BUILTIN_VPCOMFALSEUD,(enum rtx_code) PCOM_FALSE,   (int)MULTI_ARG_2_SI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv2di3,      "__builtin_ia32_vpcomfalseuq",IX86_BUILTIN_VPCOMFALSEUQ,(enum rtx_code) PCOM_FALSE,   (int)MULTI_ARG_2_DI_TF },
+
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv16qi3,     "__builtin_ia32_vpcomtrueb",  IX86_BUILTIN_VPCOMTRUEB,  (enum rtx_code) PCOM_TRUE,    (int)MULTI_ARG_2_QI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv8hi3,      "__builtin_ia32_vpcomtruew",  IX86_BUILTIN_VPCOMTRUEW,  (enum rtx_code) PCOM_TRUE,    (int)MULTI_ARG_2_HI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv4si3,      "__builtin_ia32_vpcomtrued",  IX86_BUILTIN_VPCOMTRUED,  (enum rtx_code) PCOM_TRUE,    (int)MULTI_ARG_2_SI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv2di3,      "__builtin_ia32_vpcomtrueq",  IX86_BUILTIN_VPCOMTRUEQ,  (enum rtx_code) PCOM_TRUE,    (int)MULTI_ARG_2_DI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv16qi3,     "__builtin_ia32_vpcomtrueub", IX86_BUILTIN_VPCOMTRUEUB, (enum rtx_code) PCOM_TRUE,    (int)MULTI_ARG_2_QI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv8hi3,      "__builtin_ia32_vpcomtrueuw", IX86_BUILTIN_VPCOMTRUEUW, (enum rtx_code) PCOM_TRUE,    (int)MULTI_ARG_2_HI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv4si3,      "__builtin_ia32_vpcomtrueud", IX86_BUILTIN_VPCOMTRUEUD, (enum rtx_code) PCOM_TRUE,    (int)MULTI_ARG_2_SI_TF },
+  { OPTION_MASK_ISA_XOP, CODE_FOR_xop_pcom_tfv2di3,      "__builtin_ia32_vpcomtrueuq", IX86_BUILTIN_VPCOMTRUEUQ, (enum rtx_code) PCOM_TRUE,    (int)MULTI_ARG_2_DI_TF },
 
 };
 
@@ -22247,51 +22864,6 @@ ix86_init_mmx_sse_builtins (void)
 				integer_type_node,
 				NULL_TREE);
 
-
-  tree v2di_ftype_v2di
-    = build_function_type_list (V2DI_type_node, V2DI_type_node, NULL_TREE);
-
-  tree v16qi_ftype_v8hi_v8hi
-    = build_function_type_list (V16QI_type_node,
-				V8HI_type_node, V8HI_type_node,
-				NULL_TREE);
-  tree v8hi_ftype_v4si_v4si
-    = build_function_type_list (V8HI_type_node,
-				V4SI_type_node, V4SI_type_node,
-				NULL_TREE);
-  tree v8hi_ftype_v16qi_v16qi 
-    = build_function_type_list (V8HI_type_node,
-				V16QI_type_node, V16QI_type_node,
-				NULL_TREE);
-  tree v4hi_ftype_v8qi_v8qi 
-    = build_function_type_list (V4HI_type_node,
-				V8QI_type_node, V8QI_type_node,
-				NULL_TREE);
-  tree unsigned_ftype_unsigned_uchar
-    = build_function_type_list (unsigned_type_node,
-				unsigned_type_node,
-				unsigned_char_type_node,
-				NULL_TREE);
-  tree unsigned_ftype_unsigned_ushort
-    = build_function_type_list (unsigned_type_node,
-				unsigned_type_node,
-				short_unsigned_type_node,
-				NULL_TREE);
-  tree unsigned_ftype_unsigned_unsigned
-    = build_function_type_list (unsigned_type_node,
-				unsigned_type_node,
-				unsigned_type_node,
-				NULL_TREE);
-  tree uint64_ftype_uint64_uint64
-    = build_function_type_list (long_long_unsigned_type_node,
-				long_long_unsigned_type_node,
-				long_long_unsigned_type_node,
-				NULL_TREE);
-  tree float_ftype_float
-    = build_function_type_list (float_type_node,
-				float_type_node,
-				NULL_TREE);
-
   /* AVX builtins  */
   tree V32QI_type_node = build_vector_type_for_mode (char_type_node,
 						     V32QImode);
@@ -22303,6 +22875,8 @@ ix86_init_mmx_sse_builtins (void)
 						    V4DImode);
   tree V4DF_type_node = build_vector_type_for_mode (double_type_node,
 						    V4DFmode);
+  tree V16HI_type_node = build_vector_type_for_mode (intHI_type_node,
+						     V16HImode);
   tree v8sf_ftype_v8sf
     = build_function_type_list (V8SF_type_node,
 				V8SF_type_node,
@@ -22547,6 +23121,138 @@ ix86_init_mmx_sse_builtins (void)
     = build_function_type_list (V2DF_type_node,
 				V2DF_type_node, V2DI_type_node, NULL_TREE);
 
+  /* XOP instructions */
+  tree v2di_ftype_v2di_v2di_v2di
+    = build_function_type_list (V2DI_type_node,
+				V2DI_type_node,
+				V2DI_type_node,
+				V2DI_type_node,
+				NULL_TREE);
+
+  tree v4di_ftype_v4di_v4di_v4di
+    = build_function_type_list (V4DI_type_node,
+				V4DI_type_node,
+				V4DI_type_node,
+				V4DI_type_node,
+				NULL_TREE);
+
+  tree v4si_ftype_v4si_v4si_v4si
+    = build_function_type_list (V4SI_type_node,
+				V4SI_type_node,
+				V4SI_type_node,
+				V4SI_type_node,
+				NULL_TREE);
+
+  tree v8si_ftype_v8si_v8si_v8si
+    = build_function_type_list (V8SI_type_node,
+				V8SI_type_node,
+				V8SI_type_node,
+				V8SI_type_node,
+				NULL_TREE);
+
+  tree v32qi_ftype_v32qi_v32qi_v32qi
+    = build_function_type_list (V32QI_type_node,
+				V32QI_type_node,
+				V32QI_type_node,
+				V32QI_type_node,
+				NULL_TREE);
+
+  tree v4si_ftype_v4si_v4si_v2di
+    = build_function_type_list (V4SI_type_node,
+				V4SI_type_node,
+				V4SI_type_node,
+				V2DI_type_node,
+				NULL_TREE);
+
+  tree v8hi_ftype_v8hi_v8hi_v8hi
+    = build_function_type_list (V8HI_type_node,
+				V8HI_type_node,
+				V8HI_type_node,
+				V8HI_type_node,
+				NULL_TREE);
+
+  tree v16hi_ftype_v16hi_v16hi_v16hi
+    = build_function_type_list (V16HI_type_node,
+				V16HI_type_node,
+				V16HI_type_node,
+				V16HI_type_node,
+				NULL_TREE);
+
+  tree v8hi_ftype_v8hi_v8hi_v4si
+    = build_function_type_list (V8HI_type_node,
+				V8HI_type_node,
+				V8HI_type_node,
+				V4SI_type_node,
+				NULL_TREE);
+
+  tree v2di_ftype_v2di_si
+    = build_function_type_list (V2DI_type_node,
+				V2DI_type_node,
+				integer_type_node,
+				NULL_TREE);
+
+  tree v4si_ftype_v4si_si
+    = build_function_type_list (V4SI_type_node,
+				V4SI_type_node,
+				integer_type_node,
+				NULL_TREE);
+
+  tree v8hi_ftype_v8hi_si
+    = build_function_type_list (V8HI_type_node,
+				V8HI_type_node,
+				integer_type_node,
+				NULL_TREE);
+
+  tree v16qi_ftype_v16qi_si
+    = build_function_type_list (V16QI_type_node,
+				V16QI_type_node,
+				integer_type_node,
+				NULL_TREE);
+
+  tree v2di_ftype_v2di
+    = build_function_type_list (V2DI_type_node, V2DI_type_node, NULL_TREE);
+
+  tree v16qi_ftype_v8hi_v8hi
+    = build_function_type_list (V16QI_type_node,
+				V8HI_type_node, V8HI_type_node,
+				NULL_TREE);
+  tree v8hi_ftype_v4si_v4si
+    = build_function_type_list (V8HI_type_node,
+				V4SI_type_node, V4SI_type_node,
+				NULL_TREE);
+  tree v8hi_ftype_v16qi_v16qi 
+    = build_function_type_list (V8HI_type_node,
+				V16QI_type_node, V16QI_type_node,
+				NULL_TREE);
+  tree v4hi_ftype_v8qi_v8qi 
+    = build_function_type_list (V4HI_type_node,
+				V8QI_type_node, V8QI_type_node,
+				NULL_TREE);
+  tree unsigned_ftype_unsigned_uchar
+    = build_function_type_list (unsigned_type_node,
+				unsigned_type_node,
+				unsigned_char_type_node,
+				NULL_TREE);
+  tree unsigned_ftype_unsigned_ushort
+    = build_function_type_list (unsigned_type_node,
+				unsigned_type_node,
+				short_unsigned_type_node,
+				NULL_TREE);
+  tree unsigned_ftype_unsigned_unsigned
+    = build_function_type_list (unsigned_type_node,
+				unsigned_type_node,
+				unsigned_type_node,
+				NULL_TREE);
+  tree uint64_ftype_uint64_uint64
+    = build_function_type_list (long_long_unsigned_type_node,
+				long_long_unsigned_type_node,
+				long_long_unsigned_type_node,
+				NULL_TREE);
+  tree float_ftype_float
+    = build_function_type_list (float_type_node,
+				float_type_node,
+				NULL_TREE);
+
   /* Integer intrinsics.  */
   tree uint64_ftype_void
     = build_function_type (long_long_unsigned_type_node,
@@ -23315,6 +24021,50 @@ ix86_init_mmx_sse_builtins (void)
 	case MULTI_ARG_3_DF:     mtype = v2df_ftype_v2df_v2df_v2df; 	break;
 	case MULTI_ARG_3_SF2:    mtype = v8sf_ftype_v8sf_v8sf_v8sf; 	break;
 	case MULTI_ARG_3_DF2:    mtype = v4df_ftype_v4df_v4df_v4df; 	break;
+	case MULTI_ARG_3_DI:     mtype = v2di_ftype_v2di_v2di_v2di; 	break;
+	case MULTI_ARG_3_SI:     mtype = v4si_ftype_v4si_v4si_v4si; 	break;
+	case MULTI_ARG_3_SI_DI:  mtype = v4si_ftype_v4si_v4si_v2di; 	break;
+	case MULTI_ARG_3_HI:     mtype = v8hi_ftype_v8hi_v8hi_v8hi; 	break;
+	case MULTI_ARG_3_HI_SI:  mtype = v8hi_ftype_v8hi_v8hi_v4si; 	break;
+	case MULTI_ARG_3_QI:     mtype = v16qi_ftype_v16qi_v16qi_v16qi; break;
+	case MULTI_ARG_3_DI2:    mtype = v4di_ftype_v4di_v4di_v4di; 	break;
+	case MULTI_ARG_3_SI2:    mtype = v8si_ftype_v8si_v8si_v8si; 	break;
+	case MULTI_ARG_3_HI2:    mtype = v16hi_ftype_v16hi_v16hi_v16hi; break;
+	case MULTI_ARG_3_QI2:    mtype = v32qi_ftype_v32qi_v32qi_v32qi; break;
+	case MULTI_ARG_2_SF:     mtype = v4sf_ftype_v4sf_v4sf;      	break;
+	case MULTI_ARG_2_DF:     mtype = v2df_ftype_v2df_v2df;      	break;
+	case MULTI_ARG_2_DI:     mtype = v2di_ftype_v2di_v2di;      	break;
+	case MULTI_ARG_2_SI:     mtype = v4si_ftype_v4si_v4si;      	break;
+	case MULTI_ARG_2_HI:     mtype = v8hi_ftype_v8hi_v8hi;      	break;
+	case MULTI_ARG_2_QI:     mtype = v16qi_ftype_v16qi_v16qi;      	break;
+	case MULTI_ARG_2_DI_IMM: mtype = v2di_ftype_v2di_si;        	break;
+	case MULTI_ARG_2_SI_IMM: mtype = v4si_ftype_v4si_si;        	break;
+	case MULTI_ARG_2_HI_IMM: mtype = v8hi_ftype_v8hi_si;        	break;
+	case MULTI_ARG_2_QI_IMM: mtype = v16qi_ftype_v16qi_si;        	break;
+	case MULTI_ARG_2_DI_CMP: mtype = v2di_ftype_v2di_v2di;      	break;
+	case MULTI_ARG_2_SI_CMP: mtype = v4si_ftype_v4si_v4si;      	break;
+	case MULTI_ARG_2_HI_CMP: mtype = v8hi_ftype_v8hi_v8hi;      	break;
+	case MULTI_ARG_2_QI_CMP: mtype = v16qi_ftype_v16qi_v16qi;      	break;
+	case MULTI_ARG_2_SF_TF:  mtype = v4sf_ftype_v4sf_v4sf;      	break;
+	case MULTI_ARG_2_DF_TF:  mtype = v2df_ftype_v2df_v2df;      	break;
+	case MULTI_ARG_2_DI_TF:  mtype = v2di_ftype_v2di_v2di;      	break;
+	case MULTI_ARG_2_SI_TF:  mtype = v4si_ftype_v4si_v4si;      	break;
+	case MULTI_ARG_2_HI_TF:  mtype = v8hi_ftype_v8hi_v8hi;      	break;
+	case MULTI_ARG_2_QI_TF:  mtype = v16qi_ftype_v16qi_v16qi;      	break;
+	case MULTI_ARG_1_SF:     mtype = v4sf_ftype_v4sf;           	break;
+	case MULTI_ARG_1_DF:     mtype = v2df_ftype_v2df;           	break;
+	case MULTI_ARG_1_SF2:    mtype = v8sf_ftype_v8sf;           	break;
+	case MULTI_ARG_1_DF2:    mtype = v4df_ftype_v4df;           	break;
+	case MULTI_ARG_1_DI:     mtype = v2di_ftype_v2di;           	break;
+	case MULTI_ARG_1_SI:     mtype = v4si_ftype_v4si;           	break;
+	case MULTI_ARG_1_HI:     mtype = v8hi_ftype_v8hi;           	break;
+	case MULTI_ARG_1_QI:     mtype = v16qi_ftype_v16qi;           	break;
+	case MULTI_ARG_1_SI_DI:  mtype = v2di_ftype_v4si;           	break;
+	case MULTI_ARG_1_HI_DI:  mtype = v2di_ftype_v8hi;           	break;
+	case MULTI_ARG_1_HI_SI:  mtype = v4si_ftype_v8hi;           	break;
+	case MULTI_ARG_1_QI_DI:  mtype = v2di_ftype_v16qi;           	break;
+	case MULTI_ARG_1_QI_SI:  mtype = v4si_ftype_v16qi;           	break;
+	case MULTI_ARG_1_QI_HI:  mtype = v8hi_ftype_v16qi;           	break;
 
 	case MULTI_ARG_UNKNOWN:
 	default:
@@ -23523,9 +24273,71 @@ ix86_expand_multi_arg_builtin (enum insn
     case MULTI_ARG_3_DF:
     case MULTI_ARG_3_SF2:
     case MULTI_ARG_3_DF2:
+    case MULTI_ARG_3_DI:
+    case MULTI_ARG_3_SI:
+    case MULTI_ARG_3_SI_DI:
+    case MULTI_ARG_3_HI:
+    case MULTI_ARG_3_HI_SI:
+    case MULTI_ARG_3_QI:
+    case MULTI_ARG_3_DI2:
+    case MULTI_ARG_3_SI2:
+    case MULTI_ARG_3_HI2:
+    case MULTI_ARG_3_QI2:
       nargs = 3;
       break;
 
+    case MULTI_ARG_2_SF:
+    case MULTI_ARG_2_DF:
+    case MULTI_ARG_2_DI:
+    case MULTI_ARG_2_SI:
+    case MULTI_ARG_2_HI:
+    case MULTI_ARG_2_QI:
+      nargs = 2;
+      break;
+
+    case MULTI_ARG_2_DI_IMM:
+    case MULTI_ARG_2_SI_IMM:
+    case MULTI_ARG_2_HI_IMM:
+    case MULTI_ARG_2_QI_IMM:
+      nargs = 2;
+      last_arg_constant = true;
+      break;
+
+    case MULTI_ARG_1_SF:
+    case MULTI_ARG_1_DF:
+    case MULTI_ARG_1_SF2:
+    case MULTI_ARG_1_DF2:
+    case MULTI_ARG_1_DI:
+    case MULTI_ARG_1_SI:
+    case MULTI_ARG_1_HI:
+    case MULTI_ARG_1_QI:
+    case MULTI_ARG_1_SI_DI:
+    case MULTI_ARG_1_HI_DI:
+    case MULTI_ARG_1_HI_SI:
+    case MULTI_ARG_1_QI_DI:
+    case MULTI_ARG_1_QI_SI:
+    case MULTI_ARG_1_QI_HI:
+      nargs = 1;
+      break;
+
+    case MULTI_ARG_2_DI_CMP:
+    case MULTI_ARG_2_SI_CMP:
+    case MULTI_ARG_2_HI_CMP:
+    case MULTI_ARG_2_QI_CMP:
+      nargs = 2;
+      comparison_p = true;
+      break;
+
+    case MULTI_ARG_2_SF_TF:
+    case MULTI_ARG_2_DF_TF:
+    case MULTI_ARG_2_DI_TF:
+    case MULTI_ARG_2_SI_TF:
+    case MULTI_ARG_2_HI_TF:
+    case MULTI_ARG_2_QI_TF:
+      nargs = 2;
+      tf_p = true;
+      break;
+
     case MULTI_ARG_UNKNOWN:
     default:
       gcc_unreachable ();

* RE: PATCH: Add XOP 128-bit and 256-bit support for upcoming AMD Orochi processor.
@ 2009-10-12 16:54 rajagopal, dwarak
  2009-10-13 23:17 ` Jan Hubicka
  0 siblings, 1 reply; 15+ messages in thread
From: rajagopal, dwarak @ 2009-10-12 16:54 UTC (permalink / raw)
  To: 'Jan Hubicka', gcc-patches
  Cc: Harle, Christophe, Jagasia, Harsha, 'Jan Hubicka'


Hi Honza,
I will be taking over this patch from Harsha and wrapping it up, so that I can check it in.

It would be great if you could answer Harsha's comments below so that I can get more clarity.

Thanks,
Dwarak

Hi Honza,

I have gone through your feedback on the XOP patch; it makes sense, and I will make the fixes as per your suggestions.

However, I am not entirely sure I understand the two comments below. Mike Meissner added these patterns, and I mostly inherited them from him.
 
Can you please elaborate or provide some more guidance on what I need to do?
(Please see my comments below)

> +;; We don't have a straight 32-bit parallel multiply on XOP, so fake it with a
> +;; multiply/add.  In general, we expect the define_split to occur before
> +;; register allocation, so we have to handle the corner case where the target
> +;; is the same as one of the inputs.
> +(define_insn_and_split "*xop_mulv4si3"
> +  [(set (match_operand:V4SI 0 "register_operand" "=&x")
> +	(mult:V4SI (match_operand:V4SI 1 "register_operand" "%x")
> +		   (match_operand:V4SI 2 "nonimmediate_operand" "xm")))]
> > +  "TARGET_XOP"
> > +  "#"
> > +  "&& (reload_completed
> > +       || (!reg_mentioned_p (operands[0], operands[1])
> > +	   && !reg_mentioned_p (operands[0], operands[2])))"
> 
> What happens when regs are mentioned?
> There are other cases of two-memory-operand multiply-add splitting that
> test these conditions; are we somehow making sure this conditional will
> always hold and we won't ICE from being unable to satisfy it?

Actually, I am not even sure the xop_mulv4si3 pattern is needed at all: XOP now implies SSE 4.2 and AVX, so we can just generate the mulv4si3 patterns for AVX or SSE 4.1 when -mxop is used. Can I just remove the xop_mulv4si3 pattern?
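
For example, I would expect a plain vector multiply to pick up the AVX or SSE 4.1 expansion even with -mxop. A minimal testcase sketch (mine, not part of the patch; compile with -O2 -mxop and inspect the assembly):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
mul32 (v4si a, v4si b)
{
  /* With XOP implying SSE 4.1/AVX, this should expand through the
     ordinary mulv4si3 pattern (vpmulld) rather than xop_mulv4si3.  */
  return a * b;
}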

As for your reference to the "other cases of two-memory-operand multiply-add splitting", I assume you mean the vpmac/d* define_splits.
 
Unlike SSE5, the XOP vpmac/d* instructions no longer require the destination register to be the same as the third source operand, and only the second source operand can be a memory operand. I also do not see anything in the manual requiring the destination register to be different from the first, second, or third source operand.

Should I then remove the following condition from the pmac/d* patterns?

(!reg_mentioned_p (operands[0], operands[1])
 && !reg_mentioned_p (operands[0], operands[2]))
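
For what it's worth, a small usage sketch with the new intrinsics, where the destination deliberately aliases the third source (the intrinsic name is taken from the new xopintrin.h; treat the exact spelling as my assumption):

#include <x86intrin.h>

/* Compile with -mxop.  With the VEX encoding the destination may alias
   any source, so accumulating in place should be legal.  */
__m128i
madd_acc (__m128i a, __m128i b, __m128i acc)
{
  acc = _mm_macc_epi32 (a, b, acc);  /* vpmacsdd: acc = a * b + acc */
  return acc;
}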


> 
> > +;; The following are for the various unpack insns which don't need the
> > +;; first source operand, so we can just use the output operand for the
> > +;; first operand.  This allows either of the other two operands to be a
> > +;; memory operand.  We can't just use the first operand as an argument
> > +;; to the normal pperm, because then an output-only argument suddenly
> > +;; becomes an input operand.
> > +(define_insn "xop_pperm_zero_v16qi_v8hi"
> > +  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
> > +	(zero_extend:V8HI
> > +	 (vec_select:V8QI
> > +	  (match_operand:V16QI 1 "nonimmediate_operand" "xm,x")
> > +	  (match_operand 2 "" ""))))	;; parallel with const_int's
> > +   (use (match_operand:V16QI 3 "nonimmediate_operand" "x,xm"))]
> 
> Hmm, there is no unspec or something that would make it clear that we
> cannot ever somehow simplify into this form with operand 2 being
> something other than a parallel with const_ints.  I think this needs a
> new predicate.

I can define a new predicate for it in predicates.md, but I am not sure exactly how to represent the "parallel with const_ints" part.
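
For concreteness, here is a rough sketch of the check I imagine the predicate body would perform (the function name is invented, so treat this as a guess rather than a proposal):

/* Return true if OP is a PARALLEL whose elements are all CONST_INTs,
   as produced for the vec_select selector.  */
static bool
const_int_parallel_p (rtx op)
{
  int i;

  if (GET_CODE (op) != PARALLEL)
    return false;
  for (i = 0; i < XVECLEN (op, 0); i++)
    if (!CONST_INT_P (XVECEXP (op, 0, i)))
      return false;
  return true;
}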

Any suggestions?

Thanks,
Harsha

* RE: PATCH: Add XOP 128-bit and 256-bit support for upcoming AMD Orochi processor.
@ 2009-10-21 10:09 Uros Bizjak
  2009-10-21 19:53 ` rajagopal, dwarak
  0 siblings, 1 reply; 15+ messages in thread
From: Uros Bizjak @ 2009-10-21 10:09 UTC (permalink / raw)
  To: gcc-patches
  Cc: rajagopal, dwarak, Jan Hubicka, Harle, Christophe, Jagasia, Harsha

Hello!

>	* gcc.target/i386/xop-check.h: New file.
>	* gcc.target/i386/xop-hadduX.c: Ditto.
>	* gcc.target/i386/xop-haddX.c: Ditto.
>	* gcc.target/i386/xop-hsubX.c: Ditto.
>	* gcc.target/i386/xop-imul32widen-vector.c: Ditto.
>	* gcc.target/i386/xop-imul32widen-vector.c: Ditto.
>	* gcc.target/i386/xop-pcmov2.c: Ditto.
>	* gcc.target/i386/xop-pcmov.c: Ditto.
>	* gcc.target/i386/xop-rotate1-vector.c: Ditto.
>	* gcc.target/i386/xop-rotate2-vector.c: Ditto.
>	* gcc.target/i386/xop-rotate3-vector.c: Ditto.
>	* gcc.target/i386/xop-shift2-vector.c: Ditto.
>	* gcc.target/i386/xop-shift3-vector.c: Ditto.
>	* gcc.target/i386/i386.exp:  Add check_effective_target_xop

You should also update gcc.target/i386/sse-{12,13,14,22,23}.c and
g++.dg/other/i386-{2,3,5,6}.C with the new compile options, to activate
and check the new intrinsic header files.
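
That amounts to appending -mxop to each of those tests' dg-options lines, roughly like this (the existing option list shown here is from memory and may not match the file exactly):

/* gcc.target/i386/sse-12.c, sketch of the updated directive:  */
/* { dg-options "-O2 -Winline -march=k8 -msse4a -m3dnow -mavx -mfma4 -mxop" } */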

Uros.


