public inbox for gcc-patches@gcc.gnu.org
* Vector permutation support for x86
@ 2009-11-26  3:55 Richard Henderson
  2009-11-27 23:54 ` H.J. Lu
  2009-11-30 18:36 ` Sebastian Pop
  0 siblings, 2 replies; 45+ messages in thread
From: Richard Henderson @ 2009-11-26  3:55 UTC
  To: GCC Patches; +Cc: Sebastian Pop

[-- Attachment #1: Type: text/plain, Size: 860 bytes --]

The following implements the builtin_vec_perm hook so that the 
vectorizer can do its SLP thing.  As noted elsewhere, ISAs before SSSE3 
cannot arbitrarily permute, so this complicates things a bit.  But even
given SSSE3, the arbitrary two-vector permute costs 3 insns (two pshufb
and an ior), so we would want to do most of this work anyway to find the
1- and 2-insn special cases.

For the AMD folk: I tried to support the vpperm insn from the XOP ISA, 
but there seems to be some disconnect between trunk binutils and trunk 
gcc wrt vpperm.  This can be seen in the failure of the new test 
"vperm-v4si-2x.c".  I'm looking at the XOP spec labeled "Pub No 43479, 
Rev 3.03, May 2009", and what gcc is emitting looks ok.  But I've 
already been bitten by an out-of-date AVX spec during this adventure, so
I'd appreciate a double-check.

Tested on an i7 machine (i.e. sse4.2).


r~

[-- Attachment #2: zz --]
[-- Type: text/plain, Size: 2180 bytes --]

	* config/i386/i386-builtin-types.awk (DEF_VECTOR_TYPE): Allow an
	optional 3rd argument to define the mode.
	* config/i386/i386-builtin-types.def (UQI, UHI, USI, UDI): New.
	(V2UDI, V4USI, V8UHI, V16UQI): New.
	(V2DF_FTYPE_V2DF_V2DF_V2DI, V4SF_FTYPE_V4SF_V4SF_V4SI,
	V2UDI_FTYPE_V2UDI_V2UDI_V2UDI, V4USI_FTYPE_V4USI_V4USI_V4USI,
	V8UHI_FTYPE_V8UHI_V8UHI_V8UHI, V16UQI_FTYPE_V16UQI_V16UQI_V16UQI): New.
	* config/i386/i386-modes.def: Rearrange for double-wide AVX.
	* config/i386/i386-protos.h (ix86_expand_vec_extract_even_odd): New.
	* config/i386/i386.c (IX86_BUILTIN_VEC_PERM_*): New.
	(bdesc_args): Add the builtin definitions to match.
	(ix86_expand_builtin): Expand them.
	(ix86_builtin_vectorization_cost): Rename from
	x86_builtin_vectorization_cost.
	(ix86_vectorize_builtin_vec_perm, struct expand_vec_perm_d,
	doublesize_vector_mode, expand_vselect, expand_vselect_vconcat,
	expand_vec_perm_blend, expand_vec_perm_vpermil,
	expand_vec_perm_pshufb, expand_vec_perm_1,
	expand_vec_perm_pshuflw_pshufhw, expand_vec_perm_palignr,
	expand_vec_perm_interleave2, expand_vec_perm_pshufb2,
	expand_vec_perm_even_odd_1, expand_vec_perm_even_odd,
	ix86_expand_vec_perm_builtin_1, extract_vec_perm_cst,
	ix86_expand_vec_perm_builtin, ix86_vectorize_builtin_vec_perm_ok,
	ix86_expand_vec_extract_even_odd, TARGET_VECTORIZE_BUILTIN_VEC_PERM,
	TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK): New.
	* config/i386/sse.md (SSEMODE_EO): New.
	(vec_extract_even<mode>): Use SSEMODE_EO and
	ix86_expand_vec_extract_even_odd.
	(vec_extract_odd<mode>): Likewise.
	(mulv16qi3, vec_pack_trunc_v8hi, vec_pack_trunc_v4si,
	vec_pack_trunc_v2di): Use ix86_expand_vec_extract_even_odd.

testsuite/
	* gcc.dg/vect/slp-21.c: Succeed with vect_extract_even_odd too.

	* lib/target-supports.exp
	(check_effective_target_vect_extract_even_odd): Add x86.

	* gcc.target/i386/isa-check.h: New.
	* gcc.target/i386/vperm-2-2.inc, gcc.target/i386/vperm-4-1.inc,
	gcc.target/i386/vperm-4-2.inc, gcc.target/i386/vperm-v2df.c,
	gcc.target/i386/vperm-v2di.c, gcc.target/i386/vperm-v4sf-1.c,
	gcc.target/i386/vperm-v4sf-2.c, gcc.target/i386/vperm-v4si-1.c,
	gcc.target/i386/vperm-v4si-2.c, gcc.target/i386/vperm-v4si-2x.c,
	gcc.target/i386/vperm.pl: New files.

[-- Attachment #3: z --]
[-- Type: text/plain, Size: 156430 bytes --]

diff --git a/gcc/config/i386/i386-builtin-types.awk b/gcc/config/i386/i386-builtin-types.awk
index 0c54458..7b016f4 100644
--- a/gcc/config/i386/i386-builtin-types.awk
+++ b/gcc/config/i386/i386-builtin-types.awk
@@ -69,11 +69,12 @@ $1 == "DEF_PRIMITIVE_TYPE" {
 }
 
 $1 == "DEF_VECTOR_TYPE" {
-    if (NF == 4) {
+    if (NF == 4 || NF == 5) {
 	check_type($3)
 	type_hash[$2] = 1
-	vect_mode[vect_defs] = $2
+	vect_name[vect_defs] = $2
 	vect_base[vect_defs] = $3
+	vect_mode[vect_defs] = (NF == 5 ? $4 : $2)
 	vect_defs++
     } else
 	do_error("DEF_VECTOR_TYPE expected 2 arguments")
@@ -152,8 +153,8 @@ END {
 	print "  IX86_BT_" prim_name[i] ","
     print "  IX86_BT_LAST_PRIM = IX86_BT_" prim_name[i-1] ","
     for (i = 0; i < vect_defs; ++i)
-	print "  IX86_BT_" vect_mode[i] ","
-    print "  IX86_BT_LAST_VECT = IX86_BT_" vect_mode[i-1] ","
+	print "  IX86_BT_" vect_name[i] ","
+    print "  IX86_BT_LAST_VECT = IX86_BT_" vect_name[i-1] ","
     for (i = 0; i < ptr_defs; ++i)
 	print "  IX86_BT_" ptr_name[i] ","
     print "  IX86_BT_LAST_PTR = IX86_BT_" ptr_name[i-1] ","
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 3f0b20b..9f45a13 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -10,12 +10,12 @@
 #   At present, that's all that's required; revisit if it turns out
 #   that we need more than that.
 #
-# DEF_VECTOR_TYPE (ENUM, TYPE)
+# DEF_VECTOR_TYPE (ENUM, TYPE [, MODE])
 #
-#  This describes a vector type.  ENUM doubles as both the identifier
-#  to define in the enumeration as well as the mode of the vector; TYPE is
-#  the enumeral for the inner type which should of course name a type of
-#  the proper inner mode.
+#  This describes a vector type.  ENUM is an identifier as above.
+#  TYPE is the enumeral for the inner type which should of course
+#  name a type of the proper inner mode.  If present, MODE is the
+#  machine mode, else the machine mode should be the same as ENUM.
 #
 # DEF_POINTER_TYPE (ENUM, TYPE [, CONST])
 #
@@ -40,10 +40,22 @@
 DEF_PRIMITIVE_TYPE (VOID, void_type_node)
 DEF_PRIMITIVE_TYPE (CHAR, char_type_node)
 DEF_PRIMITIVE_TYPE (UCHAR, unsigned_char_type_node)
-DEF_PRIMITIVE_TYPE (QI, intQI_type_node)
+# ??? Logically this should be intQI_type_node, but that maps to "signed char"
+# which is a different type than "char" even if "char" is signed.  This must
+# match the usage in emmintrin.h and changing this would change name mangling
+# and so is not advisable.
+DEF_PRIMITIVE_TYPE (QI, char_type_node)
 DEF_PRIMITIVE_TYPE (HI, intHI_type_node)
 DEF_PRIMITIVE_TYPE (SI, intSI_type_node)
+# ??? Logically this should be intDI_type_node, but that maps to "long"
+# with 64-bit, and that's not how emmintrin.h is written.  Again,
+# changing this would change name mangling.
 DEF_PRIMITIVE_TYPE (DI, long_long_integer_type_node)
+DEF_PRIMITIVE_TYPE (UQI, unsigned_intQI_type_node)
+DEF_PRIMITIVE_TYPE (UHI, unsigned_intHI_type_node)
+DEF_PRIMITIVE_TYPE (USI, unsigned_intSI_type_node)
+DEF_PRIMITIVE_TYPE (UDI, long_long_unsigned_type_node)
+# ??? Some of the types below should use the mode types above.
 DEF_PRIMITIVE_TYPE (USHORT, short_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (INT, integer_type_node)
 DEF_PRIMITIVE_TYPE (UINT, unsigned_type_node)
@@ -59,23 +71,33 @@ DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT128, float128_type_node)
 
-DEF_VECTOR_TYPE (V16HI, HI)
-DEF_VECTOR_TYPE (V16QI, CHAR)
-DEF_VECTOR_TYPE (V1DI, DI)
-DEF_VECTOR_TYPE (V2DF, DOUBLE)
-DEF_VECTOR_TYPE (V2DI, DI)
+# MMX vectors
 DEF_VECTOR_TYPE (V2SF, FLOAT)
+DEF_VECTOR_TYPE (V1DI, DI)
 DEF_VECTOR_TYPE (V2SI, SI)
-DEF_VECTOR_TYPE (V32QI, CHAR)
-DEF_VECTOR_TYPE (V4DF, DOUBLE)
-DEF_VECTOR_TYPE (V4DI, DI)
 DEF_VECTOR_TYPE (V4HI, HI)
+DEF_VECTOR_TYPE (V8QI, QI)
+
+# SSE vectors
+DEF_VECTOR_TYPE (V2DF, DOUBLE)
 DEF_VECTOR_TYPE (V4SF, FLOAT)
+DEF_VECTOR_TYPE (V2DI, DI)
 DEF_VECTOR_TYPE (V4SI, SI)
 DEF_VECTOR_TYPE (V8HI, HI)
-DEF_VECTOR_TYPE (V8QI, CHAR)
+DEF_VECTOR_TYPE (V16QI, QI)
+DEF_VECTOR_TYPE (V2UDI, UDI, V2DI)
+DEF_VECTOR_TYPE (V4USI, USI, V4SI)
+DEF_VECTOR_TYPE (V8UHI, UHI, V8HI)
+DEF_VECTOR_TYPE (V16UQI, UQI, V16QI)
+
+# AVX vectors
+DEF_VECTOR_TYPE (V4DF, DOUBLE)
 DEF_VECTOR_TYPE (V8SF, FLOAT)
+DEF_VECTOR_TYPE (V4DI, DI)
 DEF_VECTOR_TYPE (V8SI, SI)
+DEF_VECTOR_TYPE (V16HI, HI)
+DEF_VECTOR_TYPE (V32QI, QI)
+
 
 DEF_POINTER_TYPE (PCCHAR, CHAR, CONST)
 DEF_POINTER_TYPE (PCDOUBLE, DOUBLE, CONST)
@@ -323,6 +345,12 @@ DEF_FUNCTION_TYPE (VOID, UINT64, UINT, UINT)
 DEF_FUNCTION_TYPE (VOID, USHORT, UINT, USHORT)
 DEF_FUNCTION_TYPE (VOID, V16QI, V16QI, PCHAR)
 DEF_FUNCTION_TYPE (VOID, V8QI, V8QI, PCHAR)
+DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF, V2DI)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF, V4SI)
+DEF_FUNCTION_TYPE (V2UDI, V2UDI, V2UDI, V2UDI)
+DEF_FUNCTION_TYPE (V4USI, V4USI, V4USI, V4USI)
+DEF_FUNCTION_TYPE (V8UHI, V8UHI, V8UHI, V8UHI)
+DEF_FUNCTION_TYPE (V16UQI, V16UQI, V16UQI, V16UQI)
 
 DEF_FUNCTION_TYPE (V2DI, V2DI, V2DI, UINT, UINT)
 DEF_FUNCTION_TYPE (V4HI, HI, HI, HI, HI)
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index 9c94802..f2e06ee 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -69,22 +69,20 @@ CC_MODE (CCZ);
 CC_MODE (CCFP);
 CC_MODE (CCFPU);
 
-/* Vector modes.  */
-VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
-VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI */
-VECTOR_MODES (INT, 16);       /* V16QI V8HI V4SI V2DI */
-VECTOR_MODES (INT, 32);       /* V32QI V16HI V8SI V4DI */
-VECTOR_MODES (FLOAT, 8);      /*            V4HF V2SF */
-VECTOR_MODES (FLOAT, 16);     /*       V8HF V4SF V2DF */
-VECTOR_MODES (FLOAT, 32);     /*      V16HF V8SF V4DF */
-VECTOR_MODE (INT, DI, 1);     /*                 V1DI */
-VECTOR_MODE (INT, SI, 1);     /*                 V1SI */
-VECTOR_MODE (INT, QI, 2);     /*                 V2QI */
-VECTOR_MODE (INT, DI, 8);     /*                 V8DI */
-VECTOR_MODE (INT, HI, 32);    /*                V32HI */
-VECTOR_MODE (INT, QI, 64);    /*                V64QI */
-VECTOR_MODE (FLOAT, DF, 8);   /*                 V8DF */
-VECTOR_MODE (FLOAT, SF, 16);  /*                V16SF */
+/* Vector modes.  Note that VEC_CONCAT patterns require vector
+   sizes twice as big as implemented in hardware.  */
+VECTOR_MODES (INT, 4);        /*              V4QI V2HI */
+VECTOR_MODES (INT, 8);        /*         V8QI V4HI V2SI */
+VECTOR_MODES (INT, 16);       /*   V16QI V8HI V4SI V2DI */
+VECTOR_MODES (INT, 32);       /*  V32QI V16HI V8SI V4DI */
+VECTOR_MODES (INT, 64);       /* V64QI V32HI V16SI V8DI */
+VECTOR_MODES (FLOAT, 8);      /*              V4HF V2SF */
+VECTOR_MODES (FLOAT, 16);     /*         V8HF V4SF V2DF */
+VECTOR_MODES (FLOAT, 32);     /*        V16HF V8SF V4DF */
+VECTOR_MODES (FLOAT, 64);     /*       V32HF V16SF V8DF */
+VECTOR_MODE (INT, DI, 1);     /*                   V1DI */
+VECTOR_MODE (INT, SI, 1);     /*                   V1SI */
+VECTOR_MODE (INT, QI, 2);     /*                   V2QI */
 
 INT_MODE (OI, 32);
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index d36b269..88acc1f 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -219,6 +219,8 @@ extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 extern bool ix86_fma4_valid_op_p (rtx [], rtx, int, bool, int, bool);
 extern void ix86_expand_fma4_multiple_memory (rtx [], int, enum machine_mode);
 
+extern void ix86_expand_vec_extract_even_odd (rtx, rtx, rtx, unsigned);
+
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
 extern void ix86_register_pragmas (void);
@@ -277,4 +279,3 @@ extern int asm_preferred_eh_data_format (int, int);
 #ifdef HAVE_ATTR_cpu
 extern enum attr_cpu ix86_schedule;
 #endif
-
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f1bb9ec..4d5e8a3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -54,7 +54,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "cselib.h"
 
-static int x86_builtin_vectorization_cost (bool);
 static rtx legitimize_dllimport_symbol (rtx, bool);
 
 #ifndef CHECK_STACK_LIMIT
@@ -1885,6 +1884,7 @@ static void ix86_compute_frame_layout (struct ix86_frame *);
 static bool ix86_expand_vector_init_one_nonzero (bool, enum machine_mode,
 						 rtx, rtx, int);
 static void ix86_add_new_builtins (int);
+static rtx ix86_expand_vec_perm_builtin (tree);
 
 enum ix86_function_specific_strings
 {
@@ -21037,6 +21037,17 @@ enum ix86_builtins
 
   IX86_BUILTIN_CVTUDQ2PS,
 
+  IX86_BUILTIN_VEC_PERM_V2DF,
+  IX86_BUILTIN_VEC_PERM_V4SF,
+  IX86_BUILTIN_VEC_PERM_V2DI,
+  IX86_BUILTIN_VEC_PERM_V4SI,
+  IX86_BUILTIN_VEC_PERM_V8HI,
+  IX86_BUILTIN_VEC_PERM_V16QI,
+  IX86_BUILTIN_VEC_PERM_V2DI_U,
+  IX86_BUILTIN_VEC_PERM_V4SI_U,
+  IX86_BUILTIN_VEC_PERM_V8HI_U,
+  IX86_BUILTIN_VEC_PERM_V16QI_U,
+
   /* FMA4 and XOP instructions.  */
   IX86_BUILTIN_VFMADDSS,
   IX86_BUILTIN_VFMADDSD,
@@ -21710,6 +21721,17 @@ static const struct builtin_description bdesc_args[] =
   /* SSE2 */
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_shufpd, "__builtin_ia32_shufpd", IX86_BUILTIN_SHUFPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
 
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v2df", IX86_BUILTIN_VEC_PERM_V2DF, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v4sf", IX86_BUILTIN_VEC_PERM_V4SF, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v2di", IX86_BUILTIN_VEC_PERM_V2DI, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v4si", IX86_BUILTIN_VEC_PERM_V4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v8hi", IX86_BUILTIN_VEC_PERM_V8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v16qi", IX86_BUILTIN_VEC_PERM_V16QI, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v2di_u", IX86_BUILTIN_VEC_PERM_V2DI_U, UNKNOWN, (int) V2UDI_FTYPE_V2UDI_V2UDI_V2UDI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v4si_u", IX86_BUILTIN_VEC_PERM_V4SI_U, UNKNOWN, (int) V4USI_FTYPE_V4USI_V4USI_V4USI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v8hi_u", IX86_BUILTIN_VEC_PERM_V8HI_U, UNKNOWN, (int) V8UHI_FTYPE_V8UHI_V8UHI_V8UHI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v16qi_u", IX86_BUILTIN_VEC_PERM_V16QI_U, UNKNOWN, (int) V16UQI_FTYPE_V16UQI_V16UQI_V16UQI },
+
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movmskpd, "__builtin_ia32_movmskpd", IX86_BUILTIN_MOVMSKPD, UNKNOWN, (int) INT_FTYPE_V2DF  },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_pmovmskb, "__builtin_ia32_pmovmskb128", IX86_BUILTIN_PMOVMSKB128, UNKNOWN, (int) INT_FTYPE_V16QI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sqrtv2df2, "__builtin_ia32_sqrtpd", IX86_BUILTIN_SQRTPD, UNKNOWN, (int) V2DF_FTYPE_V2DF },
@@ -24119,6 +24141,18 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
     case IX86_BUILTIN_VEC_SET_V16QI:
       return ix86_expand_vec_set_builtin (exp);
 
+    case IX86_BUILTIN_VEC_PERM_V2DF:
+    case IX86_BUILTIN_VEC_PERM_V4SF:
+    case IX86_BUILTIN_VEC_PERM_V2DI:
+    case IX86_BUILTIN_VEC_PERM_V4SI:
+    case IX86_BUILTIN_VEC_PERM_V8HI:
+    case IX86_BUILTIN_VEC_PERM_V16QI:
+    case IX86_BUILTIN_VEC_PERM_V2DI_U:
+    case IX86_BUILTIN_VEC_PERM_V4SI_U:
+    case IX86_BUILTIN_VEC_PERM_V8HI_U:
+    case IX86_BUILTIN_VEC_PERM_V16QI_U:
+      return ix86_expand_vec_perm_builtin (exp);
+
     case IX86_BUILTIN_INFQ:
     case IX86_BUILTIN_HUGE_VALQ:
       {
@@ -28904,7 +28938,7 @@ static const struct attribute_spec ix86_attribute_table[] =
 
 /* Implement targetm.vectorize.builtin_vectorization_cost.  */
 static int
-x86_builtin_vectorization_cost (bool runtime_test)
+ix86_builtin_vectorization_cost (bool runtime_test)
 {
   /* If the branch of the runtime test is taken - i.e. - the vectorized
      version is skipped - this incurs a misprediction cost (because the
@@ -28926,6 +28960,1091 @@ x86_builtin_vectorization_cost (bool runtime_test)
     return 0;
 }
 
+/* Implement targetm.vectorize.builtin_vec_perm.  */
+
+static tree
+ix86_vectorize_builtin_vec_perm (tree vec_type, tree *mask_type)
+{
+  tree itype = TREE_TYPE (vec_type);
+  bool u = TYPE_UNSIGNED (itype);
+  enum ix86_builtins fcode;
+
+  if (!TARGET_SSE2)
+    return NULL_TREE;
+
+  switch (TYPE_MODE (vec_type))
+    {
+    case V2DFmode:
+      itype = ix86_get_builtin_type (IX86_BT_DI);
+      fcode = IX86_BUILTIN_VEC_PERM_V2DF;
+      break;
+    case V4SFmode:
+      itype = ix86_get_builtin_type (IX86_BT_SI);
+      fcode = IX86_BUILTIN_VEC_PERM_V4SF;
+      break;
+    case V2DImode:
+      fcode = u ? IX86_BUILTIN_VEC_PERM_V2DI_U : IX86_BUILTIN_VEC_PERM_V2DI;
+      break;
+    case V4SImode:
+      fcode = u ? IX86_BUILTIN_VEC_PERM_V4SI_U : IX86_BUILTIN_VEC_PERM_V4SI;
+      break;
+    case V8HImode:
+      fcode = u ? IX86_BUILTIN_VEC_PERM_V8HI_U : IX86_BUILTIN_VEC_PERM_V8HI;
+      break;
+    case V16QImode:
+      fcode = u ? IX86_BUILTIN_VEC_PERM_V16QI_U : IX86_BUILTIN_VEC_PERM_V16QI;
+      break;
+    default:
+      return NULL_TREE;
+    }
+
+  *mask_type = itype;
+  return ix86_builtins[(int) fcode];
+}
+
+/* AVX does not support 32-byte integer vector operations,
+   thus the longest vector we are faced with is V16QImode.  */
+#define MAX_VECT_LEN	16
+
+struct expand_vec_perm_d
+{
+  rtx target, op0, op1;
+  unsigned char perm[MAX_VECT_LEN];
+  enum machine_mode vmode;
+  unsigned char nelt;
+  bool testing_p;
+};
+
+/* Return a vector mode with twice as many elements as VMODE.  */
+/* ??? Consider moving this to a table generated by genmodes.c.  */
+
+static enum machine_mode
+doublesize_vector_mode (enum machine_mode vmode)
+{
+  switch (vmode)
+    {
+    case V2SFmode:	return V4SFmode;
+    case V1DImode:	return V2DImode;
+    case V2SImode:	return V4SImode;
+    case V4HImode:	return V8HImode;
+    case V8QImode:	return V16QImode;
+
+    case V2DFmode:	return V4DFmode;
+    case V4SFmode:	return V8SFmode;
+    case V2DImode:	return V4DImode;
+    case V4SImode:	return V8SImode;
+    case V8HImode:	return V16HImode;
+    case V16QImode:	return V32QImode;
+
+    case V4DFmode:	return V8DFmode;
+    case V8SFmode:	return V16SFmode;
+    case V4DImode:	return V8DImode;
+    case V8SImode:	return V16SImode;
+    case V16HImode:	return V32HImode;
+    case V32QImode:	return V64QImode;
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Construct (set target (vec_select op0 (parallel perm))) and
+   return true if that's a valid instruction in the active ISA.  */
+
+static bool
+expand_vselect (rtx target, rtx op0, const unsigned char *perm, unsigned nelt)
+{
+  rtx rperm[MAX_VECT_LEN], x;
+  unsigned i;
+
+  for (i = 0; i < nelt; ++i)
+    rperm[i] = GEN_INT (perm[i]);
+
+  x = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (nelt, rperm));
+  x = gen_rtx_VEC_SELECT (GET_MODE (target), op0, x);
+  x = gen_rtx_SET (VOIDmode, target, x);
+
+  x = emit_insn (x);
+  if (recog_memoized (x) < 0)
+    {
+      remove_insn (x);
+      return false;
+    }
+  return true;
+}
+
+/* Similar, but generate a vec_concat from op0 and op1 as well.  */
+
+static bool
+expand_vselect_vconcat (rtx target, rtx op0, rtx op1,
+			const unsigned char *perm, unsigned nelt)
+{
+  enum machine_mode v2mode;
+  rtx x;
+
+  v2mode = doublesize_vector_mode (GET_MODE (op0));
+  x = gen_rtx_VEC_CONCAT (v2mode, op0, op1);
+  return expand_vselect (target, x, perm, nelt);
+}
+
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   in terms of blendp[sd] / pblendw / pblendvb.  */
+
+static bool
+expand_vec_perm_blend (struct expand_vec_perm_d *d)
+{
+  enum machine_mode vmode = d->vmode;
+  unsigned i, mask, nelt = d->nelt;
+  rtx target, op0, op1, x;
+
+  if (!TARGET_SSE4_1 || d->op0 == d->op1)
+    return false;
+  if (!(GET_MODE_SIZE (vmode) == 16 || vmode == V4DFmode || vmode == V8SFmode))
+    return false;
+
+  /* This is a blend, not a permute.  Elements must stay in their
+     respective lanes.  */
+  for (i = 0; i < nelt; ++i)
+    {
+      unsigned e = d->perm[i];
+      if (!(e == i || e == i + nelt))
+	return false;
+    }
+
+  if (d->testing_p)
+    return true;
+
+  /* ??? Without SSE4.1, we could implement this with and/andn/or.  This
+     decision should be extracted elsewhere, so that we only try that
+     sequence once all budget==3 options have been tried.  */
+
+  /* For bytes, see if bytes move in pairs so we can use pblendw with
+     an immediate argument, rather than pblendvb with a vector argument.  */
+  if (vmode == V16QImode)
+    {
+      bool pblendw_ok = true;
+      for (i = 0; i < 16 && pblendw_ok; i += 2)
+	pblendw_ok = (d->perm[i] + 1 == d->perm[i + 1]);
+
+      if (!pblendw_ok)
+	{
+	  rtx rperm[16], vperm;
+
+	  for (i = 0; i < nelt; ++i)
+	    rperm[i] = (d->perm[i] < nelt ? const0_rtx : constm1_rtx);
+
+	  vperm = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, rperm));
+	  vperm = force_reg (V16QImode, vperm);
+
+	  emit_insn (gen_sse4_1_pblendvb (d->target, d->op0, d->op1, vperm));
+	  return true;
+	}
+    }
+
+  target = d->target;
+  op0 = d->op0;
+  op1 = d->op1;
+  mask = 0;
+
+  switch (vmode)
+    {
+    case V4DFmode:
+    case V8SFmode:
+    case V2DFmode:
+    case V4SFmode:
+    case V8HImode:
+      for (i = 0; i < nelt; ++i)
+	mask |= (d->perm[i] >= nelt) << i;
+      break;
+
+    case V2DImode:
+      for (i = 0; i < 2; ++i)
+	mask |= (d->perm[i] >= 2 ? 15 : 0) << (i * 4);
+      goto do_subreg;
+
+    case V4SImode:
+      for (i = 0; i < 4; ++i)
+	mask |= (d->perm[i] >= 4 ? 3 : 0) << (i * 2);
+      goto do_subreg;
+
+    case V16QImode:
+      for (i = 0; i < 8; ++i)
+	mask |= (d->perm[i * 2] >= 16) << i;
+
+    do_subreg:
+      vmode = V8HImode;
+      target = gen_lowpart (vmode, target);
+      op0 = gen_lowpart (vmode, op0);
+      op1 = gen_lowpart (vmode, op1);
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  /* This matches five different patterns with the different modes.  */
+  x = gen_rtx_VEC_MERGE (vmode, op1, op0, GEN_INT (mask));
+  x = gen_rtx_SET (VOIDmode, target, x);
+  emit_insn (x);
+
+  return true;
+}
+
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   in terms of the variable form of vpermilps.
+
+   Note that we will have already failed the immediate input vpermilps,
+   which requires that the high and low part shuffle be identical; the
+   variable form doesn't require that.  */
+
+static bool
+expand_vec_perm_vpermil (struct expand_vec_perm_d *d)
+{
+  rtx rperm[8], vperm;
+  unsigned i;
+
+  if (!TARGET_AVX || d->vmode != V8SFmode || d->op0 != d->op1)
+    return false;
+
+  /* We can only permute within the 128-bit lane.  */
+  for (i = 0; i < 8; ++i)
+    {
+      unsigned e = d->perm[i];
+      if (i < 4 ? e >= 4 : e < 4)
+	return false;
+    }
+
+  if (d->testing_p)
+    return true;
+
+  for (i = 0; i < 8; ++i)
+    {
+      unsigned e = d->perm[i];
+
+      /* Within each 128-bit lane, the elements of op0 are numbered
+	 from 0 and the elements of op1 are numbered from 4.  */
+      if (e >= 8 + 4)
+	e -= 8;
+      else if (e >= 4)
+	e -= 4;
+
+      rperm[i] = GEN_INT (e);
+    }
+
+  vperm = gen_rtx_CONST_VECTOR (V8SImode, gen_rtvec_v (8, rperm));
+  vperm = force_reg (V8SImode, vperm);
+  emit_insn (gen_avx_vpermilvarv8sf3 (d->target, d->op0, vperm));
+
+  return true;
+}
+
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   in terms of pshufb or vpperm.  */
+
+static bool
+expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
+{
+  unsigned i, nelt, eltsz;
+  rtx rperm[16], vperm, target, op0, op1;
+
+  if (!(d->op0 == d->op1 ? TARGET_SSSE3 : TARGET_XOP))
+    return false;
+  if (GET_MODE_SIZE (d->vmode) != 16)
+    return false;
+
+  if (d->testing_p)
+    return true;
+
+  nelt = d->nelt;
+  eltsz = GET_MODE_SIZE (GET_MODE_INNER (d->vmode));
+
+  for (i = 0; i < nelt; ++i)
+    {
+      unsigned j, e = d->perm[i];
+      for (j = 0; j < eltsz; ++j)
+	rperm[i * eltsz + j] = GEN_INT (e * eltsz + j);
+    }
+
+  vperm = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, rperm));
+  vperm = force_reg (V16QImode, vperm);
+
+  target = gen_lowpart (V16QImode, d->target);
+  op0 = gen_lowpart (V16QImode, d->op0);
+  if (d->op0 == d->op1)
+    emit_insn (gen_ssse3_pshufbv16qi3 (target, op0, vperm));
+  else
+    {
+      op1 = gen_lowpart (V16QImode, d->op1);
+      emit_insn (gen_xop_pperm (target, op0, op1, vperm));
+    }
+
+  return true;
+}
+
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
+   in a single instruction.  */
+
+static bool
+expand_vec_perm_1 (struct expand_vec_perm_d *d)
+{
+  unsigned i, nelt = d->nelt;
+  unsigned char perm2[MAX_VECT_LEN];
+
+  /* Check plain VEC_SELECT first, because AVX has instructions that could
+     match both SEL and SEL+CONCAT, but the plain SEL will allow a memory
+     input where SEL+CONCAT may not.  */
+  if (d->op0 == d->op1)
+    {
+      if (expand_vselect (d->target, d->op0, d->perm, nelt))
+	return true;
+
+      /* There are plenty of patterns in sse.md that are written for
+	 SEL+CONCAT and are not replicated for a single op.  Perhaps
+	 that should be changed, to avoid the nastiness here.  */
+
+      /* Recognize interleave style patterns, which means incrementing
+	 every other permutation operand.  */
+      for (i = 0; i < nelt; i += 2)
+	{
+	  perm2[i] = d->perm[i];
+	  perm2[i+1] = d->perm[i+1] + nelt;
+	}
+      if (expand_vselect_vconcat (d->target, d->op0, d->op0, perm2, nelt))
+	return true;
+
+      /* Recognize shufps, which means adding {0, 0, nelt, nelt}.  */
+      if (nelt >= 4)
+	{
+	  memcpy (perm2, d->perm, nelt);
+	  for (i = 2; i < nelt; i += 4)
+	    {
+	      perm2[i+0] += nelt;
+	      perm2[i+1] += nelt;
+	    }
+
+	  if (expand_vselect_vconcat (d->target, d->op0, d->op0, perm2, nelt))
+	    return true;
+	}
+    }
+
+  /* Finally, try the fully general two operand permute.  */
+  if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt))
+    return true;
+
+  /* Recognize interleave style patterns with reversed operands.  */
+  if (d->op0 != d->op1)
+    {
+      for (i = 0; i < nelt; ++i)
+	{
+	  unsigned e = d->perm[i];
+	  if (e >= nelt)
+	    e -= nelt;
+	  else
+	    e += nelt;
+	  perm2[i] = e;
+	}
+
+      if (expand_vselect_vconcat (d->target, d->op1, d->op0, perm2, nelt))
+	return true;
+    }
+
+  /* Try the SSE4.1 blend variable merge instructions.  */
+  if (expand_vec_perm_blend (d))
+    return true;
+
+  /* Try one of the AVX vpermil variable permutations.  */
+  if (expand_vec_perm_vpermil (d))
+    return true;
+
+  /* Try the SSSE3 pshufb or XOP vpperm variable permutation.  */
+  if (expand_vec_perm_pshufb (d))
+    return true;
+
+  return false;
+}
+
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   in terms of a pair of pshuflw + pshufhw instructions.  */
+
+static bool
+expand_vec_perm_pshuflw_pshufhw (struct expand_vec_perm_d *d)
+{
+  unsigned char perm2[MAX_VECT_LEN];
+  unsigned i;
+  bool ok;
+
+  if (d->vmode != V8HImode || d->op0 != d->op1)
+    return false;
+
+  /* The two permutations only operate in 64-bit lanes.  */
+  for (i = 0; i < 4; ++i)
+    if (d->perm[i] >= 4)
+      return false;
+  for (i = 4; i < 8; ++i)
+    if (d->perm[i] < 4)
+      return false;
+
+  if (d->testing_p)
+    return true;
+
+  /* Emit the pshuflw.  */
+  memcpy (perm2, d->perm, 4);
+  for (i = 4; i < 8; ++i)
+    perm2[i] = i;
+  ok = expand_vselect (d->target, d->op0, perm2, 8);
+  gcc_assert (ok);
+
+  /* Emit the pshufhw.  */
+  memcpy (perm2 + 4, d->perm + 4, 4);
+  for (i = 0; i < 4; ++i)
+    perm2[i] = i;
+  ok = expand_vselect (d->target, d->target, perm2, 8);
+  gcc_assert (ok);
+
+  return true;
+}
+
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
+   the permutation using the SSSE3 palignr instruction.  This succeeds
+   when all of the elements in PERM fit within one vector and we merely
+   need to shift them down so that a single vector permutation has a
+   chance to succeed.  */
+
+static bool
+expand_vec_perm_palignr (struct expand_vec_perm_d *d)
+{
+  unsigned i, nelt = d->nelt;
+  unsigned min, max;
+  bool in_order, ok;
+  rtx shift;
+
+  /* Even with AVX, palignr only operates on 128-bit vectors.  */
+  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+    return false;
+
+  min = nelt, max = 0;
+  for (i = 0; i < nelt; ++i)
+    {
+      unsigned e = d->perm[i];
+      if (e < min)
+	min = e;
+      if (e > max)
+	max = e;
+    }
+  if (min == 0 || max - min >= nelt)
+    return false;
+
+  /* Given that we have SSSE3, we know we'll be able to implement the
+     single operand permutation after the palignr with pshufb.  */
+  if (d->testing_p)
+    return true;
+
+  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+  emit_insn (gen_ssse3_palignrti (gen_lowpart (TImode, d->target),
+				  gen_lowpart (TImode, d->op1),
+				  gen_lowpart (TImode, d->op0), shift));
+
+  d->op0 = d->op1 = d->target;
+
+  in_order = true;
+  for (i = 0; i < nelt; ++i)
+    {
+      unsigned e = d->perm[i] - min;
+      if (e != i)
+	in_order = false;
+      d->perm[i] = e;
+    }
+
+  /* Test for the degenerate case where the alignment by itself
+     produces the desired permutation.  */
+  if (in_order)
+    return true;
+
+  ok = expand_vec_perm_1 (d);
+  gcc_assert (ok);
+
+  return ok;
+}
+
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
+   a two vector permutation into a single vector permutation by using
+   an interleave operation to merge the vectors.  */
+
+static bool
+expand_vec_perm_interleave2 (struct expand_vec_perm_d *d)
+{
+  struct expand_vec_perm_d dremap, dfinal;
+  unsigned i, nelt = d->nelt, nelt2 = nelt / 2;
+  unsigned contents, h1, h2, h3, h4;
+  unsigned char remap[2 * MAX_VECT_LEN];
+  rtx seq;
+  bool ok;
+
+  if (d->op0 == d->op1)
+    return false;
+
+  /* The 256-bit unpck[lh]p[sd] instructions only operate within the 128-bit
+     lanes.  We can use similar techniques with the vperm2f128 instruction,
+     but it requires slightly different logic.  */
+  if (GET_MODE_SIZE (d->vmode) != 16)
+    return false;
+
+  /* Examine from whence the elements come.  */
+  contents = 0;
+  for (i = 0; i < nelt; ++i)
+    contents |= 1u << d->perm[i];
+
+  /* Split the two input vectors into 4 halves.  */
+  h1 = (1u << nelt2) - 1;
+  h2 = h1 << nelt2;
+  h3 = h2 << nelt2;
+  h4 = h3 << nelt2;
+
+  memset (remap, 0xff, sizeof (remap));
+  dremap = *d;
+
+  /* If all elements come from the low halves, use interleave low; likewise
+     use interleave high for the high halves.  If the elements come from
+     mismatched halves, use shufps for V4SF/V4SI or a DImode shuffle.  */
+  if ((contents & (h1 | h3)) == contents)
+    {
+      for (i = 0; i < nelt2; ++i)
+	{
+	  remap[i] = i * 2;
+	  remap[i + nelt] = i * 2 + 1;
+	  dremap.perm[i * 2] = i;
+	  dremap.perm[i * 2 + 1] = i + nelt;
+	}
+    }
+  else if ((contents & (h2 | h4)) == contents)
+    {
+      for (i = 0; i < nelt2; ++i)
+	{
+	  remap[i + nelt2] = i * 2;
+	  remap[i + nelt + nelt2] = i * 2 + 1;
+	  dremap.perm[i * 2] = i + nelt2;
+	  dremap.perm[i * 2 + 1] = i + nelt + nelt2;
+	}
+    }
+  else if ((contents & (h1 | h4)) == contents)
+    {
+      for (i = 0; i < nelt2; ++i)
+	{
+	  remap[i] = i;
+	  remap[i + nelt + nelt2] = i + nelt2;
+	  dremap.perm[i] = i;
+	  dremap.perm[i + nelt2] = i + nelt + nelt2;
+	}
+      if (nelt != 4)
+	{
+	  dremap.vmode = V2DImode;
+	  dremap.nelt = 2;
+	  dremap.perm[0] = 0;
+	  dremap.perm[1] = 3;
+	}
+    }
+  else if ((contents & (h2 | h3)) == contents)
+    {
+      for (i = 0; i < nelt2; ++i)
+	{
+	  remap[i + nelt2] = i;
+	  remap[i + nelt] = i + nelt2;
+	  dremap.perm[i] = i + nelt2;
+	  dremap.perm[i + nelt2] = i + nelt;
+	}
+      if (nelt != 4)
+	{
+	  dremap.vmode = V2DImode;
+	  dremap.nelt = 2;
+	  dremap.perm[0] = 1;
+	  dremap.perm[1] = 2;
+	}
+    }
+  else
+    return false;
+
+  /* Use the remapping array set up above to move the elements from their
+     swizzled locations into their final destinations.  */
+  dfinal = *d;
+  for (i = 0; i < nelt; ++i)
+    {
+      unsigned e = remap[d->perm[i]];
+      gcc_assert (e < nelt);
+      dfinal.perm[i] = e;
+    }
+  dfinal.op0 = gen_reg_rtx (dfinal.vmode);
+  dfinal.op1 = dfinal.op0;
+  dremap.target = dfinal.op0;
+
+  /* Test if the final remap can be done with a single insn.  For V4SFmode or
+     V4SImode this *will* succeed.  For V8HImode or V16QImode it may not.  */
+  start_sequence ();
+  ok = expand_vec_perm_1 (&dfinal);
+  seq = get_insns ();
+  end_sequence ();
+
+  if (!ok)
+    return false;
+
+  if (dremap.vmode != dfinal.vmode)
+    {
+      dremap.target = gen_lowpart (dremap.vmode, dremap.target);
+      dremap.op0 = gen_lowpart (dremap.vmode, dremap.op0);
+      dremap.op1 = gen_lowpart (dremap.vmode, dremap.op1);
+    }
+
+  ok = expand_vec_perm_1 (&dremap);
+  gcc_assert (ok);
+
+  emit_insn (seq);
+  return true;
+}
+
+/* A subroutine of ix86_expand_vec_perm_builtin_1 and
+   expand_vec_perm_even_odd_1.  Implement the two-operand permutation with
+   two pshufb insns and an ior.  All two-insn sequences should have failed.  */
+
+static bool
+expand_vec_perm_pshufb2 (struct expand_vec_perm_d *d)
+{
+  rtx rperm[2][16], vperm, l, h, op, m128;
+  unsigned int i, nelt, eltsz;
+
+  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+    return false;
+
+  nelt = d->nelt;
+  eltsz = GET_MODE_SIZE (GET_MODE_INNER (d->vmode));
+
+  /* Generate two permutation masks.  If the required element is within
+     the given vector it is shuffled into the proper lane.  If the required
+     element is in the other vector, force a zero into the lane by setting
+     bit 7 in the permutation mask.  */
+  m128 = GEN_INT (-128);
+  for (i = 0; i < nelt; ++i)
+    {
+      unsigned j, e = d->perm[i];
+      unsigned which = (e >= nelt);
+      if (e >= nelt)
+	e -= nelt;
+
+      for (j = 0; j < eltsz; ++j)
+	{
+	  rperm[which][i*eltsz + j] = GEN_INT (e*eltsz + j);
+	  rperm[1-which][i*eltsz + j] = m128;
+	}
+    }
+
+  vperm = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, rperm[0]));
+  vperm = force_reg (V16QImode, vperm);
+
+  l = gen_reg_rtx (V16QImode);
+  op = gen_lowpart (V16QImode, d->op0);
+  emit_insn (gen_ssse3_pshufbv16qi3 (l, op, vperm));
+
+  vperm = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, rperm[1]));
+  vperm = force_reg (V16QImode, vperm);
+
+  h = gen_reg_rtx (V16QImode);
+  op = gen_lowpart (V16QImode, d->op1);
+  emit_insn (gen_ssse3_pshufbv16qi3 (h, op, vperm));
+
+  op = gen_lowpart (V16QImode, d->target);
+  emit_insn (gen_iorv16qi3 (op, l, h));
+
+  return true;
+}
+
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Pattern match
+   extract-even and extract-odd permutations.  */
+
+static bool
+expand_vec_perm_even_odd_1 (struct expand_vec_perm_d *d, unsigned odd)
+{
+  rtx t1, t2, t3, t4;
+
+  switch (d->vmode)
+    {
+    case V4DFmode:
+      t1 = gen_reg_rtx (V4DFmode);
+      t2 = gen_reg_rtx (V4DFmode);
+
+      /* Shuffle the lanes around into { 0 1 4 5 } and { 2 3 6 7 }.  */
+      emit_insn (gen_avx_vperm2f128v4df3 (t1, d->op0, d->op1, GEN_INT (0x20)));
+      emit_insn (gen_avx_vperm2f128v4df3 (t2, d->op0, d->op1, GEN_INT (0x31)));
+
+      /* Now an unpck[lh]pd will produce the result required.  */
+      if (odd)
+	t3 = gen_avx_unpckhpd256 (d->target, t1, t2);
+      else
+	t3 = gen_avx_unpcklpd256 (d->target, t1, t2);
+      emit_insn (t3);
+      break;
+
+    case V8SFmode:
+      {
+	static const unsigned char perm1[8] = { 0, 2, 1, 3, 4, 6, 5, 7 };
+	static const unsigned char perme[8] = { 0, 1,  8,  9, 4, 5, 12, 13 };
+	static const unsigned char permo[8] = { 2, 3, 10, 11, 6, 7, 14, 15 };
+
+	t1 = gen_reg_rtx (V8SFmode);
+	t2 = gen_reg_rtx (V8SFmode);
+	t3 = gen_reg_rtx (V8SFmode);
+	t4 = gen_reg_rtx (V8SFmode);
+
+	/* Shuffle within the 128-bit lanes to produce:
+	   { 0 2 1 3 4 6 5 7 } and { 8 a 9 b c e d f }.  */
+	expand_vselect (t1, d->op0, perm1, 8);
+	expand_vselect (t2, d->op1, perm1, 8);
+
+	/* Shuffle the lanes around to produce:
+	   { 0 2 1 3 8 a 9 b } and { 4 6 5 7 c e d f }.  */
+	emit_insn (gen_avx_vperm2f128v8sf3 (t3, t1, t2, GEN_INT (0x20)));
+	emit_insn (gen_avx_vperm2f128v8sf3 (t4, t1, t2, GEN_INT (0x31)));
+
+	/* Now a vpermil2ps will produce the result required.  */
+	/* ??? The vpermil2ps requires a vector constant.  Another option
+	   is an unpck[lh]ps to merge the two vectors to produce
+	   { 0 4 2 6 8 c a e } or { 1 5 3 7 9 d b f }.  Then use another
+	   vpermilps to get the elements into the final order.  */
+	d->op0 = t3;
+	d->op1 = t4;
+	memcpy (d->perm, odd ? permo: perme, 8);
+	expand_vec_perm_vpermil (d);
+      }
+      break;
+
+    case V2DFmode:
+    case V4SFmode:
+    case V2DImode:
+    case V4SImode:
+      /* These are always directly implementable by expand_vec_perm_1.  */
+      gcc_unreachable ();
+
+    case V8HImode:
+      if (TARGET_SSSE3)
+	return expand_vec_perm_pshufb2 (d);
+      else
+	{
+	  /* We need 2*log2(N)-1 operations to achieve odd/even
+	     with interleave. */
+	  t1 = gen_reg_rtx (V8HImode);
+	  t2 = gen_reg_rtx (V8HImode);
+	  emit_insn (gen_sse2_punpckhwd (t1, d->op0, d->op1));
+	  emit_insn (gen_sse2_punpcklwd (d->target, d->op0, d->op1));
+	  emit_insn (gen_sse2_punpckhwd (t2, d->target, t1));
+	  emit_insn (gen_sse2_punpcklwd (d->target, d->target, t1));
+	  if (odd)
+	    emit_insn (gen_sse2_punpckhwd (d->target, d->target, t2));
+	  else
+	    emit_insn (gen_sse2_punpcklwd (d->target, d->target, t2));
+	}
+      break;
+
+    case V16QImode:
+      if (TARGET_SSSE3)
+	return expand_vec_perm_pshufb2 (d);
+      else
+	{
+	  t1 = gen_reg_rtx (V16QImode);
+	  t2 = gen_reg_rtx (V16QImode);
+	  t3 = gen_reg_rtx (V16QImode);
+	  emit_insn (gen_sse2_punpckhbw (t1, d->op0, d->op1));
+	  emit_insn (gen_sse2_punpcklbw (d->target, d->op0, d->op1));
+	  emit_insn (gen_sse2_punpckhbw (t2, d->target, t1));
+	  emit_insn (gen_sse2_punpcklbw (d->target, d->target, t1));
+	  emit_insn (gen_sse2_punpckhbw (t3, d->target, t2));
+	  emit_insn (gen_sse2_punpcklbw (d->target, d->target, t2));
+	  if (odd)
+	    emit_insn (gen_sse2_punpckhbw (d->target, d->target, t3));
+	  else
+	    emit_insn (gen_sse2_punpcklbw (d->target, d->target, t3));
+	}
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  return true;
+}
+
+static bool
+expand_vec_perm_even_odd (struct expand_vec_perm_d *d)
+{
+  unsigned i, odd, nelt = d->nelt;
+
+  odd = d->perm[0];
+  if (odd != 0 && odd != 1)
+    return false;
+
+  for (i = 1; i < nelt; ++i)
+    if (d->perm[i] != 2 * i + odd)
+      return false;
+
+  return expand_vec_perm_even_odd_1 (d, odd);
+}
+
+/* The guts of ix86_expand_vec_perm_builtin, also used by the ok hook.
+   With all of the interface bits taken care of, perform the expansion
+   in D and return true on success.  */
+
+static bool
+ix86_expand_vec_perm_builtin_1 (struct expand_vec_perm_d *d)
+{
+  /* First things first -- check if the permutation is implementable
+     with a single instruction.  */
+  if (expand_vec_perm_1 (d))
+    return true;
+
+  /* Try sequences of two instructions.  */
+
+  if (expand_vec_perm_pshuflw_pshufhw (d))
+    return true;
+
+  if (expand_vec_perm_palignr (d))
+    return true;
+
+  if (expand_vec_perm_interleave2 (d))
+    return true;
+
+  /* Try sequences of three instructions.  */
+
+  if (expand_vec_perm_pshufb2 (d))
+    return true;
+
+  /* ??? Look for narrow permutations whose element orderings would
+     allow promotion to a wider mode.  */
+
+  /* ??? Look for sequences of interleave or a wider permute that place
+     the data into the correct lanes for a half-vector shuffle like
+     pshuf[lh]w or vpermilps.  */
+
+  /* ??? Look for sequences of interleave that produce the desired results.
+     The combinatorics of punpck[lh] get pretty ugly... */
+
+  if (expand_vec_perm_even_odd (d))
+    return true;
+
+  /* ??? Pattern match broadcast.  */
+
+  return false;
+}
+
+/* Extract the values from the vector CST into the permutation array in D.
+   Return 0 on error, 1 if all values from the permutation come from the
+   first vector, 2 if all values from the second vector, and 3 otherwise.  */
+
+static int
+extract_vec_perm_cst (struct expand_vec_perm_d *d, tree cst)
+{
+  tree list = TREE_VECTOR_CST_ELTS (cst);
+  unsigned i, nelt = d->nelt;
+  int ret = 0;
+
+  for (i = 0; i < nelt; ++i, list = TREE_CHAIN (list))
+    {
+      unsigned HOST_WIDE_INT e;
+
+      if (!host_integerp (TREE_VALUE (list), 1))
+	return 0;
+      e = tree_low_cst (TREE_VALUE (list), 1);
+      if (e >= 2 * nelt)
+	return 0;
+
+      ret |= (e < nelt ? 1 : 2);
+      d->perm[i] = e;
+    }
+  gcc_assert (list == NULL);
+
+  /* If every element is from the second vector, fold onto the first.  */
+  if (ret == 2)
+    for (i = 0; i < nelt; ++i)
+      d->perm[i] -= nelt;
+
+  return ret;
+}
+
+static rtx
+ix86_expand_vec_perm_builtin (tree exp)
+{
+  struct expand_vec_perm_d d;
+  tree arg0, arg1, arg2;
+
+  arg0 = CALL_EXPR_ARG (exp, 0);
+  arg1 = CALL_EXPR_ARG (exp, 1);
+  arg2 = CALL_EXPR_ARG (exp, 2);
+
+  d.vmode = TYPE_MODE (TREE_TYPE (arg0));
+  d.nelt = GET_MODE_NUNITS (d.vmode);
+  d.testing_p = false;
+  gcc_assert (VECTOR_MODE_P (d.vmode));
+
+  if (TREE_CODE (arg2) != VECTOR_CST)
+    {
+      error_at (EXPR_LOCATION (exp),
+		"vector permutation requires vector constant");
+      goto exit_error;
+    }
+
+  switch (extract_vec_perm_cst (&d, arg2))
+    {
+    default:
+      gcc_unreachable ();
+
+    case 0:
+      error_at (EXPR_LOCATION (exp), "invalid vector permutation constant");
+      goto exit_error;
+
+    case 3:
+      if (!operand_equal_p (arg0, arg1, 0))
+	{
+	  d.op0 = expand_expr (arg0, NULL_RTX, d.vmode, EXPAND_NORMAL);
+	  d.op0 = force_reg (d.vmode, d.op0);
+	  d.op1 = expand_expr (arg1, NULL_RTX, d.vmode, EXPAND_NORMAL);
+	  d.op1 = force_reg (d.vmode, d.op1);
+	  break;
+	}
+
+      /* The elements of PERM do not suggest that only the first operand
+	 is used, but both operands are identical.  Allow easier matching
+	 of the permutation by folding the permutation into the single
+	 input vector.  */
+      {
+	unsigned i, nelt = d.nelt;
+	for (i = 0; i < nelt; ++i)
+	  if (d.perm[i] >= nelt)
+	    d.perm[i] -= nelt;
+      }
+      /* FALLTHRU */
+
+    case 1:
+      d.op0 = expand_expr (arg0, NULL_RTX, d.vmode, EXPAND_NORMAL);
+      d.op0 = force_reg (d.vmode, d.op0);
+      d.op1 = d.op0;
+      break;
+
+    case 2:
+      d.op0 = expand_expr (arg1, NULL_RTX, d.vmode, EXPAND_NORMAL);
+      d.op0 = force_reg (d.vmode, d.op0);
+      d.op1 = d.op0;
+      break;
+    }
+
+  d.target = gen_reg_rtx (d.vmode);
+  if (ix86_expand_vec_perm_builtin_1 (&d))
+    return d.target;
+
+  /* For compiler-generated permutations we should never get here, because
+     the compiler should also be checking the ok hook.  But since this is a
+     builtin that the user has access to, don't abort.  */
+  switch (d.nelt)
+    {
+    case 2:
+      sorry ("vector permutation (%d %d)", d.perm[0], d.perm[1]);
+      break;
+    case 4:
+      sorry ("vector permutation (%d %d %d %d)",
+	     d.perm[0], d.perm[1], d.perm[2], d.perm[3]);
+      break;
+    case 8:
+      sorry ("vector permutation (%d %d %d %d %d %d %d %d)",
+	     d.perm[0], d.perm[1], d.perm[2], d.perm[3],
+	     d.perm[4], d.perm[5], d.perm[6], d.perm[7]);
+      break;
+    case 16:
+      sorry ("vector permutation "
+	     "(%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d)",
+	     d.perm[0], d.perm[1], d.perm[2], d.perm[3],
+	     d.perm[4], d.perm[5], d.perm[6], d.perm[7],
+	     d.perm[8], d.perm[9], d.perm[10], d.perm[11],
+	     d.perm[12], d.perm[13], d.perm[14], d.perm[15]);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+ exit_error:
+  return CONST0_RTX (d.vmode);
+}
+
+/* Implement targetm.vectorize.builtin_vec_perm_ok.  */
+
+static bool
+ix86_vectorize_builtin_vec_perm_ok (tree vec_type, tree mask)
+{
+  struct expand_vec_perm_d d;
+  int vec_mask;
+  bool ret, one_vec;
+
+  d.vmode = TYPE_MODE (vec_type);
+  d.nelt = GET_MODE_NUNITS (d.vmode);
+  d.testing_p = true;
+
+  /* Given sufficient ISA support we can just return true here
+     for selected vector modes.  */
+  if (GET_MODE_SIZE (d.vmode) == 16)
+    {
+      /* All implementable with a single vpperm insn.  */
+      if (TARGET_XOP)
+	return true;
+      /* All implementable with 2 pshufb + 1 ior.  */
+      if (TARGET_SSSE3)
+	return true;
+      /* All implementable with shufpd or unpck[lh]pd.  */
+      if (d.nelt == 2)
+	return true;
+    }
+
+  vec_mask = extract_vec_perm_cst (&d, mask);
+
+  /* This hook cannot be called in response to something that the
+     user does (unlike the builtin expander) so we shouldn't ever see
+     an error generated from the extract.  */
+  gcc_assert (vec_mask > 0 && vec_mask <= 3);
+  one_vec = (vec_mask != 3);
+
+  /* Implementable with shufps or pshufd.  */
+  if (one_vec && (d.vmode == V4SFmode || d.vmode == V4SImode))
+    return true;
+
+  /* Otherwise we have to go through the motions and see if we can
+     figure out how to generate the requested permutation.  */
+  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
+  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
+  if (!one_vec)
+    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
+
+  start_sequence ();
+  ret = ix86_expand_vec_perm_builtin_1 (&d);
+  end_sequence ();
+
+  return ret;
+}
+
+void
+ix86_expand_vec_extract_even_odd (rtx targ, rtx op0, rtx op1, unsigned odd)
+{
+  struct expand_vec_perm_d d;
+  unsigned i, nelt;
+
+  d.target = targ;
+  d.op0 = op0;
+  d.op1 = op1;
+  d.vmode = GET_MODE (targ);
+  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
+  d.testing_p = false;
+
+  for (i = 0; i < nelt; ++i)
+    d.perm[i] = i * 2 + odd;
+
+  /* We'll either be able to implement the permutation directly...  */
+  if (expand_vec_perm_1 (&d))
+    return;
+
+  /* ... or we use the special-case patterns.  */
+  expand_vec_perm_even_odd_1 (&d, odd);
+}
+\f
 /* This function returns the calling abi specific va_list type node.
    It returns  the FNDECL specific va_list type.  */
 
@@ -29254,7 +30373,14 @@ ix86_enum_va_list (int idx, const char **pname, tree *ptree)
 #define TARGET_SECONDARY_RELOAD ix86_secondary_reload
 
 #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
-#define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST x86_builtin_vectorization_cost
+#define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
+  ix86_builtin_vectorization_cost
+#undef TARGET_VECTORIZE_BUILTIN_VEC_PERM
+#define TARGET_VECTORIZE_BUILTIN_VEC_PERM \
+  ix86_vectorize_builtin_vec_perm
+#undef TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK
+#define TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK \
+  ix86_vectorize_builtin_vec_perm_ok
 
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8661b4a..b4bcc5f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -71,6 +71,14 @@
 (define_mode_iterator SSEMODE124C8 [V16QI V8HI V4SI
 				    (V2DI "TARGET_SSE4_2")])
 
+;; Modes handled by the vec_extract_even/odd patterns.
+(define_mode_iterator SSEMODE_EO
+  [(V4SF "TARGET_SSE")
+   (V2DF "TARGET_SSE2")
+   (V2DI "TARGET_SSE2") (V4SI "TARGET_SSE2")
+   (V8HI "TARGET_SSE2") (V16QI "TARGET_SSE2")
+   (V4DF "TARGET_AVX") (V8SF "TARGET_AVX")])
+
 ;; Mapping from float mode to required SSE level
 (define_mode_attr sse [(SF "sse") (DF "sse2") (V4SF "sse") (V2DF "sse2")])
 
@@ -4693,48 +4701,24 @@
 })
 
 (define_expand "vec_extract_even<mode>"
-  [(set (match_operand:SSEMODE4S 0 "register_operand" "")
-	(vec_select:SSEMODE4S
-	  (vec_concat:<ssedoublesizemode>
-	    (match_operand:SSEMODE4S 1 "register_operand" "")
-	    (match_operand:SSEMODE4S 2 "nonimmediate_operand" ""))
-	  (parallel [(const_int 0)
-		     (const_int 2)
-		     (const_int 4)
-		     (const_int 6)])))]
-  "TARGET_SSE")
-
-(define_expand "vec_extract_odd<mode>"
-  [(set (match_operand:SSEMODE4S 0 "register_operand" "")
-	(vec_select:SSEMODE4S
-	  (vec_concat:<ssedoublesizemode>
-	    (match_operand:SSEMODE4S 1 "register_operand" "")
-	    (match_operand:SSEMODE4S 2 "nonimmediate_operand" ""))
-	  (parallel [(const_int 1)
-		     (const_int 3)
-		     (const_int 5)
-		     (const_int 7)])))]
-  "TARGET_SSE")
-
-(define_expand "vec_extract_even<mode>"
-  [(set (match_operand:SSEMODE2D 0 "register_operand" "")
-	(vec_select:SSEMODE2D
-	  (vec_concat:<ssedoublesizemode>
-	    (match_operand:SSEMODE2D 1 "register_operand" "")
-	    (match_operand:SSEMODE2D 2 "nonimmediate_operand" ""))
-	  (parallel [(const_int 0)
-	  	     (const_int 2)])))]
-  "TARGET_SSE2")
+  [(match_operand:SSEMODE_EO 0 "register_operand" "")
+   (match_operand:SSEMODE_EO 1 "register_operand" "")
+   (match_operand:SSEMODE_EO 2 "register_operand" "")]
+  ""
+{
+  ix86_expand_vec_extract_even_odd (operands[0], operands[1], operands[2], 0);
+  DONE;
+})
 
 (define_expand "vec_extract_odd<mode>"
-  [(set (match_operand:SSEMODE2D 0 "register_operand" "")
-	(vec_select:SSEMODE2D
-	  (vec_concat:<ssedoublesizemode>
-	    (match_operand:SSEMODE2D 1 "register_operand" "")
-	    (match_operand:SSEMODE2D 2 "nonimmediate_operand" ""))
-	  (parallel [(const_int 1)
-	  	     (const_int 3)])))]
-  "TARGET_SSE2")
+  [(match_operand:SSEMODE_EO 0 "register_operand" "")
+   (match_operand:SSEMODE_EO 1 "register_operand" "")
+   (match_operand:SSEMODE_EO 2 "register_operand" "")]
+  ""
+{
+  ix86_expand_vec_extract_even_odd (operands[0], operands[1], operands[2], 1);
+  DONE;
+})
 
 ;; punpcklqdq and punpckhqdq are shorter than shufpd.
 (define_insn "*avx_punpckhqdq"
@@ -5243,20 +5227,16 @@
    (set_attr "prefix_data16" "1")
    (set_attr "mode" "TI")])
 
-(define_insn_and_split "mulv16qi3"
+(define_expand "mulv16qi3"
   [(set (match_operand:V16QI 0 "register_operand" "")
 	(mult:V16QI (match_operand:V16QI 1 "register_operand" "")
 		    (match_operand:V16QI 2 "register_operand" "")))]
-  "TARGET_SSE2
-   && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
+  "TARGET_SSE2"
 {
-  rtx t[12];
+  rtx t[6];
   int i;
 
-  for (i = 0; i < 12; ++i)
+  for (i = 0; i < 6; ++i)
     t[i] = gen_reg_rtx (V16QImode);
 
   /* Unpack data such that we've got a source byte in each low byte of
@@ -5278,15 +5258,8 @@
 			   gen_lowpart (V8HImode, t[2]),
 			   gen_lowpart (V8HImode, t[3])));
 
-  /* Extract the relevant bytes and merge them back together.  */
-  emit_insn (gen_sse2_punpckhbw (t[6], t[5], t[4]));	/* ..AI..BJ..CK..DL */
-  emit_insn (gen_sse2_punpcklbw (t[7], t[5], t[4]));	/* ..EM..FN..GO..HP */
-  emit_insn (gen_sse2_punpckhbw (t[8], t[7], t[6]));	/* ....AEIM....BFJN */
-  emit_insn (gen_sse2_punpcklbw (t[9], t[7], t[6]));	/* ....CGKO....DHLP */
-  emit_insn (gen_sse2_punpckhbw (t[10], t[9], t[8]));	/* ........ACEGIKMO */
-  emit_insn (gen_sse2_punpcklbw (t[11], t[9], t[8]));	/* ........BDFHJLNP */
-
-  emit_insn (gen_sse2_punpcklbw (operands[0], t[11], t[10]));	/* ABCDEFGHIJKLMNOP */
+  /* Extract the even bytes and merge them back together.  */
+  ix86_expand_vec_extract_even_odd (operands[0], t[5], t[4], 0);
   DONE;
 })
 
@@ -6578,96 +6551,39 @@
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
-;; Reduce:
-;;      op1 = abcdefghijklmnop
-;;      op2 = qrstuvwxyz012345
-;;       h1 = aqbrcsdteufvgwhx
-;;       l1 = iyjzk0l1m2n3o4p5
-;;       h2 = aiqybjrzcks0dlt1
-;;       l2 = emu2fnv3gow4hpx5
-;;       h3 = aeimquy2bfjnrvz3
-;;       l3 = cgkosw04dhlptx15
-;;   result = bdfhjlnprtvxz135
 (define_expand "vec_pack_trunc_v8hi"
   [(match_operand:V16QI 0 "register_operand" "")
    (match_operand:V8HI 1 "register_operand" "")
    (match_operand:V8HI 2 "register_operand" "")]
   "TARGET_SSE2"
 {
-  rtx op1, op2, h1, l1, h2, l2, h3, l3;
-
-  op1 = gen_lowpart (V16QImode, operands[1]);
-  op2 = gen_lowpart (V16QImode, operands[2]);
-  h1 = gen_reg_rtx (V16QImode);
-  l1 = gen_reg_rtx (V16QImode);
-  h2 = gen_reg_rtx (V16QImode);
-  l2 = gen_reg_rtx (V16QImode);
-  h3 = gen_reg_rtx (V16QImode);
-  l3 = gen_reg_rtx (V16QImode);
-
-  emit_insn (gen_vec_interleave_highv16qi (h1, op1, op2));
-  emit_insn (gen_vec_interleave_lowv16qi (l1, op1, op2));
-  emit_insn (gen_vec_interleave_highv16qi (h2, l1, h1));
-  emit_insn (gen_vec_interleave_lowv16qi (l2, l1, h1));
-  emit_insn (gen_vec_interleave_highv16qi (h3, l2, h2));
-  emit_insn (gen_vec_interleave_lowv16qi (l3, l2, h2));
-  emit_insn (gen_vec_interleave_lowv16qi (operands[0], l3, h3));
+  rtx op1 = gen_lowpart (V16QImode, operands[1]);
+  rtx op2 = gen_lowpart (V16QImode, operands[2]);
+  ix86_expand_vec_extract_even_odd (operands[0], op1, op2, 0);
   DONE;
 })
 
-;; Reduce:
-;;      op1 = abcdefgh
-;;      op2 = ijklmnop
-;;       h1 = aibjckdl
-;;       l1 = emfngohp
-;;       h2 = aeimbfjn
-;;       l2 = cgkodhlp
-;;   result = bdfhjlnp
 (define_expand "vec_pack_trunc_v4si"
   [(match_operand:V8HI 0 "register_operand" "")
    (match_operand:V4SI 1 "register_operand" "")
    (match_operand:V4SI 2 "register_operand" "")]
   "TARGET_SSE2"
 {
-  rtx op1, op2, h1, l1, h2, l2;
-
-  op1 = gen_lowpart (V8HImode, operands[1]);
-  op2 = gen_lowpart (V8HImode, operands[2]);
-  h1 = gen_reg_rtx (V8HImode);
-  l1 = gen_reg_rtx (V8HImode);
-  h2 = gen_reg_rtx (V8HImode);
-  l2 = gen_reg_rtx (V8HImode);
-
-  emit_insn (gen_vec_interleave_highv8hi (h1, op1, op2));
-  emit_insn (gen_vec_interleave_lowv8hi (l1, op1, op2));
-  emit_insn (gen_vec_interleave_highv8hi (h2, l1, h1));
-  emit_insn (gen_vec_interleave_lowv8hi (l2, l1, h1));
-  emit_insn (gen_vec_interleave_lowv8hi (operands[0], l2, h2));
+  rtx op1 = gen_lowpart (V8HImode, operands[1]);
+  rtx op2 = gen_lowpart (V8HImode, operands[2]);
+  ix86_expand_vec_extract_even_odd (operands[0], op1, op2, 0);
   DONE;
 })
 
-;; Reduce:
-;;     op1 = abcd
-;;     op2 = efgh
-;;      h1 = aebf
-;;      l1 = cgdh
-;;  result = bdfh
 (define_expand "vec_pack_trunc_v2di"
   [(match_operand:V4SI 0 "register_operand" "")
    (match_operand:V2DI 1 "register_operand" "")
    (match_operand:V2DI 2 "register_operand" "")]
   "TARGET_SSE2"
 {
-  rtx op1, op2, h1, l1;
-
-  op1 = gen_lowpart (V4SImode, operands[1]);
-  op2 = gen_lowpart (V4SImode, operands[2]);
-  h1 = gen_reg_rtx (V4SImode);
-  l1 = gen_reg_rtx (V4SImode);
-
-  emit_insn (gen_vec_interleave_highv4si (h1, op1, op2));
-  emit_insn (gen_vec_interleave_lowv4si (l1, op1, op2));
-  emit_insn (gen_vec_interleave_lowv4si (operands[0], l1, h1));
+  rtx op1 = gen_lowpart (V4SImode, operands[1]);
+  rtx op2 = gen_lowpart (V4SImode, operands[2]);
+  ix86_expand_vec_extract_even_odd (operands[0], op1, op2, 0);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
index 327045e..182ad49 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-21.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
@@ -200,8 +200,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target vect_strided } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target  { ! { vect_strided } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target { vect_strided || vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target  { ! { vect_strided || vect_extract_even_odd } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided }  } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/i386/isa-check.h b/gcc/testsuite/gcc.target/i386/isa-check.h
new file mode 100644
index 0000000..8ddbf4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/isa-check.h
@@ -0,0 +1,85 @@
+#include "cpuid.h"
+
+extern void exit (int) __attribute__((noreturn));
+
+/* Determine what instruction set we've been compiled for,
+   and check that the CPU we're running on supports it.  */
+static void __attribute__((constructor))
+check_isa (void)
+{
+  int a, b, c, d;
+  int c1, d1, c1e, d1e;
+
+  c1 = d1 = c1e = d1e = 0;
+
+#ifdef __MMX__
+  d1 |= bit_MMX;
+#endif
+#ifdef __3dNOW__
+  d1e |= bit_3DNOW;
+#endif
+#ifdef __3dNOW_A__
+  d1e |= bit_3DNOWP;
+#endif
+#ifdef __SSE__
+  d1 |= bit_SSE;
+#endif
+#ifdef __SSE2__
+  d1 |= bit_SSE2;
+#endif
+#ifdef __SSE3__
+  c1 |= bit_SSE3;
+#endif
+#ifdef __SSSE3__
+  c1 |= bit_SSSE3;
+#endif
+#ifdef __SSE4_1__
+  c1 |= bit_SSE4_1;
+#endif
+#ifdef __SSE4_2__
+  c1 |= bit_SSE4_2;
+#endif
+#ifdef __AES__
+  c1 |= bit_AES;
+#endif
+#ifdef __PCLMUL__
+  c1 |= bit_PCLMUL;
+#endif
+#ifdef __AVX__
+  c1 |= bit_AVX;
+#endif
+#ifdef __FMA__
+  c1 |= bit_FMA;
+#endif
+#ifdef __SSE4A__
+  c1e |= bit_SSE4a;
+#endif
+#ifdef __FMA4__
+  c1e |= bit_FMA4;
+#endif
+#ifdef __XOP__
+  c1e |= bit_XOP;
+#endif
+#ifdef __LWP__
+  c1e |= bit_LWP;
+#endif
+
+  if (c1 | d1)
+    {
+      if (!__get_cpuid (1, &a, &b, &c, &d))
+	goto fail;
+      if ((c & c1) != c1 || (d & d1) != d1)
+	goto fail;
+    }
+  if (c1e | d1e)
+    {
+      if (!__get_cpuid (0x80000001, &a, &b, &c, &d))
+	goto fail;
+      if ((c & c1e) != c1e || (d & d1e) != d1e)
+	goto fail;
+    }
+  return;
+
+ fail:
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.target/i386/vperm-2-2.inc b/gcc/testsuite/gcc.target/i386/vperm-2-2.inc
new file mode 100644
index 0000000..ef66f68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vperm-2-2.inc
@@ -0,0 +1,27 @@
+/* This file auto-generated with ./vperm.pl 2 2.  */
+
+void check0(void)
+{
+  TEST (0, 0)
+  TEST (1, 0)
+  TEST (2, 0)
+  TEST (3, 0)
+  TEST (0, 1)
+  TEST (1, 1)
+  TEST (2, 1)
+  TEST (3, 1)
+  TEST (0, 2)
+  TEST (1, 2)
+  TEST (2, 2)
+  TEST (3, 2)
+  TEST (0, 3)
+  TEST (1, 3)
+  TEST (2, 3)
+  TEST (3, 3)
+}
+
+void check(void)
+{
+  check0 ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/vperm-4-1.inc b/gcc/testsuite/gcc.target/i386/vperm-4-1.inc
new file mode 100644
index 0000000..c04f185
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vperm-4-1.inc
@@ -0,0 +1,272 @@
+/* This file auto-generated with ./vperm.pl 4 1.  */
+
+void check0(void)
+{
+  TEST (0, 0, 0, 0)
+  TEST (1, 0, 0, 0)
+  TEST (2, 0, 0, 0)
+  TEST (3, 0, 0, 0)
+  TEST (0, 1, 0, 0)
+  TEST (1, 1, 0, 0)
+  TEST (2, 1, 0, 0)
+  TEST (3, 1, 0, 0)
+  TEST (0, 2, 0, 0)
+  TEST (1, 2, 0, 0)
+  TEST (2, 2, 0, 0)
+  TEST (3, 2, 0, 0)
+  TEST (0, 3, 0, 0)
+  TEST (1, 3, 0, 0)
+  TEST (2, 3, 0, 0)
+  TEST (3, 3, 0, 0)
+  TEST (0, 0, 1, 0)
+  TEST (1, 0, 1, 0)
+  TEST (2, 0, 1, 0)
+  TEST (3, 0, 1, 0)
+  TEST (0, 1, 1, 0)
+  TEST (1, 1, 1, 0)
+  TEST (2, 1, 1, 0)
+  TEST (3, 1, 1, 0)
+  TEST (0, 2, 1, 0)
+  TEST (1, 2, 1, 0)
+  TEST (2, 2, 1, 0)
+  TEST (3, 2, 1, 0)
+  TEST (0, 3, 1, 0)
+  TEST (1, 3, 1, 0)
+  TEST (2, 3, 1, 0)
+  TEST (3, 3, 1, 0)
+  TEST (0, 0, 2, 0)
+  TEST (1, 0, 2, 0)
+  TEST (2, 0, 2, 0)
+  TEST (3, 0, 2, 0)
+  TEST (0, 1, 2, 0)
+  TEST (1, 1, 2, 0)
+  TEST (2, 1, 2, 0)
+  TEST (3, 1, 2, 0)
+  TEST (0, 2, 2, 0)
+  TEST (1, 2, 2, 0)
+  TEST (2, 2, 2, 0)
+  TEST (3, 2, 2, 0)
+  TEST (0, 3, 2, 0)
+  TEST (1, 3, 2, 0)
+  TEST (2, 3, 2, 0)
+  TEST (3, 3, 2, 0)
+  TEST (0, 0, 3, 0)
+  TEST (1, 0, 3, 0)
+  TEST (2, 0, 3, 0)
+  TEST (3, 0, 3, 0)
+  TEST (0, 1, 3, 0)
+  TEST (1, 1, 3, 0)
+  TEST (2, 1, 3, 0)
+  TEST (3, 1, 3, 0)
+  TEST (0, 2, 3, 0)
+  TEST (1, 2, 3, 0)
+  TEST (2, 2, 3, 0)
+  TEST (3, 2, 3, 0)
+  TEST (0, 3, 3, 0)
+  TEST (1, 3, 3, 0)
+  TEST (2, 3, 3, 0)
+  TEST (3, 3, 3, 0)
+  TEST (0, 0, 0, 1)
+  TEST (1, 0, 0, 1)
+  TEST (2, 0, 0, 1)
+  TEST (3, 0, 0, 1)
+  TEST (0, 1, 0, 1)
+  TEST (1, 1, 0, 1)
+  TEST (2, 1, 0, 1)
+  TEST (3, 1, 0, 1)
+  TEST (0, 2, 0, 1)
+  TEST (1, 2, 0, 1)
+  TEST (2, 2, 0, 1)
+  TEST (3, 2, 0, 1)
+  TEST (0, 3, 0, 1)
+  TEST (1, 3, 0, 1)
+  TEST (2, 3, 0, 1)
+  TEST (3, 3, 0, 1)
+  TEST (0, 0, 1, 1)
+  TEST (1, 0, 1, 1)
+  TEST (2, 0, 1, 1)
+  TEST (3, 0, 1, 1)
+  TEST (0, 1, 1, 1)
+  TEST (1, 1, 1, 1)
+  TEST (2, 1, 1, 1)
+  TEST (3, 1, 1, 1)
+  TEST (0, 2, 1, 1)
+  TEST (1, 2, 1, 1)
+  TEST (2, 2, 1, 1)
+  TEST (3, 2, 1, 1)
+  TEST (0, 3, 1, 1)
+  TEST (1, 3, 1, 1)
+  TEST (2, 3, 1, 1)
+  TEST (3, 3, 1, 1)
+  TEST (0, 0, 2, 1)
+  TEST (1, 0, 2, 1)
+  TEST (2, 0, 2, 1)
+  TEST (3, 0, 2, 1)
+  TEST (0, 1, 2, 1)
+  TEST (1, 1, 2, 1)
+  TEST (2, 1, 2, 1)
+  TEST (3, 1, 2, 1)
+  TEST (0, 2, 2, 1)
+  TEST (1, 2, 2, 1)
+  TEST (2, 2, 2, 1)
+  TEST (3, 2, 2, 1)
+  TEST (0, 3, 2, 1)
+  TEST (1, 3, 2, 1)
+  TEST (2, 3, 2, 1)
+  TEST (3, 3, 2, 1)
+  TEST (0, 0, 3, 1)
+  TEST (1, 0, 3, 1)
+  TEST (2, 0, 3, 1)
+  TEST (3, 0, 3, 1)
+  TEST (0, 1, 3, 1)
+  TEST (1, 1, 3, 1)
+  TEST (2, 1, 3, 1)
+  TEST (3, 1, 3, 1)
+  TEST (0, 2, 3, 1)
+  TEST (1, 2, 3, 1)
+  TEST (2, 2, 3, 1)
+  TEST (3, 2, 3, 1)
+  TEST (0, 3, 3, 1)
+  TEST (1, 3, 3, 1)
+  TEST (2, 3, 3, 1)
+  TEST (3, 3, 3, 1)
+}
+
+void check1(void)
+{
+  TEST (0, 0, 0, 2)
+  TEST (1, 0, 0, 2)
+  TEST (2, 0, 0, 2)
+  TEST (3, 0, 0, 2)
+  TEST (0, 1, 0, 2)
+  TEST (1, 1, 0, 2)
+  TEST (2, 1, 0, 2)
+  TEST (3, 1, 0, 2)
+  TEST (0, 2, 0, 2)
+  TEST (1, 2, 0, 2)
+  TEST (2, 2, 0, 2)
+  TEST (3, 2, 0, 2)
+  TEST (0, 3, 0, 2)
+  TEST (1, 3, 0, 2)
+  TEST (2, 3, 0, 2)
+  TEST (3, 3, 0, 2)
+  TEST (0, 0, 1, 2)
+  TEST (1, 0, 1, 2)
+  TEST (2, 0, 1, 2)
+  TEST (3, 0, 1, 2)
+  TEST (0, 1, 1, 2)
+  TEST (1, 1, 1, 2)
+  TEST (2, 1, 1, 2)
+  TEST (3, 1, 1, 2)
+  TEST (0, 2, 1, 2)
+  TEST (1, 2, 1, 2)
+  TEST (2, 2, 1, 2)
+  TEST (3, 2, 1, 2)
+  TEST (0, 3, 1, 2)
+  TEST (1, 3, 1, 2)
+  TEST (2, 3, 1, 2)
+  TEST (3, 3, 1, 2)
+  TEST (0, 0, 2, 2)
+  TEST (1, 0, 2, 2)
+  TEST (2, 0, 2, 2)
+  TEST (3, 0, 2, 2)
+  TEST (0, 1, 2, 2)
+  TEST (1, 1, 2, 2)
+  TEST (2, 1, 2, 2)
+  TEST (3, 1, 2, 2)
+  TEST (0, 2, 2, 2)
+  TEST (1, 2, 2, 2)
+  TEST (2, 2, 2, 2)
+  TEST (3, 2, 2, 2)
+  TEST (0, 3, 2, 2)
+  TEST (1, 3, 2, 2)
+  TEST (2, 3, 2, 2)
+  TEST (3, 3, 2, 2)
+  TEST (0, 0, 3, 2)
+  TEST (1, 0, 3, 2)
+  TEST (2, 0, 3, 2)
+  TEST (3, 0, 3, 2)
+  TEST (0, 1, 3, 2)
+  TEST (1, 1, 3, 2)
+  TEST (2, 1, 3, 2)
+  TEST (3, 1, 3, 2)
+  TEST (0, 2, 3, 2)
+  TEST (1, 2, 3, 2)
+  TEST (2, 2, 3, 2)
+  TEST (3, 2, 3, 2)
+  TEST (0, 3, 3, 2)
+  TEST (1, 3, 3, 2)
+  TEST (2, 3, 3, 2)
+  TEST (3, 3, 3, 2)
+  TEST (0, 0, 0, 3)
+  TEST (1, 0, 0, 3)
+  TEST (2, 0, 0, 3)
+  TEST (3, 0, 0, 3)
+  TEST (0, 1, 0, 3)
+  TEST (1, 1, 0, 3)
+  TEST (2, 1, 0, 3)
+  TEST (3, 1, 0, 3)
+  TEST (0, 2, 0, 3)
+  TEST (1, 2, 0, 3)
+  TEST (2, 2, 0, 3)
+  TEST (3, 2, 0, 3)
+  TEST (0, 3, 0, 3)
+  TEST (1, 3, 0, 3)
+  TEST (2, 3, 0, 3)
+  TEST (3, 3, 0, 3)
+  TEST (0, 0, 1, 3)
+  TEST (1, 0, 1, 3)
+  TEST (2, 0, 1, 3)
+  TEST (3, 0, 1, 3)
+  TEST (0, 1, 1, 3)
+  TEST (1, 1, 1, 3)
+  TEST (2, 1, 1, 3)
+  TEST (3, 1, 1, 3)
+  TEST (0, 2, 1, 3)
+  TEST (1, 2, 1, 3)
+  TEST (2, 2, 1, 3)
+  TEST (3, 2, 1, 3)
+  TEST (0, 3, 1, 3)
+  TEST (1, 3, 1, 3)
+  TEST (2, 3, 1, 3)
+  TEST (3, 3, 1, 3)
+  TEST (0, 0, 2, 3)
+  TEST (1, 0, 2, 3)
+  TEST (2, 0, 2, 3)
+  TEST (3, 0, 2, 3)
+  TEST (0, 1, 2, 3)
+  TEST (1, 1, 2, 3)
+  TEST (2, 1, 2, 3)
+  TEST (3, 1, 2, 3)
+  TEST (0, 2, 2, 3)
+  TEST (1, 2, 2, 3)
+  TEST (2, 2, 2, 3)
+  TEST (3, 2, 2, 3)
+  TEST (0, 3, 2, 3)
+  TEST (1, 3, 2, 3)
+  TEST (2, 3, 2, 3)
+  TEST (3, 3, 2, 3)
+  TEST (0, 0, 3, 3)
+  TEST (1, 0, 3, 3)
+  TEST (2, 0, 3, 3)
+  TEST (3, 0, 3, 3)
+  TEST (0, 1, 3, 3)
+  TEST (1, 1, 3, 3)
+  TEST (2, 1, 3, 3)
+  TEST (3, 1, 3, 3)
+  TEST (0, 2, 3, 3)
+  TEST (1, 2, 3, 3)
+  TEST (2, 2, 3, 3)
+  TEST (3, 2, 3, 3)
+  TEST (0, 3, 3, 3)
+  TEST (1, 3, 3, 3)
+  TEST (2, 3, 3, 3)
+  TEST (3, 3, 3, 3)
+}
+
+void check(void)
+{
+  check0 ();
+  check1 ();
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/vperm-4-2.inc b/gcc/testsuite/gcc.target/i386/vperm-4-2.inc
new file mode 100644
index 0000000..2f7baa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vperm-4-2.inc
@@ -0,0 +1,4262 @@
+/* This file auto-generated with ./vperm.pl 4 2.  */
+
+void check0(void)
+{
+  TEST (0, 0, 0, 0)
+  TEST (1, 0, 0, 0)
+  TEST (2, 0, 0, 0)
+  TEST (3, 0, 0, 0)
+  TEST (4, 0, 0, 0)
+  TEST (5, 0, 0, 0)
+  TEST (6, 0, 0, 0)
+  TEST (7, 0, 0, 0)
+  TEST (0, 1, 0, 0)
+  TEST (1, 1, 0, 0)
+  TEST (2, 1, 0, 0)
+  TEST (3, 1, 0, 0)
+  TEST (4, 1, 0, 0)
+  TEST (5, 1, 0, 0)
+  TEST (6, 1, 0, 0)
+  TEST (7, 1, 0, 0)
+  TEST (0, 2, 0, 0)
+  TEST (1, 2, 0, 0)
+  TEST (2, 2, 0, 0)
+  TEST (3, 2, 0, 0)
+  TEST (4, 2, 0, 0)
+  TEST (5, 2, 0, 0)
+  TEST (6, 2, 0, 0)
+  TEST (7, 2, 0, 0)
+  TEST (0, 3, 0, 0)
+  TEST (1, 3, 0, 0)
+  TEST (2, 3, 0, 0)
+  TEST (3, 3, 0, 0)
+  TEST (4, 3, 0, 0)
+  TEST (5, 3, 0, 0)
+  TEST (6, 3, 0, 0)
+  TEST (7, 3, 0, 0)
+  TEST (0, 4, 0, 0)
+  TEST (1, 4, 0, 0)
+  TEST (2, 4, 0, 0)
+  TEST (3, 4, 0, 0)
+  TEST (4, 4, 0, 0)
+  TEST (5, 4, 0, 0)
+  TEST (6, 4, 0, 0)
+  TEST (7, 4, 0, 0)
+  TEST (0, 5, 0, 0)
+  TEST (1, 5, 0, 0)
+  TEST (2, 5, 0, 0)
+  TEST (3, 5, 0, 0)
+  TEST (4, 5, 0, 0)
+  TEST (5, 5, 0, 0)
+  TEST (6, 5, 0, 0)
+  TEST (7, 5, 0, 0)
+  TEST (0, 6, 0, 0)
+  TEST (1, 6, 0, 0)
+  TEST (2, 6, 0, 0)
+  TEST (3, 6, 0, 0)
+  TEST (4, 6, 0, 0)
+  TEST (5, 6, 0, 0)
+  TEST (6, 6, 0, 0)
+  TEST (7, 6, 0, 0)
+  TEST (0, 7, 0, 0)
+  TEST (1, 7, 0, 0)
+  TEST (2, 7, 0, 0)
+  TEST (3, 7, 0, 0)
+  TEST (4, 7, 0, 0)
+  TEST (5, 7, 0, 0)
+  TEST (6, 7, 0, 0)
+  TEST (7, 7, 0, 0)
+  TEST (0, 0, 1, 0)
+  TEST (1, 0, 1, 0)
+  TEST (2, 0, 1, 0)
+  TEST (3, 0, 1, 0)
+  TEST (4, 0, 1, 0)
+  TEST (5, 0, 1, 0)
+  TEST (6, 0, 1, 0)
+  TEST (7, 0, 1, 0)
+  TEST (0, 1, 1, 0)
+  TEST (1, 1, 1, 0)
+  TEST (2, 1, 1, 0)
+  TEST (3, 1, 1, 0)
+  TEST (4, 1, 1, 0)
+  TEST (5, 1, 1, 0)
+  TEST (6, 1, 1, 0)
+  TEST (7, 1, 1, 0)
+  TEST (0, 2, 1, 0)
+  TEST (1, 2, 1, 0)
+  TEST (2, 2, 1, 0)
+  TEST (3, 2, 1, 0)
+  TEST (4, 2, 1, 0)
+  TEST (5, 2, 1, 0)
+  TEST (6, 2, 1, 0)
+  TEST (7, 2, 1, 0)
+  TEST (0, 3, 1, 0)
+  TEST (1, 3, 1, 0)
+  TEST (2, 3, 1, 0)
+  TEST (3, 3, 1, 0)
+  TEST (4, 3, 1, 0)
+  TEST (5, 3, 1, 0)
+  TEST (6, 3, 1, 0)
+  TEST (7, 3, 1, 0)
+  TEST (0, 4, 1, 0)
+  TEST (1, 4, 1, 0)
+  TEST (2, 4, 1, 0)
+  TEST (3, 4, 1, 0)
+  TEST (4, 4, 1, 0)
+  TEST (5, 4, 1, 0)
+  TEST (6, 4, 1, 0)
+  TEST (7, 4, 1, 0)
+  TEST (0, 5, 1, 0)
+  TEST (1, 5, 1, 0)
+  TEST (2, 5, 1, 0)
+  TEST (3, 5, 1, 0)
+  TEST (4, 5, 1, 0)
+  TEST (5, 5, 1, 0)
+  TEST (6, 5, 1, 0)
+  TEST (7, 5, 1, 0)
+  TEST (0, 6, 1, 0)
+  TEST (1, 6, 1, 0)
+  TEST (2, 6, 1, 0)
+  TEST (3, 6, 1, 0)
+  TEST (4, 6, 1, 0)
+  TEST (5, 6, 1, 0)
+  TEST (6, 6, 1, 0)
+  TEST (7, 6, 1, 0)
+  TEST (0, 7, 1, 0)
+  TEST (1, 7, 1, 0)
+  TEST (2, 7, 1, 0)
+  TEST (3, 7, 1, 0)
+  TEST (4, 7, 1, 0)
+  TEST (5, 7, 1, 0)
+  TEST (6, 7, 1, 0)
+  TEST (7, 7, 1, 0)
+}
+
+void check1(void)
+{
+  TEST (0, 0, 2, 0)
+  TEST (1, 0, 2, 0)
+  TEST (2, 0, 2, 0)
+  TEST (3, 0, 2, 0)
+  TEST (4, 0, 2, 0)
+  TEST (5, 0, 2, 0)
+  TEST (6, 0, 2, 0)
+  TEST (7, 0, 2, 0)
+  TEST (0, 1, 2, 0)
+  TEST (1, 1, 2, 0)
+  TEST (2, 1, 2, 0)
+  TEST (3, 1, 2, 0)
+  TEST (4, 1, 2, 0)
+  TEST (5, 1, 2, 0)
+  TEST (6, 1, 2, 0)
+  TEST (7, 1, 2, 0)
+  TEST (0, 2, 2, 0)
+  TEST (1, 2, 2, 0)
+  TEST (2, 2, 2, 0)
+  TEST (3, 2, 2, 0)
+  TEST (4, 2, 2, 0)
+  TEST (5, 2, 2, 0)
+  TEST (6, 2, 2, 0)
+  TEST (7, 2, 2, 0)
+  TEST (0, 3, 2, 0)
+  TEST (1, 3, 2, 0)
+  TEST (2, 3, 2, 0)
+  TEST (3, 3, 2, 0)
+  TEST (4, 3, 2, 0)
+  TEST (5, 3, 2, 0)
+  TEST (6, 3, 2, 0)
+  TEST (7, 3, 2, 0)
+  TEST (0, 4, 2, 0)
+  TEST (1, 4, 2, 0)
+  TEST (2, 4, 2, 0)
+  TEST (3, 4, 2, 0)
+  TEST (4, 4, 2, 0)
+  TEST (5, 4, 2, 0)
+  TEST (6, 4, 2, 0)
+  TEST (7, 4, 2, 0)
+  TEST (0, 5, 2, 0)
+  TEST (1, 5, 2, 0)
+  TEST (2, 5, 2, 0)
+  TEST (3, 5, 2, 0)
+  TEST (4, 5, 2, 0)
+  TEST (5, 5, 2, 0)
+  TEST (6, 5, 2, 0)
+  TEST (7, 5, 2, 0)
+  TEST (0, 6, 2, 0)
+  TEST (1, 6, 2, 0)
+  TEST (2, 6, 2, 0)
+  TEST (3, 6, 2, 0)
+  TEST (4, 6, 2, 0)
+  TEST (5, 6, 2, 0)
+  TEST (6, 6, 2, 0)
+  TEST (7, 6, 2, 0)
+  TEST (0, 7, 2, 0)
+  TEST (1, 7, 2, 0)
+  TEST (2, 7, 2, 0)
+  TEST (3, 7, 2, 0)
+  TEST (4, 7, 2, 0)
+  TEST (5, 7, 2, 0)
+  TEST (6, 7, 2, 0)
+  TEST (7, 7, 2, 0)
+  TEST (0, 0, 3, 0)
+  TEST (1, 0, 3, 0)
+  TEST (2, 0, 3, 0)
+  TEST (3, 0, 3, 0)
+  TEST (4, 0, 3, 0)
+  TEST (5, 0, 3, 0)
+  TEST (6, 0, 3, 0)
+  TEST (7, 0, 3, 0)
+  TEST (0, 1, 3, 0)
+  TEST (1, 1, 3, 0)
+  TEST (2, 1, 3, 0)
+  TEST (3, 1, 3, 0)
+  TEST (4, 1, 3, 0)
+  TEST (5, 1, 3, 0)
+  TEST (6, 1, 3, 0)
+  TEST (7, 1, 3, 0)
+  TEST (0, 2, 3, 0)
+  TEST (1, 2, 3, 0)
+  TEST (2, 2, 3, 0)
+  TEST (3, 2, 3, 0)
+  TEST (4, 2, 3, 0)
+  TEST (5, 2, 3, 0)
+  TEST (6, 2, 3, 0)
+  TEST (7, 2, 3, 0)
+  TEST (0, 3, 3, 0)
+  TEST (1, 3, 3, 0)
+  TEST (2, 3, 3, 0)
+  TEST (3, 3, 3, 0)
+  TEST (4, 3, 3, 0)
+  TEST (5, 3, 3, 0)
+  TEST (6, 3, 3, 0)
+  TEST (7, 3, 3, 0)
+  TEST (0, 4, 3, 0)
+  TEST (1, 4, 3, 0)
+  TEST (2, 4, 3, 0)
+  TEST (3, 4, 3, 0)
+  TEST (4, 4, 3, 0)
+  TEST (5, 4, 3, 0)
+  TEST (6, 4, 3, 0)
+  TEST (7, 4, 3, 0)
+  TEST (0, 5, 3, 0)
+  TEST (1, 5, 3, 0)
+  TEST (2, 5, 3, 0)
+  TEST (3, 5, 3, 0)
+  TEST (4, 5, 3, 0)
+  TEST (5, 5, 3, 0)
+  TEST (6, 5, 3, 0)
+  TEST (7, 5, 3, 0)
+  TEST (0, 6, 3, 0)
+  TEST (1, 6, 3, 0)
+  TEST (2, 6, 3, 0)
+  TEST (3, 6, 3, 0)
+  TEST (4, 6, 3, 0)
+  TEST (5, 6, 3, 0)
+  TEST (6, 6, 3, 0)
+  TEST (7, 6, 3, 0)
+  TEST (0, 7, 3, 0)
+  TEST (1, 7, 3, 0)
+  TEST (2, 7, 3, 0)
+  TEST (3, 7, 3, 0)
+  TEST (4, 7, 3, 0)
+  TEST (5, 7, 3, 0)
+  TEST (6, 7, 3, 0)
+  TEST (7, 7, 3, 0)
+}
+
+void check2(void)
+{
+  TEST (0, 0, 4, 0)
+  TEST (1, 0, 4, 0)
+  TEST (2, 0, 4, 0)
+  TEST (3, 0, 4, 0)
+  TEST (4, 0, 4, 0)
+  TEST (5, 0, 4, 0)
+  TEST (6, 0, 4, 0)
+  TEST (7, 0, 4, 0)
+  TEST (0, 1, 4, 0)
+  TEST (1, 1, 4, 0)
+  TEST (2, 1, 4, 0)
+  TEST (3, 1, 4, 0)
+  TEST (4, 1, 4, 0)
+  TEST (5, 1, 4, 0)
+  TEST (6, 1, 4, 0)
+  TEST (7, 1, 4, 0)
+  TEST (0, 2, 4, 0)
+  TEST (1, 2, 4, 0)
+  TEST (2, 2, 4, 0)
+  TEST (3, 2, 4, 0)
+  TEST (4, 2, 4, 0)
+  TEST (5, 2, 4, 0)
+  TEST (6, 2, 4, 0)
+  TEST (7, 2, 4, 0)
+  TEST (0, 3, 4, 0)
+  TEST (1, 3, 4, 0)
+  TEST (2, 3, 4, 0)
+  TEST (3, 3, 4, 0)
+  TEST (4, 3, 4, 0)
+  TEST (5, 3, 4, 0)
+  TEST (6, 3, 4, 0)
+  TEST (7, 3, 4, 0)
+  TEST (0, 4, 4, 0)
+  TEST (1, 4, 4, 0)
+  TEST (2, 4, 4, 0)
+  TEST (3, 4, 4, 0)
+  TEST (4, 4, 4, 0)
+  TEST (5, 4, 4, 0)
+  TEST (6, 4, 4, 0)
+  TEST (7, 4, 4, 0)
+  TEST (0, 5, 4, 0)
+  TEST (1, 5, 4, 0)
+  TEST (2, 5, 4, 0)
+  TEST (3, 5, 4, 0)
+  TEST (4, 5, 4, 0)
+  TEST (5, 5, 4, 0)
+  TEST (6, 5, 4, 0)
+  TEST (7, 5, 4, 0)
+  TEST (0, 6, 4, 0)
+  TEST (1, 6, 4, 0)
+  TEST (2, 6, 4, 0)
+  TEST (3, 6, 4, 0)
+  TEST (4, 6, 4, 0)
+  TEST (5, 6, 4, 0)
+  TEST (6, 6, 4, 0)
+  TEST (7, 6, 4, 0)
+  TEST (0, 7, 4, 0)
+  TEST (1, 7, 4, 0)
+  TEST (2, 7, 4, 0)
+  TEST (3, 7, 4, 0)
+  TEST (4, 7, 4, 0)
+  TEST (5, 7, 4, 0)
+  TEST (6, 7, 4, 0)
+  TEST (7, 7, 4, 0)
+  TEST (0, 0, 5, 0)
+  TEST (1, 0, 5, 0)
+  TEST (2, 0, 5, 0)
+  TEST (3, 0, 5, 0)
+  TEST (4, 0, 5, 0)
+  TEST (5, 0, 5, 0)
+  TEST (6, 0, 5, 0)
+  TEST (7, 0, 5, 0)
+  TEST (0, 1, 5, 0)
+  TEST (1, 1, 5, 0)
+  TEST (2, 1, 5, 0)
+  TEST (3, 1, 5, 0)
+  TEST (4, 1, 5, 0)
+  TEST (5, 1, 5, 0)
+  TEST (6, 1, 5, 0)
+  TEST (7, 1, 5, 0)
+  TEST (0, 2, 5, 0)
+  TEST (1, 2, 5, 0)
+  TEST (2, 2, 5, 0)
+  TEST (3, 2, 5, 0)
+  TEST (4, 2, 5, 0)
+  TEST (5, 2, 5, 0)
+  TEST (6, 2, 5, 0)
+  TEST (7, 2, 5, 0)
+  TEST (0, 3, 5, 0)
+  TEST (1, 3, 5, 0)
+  TEST (2, 3, 5, 0)
+  TEST (3, 3, 5, 0)
+  TEST (4, 3, 5, 0)
+  TEST (5, 3, 5, 0)
+  TEST (6, 3, 5, 0)
+  TEST (7, 3, 5, 0)
+  TEST (0, 4, 5, 0)
+  TEST (1, 4, 5, 0)
+  TEST (2, 4, 5, 0)
+  TEST (3, 4, 5, 0)
+  TEST (4, 4, 5, 0)
+  TEST (5, 4, 5, 0)
+  TEST (6, 4, 5, 0)
+  TEST (7, 4, 5, 0)
+  TEST (0, 5, 5, 0)
+  TEST (1, 5, 5, 0)
+  TEST (2, 5, 5, 0)
+  TEST (3, 5, 5, 0)
+  TEST (4, 5, 5, 0)
+  TEST (5, 5, 5, 0)
+  TEST (6, 5, 5, 0)
+  TEST (7, 5, 5, 0)
+  TEST (0, 6, 5, 0)
+  TEST (1, 6, 5, 0)
+  TEST (2, 6, 5, 0)
+  TEST (3, 6, 5, 0)
+  TEST (4, 6, 5, 0)
+  TEST (5, 6, 5, 0)
+  TEST (6, 6, 5, 0)
+  TEST (7, 6, 5, 0)
+  TEST (0, 7, 5, 0)
+  TEST (1, 7, 5, 0)
+  TEST (2, 7, 5, 0)
+  TEST (3, 7, 5, 0)
+  TEST (4, 7, 5, 0)
+  TEST (5, 7, 5, 0)
+  TEST (6, 7, 5, 0)
+  TEST (7, 7, 5, 0)
+}
+
+void check3(void)
+{
+  TEST (0, 0, 6, 0)
+  TEST (1, 0, 6, 0)
+  TEST (2, 0, 6, 0)
+  TEST (3, 0, 6, 0)
+  TEST (4, 0, 6, 0)
+  TEST (5, 0, 6, 0)
+  TEST (6, 0, 6, 0)
+  TEST (7, 0, 6, 0)
+  TEST (0, 1, 6, 0)
+  TEST (1, 1, 6, 0)
+  TEST (2, 1, 6, 0)
+  TEST (3, 1, 6, 0)
+  TEST (4, 1, 6, 0)
+  TEST (5, 1, 6, 0)
+  TEST (6, 1, 6, 0)
+  TEST (7, 1, 6, 0)
+  TEST (0, 2, 6, 0)
+  TEST (1, 2, 6, 0)
+  TEST (2, 2, 6, 0)
+  TEST (3, 2, 6, 0)
+  TEST (4, 2, 6, 0)
+  TEST (5, 2, 6, 0)
+  TEST (6, 2, 6, 0)
+  TEST (7, 2, 6, 0)
+  TEST (0, 3, 6, 0)
+  TEST (1, 3, 6, 0)
+  TEST (2, 3, 6, 0)
+  TEST (3, 3, 6, 0)
+  TEST (4, 3, 6, 0)
+  TEST (5, 3, 6, 0)
+  TEST (6, 3, 6, 0)
+  TEST (7, 3, 6, 0)
+  TEST (0, 4, 6, 0)
+  TEST (1, 4, 6, 0)
+  TEST (2, 4, 6, 0)
+  TEST (3, 4, 6, 0)
+  TEST (4, 4, 6, 0)
+  TEST (5, 4, 6, 0)
+  TEST (6, 4, 6, 0)
+  TEST (7, 4, 6, 0)
+  TEST (0, 5, 6, 0)
+  TEST (1, 5, 6, 0)
+  TEST (2, 5, 6, 0)
+  TEST (3, 5, 6, 0)
+  TEST (4, 5, 6, 0)
+  TEST (5, 5, 6, 0)
+  TEST (6, 5, 6, 0)
+  TEST (7, 5, 6, 0)
+  TEST (0, 6, 6, 0)
+  TEST (1, 6, 6, 0)
+  TEST (2, 6, 6, 0)
+  TEST (3, 6, 6, 0)
+  TEST (4, 6, 6, 0)
+  TEST (5, 6, 6, 0)
+  TEST (6, 6, 6, 0)
+  TEST (7, 6, 6, 0)
+  TEST (0, 7, 6, 0)
+  TEST (1, 7, 6, 0)
+  TEST (2, 7, 6, 0)
+  TEST (3, 7, 6, 0)
+  TEST (4, 7, 6, 0)
+  TEST (5, 7, 6, 0)
+  TEST (6, 7, 6, 0)
+  TEST (7, 7, 6, 0)
+  TEST (0, 0, 7, 0)
+  TEST (1, 0, 7, 0)
+  TEST (2, 0, 7, 0)
+  TEST (3, 0, 7, 0)
+  TEST (4, 0, 7, 0)
+  TEST (5, 0, 7, 0)
+  TEST (6, 0, 7, 0)
+  TEST (7, 0, 7, 0)
+  TEST (0, 1, 7, 0)
+  TEST (1, 1, 7, 0)
+  TEST (2, 1, 7, 0)
+  TEST (3, 1, 7, 0)
+  TEST (4, 1, 7, 0)
+  TEST (5, 1, 7, 0)
+  TEST (6, 1, 7, 0)
+  TEST (7, 1, 7, 0)
+  TEST (0, 2, 7, 0)
+  TEST (1, 2, 7, 0)
+  TEST (2, 2, 7, 0)
+  TEST (3, 2, 7, 0)
+  TEST (4, 2, 7, 0)
+  TEST (5, 2, 7, 0)
+  TEST (6, 2, 7, 0)
+  TEST (7, 2, 7, 0)
+  TEST (0, 3, 7, 0)
+  TEST (1, 3, 7, 0)
+  TEST (2, 3, 7, 0)
+  TEST (3, 3, 7, 0)
+  TEST (4, 3, 7, 0)
+  TEST (5, 3, 7, 0)
+  TEST (6, 3, 7, 0)
+  TEST (7, 3, 7, 0)
+  TEST (0, 4, 7, 0)
+  TEST (1, 4, 7, 0)
+  TEST (2, 4, 7, 0)
+  TEST (3, 4, 7, 0)
+  TEST (4, 4, 7, 0)
+  TEST (5, 4, 7, 0)
+  TEST (6, 4, 7, 0)
+  TEST (7, 4, 7, 0)
+  TEST (0, 5, 7, 0)
+  TEST (1, 5, 7, 0)
+  TEST (2, 5, 7, 0)
+  TEST (3, 5, 7, 0)
+  TEST (4, 5, 7, 0)
+  TEST (5, 5, 7, 0)
+  TEST (6, 5, 7, 0)
+  TEST (7, 5, 7, 0)
+  TEST (0, 6, 7, 0)
+  TEST (1, 6, 7, 0)
+  TEST (2, 6, 7, 0)
+  TEST (3, 6, 7, 0)
+  TEST (4, 6, 7, 0)
+  TEST (5, 6, 7, 0)
+  TEST (6, 6, 7, 0)
+  TEST (7, 6, 7, 0)
+  TEST (0, 7, 7, 0)
+  TEST (1, 7, 7, 0)
+  TEST (2, 7, 7, 0)
+  TEST (3, 7, 7, 0)
+  TEST (4, 7, 7, 0)
+  TEST (5, 7, 7, 0)
+  TEST (6, 7, 7, 0)
+  TEST (7, 7, 7, 0)
+}
+
+void check4(void)
+{
+  TEST (0, 0, 0, 1)
+  TEST (1, 0, 0, 1)
+  TEST (2, 0, 0, 1)
+  TEST (3, 0, 0, 1)
+  TEST (4, 0, 0, 1)
+  TEST (5, 0, 0, 1)
+  TEST (6, 0, 0, 1)
+  TEST (7, 0, 0, 1)
+  TEST (0, 1, 0, 1)
+  TEST (1, 1, 0, 1)
+  TEST (2, 1, 0, 1)
+  TEST (3, 1, 0, 1)
+  TEST (4, 1, 0, 1)
+  TEST (5, 1, 0, 1)
+  TEST (6, 1, 0, 1)
+  TEST (7, 1, 0, 1)
+  TEST (0, 2, 0, 1)
+  TEST (1, 2, 0, 1)
+  TEST (2, 2, 0, 1)
+  TEST (3, 2, 0, 1)
+  TEST (4, 2, 0, 1)
+  TEST (5, 2, 0, 1)
+  TEST (6, 2, 0, 1)
+  TEST (7, 2, 0, 1)
+  TEST (0, 3, 0, 1)
+  TEST (1, 3, 0, 1)
+  TEST (2, 3, 0, 1)
+  TEST (3, 3, 0, 1)
+  TEST (4, 3, 0, 1)
+  TEST (5, 3, 0, 1)
+  TEST (6, 3, 0, 1)
+  TEST (7, 3, 0, 1)
+  TEST (0, 4, 0, 1)
+  TEST (1, 4, 0, 1)
+  TEST (2, 4, 0, 1)
+  TEST (3, 4, 0, 1)
+  TEST (4, 4, 0, 1)
+  TEST (5, 4, 0, 1)
+  TEST (6, 4, 0, 1)
+  TEST (7, 4, 0, 1)
+  TEST (0, 5, 0, 1)
+  TEST (1, 5, 0, 1)
+  TEST (2, 5, 0, 1)
+  TEST (3, 5, 0, 1)
+  TEST (4, 5, 0, 1)
+  TEST (5, 5, 0, 1)
+  TEST (6, 5, 0, 1)
+  TEST (7, 5, 0, 1)
+  TEST (0, 6, 0, 1)
+  TEST (1, 6, 0, 1)
+  TEST (2, 6, 0, 1)
+  TEST (3, 6, 0, 1)
+  TEST (4, 6, 0, 1)
+  TEST (5, 6, 0, 1)
+  TEST (6, 6, 0, 1)
+  TEST (7, 6, 0, 1)
+  TEST (0, 7, 0, 1)
+  TEST (1, 7, 0, 1)
+  TEST (2, 7, 0, 1)
+  TEST (3, 7, 0, 1)
+  TEST (4, 7, 0, 1)
+  TEST (5, 7, 0, 1)
+  TEST (6, 7, 0, 1)
+  TEST (7, 7, 0, 1)
+  TEST (0, 0, 1, 1)
+  TEST (1, 0, 1, 1)
+  TEST (2, 0, 1, 1)
+  TEST (3, 0, 1, 1)
+  TEST (4, 0, 1, 1)
+  TEST (5, 0, 1, 1)
+  TEST (6, 0, 1, 1)
+  TEST (7, 0, 1, 1)
+  TEST (0, 1, 1, 1)
+  TEST (1, 1, 1, 1)
+  TEST (2, 1, 1, 1)
+  TEST (3, 1, 1, 1)
+  TEST (4, 1, 1, 1)
+  TEST (5, 1, 1, 1)
+  TEST (6, 1, 1, 1)
+  TEST (7, 1, 1, 1)
+  TEST (0, 2, 1, 1)
+  TEST (1, 2, 1, 1)
+  TEST (2, 2, 1, 1)
+  TEST (3, 2, 1, 1)
+  TEST (4, 2, 1, 1)
+  TEST (5, 2, 1, 1)
+  TEST (6, 2, 1, 1)
+  TEST (7, 2, 1, 1)
+  TEST (0, 3, 1, 1)
+  TEST (1, 3, 1, 1)
+  TEST (2, 3, 1, 1)
+  TEST (3, 3, 1, 1)
+  TEST (4, 3, 1, 1)
+  TEST (5, 3, 1, 1)
+  TEST (6, 3, 1, 1)
+  TEST (7, 3, 1, 1)
+  TEST (0, 4, 1, 1)
+  TEST (1, 4, 1, 1)
+  TEST (2, 4, 1, 1)
+  TEST (3, 4, 1, 1)
+  TEST (4, 4, 1, 1)
+  TEST (5, 4, 1, 1)
+  TEST (6, 4, 1, 1)
+  TEST (7, 4, 1, 1)
+  TEST (0, 5, 1, 1)
+  TEST (1, 5, 1, 1)
+  TEST (2, 5, 1, 1)
+  TEST (3, 5, 1, 1)
+  TEST (4, 5, 1, 1)
+  TEST (5, 5, 1, 1)
+  TEST (6, 5, 1, 1)
+  TEST (7, 5, 1, 1)
+  TEST (0, 6, 1, 1)
+  TEST (1, 6, 1, 1)
+  TEST (2, 6, 1, 1)
+  TEST (3, 6, 1, 1)
+  TEST (4, 6, 1, 1)
+  TEST (5, 6, 1, 1)
+  TEST (6, 6, 1, 1)
+  TEST (7, 6, 1, 1)
+  TEST (0, 7, 1, 1)
+  TEST (1, 7, 1, 1)
+  TEST (2, 7, 1, 1)
+  TEST (3, 7, 1, 1)
+  TEST (4, 7, 1, 1)
+  TEST (5, 7, 1, 1)
+  TEST (6, 7, 1, 1)
+  TEST (7, 7, 1, 1)
+}
+
+void check5(void)
+{
+  TEST (0, 0, 2, 1)
+  TEST (1, 0, 2, 1)
+  TEST (2, 0, 2, 1)
+  TEST (3, 0, 2, 1)
+  TEST (4, 0, 2, 1)
+  TEST (5, 0, 2, 1)
+  TEST (6, 0, 2, 1)
+  TEST (7, 0, 2, 1)
+  TEST (0, 1, 2, 1)
+  TEST (1, 1, 2, 1)
+  TEST (2, 1, 2, 1)
+  TEST (3, 1, 2, 1)
+  TEST (4, 1, 2, 1)
+  TEST (5, 1, 2, 1)
+  TEST (6, 1, 2, 1)
+  TEST (7, 1, 2, 1)
+  TEST (0, 2, 2, 1)
+  TEST (1, 2, 2, 1)
+  TEST (2, 2, 2, 1)
+  TEST (3, 2, 2, 1)
+  TEST (4, 2, 2, 1)
+  TEST (5, 2, 2, 1)
+  TEST (6, 2, 2, 1)
+  TEST (7, 2, 2, 1)
+  TEST (0, 3, 2, 1)
+  TEST (1, 3, 2, 1)
+  TEST (2, 3, 2, 1)
+  TEST (3, 3, 2, 1)
+  TEST (4, 3, 2, 1)
+  TEST (5, 3, 2, 1)
+  TEST (6, 3, 2, 1)
+  TEST (7, 3, 2, 1)
+  TEST (0, 4, 2, 1)
+  TEST (1, 4, 2, 1)
+  TEST (2, 4, 2, 1)
+  TEST (3, 4, 2, 1)
+  TEST (4, 4, 2, 1)
+  TEST (5, 4, 2, 1)
+  TEST (6, 4, 2, 1)
+  TEST (7, 4, 2, 1)
+  TEST (0, 5, 2, 1)
+  TEST (1, 5, 2, 1)
+  TEST (2, 5, 2, 1)
+  TEST (3, 5, 2, 1)
+  TEST (4, 5, 2, 1)
+  TEST (5, 5, 2, 1)
+  TEST (6, 5, 2, 1)
+  TEST (7, 5, 2, 1)
+  TEST (0, 6, 2, 1)
+  TEST (1, 6, 2, 1)
+  TEST (2, 6, 2, 1)
+  TEST (3, 6, 2, 1)
+  TEST (4, 6, 2, 1)
+  TEST (5, 6, 2, 1)
+  TEST (6, 6, 2, 1)
+  TEST (7, 6, 2, 1)
+  TEST (0, 7, 2, 1)
+  TEST (1, 7, 2, 1)
+  TEST (2, 7, 2, 1)
+  TEST (3, 7, 2, 1)
+  TEST (4, 7, 2, 1)
+  TEST (5, 7, 2, 1)
+  TEST (6, 7, 2, 1)
+  TEST (7, 7, 2, 1)
+  TEST (0, 0, 3, 1)
+  TEST (1, 0, 3, 1)
+  TEST (2, 0, 3, 1)
+  TEST (3, 0, 3, 1)
+  TEST (4, 0, 3, 1)
+  TEST (5, 0, 3, 1)
+  TEST (6, 0, 3, 1)
+  TEST (7, 0, 3, 1)
+  TEST (0, 1, 3, 1)
+  TEST (1, 1, 3, 1)
+  TEST (2, 1, 3, 1)
+  TEST (3, 1, 3, 1)
+  TEST (4, 1, 3, 1)
+  TEST (5, 1, 3, 1)
+  TEST (6, 1, 3, 1)
+  TEST (7, 1, 3, 1)
+  TEST (0, 2, 3, 1)
+  TEST (1, 2, 3, 1)
+  TEST (2, 2, 3, 1)
+  TEST (3, 2, 3, 1)
+  TEST (4, 2, 3, 1)
+  TEST (5, 2, 3, 1)
+  TEST (6, 2, 3, 1)
+  TEST (7, 2, 3, 1)
+  TEST (0, 3, 3, 1)
+  TEST (1, 3, 3, 1)
+  TEST (2, 3, 3, 1)
+  TEST (3, 3, 3, 1)
+  TEST (4, 3, 3, 1)
+  TEST (5, 3, 3, 1)
+  TEST (6, 3, 3, 1)
+  TEST (7, 3, 3, 1)
+  TEST (0, 4, 3, 1)
+  TEST (1, 4, 3, 1)
+  TEST (2, 4, 3, 1)
+  TEST (3, 4, 3, 1)
+  TEST (4, 4, 3, 1)
+  TEST (5, 4, 3, 1)
+  TEST (6, 4, 3, 1)
+  TEST (7, 4, 3, 1)
+  TEST (0, 5, 3, 1)
+  TEST (1, 5, 3, 1)
+  TEST (2, 5, 3, 1)
+  TEST (3, 5, 3, 1)
+  TEST (4, 5, 3, 1)
+  TEST (5, 5, 3, 1)
+  TEST (6, 5, 3, 1)
+  TEST (7, 5, 3, 1)
+  TEST (0, 6, 3, 1)
+  TEST (1, 6, 3, 1)
+  TEST (2, 6, 3, 1)
+  TEST (3, 6, 3, 1)
+  TEST (4, 6, 3, 1)
+  TEST (5, 6, 3, 1)
+  TEST (6, 6, 3, 1)
+  TEST (7, 6, 3, 1)
+  TEST (0, 7, 3, 1)
+  TEST (1, 7, 3, 1)
+  TEST (2, 7, 3, 1)
+  TEST (3, 7, 3, 1)
+  TEST (4, 7, 3, 1)
+  TEST (5, 7, 3, 1)
+  TEST (6, 7, 3, 1)
+  TEST (7, 7, 3, 1)
+}
+
+void check6(void)
+{
+  TEST (0, 0, 4, 1)
+  TEST (1, 0, 4, 1)
+  TEST (2, 0, 4, 1)
+  TEST (3, 0, 4, 1)
+  TEST (4, 0, 4, 1)
+  TEST (5, 0, 4, 1)
+  TEST (6, 0, 4, 1)
+  TEST (7, 0, 4, 1)
+  TEST (0, 1, 4, 1)
+  TEST (1, 1, 4, 1)
+  TEST (2, 1, 4, 1)
+  TEST (3, 1, 4, 1)
+  TEST (4, 1, 4, 1)
+  TEST (5, 1, 4, 1)
+  TEST (6, 1, 4, 1)
+  TEST (7, 1, 4, 1)
+  TEST (0, 2, 4, 1)
+  TEST (1, 2, 4, 1)
+  TEST (2, 2, 4, 1)
+  TEST (3, 2, 4, 1)
+  TEST (4, 2, 4, 1)
+  TEST (5, 2, 4, 1)
+  TEST (6, 2, 4, 1)
+  TEST (7, 2, 4, 1)
+  TEST (0, 3, 4, 1)
+  TEST (1, 3, 4, 1)
+  TEST (2, 3, 4, 1)
+  TEST (3, 3, 4, 1)
+  TEST (4, 3, 4, 1)
+  TEST (5, 3, 4, 1)
+  TEST (6, 3, 4, 1)
+  TEST (7, 3, 4, 1)
+  TEST (0, 4, 4, 1)
+  TEST (1, 4, 4, 1)
+  TEST (2, 4, 4, 1)
+  TEST (3, 4, 4, 1)
+  TEST (4, 4, 4, 1)
+  TEST (5, 4, 4, 1)
+  TEST (6, 4, 4, 1)
+  TEST (7, 4, 4, 1)
+  TEST (0, 5, 4, 1)
+  TEST (1, 5, 4, 1)
+  TEST (2, 5, 4, 1)
+  TEST (3, 5, 4, 1)
+  TEST (4, 5, 4, 1)
+  TEST (5, 5, 4, 1)
+  TEST (6, 5, 4, 1)
+  TEST (7, 5, 4, 1)
+  TEST (0, 6, 4, 1)
+  TEST (1, 6, 4, 1)
+  TEST (2, 6, 4, 1)
+  TEST (3, 6, 4, 1)
+  TEST (4, 6, 4, 1)
+  TEST (5, 6, 4, 1)
+  TEST (6, 6, 4, 1)
+  TEST (7, 6, 4, 1)
+  TEST (0, 7, 4, 1)
+  TEST (1, 7, 4, 1)
+  TEST (2, 7, 4, 1)
+  TEST (3, 7, 4, 1)
+  TEST (4, 7, 4, 1)
+  TEST (5, 7, 4, 1)
+  TEST (6, 7, 4, 1)
+  TEST (7, 7, 4, 1)
+  TEST (0, 0, 5, 1)
+  TEST (1, 0, 5, 1)
+  TEST (2, 0, 5, 1)
+  TEST (3, 0, 5, 1)
+  TEST (4, 0, 5, 1)
+  TEST (5, 0, 5, 1)
+  TEST (6, 0, 5, 1)
+  TEST (7, 0, 5, 1)
+  TEST (0, 1, 5, 1)
+  TEST (1, 1, 5, 1)
+  TEST (2, 1, 5, 1)
+  TEST (3, 1, 5, 1)
+  TEST (4, 1, 5, 1)
+  TEST (5, 1, 5, 1)
+  TEST (6, 1, 5, 1)
+  TEST (7, 1, 5, 1)
+  TEST (0, 2, 5, 1)
+  TEST (1, 2, 5, 1)
+  TEST (2, 2, 5, 1)
+  TEST (3, 2, 5, 1)
+  TEST (4, 2, 5, 1)
+  TEST (5, 2, 5, 1)
+  TEST (6, 2, 5, 1)
+  TEST (7, 2, 5, 1)
+  TEST (0, 3, 5, 1)
+  TEST (1, 3, 5, 1)
+  TEST (2, 3, 5, 1)
+  TEST (3, 3, 5, 1)
+  TEST (4, 3, 5, 1)
+  TEST (5, 3, 5, 1)
+  TEST (6, 3, 5, 1)
+  TEST (7, 3, 5, 1)
+  TEST (0, 4, 5, 1)
+  TEST (1, 4, 5, 1)
+  TEST (2, 4, 5, 1)
+  TEST (3, 4, 5, 1)
+  TEST (4, 4, 5, 1)
+  TEST (5, 4, 5, 1)
+  TEST (6, 4, 5, 1)
+  TEST (7, 4, 5, 1)
+  TEST (0, 5, 5, 1)
+  TEST (1, 5, 5, 1)
+  TEST (2, 5, 5, 1)
+  TEST (3, 5, 5, 1)
+  TEST (4, 5, 5, 1)
+  TEST (5, 5, 5, 1)
+  TEST (6, 5, 5, 1)
+  TEST (7, 5, 5, 1)
+  TEST (0, 6, 5, 1)
+  TEST (1, 6, 5, 1)
+  TEST (2, 6, 5, 1)
+  TEST (3, 6, 5, 1)
+  TEST (4, 6, 5, 1)
+  TEST (5, 6, 5, 1)
+  TEST (6, 6, 5, 1)
+  TEST (7, 6, 5, 1)
+  TEST (0, 7, 5, 1)
+  TEST (1, 7, 5, 1)
+  TEST (2, 7, 5, 1)
+  TEST (3, 7, 5, 1)
+  TEST (4, 7, 5, 1)
+  TEST (5, 7, 5, 1)
+  TEST (6, 7, 5, 1)
+  TEST (7, 7, 5, 1)
+}
+
+void check7(void)
+{
+  TEST (0, 0, 6, 1)
+  TEST (1, 0, 6, 1)
+  TEST (2, 0, 6, 1)
+  TEST (3, 0, 6, 1)
+  TEST (4, 0, 6, 1)
+  TEST (5, 0, 6, 1)
+  TEST (6, 0, 6, 1)
+  TEST (7, 0, 6, 1)
+  TEST (0, 1, 6, 1)
+  TEST (1, 1, 6, 1)
+  TEST (2, 1, 6, 1)
+  TEST (3, 1, 6, 1)
+  TEST (4, 1, 6, 1)
+  TEST (5, 1, 6, 1)
+  TEST (6, 1, 6, 1)
+  TEST (7, 1, 6, 1)
+  TEST (0, 2, 6, 1)
+  TEST (1, 2, 6, 1)
+  TEST (2, 2, 6, 1)
+  TEST (3, 2, 6, 1)
+  TEST (4, 2, 6, 1)
+  TEST (5, 2, 6, 1)
+  TEST (6, 2, 6, 1)
+  TEST (7, 2, 6, 1)
+  TEST (0, 3, 6, 1)
+  TEST (1, 3, 6, 1)
+  TEST (2, 3, 6, 1)
+  TEST (3, 3, 6, 1)
+  TEST (4, 3, 6, 1)
+  TEST (5, 3, 6, 1)
+  TEST (6, 3, 6, 1)
+  TEST (7, 3, 6, 1)
+  TEST (0, 4, 6, 1)
+  TEST (1, 4, 6, 1)
+  TEST (2, 4, 6, 1)
+  TEST (3, 4, 6, 1)
+  TEST (4, 4, 6, 1)
+  TEST (5, 4, 6, 1)
+  TEST (6, 4, 6, 1)
+  TEST (7, 4, 6, 1)
+  TEST (0, 5, 6, 1)
+  TEST (1, 5, 6, 1)
+  TEST (2, 5, 6, 1)
+  TEST (3, 5, 6, 1)
+  TEST (4, 5, 6, 1)
+  TEST (5, 5, 6, 1)
+  TEST (6, 5, 6, 1)
+  TEST (7, 5, 6, 1)
+  TEST (0, 6, 6, 1)
+  TEST (1, 6, 6, 1)
+  TEST (2, 6, 6, 1)
+  TEST (3, 6, 6, 1)
+  TEST (4, 6, 6, 1)
+  TEST (5, 6, 6, 1)
+  TEST (6, 6, 6, 1)
+  TEST (7, 6, 6, 1)
+  TEST (0, 7, 6, 1)
+  TEST (1, 7, 6, 1)
+  TEST (2, 7, 6, 1)
+  TEST (3, 7, 6, 1)
+  TEST (4, 7, 6, 1)
+  TEST (5, 7, 6, 1)
+  TEST (6, 7, 6, 1)
+  TEST (7, 7, 6, 1)
+  TEST (0, 0, 7, 1)
+  TEST (1, 0, 7, 1)
+  TEST (2, 0, 7, 1)
+  TEST (3, 0, 7, 1)
+  TEST (4, 0, 7, 1)
+  TEST (5, 0, 7, 1)
+  TEST (6, 0, 7, 1)
+  TEST (7, 0, 7, 1)
+  TEST (0, 1, 7, 1)
+  TEST (1, 1, 7, 1)
+  TEST (2, 1, 7, 1)
+  TEST (3, 1, 7, 1)
+  TEST (4, 1, 7, 1)
+  TEST (5, 1, 7, 1)
+  TEST (6, 1, 7, 1)
+  TEST (7, 1, 7, 1)
+  TEST (0, 2, 7, 1)
+  TEST (1, 2, 7, 1)
+  TEST (2, 2, 7, 1)
+  TEST (3, 2, 7, 1)
+  TEST (4, 2, 7, 1)
+  TEST (5, 2, 7, 1)
+  TEST (6, 2, 7, 1)
+  TEST (7, 2, 7, 1)
+  TEST (0, 3, 7, 1)
+  TEST (1, 3, 7, 1)
+  TEST (2, 3, 7, 1)
+  TEST (3, 3, 7, 1)
+  TEST (4, 3, 7, 1)
+  TEST (5, 3, 7, 1)
+  TEST (6, 3, 7, 1)
+  TEST (7, 3, 7, 1)
+  TEST (0, 4, 7, 1)
+  TEST (1, 4, 7, 1)
+  TEST (2, 4, 7, 1)
+  TEST (3, 4, 7, 1)
+  TEST (4, 4, 7, 1)
+  TEST (5, 4, 7, 1)
+  TEST (6, 4, 7, 1)
+  TEST (7, 4, 7, 1)
+  TEST (0, 5, 7, 1)
+  TEST (1, 5, 7, 1)
+  TEST (2, 5, 7, 1)
+  TEST (3, 5, 7, 1)
+  TEST (4, 5, 7, 1)
+  TEST (5, 5, 7, 1)
+  TEST (6, 5, 7, 1)
+  TEST (7, 5, 7, 1)
+  TEST (0, 6, 7, 1)
+  TEST (1, 6, 7, 1)
+  TEST (2, 6, 7, 1)
+  TEST (3, 6, 7, 1)
+  TEST (4, 6, 7, 1)
+  TEST (5, 6, 7, 1)
+  TEST (6, 6, 7, 1)
+  TEST (7, 6, 7, 1)
+  TEST (0, 7, 7, 1)
+  TEST (1, 7, 7, 1)
+  TEST (2, 7, 7, 1)
+  TEST (3, 7, 7, 1)
+  TEST (4, 7, 7, 1)
+  TEST (5, 7, 7, 1)
+  TEST (6, 7, 7, 1)
+  TEST (7, 7, 7, 1)
+}
+
+void check8(void)
+{
+  TEST (0, 0, 0, 2)
+  TEST (1, 0, 0, 2)
+  TEST (2, 0, 0, 2)
+  TEST (3, 0, 0, 2)
+  TEST (4, 0, 0, 2)
+  TEST (5, 0, 0, 2)
+  TEST (6, 0, 0, 2)
+  TEST (7, 0, 0, 2)
+  TEST (0, 1, 0, 2)
+  TEST (1, 1, 0, 2)
+  TEST (2, 1, 0, 2)
+  TEST (3, 1, 0, 2)
+  TEST (4, 1, 0, 2)
+  TEST (5, 1, 0, 2)
+  TEST (6, 1, 0, 2)
+  TEST (7, 1, 0, 2)
+  TEST (0, 2, 0, 2)
+  TEST (1, 2, 0, 2)
+  TEST (2, 2, 0, 2)
+  TEST (3, 2, 0, 2)
+  TEST (4, 2, 0, 2)
+  TEST (5, 2, 0, 2)
+  TEST (6, 2, 0, 2)
+  TEST (7, 2, 0, 2)
+  TEST (0, 3, 0, 2)
+  TEST (1, 3, 0, 2)
+  TEST (2, 3, 0, 2)
+  TEST (3, 3, 0, 2)
+  TEST (4, 3, 0, 2)
+  TEST (5, 3, 0, 2)
+  TEST (6, 3, 0, 2)
+  TEST (7, 3, 0, 2)
+  TEST (0, 4, 0, 2)
+  TEST (1, 4, 0, 2)
+  TEST (2, 4, 0, 2)
+  TEST (3, 4, 0, 2)
+  TEST (4, 4, 0, 2)
+  TEST (5, 4, 0, 2)
+  TEST (6, 4, 0, 2)
+  TEST (7, 4, 0, 2)
+  TEST (0, 5, 0, 2)
+  TEST (1, 5, 0, 2)
+  TEST (2, 5, 0, 2)
+  TEST (3, 5, 0, 2)
+  TEST (4, 5, 0, 2)
+  TEST (5, 5, 0, 2)
+  TEST (6, 5, 0, 2)
+  TEST (7, 5, 0, 2)
+  TEST (0, 6, 0, 2)
+  TEST (1, 6, 0, 2)
+  TEST (2, 6, 0, 2)
+  TEST (3, 6, 0, 2)
+  TEST (4, 6, 0, 2)
+  TEST (5, 6, 0, 2)
+  TEST (6, 6, 0, 2)
+  TEST (7, 6, 0, 2)
+  TEST (0, 7, 0, 2)
+  TEST (1, 7, 0, 2)
+  TEST (2, 7, 0, 2)
+  TEST (3, 7, 0, 2)
+  TEST (4, 7, 0, 2)
+  TEST (5, 7, 0, 2)
+  TEST (6, 7, 0, 2)
+  TEST (7, 7, 0, 2)
+  TEST (0, 0, 1, 2)
+  TEST (1, 0, 1, 2)
+  TEST (2, 0, 1, 2)
+  TEST (3, 0, 1, 2)
+  TEST (4, 0, 1, 2)
+  TEST (5, 0, 1, 2)
+  TEST (6, 0, 1, 2)
+  TEST (7, 0, 1, 2)
+  TEST (0, 1, 1, 2)
+  TEST (1, 1, 1, 2)
+  TEST (2, 1, 1, 2)
+  TEST (3, 1, 1, 2)
+  TEST (4, 1, 1, 2)
+  TEST (5, 1, 1, 2)
+  TEST (6, 1, 1, 2)
+  TEST (7, 1, 1, 2)
+  TEST (0, 2, 1, 2)
+  TEST (1, 2, 1, 2)
+  TEST (2, 2, 1, 2)
+  TEST (3, 2, 1, 2)
+  TEST (4, 2, 1, 2)
+  TEST (5, 2, 1, 2)
+  TEST (6, 2, 1, 2)
+  TEST (7, 2, 1, 2)
+  TEST (0, 3, 1, 2)
+  TEST (1, 3, 1, 2)
+  TEST (2, 3, 1, 2)
+  TEST (3, 3, 1, 2)
+  TEST (4, 3, 1, 2)
+  TEST (5, 3, 1, 2)
+  TEST (6, 3, 1, 2)
+  TEST (7, 3, 1, 2)
+  TEST (0, 4, 1, 2)
+  TEST (1, 4, 1, 2)
+  TEST (2, 4, 1, 2)
+  TEST (3, 4, 1, 2)
+  TEST (4, 4, 1, 2)
+  TEST (5, 4, 1, 2)
+  TEST (6, 4, 1, 2)
+  TEST (7, 4, 1, 2)
+  TEST (0, 5, 1, 2)
+  TEST (1, 5, 1, 2)
+  TEST (2, 5, 1, 2)
+  TEST (3, 5, 1, 2)
+  TEST (4, 5, 1, 2)
+  TEST (5, 5, 1, 2)
+  TEST (6, 5, 1, 2)
+  TEST (7, 5, 1, 2)
+  TEST (0, 6, 1, 2)
+  TEST (1, 6, 1, 2)
+  TEST (2, 6, 1, 2)
+  TEST (3, 6, 1, 2)
+  TEST (4, 6, 1, 2)
+  TEST (5, 6, 1, 2)
+  TEST (6, 6, 1, 2)
+  TEST (7, 6, 1, 2)
+  TEST (0, 7, 1, 2)
+  TEST (1, 7, 1, 2)
+  TEST (2, 7, 1, 2)
+  TEST (3, 7, 1, 2)
+  TEST (4, 7, 1, 2)
+  TEST (5, 7, 1, 2)
+  TEST (6, 7, 1, 2)
+  TEST (7, 7, 1, 2)
+}
+
+void check9(void)
+{
+  TEST (0, 0, 2, 2)
+  TEST (1, 0, 2, 2)
+  TEST (2, 0, 2, 2)
+  TEST (3, 0, 2, 2)
+  TEST (4, 0, 2, 2)
+  TEST (5, 0, 2, 2)
+  TEST (6, 0, 2, 2)
+  TEST (7, 0, 2, 2)
+  TEST (0, 1, 2, 2)
+  TEST (1, 1, 2, 2)
+  TEST (2, 1, 2, 2)
+  TEST (3, 1, 2, 2)
+  TEST (4, 1, 2, 2)
+  TEST (5, 1, 2, 2)
+  TEST (6, 1, 2, 2)
+  TEST (7, 1, 2, 2)
+  TEST (0, 2, 2, 2)
+  TEST (1, 2, 2, 2)
+  TEST (2, 2, 2, 2)
+  TEST (3, 2, 2, 2)
+  TEST (4, 2, 2, 2)
+  TEST (5, 2, 2, 2)
+  TEST (6, 2, 2, 2)
+  TEST (7, 2, 2, 2)
+  TEST (0, 3, 2, 2)
+  TEST (1, 3, 2, 2)
+  TEST (2, 3, 2, 2)
+  TEST (3, 3, 2, 2)
+  TEST (4, 3, 2, 2)
+  TEST (5, 3, 2, 2)
+  TEST (6, 3, 2, 2)
+  TEST (7, 3, 2, 2)
+  TEST (0, 4, 2, 2)
+  TEST (1, 4, 2, 2)
+  TEST (2, 4, 2, 2)
+  TEST (3, 4, 2, 2)
+  TEST (4, 4, 2, 2)
+  TEST (5, 4, 2, 2)
+  TEST (6, 4, 2, 2)
+  TEST (7, 4, 2, 2)
+  TEST (0, 5, 2, 2)
+  TEST (1, 5, 2, 2)
+  TEST (2, 5, 2, 2)
+  TEST (3, 5, 2, 2)
+  TEST (4, 5, 2, 2)
+  TEST (5, 5, 2, 2)
+  TEST (6, 5, 2, 2)
+  TEST (7, 5, 2, 2)
+  TEST (0, 6, 2, 2)
+  TEST (1, 6, 2, 2)
+  TEST (2, 6, 2, 2)
+  TEST (3, 6, 2, 2)
+  TEST (4, 6, 2, 2)
+  TEST (5, 6, 2, 2)
+  TEST (6, 6, 2, 2)
+  TEST (7, 6, 2, 2)
+  TEST (0, 7, 2, 2)
+  TEST (1, 7, 2, 2)
+  TEST (2, 7, 2, 2)
+  TEST (3, 7, 2, 2)
+  TEST (4, 7, 2, 2)
+  TEST (5, 7, 2, 2)
+  TEST (6, 7, 2, 2)
+  TEST (7, 7, 2, 2)
+  TEST (0, 0, 3, 2)
+  TEST (1, 0, 3, 2)
+  TEST (2, 0, 3, 2)
+  TEST (3, 0, 3, 2)
+  TEST (4, 0, 3, 2)
+  TEST (5, 0, 3, 2)
+  TEST (6, 0, 3, 2)
+  TEST (7, 0, 3, 2)
+  TEST (0, 1, 3, 2)
+  TEST (1, 1, 3, 2)
+  TEST (2, 1, 3, 2)
+  TEST (3, 1, 3, 2)
+  TEST (4, 1, 3, 2)
+  TEST (5, 1, 3, 2)
+  TEST (6, 1, 3, 2)
+  TEST (7, 1, 3, 2)
+  TEST (0, 2, 3, 2)
+  TEST (1, 2, 3, 2)
+  TEST (2, 2, 3, 2)
+  TEST (3, 2, 3, 2)
+  TEST (4, 2, 3, 2)
+  TEST (5, 2, 3, 2)
+  TEST (6, 2, 3, 2)
+  TEST (7, 2, 3, 2)
+  TEST (0, 3, 3, 2)
+  TEST (1, 3, 3, 2)
+  TEST (2, 3, 3, 2)
+  TEST (3, 3, 3, 2)
+  TEST (4, 3, 3, 2)
+  TEST (5, 3, 3, 2)
+  TEST (6, 3, 3, 2)
+  TEST (7, 3, 3, 2)
+  TEST (0, 4, 3, 2)
+  TEST (1, 4, 3, 2)
+  TEST (2, 4, 3, 2)
+  TEST (3, 4, 3, 2)
+  TEST (4, 4, 3, 2)
+  TEST (5, 4, 3, 2)
+  TEST (6, 4, 3, 2)
+  TEST (7, 4, 3, 2)
+  TEST (0, 5, 3, 2)
+  TEST (1, 5, 3, 2)
+  TEST (2, 5, 3, 2)
+  TEST (3, 5, 3, 2)
+  TEST (4, 5, 3, 2)
+  TEST (5, 5, 3, 2)
+  TEST (6, 5, 3, 2)
+  TEST (7, 5, 3, 2)
+  TEST (0, 6, 3, 2)
+  TEST (1, 6, 3, 2)
+  TEST (2, 6, 3, 2)
+  TEST (3, 6, 3, 2)
+  TEST (4, 6, 3, 2)
+  TEST (5, 6, 3, 2)
+  TEST (6, 6, 3, 2)
+  TEST (7, 6, 3, 2)
+  TEST (0, 7, 3, 2)
+  TEST (1, 7, 3, 2)
+  TEST (2, 7, 3, 2)
+  TEST (3, 7, 3, 2)
+  TEST (4, 7, 3, 2)
+  TEST (5, 7, 3, 2)
+  TEST (6, 7, 3, 2)
+  TEST (7, 7, 3, 2)
+}
+
+void check10(void)
+{
+  TEST (0, 0, 4, 2)
+  TEST (1, 0, 4, 2)
+  TEST (2, 0, 4, 2)
+  TEST (3, 0, 4, 2)
+  TEST (4, 0, 4, 2)
+  TEST (5, 0, 4, 2)
+  TEST (6, 0, 4, 2)
+  TEST (7, 0, 4, 2)
+  TEST (0, 1, 4, 2)
+  TEST (1, 1, 4, 2)
+  TEST (2, 1, 4, 2)
+  TEST (3, 1, 4, 2)
+  TEST (4, 1, 4, 2)
+  TEST (5, 1, 4, 2)
+  TEST (6, 1, 4, 2)
+  TEST (7, 1, 4, 2)
+  TEST (0, 2, 4, 2)
+  TEST (1, 2, 4, 2)
+  TEST (2, 2, 4, 2)
+  TEST (3, 2, 4, 2)
+  TEST (4, 2, 4, 2)
+  TEST (5, 2, 4, 2)
+  TEST (6, 2, 4, 2)
+  TEST (7, 2, 4, 2)
+  TEST (0, 3, 4, 2)
+  TEST (1, 3, 4, 2)
+  TEST (2, 3, 4, 2)
+  TEST (3, 3, 4, 2)
+  TEST (4, 3, 4, 2)
+  TEST (5, 3, 4, 2)
+  TEST (6, 3, 4, 2)
+  TEST (7, 3, 4, 2)
+  TEST (0, 4, 4, 2)
+  TEST (1, 4, 4, 2)
+  TEST (2, 4, 4, 2)
+  TEST (3, 4, 4, 2)
+  TEST (4, 4, 4, 2)
+  TEST (5, 4, 4, 2)
+  TEST (6, 4, 4, 2)
+  TEST (7, 4, 4, 2)
+  TEST (0, 5, 4, 2)
+  TEST (1, 5, 4, 2)
+  TEST (2, 5, 4, 2)
+  TEST (3, 5, 4, 2)
+  TEST (4, 5, 4, 2)
+  TEST (5, 5, 4, 2)
+  TEST (6, 5, 4, 2)
+  TEST (7, 5, 4, 2)
+  TEST (0, 6, 4, 2)
+  TEST (1, 6, 4, 2)
+  TEST (2, 6, 4, 2)
+  TEST (3, 6, 4, 2)
+  TEST (4, 6, 4, 2)
+  TEST (5, 6, 4, 2)
+  TEST (6, 6, 4, 2)
+  TEST (7, 6, 4, 2)
+  TEST (0, 7, 4, 2)
+  TEST (1, 7, 4, 2)
+  TEST (2, 7, 4, 2)
+  TEST (3, 7, 4, 2)
+  TEST (4, 7, 4, 2)
+  TEST (5, 7, 4, 2)
+  TEST (6, 7, 4, 2)
+  TEST (7, 7, 4, 2)
+  TEST (0, 0, 5, 2)
+  TEST (1, 0, 5, 2)
+  TEST (2, 0, 5, 2)
+  TEST (3, 0, 5, 2)
+  TEST (4, 0, 5, 2)
+  TEST (5, 0, 5, 2)
+  TEST (6, 0, 5, 2)
+  TEST (7, 0, 5, 2)
+  TEST (0, 1, 5, 2)
+  TEST (1, 1, 5, 2)
+  TEST (2, 1, 5, 2)
+  TEST (3, 1, 5, 2)
+  TEST (4, 1, 5, 2)
+  TEST (5, 1, 5, 2)
+  TEST (6, 1, 5, 2)
+  TEST (7, 1, 5, 2)
+  TEST (0, 2, 5, 2)
+  TEST (1, 2, 5, 2)
+  TEST (2, 2, 5, 2)
+  TEST (3, 2, 5, 2)
+  TEST (4, 2, 5, 2)
+  TEST (5, 2, 5, 2)
+  TEST (6, 2, 5, 2)
+  TEST (7, 2, 5, 2)
+  TEST (0, 3, 5, 2)
+  TEST (1, 3, 5, 2)
+  TEST (2, 3, 5, 2)
+  TEST (3, 3, 5, 2)
+  TEST (4, 3, 5, 2)
+  TEST (5, 3, 5, 2)
+  TEST (6, 3, 5, 2)
+  TEST (7, 3, 5, 2)
+  TEST (0, 4, 5, 2)
+  TEST (1, 4, 5, 2)
+  TEST (2, 4, 5, 2)
+  TEST (3, 4, 5, 2)
+  TEST (4, 4, 5, 2)
+  TEST (5, 4, 5, 2)
+  TEST (6, 4, 5, 2)
+  TEST (7, 4, 5, 2)
+  TEST (0, 5, 5, 2)
+  TEST (1, 5, 5, 2)
+  TEST (2, 5, 5, 2)
+  TEST (3, 5, 5, 2)
+  TEST (4, 5, 5, 2)
+  TEST (5, 5, 5, 2)
+  TEST (6, 5, 5, 2)
+  TEST (7, 5, 5, 2)
+  TEST (0, 6, 5, 2)
+  TEST (1, 6, 5, 2)
+  TEST (2, 6, 5, 2)
+  TEST (3, 6, 5, 2)
+  TEST (4, 6, 5, 2)
+  TEST (5, 6, 5, 2)
+  TEST (6, 6, 5, 2)
+  TEST (7, 6, 5, 2)
+  TEST (0, 7, 5, 2)
+  TEST (1, 7, 5, 2)
+  TEST (2, 7, 5, 2)
+  TEST (3, 7, 5, 2)
+  TEST (4, 7, 5, 2)
+  TEST (5, 7, 5, 2)
+  TEST (6, 7, 5, 2)
+  TEST (7, 7, 5, 2)
+}
+
+void check11(void)
+{
+  TEST (0, 0, 6, 2)
+  TEST (1, 0, 6, 2)
+  TEST (2, 0, 6, 2)
+  TEST (3, 0, 6, 2)
+  TEST (4, 0, 6, 2)
+  TEST (5, 0, 6, 2)
+  TEST (6, 0, 6, 2)
+  TEST (7, 0, 6, 2)
+  TEST (0, 1, 6, 2)
+  TEST (1, 1, 6, 2)
+  TEST (2, 1, 6, 2)
+  TEST (3, 1, 6, 2)
+  TEST (4, 1, 6, 2)
+  TEST (5, 1, 6, 2)
+  TEST (6, 1, 6, 2)
+  TEST (7, 1, 6, 2)
+  TEST (0, 2, 6, 2)
+  TEST (1, 2, 6, 2)
+  TEST (2, 2, 6, 2)
+  TEST (3, 2, 6, 2)
+  TEST (4, 2, 6, 2)
+  TEST (5, 2, 6, 2)
+  TEST (6, 2, 6, 2)
+  TEST (7, 2, 6, 2)
+  TEST (0, 3, 6, 2)
+  TEST (1, 3, 6, 2)
+  TEST (2, 3, 6, 2)
+  TEST (3, 3, 6, 2)
+  TEST (4, 3, 6, 2)
+  TEST (5, 3, 6, 2)
+  TEST (6, 3, 6, 2)
+  TEST (7, 3, 6, 2)
+  TEST (0, 4, 6, 2)
+  TEST (1, 4, 6, 2)
+  TEST (2, 4, 6, 2)
+  TEST (3, 4, 6, 2)
+  TEST (4, 4, 6, 2)
+  TEST (5, 4, 6, 2)
+  TEST (6, 4, 6, 2)
+  TEST (7, 4, 6, 2)
+  TEST (0, 5, 6, 2)
+  TEST (1, 5, 6, 2)
+  TEST (2, 5, 6, 2)
+  TEST (3, 5, 6, 2)
+  TEST (4, 5, 6, 2)
+  TEST (5, 5, 6, 2)
+  TEST (6, 5, 6, 2)
+  TEST (7, 5, 6, 2)
+  TEST (0, 6, 6, 2)
+  TEST (1, 6, 6, 2)
+  TEST (2, 6, 6, 2)
+  TEST (3, 6, 6, 2)
+  TEST (4, 6, 6, 2)
+  TEST (5, 6, 6, 2)
+  TEST (6, 6, 6, 2)
+  TEST (7, 6, 6, 2)
+  TEST (0, 7, 6, 2)
+  TEST (1, 7, 6, 2)
+  TEST (2, 7, 6, 2)
+  TEST (3, 7, 6, 2)
+  TEST (4, 7, 6, 2)
+  TEST (5, 7, 6, 2)
+  TEST (6, 7, 6, 2)
+  TEST (7, 7, 6, 2)
+  TEST (0, 0, 7, 2)
+  TEST (1, 0, 7, 2)
+  TEST (2, 0, 7, 2)
+  TEST (3, 0, 7, 2)
+  TEST (4, 0, 7, 2)
+  TEST (5, 0, 7, 2)
+  TEST (6, 0, 7, 2)
+  TEST (7, 0, 7, 2)
+  TEST (0, 1, 7, 2)
+  TEST (1, 1, 7, 2)
+  TEST (2, 1, 7, 2)
+  TEST (3, 1, 7, 2)
+  TEST (4, 1, 7, 2)
+  TEST (5, 1, 7, 2)
+  TEST (6, 1, 7, 2)
+  TEST (7, 1, 7, 2)
+  TEST (0, 2, 7, 2)
+  TEST (1, 2, 7, 2)
+  TEST (2, 2, 7, 2)
+  TEST (3, 2, 7, 2)
+  TEST (4, 2, 7, 2)
+  TEST (5, 2, 7, 2)
+  TEST (6, 2, 7, 2)
+  TEST (7, 2, 7, 2)
+  TEST (0, 3, 7, 2)
+  TEST (1, 3, 7, 2)
+  TEST (2, 3, 7, 2)
+  TEST (3, 3, 7, 2)
+  TEST (4, 3, 7, 2)
+  TEST (5, 3, 7, 2)
+  TEST (6, 3, 7, 2)
+  TEST (7, 3, 7, 2)
+  TEST (0, 4, 7, 2)
+  TEST (1, 4, 7, 2)
+  TEST (2, 4, 7, 2)
+  TEST (3, 4, 7, 2)
+  TEST (4, 4, 7, 2)
+  TEST (5, 4, 7, 2)
+  TEST (6, 4, 7, 2)
+  TEST (7, 4, 7, 2)
+  TEST (0, 5, 7, 2)
+  TEST (1, 5, 7, 2)
+  TEST (2, 5, 7, 2)
+  TEST (3, 5, 7, 2)
+  TEST (4, 5, 7, 2)
+  TEST (5, 5, 7, 2)
+  TEST (6, 5, 7, 2)
+  TEST (7, 5, 7, 2)
+  TEST (0, 6, 7, 2)
+  TEST (1, 6, 7, 2)
+  TEST (2, 6, 7, 2)
+  TEST (3, 6, 7, 2)
+  TEST (4, 6, 7, 2)
+  TEST (5, 6, 7, 2)
+  TEST (6, 6, 7, 2)
+  TEST (7, 6, 7, 2)
+  TEST (0, 7, 7, 2)
+  TEST (1, 7, 7, 2)
+  TEST (2, 7, 7, 2)
+  TEST (3, 7, 7, 2)
+  TEST (4, 7, 7, 2)
+  TEST (5, 7, 7, 2)
+  TEST (6, 7, 7, 2)
+  TEST (7, 7, 7, 2)
+}
+
+void check12(void)
+{
+  TEST (0, 0, 0, 3)
+  TEST (1, 0, 0, 3)
+  TEST (2, 0, 0, 3)
+  TEST (3, 0, 0, 3)
+  TEST (4, 0, 0, 3)
+  TEST (5, 0, 0, 3)
+  TEST (6, 0, 0, 3)
+  TEST (7, 0, 0, 3)
+  TEST (0, 1, 0, 3)
+  TEST (1, 1, 0, 3)
+  TEST (2, 1, 0, 3)
+  TEST (3, 1, 0, 3)
+  TEST (4, 1, 0, 3)
+  TEST (5, 1, 0, 3)
+  TEST (6, 1, 0, 3)
+  TEST (7, 1, 0, 3)
+  TEST (0, 2, 0, 3)
+  TEST (1, 2, 0, 3)
+  TEST (2, 2, 0, 3)
+  TEST (3, 2, 0, 3)
+  TEST (4, 2, 0, 3)
+  TEST (5, 2, 0, 3)
+  TEST (6, 2, 0, 3)
+  TEST (7, 2, 0, 3)
+  TEST (0, 3, 0, 3)
+  TEST (1, 3, 0, 3)
+  TEST (2, 3, 0, 3)
+  TEST (3, 3, 0, 3)
+  TEST (4, 3, 0, 3)
+  TEST (5, 3, 0, 3)
+  TEST (6, 3, 0, 3)
+  TEST (7, 3, 0, 3)
+  TEST (0, 4, 0, 3)
+  TEST (1, 4, 0, 3)
+  TEST (2, 4, 0, 3)
+  TEST (3, 4, 0, 3)
+  TEST (4, 4, 0, 3)
+  TEST (5, 4, 0, 3)
+  TEST (6, 4, 0, 3)
+  TEST (7, 4, 0, 3)
+  TEST (0, 5, 0, 3)
+  TEST (1, 5, 0, 3)
+  TEST (2, 5, 0, 3)
+  TEST (3, 5, 0, 3)
+  TEST (4, 5, 0, 3)
+  TEST (5, 5, 0, 3)
+  TEST (6, 5, 0, 3)
+  TEST (7, 5, 0, 3)
+  TEST (0, 6, 0, 3)
+  TEST (1, 6, 0, 3)
+  TEST (2, 6, 0, 3)
+  TEST (3, 6, 0, 3)
+  TEST (4, 6, 0, 3)
+  TEST (5, 6, 0, 3)
+  TEST (6, 6, 0, 3)
+  TEST (7, 6, 0, 3)
+  TEST (0, 7, 0, 3)
+  TEST (1, 7, 0, 3)
+  TEST (2, 7, 0, 3)
+  TEST (3, 7, 0, 3)
+  TEST (4, 7, 0, 3)
+  TEST (5, 7, 0, 3)
+  TEST (6, 7, 0, 3)
+  TEST (7, 7, 0, 3)
+  TEST (0, 0, 1, 3)
+  TEST (1, 0, 1, 3)
+  TEST (2, 0, 1, 3)
+  TEST (3, 0, 1, 3)
+  TEST (4, 0, 1, 3)
+  TEST (5, 0, 1, 3)
+  TEST (6, 0, 1, 3)
+  TEST (7, 0, 1, 3)
+  TEST (0, 1, 1, 3)
+  TEST (1, 1, 1, 3)
+  TEST (2, 1, 1, 3)
+  TEST (3, 1, 1, 3)
+  TEST (4, 1, 1, 3)
+  TEST (5, 1, 1, 3)
+  TEST (6, 1, 1, 3)
+  TEST (7, 1, 1, 3)
+  TEST (0, 2, 1, 3)
+  TEST (1, 2, 1, 3)
+  TEST (2, 2, 1, 3)
+  TEST (3, 2, 1, 3)
+  TEST (4, 2, 1, 3)
+  TEST (5, 2, 1, 3)
+  TEST (6, 2, 1, 3)
+  TEST (7, 2, 1, 3)
+  TEST (0, 3, 1, 3)
+  TEST (1, 3, 1, 3)
+  TEST (2, 3, 1, 3)
+  TEST (3, 3, 1, 3)
+  TEST (4, 3, 1, 3)
+  TEST (5, 3, 1, 3)
+  TEST (6, 3, 1, 3)
+  TEST (7, 3, 1, 3)
+  TEST (0, 4, 1, 3)
+  TEST (1, 4, 1, 3)
+  TEST (2, 4, 1, 3)
+  TEST (3, 4, 1, 3)
+  TEST (4, 4, 1, 3)
+  TEST (5, 4, 1, 3)
+  TEST (6, 4, 1, 3)
+  TEST (7, 4, 1, 3)
+  TEST (0, 5, 1, 3)
+  TEST (1, 5, 1, 3)
+  TEST (2, 5, 1, 3)
+  TEST (3, 5, 1, 3)
+  TEST (4, 5, 1, 3)
+  TEST (5, 5, 1, 3)
+  TEST (6, 5, 1, 3)
+  TEST (7, 5, 1, 3)
+  TEST (0, 6, 1, 3)
+  TEST (1, 6, 1, 3)
+  TEST (2, 6, 1, 3)
+  TEST (3, 6, 1, 3)
+  TEST (4, 6, 1, 3)
+  TEST (5, 6, 1, 3)
+  TEST (6, 6, 1, 3)
+  TEST (7, 6, 1, 3)
+  TEST (0, 7, 1, 3)
+  TEST (1, 7, 1, 3)
+  TEST (2, 7, 1, 3)
+  TEST (3, 7, 1, 3)
+  TEST (4, 7, 1, 3)
+  TEST (5, 7, 1, 3)
+  TEST (6, 7, 1, 3)
+  TEST (7, 7, 1, 3)
+}
+
+void check13(void)
+{
+  TEST (0, 0, 2, 3)
+  TEST (1, 0, 2, 3)
+  TEST (2, 0, 2, 3)
+  TEST (3, 0, 2, 3)
+  TEST (4, 0, 2, 3)
+  TEST (5, 0, 2, 3)
+  TEST (6, 0, 2, 3)
+  TEST (7, 0, 2, 3)
+  TEST (0, 1, 2, 3)
+  TEST (1, 1, 2, 3)
+  TEST (2, 1, 2, 3)
+  TEST (3, 1, 2, 3)
+  TEST (4, 1, 2, 3)
+  TEST (5, 1, 2, 3)
+  TEST (6, 1, 2, 3)
+  TEST (7, 1, 2, 3)
+  TEST (0, 2, 2, 3)
+  TEST (1, 2, 2, 3)
+  TEST (2, 2, 2, 3)
+  TEST (3, 2, 2, 3)
+  TEST (4, 2, 2, 3)
+  TEST (5, 2, 2, 3)
+  TEST (6, 2, 2, 3)
+  TEST (7, 2, 2, 3)
+  TEST (0, 3, 2, 3)
+  TEST (1, 3, 2, 3)
+  TEST (2, 3, 2, 3)
+  TEST (3, 3, 2, 3)
+  TEST (4, 3, 2, 3)
+  TEST (5, 3, 2, 3)
+  TEST (6, 3, 2, 3)
+  TEST (7, 3, 2, 3)
+  TEST (0, 4, 2, 3)
+  TEST (1, 4, 2, 3)
+  TEST (2, 4, 2, 3)
+  TEST (3, 4, 2, 3)
+  TEST (4, 4, 2, 3)
+  TEST (5, 4, 2, 3)
+  TEST (6, 4, 2, 3)
+  TEST (7, 4, 2, 3)
+  TEST (0, 5, 2, 3)
+  TEST (1, 5, 2, 3)
+  TEST (2, 5, 2, 3)
+  TEST (3, 5, 2, 3)
+  TEST (4, 5, 2, 3)
+  TEST (5, 5, 2, 3)
+  TEST (6, 5, 2, 3)
+  TEST (7, 5, 2, 3)
+  TEST (0, 6, 2, 3)
+  TEST (1, 6, 2, 3)
+  TEST (2, 6, 2, 3)
+  TEST (3, 6, 2, 3)
+  TEST (4, 6, 2, 3)
+  TEST (5, 6, 2, 3)
+  TEST (6, 6, 2, 3)
+  TEST (7, 6, 2, 3)
+  TEST (0, 7, 2, 3)
+  TEST (1, 7, 2, 3)
+  TEST (2, 7, 2, 3)
+  TEST (3, 7, 2, 3)
+  TEST (4, 7, 2, 3)
+  TEST (5, 7, 2, 3)
+  TEST (6, 7, 2, 3)
+  TEST (7, 7, 2, 3)
+  TEST (0, 0, 3, 3)
+  TEST (1, 0, 3, 3)
+  TEST (2, 0, 3, 3)
+  TEST (3, 0, 3, 3)
+  TEST (4, 0, 3, 3)
+  TEST (5, 0, 3, 3)
+  TEST (6, 0, 3, 3)
+  TEST (7, 0, 3, 3)
+  TEST (0, 1, 3, 3)
+  TEST (1, 1, 3, 3)
+  TEST (2, 1, 3, 3)
+  TEST (3, 1, 3, 3)
+  TEST (4, 1, 3, 3)
+  TEST (5, 1, 3, 3)
+  TEST (6, 1, 3, 3)
+  TEST (7, 1, 3, 3)
+  TEST (0, 2, 3, 3)
+  TEST (1, 2, 3, 3)
+  TEST (2, 2, 3, 3)
+  TEST (3, 2, 3, 3)
+  TEST (4, 2, 3, 3)
+  TEST (5, 2, 3, 3)
+  TEST (6, 2, 3, 3)
+  TEST (7, 2, 3, 3)
+  TEST (0, 3, 3, 3)
+  TEST (1, 3, 3, 3)
+  TEST (2, 3, 3, 3)
+  TEST (3, 3, 3, 3)
+  TEST (4, 3, 3, 3)
+  TEST (5, 3, 3, 3)
+  TEST (6, 3, 3, 3)
+  TEST (7, 3, 3, 3)
+  TEST (0, 4, 3, 3)
+  TEST (1, 4, 3, 3)
+  TEST (2, 4, 3, 3)
+  TEST (3, 4, 3, 3)
+  TEST (4, 4, 3, 3)
+  TEST (5, 4, 3, 3)
+  TEST (6, 4, 3, 3)
+  TEST (7, 4, 3, 3)
+  TEST (0, 5, 3, 3)
+  TEST (1, 5, 3, 3)
+  TEST (2, 5, 3, 3)
+  TEST (3, 5, 3, 3)
+  TEST (4, 5, 3, 3)
+  TEST (5, 5, 3, 3)
+  TEST (6, 5, 3, 3)
+  TEST (7, 5, 3, 3)
+  TEST (0, 6, 3, 3)
+  TEST (1, 6, 3, 3)
+  TEST (2, 6, 3, 3)
+  TEST (3, 6, 3, 3)
+  TEST (4, 6, 3, 3)
+  TEST (5, 6, 3, 3)
+  TEST (6, 6, 3, 3)
+  TEST (7, 6, 3, 3)
+  TEST (0, 7, 3, 3)
+  TEST (1, 7, 3, 3)
+  TEST (2, 7, 3, 3)
+  TEST (3, 7, 3, 3)
+  TEST (4, 7, 3, 3)
+  TEST (5, 7, 3, 3)
+  TEST (6, 7, 3, 3)
+  TEST (7, 7, 3, 3)
+}
+
+void check14(void)
+{
+  TEST (0, 0, 4, 3)
+  TEST (1, 0, 4, 3)
+  TEST (2, 0, 4, 3)
+  TEST (3, 0, 4, 3)
+  TEST (4, 0, 4, 3)
+  TEST (5, 0, 4, 3)
+  TEST (6, 0, 4, 3)
+  TEST (7, 0, 4, 3)
+  TEST (0, 1, 4, 3)
+  TEST (1, 1, 4, 3)
+  TEST (2, 1, 4, 3)
+  TEST (3, 1, 4, 3)
+  TEST (4, 1, 4, 3)
+  TEST (5, 1, 4, 3)
+  TEST (6, 1, 4, 3)
+  TEST (7, 1, 4, 3)
+  TEST (0, 2, 4, 3)
+  TEST (1, 2, 4, 3)
+  TEST (2, 2, 4, 3)
+  TEST (3, 2, 4, 3)
+  TEST (4, 2, 4, 3)
+  TEST (5, 2, 4, 3)
+  TEST (6, 2, 4, 3)
+  TEST (7, 2, 4, 3)
+  TEST (0, 3, 4, 3)
+  TEST (1, 3, 4, 3)
+  TEST (2, 3, 4, 3)
+  TEST (3, 3, 4, 3)
+  TEST (4, 3, 4, 3)
+  TEST (5, 3, 4, 3)
+  TEST (6, 3, 4, 3)
+  TEST (7, 3, 4, 3)
+  TEST (0, 4, 4, 3)
+  TEST (1, 4, 4, 3)
+  TEST (2, 4, 4, 3)
+  TEST (3, 4, 4, 3)
+  TEST (4, 4, 4, 3)
+  TEST (5, 4, 4, 3)
+  TEST (6, 4, 4, 3)
+  TEST (7, 4, 4, 3)
+  TEST (0, 5, 4, 3)
+  TEST (1, 5, 4, 3)
+  TEST (2, 5, 4, 3)
+  TEST (3, 5, 4, 3)
+  TEST (4, 5, 4, 3)
+  TEST (5, 5, 4, 3)
+  TEST (6, 5, 4, 3)
+  TEST (7, 5, 4, 3)
+  TEST (0, 6, 4, 3)
+  TEST (1, 6, 4, 3)
+  TEST (2, 6, 4, 3)
+  TEST (3, 6, 4, 3)
+  TEST (4, 6, 4, 3)
+  TEST (5, 6, 4, 3)
+  TEST (6, 6, 4, 3)
+  TEST (7, 6, 4, 3)
+  TEST (0, 7, 4, 3)
+  TEST (1, 7, 4, 3)
+  TEST (2, 7, 4, 3)
+  TEST (3, 7, 4, 3)
+  TEST (4, 7, 4, 3)
+  TEST (5, 7, 4, 3)
+  TEST (6, 7, 4, 3)
+  TEST (7, 7, 4, 3)
+  TEST (0, 0, 5, 3)
+  TEST (1, 0, 5, 3)
+  TEST (2, 0, 5, 3)
+  TEST (3, 0, 5, 3)
+  TEST (4, 0, 5, 3)
+  TEST (5, 0, 5, 3)
+  TEST (6, 0, 5, 3)
+  TEST (7, 0, 5, 3)
+  TEST (0, 1, 5, 3)
+  TEST (1, 1, 5, 3)
+  TEST (2, 1, 5, 3)
+  TEST (3, 1, 5, 3)
+  TEST (4, 1, 5, 3)
+  TEST (5, 1, 5, 3)
+  TEST (6, 1, 5, 3)
+  TEST (7, 1, 5, 3)
+  TEST (0, 2, 5, 3)
+  TEST (1, 2, 5, 3)
+  TEST (2, 2, 5, 3)
+  TEST (3, 2, 5, 3)
+  TEST (4, 2, 5, 3)
+  TEST (5, 2, 5, 3)
+  TEST (6, 2, 5, 3)
+  TEST (7, 2, 5, 3)
+  TEST (0, 3, 5, 3)
+  TEST (1, 3, 5, 3)
+  TEST (2, 3, 5, 3)
+  TEST (3, 3, 5, 3)
+  TEST (4, 3, 5, 3)
+  TEST (5, 3, 5, 3)
+  TEST (6, 3, 5, 3)
+  TEST (7, 3, 5, 3)
+  TEST (0, 4, 5, 3)
+  TEST (1, 4, 5, 3)
+  TEST (2, 4, 5, 3)
+  TEST (3, 4, 5, 3)
+  TEST (4, 4, 5, 3)
+  TEST (5, 4, 5, 3)
+  TEST (6, 4, 5, 3)
+  TEST (7, 4, 5, 3)
+  TEST (0, 5, 5, 3)
+  TEST (1, 5, 5, 3)
+  TEST (2, 5, 5, 3)
+  TEST (3, 5, 5, 3)
+  TEST (4, 5, 5, 3)
+  TEST (5, 5, 5, 3)
+  TEST (6, 5, 5, 3)
+  TEST (7, 5, 5, 3)
+  TEST (0, 6, 5, 3)
+  TEST (1, 6, 5, 3)
+  TEST (2, 6, 5, 3)
+  TEST (3, 6, 5, 3)
+  TEST (4, 6, 5, 3)
+  TEST (5, 6, 5, 3)
+  TEST (6, 6, 5, 3)
+  TEST (7, 6, 5, 3)
+  TEST (0, 7, 5, 3)
+  TEST (1, 7, 5, 3)
+  TEST (2, 7, 5, 3)
+  TEST (3, 7, 5, 3)
+  TEST (4, 7, 5, 3)
+  TEST (5, 7, 5, 3)
+  TEST (6, 7, 5, 3)
+  TEST (7, 7, 5, 3)
+}
+
+void check15(void)
+{
+  TEST (0, 0, 6, 3)
+  TEST (1, 0, 6, 3)
+  TEST (2, 0, 6, 3)
+  TEST (3, 0, 6, 3)
+  TEST (4, 0, 6, 3)
+  TEST (5, 0, 6, 3)
+  TEST (6, 0, 6, 3)
+  TEST (7, 0, 6, 3)
+  TEST (0, 1, 6, 3)
+  TEST (1, 1, 6, 3)
+  TEST (2, 1, 6, 3)
+  TEST (3, 1, 6, 3)
+  TEST (4, 1, 6, 3)
+  TEST (5, 1, 6, 3)
+  TEST (6, 1, 6, 3)
+  TEST (7, 1, 6, 3)
+  TEST (0, 2, 6, 3)
+  TEST (1, 2, 6, 3)
+  TEST (2, 2, 6, 3)
+  TEST (3, 2, 6, 3)
+  TEST (4, 2, 6, 3)
+  TEST (5, 2, 6, 3)
+  TEST (6, 2, 6, 3)
+  TEST (7, 2, 6, 3)
+  TEST (0, 3, 6, 3)
+  TEST (1, 3, 6, 3)
+  TEST (2, 3, 6, 3)
+  TEST (3, 3, 6, 3)
+  TEST (4, 3, 6, 3)
+  TEST (5, 3, 6, 3)
+  TEST (6, 3, 6, 3)
+  TEST (7, 3, 6, 3)
+  TEST (0, 4, 6, 3)
+  TEST (1, 4, 6, 3)
+  TEST (2, 4, 6, 3)
+  TEST (3, 4, 6, 3)
+  TEST (4, 4, 6, 3)
+  TEST (5, 4, 6, 3)
+  TEST (6, 4, 6, 3)
+  TEST (7, 4, 6, 3)
+  TEST (0, 5, 6, 3)
+  TEST (1, 5, 6, 3)
+  TEST (2, 5, 6, 3)
+  TEST (3, 5, 6, 3)
+  TEST (4, 5, 6, 3)
+  TEST (5, 5, 6, 3)
+  TEST (6, 5, 6, 3)
+  TEST (7, 5, 6, 3)
+  TEST (0, 6, 6, 3)
+  TEST (1, 6, 6, 3)
+  TEST (2, 6, 6, 3)
+  TEST (3, 6, 6, 3)
+  TEST (4, 6, 6, 3)
+  TEST (5, 6, 6, 3)
+  TEST (6, 6, 6, 3)
+  TEST (7, 6, 6, 3)
+  TEST (0, 7, 6, 3)
+  TEST (1, 7, 6, 3)
+  TEST (2, 7, 6, 3)
+  TEST (3, 7, 6, 3)
+  TEST (4, 7, 6, 3)
+  TEST (5, 7, 6, 3)
+  TEST (6, 7, 6, 3)
+  TEST (7, 7, 6, 3)
+  TEST (0, 0, 7, 3)
+  TEST (1, 0, 7, 3)
+  TEST (2, 0, 7, 3)
+  TEST (3, 0, 7, 3)
+  TEST (4, 0, 7, 3)
+  TEST (5, 0, 7, 3)
+  TEST (6, 0, 7, 3)
+  TEST (7, 0, 7, 3)
+  TEST (0, 1, 7, 3)
+  TEST (1, 1, 7, 3)
+  TEST (2, 1, 7, 3)
+  TEST (3, 1, 7, 3)
+  TEST (4, 1, 7, 3)
+  TEST (5, 1, 7, 3)
+  TEST (6, 1, 7, 3)
+  TEST (7, 1, 7, 3)
+  TEST (0, 2, 7, 3)
+  TEST (1, 2, 7, 3)
+  TEST (2, 2, 7, 3)
+  TEST (3, 2, 7, 3)
+  TEST (4, 2, 7, 3)
+  TEST (5, 2, 7, 3)
+  TEST (6, 2, 7, 3)
+  TEST (7, 2, 7, 3)
+  TEST (0, 3, 7, 3)
+  TEST (1, 3, 7, 3)
+  TEST (2, 3, 7, 3)
+  TEST (3, 3, 7, 3)
+  TEST (4, 3, 7, 3)
+  TEST (5, 3, 7, 3)
+  TEST (6, 3, 7, 3)
+  TEST (7, 3, 7, 3)
+  TEST (0, 4, 7, 3)
+  TEST (1, 4, 7, 3)
+  TEST (2, 4, 7, 3)
+  TEST (3, 4, 7, 3)
+  TEST (4, 4, 7, 3)
+  TEST (5, 4, 7, 3)
+  TEST (6, 4, 7, 3)
+  TEST (7, 4, 7, 3)
+  TEST (0, 5, 7, 3)
+  TEST (1, 5, 7, 3)
+  TEST (2, 5, 7, 3)
+  TEST (3, 5, 7, 3)
+  TEST (4, 5, 7, 3)
+  TEST (5, 5, 7, 3)
+  TEST (6, 5, 7, 3)
+  TEST (7, 5, 7, 3)
+  TEST (0, 6, 7, 3)
+  TEST (1, 6, 7, 3)
+  TEST (2, 6, 7, 3)
+  TEST (3, 6, 7, 3)
+  TEST (4, 6, 7, 3)
+  TEST (5, 6, 7, 3)
+  TEST (6, 6, 7, 3)
+  TEST (7, 6, 7, 3)
+  TEST (0, 7, 7, 3)
+  TEST (1, 7, 7, 3)
+  TEST (2, 7, 7, 3)
+  TEST (3, 7, 7, 3)
+  TEST (4, 7, 7, 3)
+  TEST (5, 7, 7, 3)
+  TEST (6, 7, 7, 3)
+  TEST (7, 7, 7, 3)
+}
+
+void check16(void)
+{
+  TEST (0, 0, 0, 4)
+  TEST (1, 0, 0, 4)
+  TEST (2, 0, 0, 4)
+  TEST (3, 0, 0, 4)
+  TEST (4, 0, 0, 4)
+  TEST (5, 0, 0, 4)
+  TEST (6, 0, 0, 4)
+  TEST (7, 0, 0, 4)
+  TEST (0, 1, 0, 4)
+  TEST (1, 1, 0, 4)
+  TEST (2, 1, 0, 4)
+  TEST (3, 1, 0, 4)
+  TEST (4, 1, 0, 4)
+  TEST (5, 1, 0, 4)
+  TEST (6, 1, 0, 4)
+  TEST (7, 1, 0, 4)
+  TEST (0, 2, 0, 4)
+  TEST (1, 2, 0, 4)
+  TEST (2, 2, 0, 4)
+  TEST (3, 2, 0, 4)
+  TEST (4, 2, 0, 4)
+  TEST (5, 2, 0, 4)
+  TEST (6, 2, 0, 4)
+  TEST (7, 2, 0, 4)
+  TEST (0, 3, 0, 4)
+  TEST (1, 3, 0, 4)
+  TEST (2, 3, 0, 4)
+  TEST (3, 3, 0, 4)
+  TEST (4, 3, 0, 4)
+  TEST (5, 3, 0, 4)
+  TEST (6, 3, 0, 4)
+  TEST (7, 3, 0, 4)
+  TEST (0, 4, 0, 4)
+  TEST (1, 4, 0, 4)
+  TEST (2, 4, 0, 4)
+  TEST (3, 4, 0, 4)
+  TEST (4, 4, 0, 4)
+  TEST (5, 4, 0, 4)
+  TEST (6, 4, 0, 4)
+  TEST (7, 4, 0, 4)
+  TEST (0, 5, 0, 4)
+  TEST (1, 5, 0, 4)
+  TEST (2, 5, 0, 4)
+  TEST (3, 5, 0, 4)
+  TEST (4, 5, 0, 4)
+  TEST (5, 5, 0, 4)
+  TEST (6, 5, 0, 4)
+  TEST (7, 5, 0, 4)
+  TEST (0, 6, 0, 4)
+  TEST (1, 6, 0, 4)
+  TEST (2, 6, 0, 4)
+  TEST (3, 6, 0, 4)
+  TEST (4, 6, 0, 4)
+  TEST (5, 6, 0, 4)
+  TEST (6, 6, 0, 4)
+  TEST (7, 6, 0, 4)
+  TEST (0, 7, 0, 4)
+  TEST (1, 7, 0, 4)
+  TEST (2, 7, 0, 4)
+  TEST (3, 7, 0, 4)
+  TEST (4, 7, 0, 4)
+  TEST (5, 7, 0, 4)
+  TEST (6, 7, 0, 4)
+  TEST (7, 7, 0, 4)
+  TEST (0, 0, 1, 4)
+  TEST (1, 0, 1, 4)
+  TEST (2, 0, 1, 4)
+  TEST (3, 0, 1, 4)
+  TEST (4, 0, 1, 4)
+  TEST (5, 0, 1, 4)
+  TEST (6, 0, 1, 4)
+  TEST (7, 0, 1, 4)
+  TEST (0, 1, 1, 4)
+  TEST (1, 1, 1, 4)
+  TEST (2, 1, 1, 4)
+  TEST (3, 1, 1, 4)
+  TEST (4, 1, 1, 4)
+  TEST (5, 1, 1, 4)
+  TEST (6, 1, 1, 4)
+  TEST (7, 1, 1, 4)
+  TEST (0, 2, 1, 4)
+  TEST (1, 2, 1, 4)
+  TEST (2, 2, 1, 4)
+  TEST (3, 2, 1, 4)
+  TEST (4, 2, 1, 4)
+  TEST (5, 2, 1, 4)
+  TEST (6, 2, 1, 4)
+  TEST (7, 2, 1, 4)
+  TEST (0, 3, 1, 4)
+  TEST (1, 3, 1, 4)
+  TEST (2, 3, 1, 4)
+  TEST (3, 3, 1, 4)
+  TEST (4, 3, 1, 4)
+  TEST (5, 3, 1, 4)
+  TEST (6, 3, 1, 4)
+  TEST (7, 3, 1, 4)
+  TEST (0, 4, 1, 4)
+  TEST (1, 4, 1, 4)
+  TEST (2, 4, 1, 4)
+  TEST (3, 4, 1, 4)
+  TEST (4, 4, 1, 4)
+  TEST (5, 4, 1, 4)
+  TEST (6, 4, 1, 4)
+  TEST (7, 4, 1, 4)
+  TEST (0, 5, 1, 4)
+  TEST (1, 5, 1, 4)
+  TEST (2, 5, 1, 4)
+  TEST (3, 5, 1, 4)
+  TEST (4, 5, 1, 4)
+  TEST (5, 5, 1, 4)
+  TEST (6, 5, 1, 4)
+  TEST (7, 5, 1, 4)
+  TEST (0, 6, 1, 4)
+  TEST (1, 6, 1, 4)
+  TEST (2, 6, 1, 4)
+  TEST (3, 6, 1, 4)
+  TEST (4, 6, 1, 4)
+  TEST (5, 6, 1, 4)
+  TEST (6, 6, 1, 4)
+  TEST (7, 6, 1, 4)
+  TEST (0, 7, 1, 4)
+  TEST (1, 7, 1, 4)
+  TEST (2, 7, 1, 4)
+  TEST (3, 7, 1, 4)
+  TEST (4, 7, 1, 4)
+  TEST (5, 7, 1, 4)
+  TEST (6, 7, 1, 4)
+  TEST (7, 7, 1, 4)
+}
+
+void check17(void)
+{
+  TEST (0, 0, 2, 4)
+  TEST (1, 0, 2, 4)
+  TEST (2, 0, 2, 4)
+  TEST (3, 0, 2, 4)
+  TEST (4, 0, 2, 4)
+  TEST (5, 0, 2, 4)
+  TEST (6, 0, 2, 4)
+  TEST (7, 0, 2, 4)
+  TEST (0, 1, 2, 4)
+  TEST (1, 1, 2, 4)
+  TEST (2, 1, 2, 4)
+  TEST (3, 1, 2, 4)
+  TEST (4, 1, 2, 4)
+  TEST (5, 1, 2, 4)
+  TEST (6, 1, 2, 4)
+  TEST (7, 1, 2, 4)
+  TEST (0, 2, 2, 4)
+  TEST (1, 2, 2, 4)
+  TEST (2, 2, 2, 4)
+  TEST (3, 2, 2, 4)
+  TEST (4, 2, 2, 4)
+  TEST (5, 2, 2, 4)
+  TEST (6, 2, 2, 4)
+  TEST (7, 2, 2, 4)
+  TEST (0, 3, 2, 4)
+  TEST (1, 3, 2, 4)
+  TEST (2, 3, 2, 4)
+  TEST (3, 3, 2, 4)
+  TEST (4, 3, 2, 4)
+  TEST (5, 3, 2, 4)
+  TEST (6, 3, 2, 4)
+  TEST (7, 3, 2, 4)
+  TEST (0, 4, 2, 4)
+  TEST (1, 4, 2, 4)
+  TEST (2, 4, 2, 4)
+  TEST (3, 4, 2, 4)
+  TEST (4, 4, 2, 4)
+  TEST (5, 4, 2, 4)
+  TEST (6, 4, 2, 4)
+  TEST (7, 4, 2, 4)
+  TEST (0, 5, 2, 4)
+  TEST (1, 5, 2, 4)
+  TEST (2, 5, 2, 4)
+  TEST (3, 5, 2, 4)
+  TEST (4, 5, 2, 4)
+  TEST (5, 5, 2, 4)
+  TEST (6, 5, 2, 4)
+  TEST (7, 5, 2, 4)
+  TEST (0, 6, 2, 4)
+  TEST (1, 6, 2, 4)
+  TEST (2, 6, 2, 4)
+  TEST (3, 6, 2, 4)
+  TEST (4, 6, 2, 4)
+  TEST (5, 6, 2, 4)
+  TEST (6, 6, 2, 4)
+  TEST (7, 6, 2, 4)
+  TEST (0, 7, 2, 4)
+  TEST (1, 7, 2, 4)
+  TEST (2, 7, 2, 4)
+  TEST (3, 7, 2, 4)
+  TEST (4, 7, 2, 4)
+  TEST (5, 7, 2, 4)
+  TEST (6, 7, 2, 4)
+  TEST (7, 7, 2, 4)
+  TEST (0, 0, 3, 4)
+  TEST (1, 0, 3, 4)
+  TEST (2, 0, 3, 4)
+  TEST (3, 0, 3, 4)
+  TEST (4, 0, 3, 4)
+  TEST (5, 0, 3, 4)
+  TEST (6, 0, 3, 4)
+  TEST (7, 0, 3, 4)
+  TEST (0, 1, 3, 4)
+  TEST (1, 1, 3, 4)
+  TEST (2, 1, 3, 4)
+  TEST (3, 1, 3, 4)
+  TEST (4, 1, 3, 4)
+  TEST (5, 1, 3, 4)
+  TEST (6, 1, 3, 4)
+  TEST (7, 1, 3, 4)
+  TEST (0, 2, 3, 4)
+  TEST (1, 2, 3, 4)
+  TEST (2, 2, 3, 4)
+  TEST (3, 2, 3, 4)
+  TEST (4, 2, 3, 4)
+  TEST (5, 2, 3, 4)
+  TEST (6, 2, 3, 4)
+  TEST (7, 2, 3, 4)
+  TEST (0, 3, 3, 4)
+  TEST (1, 3, 3, 4)
+  TEST (2, 3, 3, 4)
+  TEST (3, 3, 3, 4)
+  TEST (4, 3, 3, 4)
+  TEST (5, 3, 3, 4)
+  TEST (6, 3, 3, 4)
+  TEST (7, 3, 3, 4)
+  TEST (0, 4, 3, 4)
+  TEST (1, 4, 3, 4)
+  TEST (2, 4, 3, 4)
+  TEST (3, 4, 3, 4)
+  TEST (4, 4, 3, 4)
+  TEST (5, 4, 3, 4)
+  TEST (6, 4, 3, 4)
+  TEST (7, 4, 3, 4)
+  TEST (0, 5, 3, 4)
+  TEST (1, 5, 3, 4)
+  TEST (2, 5, 3, 4)
+  TEST (3, 5, 3, 4)
+  TEST (4, 5, 3, 4)
+  TEST (5, 5, 3, 4)
+  TEST (6, 5, 3, 4)
+  TEST (7, 5, 3, 4)
+  TEST (0, 6, 3, 4)
+  TEST (1, 6, 3, 4)
+  TEST (2, 6, 3, 4)
+  TEST (3, 6, 3, 4)
+  TEST (4, 6, 3, 4)
+  TEST (5, 6, 3, 4)
+  TEST (6, 6, 3, 4)
+  TEST (7, 6, 3, 4)
+  TEST (0, 7, 3, 4)
+  TEST (1, 7, 3, 4)
+  TEST (2, 7, 3, 4)
+  TEST (3, 7, 3, 4)
+  TEST (4, 7, 3, 4)
+  TEST (5, 7, 3, 4)
+  TEST (6, 7, 3, 4)
+  TEST (7, 7, 3, 4)
+}
+
+void check18(void)
+{
+  TEST (0, 0, 4, 4)
+  TEST (1, 0, 4, 4)
+  TEST (2, 0, 4, 4)
+  TEST (3, 0, 4, 4)
+  TEST (4, 0, 4, 4)
+  TEST (5, 0, 4, 4)
+  TEST (6, 0, 4, 4)
+  TEST (7, 0, 4, 4)
+  TEST (0, 1, 4, 4)
+  TEST (1, 1, 4, 4)
+  TEST (2, 1, 4, 4)
+  TEST (3, 1, 4, 4)
+  TEST (4, 1, 4, 4)
+  TEST (5, 1, 4, 4)
+  TEST (6, 1, 4, 4)
+  TEST (7, 1, 4, 4)
+  TEST (0, 2, 4, 4)
+  TEST (1, 2, 4, 4)
+  TEST (2, 2, 4, 4)
+  TEST (3, 2, 4, 4)
+  TEST (4, 2, 4, 4)
+  TEST (5, 2, 4, 4)
+  TEST (6, 2, 4, 4)
+  TEST (7, 2, 4, 4)
+  TEST (0, 3, 4, 4)
+  TEST (1, 3, 4, 4)
+  TEST (2, 3, 4, 4)
+  TEST (3, 3, 4, 4)
+  TEST (4, 3, 4, 4)
+  TEST (5, 3, 4, 4)
+  TEST (6, 3, 4, 4)
+  TEST (7, 3, 4, 4)
+  TEST (0, 4, 4, 4)
+  TEST (1, 4, 4, 4)
+  TEST (2, 4, 4, 4)
+  TEST (3, 4, 4, 4)
+  TEST (4, 4, 4, 4)
+  TEST (5, 4, 4, 4)
+  TEST (6, 4, 4, 4)
+  TEST (7, 4, 4, 4)
+  TEST (0, 5, 4, 4)
+  TEST (1, 5, 4, 4)
+  TEST (2, 5, 4, 4)
+  TEST (3, 5, 4, 4)
+  TEST (4, 5, 4, 4)
+  TEST (5, 5, 4, 4)
+  TEST (6, 5, 4, 4)
+  TEST (7, 5, 4, 4)
+  TEST (0, 6, 4, 4)
+  TEST (1, 6, 4, 4)
+  TEST (2, 6, 4, 4)
+  TEST (3, 6, 4, 4)
+  TEST (4, 6, 4, 4)
+  TEST (5, 6, 4, 4)
+  TEST (6, 6, 4, 4)
+  TEST (7, 6, 4, 4)
+  TEST (0, 7, 4, 4)
+  TEST (1, 7, 4, 4)
+  TEST (2, 7, 4, 4)
+  TEST (3, 7, 4, 4)
+  TEST (4, 7, 4, 4)
+  TEST (5, 7, 4, 4)
+  TEST (6, 7, 4, 4)
+  TEST (7, 7, 4, 4)
+  TEST (0, 0, 5, 4)
+  TEST (1, 0, 5, 4)
+  TEST (2, 0, 5, 4)
+  TEST (3, 0, 5, 4)
+  TEST (4, 0, 5, 4)
+  TEST (5, 0, 5, 4)
+  TEST (6, 0, 5, 4)
+  TEST (7, 0, 5, 4)
+  TEST (0, 1, 5, 4)
+  TEST (1, 1, 5, 4)
+  TEST (2, 1, 5, 4)
+  TEST (3, 1, 5, 4)
+  TEST (4, 1, 5, 4)
+  TEST (5, 1, 5, 4)
+  TEST (6, 1, 5, 4)
+  TEST (7, 1, 5, 4)
+  TEST (0, 2, 5, 4)
+  TEST (1, 2, 5, 4)
+  TEST (2, 2, 5, 4)
+  TEST (3, 2, 5, 4)
+  TEST (4, 2, 5, 4)
+  TEST (5, 2, 5, 4)
+  TEST (6, 2, 5, 4)
+  TEST (7, 2, 5, 4)
+  TEST (0, 3, 5, 4)
+  TEST (1, 3, 5, 4)
+  TEST (2, 3, 5, 4)
+  TEST (3, 3, 5, 4)
+  TEST (4, 3, 5, 4)
+  TEST (5, 3, 5, 4)
+  TEST (6, 3, 5, 4)
+  TEST (7, 3, 5, 4)
+  TEST (0, 4, 5, 4)
+  TEST (1, 4, 5, 4)
+  TEST (2, 4, 5, 4)
+  TEST (3, 4, 5, 4)
+  TEST (4, 4, 5, 4)
+  TEST (5, 4, 5, 4)
+  TEST (6, 4, 5, 4)
+  TEST (7, 4, 5, 4)
+  TEST (0, 5, 5, 4)
+  TEST (1, 5, 5, 4)
+  TEST (2, 5, 5, 4)
+  TEST (3, 5, 5, 4)
+  TEST (4, 5, 5, 4)
+  TEST (5, 5, 5, 4)
+  TEST (6, 5, 5, 4)
+  TEST (7, 5, 5, 4)
+  TEST (0, 6, 5, 4)
+  TEST (1, 6, 5, 4)
+  TEST (2, 6, 5, 4)
+  TEST (3, 6, 5, 4)
+  TEST (4, 6, 5, 4)
+  TEST (5, 6, 5, 4)
+  TEST (6, 6, 5, 4)
+  TEST (7, 6, 5, 4)
+  TEST (0, 7, 5, 4)
+  TEST (1, 7, 5, 4)
+  TEST (2, 7, 5, 4)
+  TEST (3, 7, 5, 4)
+  TEST (4, 7, 5, 4)
+  TEST (5, 7, 5, 4)
+  TEST (6, 7, 5, 4)
+  TEST (7, 7, 5, 4)
+}
+
+void check19(void)
+{
+  TEST (0, 0, 6, 4)
+  TEST (1, 0, 6, 4)
+  TEST (2, 0, 6, 4)
+  TEST (3, 0, 6, 4)
+  TEST (4, 0, 6, 4)
+  TEST (5, 0, 6, 4)
+  TEST (6, 0, 6, 4)
+  TEST (7, 0, 6, 4)
+  TEST (0, 1, 6, 4)
+  TEST (1, 1, 6, 4)
+  TEST (2, 1, 6, 4)
+  TEST (3, 1, 6, 4)
+  TEST (4, 1, 6, 4)
+  TEST (5, 1, 6, 4)
+  TEST (6, 1, 6, 4)
+  TEST (7, 1, 6, 4)
+  TEST (0, 2, 6, 4)
+  TEST (1, 2, 6, 4)
+  TEST (2, 2, 6, 4)
+  TEST (3, 2, 6, 4)
+  TEST (4, 2, 6, 4)
+  TEST (5, 2, 6, 4)
+  TEST (6, 2, 6, 4)
+  TEST (7, 2, 6, 4)
+  TEST (0, 3, 6, 4)
+  TEST (1, 3, 6, 4)
+  TEST (2, 3, 6, 4)
+  TEST (3, 3, 6, 4)
+  TEST (4, 3, 6, 4)
+  TEST (5, 3, 6, 4)
+  TEST (6, 3, 6, 4)
+  TEST (7, 3, 6, 4)
+  TEST (0, 4, 6, 4)
+  TEST (1, 4, 6, 4)
+  TEST (2, 4, 6, 4)
+  TEST (3, 4, 6, 4)
+  TEST (4, 4, 6, 4)
+  TEST (5, 4, 6, 4)
+  TEST (6, 4, 6, 4)
+  TEST (7, 4, 6, 4)
+  TEST (0, 5, 6, 4)
+  TEST (1, 5, 6, 4)
+  TEST (2, 5, 6, 4)
+  TEST (3, 5, 6, 4)
+  TEST (4, 5, 6, 4)
+  TEST (5, 5, 6, 4)
+  TEST (6, 5, 6, 4)
+  TEST (7, 5, 6, 4)
+  TEST (0, 6, 6, 4)
+  TEST (1, 6, 6, 4)
+  TEST (2, 6, 6, 4)
+  TEST (3, 6, 6, 4)
+  TEST (4, 6, 6, 4)
+  TEST (5, 6, 6, 4)
+  TEST (6, 6, 6, 4)
+  TEST (7, 6, 6, 4)
+  TEST (0, 7, 6, 4)
+  TEST (1, 7, 6, 4)
+  TEST (2, 7, 6, 4)
+  TEST (3, 7, 6, 4)
+  TEST (4, 7, 6, 4)
+  TEST (5, 7, 6, 4)
+  TEST (6, 7, 6, 4)
+  TEST (7, 7, 6, 4)
+  TEST (0, 0, 7, 4)
+  TEST (1, 0, 7, 4)
+  TEST (2, 0, 7, 4)
+  TEST (3, 0, 7, 4)
+  TEST (4, 0, 7, 4)
+  TEST (5, 0, 7, 4)
+  TEST (6, 0, 7, 4)
+  TEST (7, 0, 7, 4)
+  TEST (0, 1, 7, 4)
+  TEST (1, 1, 7, 4)
+  TEST (2, 1, 7, 4)
+  TEST (3, 1, 7, 4)
+  TEST (4, 1, 7, 4)
+  TEST (5, 1, 7, 4)
+  TEST (6, 1, 7, 4)
+  TEST (7, 1, 7, 4)
+  TEST (0, 2, 7, 4)
+  TEST (1, 2, 7, 4)
+  TEST (2, 2, 7, 4)
+  TEST (3, 2, 7, 4)
+  TEST (4, 2, 7, 4)
+  TEST (5, 2, 7, 4)
+  TEST (6, 2, 7, 4)
+  TEST (7, 2, 7, 4)
+  TEST (0, 3, 7, 4)
+  TEST (1, 3, 7, 4)
+  TEST (2, 3, 7, 4)
+  TEST (3, 3, 7, 4)
+  TEST (4, 3, 7, 4)
+  TEST (5, 3, 7, 4)
+  TEST (6, 3, 7, 4)
+  TEST (7, 3, 7, 4)
+  TEST (0, 4, 7, 4)
+  TEST (1, 4, 7, 4)
+  TEST (2, 4, 7, 4)
+  TEST (3, 4, 7, 4)
+  TEST (4, 4, 7, 4)
+  TEST (5, 4, 7, 4)
+  TEST (6, 4, 7, 4)
+  TEST (7, 4, 7, 4)
+  TEST (0, 5, 7, 4)
+  TEST (1, 5, 7, 4)
+  TEST (2, 5, 7, 4)
+  TEST (3, 5, 7, 4)
+  TEST (4, 5, 7, 4)
+  TEST (5, 5, 7, 4)
+  TEST (6, 5, 7, 4)
+  TEST (7, 5, 7, 4)
+  TEST (0, 6, 7, 4)
+  TEST (1, 6, 7, 4)
+  TEST (2, 6, 7, 4)
+  TEST (3, 6, 7, 4)
+  TEST (4, 6, 7, 4)
+  TEST (5, 6, 7, 4)
+  TEST (6, 6, 7, 4)
+  TEST (7, 6, 7, 4)
+  TEST (0, 7, 7, 4)
+  TEST (1, 7, 7, 4)
+  TEST (2, 7, 7, 4)
+  TEST (3, 7, 7, 4)
+  TEST (4, 7, 7, 4)
+  TEST (5, 7, 7, 4)
+  TEST (6, 7, 7, 4)
+  TEST (7, 7, 7, 4)
+}
+
+void check20(void)
+{
+  TEST (0, 0, 0, 5)
+  TEST (1, 0, 0, 5)
+  TEST (2, 0, 0, 5)
+  TEST (3, 0, 0, 5)
+  TEST (4, 0, 0, 5)
+  TEST (5, 0, 0, 5)
+  TEST (6, 0, 0, 5)
+  TEST (7, 0, 0, 5)
+  TEST (0, 1, 0, 5)
+  TEST (1, 1, 0, 5)
+  TEST (2, 1, 0, 5)
+  TEST (3, 1, 0, 5)
+  TEST (4, 1, 0, 5)
+  TEST (5, 1, 0, 5)
+  TEST (6, 1, 0, 5)
+  TEST (7, 1, 0, 5)
+  TEST (0, 2, 0, 5)
+  TEST (1, 2, 0, 5)
+  TEST (2, 2, 0, 5)
+  TEST (3, 2, 0, 5)
+  TEST (4, 2, 0, 5)
+  TEST (5, 2, 0, 5)
+  TEST (6, 2, 0, 5)
+  TEST (7, 2, 0, 5)
+  TEST (0, 3, 0, 5)
+  TEST (1, 3, 0, 5)
+  TEST (2, 3, 0, 5)
+  TEST (3, 3, 0, 5)
+  TEST (4, 3, 0, 5)
+  TEST (5, 3, 0, 5)
+  TEST (6, 3, 0, 5)
+  TEST (7, 3, 0, 5)
+  TEST (0, 4, 0, 5)
+  TEST (1, 4, 0, 5)
+  TEST (2, 4, 0, 5)
+  TEST (3, 4, 0, 5)
+  TEST (4, 4, 0, 5)
+  TEST (5, 4, 0, 5)
+  TEST (6, 4, 0, 5)
+  TEST (7, 4, 0, 5)
+  TEST (0, 5, 0, 5)
+  TEST (1, 5, 0, 5)
+  TEST (2, 5, 0, 5)
+  TEST (3, 5, 0, 5)
+  TEST (4, 5, 0, 5)
+  TEST (5, 5, 0, 5)
+  TEST (6, 5, 0, 5)
+  TEST (7, 5, 0, 5)
+  TEST (0, 6, 0, 5)
+  TEST (1, 6, 0, 5)
+  TEST (2, 6, 0, 5)
+  TEST (3, 6, 0, 5)
+  TEST (4, 6, 0, 5)
+  TEST (5, 6, 0, 5)
+  TEST (6, 6, 0, 5)
+  TEST (7, 6, 0, 5)
+  TEST (0, 7, 0, 5)
+  TEST (1, 7, 0, 5)
+  TEST (2, 7, 0, 5)
+  TEST (3, 7, 0, 5)
+  TEST (4, 7, 0, 5)
+  TEST (5, 7, 0, 5)
+  TEST (6, 7, 0, 5)
+  TEST (7, 7, 0, 5)
+  TEST (0, 0, 1, 5)
+  TEST (1, 0, 1, 5)
+  TEST (2, 0, 1, 5)
+  TEST (3, 0, 1, 5)
+  TEST (4, 0, 1, 5)
+  TEST (5, 0, 1, 5)
+  TEST (6, 0, 1, 5)
+  TEST (7, 0, 1, 5)
+  TEST (0, 1, 1, 5)
+  TEST (1, 1, 1, 5)
+  TEST (2, 1, 1, 5)
+  TEST (3, 1, 1, 5)
+  TEST (4, 1, 1, 5)
+  TEST (5, 1, 1, 5)
+  TEST (6, 1, 1, 5)
+  TEST (7, 1, 1, 5)
+  TEST (0, 2, 1, 5)
+  TEST (1, 2, 1, 5)
+  TEST (2, 2, 1, 5)
+  TEST (3, 2, 1, 5)
+  TEST (4, 2, 1, 5)
+  TEST (5, 2, 1, 5)
+  TEST (6, 2, 1, 5)
+  TEST (7, 2, 1, 5)
+  TEST (0, 3, 1, 5)
+  TEST (1, 3, 1, 5)
+  TEST (2, 3, 1, 5)
+  TEST (3, 3, 1, 5)
+  TEST (4, 3, 1, 5)
+  TEST (5, 3, 1, 5)
+  TEST (6, 3, 1, 5)
+  TEST (7, 3, 1, 5)
+  TEST (0, 4, 1, 5)
+  TEST (1, 4, 1, 5)
+  TEST (2, 4, 1, 5)
+  TEST (3, 4, 1, 5)
+  TEST (4, 4, 1, 5)
+  TEST (5, 4, 1, 5)
+  TEST (6, 4, 1, 5)
+  TEST (7, 4, 1, 5)
+  TEST (0, 5, 1, 5)
+  TEST (1, 5, 1, 5)
+  TEST (2, 5, 1, 5)
+  TEST (3, 5, 1, 5)
+  TEST (4, 5, 1, 5)
+  TEST (5, 5, 1, 5)
+  TEST (6, 5, 1, 5)
+  TEST (7, 5, 1, 5)
+  TEST (0, 6, 1, 5)
+  TEST (1, 6, 1, 5)
+  TEST (2, 6, 1, 5)
+  TEST (3, 6, 1, 5)
+  TEST (4, 6, 1, 5)
+  TEST (5, 6, 1, 5)
+  TEST (6, 6, 1, 5)
+  TEST (7, 6, 1, 5)
+  TEST (0, 7, 1, 5)
+  TEST (1, 7, 1, 5)
+  TEST (2, 7, 1, 5)
+  TEST (3, 7, 1, 5)
+  TEST (4, 7, 1, 5)
+  TEST (5, 7, 1, 5)
+  TEST (6, 7, 1, 5)
+  TEST (7, 7, 1, 5)
+}
+
+void check21(void)
+{
+  TEST (0, 0, 2, 5)
+  TEST (1, 0, 2, 5)
+  TEST (2, 0, 2, 5)
+  TEST (3, 0, 2, 5)
+  TEST (4, 0, 2, 5)
+  TEST (5, 0, 2, 5)
+  TEST (6, 0, 2, 5)
+  TEST (7, 0, 2, 5)
+  TEST (0, 1, 2, 5)
+  TEST (1, 1, 2, 5)
+  TEST (2, 1, 2, 5)
+  TEST (3, 1, 2, 5)
+  TEST (4, 1, 2, 5)
+  TEST (5, 1, 2, 5)
+  TEST (6, 1, 2, 5)
+  TEST (7, 1, 2, 5)
+  TEST (0, 2, 2, 5)
+  TEST (1, 2, 2, 5)
+  TEST (2, 2, 2, 5)
+  TEST (3, 2, 2, 5)
+  TEST (4, 2, 2, 5)
+  TEST (5, 2, 2, 5)
+  TEST (6, 2, 2, 5)
+  TEST (7, 2, 2, 5)
+  TEST (0, 3, 2, 5)
+  TEST (1, 3, 2, 5)
+  TEST (2, 3, 2, 5)
+  TEST (3, 3, 2, 5)
+  TEST (4, 3, 2, 5)
+  TEST (5, 3, 2, 5)
+  TEST (6, 3, 2, 5)
+  TEST (7, 3, 2, 5)
+  TEST (0, 4, 2, 5)
+  TEST (1, 4, 2, 5)
+  TEST (2, 4, 2, 5)
+  TEST (3, 4, 2, 5)
+  TEST (4, 4, 2, 5)
+  TEST (5, 4, 2, 5)
+  TEST (6, 4, 2, 5)
+  TEST (7, 4, 2, 5)
+  TEST (0, 5, 2, 5)
+  TEST (1, 5, 2, 5)
+  TEST (2, 5, 2, 5)
+  TEST (3, 5, 2, 5)
+  TEST (4, 5, 2, 5)
+  TEST (5, 5, 2, 5)
+  TEST (6, 5, 2, 5)
+  TEST (7, 5, 2, 5)
+  TEST (0, 6, 2, 5)
+  TEST (1, 6, 2, 5)
+  TEST (2, 6, 2, 5)
+  TEST (3, 6, 2, 5)
+  TEST (4, 6, 2, 5)
+  TEST (5, 6, 2, 5)
+  TEST (6, 6, 2, 5)
+  TEST (7, 6, 2, 5)
+  TEST (0, 7, 2, 5)
+  TEST (1, 7, 2, 5)
+  TEST (2, 7, 2, 5)
+  TEST (3, 7, 2, 5)
+  TEST (4, 7, 2, 5)
+  TEST (5, 7, 2, 5)
+  TEST (6, 7, 2, 5)
+  TEST (7, 7, 2, 5)
+  TEST (0, 0, 3, 5)
+  TEST (1, 0, 3, 5)
+  TEST (2, 0, 3, 5)
+  TEST (3, 0, 3, 5)
+  TEST (4, 0, 3, 5)
+  TEST (5, 0, 3, 5)
+  TEST (6, 0, 3, 5)
+  TEST (7, 0, 3, 5)
+  TEST (0, 1, 3, 5)
+  TEST (1, 1, 3, 5)
+  TEST (2, 1, 3, 5)
+  TEST (3, 1, 3, 5)
+  TEST (4, 1, 3, 5)
+  TEST (5, 1, 3, 5)
+  TEST (6, 1, 3, 5)
+  TEST (7, 1, 3, 5)
+  TEST (0, 2, 3, 5)
+  TEST (1, 2, 3, 5)
+  TEST (2, 2, 3, 5)
+  TEST (3, 2, 3, 5)
+  TEST (4, 2, 3, 5)
+  TEST (5, 2, 3, 5)
+  TEST (6, 2, 3, 5)
+  TEST (7, 2, 3, 5)
+  TEST (0, 3, 3, 5)
+  TEST (1, 3, 3, 5)
+  TEST (2, 3, 3, 5)
+  TEST (3, 3, 3, 5)
+  TEST (4, 3, 3, 5)
+  TEST (5, 3, 3, 5)
+  TEST (6, 3, 3, 5)
+  TEST (7, 3, 3, 5)
+  TEST (0, 4, 3, 5)
+  TEST (1, 4, 3, 5)
+  TEST (2, 4, 3, 5)
+  TEST (3, 4, 3, 5)
+  TEST (4, 4, 3, 5)
+  TEST (5, 4, 3, 5)
+  TEST (6, 4, 3, 5)
+  TEST (7, 4, 3, 5)
+  TEST (0, 5, 3, 5)
+  TEST (1, 5, 3, 5)
+  TEST (2, 5, 3, 5)
+  TEST (3, 5, 3, 5)
+  TEST (4, 5, 3, 5)
+  TEST (5, 5, 3, 5)
+  TEST (6, 5, 3, 5)
+  TEST (7, 5, 3, 5)
+  TEST (0, 6, 3, 5)
+  TEST (1, 6, 3, 5)
+  TEST (2, 6, 3, 5)
+  TEST (3, 6, 3, 5)
+  TEST (4, 6, 3, 5)
+  TEST (5, 6, 3, 5)
+  TEST (6, 6, 3, 5)
+  TEST (7, 6, 3, 5)
+  TEST (0, 7, 3, 5)
+  TEST (1, 7, 3, 5)
+  TEST (2, 7, 3, 5)
+  TEST (3, 7, 3, 5)
+  TEST (4, 7, 3, 5)
+  TEST (5, 7, 3, 5)
+  TEST (6, 7, 3, 5)
+  TEST (7, 7, 3, 5)
+}
+
+void check22(void)
+{
+  TEST (0, 0, 4, 5)
+  TEST (1, 0, 4, 5)
+  TEST (2, 0, 4, 5)
+  TEST (3, 0, 4, 5)
+  TEST (4, 0, 4, 5)
+  TEST (5, 0, 4, 5)
+  TEST (6, 0, 4, 5)
+  TEST (7, 0, 4, 5)
+  TEST (0, 1, 4, 5)
+  TEST (1, 1, 4, 5)
+  TEST (2, 1, 4, 5)
+  TEST (3, 1, 4, 5)
+  TEST (4, 1, 4, 5)
+  TEST (5, 1, 4, 5)
+  TEST (6, 1, 4, 5)
+  TEST (7, 1, 4, 5)
+  TEST (0, 2, 4, 5)
+  TEST (1, 2, 4, 5)
+  TEST (2, 2, 4, 5)
+  TEST (3, 2, 4, 5)
+  TEST (4, 2, 4, 5)
+  TEST (5, 2, 4, 5)
+  TEST (6, 2, 4, 5)
+  TEST (7, 2, 4, 5)
+  TEST (0, 3, 4, 5)
+  TEST (1, 3, 4, 5)
+  TEST (2, 3, 4, 5)
+  TEST (3, 3, 4, 5)
+  TEST (4, 3, 4, 5)
+  TEST (5, 3, 4, 5)
+  TEST (6, 3, 4, 5)
+  TEST (7, 3, 4, 5)
+  TEST (0, 4, 4, 5)
+  TEST (1, 4, 4, 5)
+  TEST (2, 4, 4, 5)
+  TEST (3, 4, 4, 5)
+  TEST (4, 4, 4, 5)
+  TEST (5, 4, 4, 5)
+  TEST (6, 4, 4, 5)
+  TEST (7, 4, 4, 5)
+  TEST (0, 5, 4, 5)
+  TEST (1, 5, 4, 5)
+  TEST (2, 5, 4, 5)
+  TEST (3, 5, 4, 5)
+  TEST (4, 5, 4, 5)
+  TEST (5, 5, 4, 5)
+  TEST (6, 5, 4, 5)
+  TEST (7, 5, 4, 5)
+  TEST (0, 6, 4, 5)
+  TEST (1, 6, 4, 5)
+  TEST (2, 6, 4, 5)
+  TEST (3, 6, 4, 5)
+  TEST (4, 6, 4, 5)
+  TEST (5, 6, 4, 5)
+  TEST (6, 6, 4, 5)
+  TEST (7, 6, 4, 5)
+  TEST (0, 7, 4, 5)
+  TEST (1, 7, 4, 5)
+  TEST (2, 7, 4, 5)
+  TEST (3, 7, 4, 5)
+  TEST (4, 7, 4, 5)
+  TEST (5, 7, 4, 5)
+  TEST (6, 7, 4, 5)
+  TEST (7, 7, 4, 5)
+  TEST (0, 0, 5, 5)
+  TEST (1, 0, 5, 5)
+  TEST (2, 0, 5, 5)
+  TEST (3, 0, 5, 5)
+  TEST (4, 0, 5, 5)
+  TEST (5, 0, 5, 5)
+  TEST (6, 0, 5, 5)
+  TEST (7, 0, 5, 5)
+  TEST (0, 1, 5, 5)
+  TEST (1, 1, 5, 5)
+  TEST (2, 1, 5, 5)
+  TEST (3, 1, 5, 5)
+  TEST (4, 1, 5, 5)
+  TEST (5, 1, 5, 5)
+  TEST (6, 1, 5, 5)
+  TEST (7, 1, 5, 5)
+  TEST (0, 2, 5, 5)
+  TEST (1, 2, 5, 5)
+  TEST (2, 2, 5, 5)
+  TEST (3, 2, 5, 5)
+  TEST (4, 2, 5, 5)
+  TEST (5, 2, 5, 5)
+  TEST (6, 2, 5, 5)
+  TEST (7, 2, 5, 5)
+  TEST (0, 3, 5, 5)
+  TEST (1, 3, 5, 5)
+  TEST (2, 3, 5, 5)
+  TEST (3, 3, 5, 5)
+  TEST (4, 3, 5, 5)
+  TEST (5, 3, 5, 5)
+  TEST (6, 3, 5, 5)
+  TEST (7, 3, 5, 5)
+  TEST (0, 4, 5, 5)
+  TEST (1, 4, 5, 5)
+  TEST (2, 4, 5, 5)
+  TEST (3, 4, 5, 5)
+  TEST (4, 4, 5, 5)
+  TEST (5, 4, 5, 5)
+  TEST (6, 4, 5, 5)
+  TEST (7, 4, 5, 5)
+  TEST (0, 5, 5, 5)
+  TEST (1, 5, 5, 5)
+  TEST (2, 5, 5, 5)
+  TEST (3, 5, 5, 5)
+  TEST (4, 5, 5, 5)
+  TEST (5, 5, 5, 5)
+  TEST (6, 5, 5, 5)
+  TEST (7, 5, 5, 5)
+  TEST (0, 6, 5, 5)
+  TEST (1, 6, 5, 5)
+  TEST (2, 6, 5, 5)
+  TEST (3, 6, 5, 5)
+  TEST (4, 6, 5, 5)
+  TEST (5, 6, 5, 5)
+  TEST (6, 6, 5, 5)
+  TEST (7, 6, 5, 5)
+  TEST (0, 7, 5, 5)
+  TEST (1, 7, 5, 5)
+  TEST (2, 7, 5, 5)
+  TEST (3, 7, 5, 5)
+  TEST (4, 7, 5, 5)
+  TEST (5, 7, 5, 5)
+  TEST (6, 7, 5, 5)
+  TEST (7, 7, 5, 5)
+}
+
+void check23(void)
+{
+  TEST (0, 0, 6, 5)
+  TEST (1, 0, 6, 5)
+  TEST (2, 0, 6, 5)
+  TEST (3, 0, 6, 5)
+  TEST (4, 0, 6, 5)
+  TEST (5, 0, 6, 5)
+  TEST (6, 0, 6, 5)
+  TEST (7, 0, 6, 5)
+  TEST (0, 1, 6, 5)
+  TEST (1, 1, 6, 5)
+  TEST (2, 1, 6, 5)
+  TEST (3, 1, 6, 5)
+  TEST (4, 1, 6, 5)
+  TEST (5, 1, 6, 5)
+  TEST (6, 1, 6, 5)
+  TEST (7, 1, 6, 5)
+  TEST (0, 2, 6, 5)
+  TEST (1, 2, 6, 5)
+  TEST (2, 2, 6, 5)
+  TEST (3, 2, 6, 5)
+  TEST (4, 2, 6, 5)
+  TEST (5, 2, 6, 5)
+  TEST (6, 2, 6, 5)
+  TEST (7, 2, 6, 5)
+  TEST (0, 3, 6, 5)
+  TEST (1, 3, 6, 5)
+  TEST (2, 3, 6, 5)
+  TEST (3, 3, 6, 5)
+  TEST (4, 3, 6, 5)
+  TEST (5, 3, 6, 5)
+  TEST (6, 3, 6, 5)
+  TEST (7, 3, 6, 5)
+  TEST (0, 4, 6, 5)
+  TEST (1, 4, 6, 5)
+  TEST (2, 4, 6, 5)
+  TEST (3, 4, 6, 5)
+  TEST (4, 4, 6, 5)
+  TEST (5, 4, 6, 5)
+  TEST (6, 4, 6, 5)
+  TEST (7, 4, 6, 5)
+  TEST (0, 5, 6, 5)
+  TEST (1, 5, 6, 5)
+  TEST (2, 5, 6, 5)
+  TEST (3, 5, 6, 5)
+  TEST (4, 5, 6, 5)
+  TEST (5, 5, 6, 5)
+  TEST (6, 5, 6, 5)
+  TEST (7, 5, 6, 5)
+  TEST (0, 6, 6, 5)
+  TEST (1, 6, 6, 5)
+  TEST (2, 6, 6, 5)
+  TEST (3, 6, 6, 5)
+  TEST (4, 6, 6, 5)
+  TEST (5, 6, 6, 5)
+  TEST (6, 6, 6, 5)
+  TEST (7, 6, 6, 5)
+  TEST (0, 7, 6, 5)
+  TEST (1, 7, 6, 5)
+  TEST (2, 7, 6, 5)
+  TEST (3, 7, 6, 5)
+  TEST (4, 7, 6, 5)
+  TEST (5, 7, 6, 5)
+  TEST (6, 7, 6, 5)
+  TEST (7, 7, 6, 5)
+  TEST (0, 0, 7, 5)
+  TEST (1, 0, 7, 5)
+  TEST (2, 0, 7, 5)
+  TEST (3, 0, 7, 5)
+  TEST (4, 0, 7, 5)
+  TEST (5, 0, 7, 5)
+  TEST (6, 0, 7, 5)
+  TEST (7, 0, 7, 5)
+  TEST (0, 1, 7, 5)
+  TEST (1, 1, 7, 5)
+  TEST (2, 1, 7, 5)
+  TEST (3, 1, 7, 5)
+  TEST (4, 1, 7, 5)
+  TEST (5, 1, 7, 5)
+  TEST (6, 1, 7, 5)
+  TEST (7, 1, 7, 5)
+  TEST (0, 2, 7, 5)
+  TEST (1, 2, 7, 5)
+  TEST (2, 2, 7, 5)
+  TEST (3, 2, 7, 5)
+  TEST (4, 2, 7, 5)
+  TEST (5, 2, 7, 5)
+  TEST (6, 2, 7, 5)
+  TEST (7, 2, 7, 5)
+  TEST (0, 3, 7, 5)
+  TEST (1, 3, 7, 5)
+  TEST (2, 3, 7, 5)
+  TEST (3, 3, 7, 5)
+  TEST (4, 3, 7, 5)
+  TEST (5, 3, 7, 5)
+  TEST (6, 3, 7, 5)
+  TEST (7, 3, 7, 5)
+  TEST (0, 4, 7, 5)
+  TEST (1, 4, 7, 5)
+  TEST (2, 4, 7, 5)
+  TEST (3, 4, 7, 5)
+  TEST (4, 4, 7, 5)
+  TEST (5, 4, 7, 5)
+  TEST (6, 4, 7, 5)
+  TEST (7, 4, 7, 5)
+  TEST (0, 5, 7, 5)
+  TEST (1, 5, 7, 5)
+  TEST (2, 5, 7, 5)
+  TEST (3, 5, 7, 5)
+  TEST (4, 5, 7, 5)
+  TEST (5, 5, 7, 5)
+  TEST (6, 5, 7, 5)
+  TEST (7, 5, 7, 5)
+  TEST (0, 6, 7, 5)
+  TEST (1, 6, 7, 5)
+  TEST (2, 6, 7, 5)
+  TEST (3, 6, 7, 5)
+  TEST (4, 6, 7, 5)
+  TEST (5, 6, 7, 5)
+  TEST (6, 6, 7, 5)
+  TEST (7, 6, 7, 5)
+  TEST (0, 7, 7, 5)
+  TEST (1, 7, 7, 5)
+  TEST (2, 7, 7, 5)
+  TEST (3, 7, 7, 5)
+  TEST (4, 7, 7, 5)
+  TEST (5, 7, 7, 5)
+  TEST (6, 7, 7, 5)
+  TEST (7, 7, 7, 5)
+}
+
+void check24(void)
+{
+  TEST (0, 0, 0, 6)
+  TEST (1, 0, 0, 6)
+  TEST (2, 0, 0, 6)
+  TEST (3, 0, 0, 6)
+  TEST (4, 0, 0, 6)
+  TEST (5, 0, 0, 6)
+  TEST (6, 0, 0, 6)
+  TEST (7, 0, 0, 6)
+  TEST (0, 1, 0, 6)
+  TEST (1, 1, 0, 6)
+  TEST (2, 1, 0, 6)
+  TEST (3, 1, 0, 6)
+  TEST (4, 1, 0, 6)
+  TEST (5, 1, 0, 6)
+  TEST (6, 1, 0, 6)
+  TEST (7, 1, 0, 6)
+  TEST (0, 2, 0, 6)
+  TEST (1, 2, 0, 6)
+  TEST (2, 2, 0, 6)
+  TEST (3, 2, 0, 6)
+  TEST (4, 2, 0, 6)
+  TEST (5, 2, 0, 6)
+  TEST (6, 2, 0, 6)
+  TEST (7, 2, 0, 6)
+  TEST (0, 3, 0, 6)
+  TEST (1, 3, 0, 6)
+  TEST (2, 3, 0, 6)
+  TEST (3, 3, 0, 6)
+  TEST (4, 3, 0, 6)
+  TEST (5, 3, 0, 6)
+  TEST (6, 3, 0, 6)
+  TEST (7, 3, 0, 6)
+  TEST (0, 4, 0, 6)
+  TEST (1, 4, 0, 6)
+  TEST (2, 4, 0, 6)
+  TEST (3, 4, 0, 6)
+  TEST (4, 4, 0, 6)
+  TEST (5, 4, 0, 6)
+  TEST (6, 4, 0, 6)
+  TEST (7, 4, 0, 6)
+  TEST (0, 5, 0, 6)
+  TEST (1, 5, 0, 6)
+  TEST (2, 5, 0, 6)
+  TEST (3, 5, 0, 6)
+  TEST (4, 5, 0, 6)
+  TEST (5, 5, 0, 6)
+  TEST (6, 5, 0, 6)
+  TEST (7, 5, 0, 6)
+  TEST (0, 6, 0, 6)
+  TEST (1, 6, 0, 6)
+  TEST (2, 6, 0, 6)
+  TEST (3, 6, 0, 6)
+  TEST (4, 6, 0, 6)
+  TEST (5, 6, 0, 6)
+  TEST (6, 6, 0, 6)
+  TEST (7, 6, 0, 6)
+  TEST (0, 7, 0, 6)
+  TEST (1, 7, 0, 6)
+  TEST (2, 7, 0, 6)
+  TEST (3, 7, 0, 6)
+  TEST (4, 7, 0, 6)
+  TEST (5, 7, 0, 6)
+  TEST (6, 7, 0, 6)
+  TEST (7, 7, 0, 6)
+  TEST (0, 0, 1, 6)
+  TEST (1, 0, 1, 6)
+  TEST (2, 0, 1, 6)
+  TEST (3, 0, 1, 6)
+  TEST (4, 0, 1, 6)
+  TEST (5, 0, 1, 6)
+  TEST (6, 0, 1, 6)
+  TEST (7, 0, 1, 6)
+  TEST (0, 1, 1, 6)
+  TEST (1, 1, 1, 6)
+  TEST (2, 1, 1, 6)
+  TEST (3, 1, 1, 6)
+  TEST (4, 1, 1, 6)
+  TEST (5, 1, 1, 6)
+  TEST (6, 1, 1, 6)
+  TEST (7, 1, 1, 6)
+  TEST (0, 2, 1, 6)
+  TEST (1, 2, 1, 6)
+  TEST (2, 2, 1, 6)
+  TEST (3, 2, 1, 6)
+  TEST (4, 2, 1, 6)
+  TEST (5, 2, 1, 6)
+  TEST (6, 2, 1, 6)
+  TEST (7, 2, 1, 6)
+  TEST (0, 3, 1, 6)
+  TEST (1, 3, 1, 6)
+  TEST (2, 3, 1, 6)
+  TEST (3, 3, 1, 6)
+  TEST (4, 3, 1, 6)
+  TEST (5, 3, 1, 6)
+  TEST (6, 3, 1, 6)
+  TEST (7, 3, 1, 6)
+  TEST (0, 4, 1, 6)
+  TEST (1, 4, 1, 6)
+  TEST (2, 4, 1, 6)
+  TEST (3, 4, 1, 6)
+  TEST (4, 4, 1, 6)
+  TEST (5, 4, 1, 6)
+  TEST (6, 4, 1, 6)
+  TEST (7, 4, 1, 6)
+  TEST (0, 5, 1, 6)
+  TEST (1, 5, 1, 6)
+  TEST (2, 5, 1, 6)
+  TEST (3, 5, 1, 6)
+  TEST (4, 5, 1, 6)
+  TEST (5, 5, 1, 6)
+  TEST (6, 5, 1, 6)
+  TEST (7, 5, 1, 6)
+  TEST (0, 6, 1, 6)
+  TEST (1, 6, 1, 6)
+  TEST (2, 6, 1, 6)
+  TEST (3, 6, 1, 6)
+  TEST (4, 6, 1, 6)
+  TEST (5, 6, 1, 6)
+  TEST (6, 6, 1, 6)
+  TEST (7, 6, 1, 6)
+  TEST (0, 7, 1, 6)
+  TEST (1, 7, 1, 6)
+  TEST (2, 7, 1, 6)
+  TEST (3, 7, 1, 6)
+  TEST (4, 7, 1, 6)
+  TEST (5, 7, 1, 6)
+  TEST (6, 7, 1, 6)
+  TEST (7, 7, 1, 6)
+}
+
+void check25(void)
+{
+  TEST (0, 0, 2, 6)
+  TEST (1, 0, 2, 6)
+  TEST (2, 0, 2, 6)
+  TEST (3, 0, 2, 6)
+  TEST (4, 0, 2, 6)
+  TEST (5, 0, 2, 6)
+  TEST (6, 0, 2, 6)
+  TEST (7, 0, 2, 6)
+  TEST (0, 1, 2, 6)
+  TEST (1, 1, 2, 6)
+  TEST (2, 1, 2, 6)
+  TEST (3, 1, 2, 6)
+  TEST (4, 1, 2, 6)
+  TEST (5, 1, 2, 6)
+  TEST (6, 1, 2, 6)
+  TEST (7, 1, 2, 6)
+  TEST (0, 2, 2, 6)
+  TEST (1, 2, 2, 6)
+  TEST (2, 2, 2, 6)
+  TEST (3, 2, 2, 6)
+  TEST (4, 2, 2, 6)
+  TEST (5, 2, 2, 6)
+  TEST (6, 2, 2, 6)
+  TEST (7, 2, 2, 6)
+  TEST (0, 3, 2, 6)
+  TEST (1, 3, 2, 6)
+  TEST (2, 3, 2, 6)
+  TEST (3, 3, 2, 6)
+  TEST (4, 3, 2, 6)
+  TEST (5, 3, 2, 6)
+  TEST (6, 3, 2, 6)
+  TEST (7, 3, 2, 6)
+  TEST (0, 4, 2, 6)
+  TEST (1, 4, 2, 6)
+  TEST (2, 4, 2, 6)
+  TEST (3, 4, 2, 6)
+  TEST (4, 4, 2, 6)
+  TEST (5, 4, 2, 6)
+  TEST (6, 4, 2, 6)
+  TEST (7, 4, 2, 6)
+  TEST (0, 5, 2, 6)
+  TEST (1, 5, 2, 6)
+  TEST (2, 5, 2, 6)
+  TEST (3, 5, 2, 6)
+  TEST (4, 5, 2, 6)
+  TEST (5, 5, 2, 6)
+  TEST (6, 5, 2, 6)
+  TEST (7, 5, 2, 6)
+  TEST (0, 6, 2, 6)
+  TEST (1, 6, 2, 6)
+  TEST (2, 6, 2, 6)
+  TEST (3, 6, 2, 6)
+  TEST (4, 6, 2, 6)
+  TEST (5, 6, 2, 6)
+  TEST (6, 6, 2, 6)
+  TEST (7, 6, 2, 6)
+  TEST (0, 7, 2, 6)
+  TEST (1, 7, 2, 6)
+  TEST (2, 7, 2, 6)
+  TEST (3, 7, 2, 6)
+  TEST (4, 7, 2, 6)
+  TEST (5, 7, 2, 6)
+  TEST (6, 7, 2, 6)
+  TEST (7, 7, 2, 6)
+  TEST (0, 0, 3, 6)
+  TEST (1, 0, 3, 6)
+  TEST (2, 0, 3, 6)
+  TEST (3, 0, 3, 6)
+  TEST (4, 0, 3, 6)
+  TEST (5, 0, 3, 6)
+  TEST (6, 0, 3, 6)
+  TEST (7, 0, 3, 6)
+  TEST (0, 1, 3, 6)
+  TEST (1, 1, 3, 6)
+  TEST (2, 1, 3, 6)
+  TEST (3, 1, 3, 6)
+  TEST (4, 1, 3, 6)
+  TEST (5, 1, 3, 6)
+  TEST (6, 1, 3, 6)
+  TEST (7, 1, 3, 6)
+  TEST (0, 2, 3, 6)
+  TEST (1, 2, 3, 6)
+  TEST (2, 2, 3, 6)
+  TEST (3, 2, 3, 6)
+  TEST (4, 2, 3, 6)
+  TEST (5, 2, 3, 6)
+  TEST (6, 2, 3, 6)
+  TEST (7, 2, 3, 6)
+  TEST (0, 3, 3, 6)
+  TEST (1, 3, 3, 6)
+  TEST (2, 3, 3, 6)
+  TEST (3, 3, 3, 6)
+  TEST (4, 3, 3, 6)
+  TEST (5, 3, 3, 6)
+  TEST (6, 3, 3, 6)
+  TEST (7, 3, 3, 6)
+  TEST (0, 4, 3, 6)
+  TEST (1, 4, 3, 6)
+  TEST (2, 4, 3, 6)
+  TEST (3, 4, 3, 6)
+  TEST (4, 4, 3, 6)
+  TEST (5, 4, 3, 6)
+  TEST (6, 4, 3, 6)
+  TEST (7, 4, 3, 6)
+  TEST (0, 5, 3, 6)
+  TEST (1, 5, 3, 6)
+  TEST (2, 5, 3, 6)
+  TEST (3, 5, 3, 6)
+  TEST (4, 5, 3, 6)
+  TEST (5, 5, 3, 6)
+  TEST (6, 5, 3, 6)
+  TEST (7, 5, 3, 6)
+  TEST (0, 6, 3, 6)
+  TEST (1, 6, 3, 6)
+  TEST (2, 6, 3, 6)
+  TEST (3, 6, 3, 6)
+  TEST (4, 6, 3, 6)
+  TEST (5, 6, 3, 6)
+  TEST (6, 6, 3, 6)
+  TEST (7, 6, 3, 6)
+  TEST (0, 7, 3, 6)
+  TEST (1, 7, 3, 6)
+  TEST (2, 7, 3, 6)
+  TEST (3, 7, 3, 6)
+  TEST (4, 7, 3, 6)
+  TEST (5, 7, 3, 6)
+  TEST (6, 7, 3, 6)
+  TEST (7, 7, 3, 6)
+}
+
+void check26(void)
+{
+  TEST (0, 0, 4, 6)
+  TEST (1, 0, 4, 6)
+  TEST (2, 0, 4, 6)
+  TEST (3, 0, 4, 6)
+  TEST (4, 0, 4, 6)
+  TEST (5, 0, 4, 6)
+  TEST (6, 0, 4, 6)
+  TEST (7, 0, 4, 6)
+  TEST (0, 1, 4, 6)
+  TEST (1, 1, 4, 6)
+  TEST (2, 1, 4, 6)
+  TEST (3, 1, 4, 6)
+  TEST (4, 1, 4, 6)
+  TEST (5, 1, 4, 6)
+  TEST (6, 1, 4, 6)
+  TEST (7, 1, 4, 6)
+  TEST (0, 2, 4, 6)
+  TEST (1, 2, 4, 6)
+  TEST (2, 2, 4, 6)
+  TEST (3, 2, 4, 6)
+  TEST (4, 2, 4, 6)
+  TEST (5, 2, 4, 6)
+  TEST (6, 2, 4, 6)
+  TEST (7, 2, 4, 6)
+  TEST (0, 3, 4, 6)
+  TEST (1, 3, 4, 6)
+  TEST (2, 3, 4, 6)
+  TEST (3, 3, 4, 6)
+  TEST (4, 3, 4, 6)
+  TEST (5, 3, 4, 6)
+  TEST (6, 3, 4, 6)
+  TEST (7, 3, 4, 6)
+  TEST (0, 4, 4, 6)
+  TEST (1, 4, 4, 6)
+  TEST (2, 4, 4, 6)
+  TEST (3, 4, 4, 6)
+  TEST (4, 4, 4, 6)
+  TEST (5, 4, 4, 6)
+  TEST (6, 4, 4, 6)
+  TEST (7, 4, 4, 6)
+  TEST (0, 5, 4, 6)
+  TEST (1, 5, 4, 6)
+  TEST (2, 5, 4, 6)
+  TEST (3, 5, 4, 6)
+  TEST (4, 5, 4, 6)
+  TEST (5, 5, 4, 6)
+  TEST (6, 5, 4, 6)
+  TEST (7, 5, 4, 6)
+  TEST (0, 6, 4, 6)
+  TEST (1, 6, 4, 6)
+  TEST (2, 6, 4, 6)
+  TEST (3, 6, 4, 6)
+  TEST (4, 6, 4, 6)
+  TEST (5, 6, 4, 6)
+  TEST (6, 6, 4, 6)
+  TEST (7, 6, 4, 6)
+  TEST (0, 7, 4, 6)
+  TEST (1, 7, 4, 6)
+  TEST (2, 7, 4, 6)
+  TEST (3, 7, 4, 6)
+  TEST (4, 7, 4, 6)
+  TEST (5, 7, 4, 6)
+  TEST (6, 7, 4, 6)
+  TEST (7, 7, 4, 6)
+  TEST (0, 0, 5, 6)
+  TEST (1, 0, 5, 6)
+  TEST (2, 0, 5, 6)
+  TEST (3, 0, 5, 6)
+  TEST (4, 0, 5, 6)
+  TEST (5, 0, 5, 6)
+  TEST (6, 0, 5, 6)
+  TEST (7, 0, 5, 6)
+  TEST (0, 1, 5, 6)
+  TEST (1, 1, 5, 6)
+  TEST (2, 1, 5, 6)
+  TEST (3, 1, 5, 6)
+  TEST (4, 1, 5, 6)
+  TEST (5, 1, 5, 6)
+  TEST (6, 1, 5, 6)
+  TEST (7, 1, 5, 6)
+  TEST (0, 2, 5, 6)
+  TEST (1, 2, 5, 6)
+  TEST (2, 2, 5, 6)
+  TEST (3, 2, 5, 6)
+  TEST (4, 2, 5, 6)
+  TEST (5, 2, 5, 6)
+  TEST (6, 2, 5, 6)
+  TEST (7, 2, 5, 6)
+  TEST (0, 3, 5, 6)
+  TEST (1, 3, 5, 6)
+  TEST (2, 3, 5, 6)
+  TEST (3, 3, 5, 6)
+  TEST (4, 3, 5, 6)
+  TEST (5, 3, 5, 6)
+  TEST (6, 3, 5, 6)
+  TEST (7, 3, 5, 6)
+  TEST (0, 4, 5, 6)
+  TEST (1, 4, 5, 6)
+  TEST (2, 4, 5, 6)
+  TEST (3, 4, 5, 6)
+  TEST (4, 4, 5, 6)
+  TEST (5, 4, 5, 6)
+  TEST (6, 4, 5, 6)
+  TEST (7, 4, 5, 6)
+  TEST (0, 5, 5, 6)
+  TEST (1, 5, 5, 6)
+  TEST (2, 5, 5, 6)
+  TEST (3, 5, 5, 6)
+  TEST (4, 5, 5, 6)
+  TEST (5, 5, 5, 6)
+  TEST (6, 5, 5, 6)
+  TEST (7, 5, 5, 6)
+  TEST (0, 6, 5, 6)
+  TEST (1, 6, 5, 6)
+  TEST (2, 6, 5, 6)
+  TEST (3, 6, 5, 6)
+  TEST (4, 6, 5, 6)
+  TEST (5, 6, 5, 6)
+  TEST (6, 6, 5, 6)
+  TEST (7, 6, 5, 6)
+  TEST (0, 7, 5, 6)
+  TEST (1, 7, 5, 6)
+  TEST (2, 7, 5, 6)
+  TEST (3, 7, 5, 6)
+  TEST (4, 7, 5, 6)
+  TEST (5, 7, 5, 6)
+  TEST (6, 7, 5, 6)
+  TEST (7, 7, 5, 6)
+}
+
+void check27(void)
+{
+  TEST (0, 0, 6, 6)
+  TEST (1, 0, 6, 6)
+  TEST (2, 0, 6, 6)
+  TEST (3, 0, 6, 6)
+  TEST (4, 0, 6, 6)
+  TEST (5, 0, 6, 6)
+  TEST (6, 0, 6, 6)
+  TEST (7, 0, 6, 6)
+  TEST (0, 1, 6, 6)
+  TEST (1, 1, 6, 6)
+  TEST (2, 1, 6, 6)
+  TEST (3, 1, 6, 6)
+  TEST (4, 1, 6, 6)
+  TEST (5, 1, 6, 6)
+  TEST (6, 1, 6, 6)
+  TEST (7, 1, 6, 6)
+  TEST (0, 2, 6, 6)
+  TEST (1, 2, 6, 6)
+  TEST (2, 2, 6, 6)
+  TEST (3, 2, 6, 6)
+  TEST (4, 2, 6, 6)
+  TEST (5, 2, 6, 6)
+  TEST (6, 2, 6, 6)
+  TEST (7, 2, 6, 6)
+  TEST (0, 3, 6, 6)
+  TEST (1, 3, 6, 6)
+  TEST (2, 3, 6, 6)
+  TEST (3, 3, 6, 6)
+  TEST (4, 3, 6, 6)
+  TEST (5, 3, 6, 6)
+  TEST (6, 3, 6, 6)
+  TEST (7, 3, 6, 6)
+  TEST (0, 4, 6, 6)
+  TEST (1, 4, 6, 6)
+  TEST (2, 4, 6, 6)
+  TEST (3, 4, 6, 6)
+  TEST (4, 4, 6, 6)
+  TEST (5, 4, 6, 6)
+  TEST (6, 4, 6, 6)
+  TEST (7, 4, 6, 6)
+  TEST (0, 5, 6, 6)
+  TEST (1, 5, 6, 6)
+  TEST (2, 5, 6, 6)
+  TEST (3, 5, 6, 6)
+  TEST (4, 5, 6, 6)
+  TEST (5, 5, 6, 6)
+  TEST (6, 5, 6, 6)
+  TEST (7, 5, 6, 6)
+  TEST (0, 6, 6, 6)
+  TEST (1, 6, 6, 6)
+  TEST (2, 6, 6, 6)
+  TEST (3, 6, 6, 6)
+  TEST (4, 6, 6, 6)
+  TEST (5, 6, 6, 6)
+  TEST (6, 6, 6, 6)
+  TEST (7, 6, 6, 6)
+  TEST (0, 7, 6, 6)
+  TEST (1, 7, 6, 6)
+  TEST (2, 7, 6, 6)
+  TEST (3, 7, 6, 6)
+  TEST (4, 7, 6, 6)
+  TEST (5, 7, 6, 6)
+  TEST (6, 7, 6, 6)
+  TEST (7, 7, 6, 6)
+  TEST (0, 0, 7, 6)
+  TEST (1, 0, 7, 6)
+  TEST (2, 0, 7, 6)
+  TEST (3, 0, 7, 6)
+  TEST (4, 0, 7, 6)
+  TEST (5, 0, 7, 6)
+  TEST (6, 0, 7, 6)
+  TEST (7, 0, 7, 6)
+  TEST (0, 1, 7, 6)
+  TEST (1, 1, 7, 6)
+  TEST (2, 1, 7, 6)
+  TEST (3, 1, 7, 6)
+  TEST (4, 1, 7, 6)
+  TEST (5, 1, 7, 6)
+  TEST (6, 1, 7, 6)
+  TEST (7, 1, 7, 6)
+  TEST (0, 2, 7, 6)
+  TEST (1, 2, 7, 6)
+  TEST (2, 2, 7, 6)
+  TEST (3, 2, 7, 6)
+  TEST (4, 2, 7, 6)
+  TEST (5, 2, 7, 6)
+  TEST (6, 2, 7, 6)
+  TEST (7, 2, 7, 6)
+  TEST (0, 3, 7, 6)
+  TEST (1, 3, 7, 6)
+  TEST (2, 3, 7, 6)
+  TEST (3, 3, 7, 6)
+  TEST (4, 3, 7, 6)
+  TEST (5, 3, 7, 6)
+  TEST (6, 3, 7, 6)
+  TEST (7, 3, 7, 6)
+  TEST (0, 4, 7, 6)
+  TEST (1, 4, 7, 6)
+  TEST (2, 4, 7, 6)
+  TEST (3, 4, 7, 6)
+  TEST (4, 4, 7, 6)
+  TEST (5, 4, 7, 6)
+  TEST (6, 4, 7, 6)
+  TEST (7, 4, 7, 6)
+  TEST (0, 5, 7, 6)
+  TEST (1, 5, 7, 6)
+  TEST (2, 5, 7, 6)
+  TEST (3, 5, 7, 6)
+  TEST (4, 5, 7, 6)
+  TEST (5, 5, 7, 6)
+  TEST (6, 5, 7, 6)
+  TEST (7, 5, 7, 6)
+  TEST (0, 6, 7, 6)
+  TEST (1, 6, 7, 6)
+  TEST (2, 6, 7, 6)
+  TEST (3, 6, 7, 6)
+  TEST (4, 6, 7, 6)
+  TEST (5, 6, 7, 6)
+  TEST (6, 6, 7, 6)
+  TEST (7, 6, 7, 6)
+  TEST (0, 7, 7, 6)
+  TEST (1, 7, 7, 6)
+  TEST (2, 7, 7, 6)
+  TEST (3, 7, 7, 6)
+  TEST (4, 7, 7, 6)
+  TEST (5, 7, 7, 6)
+  TEST (6, 7, 7, 6)
+  TEST (7, 7, 7, 6)
+}
+
+void check28(void)
+{
+  TEST (0, 0, 0, 7)
+  TEST (1, 0, 0, 7)
+  TEST (2, 0, 0, 7)
+  TEST (3, 0, 0, 7)
+  TEST (4, 0, 0, 7)
+  TEST (5, 0, 0, 7)
+  TEST (6, 0, 0, 7)
+  TEST (7, 0, 0, 7)
+  TEST (0, 1, 0, 7)
+  TEST (1, 1, 0, 7)
+  TEST (2, 1, 0, 7)
+  TEST (3, 1, 0, 7)
+  TEST (4, 1, 0, 7)
+  TEST (5, 1, 0, 7)
+  TEST (6, 1, 0, 7)
+  TEST (7, 1, 0, 7)
+  TEST (0, 2, 0, 7)
+  TEST (1, 2, 0, 7)
+  TEST (2, 2, 0, 7)
+  TEST (3, 2, 0, 7)
+  TEST (4, 2, 0, 7)
+  TEST (5, 2, 0, 7)
+  TEST (6, 2, 0, 7)
+  TEST (7, 2, 0, 7)
+  TEST (0, 3, 0, 7)
+  TEST (1, 3, 0, 7)
+  TEST (2, 3, 0, 7)
+  TEST (3, 3, 0, 7)
+  TEST (4, 3, 0, 7)
+  TEST (5, 3, 0, 7)
+  TEST (6, 3, 0, 7)
+  TEST (7, 3, 0, 7)
+  TEST (0, 4, 0, 7)
+  TEST (1, 4, 0, 7)
+  TEST (2, 4, 0, 7)
+  TEST (3, 4, 0, 7)
+  TEST (4, 4, 0, 7)
+  TEST (5, 4, 0, 7)
+  TEST (6, 4, 0, 7)
+  TEST (7, 4, 0, 7)
+  TEST (0, 5, 0, 7)
+  TEST (1, 5, 0, 7)
+  TEST (2, 5, 0, 7)
+  TEST (3, 5, 0, 7)
+  TEST (4, 5, 0, 7)
+  TEST (5, 5, 0, 7)
+  TEST (6, 5, 0, 7)
+  TEST (7, 5, 0, 7)
+  TEST (0, 6, 0, 7)
+  TEST (1, 6, 0, 7)
+  TEST (2, 6, 0, 7)
+  TEST (3, 6, 0, 7)
+  TEST (4, 6, 0, 7)
+  TEST (5, 6, 0, 7)
+  TEST (6, 6, 0, 7)
+  TEST (7, 6, 0, 7)
+  TEST (0, 7, 0, 7)
+  TEST (1, 7, 0, 7)
+  TEST (2, 7, 0, 7)
+  TEST (3, 7, 0, 7)
+  TEST (4, 7, 0, 7)
+  TEST (5, 7, 0, 7)
+  TEST (6, 7, 0, 7)
+  TEST (7, 7, 0, 7)
+  TEST (0, 0, 1, 7)
+  TEST (1, 0, 1, 7)
+  TEST (2, 0, 1, 7)
+  TEST (3, 0, 1, 7)
+  TEST (4, 0, 1, 7)
+  TEST (5, 0, 1, 7)
+  TEST (6, 0, 1, 7)
+  TEST (7, 0, 1, 7)
+  TEST (0, 1, 1, 7)
+  TEST (1, 1, 1, 7)
+  TEST (2, 1, 1, 7)
+  TEST (3, 1, 1, 7)
+  TEST (4, 1, 1, 7)
+  TEST (5, 1, 1, 7)
+  TEST (6, 1, 1, 7)
+  TEST (7, 1, 1, 7)
+  TEST (0, 2, 1, 7)
+  TEST (1, 2, 1, 7)
+  TEST (2, 2, 1, 7)
+  TEST (3, 2, 1, 7)
+  TEST (4, 2, 1, 7)
+  TEST (5, 2, 1, 7)
+  TEST (6, 2, 1, 7)
+  TEST (7, 2, 1, 7)
+  TEST (0, 3, 1, 7)
+  TEST (1, 3, 1, 7)
+  TEST (2, 3, 1, 7)
+  TEST (3, 3, 1, 7)
+  TEST (4, 3, 1, 7)
+  TEST (5, 3, 1, 7)
+  TEST (6, 3, 1, 7)
+  TEST (7, 3, 1, 7)
+  TEST (0, 4, 1, 7)
+  TEST (1, 4, 1, 7)
+  TEST (2, 4, 1, 7)
+  TEST (3, 4, 1, 7)
+  TEST (4, 4, 1, 7)
+  TEST (5, 4, 1, 7)
+  TEST (6, 4, 1, 7)
+  TEST (7, 4, 1, 7)
+  TEST (0, 5, 1, 7)
+  TEST (1, 5, 1, 7)
+  TEST (2, 5, 1, 7)
+  TEST (3, 5, 1, 7)
+  TEST (4, 5, 1, 7)
+  TEST (5, 5, 1, 7)
+  TEST (6, 5, 1, 7)
+  TEST (7, 5, 1, 7)
+  TEST (0, 6, 1, 7)
+  TEST (1, 6, 1, 7)
+  TEST (2, 6, 1, 7)
+  TEST (3, 6, 1, 7)
+  TEST (4, 6, 1, 7)
+  TEST (5, 6, 1, 7)
+  TEST (6, 6, 1, 7)
+  TEST (7, 6, 1, 7)
+  TEST (0, 7, 1, 7)
+  TEST (1, 7, 1, 7)
+  TEST (2, 7, 1, 7)
+  TEST (3, 7, 1, 7)
+  TEST (4, 7, 1, 7)
+  TEST (5, 7, 1, 7)
+  TEST (6, 7, 1, 7)
+  TEST (7, 7, 1, 7)
+}
+
+void check29(void)
+{
+  TEST (0, 0, 2, 7)
+  TEST (1, 0, 2, 7)
+  TEST (2, 0, 2, 7)
+  TEST (3, 0, 2, 7)
+  TEST (4, 0, 2, 7)
+  TEST (5, 0, 2, 7)
+  TEST (6, 0, 2, 7)
+  TEST (7, 0, 2, 7)
+  TEST (0, 1, 2, 7)
+  TEST (1, 1, 2, 7)
+  TEST (2, 1, 2, 7)
+  TEST (3, 1, 2, 7)
+  TEST (4, 1, 2, 7)
+  TEST (5, 1, 2, 7)
+  TEST (6, 1, 2, 7)
+  TEST (7, 1, 2, 7)
+  TEST (0, 2, 2, 7)
+  TEST (1, 2, 2, 7)
+  TEST (2, 2, 2, 7)
+  TEST (3, 2, 2, 7)
+  TEST (4, 2, 2, 7)
+  TEST (5, 2, 2, 7)
+  TEST (6, 2, 2, 7)
+  TEST (7, 2, 2, 7)
+  TEST (0, 3, 2, 7)
+  TEST (1, 3, 2, 7)
+  TEST (2, 3, 2, 7)
+  TEST (3, 3, 2, 7)
+  TEST (4, 3, 2, 7)
+  TEST (5, 3, 2, 7)
+  TEST (6, 3, 2, 7)
+  TEST (7, 3, 2, 7)
+  TEST (0, 4, 2, 7)
+  TEST (1, 4, 2, 7)
+  TEST (2, 4, 2, 7)
+  TEST (3, 4, 2, 7)
+  TEST (4, 4, 2, 7)
+  TEST (5, 4, 2, 7)
+  TEST (6, 4, 2, 7)
+  TEST (7, 4, 2, 7)
+  TEST (0, 5, 2, 7)
+  TEST (1, 5, 2, 7)
+  TEST (2, 5, 2, 7)
+  TEST (3, 5, 2, 7)
+  TEST (4, 5, 2, 7)
+  TEST (5, 5, 2, 7)
+  TEST (6, 5, 2, 7)
+  TEST (7, 5, 2, 7)
+  TEST (0, 6, 2, 7)
+  TEST (1, 6, 2, 7)
+  TEST (2, 6, 2, 7)
+  TEST (3, 6, 2, 7)
+  TEST (4, 6, 2, 7)
+  TEST (5, 6, 2, 7)
+  TEST (6, 6, 2, 7)
+  TEST (7, 6, 2, 7)
+  TEST (0, 7, 2, 7)
+  TEST (1, 7, 2, 7)
+  TEST (2, 7, 2, 7)
+  TEST (3, 7, 2, 7)
+  TEST (4, 7, 2, 7)
+  TEST (5, 7, 2, 7)
+  TEST (6, 7, 2, 7)
+  TEST (7, 7, 2, 7)
+  TEST (0, 0, 3, 7)
+  TEST (1, 0, 3, 7)
+  TEST (2, 0, 3, 7)
+  TEST (3, 0, 3, 7)
+  TEST (4, 0, 3, 7)
+  TEST (5, 0, 3, 7)
+  TEST (6, 0, 3, 7)
+  TEST (7, 0, 3, 7)
+  TEST (0, 1, 3, 7)
+  TEST (1, 1, 3, 7)
+  TEST (2, 1, 3, 7)
+  TEST (3, 1, 3, 7)
+  TEST (4, 1, 3, 7)
+  TEST (5, 1, 3, 7)
+  TEST (6, 1, 3, 7)
+  TEST (7, 1, 3, 7)
+  TEST (0, 2, 3, 7)
+  TEST (1, 2, 3, 7)
+  TEST (2, 2, 3, 7)
+  TEST (3, 2, 3, 7)
+  TEST (4, 2, 3, 7)
+  TEST (5, 2, 3, 7)
+  TEST (6, 2, 3, 7)
+  TEST (7, 2, 3, 7)
+  TEST (0, 3, 3, 7)
+  TEST (1, 3, 3, 7)
+  TEST (2, 3, 3, 7)
+  TEST (3, 3, 3, 7)
+  TEST (4, 3, 3, 7)
+  TEST (5, 3, 3, 7)
+  TEST (6, 3, 3, 7)
+  TEST (7, 3, 3, 7)
+  TEST (0, 4, 3, 7)
+  TEST (1, 4, 3, 7)
+  TEST (2, 4, 3, 7)
+  TEST (3, 4, 3, 7)
+  TEST (4, 4, 3, 7)
+  TEST (5, 4, 3, 7)
+  TEST (6, 4, 3, 7)
+  TEST (7, 4, 3, 7)
+  TEST (0, 5, 3, 7)
+  TEST (1, 5, 3, 7)
+  TEST (2, 5, 3, 7)
+  TEST (3, 5, 3, 7)
+  TEST (4, 5, 3, 7)
+  TEST (5, 5, 3, 7)
+  TEST (6, 5, 3, 7)
+  TEST (7, 5, 3, 7)
+  TEST (0, 6, 3, 7)
+  TEST (1, 6, 3, 7)
+  TEST (2, 6, 3, 7)
+  TEST (3, 6, 3, 7)
+  TEST (4, 6, 3, 7)
+  TEST (5, 6, 3, 7)
+  TEST (6, 6, 3, 7)
+  TEST (7, 6, 3, 7)
+  TEST (0, 7, 3, 7)
+  TEST (1, 7, 3, 7)
+  TEST (2, 7, 3, 7)
+  TEST (3, 7, 3, 7)
+  TEST (4, 7, 3, 7)
+  TEST (5, 7, 3, 7)
+  TEST (6, 7, 3, 7)
+  TEST (7, 7, 3, 7)
+}
+
+void check30(void)
+{
+  TEST (0, 0, 4, 7)
+  TEST (1, 0, 4, 7)
+  TEST (2, 0, 4, 7)
+  TEST (3, 0, 4, 7)
+  TEST (4, 0, 4, 7)
+  TEST (5, 0, 4, 7)
+  TEST (6, 0, 4, 7)
+  TEST (7, 0, 4, 7)
+  TEST (0, 1, 4, 7)
+  TEST (1, 1, 4, 7)
+  TEST (2, 1, 4, 7)
+  TEST (3, 1, 4, 7)
+  TEST (4, 1, 4, 7)
+  TEST (5, 1, 4, 7)
+  TEST (6, 1, 4, 7)
+  TEST (7, 1, 4, 7)
+  TEST (0, 2, 4, 7)
+  TEST (1, 2, 4, 7)
+  TEST (2, 2, 4, 7)
+  TEST (3, 2, 4, 7)
+  TEST (4, 2, 4, 7)
+  TEST (5, 2, 4, 7)
+  TEST (6, 2, 4, 7)
+  TEST (7, 2, 4, 7)
+  TEST (0, 3, 4, 7)
+  TEST (1, 3, 4, 7)
+  TEST (2, 3, 4, 7)
+  TEST (3, 3, 4, 7)
+  TEST (4, 3, 4, 7)
+  TEST (5, 3, 4, 7)
+  TEST (6, 3, 4, 7)
+  TEST (7, 3, 4, 7)
+  TEST (0, 4, 4, 7)
+  TEST (1, 4, 4, 7)
+  TEST (2, 4, 4, 7)
+  TEST (3, 4, 4, 7)
+  TEST (4, 4, 4, 7)
+  TEST (5, 4, 4, 7)
+  TEST (6, 4, 4, 7)
+  TEST (7, 4, 4, 7)
+  TEST (0, 5, 4, 7)
+  TEST (1, 5, 4, 7)
+  TEST (2, 5, 4, 7)
+  TEST (3, 5, 4, 7)
+  TEST (4, 5, 4, 7)
+  TEST (5, 5, 4, 7)
+  TEST (6, 5, 4, 7)
+  TEST (7, 5, 4, 7)
+  TEST (0, 6, 4, 7)
+  TEST (1, 6, 4, 7)
+  TEST (2, 6, 4, 7)
+  TEST (3, 6, 4, 7)
+  TEST (4, 6, 4, 7)
+  TEST (5, 6, 4, 7)
+  TEST (6, 6, 4, 7)
+  TEST (7, 6, 4, 7)
+  TEST (0, 7, 4, 7)
+  TEST (1, 7, 4, 7)
+  TEST (2, 7, 4, 7)
+  TEST (3, 7, 4, 7)
+  TEST (4, 7, 4, 7)
+  TEST (5, 7, 4, 7)
+  TEST (6, 7, 4, 7)
+  TEST (7, 7, 4, 7)
+  TEST (0, 0, 5, 7)
+  TEST (1, 0, 5, 7)
+  TEST (2, 0, 5, 7)
+  TEST (3, 0, 5, 7)
+  TEST (4, 0, 5, 7)
+  TEST (5, 0, 5, 7)
+  TEST (6, 0, 5, 7)
+  TEST (7, 0, 5, 7)
+  TEST (0, 1, 5, 7)
+  TEST (1, 1, 5, 7)
+  TEST (2, 1, 5, 7)
+  TEST (3, 1, 5, 7)
+  TEST (4, 1, 5, 7)
+  TEST (5, 1, 5, 7)
+  TEST (6, 1, 5, 7)
+  TEST (7, 1, 5, 7)
+  TEST (0, 2, 5, 7)
+  TEST (1, 2, 5, 7)
+  TEST (2, 2, 5, 7)
+  TEST (3, 2, 5, 7)
+  TEST (4, 2, 5, 7)
+  TEST (5, 2, 5, 7)
+  TEST (6, 2, 5, 7)
+  TEST (7, 2, 5, 7)
+  TEST (0, 3, 5, 7)
+  TEST (1, 3, 5, 7)
+  TEST (2, 3, 5, 7)
+  TEST (3, 3, 5, 7)
+  TEST (4, 3, 5, 7)
+  TEST (5, 3, 5, 7)
+  TEST (6, 3, 5, 7)
+  TEST (7, 3, 5, 7)
+  TEST (0, 4, 5, 7)
+  TEST (1, 4, 5, 7)
+  TEST (2, 4, 5, 7)
+  TEST (3, 4, 5, 7)
+  TEST (4, 4, 5, 7)
+  TEST (5, 4, 5, 7)
+  TEST (6, 4, 5, 7)
+  TEST (7, 4, 5, 7)
+  TEST (0, 5, 5, 7)
+  TEST (1, 5, 5, 7)
+  TEST (2, 5, 5, 7)
+  TEST (3, 5, 5, 7)
+  TEST (4, 5, 5, 7)
+  TEST (5, 5, 5, 7)
+  TEST (6, 5, 5, 7)
+  TEST (7, 5, 5, 7)
+  TEST (0, 6, 5, 7)
+  TEST (1, 6, 5, 7)
+  TEST (2, 6, 5, 7)
+  TEST (3, 6, 5, 7)
+  TEST (4, 6, 5, 7)
+  TEST (5, 6, 5, 7)
+  TEST (6, 6, 5, 7)
+  TEST (7, 6, 5, 7)
+  TEST (0, 7, 5, 7)
+  TEST (1, 7, 5, 7)
+  TEST (2, 7, 5, 7)
+  TEST (3, 7, 5, 7)
+  TEST (4, 7, 5, 7)
+  TEST (5, 7, 5, 7)
+  TEST (6, 7, 5, 7)
+  TEST (7, 7, 5, 7)
+}
+
+void check31(void)
+{
+  TEST (0, 0, 6, 7)
+  TEST (1, 0, 6, 7)
+  TEST (2, 0, 6, 7)
+  TEST (3, 0, 6, 7)
+  TEST (4, 0, 6, 7)
+  TEST (5, 0, 6, 7)
+  TEST (6, 0, 6, 7)
+  TEST (7, 0, 6, 7)
+  TEST (0, 1, 6, 7)
+  TEST (1, 1, 6, 7)
+  TEST (2, 1, 6, 7)
+  TEST (3, 1, 6, 7)
+  TEST (4, 1, 6, 7)
+  TEST (5, 1, 6, 7)
+  TEST (6, 1, 6, 7)
+  TEST (7, 1, 6, 7)
+  TEST (0, 2, 6, 7)
+  TEST (1, 2, 6, 7)
+  TEST (2, 2, 6, 7)
+  TEST (3, 2, 6, 7)
+  TEST (4, 2, 6, 7)
+  TEST (5, 2, 6, 7)
+  TEST (6, 2, 6, 7)
+  TEST (7, 2, 6, 7)
+  TEST (0, 3, 6, 7)
+  TEST (1, 3, 6, 7)
+  TEST (2, 3, 6, 7)
+  TEST (3, 3, 6, 7)
+  TEST (4, 3, 6, 7)
+  TEST (5, 3, 6, 7)
+  TEST (6, 3, 6, 7)
+  TEST (7, 3, 6, 7)
+  TEST (0, 4, 6, 7)
+  TEST (1, 4, 6, 7)
+  TEST (2, 4, 6, 7)
+  TEST (3, 4, 6, 7)
+  TEST (4, 4, 6, 7)
+  TEST (5, 4, 6, 7)
+  TEST (6, 4, 6, 7)
+  TEST (7, 4, 6, 7)
+  TEST (0, 5, 6, 7)
+  TEST (1, 5, 6, 7)
+  TEST (2, 5, 6, 7)
+  TEST (3, 5, 6, 7)
+  TEST (4, 5, 6, 7)
+  TEST (5, 5, 6, 7)
+  TEST (6, 5, 6, 7)
+  TEST (7, 5, 6, 7)
+  TEST (0, 6, 6, 7)
+  TEST (1, 6, 6, 7)
+  TEST (2, 6, 6, 7)
+  TEST (3, 6, 6, 7)
+  TEST (4, 6, 6, 7)
+  TEST (5, 6, 6, 7)
+  TEST (6, 6, 6, 7)
+  TEST (7, 6, 6, 7)
+  TEST (0, 7, 6, 7)
+  TEST (1, 7, 6, 7)
+  TEST (2, 7, 6, 7)
+  TEST (3, 7, 6, 7)
+  TEST (4, 7, 6, 7)
+  TEST (5, 7, 6, 7)
+  TEST (6, 7, 6, 7)
+  TEST (7, 7, 6, 7)
+  TEST (0, 0, 7, 7)
+  TEST (1, 0, 7, 7)
+  TEST (2, 0, 7, 7)
+  TEST (3, 0, 7, 7)
+  TEST (4, 0, 7, 7)
+  TEST (5, 0, 7, 7)
+  TEST (6, 0, 7, 7)
+  TEST (7, 0, 7, 7)
+  TEST (0, 1, 7, 7)
+  TEST (1, 1, 7, 7)
+  TEST (2, 1, 7, 7)
+  TEST (3, 1, 7, 7)
+  TEST (4, 1, 7, 7)
+  TEST (5, 1, 7, 7)
+  TEST (6, 1, 7, 7)
+  TEST (7, 1, 7, 7)
+  TEST (0, 2, 7, 7)
+  TEST (1, 2, 7, 7)
+  TEST (2, 2, 7, 7)
+  TEST (3, 2, 7, 7)
+  TEST (4, 2, 7, 7)
+  TEST (5, 2, 7, 7)
+  TEST (6, 2, 7, 7)
+  TEST (7, 2, 7, 7)
+  TEST (0, 3, 7, 7)
+  TEST (1, 3, 7, 7)
+  TEST (2, 3, 7, 7)
+  TEST (3, 3, 7, 7)
+  TEST (4, 3, 7, 7)
+  TEST (5, 3, 7, 7)
+  TEST (6, 3, 7, 7)
+  TEST (7, 3, 7, 7)
+  TEST (0, 4, 7, 7)
+  TEST (1, 4, 7, 7)
+  TEST (2, 4, 7, 7)
+  TEST (3, 4, 7, 7)
+  TEST (4, 4, 7, 7)
+  TEST (5, 4, 7, 7)
+  TEST (6, 4, 7, 7)
+  TEST (7, 4, 7, 7)
+  TEST (0, 5, 7, 7)
+  TEST (1, 5, 7, 7)
+  TEST (2, 5, 7, 7)
+  TEST (3, 5, 7, 7)
+  TEST (4, 5, 7, 7)
+  TEST (5, 5, 7, 7)
+  TEST (6, 5, 7, 7)
+  TEST (7, 5, 7, 7)
+  TEST (0, 6, 7, 7)
+  TEST (1, 6, 7, 7)
+  TEST (2, 6, 7, 7)
+  TEST (3, 6, 7, 7)
+  TEST (4, 6, 7, 7)
+  TEST (5, 6, 7, 7)
+  TEST (6, 6, 7, 7)
+  TEST (7, 6, 7, 7)
+  TEST (0, 7, 7, 7)
+  TEST (1, 7, 7, 7)
+  TEST (2, 7, 7, 7)
+  TEST (3, 7, 7, 7)
+  TEST (4, 7, 7, 7)
+  TEST (5, 7, 7, 7)
+  TEST (6, 7, 7, 7)
+  TEST (7, 7, 7, 7)
+}
+
+void check(void)
+{
+  check0 ();
+  check1 ();
+  check2 ();
+  check3 ();
+  check4 ();
+  check5 ();
+  check6 ();
+  check7 ();
+  check8 ();
+  check9 ();
+  check10 ();
+  check11 ();
+  check12 ();
+  check13 ();
+  check14 ();
+  check15 ();
+  check16 ();
+  check17 ();
+  check18 ();
+  check19 ();
+  check20 ();
+  check21 ();
+  check22 ();
+  check23 ();
+  check24 ();
+  check25 ();
+  check26 ();
+  check27 ();
+  check28 ();
+  check29 ();
+  check30 ();
+  check31 ();
+}
+
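(Editor's note.) Every TEST line in the generated file above checks the builtin against a scalar reference, which the TEST macros build with the c.s[...] assignments. An index E >= 4 in `i[0].s[E]` reads past `i[0]` into the adjacent `i[1]`. The tests rely on that array layout, not on anything ISO C guarantees, so the expected result is element E of the concatenation of the two operands. The following is a minimal C sketch of that reference semantics. `vec_perm_ref` and `NELT` are illustrative names and are not part of the patch. The sketch assumes selector values stay in [0, 2*NELT), which is the only range the generated tests use.

```c
#include <assert.h>

/* Elements per vector (4 for the V4SF/V4SI tests).  */
#define NELT 4

/* Hypothetical scalar reference for a two-operand permute: result element k
   is element sel[k] of the 2*NELT-element concatenation {a, b}.  Selectors
   outside [0, 2*NELT) are not exercised by the tests, so they are rejected.  */
void
vec_perm_ref (const int a[NELT], const int b[NELT], const int sel[NELT],
              int out[NELT])
{
  for (int k = 0; k < NELT; ++k)
    {
      int s = sel[k];
      assert (s >= 0 && s < 2 * NELT);
      out[k] = s < NELT ? a[s] : b[s - NELT];
    }
}
```

main() stores 0..3 in `i[0]` and 4..7 in `i[1]`, so each element's value equals its index in the concatenation. As a result, the expected output of every TEST is the selector itself. For example, TEST (0, 0, 6, 7) expects {0, 0, 6, 7}.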
diff --git a/gcc/testsuite/gcc.target/i386/vperm-v2df.c b/gcc/testsuite/gcc.target/i386/vperm-v2df.c
new file mode 100644
index 0000000..f17e065
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vperm-v2df.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-O -msse2" } */
+
+#include "isa-check.h"
+
+typedef double S;
+typedef double V __attribute__((vector_size(16)));
+typedef long long IV __attribute__((vector_size(16)));
+typedef union { S s[2]; V v; } U;
+
+static U i[2], b, c;
+
+extern int memcmp (const void *, const void *, __SIZE_TYPE__);
+#define assert(T) ((T) || (__builtin_trap (), 0))
+
+#define TEST(E0, E1) \
+  b.v = __builtin_ia32_vec_perm_v2df (i[0].v, i[1].v, (IV){E0, E1}); \
+  c.s[0] = i[0].s[E0]; \
+  c.s[1] = i[0].s[E1]; \
+  __asm__("" : : : "memory"); \
+  assert (memcmp (&b, &c, sizeof(c)) == 0);
+
+#include "vperm-2-2.inc"
+
+int main()
+{
+  i[0].s[0] = 0;
+  i[0].s[1] = 1;
+  i[0].s[2] = 2;
+  i[0].s[3] = 3;
+
+  check();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/vperm-v2di.c b/gcc/testsuite/gcc.target/i386/vperm-v2di.c
new file mode 100644
index 0000000..c6fe561
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vperm-v2di.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-O -msse2" } */
+
+#include "isa-check.h"
+
+typedef long long S;
+typedef long long V __attribute__((vector_size(16)));
+typedef long long IV __attribute__((vector_size(16)));
+typedef union { S s[2]; V v; } U;
+
+static U i[2], b, c;
+
+extern int memcmp (const void *, const void *, __SIZE_TYPE__);
+#define assert(T) ((T) || (__builtin_trap (), 0))
+
+#define TEST(E0, E1) \
+  b.v = __builtin_ia32_vec_perm_v2di (i[0].v, i[1].v, (IV){E0, E1}); \
+  c.s[0] = i[0].s[E0]; \
+  c.s[1] = i[0].s[E1]; \
+  __asm__("" : : : "memory"); \
+  assert (memcmp (&b, &c, sizeof(c)) == 0);
+
+#include "vperm-2-2.inc"
+
+int main()
+{
+  i[0].s[0] = 0;
+  i[0].s[1] = 1;
+  i[0].s[2] = 2;
+  i[0].s[3] = 3;
+
+  check();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/vperm-v4sf-1.c b/gcc/testsuite/gcc.target/i386/vperm-v4sf-1.c
new file mode 100644
index 0000000..b9fc9b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vperm-v4sf-1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-options "-O -msse" } */
+
+#include "isa-check.h"
+
+typedef float S;
+typedef float V __attribute__((vector_size(16)));
+typedef int IV __attribute__((vector_size(16)));
+typedef union { S s[4]; V v; } U;
+
+static U i[2], b, c;
+
+extern int memcmp (const void *, const void *, __SIZE_TYPE__);
+#define assert(T) ((T) || (__builtin_trap (), 0))
+
+#define TEST(E0, E1, E2, E3) \
+  b.v = __builtin_ia32_vec_perm_v4sf (i[0].v, i[1].v, (IV){E0, E1, E2, E3}); \
+  c.s[0] = i[0].s[E0]; \
+  c.s[1] = i[0].s[E1]; \
+  c.s[2] = i[0].s[E2]; \
+  c.s[3] = i[0].s[E3]; \
+  __asm__("" : : : "memory"); \
+  assert (memcmp (&b, &c, sizeof(c)) == 0);
+
+#include "vperm-4-1.inc"
+
+int main()
+{
+  i[0].s[0] = 0;
+  i[0].s[1] = 1;
+  i[0].s[2] = 2;
+  i[0].s[3] = 3;
+  i[0].s[4] = 4;
+  i[0].s[5] = 5;
+  i[0].s[6] = 6;
+  i[0].s[7] = 7;
+
+  check();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/vperm-v4sf-2.c b/gcc/testsuite/gcc.target/i386/vperm-v4sf-2.c
new file mode 100644
index 0000000..f81d241
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vperm-v4sf-2.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-options "-O -mssse3" } */
+
+#include "isa-check.h"
+
+typedef float S;
+typedef float V __attribute__((vector_size(16)));
+typedef int IV __attribute__((vector_size(16)));
+typedef union { S s[4]; V v; } U;
+
+static U i[2], b, c;
+
+extern int memcmp (const void *, const void *, __SIZE_TYPE__);
+#define assert(T) ((T) || (__builtin_trap (), 0))
+
+#define TEST(E0, E1, E2, E3) \
+  b.v = __builtin_ia32_vec_perm_v4sf (i[0].v, i[1].v, (IV){E0, E1, E2, E3}); \
+  c.s[0] = i[0].s[E0]; \
+  c.s[1] = i[0].s[E1]; \
+  c.s[2] = i[0].s[E2]; \
+  c.s[3] = i[0].s[E3]; \
+  __asm__("" : : : "memory"); \
+  assert (memcmp (&b, &c, sizeof(c)) == 0);
+
+#include "vperm-4-2.inc"
+
+int main()
+{
+  i[0].s[0] = 0;
+  i[0].s[1] = 1;
+  i[0].s[2] = 2;
+  i[0].s[3] = 3;
+  i[0].s[4] = 4;
+  i[0].s[5] = 5;
+  i[0].s[6] = 6;
+  i[0].s[7] = 7;
+
+  check();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/vperm-v4si-1.c b/gcc/testsuite/gcc.target/i386/vperm-v4si-1.c
new file mode 100644
index 0000000..663feb3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vperm-v4si-1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-options "-O -msse2" } */
+
+#include "isa-check.h"
+
+typedef int S;
+typedef int V __attribute__((vector_size(16)));
+typedef int IV __attribute__((vector_size(16)));
+typedef union { S s[4]; V v; } U;
+
+static U i[2], b, c;
+
+extern int memcmp (const void *, const void *, __SIZE_TYPE__);
+#define assert(T) ((T) || (__builtin_trap (), 0))
+
+#define TEST(E0, E1, E2, E3) \
+  b.v = __builtin_ia32_vec_perm_v4si (i[0].v, i[1].v, (IV){E0, E1, E2, E3}); \
+  c.s[0] = i[0].s[E0]; \
+  c.s[1] = i[0].s[E1]; \
+  c.s[2] = i[0].s[E2]; \
+  c.s[3] = i[0].s[E3]; \
+  __asm__("" : : : "memory"); \
+  assert (memcmp (&b, &c, sizeof(c)) == 0);
+
+#include "vperm-4-1.inc"
+
+int main()
+{
+  i[0].s[0] = 0;
+  i[0].s[1] = 1;
+  i[0].s[2] = 2;
+  i[0].s[3] = 3;
+  i[0].s[4] = 4;
+  i[0].s[5] = 5;
+  i[0].s[6] = 6;
+  i[0].s[7] = 7;
+
+  check();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/vperm-v4si-2.c b/gcc/testsuite/gcc.target/i386/vperm-v4si-2.c
new file mode 100644
index 0000000..0da953b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vperm-v4si-2.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-options "-O -mssse3" } */
+
+#include "isa-check.h"
+
+typedef int S;
+typedef int V __attribute__((vector_size(16)));
+typedef int IV __attribute__((vector_size(16)));
+typedef union { S s[4]; V v; } U;
+
+static U i[2], b, c;
+
+extern int memcmp (const void *, const void *, __SIZE_TYPE__);
+#define assert(T) ((T) || (__builtin_trap (), 0))
+
+#define TEST(E0, E1, E2, E3) \
+  b.v = __builtin_ia32_vec_perm_v4si (i[0].v, i[1].v, (IV){E0, E1, E2, E3}); \
+  c.s[0] = i[0].s[E0]; \
+  c.s[1] = i[0].s[E1]; \
+  c.s[2] = i[0].s[E2]; \
+  c.s[3] = i[0].s[E3]; \
+  __asm__("" : : : "memory"); \
+  assert (memcmp (&b, &c, sizeof(c)) == 0);
+
+#include "vperm-4-2.inc"
+
+int main()
+{
+  i[0].s[0] = 0;
+  i[0].s[1] = 1;
+  i[0].s[2] = 2;
+  i[0].s[3] = 3;
+  i[0].s[4] = 4;
+  i[0].s[5] = 5;
+  i[0].s[6] = 6;
+  i[0].s[7] = 7;
+
+  check();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/vperm-v4si-2x.c b/gcc/testsuite/gcc.target/i386/vperm-v4si-2x.c
new file mode 100644
index 0000000..4410d93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vperm-v4si-2x.c
@@ -0,0 +1,3 @@
+/* { dg-do run } */
+/* { dg-options "-O -mxop" } */
+#include "vperm-v4si-2.c"
diff --git a/gcc/testsuite/gcc.target/i386/vperm.pl b/gcc/testsuite/gcc.target/i386/vperm.pl
new file mode 100755
index 0000000..80fae9d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vperm.pl
@@ -0,0 +1,41 @@
+#!/usr/bin/perl
+
+$nelt = int($ARGV[0]);
+$leng = int($ARGV[1]);
+
+print "/* This file auto-generated with ./vperm.pl $nelt $leng.  */\n\n";
+
+for ($i = 0; $i < $nelt; ++$i) { $perm[$i] = 0; }
+$ncheck = 0;
+
+for ($i = 0; $i < ($leng * $nelt) ** $nelt; ++$i)
+{
+  if ($i % 128 == 0)
+  {
+    print "}\n\n" if $ncheck > 0;
+    print "void check$ncheck(void)\n{\n";
+    ++$ncheck;
+  }
+
+  print "  TEST (";
+  for ($j = 0; $j < $nelt; ++$j)
+  {
+    print $perm[$j];
+    print ", " if $j < $nelt - 1;
+  }
+  print ")\n";
+
+  INCR: for ($j = 0; $j < $nelt; ++$j)
+  {
+    last INCR if ++$perm[$j] < $leng * $nelt;
+    $perm[$j] = 0;
+  }
+}
+print "}\n\n";
+
+print "void check(void)\n{\n";
+for ($i = 0; $i < $ncheck; ++$i)
+{
+  print "  check$i ();\n";
+}
+print "}\n\n";
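(Editor's note.) vperm.pl increments perm[] like an odometer in base leng*nelt, with perm[0] as the fastest-moving digit. It emits (leng*nelt)**nelt TEST lines and starts a new checkN() every 128 tests. With arguments 4 and 2, which presumably generated vperm-4-2.inc, that gives 8**4 = 4096 tests in check0 through check31. Test number 31*128 = 3968 is 7*512 + 6*64, so it is TEST (0, 0, 6, 7), the first line of check31 above. The following is a C sketch of the same arithmetic. The helper names are hypothetical and do not come from the patch.

```c
#include <assert.h>

/* Tests per generated checkN() function, as in vperm.pl.  */
#define TESTS_PER_CHECK 128

/* Total TEST lines emitted: (leng * nelt) ** nelt.  */
long
vperm_total_tests (int nelt, int leng)
{
  long base = (long) leng * nelt, total = 1;
  for (int j = 0; j < nelt; ++j)
    total *= base;
  return total;
}

/* Number of checkN() functions: one per started block of 128 tests.  */
long
vperm_num_checks (int nelt, int leng)
{
  long total = vperm_total_tests (nelt, leng);
  return (total + TESTS_PER_CHECK - 1) / TESTS_PER_CHECK;
}

/* Selector printed by the T-th TEST line: perm[j] is digit j of T written
   in base leng * nelt (least significant digit first).  */
void
vperm_selector (long t, int nelt, int leng, int perm[])
{
  long base = (long) leng * nelt;
  assert (t >= 0 && t < vperm_total_tests (nelt, leng));
  for (int j = 0; j < nelt; ++j)
    {
      perm[j] = (int) (t % base);
      t /= base;
    }
}
```

Computing the selector directly from the test index like this makes it easy to confirm which checkN() contains a given TEST line without regenerating the file.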
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4b8d6f3..5155cba 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2562,7 +2562,9 @@ proc check_effective_target_vect_extract_even_odd { } {
         verbose "check_effective_target_vect_extract_even_odd: using cached result" 2
     } else {
         set et_vect_extract_even_odd_saved 0 
-        if { [istarget powerpc*-*-*]
+        if { [istarget powerpc*-*-*] 
+             || [istarget i?86-*-*]
+             || [istarget x86_64-*-*]
              || [istarget spu-*-*] } {
            set et_vect_extract_even_odd_saved 1
         }

* Re: Vector permutation support for x86
  2009-11-26  3:55 Vector permutation support for x86 Richard Henderson
@ 2009-11-27 23:54 ` H.J. Lu
  2010-04-17  5:27   ` H.J. Lu
  2009-11-30 18:36 ` Sebastian Pop
  1 sibling, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2009-11-27 23:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Sebastian Pop

On Wed, Nov 25, 2009 at 6:56 PM, Richard Henderson <rth@redhat.com> wrote:
> The following implements the builtin_vec_perm hook so that the vectorizer
> can do its SLP thing.  As noted elsewhere, ISAs before SSSE3 cannot
> arbitrarily permute, so this complicates things a bit.  But even given
> SSSE3, the arbitrary two-vector permute costs 3 insns, and so we would want
> to do most of this work to find the 1 and 2 insn special cases.
>
> For the AMD folk: I tried to support the vpperm insn from the XOP ISA, but
> there seems to be some disconnect between trunk binutils and trunk gcc wrt
> vpperm.  This can be seen in the failure of the new test "vperm-v4si-2x.c".
>  I'm looking at the XOP spec labeled "Pub No 43479, Rev 3.03, May 2009", and
> what gcc is emitting looks ok.  But I've already been bitten by an
> out-of-date AVX spec during this adventure, so I'd appreciate some
> double-check.
>
> Tested on an i7 machine (i.e. sse4.2).
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42193


-- 
H.J.

* Re: Vector permutation support for x86
  2009-11-26  3:55 Vector permutation support for x86 Richard Henderson
  2009-11-27 23:54 ` H.J. Lu
@ 2009-11-30 18:36 ` Sebastian Pop
  2009-11-30 20:40   ` Richard Henderson
  1 sibling, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-11-30 18:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

Hi,

On Wed, Nov 25, 2009 at 20:56, Richard Henderson <rth@redhat.com> wrote:
> For the AMD folk: I tried to support the vpperm insn from the XOP ISA, but
> there seems to be some disconnect between trunk binutils and trunk gcc wrt
> vpperm.  This can be seen in the failure of the new test "vperm-v4si-2x.c".
>  I'm looking at the XOP spec labeled "Pub No 43479, Rev 3.03, May 2009", and
> what gcc is emitting looks ok.  But I've already been bitten by an
> out-of-date AVX spec during this adventure, so I'd appreciate some
> double-check.
>

An updated manual for XOP has been posted a week ago, see:
http://support.amd.com/us/Processor_TechDocs/43479.pdf

I do not think that this pattern generated by GCC is correct:

	vpperm	%xmm1, %xmm0, i(%rip), %xmm0

for which I get with trunk GAS this error:

/home/seb/gcc/trunk/gcc/testsuite/gcc.target/i386/vperm-v4si-2x.s:147:
Error: suffix or operands invalid for `vpperm'

The manual specifies the VPPERM insn like this:

VPPERM dest, src1, src2, selector
VPPERM xmm1, xmm2, xmm3, xmm4/mem128
VPPERM xmm1, xmm2, xmm3/mem128, xmm4

and so the pattern for the above insn is this one:

VPPERM xmm1, xmm2, xmm3/mem128, xmm4

Now the AMD manual follows the Intel/Masm syntax whereas GCC generates
the AT&T syntax, and so I think that GCC should instead generate the
memory in the second operand, something like this probably:

	vpperm  %xmm0, i(%rip), %xmm0, %xmm1
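
The argument above comes down to two rules. First, AT&T syntax lists the operands in the reverse of the manual's Intel order. Second, the manual only allows a memory operand in the src2 or the selector slot, never both. The hypothetical C checker below encodes just those two rules, taken from the forms quoted above. It is a reading aid and does not reflect how GAS actually validates operands.

```c
#include <assert.h>
#include <stdbool.h>

/* Operand slots in the manual's Intel/MASM order:
   VPPERM dest, src1, src2, selector.  */
enum { DEST, SRC1, SRC2, SEL, NOPS };

/* AT&T syntax lists the same operands in reverse order.  */
static int
att_to_intel (int att_pos)
{
  assert (att_pos >= 0 && att_pos < NOPS);
  return NOPS - 1 - att_pos;
}

/* Check an AT&T-ordered operand list against the two quoted forms:
     VPPERM xmm1, xmm2, xmm3, xmm4/mem128   (selector may be memory)
     VPPERM xmm1, xmm2, xmm3/mem128, xmm4   (src2 may be memory)
   IS_MEM[i] says whether AT&T operand i is a memory reference.  */
static bool
vpperm_att_operands_ok (const bool is_mem[NOPS])
{
  int nmem = 0;

  for (int att = 0; att < NOPS; ++att)
    {
      if (!is_mem[att])
        continue;
      int intel = att_to_intel (att);
      if (intel != SRC2 && intel != SEL)
        return false;           /* dest and src1 must be registers */
      ++nmem;
    }
  return nmem <= 1;             /* at most one memory operand */
}
```

Applied to the two lines above, it rejects GCC's output (the memory operand lands in the src1 slot) and accepts the suggested form (the memory operand is in the src2 slot).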

Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

* Re: Vector permutation support for x86
  2009-11-30 18:36 ` Sebastian Pop
@ 2009-11-30 20:40   ` Richard Henderson
  2009-11-30 21:07     ` Sebastian Pop
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2009-11-30 20:40 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: GCC Patches

On 11/30/2009 10:35 AM, Sebastian Pop wrote:
> An updated manual for XOP has been posted a week ago, see:
> http://support.amd.com/us/Processor_TechDocs/43479.pdf

Ok, so it looks like the gcc XOP+FMA4 support is *well* out of date.

The function ix86_fma4_valid_op_p assumes encoding forms
that do not exist in this specification.  It looks like
everything could be fixed via removing the 3rd alternative
on all of these 4 operand patterns (x/xm/x/x) and simplifying

  - && ix86_fma4_valid_op_p (operands, insn, .*)
  + && !(MEM_P (operands[2]) && MEM_P (operands[3]))

and remove the ix86_fma4_valid_op_p function.

Also it would appear that the splitters that follow many of
the patterns really aren't necessary any more.
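
Here is a small model of why the simplified condition seems sufficient. Once the x/xm/x/x alternative is gone, the two remaining alternatives need operand 1 in a register. They allow memory in operand 2 or in operand 3, but not in both. Under that assumption, !(MEM_P (operands[2]) && MEM_P (operands[3])) accepts exactly the operand combinations that some alternative can satisfy. The enum and function names below are illustrative only; the real patterns test MEM_P on rtx operands.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand classification standing in for MEM_P.  */
enum op_kind { OP_REG, OP_MEM };

/* The two alternatives left after dropping x/xm/x/x:
     alt 1: op0 "x", op1 "x", op2 "x",  op3 "xm"
     alt 2: op0 "x", op1 "x", op2 "xm", op3 "x"   */
static bool
fma4_some_alternative_fits (enum op_kind op1, enum op_kind op2,
                            enum op_kind op3)
{
  bool alt1 = op1 == OP_REG && op2 == OP_REG;
  bool alt2 = op1 == OP_REG && op3 == OP_REG;
  return alt1 || alt2;
}

/* The proposed insn condition.  */
static bool
fma4_insn_condition (enum op_kind op2, enum op_kind op3)
{
  return !(op2 == OP_MEM && op3 == OP_MEM);
}

/* With operand 1 in a register, the condition and the alternatives
   agree on every register/memory combination of operands 2 and 3.  */
static bool
condition_matches_alternatives (void)
{
  for (int k2 = OP_REG; k2 <= OP_MEM; ++k2)
    for (int k3 = OP_REG; k3 <= OP_MEM; ++k3)
      if (fma4_insn_condition (k2, k3)
          != fma4_some_alternative_fits (OP_REG, k2, k3))
        return false;
  return true;
}
```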

Can you arrange for someone from AMD to make this change and
get it tested before gcc 4.5 is released?


r~

* Re: Vector permutation support for x86
  2009-11-30 20:40   ` Richard Henderson
@ 2009-11-30 21:07     ` Sebastian Pop
  2009-12-02 19:53       ` Sebastian Pop
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-11-30 21:07 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

On Mon, Nov 30, 2009 at 14:32, Richard Henderson <rth@redhat.com> wrote:
> On 11/30/2009 10:35 AM, Sebastian Pop wrote:
>>
>> An updated manual for XOP has been posted a week ago, see:
>> http://support.amd.com/us/Processor_TechDocs/43479.pdf
>
> Ok, so it looks like the gcc XOP+FMA4 support is *well* out of date.
>
> The function ix86_fma4_valid_op_p assumes encoding forms
> that do not exist in this specification.  It looks like
> everything could be fixed via removing the 3rd alternative
> on all of these 4 operand patterns (x/xm/x/x) and simplifying
>
>  - && ix86_fma4_valid_op_p (operands, insn, .*)
>  + && !(MEM_P (operands[2]) && MEM_P (operands[3]))
>
> and remove the ix86_fma4_valid_op_p function.
>
> Also it would appear that the splitters that follow many of
> the patterns really aren't necessary any more.
>
> Can you arrange for someone from AMD to make this change and
> get it tested before gcc 4.5 is released?

I will do this.

Sebastian

* Re: Vector permutation support for x86
  2009-11-30 21:07     ` Sebastian Pop
@ 2009-12-02 19:53       ` Sebastian Pop
  0 siblings, 0 replies; 45+ messages in thread
From: Sebastian Pop @ 2009-12-02 19:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2612 bytes --]

On Mon, Nov 30, 2009 at 14:56, Sebastian Pop <sebastian.pop@amd.com> wrote:
> On Mon, Nov 30, 2009 at 14:32, Richard Henderson <rth@redhat.com> wrote:
>> On 11/30/2009 10:35 AM, Sebastian Pop wrote:
>>>
>>> An updated manual for XOP has been posted a week ago, see:
>>> http://support.amd.com/us/Processor_TechDocs/43479.pdf
>>
>> Ok, so it looks like the gcc XOP+FMA4 support is *well* out of date.
>>
>> The function ix86_fma4_valid_op_p assumes encoding forms
>> that do not exist in this specification.  It looks like
>> everything could be fixed via removing the 3rd alternative
>> on all of these 4 operand patterns (x/xm/x/x) and simplifying
>>
>>  - && ix86_fma4_valid_op_p (operands, insn, .*)
>>  + && !(MEM_P (operands[2]) && MEM_P (operands[3]))
>>
>> and remove the ix86_fma4_valid_op_p function.
>>
>> Also it would appear that the splitters that follow many of
>> the patterns really aren't necessary any more.
>>
>> Can you arrange for someone from AMD to make this change and
>> get it tested before gcc 4.5 is released?
>
> I will do this.

Attached are the patches that fix these problems.

2009-12-02  Richard Henderson  <rth@redhat.com>

        * config/i386/i386.c (ix86_fixup_binary_operands): For FMA4, force
        all operands into registers.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

        * config/i386/sse.md (fma4_*): Remove alternative with operand 1
        matching a memory access.  Do not use ix86_fma4_valid_op_p.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

        * config/i386/i386.c (ix86_expand_fma4_multiple_memory): Remove unused
        parameter.
        * config/i386/i386-protos.h (ix86_expand_fma4_multiple_memory): Same.
        * config/i386/sse.md: Same.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

        * config/i386/sse.md: Do not use ix86_fma4_valid_op_p in FMA4
        splitters.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

        * config/i386/sse.md: Do not use ix86_fma4_valid_op_p in XOP
        splitters.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/i386-protos.h (ix86_fma4_valid_op_p): Removed.
	* config/i386/i386.c (ix86_fma4_valid_op_p): Removed.
	* config/i386/i386.md: Do not use ix86_fma4_valid_op_p.
	* config/i386/sse.md (xop_*): Remove alternative with operand 1
	matching a memory access.  Do not use ix86_fma4_valid_op_p.
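
As a reading aid for the fma4_* patterns in the attached sse.md patch, the scalar C below spells out what each RTL form computes. Here a, b and c stand for match_operands 1, 2 and 3. The sketch also models vec_merge, where a set mask bit takes that element from the first vector. This shows that the V2DF vfmaddsubpd pattern (mask 2) subtracts in element 0 and adds in element 1. The minus arm of that pattern is elided in this excerpt and is assumed here to be (mult 1 2) - 3. The sketch rounds after the multiply and again after the add, so it models the algebra only, not the single rounding of a fused multiply-add.

```c
#include <assert.h>

/* a, b, c stand for match_operand 1, 2 and 3 of each pattern.  */
static double fmadd (double a, double b, double c)  { return a * b + c; }  /* (plus (mult 1 2) 3) */
static double fmsub (double a, double b, double c)  { return a * b - c; }  /* (minus (mult 1 2) 3) */
static double fnmadd (double a, double b, double c) { return c - a * b; }  /* (minus 3 (mult 1 2)) */
static double fnmsub (double a, double b, double c) { return -a * b - c; } /* (minus (mult (neg 1) 2) 3) */

/* (vec_merge V1 V2 MASK): element i comes from V1 when bit i of MASK
   is set, otherwise from V2.  */
static void
vec_merge (double *out, const double *v1, const double *v2, int n,
           unsigned long mask)
{
  assert (n > 0 && n <= 32);
  for (int i = 0; i < n; ++i)
    out[i] = ((mask >> i) & 1) ? v1[i] : v2[i];
}

/* V2DF vfmaddsubpd as written in the pattern: mask 2 takes element 1
   from the plus arm and element 0 from the minus arm.  */
static void
fmaddsub_v2df (double *out, const double *a, const double *b,
               const double *c)
{
  double add[2], sub[2];

  for (int i = 0; i < 2; ++i)
    {
      add[i] = fmadd (a[i], b[i], c[i]);
      sub[i] = fmsub (a[i], b[i], c[i]);
    }
  vec_merge (out, add, sub, 2, 2);
}
```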

I'm bootstrapping and testing these patches on amd64-linux.  Ok for
trunk if this passes?

Thanks,
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

[-- Attachment #2: 0001-For-FMA4-force-all-operands-into-registers.patch --]
[-- Type: text/x-patch, Size: 1174 bytes --]

From 78a674aa0bcbcfdc85c23748fef211dc380a92b3 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Wed, 2 Dec 2009 13:15:48 -0600
Subject: [PATCH] For FMA4, force all operands into registers.

2009-12-02  Richard Henderson  <rth@redhat.com>

	* config/i386/i386.c (ix86_fixup_binary_operands): For FMA4, force
	all operands into registers.
---
 gcc/config/i386/i386.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 462f2d5..ae43198 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13384,6 +13384,16 @@ ix86_fixup_binary_operands (enum rtx_code code, enum machine_mode mode,
   if (MEM_P (src1) && !rtx_equal_p (dst, src1))
     src1 = force_reg (mode, src1);
 
+  /* In order for the multiply-add patterns to get matched, we need
+     to aid combine by forcing all operands into registers to start.  */
+  if (optimize && TARGET_FMA4)
+    {
+      if (MEM_P (src2))
+	src2 = force_reg (mode, src2);
+      else if (MEM_P (src1))
+	src1 = force_reg (mode, src1);
+    }
+
   operands[1] = src1;
   operands[2] = src2;
   return dst;
-- 
1.6.0.4
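
The hunk above forces memory operands into registers so that, according to its comment, combine can match the multiply-add patterns. A loop like the one below, where a multiply feeds an add, is the kind of source those patterns target once it is vectorized into (plus (mult ...) ...). This is only a sketch with a made-up function name. Whether a vfmadd instruction is actually emitted depends on the compiler version and flags (at least optimization and -mfma4), and that has not been checked here.

```c
#include <assert.h>

/* out[i] = a[i] * b[i] + c[i]: a multiply whose result feeds an add.  */
static void
madd (float *restrict out, const float *restrict a,
      const float *restrict b, const float *restrict c, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; ++i)
    out[i] = a[i] * b[i] + c[i];
}
```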


[-- Attachment #3: 0002-Fix-FMA4-insns.patch --]
[-- Type: text/x-patch, Size: 29193 bytes --]

From 12e043cc0bb55cb5fcdecf76f2ccc72e24f24983 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 1 Dec 2009 13:05:57 -0600
Subject: [PATCH] Fix FMA4 insns.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/sse.md (fma4_*): Remove alternative with operand 1
	matching a memory access.  Do not use ix86_fma4_valid_op_p.
---
 gcc/config/i386/sse.md |  224 +++++++++++++++++++++---------------------------
 1 files changed, 98 insertions(+), 126 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 08a3b5b..381035d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1703,14 +1703,13 @@
 ;;	(set (reg3) (plus (reg2) (mem (addr3))))
 
 (define_insn "fma4_fmadd<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(plus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1740,14 +1739,13 @@
 ;; Floating multiply and subtract
 ;; Allow two memory operands the same as fmadd
 (define_insn "fma4_fmsub<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1779,14 +1777,13 @@
 ;; Note operands are out of order to simplify call to ix86_fma4_valid_p
 ;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1821,11 +1818,10 @@
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
 	  (neg:FMA4MODEF4
-	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x"))
+	   (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1855,14 +1851,13 @@
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (define_insn "fma4_fmadd<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(plus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1897,13 +1892,12 @@
 	(vec_merge:SSEMODEF2P
 	 (plus:SSEMODEF2P
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
 	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1911,14 +1905,13 @@
 ;; Floating multiply and subtract
 ;; Allow two memory operands the same as fmadd
 (define_insn "fma4_fmsub<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1953,13 +1946,12 @@
 	(vec_merge:SSEMODEF2P
 	 (minus:SSEMODEF2P
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
 	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1969,14 +1961,13 @@
 ;; Note operands are out of order to simplify call to ix86_fma4_valid_p
 ;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2012,12 +2003,11 @@
 	 (minus:SSEMODEF2P
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
 	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2030,11 +2020,10 @@
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
 	  (neg:SSEMODEF4
-	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x"))
+	   (match_operand:SSEMODEF4 1 "register_operand" "%x,x"))
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2071,13 +2060,12 @@
 	 (minus:SSEMODEF2P
 	  (mult:SSEMODEF2P
 	   (neg:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x"))
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
 	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2089,11 +2077,11 @@
 	(unspec:FMA4MODEF4
 	 [(plus:FMA4MODEF4
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2103,11 +2091,11 @@
 	(unspec:FMA4MODEF4
 	 [(minus:FMA4MODEF4
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2118,10 +2106,10 @@
 	 [(minus:FMA4MODEF4
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2132,11 +2120,11 @@
 	 [(minus:FMA4MODEF4
 	   (mult:FMA4MODEF4
 	    (neg:FMA4MODEF4
-	     (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x"))
+	     (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2147,11 +2135,11 @@
 	(unspec:SSEMODEF2P
 	 [(plus:SSEMODEF2P
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2161,11 +2149,11 @@
 	(unspec:SSEMODEF2P
 	 [(minus:SSEMODEF2P
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2176,10 +2164,10 @@
 	 [(minus:SSEMODEF2P
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2190,11 +2178,11 @@
 	 [(minus:SSEMODEF2P
 	   (mult:SSEMODEF2P
 	    (neg:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x"))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2207,13 +2195,13 @@
 	 [(vec_merge:SSEMODEF2P
 	   (plus:SSEMODEF2P
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "x,x")
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
 	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2224,13 +2212,13 @@
 	 [(vec_merge:SSEMODEF2P
 	   (minus:SSEMODEF2P
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "x,x")
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
 	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2242,12 +2230,12 @@
 	   (minus:SSEMODEF2P
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
 	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2259,13 +2247,13 @@
 	   (minus:SSEMODEF2P
 	    (mult:SSEMODEF2P
 	     (neg:SSEMODEF2P
-	      (match_operand:SSEMODEF2P 1 "register_operand" "x,x"))
+	      (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
 	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2281,7 +2269,7 @@
 	(vec_merge:V8SF
 	  (plus:V8SF
 	    (mult:V8SF
-	      (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V8SF 1 "register_operand" "%x,x")
 	      (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
 	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V8SF
@@ -2290,8 +2278,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 170)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2301,7 +2288,7 @@
 	(vec_merge:V4DF
 	  (plus:V4DF
 	    (mult:V4DF
-	      (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V4DF 1 "register_operand" "%x,x")
 	      (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
 	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4DF
@@ -2310,8 +2297,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2321,7 +2307,7 @@
 	(vec_merge:V4SF
 	  (plus:V4SF
 	    (mult:V4SF
-	      (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V4SF 1 "register_operand" "%x,x")
 	      (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
 	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4SF
@@ -2330,8 +2316,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2341,7 +2326,7 @@
 	(vec_merge:V2DF
 	  (plus:V2DF
 	    (mult:V2DF
-	      (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V2DF 1 "register_operand" "%x,x")
 	      (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
 	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V2DF
@@ -2350,8 +2335,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 2)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2361,7 +2345,7 @@
 	(vec_merge:V8SF
 	  (plus:V8SF
 	    (mult:V8SF
-	      (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V8SF 1 "register_operand" "%x,x")
 	      (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
 	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V8SF
@@ -2370,8 +2354,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 85)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2381,7 +2364,7 @@
 	(vec_merge:V4DF
 	  (plus:V4DF
 	    (mult:V4DF
-	      (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V4DF 1 "register_operand" "%x,x")
 	      (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
 	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4DF
@@ -2390,8 +2373,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2401,7 +2383,7 @@
 	(vec_merge:V4SF
 	  (plus:V4SF
 	    (mult:V4SF
-	      (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V4SF 1 "register_operand" "%x,x")
 	      (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
 	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4SF
@@ -2410,8 +2392,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2421,7 +2402,7 @@
 	(vec_merge:V2DF
 	  (plus:V2DF
 	    (mult:V2DF
-	      (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
+	      (match_operand:V2DF 1 "register_operand" "%x,x")
 	      (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
 	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V2DF
@@ -2430,8 +2411,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2444,7 +2424,7 @@
 	 [(vec_merge:V8SF
 	   (plus:V8SF
 	     (mult:V8SF
-	       (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V8SF 1 "register_operand" "%x,x")
 	       (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
 	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V8SF
@@ -2454,8 +2434,7 @@
 	     (match_dup 3))
 	   (const_int 170))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2466,7 +2445,7 @@
 	 [(vec_merge:V4DF
 	   (plus:V4DF
 	     (mult:V4DF
-	       (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V4DF 1 "register_operand" "%x,x")
 	       (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
 	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4DF
@@ -2476,8 +2455,7 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2488,7 +2466,7 @@
 	 [(vec_merge:V4SF
 	   (plus:V4SF
 	     (mult:V4SF
-	       (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V4SF 1 "register_operand" "%x,x")
 	       (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
 	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4SF
@@ -2498,8 +2476,7 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2510,7 +2487,7 @@
 	 [(vec_merge:V2DF
 	   (plus:V2DF
 	     (mult:V2DF
-	       (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V2DF 1 "register_operand" "%x,x")
 	       (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
 	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V2DF
@@ -2520,8 +2497,7 @@
 	     (match_dup 3))
 	   (const_int 2))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2532,7 +2508,7 @@
 	 [(vec_merge:V8SF
 	   (plus:V8SF
 	     (mult:V8SF
-	       (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V8SF 1 "register_operand" "%x,x")
 	       (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
 	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V8SF
@@ -2542,8 +2518,7 @@
 	     (match_dup 3))
 	   (const_int 85))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2554,7 +2529,7 @@
 	 [(vec_merge:V4DF
 	   (plus:V4DF
 	     (mult:V4DF
-	       (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V4DF 1 "register_operand" "%x,x")
 	       (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
 	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4DF
@@ -2564,8 +2539,7 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2576,7 +2550,7 @@
 	 [(vec_merge:V4SF
 	   (plus:V4SF
 	     (mult:V4SF
-	       (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V4SF 1 "register_operand" "%x,x")
 	       (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
 	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4SF
@@ -2586,8 +2560,7 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2598,7 +2571,7 @@
 	 [(vec_merge:V2DF
 	   (plus:V2DF
 	     (mult:V2DF
-	       (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
+	       (match_operand:V2DF 1 "register_operand" "%x,x")
 	       (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
 	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V2DF
@@ -2608,8 +2581,7 @@
 	     (match_dup 3))
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
-- 
1.6.0.4

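A simplified C model may help in reading the rewritten insn condition in the hunks above. It is a sketch, not GCC code. The bitmask encoding, `MEM_OP`, and `fma4_insn_accepts` are hypothetical names, and the model ignores constraints and the `%` commutative marker. Operand 1 must now satisfy `register_operand`, and `!(MEM_P (operands[2]) && MEM_P (operands[3]))` forbids both remaining inputs from being memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding: bit I is set when operands[I] is a MEM.  */
#define MEM_OP(i) (1u << (i))

/* Simplified model of the rewritten fmaddsub/fmsubadd insn condition.
   Operand 1 must be a register (register_operand), and
   !(MEM_P (operands[2]) && MEM_P (operands[3])) forbids two memory
   inputs.  Constraints and the "%" commutative swap are not modeled.  */
static bool
fma4_insn_accepts (unsigned mem_mask)
{
  /* Operand 0 is always a register_operand in these patterns.  */
  assert ((mem_mask & MEM_OP (0)) == 0);

  bool op1_mem = (mem_mask & MEM_OP (1)) != 0;
  bool op2_mem = (mem_mask & MEM_OP (2)) != 0;
  bool op3_mem = (mem_mask & MEM_OP (3)) != 0;

  return !op1_mem && !(op2_mem && op3_mem);
}
```

Under this model only three reg/mem combinations pass: no memory input, or a single memory input in position 2 or position 3.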

[-- Attachment #4: 0003-Remove-unused-operand.patch --]
[-- Type: text/x-patch, Size: 5321 bytes --]

From d6e9a4555d5bb6fdccd786b37f3896d88a699ce2 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 1 Dec 2009 14:10:50 -0600
Subject: [PATCH] Remove unused operand.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/i386.c (ix86_expand_fma4_multiple_memory): Remove unused
	parameter.
	* config/i386/i386-protos.h (ix86_expand_fma4_multiple_memory): Same.
	* config/i386/sse.md: Same.
---
 gcc/config/i386/i386-protos.h |    2 +-
 gcc/config/i386/i386.c        |    5 ++---
 gcc/config/i386/sse.md        |   20 ++++++++++----------
 3 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 1451e79..bb55da1 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -219,7 +219,7 @@ extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
 extern bool ix86_fma4_valid_op_p (rtx [], rtx, int, bool, int, bool);
-extern void ix86_expand_fma4_multiple_memory (rtx [], int, enum machine_mode);
+extern void ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
 
 extern void ix86_expand_vec_extract_even_odd (rtx, rtx, rtx, unsigned);
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ae43198..1de17da 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28972,12 +28972,11 @@ ix86_fma4_valid_op_p (rtx operands[], rtx insn ATTRIBUTE_UNUSED, int num,
 
 void
 ix86_expand_fma4_multiple_memory (rtx operands[],
-				  int num,
 				  enum machine_mode mode)
 {
   rtx op0 = operands[0];
-  if (num != 4
-      || memory_operand (op0, mode)
+
+  if (memory_operand (op0, mode)
       || reg_mentioned_p (op0, operands[1])
       || reg_mentioned_p (op0, operands[2])
       || reg_mentioned_p (op0, operands[3]))
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 381035d..19e07a2 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1730,7 +1730,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmadd<mode>4256 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1766,7 +1766,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmsub<mode>4256 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1804,7 +1804,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmadd<mode>4256 (operands[0], operands[1],
 				     operands[2], operands[3]));
   DONE;
@@ -1843,7 +1843,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmsub<mode>4256 (operands[0], operands[1],
 				        operands[2], operands[3]));
   DONE;
@@ -1878,7 +1878,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmadd<mode>4 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1932,7 +1932,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmsub<mode>4 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1988,7 +1988,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmadd<mode>4 (operands[0], operands[1],
 				     operands[2], operands[3]));
   DONE;
@@ -2045,7 +2045,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmsub<mode>4 (operands[0], operands[1],
 				     operands[2], operands[3]));
   DONE;
@@ -10356,7 +10356,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, V8HImode);
+  ix86_expand_fma4_multiple_memory (operands, V8HImode);
   emit_insn (gen_xop_pmacsww (operands[0], operands[1], operands[2],
 			      operands[3]));
   DONE;
@@ -10408,7 +10408,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, V4SImode);
+  ix86_expand_fma4_multiple_memory (operands, V4SImode);
   emit_insn (gen_xop_pmacsdd (operands[0], operands[1], operands[2],
 			      operands[3]));
   DONE;
-- 
1.6.0.4


[-- Attachment #5: 0004-Fix-FMA4-splitters.patch --]
[-- Type: text/x-patch, Size: 4745 bytes --]

From 7e50ccd01944c804c744177c97899c4767213adb Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 1 Dec 2009 14:16:30 -0600
Subject: [PATCH] Fix FMA4 splitters.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/sse.md: Do not use ix86_fma4_valid_op_p in FMA4
	splitters.
---
 gcc/config/i386/sse.md |   32 ++++++++++++++++----------------
 1 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 19e07a2..36d6595 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1723,8 +1723,8 @@
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1759,8 +1759,8 @@
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1797,8 +1797,8 @@
 	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1836,8 +1836,8 @@
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1871,8 +1871,8 @@
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1925,8 +1925,8 @@
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1981,8 +1981,8 @@
 	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -2038,8 +2038,8 @@
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
-- 
1.6.0.4

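The splitter conditions above replace `!ix86_fma4_valid_op_p (..., 1, ...) && ix86_fma4_valid_op_p (..., 2, ...)` with an explicit test. Roughly, the old pair meant "not valid with one memory input, but valid with two". Below is a hedged sketch of the new test as a C predicate. It uses a hypothetical bitmask encoding, and `fma4_split_applies` and `count_split_cases` are illustrative names. The three `reg_mentioned_p` overlap checks are assumed to hold rather than modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding: bit I is set when operands[I] is a MEM.  */
#define MEM_OP(i) (1u << (i))

/* Simplified model of the new splitter condition:
   MEM_P (operands[2]) && (MEM_P (operands[1]) || MEM_P (operands[3])).
   The !reg_mentioned_p checks (destination must not overlap any
   input) are not modeled; this sketch assumes they hold.  */
static bool
fma4_split_applies (unsigned mem_mask)
{
  /* The destination is a register_operand, never a MEM.  */
  assert ((mem_mask & MEM_OP (0)) == 0);

  bool op1_mem = (mem_mask & MEM_OP (1)) != 0;
  bool op2_mem = (mem_mask & MEM_OP (2)) != 0;
  bool op3_mem = (mem_mask & MEM_OP (3)) != 0;

  return op2_mem && (op1_mem || op3_mem);
}

/* Count how many of the eight reg/mem combinations of operands 1-3
   satisfy the modeled splitter condition.  */
static int
count_split_cases (void)
{
  int n = 0;
  for (unsigned m = 0; m < 8; m++)
    if (fma4_split_applies (m << 1))   /* shift so bit 1 = operand 1 */
      n++;
  return n;
}
```

Under this model the splitter matches three combinations. Each has operand 2 in memory, plus operand 1, operand 3, or both. The XOP splitters later in the series use the same expression.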

[-- Attachment #6: 0005-Fix-XOP-splitters.patch --]
[-- Type: text/x-patch, Size: 1593 bytes --]

From 0164dfae6789608fcc0e4eb12bf2b29b11ddb2ab Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 1 Dec 2009 14:29:30 -0600
Subject: [PATCH] Fix XOP splitters.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/sse.md: Do not use ix86_fma4_valid_op_p in XOP
	splitters.
---
 gcc/config/i386/sse.md |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 36d6595..06765b9 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10349,8 +10349,8 @@
 		    (match_operand:V8HI 2 "nonimmediate_operand" ""))
 	 (match_operand:V8HI 3 "nonimmediate_operand" "")))]
   "TARGET_XOP
-   && !ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -10401,8 +10401,8 @@
 		    (match_operand:V4SI 2 "nonimmediate_operand" ""))
 	 (match_operand:V4SI 3 "nonimmediate_operand" "")))]
   "TARGET_XOP
-   && !ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
-- 
1.6.0.4


[-- Attachment #7: 0006-Fix-XOP-insns.patch --]
[-- Type: text/x-patch, Size: 27876 bytes --]

From 93f8662fb31544dd9d5e0647ca40200440f5d24b Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 1 Dec 2009 15:23:04 -0600
Subject: [PATCH] Fix XOP insns.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/i386-protos.h (ix86_fma4_valid_op_p): Removed.
	* config/i386/i386.c (ix86_fma4_valid_op_p): Removed.
	* config/i386/i386.md: Do not use ix86_fma4_valid_op_p.
	* config/i386/sse.md (xop_*): Remove alternative with operand 1
	matching a memory access.  Do not use ix86_fma4_valid_op_p.
---
 gcc/config/i386/i386-protos.h |    1 -
 gcc/config/i386/i386.c        |  157 -------------------------
 gcc/config/i386/i386.md       |    2 +-
 gcc/config/i386/sse.md        |  258 +++++++++++++++++++----------------------
 4 files changed, 118 insertions(+), 300 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bb55da1..27fca86 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -218,7 +218,6 @@ extern void ix86_expand_vector_set (bool, rtx, rtx, int);
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
-extern bool ix86_fma4_valid_op_p (rtx [], rtx, int, bool, int, bool);
 extern void ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
 
 extern void ix86_expand_vec_extract_even_odd (rtx, rtx, rtx, unsigned);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1de17da..5b949d0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28807,163 +28807,6 @@ ix86_expand_round (rtx operand0, rtx operand1)
   emit_move_insn (operand0, res);
 }
 \f
-/* Validate whether a FMA4 instruction is valid or not.
-   OPERANDS is the array of operands.
-   NUM is the number of operands.
-   USES_OC0 is true if the instruction uses OC0 and provides 4 variants.
-   NUM_MEMORY is the maximum number of memory operands to accept.
-   NUM_MEMORY less than zero is a special case to allow an operand
-   of an instruction to be memory operation.
-   when COMMUTATIVE is set, operand 1 and 2 can be swapped.  */
-
-bool
-ix86_fma4_valid_op_p (rtx operands[], rtx insn ATTRIBUTE_UNUSED, int num,
-		      bool uses_oc0, int num_memory, bool commutative)
-{
-  int mem_mask;
-  int mem_count;
-  int i;
-
-  /* Count the number of memory arguments */
-  mem_mask = 0;
-  mem_count = 0;
-  for (i = 0; i < num; i++)
-    {
-      enum machine_mode mode = GET_MODE (operands[i]);
-      if (register_operand (operands[i], mode))
-	;
-
-      else if (memory_operand (operands[i], mode))
-	{
-	  mem_mask |= (1 << i);
-	  mem_count++;
-	}
-
-      else
-	{
-	  rtx pattern = PATTERN (insn);
-
-	  /* allow 0 for pcmov */
-	  if (GET_CODE (pattern) != SET
-	      || GET_CODE (SET_SRC (pattern)) != IF_THEN_ELSE
-	      || i < 2
-	      || operands[i] != CONST0_RTX (mode))
-	    return false;
-	}
-    }
-
-  /* Special case pmacsdq{l,h} where we allow the 3rd argument to be
-     a memory operation.  */
-  if (num_memory < 0)
-    {
-      num_memory = -num_memory;
-      if ((mem_mask & (1 << (num-1))) != 0)
-	{
-	  mem_mask &= ~(1 << (num-1));
-	  mem_count--;
-	}
-    }
-
-  /* If there were no memory operations, allow the insn */
-  if (mem_mask == 0)
-    return true;
-
-  /* Do not allow the destination register to be a memory operand.  */
-  else if (mem_mask & (1 << 0))
-    return false;
-
-  /* If there are too many memory operations, disallow the instruction.  While
-     the hardware only allows 1 memory reference, before register allocation
-     for some insns, we allow two memory operations sometimes in order to allow
-     code like the following to be optimized:
-
-	float fmadd (float *a, float *b, float *c) { return (*a * *b) + *c; }
-
-    or similar cases that are vectorized into using the vfmaddss
-    instruction.  */
-  else if (mem_count > num_memory)
-    return false;
-
-  /* Don't allow more than one memory operation if not optimizing.  */
-  else if (mem_count > 1 && !optimize)
-    return false;
-
-  else if (num == 4 && mem_count == 1)
-    {
-      /* formats (destination is the first argument), example vfmaddss:
-	 xmm1, xmm1, xmm2, xmm3/mem
-	 xmm1, xmm1, xmm2/mem, xmm3
-	 xmm1, xmm2, xmm3/mem, xmm1
-	 xmm1, xmm2/mem, xmm3, xmm1 */
-      if (uses_oc0)
-	return ((mem_mask == (1 << 1))
-		|| (mem_mask == (1 << 2))
-		|| (mem_mask == (1 << 3)));
-
-      /* format, example vpmacsdd:
-	 xmm1, xmm2, xmm3/mem, xmm1 */
-      if (commutative)
-	return (mem_mask == (1 << 2) || mem_mask == (1 << 1));
-      else
-	return (mem_mask == (1 << 2));
-    }
-
-  else if (num == 4 && num_memory == 2)
-    {
-      /* If there are two memory operations, we can load one of the memory ops
-	 into the destination register.  This is for optimizing the
-	 multiply/add ops, which the combiner has optimized both the multiply
-	 and the add insns to have a memory operation.  We have to be careful
-	 that the destination doesn't overlap with the inputs.  */
-      rtx op0 = operands[0];
-
-      if (reg_mentioned_p (op0, operands[1])
-	  || reg_mentioned_p (op0, operands[2])
-	  || reg_mentioned_p (op0, operands[3]))
-	return false;
-
-      /* formats (destination is the first argument), example vfmaddss:
-	 xmm1, xmm1, xmm2, xmm3/mem
-	 xmm1, xmm1, xmm2/mem, xmm3
-	 xmm1, xmm2, xmm3/mem, xmm1
-	 xmm1, xmm2/mem, xmm3, xmm1
-
-         For the oc0 case, we will load either operands[1] or operands[3] into
-         operands[0], so any combination of 2 memory operands is ok.  */
-      if (uses_oc0)
-	return true;
-
-      /* format, example vpmacsdd:
-	 xmm1, xmm2, xmm3/mem, xmm1
-
-         For the integer multiply/add instructions be more restrictive and
-         require operands[2] and operands[3] to be the memory operands.  */
-      if (commutative)
-	return (mem_mask == ((1 << 1) | (1 << 3)) || ((1 << 2) | (1 << 3)));
-      else
-	return (mem_mask == ((1 << 2) | (1 << 3)));
-    }
-
-  else if (num == 3 && num_memory == 1)
-    {
-      /* formats, example vprotb:
-	 xmm1, xmm2, xmm3/mem
-	 xmm1, xmm2/mem, xmm3 */
-      if (uses_oc0)
-	return ((mem_mask == (1 << 1)) || (mem_mask == (1 << 2)));
-
-      /* format, example vpcomeq:
-	 xmm1, xmm2, xmm3/mem */
-      else
-	return (mem_mask == (1 << 2));
-    }
-
-  else
-    gcc_unreachable ();
-
-  return false;
-}
-
 
 /* Fixup an FMA4 instruction that has 2 memory input references into a form the
    hardware will allow by using the destination register to load one of the
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 851061d..1ef3025 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -19248,7 +19248,7 @@
 	  (match_operand:MODEF 1 "register_operand" "x")
 	  (match_operand:MODEF 2 "register_operand" "x")
 	  (match_operand:MODEF 3 "register_operand" "x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_XOP"
   "vpcmov\t{%1, %3, %2, %0|%0, %2, %3, %1}"
   [(set_attr "type" "sse4arg")])
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 06765b9..d81a701 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10328,16 +10328,14 @@
 ;; that it does and splitting it later allows the following to be recognized:
 ;;	a[i] = b[i] * c[i] + d[i];
 (define_insn "xop_pmacsww"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=x")
         (plus:V8HI
 	 (mult:V8HI
-	  (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
-	  (match_operand:V8HI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V8HI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)"
-  "@
-   vpmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	  (match_operand:V8HI 1 "register_operand" "%x")
+	  (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V8HI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
@@ -10363,15 +10361,13 @@
 })
 
 (define_insn "xop_pmacssww"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=x")
         (ss_plus:V8HI
-	 (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
-		    (match_operand:V8HI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V8HI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (mult:V8HI (match_operand:V8HI 1 "register_operand" "%x")
+		    (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V8HI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
@@ -10380,16 +10376,14 @@
 ;; that it does and splitting it later allows the following to be recognized:
 ;;	a[i] = b[i] * c[i] + d[i];
 (define_insn "xop_pmacsdd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
         (plus:V4SI
 	 (mult:V4SI
-	  (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
-	  (match_operand:V4SI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)"
-  "@
-   vpmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	  (match_operand:V4SI 1 "register_operand" "%x")
+	  (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
@@ -10415,99 +10409,91 @@
 })
 
 (define_insn "xop_pmacssdd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
         (ss_plus:V4SI
-	 (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
-		    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (mult:V4SI (match_operand:V4SI 1 "register_operand" "%x")
+		    (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacssdql"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(ss_plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (vec_select:V2SI
-	   (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	   (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	   (parallel [(const_int 1)
 		      (const_int 3)])))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssdql\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacssdqh"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(ss_plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssdqh\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacsdql"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)]))))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsdql\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn_and_split "*xop_pmacsdql_mem"
-  [(set (match_operand:V2DI 0 "register_operand" "=&x,&x")
+  [(set (match_operand:V2DI 0 "register_operand" "=&x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)]))))
-	 (match_operand:V2DI 3 "memory_operand" "m,m")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, -1, true)"
+	 (match_operand:V2DI 3 "memory_operand" "m")))]
+  "TARGET_XOP"
   "#"
   "&& reload_completed"
   [(set (match_dup 0)
@@ -10536,7 +10522,7 @@
 	(mult:V2DI
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 1 "nonimmediate_operand" "%x")
+	      (match_operand:V4SI 1 "register_operand" "%x")
 	      (parallel [(const_int 1)
 			 (const_int 3)])))
 	  (sign_extend:V2DI
@@ -10570,43 +10556,41 @@
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacsdqh"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsdqh\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn_and_split "*xop_pmacsdqh_mem"
-  [(set (match_operand:V2DI 0 "register_operand" "=&x,&x")
+  [(set (match_operand:V2DI 0 "register_operand" "=&x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "memory_operand" "m,m")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, -1, true)"
+	 (match_operand:V2DI 3 "memory_operand" "m")))]
+  "TARGET_XOP"
   "#"
   "&& reload_completed"
   [(set (match_dup 0)
@@ -10635,7 +10619,7 @@
 	(mult:V2DI
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 1 "nonimmediate_operand" "%x")
+	      (match_operand:V4SI 1 "register_operand" "%x")
 	      (parallel [(const_int 0)
 			 (const_int 2)])))
 	  (sign_extend:V2DI
@@ -10670,72 +10654,68 @@
 
 ;; XOP parallel integer multiply/add instructions for the intrinisics
 (define_insn "xop_pmacsswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(ss_plus:V4SI
 	 (mult:V4SI
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V8HI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)])))
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)]))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(plus:V4SI
 	 (mult:V4SI
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V8HI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)])))
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)]))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmadcsswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(ss_plus:V4SI
 	 (plus:V4SI
 	  (mult:V4SI
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	     (match_operand:V8HI 1 "register_operand" "%x")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
 			(const_int 6)])))
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	     (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
@@ -10755,29 +10735,27 @@
 			(const_int 3)
 			(const_int 5)
 			(const_int 7)])))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmadcsswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmadcswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(plus:V4SI
 	 (plus:V4SI
 	  (mult:V4SI
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	     (match_operand:V8HI 1 "register_operand" "%x")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
 			(const_int 6)])))
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	     (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
@@ -10797,32 +10775,30 @@
 			(const_int 3)
 			(const_int 5)
 			(const_int 7)])))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmadcswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 ;; XOP parallel XMM conditional moves
 (define_insn "xop_pcmov_<mode>"
-  [(set (match_operand:SSEMODE 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODE 0 "register_operand" "=x,x")
 	(if_then_else:SSEMODE
-	  (match_operand:SSEMODE 3 "nonimmediate_operand" "x,x,m")
-	  (match_operand:SSEMODE 1 "vector_move_operand" "x,m,x")
-	  (match_operand:SSEMODE 2 "vector_move_operand" "xm,x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:SSEMODE 3 "nonimmediate_operand" "x,m")
+	  (match_operand:SSEMODE 1 "vector_move_operand" "x,x")
+	  (match_operand:SSEMODE 2 "vector_move_operand" "xm,x")))]
+  "TARGET_XOP"
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")])
 
 (define_insn "xop_pcmov_<mode>256"
-  [(set (match_operand:AVX256MODE 0 "register_operand" "=x,x,x")
+  [(set (match_operand:AVX256MODE 0 "register_operand" "=x,x")
 	(if_then_else:AVX256MODE
-	  (match_operand:AVX256MODE 3 "nonimmediate_operand" "x,x,m")
-	  (match_operand:AVX256MODE 1 "vector_move_operand" "x,m,x")
-	  (match_operand:AVX256MODE 2 "vector_move_operand" "xm,x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:AVX256MODE 3 "nonimmediate_operand" "x,m")
+	  (match_operand:AVX256MODE 1 "vector_move_operand" "x,x")
+	  (match_operand:AVX256MODE 2 "vector_move_operand" "xm,x")))]
+  "TARGET_XOP"
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")])
 
@@ -11268,53 +11244,53 @@
 
 ;; XOP permute instructions
 (define_insn "xop_pperm"
-  [(set (match_operand:V16QI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V16QI 0 "register_operand" "=x,x")
 	(unspec:V16QI
-	  [(match_operand:V16QI 1 "nonimmediate_operand" "x,x,m")
-	   (match_operand:V16QI 2 "nonimmediate_operand" "x,m,x")
-	   (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x")]
+	  [(match_operand:V16QI 1 "register_operand" "x,x")
+	   (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "xm,x")]
 	  UNSPEC_XOP_PERMUTE))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
 
 ;; XOP pack instructions that combine two vectors into a smaller vector
 (define_insn "xop_pperm_pack_v2di_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
 	(vec_concat:V4SI
 	 (truncate:V2SI
-	  (match_operand:V2DI 1 "nonimmediate_operand" "x,x,m"))
+	  (match_operand:V2DI 1 "register_operand" "x,x"))
 	 (truncate:V2SI
-	  (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x"))))
-   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:V2DI 2 "nonimmediate_operand" "x,m"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pperm_pack_v4si_v8hi"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
 	(vec_concat:V8HI
 	 (truncate:V4HI
-	  (match_operand:V4SI 1 "nonimmediate_operand" "x,x,m"))
+	  (match_operand:V4SI 1 "register_operand" "x,x"))
 	 (truncate:V4HI
-	  (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x"))))
-   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:V4SI 2 "nonimmediate_operand" "x,m"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pperm_pack_v8hi_v16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V16QI 0 "register_operand" "=x,x")
 	(vec_concat:V16QI
 	 (truncate:V8QI
-	  (match_operand:V8HI 1 "nonimmediate_operand" "x,x,m"))
+	  (match_operand:V8HI 1 "register_operand" "x,x"))
 	 (truncate:V8QI
-	  (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x"))))
-   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:V8HI 2 "nonimmediate_operand" "x,m"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
@@ -11443,7 +11419,7 @@
 	 (rotatert:SSEMODE1248
 	  (match_dup 1)
 	  (neg:SSEMODE1248 (match_dup 2)))))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vprot<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix_data16" "0")
@@ -11498,7 +11474,7 @@
 	 (ashiftrt:SSEMODE1248
 	  (match_dup 1)
 	  (neg:SSEMODE1248 (match_dup 2)))))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpsha<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix_data16" "0")
@@ -11517,7 +11493,7 @@
 	 (lshiftrt:SSEMODE1248
 	  (match_dup 1)
 	  (neg:SSEMODE1248 (match_dup 2)))))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpshl<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix_data16" "0")
-- 
1.6.0.4


* Re: Vector permutation support for x86
  2009-11-27 23:54 ` H.J. Lu
@ 2010-04-17  5:27   ` H.J. Lu
  2010-11-23  8:42     ` H.J. Lu
  0 siblings, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2010-04-17  5:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Sebastian Pop

On Fri, Nov 27, 2009 at 3:45 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Nov 25, 2009 at 6:56 PM, Richard Henderson <rth@redhat.com> wrote:
>> The following implements the builtin_vec_perm hook so that the vectorizer
>> can do its SLP thing.  As noted elsewhere, ISAs before SSSE3 cannot
>> arbitrarily permute, so this complicates things a bit.  But even given
>> SSSE3, the arbitrary two-vector permute costs 3 insns, and so we would want
>> to do most of this work to find the 1 and 2 insn special cases.
>>
>> For the AMD folk: I tried to support the vpperm insn from the XOP ISA, but
>> there seems to be some disconnect between trunk binutils and trunk gcc wrt
>> vpperm.  This can be seen in the failure of the new test "vperm-v4si-2x.c".
>>  I'm looking at the XOP spec labeled "Pub No 43479, Rev 3.03, May 2009", and
>> what gcc is emitting looks ok.  But I've already been bitten by an
>> out-of-date AVX spec during this adventure, so I'd appreciate some
>> double-check.
>>
>> Tested on an i7 machine (i.e. sse4.2).
>>
>
> This caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42193
>

This also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43771
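
For context on the cover letter's cost claim quoted above: with SSSE3, an arbitrary two-vector byte permute can be built from two pshufb instructions and one por. That composition is one plausible way to arrive at 3 insns. The sketch below models it in scalar C. The function names are hypothetical. It assumes a constant permutation, so that both pshufb masks are constants that get loaded rather than computed. It is not claimed to be the exact sequence GCC's expander emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of SSSE3 pshufb on one 16-byte vector: a selector byte
   with bit 7 set yields 0, otherwise it picks src[sel & 0x0f].  */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t sel[16])
{
  uint8_t tmp[16];
  for (int i = 0; i < 16; i++)
    tmp[i] = (sel[i] & 0x80) ? 0 : src[sel[i] & 0x0f];
  memcpy(dst, tmp, sizeof tmp);
}

/* Hypothetical helper: arbitrary two-vector byte permute.  perm[i] must be
   in 0..31 and indexes the 32-byte concatenation of a (0..15) and b (16..31).  */
static void perm2_model(uint8_t dst[16], const uint8_t a[16],
                        const uint8_t b[16], const uint8_t perm[16])
{
  uint8_t mask_a[16], mask_b[16], from_a[16], from_b[16];

  /* For a constant permutation these two masks are constants, so they
     would be loaded rather than computed.  */
  for (int i = 0; i < 16; i++) {
    assert(perm[i] < 32);
    /* Lanes taken from b are zeroed in the a-shuffle, and vice versa.  */
    mask_a[i] = perm[i] < 16 ? perm[i] : 0x80;
    mask_b[i] = perm[i] >= 16 ? (uint8_t)(perm[i] - 16) : 0x80;
  }

  pshufb_model(from_a, a, mask_a);   /* insn 1: pshufb */
  pshufb_model(from_b, b, mask_b);   /* insn 2: pshufb */
  for (int i = 0; i < 16; i++)
    dst[i] = from_a[i] | from_b[i];  /* insn 3: por */
}
```

For example, perm = {0, 16, 1, 17, ..., 7, 23} interleaves the low halves of a and b. A single punpcklbw already does that, which is why it pays to look for the 1- and 2-insn special cases before falling back to the general sequence.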


-- 
H.J.

* Re: Vector permutation support for x86
  2010-04-17  5:27   ` H.J. Lu
@ 2010-11-23  8:42     ` H.J. Lu
  0 siblings, 0 replies; 45+ messages in thread
From: H.J. Lu @ 2010-11-23  8:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Sebastian Pop

On Fri, Apr 16, 2010 at 9:03 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Nov 27, 2009 at 3:45 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Nov 25, 2009 at 6:56 PM, Richard Henderson <rth@redhat.com> wrote:
>>> The following implements the builtin_vec_perm hook so that the vectorizer
>>> can do its SLP thing.  As noted elsewhere, ISAs before SSSE3 cannot
>>> arbitrarily permute, so this complicates things a bit.  But even given
>>> SSSE3, the arbitrary two-vector permute costs 3 insns, and so we would want
>>> to do most of this work to find the 1 and 2 insn special cases.
>>>
>>> For the AMD folk: I tried to support the vpperm insn from the XOP ISA, but
>>> there seems to be some disconnect between trunk binutils and trunk gcc wrt
>>> vpperm.  This can be seen in the failure of the new test "vperm-v4si-2x.c".
>>>  I'm looking at the XOP spec labeled "Pub No 43479, Rev 3.03, May 2009", and
>>> what gcc is emitting looks ok.  But I've already been bitten by an
>>> out-of-date AVX spec during this adventure, so I'd appreciate some
>>> double-check.
>>>
>>> Tested on an i7 machine (i.e. sse4.2).
>>>
>>
>> This caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42193
>>
>
> This also caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43771
>

This also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46614


-- 
H.J.

* Re: Vector permutation support for x86
  2009-12-07 20:10                       ` Sebastian Pop
@ 2009-12-07 22:02                         ` Richard Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2009-12-07 22:02 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Uros Bizjak, GCC Patches

Ok.


r~

* Re: Vector permutation support for x86
  2009-12-07 19:14                     ` Richard Henderson
@ 2009-12-07 20:10                       ` Sebastian Pop
  2009-12-07 22:02                         ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-07 20:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1832 bytes --]

On Mon, Dec 7, 2009 at 13:08, Richard Henderson <rth@redhat.com> wrote:
>> -;; Note the instruction does not allow the value being added to be a memory
>> -;; operation.  However by pretending via the nonimmediate_operand predicate
>> -;; that it does and splitting it later allows the following to be recognized:
>> -;;     a[i] = b[i] * c[i] + d[i];
>>  (define_insn "xop_pmacsww"
>>   [(set (match_operand:V8HI 0 "register_operand" "=x")
>>         (plus:V8HI
>>         (mult:V8HI
>> -         (match_operand:V8HI 1 "register_operand" "%x")
>> +         (match_operand:V8HI 1 "nonimmediate_operand" "%x")
>>          (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
>> -        (match_operand:V8HI 3 "register_operand" "x")))]
>> +        (match_operand:V8HI 3 "nonimmediate_operand" "x")))]
>
> I think the comment is still valuable, minus the subclause about splitting.
>  Because otherwise I would question the use of nonimmediate_operand in op3.
>
> However, I do not believe that the same applies to any of the
> non-multiply-add patterns.  E.g.
>
>>  (define_insn "xop_pperm"
>>   [(set (match_operand:V16QI 0 "register_operand" "=x,x")
>>        (unspec:V16QI
>> -         [(match_operand:V16QI 1 "register_operand" "x,x")
>> +         [(match_operand:V16QI 1 "nonimmediate_operand" "x,x")
>
> There's really no reason to accept a memory operand here, AFAICS.
> Similarly with all of the patterns that follow.

Fixed like this.

	* config/i386/i386-protos.h (ix86_expand_fma4_multiple_memory):
	Removed.
	* config/i386/i386.c (ix86_expand_fma4_multiple_memory): Removed.
	* config/i386/sse.md: Remove all XOP splitters.
	Allow the second and fourth operands of XOP multiply-add insns
	to be nonimmediate.

Sebastian
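
As a reading aid for the multiply-add patterns this patch touches, below is a minimal C sketch of what two of them compute. It is derived only from the RTL in the patch, not from the XOP manual, and sits next to the source loop shape a[i] = b[i] * c[i] + d[i]; that the sse.md comment mentions. All function names are hypothetical. Whether GCC actually emits vpmacsww for the loop depends on options such as -O3 -mxop and on the vectorizer, so treat the pairing as illustrative. The predicate discussion in this thread only changes which operands may come straight from memory. It does not change these per-lane semantics.

```c
#include <stdint.h>

/* Source loop shape from the sse.md comment.  */
void madd_loop(int16_t *a, const int16_t *b, const int16_t *c,
               const int16_t *d, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = (int16_t)(b[i] * c[i] + d[i]);  /* int arithmetic, then truncated */
}

/* Model of xop_pmacsww as written in RTL:
   (plus:V8HI (mult:V8HI op1 op2) op3), i.e. per-lane 16-bit wrap-around,
   no saturation (pmacssww uses ss_plus instead).  */
static void pmacsww_model(int16_t dst[8], const int16_t op1[8],
                          const int16_t op2[8], const int16_t op3[8])
{
  for (int i = 0; i < 8; i++) {
    int32_t full = (int32_t)op1[i] * op2[i] + op3[i];  /* cannot overflow int32 */
    dst[i] = (int16_t)(uint16_t)full;  /* keep the low 16 bits (GCC: modulo) */
  }
}

/* Model of xop_pmacsdql: sign-extend the odd 32-bit lanes (1 and 3) of op1
   and op2, multiply to 64 bits, and add the 64-bit accumulator op3.  */
static void pmacsdql_model(int64_t dst[2], const int32_t op1[4],
                           const int32_t op2[4], const int64_t op3[2])
{
  for (int j = 0; j < 2; j++) {
    int64_t prod = (int64_t)op1[2 * j + 1] * op2[2 * j + 1];  /* fits in 64 bits */
    dst[j] = (int64_t)((uint64_t)prod + (uint64_t)op3[j]);   /* plus:V2DI wraps */
  }
}
```

Lanes 0 and 2 of the pmacsdql_model inputs are ignored. Per the RTL above, the pmacsdqh variants are identical except that they select lanes 0 and 2.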

[-- Attachment #2: 0001-Remove-XOP-splitters.patch --]
[-- Type: text/x-patch, Size: 13670 bytes --]

From b1381c4d97b1e5c4705f8ba74808666782035d25 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Mon, 7 Dec 2009 11:38:28 -0600
Subject: [PATCH] Remove XOP splitters.

	* config/i386/i386-protos.h (ix86_expand_fma4_multiple_memory):
	Removed.
	* config/i386/i386.c (ix86_expand_fma4_multiple_memory): Removed.
	* config/i386/sse.md: Remove all XOP splitters.
	Allow the second and fourth operands of XOP multiply-add insns
	to be nonimmediate.
---
 gcc/config/i386/i386-protos.h |    2 -
 gcc/config/i386/i386.c        |   30 --------
 gcc/config/i386/sse.md        |  163 +++++++----------------------------------
 3 files changed, 27 insertions(+), 168 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index cf29cc7..aa2ccd7 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -218,8 +218,6 @@ extern void ix86_expand_vector_set (bool, rtx, rtx, int);
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
-extern bool ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
-
 extern void ix86_expand_vec_extract_even_odd (rtx, rtx, rtx, unsigned);
 
 /* In i386-c.c  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6cd9d7d..7cafdf6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28808,36 +28808,6 @@ ix86_expand_round (rtx operand0, rtx operand1)
 }
 \f
 
-/* Fixup an FMA4 or XOP instruction that has 2 memory input references
-   into a form the hardware will allow by using the destination
-   register to load one of the memory operations.  Presently this is
-   used by the multiply/add routines to allow 2 memory references.  */
-
-bool
-ix86_expand_fma4_multiple_memory (rtx operands[],
-				  enum machine_mode mode)
-{
-  rtx scratch = operands[0];
-
-  gcc_assert (register_operand (operands[0], mode));
-  gcc_assert (register_operand (operands[1], mode));
-  gcc_assert (MEM_P (operands[2]) && MEM_P (operands[3]));
-
-  if (reg_mentioned_p (scratch, operands[1]))
-    {
-      if (!can_create_pseudo_p ())
-	return false;
-      scratch = gen_reg_rtx (mode);
-    }
-
-  emit_move_insn (scratch, operands[3]);
-  if (rtx_equal_p (operands[2], operands[3]))
-    operands[2] = operands[3] = scratch;
-  else
-    operands[3] = scratch;
-  return true;
-}
-
 /* Table of valid machine attributes.  */
 static const struct attribute_spec ix86_attribute_table[] =
 {
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4e409c6..db06078 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10132,89 +10132,50 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 ;; XOP parallel integer multiply/add instructions.
-;; Note the instruction does not allow the value being added to be a memory
-;; operation.  However by pretending via the nonimmediate_operand predicate
-;; that it does and splitting it later allows the following to be recognized:
-;;	a[i] = b[i] * c[i] + d[i];
+;; Note the XOP multiply/add instructions
+;;     a[i] = b[i] * c[i] + d[i];
+;; do not allow the value being added to be a memory operation.
 (define_insn "xop_pmacsww"
   [(set (match_operand:V8HI 0 "register_operand" "=x")
         (plus:V8HI
 	 (mult:V8HI
-	  (match_operand:V8HI 1 "register_operand" "%x")
+	  (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 	  (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
-	 (match_operand:V8HI 3 "register_operand" "x")))]
+	 (match_operand:V8HI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
-;; Split pmacsww with two memory operands into a load and the pmacsww.
-(define_split
-  [(set (match_operand:V8HI 0 "register_operand" "")
-	(plus:V8HI
-	 (mult:V8HI (match_operand:V8HI 1 "register_operand" "")
-		    (match_operand:V8HI 2 "memory_operand" ""))
-	 (match_operand:V8HI 3 "memory_operand" "")))]
-  "TARGET_XOP"
-  [(set (match_dup 0)
-        (plus:V8HI
-         (mult:V8HI (match_dup 1) (match_dup 2))
-         (match_dup 3)))]
-{
-  if (!ix86_expand_fma4_multiple_memory (operands, V8HImode))
-    FAIL;
-})
-
 (define_insn "xop_pmacssww"
   [(set (match_operand:V8HI 0 "register_operand" "=x")
         (ss_plus:V8HI
-	 (mult:V8HI (match_operand:V8HI 1 "register_operand" "%x")
+	 (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 		    (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
-	 (match_operand:V8HI 3 "register_operand" "x")))]
+	 (match_operand:V8HI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
-;; Note the instruction does not allow the value being added to be a memory
-;; operation.  However by pretending via the nonimmediate_operand predicate
-;; that it does and splitting it later allows the following to be recognized:
-;;	a[i] = b[i] * c[i] + d[i];
 (define_insn "xop_pmacsdd"
   [(set (match_operand:V4SI 0 "register_operand" "=x")
         (plus:V4SI
 	 (mult:V4SI
-	  (match_operand:V4SI 1 "register_operand" "%x")
+	  (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 	  (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
-;; Split pmacsdd with two memory operands into a load and the pmacsdd.
-(define_split
-  [(set (match_operand:V4SI 0 "register_operand" "")
-	(plus:V4SI
-	 (mult:V4SI (match_operand:V4SI 1 "register_operand" "")
-		    (match_operand:V4SI 2 "memory_operand" ""))
-	 (match_operand:V4SI 3 "memory_operand" "")))]
-  "TARGET_XOP"
-  [(set (match_dup 0)
-        (plus:V4SI
-         (mult:V4SI (match_dup 1) (match_dup 2))
-         (match_dup 3)))]
-{
-  if (!ix86_expand_fma4_multiple_memory (operands, V4SImode))
-    FAIL;
-})
-
 (define_insn "xop_pmacssdd"
   [(set (match_operand:V4SI 0 "register_operand" "=x")
         (ss_plus:V4SI
-	 (mult:V4SI (match_operand:V4SI 1 "register_operand" "%x")
+	 (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 		    (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10226,14 +10187,14 @@
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (vec_select:V2SI
 	   (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	   (parallel [(const_int 1)
 		      (const_int 3)])))
-	 (match_operand:V2DI 3 "register_operand" "x")))]
+	 (match_operand:V2DI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10245,7 +10206,7 @@
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
@@ -10253,7 +10214,7 @@
 	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "register_operand" "x")))]
+	 (match_operand:V2DI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10265,7 +10226,7 @@
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (sign_extend:V2DI
@@ -10273,47 +10234,12 @@
 	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)]))))
-	 (match_operand:V2DI 3 "register_operand" "x")))]
+	 (match_operand:V2DI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
-(define_insn_and_split "*xop_pmacsdql_mem"
-  [(set (match_operand:V2DI 0 "register_operand" "=&x")
-	(plus:V2DI
-	 (mult:V2DI
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
-	    (parallel [(const_int 1)
-		       (const_int 3)])))
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
-	    (parallel [(const_int 1)
-		       (const_int 3)]))))
-	 (match_operand:V2DI 3 "memory_operand" "m")))]
-  "TARGET_XOP"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 0)
-	(match_dup 3))
-   (set (match_dup 0)
-	(plus:V2DI
-	 (mult:V2DI
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_dup 1)
-	    (parallel [(const_int 1)
-		       (const_int 3)])))
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_dup 2)
-	    (parallel [(const_int 1)
-		       (const_int 3)]))))
-	 (match_dup 0)))])
-
 ;; We don't have a straight 32-bit parallel multiply and extend on XOP, so
 ;; fake it with a multiply/add.  In general, we expect the define_split to
 ;; occur before register allocation, so we have to handle the corner case where
@@ -10362,7 +10288,7 @@
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
@@ -10370,47 +10296,12 @@
 	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "register_operand" "x")))]
+	 (match_operand:V2DI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
-(define_insn_and_split "*xop_pmacsdqh_mem"
-  [(set (match_operand:V2DI 0 "register_operand" "=&x")
-	(plus:V2DI
-	 (mult:V2DI
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
-	    (parallel [(const_int 0)
-		       (const_int 2)])))
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
-	    (parallel [(const_int 0)
-		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "memory_operand" "m")))]
-  "TARGET_XOP"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 0)
-	(match_dup 3))
-   (set (match_dup 0)
-	(plus:V2DI
-	 (mult:V2DI
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_dup 1)
-	    (parallel [(const_int 0)
-		       (const_int 2)])))
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_dup 2)
-	    (parallel [(const_int 0)
-		       (const_int 2)]))))
-	 (match_dup 0)))])
-
 ;; We don't have a straight 32-bit parallel multiply and extend on XOP, so
 ;; fake it with a multiply/add.  In general, we expect the define_split to
 ;; occur before register allocation, so we have to handle the corner case where
@@ -10460,7 +10351,7 @@
 	 (mult:V4SI
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 1 "register_operand" "%x")
+	    (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
@@ -10472,7 +10363,7 @@
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)]))))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10484,7 +10375,7 @@
 	 (mult:V4SI
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 1 "register_operand" "%x")
+	    (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
@@ -10496,7 +10387,7 @@
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)]))))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10509,7 +10400,7 @@
 	  (mult:V4SI
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 1 "register_operand" "%x")
+	     (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
@@ -10536,7 +10427,7 @@
 			(const_int 3)
 			(const_int 5)
 			(const_int 7)])))))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10549,7 +10440,7 @@
 	  (mult:V4SI
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 1 "register_operand" "%x")
+	     (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
@@ -10576,7 +10467,7 @@
 			(const_int 3)
 			(const_int 5)
 			(const_int 7)])))))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
-- 
1.6.0.4
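
For reference, the RTL of the `xop_pmacsdqh` pattern in the hunk above can be read as scalar C. This is a minimal sketch that models the RTL expression exactly as written in this patch, with elements 0 and 2 chosen by the `parallel`. It makes no claim about how the hardware instruction selects elements. The helper name `model_pmacsdqh_rtl` is hypothetical.

```c
#include <stdint.h>

/* Scalar reading of the xop_pmacsdqh RTL in this patch:
     (plus:V2DI
      (mult:V2DI
       (sign_extend:V2DI (vec_select:V2SI op1 (parallel [0 2])))
       (sign_extend:V2DI (vec_select:V2SI op2 (parallel [0 2]))))
      op3)
   Hypothetical helper; it models the RTL, not the hardware manual.  */
static void
model_pmacsdqh_rtl (int64_t dst[2], const int32_t op1[4],
                    const int32_t op2[4], const int64_t op3[2])
{
  static const int sel[2] = { 0, 2 };	/* the (parallel [...]) list */

  for (int i = 0; i < 2; i++)
    {
      /* sign_extend: widen the selected 32-bit elements to 64 bits,
         so the product of two int32 values cannot overflow.  */
      int64_t x = (int64_t) op1[sel[i]];
      int64_t y = (int64_t) op2[sel[i]];

      /* RTL plus wraps modulo 2^64; add in uint64_t to avoid signed
         overflow in C, then convert back (GCC keeps the bit pattern).  */
      dst[i] = (int64_t) ((uint64_t) (x * y) + (uint64_t) op3[i]);
    }
}
```

Elements 1 and 3 of each V4SI input never reach the result. This is the part of the pattern that the `vec_select` encodes.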


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Vector permutation support for x86
  2009-12-07 18:28                   ` Sebastian Pop
@ 2009-12-07 19:14                     ` Richard Henderson
  2009-12-07 20:10                       ` Sebastian Pop
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2009-12-07 19:14 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Uros Bizjak, GCC Patches

> -;; Note the instruction does not allow the value being added to be a memory
> -;; operation.  However by pretending via the nonimmediate_operand predicate
> -;; that it does and splitting it later allows the following to be recognized:
> -;;	a[i] = b[i] * c[i] + d[i];
>  (define_insn "xop_pmacsww"
>    [(set (match_operand:V8HI 0 "register_operand" "=x")
>          (plus:V8HI
>  	 (mult:V8HI
> -	  (match_operand:V8HI 1 "register_operand" "%x")
> +	  (match_operand:V8HI 1 "nonimmediate_operand" "%x")
>  	  (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
> -	 (match_operand:V8HI 3 "register_operand" "x")))]
> +	 (match_operand:V8HI 3 "nonimmediate_operand" "x")))]

I think the comment is still valuable, minus the subclause about
splitting, because otherwise I would question the use of
nonimmediate_operand in op3.
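
The loop named in that comment shows why op3 is given `nonimmediate_operand`: the addend `d[i]` starts out as a memory reference, even though the instruction wants it in a register. Below is a minimal C sketch. It assumes 16-bit `short` elements, since eight of them fill V8HImode, the mode of `xop_pmacsww`. The function name is hypothetical. Whether GCC actually vectorizes this into `vpmacsww` (for example with `-O3 -mxop`) is not checked here.

```c
#include <stddef.h>

/* Hypothetical example of the loop from the sse.md comment,
   a[i] = b[i] * c[i] + d[i], on 16-bit elements.  The addend d[i] is
   read from memory, which is why operand 3 of xop_pmacsww accepts a
   nonimmediate_operand while its "x" constraint still requires a
   register by the time the insn is output.  */
void
mul_add_hi (short *a, const short *b, const short *c, const short *d,
            size_t n)
{
  for (size_t i = 0; i < n; i++)
    /* The arithmetic happens in int.  Converting back to short keeps
       the low 16 bits on GCC (implementation-defined in ISO C), which
       matches the wrap-around of plus:V8HI and mult:V8HI.  */
    a[i] = (short) (b[i] * c[i] + d[i]);
}
```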

However, I do not believe that the same applies to any of the
non-multiply-add patterns.  E.g.

>  (define_insn "xop_pperm"
>    [(set (match_operand:V16QI 0 "register_operand" "=x,x")
>  	(unspec:V16QI
> -	  [(match_operand:V16QI 1 "register_operand" "x,x")
> +	  [(match_operand:V16QI 1 "nonimmediate_operand" "x,x")

There's really no reason to accept a memory operand here, AFAICS.
Similarly with all of the patterns that follow.


r~

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Vector permutation support for x86
  2009-12-07 17:35                 ` Sebastian Pop
@ 2009-12-07 18:28                   ` Sebastian Pop
  2009-12-07 19:14                     ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-07 18:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 482 bytes --]

Hi,

Applying Uros' technique to eliminate the splitters of the XOP insns,
we get the attached patch.

	* config/i386/i386-protos.h (ix86_expand_fma4_multiple_memory):
	Removed.
	* config/i386/i386.c (ix86_expand_fma4_multiple_memory): Removed.
	* config/i386/sse.md: Remove all XOP splitters.
	Allow the second and fourth operands of XOP insns to be nonimmediate.

Passed make -k check RUNTESTFLAGS=i386.exp
Ok for trunk if it passes bootstrap and full regtest?

Thanks,
Sebastian

[-- Attachment #2: 0006-Remove-XOP-splitters.patch --]
[-- Type: text/x-patch, Size: 16697 bytes --]

From 1265f75889f955afe197693da11346fd19ef2e65 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Mon, 7 Dec 2009 11:38:28 -0600
Subject: [PATCH] Remove XOP splitters.

	* config/i386/i386-protos.h (ix86_expand_fma4_multiple_memory):
	Removed.
	* config/i386/i386.c (ix86_expand_fma4_multiple_memory): Removed.
	* config/i386/sse.md: Remove all XOP splitters.
	Allow the second and fourth operands of XOP insns to be nonimmediate.
---
 gcc/config/i386/i386-protos.h |    2 -
 gcc/config/i386/i386.c        |   30 -------
 gcc/config/i386/sse.md        |  176 ++++++++---------------------------------
 3 files changed, 32 insertions(+), 176 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index cf29cc7..aa2ccd7 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -218,8 +218,6 @@ extern void ix86_expand_vector_set (bool, rtx, rtx, int);
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
-extern bool ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
-
 extern void ix86_expand_vec_extract_even_odd (rtx, rtx, rtx, unsigned);
 
 /* In i386-c.c  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2ba330e..0e58a17 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28808,36 +28808,6 @@ ix86_expand_round (rtx operand0, rtx operand1)
 }
 \f
 
-/* Fixup an FMA4 or XOP instruction that has 2 memory input references
-   into a form the hardware will allow by using the destination
-   register to load one of the memory operations.  Presently this is
-   used by the multiply/add routines to allow 2 memory references.  */
-
-bool
-ix86_expand_fma4_multiple_memory (rtx operands[],
-				  enum machine_mode mode)
-{
-  rtx scratch = operands[0];
-
-  gcc_assert (register_operand (operands[0], mode));
-  gcc_assert (register_operand (operands[1], mode));
-  gcc_assert (MEM_P (operands[2]) && MEM_P (operands[3]));
-
-  if (reg_mentioned_p (scratch, operands[1]))
-    {
-      if (!can_create_pseudo_p ())
-	return false;
-      scratch = gen_reg_rtx (mode);
-    }
-
-  emit_move_insn (scratch, operands[3]);
-  if (rtx_equal_p (operands[2], operands[3]))
-    operands[2] = operands[3] = scratch;
-  else
-    operands[3] = scratch;
-  return true;
-}
-
 /* Table of valid machine attributes.  */
 static const struct attribute_spec ix86_attribute_table[] =
 {
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4e409c6..b64d78a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10132,89 +10132,47 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 ;; XOP parallel integer multiply/add instructions.
-;; Note the instruction does not allow the value being added to be a memory
-;; operation.  However by pretending via the nonimmediate_operand predicate
-;; that it does and splitting it later allows the following to be recognized:
-;;	a[i] = b[i] * c[i] + d[i];
 (define_insn "xop_pmacsww"
   [(set (match_operand:V8HI 0 "register_operand" "=x")
         (plus:V8HI
 	 (mult:V8HI
-	  (match_operand:V8HI 1 "register_operand" "%x")
+	  (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 	  (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
-	 (match_operand:V8HI 3 "register_operand" "x")))]
+	 (match_operand:V8HI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
-;; Split pmacsww with two memory operands into a load and the pmacsww.
-(define_split
-  [(set (match_operand:V8HI 0 "register_operand" "")
-	(plus:V8HI
-	 (mult:V8HI (match_operand:V8HI 1 "register_operand" "")
-		    (match_operand:V8HI 2 "memory_operand" ""))
-	 (match_operand:V8HI 3 "memory_operand" "")))]
-  "TARGET_XOP"
-  [(set (match_dup 0)
-        (plus:V8HI
-         (mult:V8HI (match_dup 1) (match_dup 2))
-         (match_dup 3)))]
-{
-  if (!ix86_expand_fma4_multiple_memory (operands, V8HImode))
-    FAIL;
-})
-
 (define_insn "xop_pmacssww"
   [(set (match_operand:V8HI 0 "register_operand" "=x")
         (ss_plus:V8HI
-	 (mult:V8HI (match_operand:V8HI 1 "register_operand" "%x")
+	 (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 		    (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
-	 (match_operand:V8HI 3 "register_operand" "x")))]
+	 (match_operand:V8HI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
-;; Note the instruction does not allow the value being added to be a memory
-;; operation.  However by pretending via the nonimmediate_operand predicate
-;; that it does and splitting it later allows the following to be recognized:
-;;	a[i] = b[i] * c[i] + d[i];
 (define_insn "xop_pmacsdd"
   [(set (match_operand:V4SI 0 "register_operand" "=x")
         (plus:V4SI
 	 (mult:V4SI
-	  (match_operand:V4SI 1 "register_operand" "%x")
+	  (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 	  (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
-;; Split pmacsdd with two memory operands into a load and the pmacsdd.
-(define_split
-  [(set (match_operand:V4SI 0 "register_operand" "")
-	(plus:V4SI
-	 (mult:V4SI (match_operand:V4SI 1 "register_operand" "")
-		    (match_operand:V4SI 2 "memory_operand" ""))
-	 (match_operand:V4SI 3 "memory_operand" "")))]
-  "TARGET_XOP"
-  [(set (match_dup 0)
-        (plus:V4SI
-         (mult:V4SI (match_dup 1) (match_dup 2))
-         (match_dup 3)))]
-{
-  if (!ix86_expand_fma4_multiple_memory (operands, V4SImode))
-    FAIL;
-})
-
 (define_insn "xop_pmacssdd"
   [(set (match_operand:V4SI 0 "register_operand" "=x")
         (ss_plus:V4SI
-	 (mult:V4SI (match_operand:V4SI 1 "register_operand" "%x")
+	 (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 		    (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10226,14 +10184,14 @@
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (vec_select:V2SI
 	   (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	   (parallel [(const_int 1)
 		      (const_int 3)])))
-	 (match_operand:V2DI 3 "register_operand" "x")))]
+	 (match_operand:V2DI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10245,7 +10203,7 @@
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
@@ -10253,7 +10211,7 @@
 	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "register_operand" "x")))]
+	 (match_operand:V2DI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10265,7 +10223,7 @@
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (sign_extend:V2DI
@@ -10273,47 +10231,12 @@
 	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)]))))
-	 (match_operand:V2DI 3 "register_operand" "x")))]
+	 (match_operand:V2DI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
-(define_insn_and_split "*xop_pmacsdql_mem"
-  [(set (match_operand:V2DI 0 "register_operand" "=&x")
-	(plus:V2DI
-	 (mult:V2DI
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
-	    (parallel [(const_int 1)
-		       (const_int 3)])))
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
-	    (parallel [(const_int 1)
-		       (const_int 3)]))))
-	 (match_operand:V2DI 3 "memory_operand" "m")))]
-  "TARGET_XOP"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 0)
-	(match_dup 3))
-   (set (match_dup 0)
-	(plus:V2DI
-	 (mult:V2DI
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_dup 1)
-	    (parallel [(const_int 1)
-		       (const_int 3)])))
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_dup 2)
-	    (parallel [(const_int 1)
-		       (const_int 3)]))))
-	 (match_dup 0)))])
-
 ;; We don't have a straight 32-bit parallel multiply and extend on XOP, so
 ;; fake it with a multiply/add.  In general, we expect the define_split to
 ;; occur before register allocation, so we have to handle the corner case where
@@ -10362,7 +10285,7 @@
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
+	    (match_operand:V4SI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
@@ -10370,47 +10293,12 @@
 	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "register_operand" "x")))]
+	 (match_operand:V2DI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
-(define_insn_and_split "*xop_pmacsdqh_mem"
-  [(set (match_operand:V2DI 0 "register_operand" "=&x")
-	(plus:V2DI
-	 (mult:V2DI
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "register_operand" "%x")
-	    (parallel [(const_int 0)
-		       (const_int 2)])))
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
-	    (parallel [(const_int 0)
-		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "memory_operand" "m")))]
-  "TARGET_XOP"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 0)
-	(match_dup 3))
-   (set (match_dup 0)
-	(plus:V2DI
-	 (mult:V2DI
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_dup 1)
-	    (parallel [(const_int 0)
-		       (const_int 2)])))
-	  (sign_extend:V2DI
-	   (vec_select:V2SI
-	    (match_dup 2)
-	    (parallel [(const_int 0)
-		       (const_int 2)]))))
-	 (match_dup 0)))])
-
 ;; We don't have a straight 32-bit parallel multiply and extend on XOP, so
 ;; fake it with a multiply/add.  In general, we expect the define_split to
 ;; occur before register allocation, so we have to handle the corner case where
@@ -10460,7 +10348,7 @@
 	 (mult:V4SI
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 1 "register_operand" "%x")
+	    (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
@@ -10472,7 +10360,7 @@
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)]))))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10484,7 +10372,7 @@
 	 (mult:V4SI
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 1 "register_operand" "%x")
+	    (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
@@ -10496,7 +10384,7 @@
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)]))))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10509,7 +10397,7 @@
 	  (mult:V4SI
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 1 "register_operand" "%x")
+	     (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
@@ -10536,7 +10424,7 @@
 			(const_int 3)
 			(const_int 5)
 			(const_int 7)])))))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -10549,7 +10437,7 @@
 	  (mult:V4SI
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 1 "register_operand" "%x")
+	     (match_operand:V8HI 1 "nonimmediate_operand" "%x")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
@@ -10576,7 +10464,7 @@
 			(const_int 3)
 			(const_int 5)
 			(const_int 7)])))))
-	 (match_operand:V4SI 3 "register_operand" "x")))]
+	 (match_operand:V4SI 3 "nonimmediate_operand" "x")))]
   "TARGET_XOP"
   "vpmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
@@ -11047,7 +10935,7 @@
 (define_insn "xop_pperm"
   [(set (match_operand:V16QI 0 "register_operand" "=x,x")
 	(unspec:V16QI
-	  [(match_operand:V16QI 1 "register_operand" "x,x")
+	  [(match_operand:V16QI 1 "nonimmediate_operand" "x,x")
 	   (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
 	   (match_operand:V16QI 3 "nonimmediate_operand" "xm,x")]
 	  UNSPEC_XOP_PERMUTE))]
@@ -11061,7 +10949,7 @@
   [(set (match_operand:V4SI 0 "register_operand" "=x,x")
 	(vec_concat:V4SI
 	 (truncate:V2SI
-	  (match_operand:V2DI 1 "register_operand" "x,x"))
+	  (match_operand:V2DI 1 "nonimmediate_operand" "x,x"))
 	 (truncate:V2SI
 	  (match_operand:V2DI 2 "nonimmediate_operand" "x,m"))))
    (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
@@ -11074,7 +10962,7 @@
   [(set (match_operand:V8HI 0 "register_operand" "=x,x")
 	(vec_concat:V8HI
 	 (truncate:V4HI
-	  (match_operand:V4SI 1 "register_operand" "x,x"))
+	  (match_operand:V4SI 1 "nonimmediate_operand" "x,x"))
 	 (truncate:V4HI
 	  (match_operand:V4SI 2 "nonimmediate_operand" "x,m"))))
    (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
@@ -11087,7 +10975,7 @@
   [(set (match_operand:V16QI 0 "register_operand" "=x,x")
 	(vec_concat:V16QI
 	 (truncate:V8QI
-	  (match_operand:V8HI 1 "register_operand" "x,x"))
+	  (match_operand:V8HI 1 "nonimmediate_operand" "x,x"))
 	 (truncate:V8QI
 	  (match_operand:V8HI 2 "nonimmediate_operand" "x,m"))))
    (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
@@ -11441,7 +11329,7 @@
 (define_insn "xop_maskcmp<mode>3"
   [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
 	(match_operator:SSEMODE1248 1 "ix86_comparison_int_operator"
-	 [(match_operand:SSEMODE1248 2 "register_operand" "x")
+	 [(match_operand:SSEMODE1248 2 "nonimmediate_operand" "x")
 	  (match_operand:SSEMODE1248 3 "nonimmediate_operand" "xm")]))]
   "TARGET_XOP"
   "vpcom%Y1<ssevecsize>\t{%3, %2, %0|%0, %2, %3}"
@@ -11455,7 +11343,7 @@
 (define_insn "xop_maskcmp_uns<mode>3"
   [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
 	(match_operator:SSEMODE1248 1 "ix86_comparison_uns_operator"
-	 [(match_operand:SSEMODE1248 2 "register_operand" "x")
+	 [(match_operand:SSEMODE1248 2 "nonimmediate_operand" "x")
 	  (match_operand:SSEMODE1248 3 "nonimmediate_operand" "xm")]))]
   "TARGET_XOP"
   "vpcom%Y1u<ssevecsize>\t{%3, %2, %0|%0, %2, %3}"
@@ -11473,7 +11361,7 @@
   [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
 	(unspec:SSEMODE1248
 	 [(match_operator:SSEMODE1248 1 "ix86_comparison_uns_operator"
-	  [(match_operand:SSEMODE1248 2 "register_operand" "x")
+	  [(match_operand:SSEMODE1248 2 "nonimmediate_operand" "x")
 	   (match_operand:SSEMODE1248 3 "nonimmediate_operand" "xm")])]
 	 UNSPEC_XOP_UNSIGNED_CMP))]
   "TARGET_XOP"
@@ -11489,7 +11377,7 @@
 (define_insn "xop_pcom_tf<mode>3"
   [(set (match_operand:SSEMODE1248 0 "register_operand" "=x")
 	(unspec:SSEMODE1248
-	  [(match_operand:SSEMODE1248 1 "register_operand" "x")
+	  [(match_operand:SSEMODE1248 1 "nonimmediate_operand" "x")
 	   (match_operand:SSEMODE1248 2 "nonimmediate_operand" "xm")
 	   (match_operand:SI 3 "const_int_operand" "n")]
 	  UNSPEC_XOP_TRUEFALSE))]
-- 
1.6.0.4


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Vector permutation support for x86
  2009-12-07  0:21               ` Richard Henderson
@ 2009-12-07 17:35                 ` Sebastian Pop
  2009-12-07 18:28                   ` Sebastian Pop
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-07 17:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 532 bytes --]

On Sun, Dec 6, 2009 at 17:15, Richard Henderson <rth@redhat.com> wrote:
> In addition to Uros' comment about ix86_fma4_valid_p,
> there are a number of instances of
>
>> +;; Allow two memory operands the same as fmadd.
>
> which are no longer true -- we're in fact allowing 3 memory operands.  I
> don't think the comment is really that useful.
>
> Ok with those changes.
>

Here is the fix to the comments on top of the previous patch.
I squashed this into the previous patch and I will apply to trunk.

Sebastian

[-- Attachment #2: 1758_fix_comments.diff --]
[-- Type: text/x-patch, Size: 1847 bytes --]

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 96c7b25..4e409c6 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1716,7 +1716,6 @@
    (set_attr "mode" "<MODE>")])
 
 ;; Floating multiply and subtract.
-;; Allow two memory operands the same as fmadd.
 (define_insn "fma4_fmsub<mode>4256"
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
@@ -1731,8 +1730,6 @@
 
 ;; Floating point negative multiply and add.
 ;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b).
-;; Note operands are out of order to simplify call to ix86_fma4_valid_p.
-;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4256"
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
@@ -1746,8 +1743,6 @@
    (set_attr "mode" "<MODE>")])
 
 ;; Floating point negative multiply and subtract.
-;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c.
-;; Allow 2 memory operands to help with optimization.
 (define_insn "fma4_fnmsub<mode>4256"
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
@@ -1825,8 +1820,6 @@
 
 ;; Floating point negative multiply and add.
 ;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b).
-;; Note operands are out of order to simplify call to ix86_fma4_valid_p.
-;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4"
   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
@@ -1859,7 +1852,6 @@
 
 ;; Floating point negative multiply and subtract.
 ;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c.
-;; Allow 2 memory operands to help with optimization.
 (define_insn "fma4_fnmsub<mode>4"
   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
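
The canonical-form rewrites that these comments keep can be checked with plain scalar C. This is a minimal sketch assuming IEEE doubles and the default round-to-nearest mode. Under those conditions negation is exact and rounding is sign-symmetric, so each pair below returns identical results. The function names are hypothetical.

```c
/* fnmadd: the comment rewrites -(a * b) + c as c - (a * b).  */
static double
fnmadd_as_written (double a, double b, double c)
{
  return -(a * b) + c;
}

static double
fnmadd_canonical (double a, double b, double c)
{
  return c - (a * b);
}

/* fnmsub: the comment rewrites -(a * b) - c as ((-a) * b) - c.  */
static double
fnmsub_as_written (double a, double b, double c)
{
  return -(a * b) - c;
}

static double
fnmsub_canonical (double a, double b, double c)
{
  return ((-a) * b) - c;
}
```

With exactly representable inputs such as 1.5, 2.0 and 0.25, neither rounding nor floating-point contraction can separate the two forms of each pair.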

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Vector permutation support for x86
  2009-12-05 20:40             ` Sebastian Pop
  2009-12-05 21:51               ` Sebastian Pop
  2009-12-06 12:20               ` Uros Bizjak
@ 2009-12-07  0:21               ` Richard Henderson
  2009-12-07 17:35                 ` Sebastian Pop
  2 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2009-12-07  0:21 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Uros Bizjak, GCC Patches

In addition to Uros' comment about ix86_fma4_valid_p,
there are a number of instances of

> +;; Allow two memory operands the same as fmadd.

which are no longer true -- we're in fact allowing 3 memory operands.  I 
don't think the comment is really that useful.

Ok with those changes.


r~

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Vector permutation support for x86
  2009-12-05 20:40             ` Sebastian Pop
  2009-12-05 21:51               ` Sebastian Pop
@ 2009-12-06 12:20               ` Uros Bizjak
  2009-12-07  0:21               ` Richard Henderson
  2 siblings, 0 replies; 45+ messages in thread
From: Uros Bizjak @ 2009-12-06 12:20 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Henderson, GCC Patches

On 12/05/2009 09:39 PM, Sebastian Pop wrote:

>>> Attached patch was tested in gcc.target/i386 directory with and without -m32
>>> and fixes mentioned regression for -m32. Despite only light testing, I'm
>>> pretty confident that we should change handling of all FMA patterns in the
>>> same way as shown in the example in the attached patch.
>>
>> I will prepare a patch that does that for the other FMA4 patterns.
>
> Fixed like this:
>
> 	* config/i386/sse.md: Remove all FMA4 splitters.
> 	Allow the second operand of FMA4 insns to be a nonimmediate.
> 	Fix comments punctuation.
>
> Okay to commit once it passes full bootstrap and test with
> make -k check RUNTESTFLAGS="--target_board=unix\{,-m32\}"
>
> -;; Floating point negative multiply and add
> -;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b)
> -;; Note operands are out of order to simplify call to ix86_fma4_valid_p
> +;; Floating point negative multiply and add.
> +;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b).
> +;; Note operands are out of order to simplify call to ix86_fma4_valid_p.
>    

This comment about ix86_fma4_valid_p probably doesn't apply anymore (so 
we can perhaps put operands back in order).

The patch is OK, but please wait a day or two if rth would like to comment.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Vector permutation support for x86
  2009-12-05 21:51               ` Sebastian Pop
@ 2009-12-06  8:42                 ` Sebastian Pop
  0 siblings, 0 replies; 45+ messages in thread
From: Sebastian Pop @ 2009-12-06  8:42 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Richard Henderson, GCC Patches

>>        * config/i386/sse.md: Remove all FMA4 splitters.
>>        Allow the second operand of FMA4 insns to be a nonimmediate.
>>        Fix comments punctuation.
>>
>> Okay to commit once it passes full bootstrap and test with
>> make -k check RUNTESTFLAGS="--target_board=unix\{,-m32\}"
>>

Passed this test with no new regressions.

Sebastian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Vector permutation support for x86
  2009-12-05 20:40             ` Sebastian Pop
@ 2009-12-05 21:51               ` Sebastian Pop
  2009-12-06  8:42                 ` Sebastian Pop
  2009-12-06 12:20               ` Uros Bizjak
  2009-12-07  0:21               ` Richard Henderson
  2 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-05 21:51 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Richard Henderson, GCC Patches

On Sat, Dec 5, 2009 at 14:39, Sebastian Pop <sebpop@gmail.com> wrote:
> On Sat, Dec 5, 2009 at 11:20, Sebastian Pop <sebpop@gmail.com> wrote:
>> On Sat, Dec 5, 2009 at 10:04, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> Attached patch was tested in gcc.target/i386 directory with and without -m32
>>> and fixes mentioned regression for -m32. Despite only light testing, I'm
>>> pretty confident that we should change handling of all FMA patterns in the
>>> same way as shown in the example in the attached patch.
>>>
>>
>> I will prepare a patch that does that for the other FMA4 patterns.
>
> Fixed like this:
>
>        * config/i386/sse.md: Remove all FMA4 splitters.
>        Allow the second operand of FMA4 insns to be a nonimmediate.
>        Fix comments punctuation.
>
> Okay to commit once it passes full bootstrap and test with
> make -k check RUNTESTFLAGS="--target_board=unix\{,-m32\}"
>

Forgot to say that this passes make -k check RUNTESTFLAGS=i386.exp
Full check is in progress.

Sebastian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Vector permutation support for x86
  2009-12-05 17:49           ` Sebastian Pop
@ 2009-12-05 20:40             ` Sebastian Pop
  2009-12-05 21:51               ` Sebastian Pop
                                 ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Sebastian Pop @ 2009-12-05 20:40 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Richard Henderson, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 803 bytes --]

On Sat, Dec 5, 2009 at 11:20, Sebastian Pop <sebpop@gmail.com> wrote:
> On Sat, Dec 5, 2009 at 10:04, Uros Bizjak <ubizjak@gmail.com> wrote:
>> Attached patch was tested in gcc.target/i386 directory with and without -m32
>> and fixes mentioned regression for -m32. Despite only light testing, I'm
>> pretty confident that we should change handling of all FMA patterns in the
>> same way as shown in the example in the attached patch.
>>
>
> I will prepare a patch that does that for the other FMA4 patterns.

Fixed like this:

	* config/i386/sse.md: Remove all FMA4 splitters.
	Allow the second operand of FMA4 insns to be a nonimmediate.
	Fix comments punctuation.

Okay to commit once it passes full bootstrap and test with
make -k check RUNTESTFLAGS="--target_board=unix\{,-m32\}"

Thanks,
Sebastian

[-- Attachment #2: 0001-Remove-all-FMA4-splitters.patch --]
[-- Type: text/x-patch, Size: 35796 bytes --]

From 79bfec22ea942836cde5b29ded7c446b944c2e7c Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Sat, 5 Dec 2009 14:33:23 -0600
Subject: [PATCH] Remove all FMA4 splitters.

	* config/i386/sse.md: Remove all FMA4 splitters.
	Allow the second operand of FMA4 insns to be a nonimmediate.
	Fix comments punctuation.
---
 gcc/config/i386/sse.md |  403 +++++++++++++-----------------------------------
 1 files changed, 104 insertions(+), 299 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9524d4f..96c7b25 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1686,8 +1686,9 @@
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
-;; FMA4 floating point multiply/accumulate instructions This includes the
-;; scalar version of the instructions as well as the vector
+;; FMA4 floating point multiply/accumulate instructions.  This
+;; includes the scalar version of the instructions as well as the
+;; vector.
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
@@ -1706,367 +1707,201 @@
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(plus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "%x,x")
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Split fmadd with two memory operands into a load and the fmadd.
-(define_split
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
-	(plus:FMA4MODEF4
-	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "register_operand" "")
-	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
-  "TARGET_FMA4"
-  [(set (match_dup 0)
-        (plus:FMA4MODEF4
-         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))
-         (match_dup 3)))]
-{
- if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
-   FAIL;
-})
-
-;; Floating multiply and subtract
-;; Allow two memory operands the same as fmadd
+;; Floating multiply and subtract.
+;; Allow two memory operands the same as fmadd.
 (define_insn "fma4_fmsub<mode>4256"
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "%x,x")
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Split fmsub with two memory operands into a load and the fmsub.
-(define_split
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
-	(minus:FMA4MODEF4
-	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "register_operand" "")
-	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
-  "TARGET_FMA4"
-  [(set (match_dup 0)
-        (minus:FMA4MODEF4
-         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))
-         (match_dup 3)))]
-{
- if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
-   FAIL;
-})
-
-;; Floating point negative multiply and add
-;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b)
-;; Note operands are out of order to simplify call to ix86_fma4_valid_p
+;; Floating point negative multiply and add.
+;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b).
+;; Note operands are out of order to simplify call to ix86_fma4_valid_p.
 ;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4256"
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "%x,x")
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Split fnmadd with two memory operands into a load and the fnmadd.
-(define_split
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
-	(minus:FMA4MODEF4
-	 (match_operand:FMA4MODEF4 3 "memory_operand" "")
-	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "register_operand" "")
-	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))))]
-  "TARGET_FMA4"
-  [(set (match_dup 0)
-        (minus:FMA4MODEF4
-	 (match_dup 3)
-         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))))]
-{
-  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
-    FAIL;
-})
-
-;; Floating point negative multiply and subtract
-;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c
-;; Allow 2 memory operands to help with optimization
+;; Floating point negative multiply and subtract.
+;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c.
+;; Allow 2 memory operands to help with optimization.
 (define_insn "fma4_fnmsub<mode>4256"
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
 	  (neg:FMA4MODEF4
-	   (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
+	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "%x,x"))
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Split fnmsub with two memory operands into a load and the fmsub.
-(define_split
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
-	(minus:FMA4MODEF4
-	 (mult:FMA4MODEF4
-	  (neg:FMA4MODEF4
-	   (match_operand:FMA4MODEF4 1 "register_operand" ""))
-	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
-  "TARGET_FMA4"
-  [(set (match_dup 0)
-        (minus:FMA4MODEF4
-         (mult:FMA4MODEF4
-	  (neg:FMA4MODEF4 (match_dup 1))
-	  (match_dup 2))
-         (match_dup 3)))]
-{
-  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
-    FAIL;
-})
-
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (define_insn "fma4_fmadd<mode>4"
   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(plus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%x,x")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Split fmadd with two memory operands into a load and the fmadd.
-(define_split
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "")
-	(plus:SSEMODEF4
-	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "register_operand" "")
-	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
-	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
-  "TARGET_FMA4"
-  [(set (match_dup 0)
-        (plus:SSEMODEF4
-         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))
-         (match_dup 3)))]
-{
-  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
-    FAIL;
-})
-
 ;; For the scalar operations, use operand1 for the upper words that aren't
 ;; modified, so restrict the forms that are generated.
-;; Scalar version of fmadd
+;; Scalar version of fmadd.
 (define_insn "fma4_vmfmadd<mode>4"
   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
 	(vec_merge:SSEMODEF2P
 	 (plus:SSEMODEF2P
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x")
 	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Floating multiply and subtract
-;; Allow two memory operands the same as fmadd
+;; Floating multiply and subtract.
+;; Allow two memory operands the same as fmadd.
 (define_insn "fma4_fmsub<mode>4"
   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%x,x")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Split fmsub with two memory operands into a load and the fmsub.
-(define_split
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "")
-	(minus:SSEMODEF4
-	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "register_operand" "")
-	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
-	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
-  "TARGET_FMA4"
-  [(set (match_dup 0)
-        (minus:SSEMODEF4
-         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))
-         (match_dup 3)))]
-{
-  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
-    FAIL;
-})
-
 ;; For the scalar operations, use operand1 for the upper words that aren't
 ;; modified, so restrict the forms that are generated.
-;; Scalar version of fmsub
+;; Scalar version of fmsub.
 (define_insn "fma4_vmfmsub<mode>4"
   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
 	(vec_merge:SSEMODEF2P
 	 (minus:SSEMODEF2P
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x")
 	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Floating point negative multiply and add
-;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b)
-;; Note operands are out of order to simplify call to ix86_fma4_valid_p
+;; Floating point negative multiply and add.
+;; Rewrite (- (a * b) + c) into the canonical form: c - (a * b).
+;; Note operands are out of order to simplify call to ix86_fma4_valid_p.
 ;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4"
   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%x,x")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Split fnmadd with two memory operands into a load and the fnmadd.
-(define_split
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "")
-	(minus:SSEMODEF4
-	 (match_operand:SSEMODEF4 3 "memory_operand" "")
-	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "register_operand" "")
-	  (match_operand:SSEMODEF4 2 "memory_operand" ""))))]
-  "TARGET_FMA4"
-  [(set (match_dup 0)
-        (minus:SSEMODEF4
-	 (match_dup 3)
-         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))))]
-{
-  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
-    FAIL;
-})
-
 ;; For the scalar operations, use operand1 for the upper words that aren't
 ;; modified, so restrict the forms that are generated.
-;; Scalar version of fnmadd
+;; Scalar version of fnmadd.
 (define_insn "fma4_vmfnmadd<mode>4"
   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
 	(vec_merge:SSEMODEF2P
 	 (minus:SSEMODEF2P
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x")
 	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Floating point negative multiply and subtract
-;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c
-;; Allow 2 memory operands to help with optimization
+;; Floating point negative multiply and subtract.
+;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c.
+;; Allow 2 memory operands to help with optimization.
 (define_insn "fma4_fnmsub<mode>4"
   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
 	  (neg:SSEMODEF4
-	   (match_operand:SSEMODEF4 1 "register_operand" "%x,x"))
+	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%x,x"))
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Split fnmsub with two memory operands into a load and the fmsub.
-(define_split
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "")
-	(minus:SSEMODEF4
-	 (mult:SSEMODEF4
-	  (neg:SSEMODEF4
-	   (match_operand:SSEMODEF4 1 "register_operand" ""))
-	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
-	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
-  "TARGET_FMA4"
-  [(set (match_dup 0)
-        (minus:SSEMODEF4
-         (mult:SSEMODEF4
-	  (neg:SSEMODEF4 (match_dup 1))
-	  (match_dup 2))
-         (match_dup 3)))]
-{
-  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
-    FAIL;
-})
-
 ;; For the scalar operations, use operand1 for the upper words that aren't
 ;; modified, so restrict the forms that are generated.
-;; Scalar version of fnmsub
+;; Scalar version of fnmsub.
 (define_insn "fma4_vmfnmsub<mode>4"
   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
 	(vec_merge:SSEMODEF2P
 	 (minus:SSEMODEF2P
 	  (mult:SSEMODEF2P
 	   (neg:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x"))
 	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
 (define_insn "fma4i_fmadd<mode>4256"
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(unspec:FMA4MODEF4
 	 [(plus:FMA4MODEF4
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "%x,x")
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2076,12 +1911,11 @@
 	(unspec:FMA4MODEF4
 	 [(minus:FMA4MODEF4
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "%x,x")
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2092,11 +1926,10 @@
 	 [(minus:FMA4MODEF4
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "%x,x")
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2107,28 +1940,25 @@
 	 [(minus:FMA4MODEF4
 	   (mult:FMA4MODEF4
 	    (neg:FMA4MODEF4
-	     (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
+	     (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "%x,x"))
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define_insn "fma4i_fmadd<mode>4"
   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
 	(unspec:SSEMODEF2P
 	 [(plus:SSEMODEF2P
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x")
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2138,12 +1968,11 @@
 	(unspec:SSEMODEF2P
 	 [(minus:SSEMODEF2P
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x")
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2154,11 +1983,10 @@
 	 [(minus:SSEMODEF2P
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x")
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2169,12 +1997,11 @@
 	 [(minus:SSEMODEF2P
 	   (mult:SSEMODEF2P
 	    (neg:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x"))
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2187,14 +2014,13 @@
 	 [(vec_merge:SSEMODEF2P
 	   (plus:SSEMODEF2P
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x")
 	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2205,14 +2031,13 @@
 	 [(vec_merge:SSEMODEF2P
 	   (minus:SSEMODEF2P
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x")
 	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2224,13 +2049,12 @@
 	   (minus:SSEMODEF2P
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x")
 	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2242,21 +2066,20 @@
 	   (minus:SSEMODEF2P
 	    (mult:SSEMODEF2P
 	     (neg:SSEMODEF2P
-	      (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	      (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "%x,x"))
 	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
-;; FMA4 Parallel floating point multiply addsub and subadd operations
+;; FMA4 Parallel floating point multiply addsub and subadd operations.
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
@@ -2265,7 +2088,7 @@
 	(vec_merge:V8SF
 	  (plus:V8SF
 	    (mult:V8SF
-	      (match_operand:V8SF 1 "register_operand" "%x,x")
+	      (match_operand:V8SF 1 "nonimmediate_operand" "%x,x")
 	      (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V8SF
@@ -2274,8 +2097,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 170)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2285,7 +2107,7 @@
 	(vec_merge:V4DF
 	  (plus:V4DF
 	    (mult:V4DF
-	      (match_operand:V4DF 1 "register_operand" "%x,x")
+	      (match_operand:V4DF 1 "nonimmediate_operand" "%x,x")
 	      (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4DF
@@ -2294,8 +2116,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2305,7 +2126,7 @@
 	(vec_merge:V4SF
 	  (plus:V4SF
 	    (mult:V4SF
-	      (match_operand:V4SF 1 "register_operand" "%x,x")
+	      (match_operand:V4SF 1 "nonimmediate_operand" "%x,x")
 	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4SF
@@ -2314,8 +2135,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2325,7 +2145,7 @@
 	(vec_merge:V2DF
 	  (plus:V2DF
 	    (mult:V2DF
-	      (match_operand:V2DF 1 "register_operand" "%x,x")
+	      (match_operand:V2DF 1 "nonimmediate_operand" "%x,x")
 	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V2DF
@@ -2334,8 +2154,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 2)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2345,7 +2164,7 @@
 	(vec_merge:V8SF
 	  (plus:V8SF
 	    (mult:V8SF
-	      (match_operand:V8SF 1 "register_operand" "%x,x")
+	      (match_operand:V8SF 1 "nonimmediate_operand" "%x,x")
 	      (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V8SF
@@ -2354,8 +2173,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 85)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2365,7 +2183,7 @@
 	(vec_merge:V4DF
 	  (plus:V4DF
 	    (mult:V4DF
-	      (match_operand:V4DF 1 "register_operand" "%x,x")
+	      (match_operand:V4DF 1 "nonimmediate_operand" "%x,x")
 	      (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4DF
@@ -2374,8 +2192,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2385,7 +2202,7 @@
 	(vec_merge:V4SF
 	  (plus:V4SF
 	    (mult:V4SF
-	      (match_operand:V4SF 1 "register_operand" "%x,x")
+	      (match_operand:V4SF 1 "nonimmediate_operand" "%x,x")
 	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4SF
@@ -2394,8 +2211,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2405,7 +2221,7 @@
 	(vec_merge:V2DF
 	  (plus:V2DF
 	    (mult:V2DF
-	      (match_operand:V2DF 1 "register_operand" "%x,x")
+	      (match_operand:V2DF 1 "nonimmediate_operand" "%x,x")
 	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V2DF
@@ -2414,21 +2230,18 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 1)))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
 
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
 (define_insn "fma4i_fmaddsubv8sf4"
   [(set (match_operand:V8SF 0 "register_operand" "=x,x")
 	(unspec:V8SF
 	 [(vec_merge:V8SF
 	   (plus:V8SF
 	     (mult:V8SF
-	       (match_operand:V8SF 1 "register_operand" "%x,x")
+	       (match_operand:V8SF 1 "nonimmediate_operand" "%x,x")
 	       (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V8SF
@@ -2438,8 +2251,7 @@
 	     (match_dup 3))
 	   (const_int 170))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2450,7 +2262,7 @@
 	 [(vec_merge:V4DF
 	   (plus:V4DF
 	     (mult:V4DF
-	       (match_operand:V4DF 1 "register_operand" "%x,x")
+	       (match_operand:V4DF 1 "nonimmediate_operand" "%x,x")
 	       (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4DF
@@ -2460,8 +2272,7 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2472,7 +2283,7 @@
 	 [(vec_merge:V4SF
 	   (plus:V4SF
 	     (mult:V4SF
-	       (match_operand:V4SF 1 "register_operand" "%x,x")
+	       (match_operand:V4SF 1 "nonimmediate_operand" "%x,x")
 	       (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4SF
@@ -2482,8 +2293,7 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2494,7 +2304,7 @@
 	 [(vec_merge:V2DF
 	   (plus:V2DF
 	     (mult:V2DF
-	       (match_operand:V2DF 1 "register_operand" "%x,x")
+	       (match_operand:V2DF 1 "nonimmediate_operand" "%x,x")
 	       (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V2DF
@@ -2504,8 +2314,7 @@
 	     (match_dup 3))
 	   (const_int 2))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2516,7 +2325,7 @@
 	 [(vec_merge:V8SF
 	   (plus:V8SF
 	     (mult:V8SF
-	       (match_operand:V8SF 1 "register_operand" "%x,x")
+	       (match_operand:V8SF 1 "nonimmediate_operand" "%x,x")
 	       (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V8SF
@@ -2526,8 +2335,7 @@
 	     (match_dup 3))
 	   (const_int 85))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2538,7 +2346,7 @@
 	 [(vec_merge:V4DF
 	   (plus:V4DF
 	     (mult:V4DF
-	       (match_operand:V4DF 1 "register_operand" "%x,x")
+	       (match_operand:V4DF 1 "nonimmediate_operand" "%x,x")
 	       (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4DF
@@ -2548,8 +2356,7 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2560,7 +2367,7 @@
 	 [(vec_merge:V4SF
 	   (plus:V4SF
 	     (mult:V4SF
-	       (match_operand:V4SF 1 "register_operand" "%x,x")
+	       (match_operand:V4SF 1 "nonimmediate_operand" "%x,x")
 	       (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4SF
@@ -2570,8 +2377,7 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2582,7 +2388,7 @@
 	 [(vec_merge:V2DF
 	   (plus:V2DF
 	     (mult:V2DF
-	       (match_operand:V2DF 1 "register_operand" "%x,x")
+	       (match_operand:V2DF 1 "nonimmediate_operand" "%x,x")
 	       (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V2DF
@@ -2592,8 +2398,7 @@
 	     (match_dup 3))
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && TARGET_FUSED_MADD
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
-- 
1.6.0.4


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Vector permutation support for x86
  2009-12-05 17:19                   ` Sebastian Pop
@ 2009-12-05 17:55                     ` Richard Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2009-12-05 17:55 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Uros Bizjak, GCC Patches, Harle, Christophe

> 	* config/i386/i386.c (TARGET_DEFAULT_TARGET_FLAGS): Add
> 	MASK_FUSED_MADD.
> 	* config/i386/i386.h (CC1_CPU_SPEC_1): Remove
> 	"'-mfused-madd' was removed".
> 	* config/i386/i386.opt (mfused-madd): New.
> 	* config/i386/sse.md: Add TARGET_FUSED_MADD to FMA4 insns.
> 	* doc/invoke.texi (-mfused-madd, -mno-fused-madd): Document.

Ok.


r~

* Re: Vector permutation support for x86
  2009-12-05 17:07         ` Uros Bizjak
@ 2009-12-05 17:49           ` Sebastian Pop
  2009-12-05 20:40             ` Sebastian Pop
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-05 17:49 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Richard Henderson, GCC Patches

On Sat, Dec 5, 2009 at 10:04, Uros Bizjak <ubizjak@gmail.com> wrote:
> The attached patch was tested in the gcc.target/i386 directory with and without -m32
> and fixes the mentioned regression for -m32. Despite only light testing, I'm
> pretty confident that we should change the handling of all FMA patterns in the
> same way as shown in the example in the attached patch.
>

I will prepare a patch that does that for the other FMA4 patterns.

Sebastian

* Re: Vector permutation support for x86
  2009-12-04 16:40                 ` Sebastian Pop
@ 2009-12-05 17:19                   ` Sebastian Pop
  2009-12-05 17:55                     ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-05 17:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches, Harle, Christophe

[-- Attachment #1: Type: text/plain, Size: 537 bytes --]

>>> +mno-fused-madd
>>> +Target RejectNegative Report Mask(NO_FUSED_MADD) Undocumented Save
>>> +
>>> +mfused-madd
>>> +Target Report InverseMask(NO_FUSED_MADD, FUSED_MADD) Save
>>> +Enable automatic generation of fused floating point multiply-add instructions
>>> +if the ISA supports such instructions.  The -mfused-madd option is on by
>>> +default.
>>
>> Please don't do things this sort of backward way.
>
> Ok.
>

Fixed like this.
Testing in progress on amd64-linux.  Okay if it passes?

Thanks,
Sebastian

[-- Attachment #2: 0001-Add-TARGET_FUSED_MADD-to-FMA4-insns.patch --]
[-- Type: text/x-patch, Size: 20577 bytes --]

From 96c66d1af1f37e10cf03022e422f8f3da539d6b9 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Thu, 3 Dec 2009 21:16:00 -0600
Subject: [PATCH] Add TARGET_FUSED_MADD to FMA4 insns.

	* config/i386/i386.c (TARGET_DEFAULT_TARGET_FLAGS): Add
	MASK_FUSED_MADD.
	* config/i386/i386.h (CC1_CPU_SPEC_1): Remove
	"'-mfused-madd' was removed".
	* config/i386/i386.opt (mfused-madd): New.
	* config/i386/sse.md: Add TARGET_FUSED_MADD to FMA4 insns.
	* doc/invoke.texi (-mfused-madd, -mno-fused-madd): Document.
---
 gcc/config/i386/i386.c   |    3 +-
 gcc/config/i386/i386.h   |    2 -
 gcc/config/i386/i386.opt |    6 ++
 gcc/config/i386/sse.md   |  120 ++++++++++++++++++++++++++++++---------------
 gcc/doc/invoke.texi      |   11 ++++-
 5 files changed, 97 insertions(+), 45 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ade3a7d..6cd9d7d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -30307,7 +30307,8 @@ ix86_enum_va_list (int idx, const char **pname, tree *ptree)
 #define TARGET_DEFAULT_TARGET_FLAGS	\
   (TARGET_DEFAULT			\
    | TARGET_SUBTARGET_DEFAULT		\
-   | TARGET_TLS_DIRECT_SEG_REFS_DEFAULT)
+   | TARGET_TLS_DIRECT_SEG_REFS_DEFAULT \
+   | MASK_FUSED_MADD)
 
 #undef TARGET_HANDLE_OPTION
 #define TARGET_HANDLE_OPTION ix86_handle_option
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index eb1c86f..860d234 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -546,8 +546,6 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 %n`-mintel-syntax' is deprecated. Use `-masm=intel' instead.\n} \
 %{msse5:-mavx \
 %n'-msse5' was removed.\n} \
-%{mfused-madd:-mavx \
-%n'-mfused-madd' was removed.\n} \
 %{mno-intel-syntax:-masm=att \
 %n`-mno-intel-syntax' is deprecated. Use `-masm=att' instead.\n}"
 
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index dd47b7d..0afdd11 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -244,6 +244,12 @@ mcld
 Target Report Mask(CLD) Save
 Generate cld instruction in the function prologue.
 
+mfused-madd
+Target Report Mask(FUSED_MADD) Save
+Enable automatic generation of fused floating point multiply-add instructions
+if the ISA supports such instructions.  The -mfused-madd option is on by
+default.
+
 ;; ISA support
 
 m32
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 78e4b6a..9524d4f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1709,7 +1709,8 @@
 	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1741,7 +1742,8 @@
 	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1775,7 +1777,8 @@
 	 (mult:FMA4MODEF4
 	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1809,7 +1812,8 @@
 	   (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1843,7 +1847,8 @@
 	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1879,7 +1884,8 @@
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1893,7 +1899,8 @@
 	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1929,7 +1936,8 @@
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1945,7 +1953,8 @@
 	 (mult:SSEMODEF4
 	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1981,7 +1990,8 @@
 	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1997,7 +2007,8 @@
 	   (match_operand:SSEMODEF4 1 "register_operand" "%x,x"))
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2037,7 +2048,8 @@
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2053,7 +2065,8 @@
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2067,7 +2080,8 @@
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2081,7 +2095,8 @@
 	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2096,7 +2111,8 @@
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2111,7 +2127,8 @@
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2125,7 +2142,8 @@
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2139,7 +2157,8 @@
 	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2154,7 +2173,8 @@
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2173,7 +2193,8 @@
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2190,7 +2211,8 @@
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2207,7 +2229,8 @@
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2225,7 +2248,8 @@
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2250,7 +2274,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 170)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2269,7 +2294,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2288,7 +2314,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2307,7 +2334,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 2)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2326,7 +2354,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 85)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2345,7 +2374,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2364,7 +2394,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2383,7 +2414,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 1)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2406,7 +2438,8 @@
 	     (match_dup 3))
 	   (const_int 170))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2427,7 +2460,8 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2448,7 +2482,8 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2469,7 +2504,8 @@
 	     (match_dup 3))
 	   (const_int 2))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2490,7 +2526,8 @@
 	     (match_dup 3))
 	   (const_int 85))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2511,7 +2548,8 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2532,7 +2570,8 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2553,7 +2592,8 @@
 	     (match_dup 3))
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a1df226..22fad8f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -593,7 +593,7 @@ Objective-C and Objective-C++ Dialects}.
 -mincoming-stack-boundary=@var{num}
 -mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
--maes -mpclmul @gol
+-maes -mpclmul -mfused-madd @gol
 -msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop -mlwp @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
@@ -12062,6 +12062,13 @@ supported architecture, using the appropriate flags.  In particular,
 the file containing the CPU detection code should be compiled without
 these options.
 
+@item -mfused-madd
+@itemx -mno-fused-madd
+@opindex mfused-madd
+@opindex mno-fused-madd
+Do (don't) generate code that uses the fused multiply/add or multiply/subtract
+instructions.  The default is to use these instructions.
+
 @item -mcld
 @opindex mcld
 This option instructs GCC to emit a @code{cld} instruction in the prologue
@@ -12397,7 +12404,7 @@ Do not generate inline code for sqrt.
 @opindex mfused-madd
 @opindex mno-fused-madd
 Do (don't) generate code that uses the fused multiply/add or multiply/subtract
-instructions.    The default is to use these instructions.
+instructions.  The default is to use these instructions.
 
 @item -mno-dwarf2-asm
 @itemx -mdwarf2-asm
-- 
1.6.0.4
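Background on why the -mfused-madd switch is worth having at all: a fused multiply-add such as vfmaddss rounds a*b+c once, while a separate multiply and add round twice, so turning fusion on or off can change results. The C sketch below illustrates the difference. The function names are hypothetical and not part of GCC, and the "fused" reference relies on a float product being exact in double, which holds for the inputs used here but is not a general substitute for fmaf().

```c
/* Unfused: a*b is rounded to float before the add.  Storing the product
   in a volatile float forces that rounding even when intermediates are
   kept in wider registers (x87, FLT_EVAL_METHOD == 2), and keeps the
   compiler from contracting the expression into an FMA.  */
float
mul_add_unfused (float a, float b, float c)
{
  volatile float p = a * b;
  return p + c;
}

/* Fused reference for this example only: a float*float product is exact
   in double (24 + 24 <= 53 significand bits), and for the inputs below
   the double add is exact too, so only the final conversion rounds.  */
float
mul_add_fused_ref (float a, float b, float c)
{
  return (float) ((double) a * b + c);
}
```

With a = 1 + 2^-12, b = 1 + 2^-13 and c = -(1 + 2^-12 + 2^-13), the exact product carries an extra 2^-25 term that the float-rounded product drops, so the unfused form returns 0 while the fused reference returns 2^-25.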


* Re: Vector permutation support for x86
  2009-12-03 19:30       ` Sebastian Pop
  2009-12-03 20:44         ` Richard Henderson
@ 2009-12-05 17:07         ` Uros Bizjak
  2009-12-05 17:49           ` Sebastian Pop
  1 sibling, 1 reply; 45+ messages in thread
From: Uros Bizjak @ 2009-12-05 17:07 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Henderson, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3961 bytes --]

On 12/03/2009 08:27 PM, Sebastian Pop wrote:
> On Wed, Dec 2, 2009 at 16:23, Richard Henderson <rth@redhat.com> wrote:
>>> @@ -1724,8 +1723,8 @@
>>>          (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
>>>         (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
>>>   "TARGET_FMA4
>>> - && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
>>> - && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
>>> + && MEM_P (operands[2])
>>> + && (MEM_P (operands[1]) || MEM_P (operands[3]))
>>> && !reg_mentioned_p (operands[0], operands[1])
>>> && !reg_mentioned_p (operands[0], operands[2])
>>> && !reg_mentioned_p (operands[0], operands[3])"
>>
>> This is the splitter under "fma4_fmadd<mode>4256", but the same comment
>> applies to all of the fma4 splitters.

I think that none of these splitters are needed. At the moment, we have a
regression in the i386 testsuite:

FAIL: gcc.target/i386/funcspec-9.c scan-assembler vfmaddss

due to failure to generate vfmaddss for -m32 for this testcase:

float
flt_mul_add (float a, float b, float c)
{
   return (a * b) + c;
}

The combine pass first creates a separate multiply and a separate add
instruction, each with a memory operand:

Successfully matched this instruction:
(set (reg:SF 64)
     (plus:SF (reg:SF 65)
         (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                 (const_int 8 [0x8])) [2 c+0 S4 A32])))

Successfully matched this instruction:
(set (reg:SF 64)
     (plus:SF (reg:SF 65)
         (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                 (const_int 8 [0x8])) [2 c+0 S4 A32])))
deferring deletion of insn with uid = 4.

and then fails to match:

Failed to match this instruction:
(set (reg:SF 64)
     (plus:SF (mult:SF (mem/c/i:SF (reg/f:SI 16 argp) [2 a+0 S4 A32])
             (reg/v:SF 62 [ b ]))
         (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                 (const_int 8 [0x8])) [2 c+0 S4 A32])))

(BTW: Pushing all values into registers at expand time is simply not 
effective for FMAs.)

Generation of the FMA insn can be fixed by having only the following
pattern, without any splitter support:

(define_insn "fma4_fmadd<mode>4"
   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
     (plus:SSEMODEF4
      (mult:SSEMODEF4
       (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%x,x")
       (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
      (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
   "TARGET_FMA4"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])

Please note that all operands can now accept a memory operand as well,
so combine creates the desired insn pattern:

Successfully matched this instruction:
(set (reg:SF 64)
     (plus:SF (mult:SF (mem/c/i:SF (reg/f:SI 16 argp) [2 a+0 S4 A32])
             (reg/v:SF 62 [ b ]))
         (mem/c/i:SF (plus:SI (reg/f:SI 16 argp)
                 (const_int 8 [0x8])) [2 c+0 S4 A32])))
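
To make the matching argument concrete, here is a toy C model of the two insn conditions. The operand kinds and function names are hypothetical simplifications; real predicates check far more than register versus memory. Under the old pattern, the combined insn above has a MEM as operand 1, which register_operand rejects; under the proposed pattern every operand only has to be a nonimmediate_operand.

```c
#include <stdbool.h>

/* Simplified operand kinds; real RTL has many more codes.  */
enum model_kind { MODEL_REG, MODEL_MEM, MODEL_CONST };

static bool
model_register_operand (enum model_kind k)
{
  return k == MODEL_REG;
}

static bool
model_nonimmediate_operand (enum model_kind k)
{
  return k != MODEL_CONST;
}

/* Old fma4_fmadd<mode>4: op1 register_operand, op2/op3
   nonimmediate_operand, and not both op2 and op3 in memory
   (TARGET_FMA4 assumed true).  */
bool
model_old_pattern_matches (enum model_kind op1, enum model_kind op2,
                           enum model_kind op3)
{
  return model_register_operand (op1)
         && model_nonimmediate_operand (op2)
         && model_nonimmediate_operand (op3)
         && !(op2 == MODEL_MEM && op3 == MODEL_MEM);
}

/* Proposed pattern: all three operands nonimmediate_operand and the
   insn condition reduced to TARGET_FMA4 (assumed true).  */
bool
model_new_pattern_matches (enum model_kind op1, enum model_kind op2,
                           enum model_kind op3)
{
  return model_nonimmediate_operand (op1)
         && model_nonimmediate_operand (op2)
         && model_nonimmediate_operand (op3);
}
```

For the flt_mul_add insn, (MEM, REG, MEM) fails the old condition and passes the new one.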

We pass this insn to reload. AFAIK, an operand's constraint ("%x,x" for op1)
should be tighter than its predicate ("nonimmediate_operand" for
op1), otherwise reload crashes. Knowing that, we see that all
constraints are tighter than the predicates in our case, so even when we
have multiple memory operands, gcc generates for the 32-bit target:

flt_mul_add:
     pushl    %ebp
     movl    %esp, %ebp
     subl    $4, %esp
     vmovss    8(%ebp), %xmm0
     vmovss    12(%ebp), %xmm1
     vfmaddss    16(%ebp), %xmm1, %xmm0, %xmm0
     vmovss    %xmm0, -4(%ebp)
     flds    -4(%ebp)
     leave
     ret

q.e.d.
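
The reload side of the argument can be sketched the same way. The toy C model below (hypothetical names, and a large simplification of reload, which also weighs costs) scores each constraint alternative of the pattern (op1 "%x,x", op2 "x,m", op3 "xm,x") by how many operands would have to be moved, trying both orders of the commutative operands 1 and 2. Because alternative 0 never demands memory, loading MEM operands into registers always yields a valid insn, which is why the looser predicates are safe.

```c
#include <limits.h>

enum model_loc { LOC_REG, LOC_MEM };

/* One letter per operand per alternative: 'x' = SSE register only,
   'm' = memory only, 'b' = either ("xm").  */
static const char model_alts[2][3] = {
  { 'x', 'x', 'b' },   /* alternative 0 */
  { 'x', 'm', 'x' },   /* alternative 1 */
};

/* 1 if an operand in LOC must be moved to satisfy constraint C.  */
static int
model_mismatch (char c, enum model_loc loc)
{
  if (c == 'b')
    return 0;
  if (c == 'x')
    return loc == LOC_MEM;
  return loc == LOC_REG;   /* 'm' */
}

/* Fewest operand moves over both alternatives and both orders of the
   commutative pair (op1, op2), as '%' permits.  */
int
model_min_fixups (enum model_loc op1, enum model_loc op2, enum model_loc op3)
{
  int best = INT_MAX;
  for (int swap = 0; swap < 2; swap++)
    {
      enum model_loc o[3] = { swap ? op2 : op1, swap ? op1 : op2, op3 };
      for (int a = 0; a < 2; a++)
        {
          int cost = 0;
          for (int i = 0; i < 3; i++)
            cost += model_mismatch (model_alts[a][i], o[i]);
          if (cost < best)
            best = cost;
        }
    }
  return best;
}
```

For the flt_mul_add case (MEM, REG, MEM) the model finds that one load suffices, consistent with the assembly above, where a is loaded into %xmm0 and c stays a memory operand of vfmaddss.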

The attached patch was tested in the gcc.target/i386 directory with and
without -m32 and fixes the mentioned regression for -m32. Despite only
light testing, I'm pretty confident that we should change the handling of
all FMA patterns in the same way as shown in the example in the attached patch.

BTW: Please test patches that affect both 32-bit and 64-bit targets with:

RUNTESTFLAGS="--target_board=unix\{,-m32\}"

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 1453 bytes --]

Index: sse.md
===================================================================
--- sse.md	(revision 155009)
+++ sse.md	(working copy)
@@ -1840,32 +1840,14 @@
   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(plus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "%x,x")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Split fmadd with two memory operands into a load and the fmadd.
-(define_split
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "")
-	(plus:SSEMODEF4
-	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "register_operand" "")
-	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
-	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
-  "TARGET_FMA4"
-  [(set (match_dup 0)
-        (plus:SSEMODEF4
-         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))
-         (match_dup 3)))]
-{
-  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
-    FAIL;
-})
-
 ;; For the scalar operations, use operand1 for the upper words that aren't
 ;; modified, so restrict the forms that are generated.
 ;; Scalar version of fmadd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Vector permutation support for x86
  2009-12-04 16:31               ` Richard Henderson
@ 2009-12-04 16:40                 ` Sebastian Pop
  2009-12-05 17:19                   ` Sebastian Pop
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-04 16:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches, Harle, Christophe

On Fri, Dec 4, 2009 at 10:10, Richard Henderson <rth@redhat.com> wrote:
>> @@ -244,6 +244,15 @@ mcld
>>  Target Report Mask(CLD) Save
>>  Generate cld instruction in the function prologue.
>>
>> +mno-fused-madd
>> +Target RejectNegative Report Mask(NO_FUSED_MADD) Undocumented Save
>> +
>> +mfused-madd
>> +Target Report InverseMask(NO_FUSED_MADD, FUSED_MADD) Save
>> +Enable automatic generation of fused floating point multiply-add instructions
>> +if the ISA supports such instructions.  The -mfused-madd option is on by
>> +default.
>
> Please don't do things this sort of backward way.

Ok.

> Please instead look at the other targets and see
> how they use the positive mask.

I just re-enabled this flag from the SSE5 patch that did it this way.

Sebastian

* Re: Vector permutation support for x86
  2009-12-04  6:50             ` Sebastian Pop
@ 2009-12-04 16:31               ` Richard Henderson
  2009-12-04 16:40                 ` Sebastian Pop
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2009-12-04 16:31 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Uros Bizjak, GCC Patches, Harle, Christophe

> @@ -244,6 +244,15 @@ mcld
>  Target Report Mask(CLD) Save
>  Generate cld instruction in the function prologue.
>
> +mno-fused-madd
> +Target RejectNegative Report Mask(NO_FUSED_MADD) Undocumented Save
> +
> +mfused-madd
> +Target Report InverseMask(NO_FUSED_MADD, FUSED_MADD) Save
> +Enable automatic generation of fused floating point multiply-add instructions
> +if the ISA supports such instructions.  The -mfused-madd option is on by
> +default.

Please don't do things this sort of backward way.
Please instead look at the other targets and see
how they use the positive mask.
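The review comment can be made concrete with a plain C model of the two styles. Every macro and function name below is hypothetical. This is not the code that GCC's option scripts generate. It only mirrors the documented meaning of Mask and InverseMask in .opt files: the patch stores a "disabled" bit and derives TARGET_FUSED_MADD as its negation, while the positive style stores an "enabled" bit that is present in the default flags.

```c
#include <stdbool.h>

/* Backward style from the patch: Mask(NO_FUSED_MADD) stores a bit that
   means "disabled"; InverseMask(NO_FUSED_MADD, FUSED_MADD) derives the
   positive predicate by negating it.  Bit position is made up.  */
#define MODEL_MASK_NO_FUSED_MADD (1u << 0)

static bool
backward_fused_madd_p (unsigned flags)
{
  return (flags & MODEL_MASK_NO_FUSED_MADD) == 0;
}

static unsigned
backward_apply_no_fused_madd (unsigned flags)
{
  /* -mno-fused-madd sets the "disabled" bit.  */
  return flags | MODEL_MASK_NO_FUSED_MADD;
}

/* Positive style: the stored bit means "enabled", and the default
   flag word simply includes it.  */
#define MODEL_MASK_FUSED_MADD (1u << 1)
#define MODEL_TARGET_DEFAULT MODEL_MASK_FUSED_MADD

static bool
positive_fused_madd_p (unsigned flags)
{
  return (flags & MODEL_MASK_FUSED_MADD) != 0;
}

static unsigned
positive_apply_no_fused_madd (unsigned flags)
{
  /* -mno-fused-madd clears the "enabled" bit.  */
  return flags & ~MODEL_MASK_FUSED_MADD;
}
```

Both styles give the same user-visible behaviour. The positive style avoids a separate RejectNegative, Undocumented option whose only job is to carry an inverted bit.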


r~

* Re: Vector permutation support for x86
  2009-12-03 19:53           ` Sebastian Pop
@ 2009-12-04  6:50             ` Sebastian Pop
  2009-12-04 16:31               ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-04  6:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches, Harle, Christophe

[-- Attachment #1: Type: text/plain, Size: 1124 bytes --]

On Thu, Dec 3, 2009 at 13:31, Sebastian Pop <sebpop@gmail.com> wrote:
>> I do see from other ports that all of the versions of these patterns without
>> the unspec should be protected by TARGET_FUSED_MADD and the -mfused-madd
>> command-line option to control it.  Note that FUSED_MADD is enabled by
>> default on all those targets that implement it.
>>
>> I see i386 used to have the option, but someone decided that -mfused-madd
>> should imply -mavx instead.  Which is silly since that's not the same thing
>> at all; -m{avx,xop} -mno-fused-madd is a very sensible combination of
>> options if your numerical algorithm can't stand the fused operation.
>
> I will prepare a patch to fix this.
>

Fixed like this.  Okay for trunk if this passes testing on amd64-linux?

	* config/i386/i386.c (ix86_target_string): Add -mno-fused-madd.
	(ix86_valid_target_attribute_inner_p): Add fused-madd.
	* config/i386/i386.opt (mno-fused-madd): New.
	(mfused-madd): New.
	* config/i386/sse.md: Add TARGET_FUSED_MADD to FMA4 insns.
	* doc/invoke.texi (-mfused-madd, -mno-fused-madd): Document.

Thanks,
Sebastian
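The quoted review notes that -m{avx,xop} -mno-fused-madd is sensible for code that cannot tolerate the fused operation. The small C sketch below illustrates why. The function names are hypothetical. The "fused" helper emulates a single rounding in double instead of calling fmaf. That emulation is exact only for inputs like the ones used here.

```c
/* Unfused: the product is rounded to float before the add.  Storing it
   through a volatile float prevents the compiler from contracting the
   two operations into an FMA and forces the intermediate rounding.  */
static float
mul_then_add (float a, float b, float c)
{
  volatile float p = a * b;
  return p + c;
}

/* Single-rounding stand-in: the product of two floats is exact in
   double (24 + 24 significand bits fit in 53), so for the inputs below
   the only rounding is the final one.  This is NOT a general fmaf
   replacement, because the double add can itself round.  */
static float
fused_mul_add (float a, float b, float c)
{
  return (float) ((double) a * (double) b + (double) c);
}
```

With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1, the exact product is 1 - 2^-26. That value rounds to 1.0f in single precision, so mul_then_add returns 0. fused_mul_add keeps the low bits and returns -2^-26. Algorithms that depend on the first behaviour, for example error-compensated summation, break silently if the compiler fuses the operation.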

[-- Attachment #2: 0001-Add-TARGET_FUSED_MADD-to-FMA4-insns.patch --]
[-- Type: text/x-patch, Size: 20543 bytes --]

From 70d7462eb5ee602cced553ae4d976915333cb947 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Thu, 3 Dec 2009 21:16:00 -0600
Subject: [PATCH] Add TARGET_FUSED_MADD to FMA4 insns.

	* config/i386/i386.c (ix86_target_string): Add -mno-fused-madd.
	(ix86_valid_target_attribute_inner_p): Add fused-madd.
	* config/i386/i386.opt (mno-fused-madd): New.
	(mfused-madd): New.
	* config/i386/sse.md: Add TARGET_FUSED_MADD to FMA4 insns.
	* doc/invoke.texi (-mfused-madd, -mno-fused-madd): Document.
---
 gcc/config/i386/i386.c   |    5 ++
 gcc/config/i386/i386.opt |    9 ++++
 gcc/config/i386/sse.md   |  120 ++++++++++++++++++++++++++++++---------------
 gcc/doc/invoke.texi      |   11 ++++-
 4 files changed, 103 insertions(+), 42 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ade3a7d..e1e6341 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2454,6 +2454,7 @@ ix86_target_string (int isa, int flags, const char *arch, const char *tune,
     { "-mms-bitfields",			MASK_MS_BITFIELD_LAYOUT },
     { "-mno-align-stringops",		MASK_NO_ALIGN_STRINGOPS },
     { "-mno-fancy-math-387",		MASK_NO_FANCY_MATH_387 },
+    { "-mno-fused-madd",		MASK_NO_FUSED_MADD },
     { "-mno-push-args",			MASK_NO_PUSH_ARGS },
     { "-mno-red-zone",			MASK_NO_RED_ZONE },
     { "-momit-leaf-frame-pointer",	MASK_OMIT_LEAF_FRAME_POINTER },
@@ -3704,6 +3705,10 @@ ix86_valid_target_attribute_inner_p (tree args, char *p_strings[])
 		  OPT_mfancy_math_387,
 		  MASK_NO_FANCY_MATH_387),
 
+    IX86_ATTR_NO ("fused-madd",
+		  OPT_mfused_madd,
+		  MASK_NO_FUSED_MADD),
+
     IX86_ATTR_YES ("ieee-fp",
 		   OPT_mieee_fp,
 		   MASK_IEEE_FP),
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index dd47b7d..00916fc 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -244,6 +244,15 @@ mcld
 Target Report Mask(CLD) Save
 Generate cld instruction in the function prologue.
 
+mno-fused-madd
+Target RejectNegative Report Mask(NO_FUSED_MADD) Undocumented Save
+
+mfused-madd
+Target Report InverseMask(NO_FUSED_MADD, FUSED_MADD) Save
+Enable automatic generation of fused floating point multiply-add instructions
+if the ISA supports such instructions.  The -mfused-madd option is on by
+default.
+
 ;; ISA support
 
 m32
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 78e4b6a..9524d4f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1709,7 +1709,8 @@
 	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1741,7 +1742,8 @@
 	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1775,7 +1777,8 @@
 	 (mult:FMA4MODEF4
 	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1809,7 +1812,8 @@
 	   (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1843,7 +1847,8 @@
 	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1879,7 +1884,8 @@
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1893,7 +1899,8 @@
 	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1929,7 +1936,8 @@
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1945,7 +1953,8 @@
 	 (mult:SSEMODEF4
 	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1981,7 +1990,8 @@
 	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1997,7 +2007,8 @@
 	   (match_operand:SSEMODEF4 1 "register_operand" "%x,x"))
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2037,7 +2048,8 @@
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2053,7 +2065,8 @@
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2067,7 +2080,8 @@
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2081,7 +2095,8 @@
 	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2096,7 +2111,8 @@
 	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2111,7 +2127,8 @@
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2125,7 +2142,8 @@
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2139,7 +2157,8 @@
 	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2154,7 +2173,8 @@
 	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2173,7 +2193,8 @@
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2190,7 +2211,8 @@
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2207,7 +2229,8 @@
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2225,7 +2248,8 @@
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2250,7 +2274,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 170)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2269,7 +2294,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2288,7 +2314,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2307,7 +2334,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 2)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2326,7 +2354,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 85)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2345,7 +2374,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2364,7 +2394,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2383,7 +2414,8 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 1)))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2406,7 +2438,8 @@
 	     (match_dup 3))
 	   (const_int 170))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2427,7 +2460,8 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2448,7 +2482,8 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2469,7 +2504,8 @@
 	     (match_dup 3))
 	   (const_int 2))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2490,7 +2526,8 @@
 	     (match_dup 3))
 	   (const_int 85))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2511,7 +2548,8 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2532,7 +2570,8 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2553,7 +2592,8 @@
 	     (match_dup 3))
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_FMA4 && TARGET_FUSED_MADD
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a1df226..22fad8f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -593,7 +593,7 @@ Objective-C and Objective-C++ Dialects}.
 -mincoming-stack-boundary=@var{num}
 -mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
--maes -mpclmul @gol
+-maes -mpclmul -mfused-madd @gol
 -msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop -mlwp @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
 -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
@@ -12062,6 +12062,13 @@ supported architecture, using the appropriate flags.  In particular,
 the file containing the CPU detection code should be compiled without
 these options.
 
+@item -mfused-madd
+@itemx -mno-fused-madd
+@opindex mfused-madd
+@opindex mno-fused-madd
+Do (don't) generate code that uses the fused multiply/add or multiply/subtract
+instructions.  The default is to use these instructions.
+
 @item -mcld
 @opindex mcld
 This option instructs GCC to emit a @code{cld} instruction in the prologue
@@ -12397,7 +12404,7 @@ Do not generate inline code for sqrt.
 @opindex mfused-madd
 @opindex mno-fused-madd
 Do (don't) generate code that uses the fused multiply/add or multiply/subtract
-instructions.    The default is to use these instructions.
+instructions.  The default is to use these instructions.
 
 @item -mno-dwarf2-asm
 @itemx -mdwarf2-asm
-- 
1.6.0.4
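The const_int operands in the vfmaddsub and vfmsubadd patterns above (170, 10, 2 and 85, 5, 1) are vec_merge lane selectors. A set bit i takes lane i from the first vec_merge operand, and a clear bit takes it from the second. The helpers below recompute the masks from the lane count. The names are hypothetical. The reference model assumes the usual add/sub convention, in which even lanes subtract and odd lanes add for fmaddsub.

```c
/* Mask with bits set for the odd lanes: 8 -> 170, 4 -> 10, 2 -> 2.  */
static unsigned
odd_lane_mask (unsigned nlanes)
{
  unsigned mask = 0;
  for (unsigned i = 1; i < nlanes; i += 2)
    mask |= 1u << i;
  return mask;
}

/* Mask with bits set for the even lanes: 8 -> 85, 4 -> 5, 2 -> 1.  */
static unsigned
even_lane_mask (unsigned nlanes)
{
  unsigned mask = 0;
  for (unsigned i = 0; i < nlanes; i += 2)
    mask |= 1u << i;
  return mask;
}

/* Scalar reference for an alternating multiply-add/subtract
   (assumed convention: even lanes a*b - c, odd lanes a*b + c).  */
static void
fmaddsub_ref (const float *a, const float *b, const float *c,
              float *r, unsigned nlanes)
{
  for (unsigned i = 0; i < nlanes; i++)
    r[i] = (i & 1) ? a[i] * b[i] + c[i] : a[i] * b[i] - c[i];
}
```

The V8SF patterns use the 8-lane masks, and the V4SF and V4DF patterns share the 4-lane ones. This is why 10 and 5 each appear twice in the diff.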


* Re: Vector permutation support for x86
  2009-12-03 23:37           ` Sebastian Pop
@ 2009-12-04  5:21             ` Richard Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2009-12-04  5:21 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Uros Bizjak, GCC Patches

Ok for the whole patch set.


r~

* Re: Vector permutation support for x86
  2009-12-03 20:44         ` Richard Henderson
@ 2009-12-03 23:37           ` Sebastian Pop
  2009-12-04  5:21             ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-03 23:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 702 bytes --]

On Thu, Dec 3, 2009 at 13:53, Richard Henderson <rth@redhat.com> wrote:
>
> Careful -- op3 is in a different spot here.
> You set op3 as the register operand, not op1 as intended.
>
> You missed changing the operand constraints here.
> You'll fail the assertions in ix86_expand_fma4_multiple_memory.
>
> Likewise with xop_pmacsdd.
>

Fixed as shown in 1755-Fix-FMA4-and-XOP-splitters.diff.

> Otherwise ok.
>

Does this OK also apply to the rest of the patch set?
The above patch is squashed into 0003 below:

0001-Remove-unused-operand.patch
0002-For-FMA4-force-all-operands-into-registers.patch
0003-Fix-FMA4-and-XOP-insns.patch

This patch set has been tested on amd64-linux.

Thanks,
Sebastian

[-- Attachment #2: 0001-Remove-unused-operand.patch --]
[-- Type: text/x-patch, Size: 5321 bytes --]

From 2222a3cd31590d183e328fb6634e3c5128af3f47 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 1 Dec 2009 14:10:50 -0600
Subject: [PATCH] Remove unused operand.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/i386.c (ix86_expand_fma4_multiple_memory): Remove unused
	parameter.
	* config/i386/i386-protos.h (ix86_expand_fma4_multiple_memory): Same.
	* config/i386/sse.md: Same.
---
 gcc/config/i386/i386-protos.h |    2 +-
 gcc/config/i386/i386.c        |    5 ++---
 gcc/config/i386/sse.md        |   20 ++++++++++----------
 3 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 1451e79..bb55da1 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -219,7 +219,7 @@ extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
 extern bool ix86_fma4_valid_op_p (rtx [], rtx, int, bool, int, bool);
-extern void ix86_expand_fma4_multiple_memory (rtx [], int, enum machine_mode);
+extern void ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
 
 extern void ix86_expand_vec_extract_even_odd (rtx, rtx, rtx, unsigned);
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 462f2d5..82ec08f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28962,12 +28962,11 @@ ix86_fma4_valid_op_p (rtx operands[], rtx insn ATTRIBUTE_UNUSED, int num,
 
 void
 ix86_expand_fma4_multiple_memory (rtx operands[],
-				  int num,
 				  enum machine_mode mode)
 {
   rtx op0 = operands[0];
-  if (num != 4
-      || memory_operand (op0, mode)
+
+  if (memory_operand (op0, mode)
       || reg_mentioned_p (op0, operands[1])
       || reg_mentioned_p (op0, operands[2])
       || reg_mentioned_p (op0, operands[3]))
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 08a3b5b..4899c0a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1731,7 +1731,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmadd<mode>4256 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1768,7 +1768,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmsub<mode>4256 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1807,7 +1807,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmadd<mode>4256 (operands[0], operands[1],
 				     operands[2], operands[3]));
   DONE;
@@ -1847,7 +1847,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmsub<mode>4256 (operands[0], operands[1],
 				        operands[2], operands[3]));
   DONE;
@@ -1883,7 +1883,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmadd<mode>4 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1939,7 +1939,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmsub<mode>4 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1997,7 +1997,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmadd<mode>4 (operands[0], operands[1],
 				     operands[2], operands[3]));
   DONE;
@@ -2056,7 +2056,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmsub<mode>4 (operands[0], operands[1],
 				     operands[2], operands[3]));
   DONE;
@@ -10384,7 +10384,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, V8HImode);
+  ix86_expand_fma4_multiple_memory (operands, V8HImode);
   emit_insn (gen_xop_pmacsww (operands[0], operands[1], operands[2],
 			      operands[3]));
   DONE;
@@ -10436,7 +10436,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, V4SImode);
+  ix86_expand_fma4_multiple_memory (operands, V4SImode);
   emit_insn (gen_xop_pmacsdd (operands[0], operands[1], operands[2],
 			      operands[3]));
   DONE;
-- 
1.6.0.4


[-- Attachment #3: 0002-For-FMA4-force-all-operands-into-registers.patch --]
[-- Type: text/x-patch, Size: 1196 bytes --]

From e170ea82fd01e2698fadba1dbdd2ae8a82a5b816 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Wed, 2 Dec 2009 13:15:48 -0600
Subject: [PATCH] For FMA4, force all operands into registers.

2009-12-02  Richard Henderson  <rth@redhat.com>

	* config/i386/i386.c (ix86_fixup_binary_operands): For FMA4, force
	all operands into registers.
---
 gcc/config/i386/i386.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 82ec08f..436e935 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13384,6 +13384,16 @@ ix86_fixup_binary_operands (enum rtx_code code, enum machine_mode mode,
   if (MEM_P (src1) && !rtx_equal_p (dst, src1))
     src1 = force_reg (mode, src1);
 
+  /* In order for the multiply-add patterns to get matched, we need
+     to aid combine by forcing all operands into registers to start.  */
+  if (optimize && TARGET_FMA4)
+    {
+      if (MEM_P (src2))
+	src2 = force_reg (GET_MODE (src2), src2);
+      else if (MEM_P (src1))
+	src1 = force_reg (GET_MODE (src1), src1);
+    }
+
   operands[1] = src1;
   operands[2] = src2;
   return dst;
-- 
1.6.0.4
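
For illustration, here is a reduced C function of the kind the FMA4 change above targets. It is modeled on the `fmadd` snippet quoted in the comment that the following patch removes from `ix86_fma4_valid_op_p`. The functions `fmadd` and `check_fmadd` are hypothetical and are not testcases from this series. All three inputs come from memory. The new code in `ix86_fixup_binary_operands` forces such operands into registers so that combine has a better chance of matching one of the multiply-add patterns. Whether a single `vfmaddss` is emitted at, say, `-O2 -mfma4` depends on the compiler version and flags. This sketch only fixes the shape of the source.

```c
#include <assert.h>

/* Hypothetical reduced example: both multiplicands and the addend are
   read from memory.  */
float
fmadd (const float *a, const float *b, const float *c)
{
  return (*a * *b) + *c;
}

/* Values chosen so that the product and the sum are exact in single
   precision, so the result is the same whether or not the multiply
   and add are fused into one instruction.  */
void
check_fmadd (void)
{
  float x = 2.0f, y = 3.0f, z = 1.0f;
  assert (fmadd (&x, &y, &z) == 7.0f);
}
```

Using exactly representable inputs keeps the check independent of contraction. With arbitrary inputs, a fused and an unfused sequence can legitimately differ in the last bit.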


[-- Attachment #4: 0003-Fix-FMA4-and-XOP-insns.patch --]
[-- Type: text/x-patch, Size: 72766 bytes --]

From 67c4f47e3f3be277fd291c393b8d3bcec6770fa9 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Wed, 2 Dec 2009 15:39:15 -0600
Subject: [PATCH] Fix FMA4 and XOP insns.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>
	    Richard Henderson  <rth@redhat.com>

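	* config/i386/i386-protos.h (ix86_fma4_valid_op_p): Remove.
	(ix86_expand_fma4_multiple_memory): Return bool.
	* config/i386/i386.c (ix86_fma4_valid_op_p): Remove.
	(ix86_expand_fma4_multiple_memory): Return bool.  Load operand 3
	into the destination, or into a new pseudo if the destination
	overlaps operand 1; return false if no pseudo can be created.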
	* config/i386/i386.md: Do not use ix86_fma4_valid_op_p.

	* config/i386/sse.md (fma4_*): Remove alternative with operand 1
	matching a memory access.  Do not use ix86_fma4_valid_op_p.
	(xop_*): Same.
	Do not use ix86_fma4_valid_op_p in FMA4 and XOP splitters.
---
 gcc/config/i386/i386-protos.h |    3 +-
 gcc/config/i386/i386.c        |  200 +----------
 gcc/config/i386/i386.md       |    2 +-
 gcc/config/i386/sse.md        |  794 ++++++++++++++++++-----------------------
 4 files changed, 374 insertions(+), 625 deletions(-)
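
The following is a rough sketch of which operand shapes the rewritten `fma4_*` insns accept. The function name and boolean encoding are hypothetical. The rules come from the new `register_operand` predicates, the `"x,x"`/`"x,m"`/`"xm,x"` constraints and the `!(MEM_P (operands[2]) && MEM_P (operands[3]))` condition in the sse.md hunks. Operands 0 and 1 must be registers, and at most one of operands 2 and 3 may be memory. The two-memory case is left to the define_split, which first loads one operand into a register. The `%` commutative marker on operand 1, which lets reload swap operands 1 and 2, is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the operand shapes accepted by the rewritten
   fma4_* insns.  Each flag says whether that operand is a memory
   reference.  Returns the constraint alternative that would match:
     0: operand 2 "x", operand 3 "xm"
     1: operand 2 "m", operand 3 "x"
   or -1 if the insn does not match.  */
int
fma4_alternative (bool op0_mem, bool op1_mem, bool op2_mem, bool op3_mem)
{
  int alt;

  /* Operands 0 and 1 use register_operand.  */
  if (op0_mem || op1_mem)
    return -1;

  /* The insn condition !(MEM_P (operands[2]) && MEM_P (operands[3])).  */
  if (op2_mem && op3_mem)
    return -1;

  alt = op2_mem ? 1 : 0;
  /* Alternative 1 requires operand 3 to be in a register.  */
  assert (alt == 0 || !op3_mem);
  return alt;
}
```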

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bb55da1..cf29cc7 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -218,8 +218,7 @@ extern void ix86_expand_vector_set (bool, rtx, rtx, int);
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
-extern bool ix86_fma4_valid_op_p (rtx [], rtx, int, bool, int, bool);
-extern void ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
+extern bool ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
 
 extern void ix86_expand_vec_extract_even_odd (rtx, rtx, rtx, unsigned);
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 436e935..ade3a7d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28807,197 +28807,35 @@ ix86_expand_round (rtx operand0, rtx operand1)
   emit_move_insn (operand0, res);
 }
 \f
-/* Validate whether a FMA4 instruction is valid or not.
-   OPERANDS is the array of operands.
-   NUM is the number of operands.
-   USES_OC0 is true if the instruction uses OC0 and provides 4 variants.
-   NUM_MEMORY is the maximum number of memory operands to accept.
-   NUM_MEMORY less than zero is a special case to allow an operand
-   of an instruction to be memory operation.
-   when COMMUTATIVE is set, operand 1 and 2 can be swapped.  */
 
-bool
-ix86_fma4_valid_op_p (rtx operands[], rtx insn ATTRIBUTE_UNUSED, int num,
-		      bool uses_oc0, int num_memory, bool commutative)
-{
-  int mem_mask;
-  int mem_count;
-  int i;
-
-  /* Count the number of memory arguments */
-  mem_mask = 0;
-  mem_count = 0;
-  for (i = 0; i < num; i++)
-    {
-      enum machine_mode mode = GET_MODE (operands[i]);
-      if (register_operand (operands[i], mode))
-	;
-
-      else if (memory_operand (operands[i], mode))
-	{
-	  mem_mask |= (1 << i);
-	  mem_count++;
-	}
-
-      else
-	{
-	  rtx pattern = PATTERN (insn);
-
-	  /* allow 0 for pcmov */
-	  if (GET_CODE (pattern) != SET
-	      || GET_CODE (SET_SRC (pattern)) != IF_THEN_ELSE
-	      || i < 2
-	      || operands[i] != CONST0_RTX (mode))
-	    return false;
-	}
-    }
-
-  /* Special case pmacsdq{l,h} where we allow the 3rd argument to be
-     a memory operation.  */
-  if (num_memory < 0)
-    {
-      num_memory = -num_memory;
-      if ((mem_mask & (1 << (num-1))) != 0)
-	{
-	  mem_mask &= ~(1 << (num-1));
-	  mem_count--;
-	}
-    }
-
-  /* If there were no memory operations, allow the insn */
-  if (mem_mask == 0)
-    return true;
-
-  /* Do not allow the destination register to be a memory operand.  */
-  else if (mem_mask & (1 << 0))
-    return false;
-
-  /* If there are too many memory operations, disallow the instruction.  While
-     the hardware only allows 1 memory reference, before register allocation
-     for some insns, we allow two memory operations sometimes in order to allow
-     code like the following to be optimized:
+/* Fixup an FMA4 or XOP multiply/add with 2 memory input references into a
+   form the hardware allows by loading operand 3 into the destination
+   register, or into a new pseudo if the destination overlaps operand 1.
+   Return false if a new pseudo is needed but cannot be created.  */
 
-	float fmadd (float *a, float *b, float *c) { return (*a * *b) + *c; }
-
-    or similar cases that are vectorized into using the vfmaddss
-    instruction.  */
-  else if (mem_count > num_memory)
-    return false;
-
-  /* Don't allow more than one memory operation if not optimizing.  */
-  else if (mem_count > 1 && !optimize)
-    return false;
-
-  else if (num == 4 && mem_count == 1)
-    {
-      /* formats (destination is the first argument), example vfmaddss:
-	 xmm1, xmm1, xmm2, xmm3/mem
-	 xmm1, xmm1, xmm2/mem, xmm3
-	 xmm1, xmm2, xmm3/mem, xmm1
-	 xmm1, xmm2/mem, xmm3, xmm1 */
-      if (uses_oc0)
-	return ((mem_mask == (1 << 1))
-		|| (mem_mask == (1 << 2))
-		|| (mem_mask == (1 << 3)));
-
-      /* format, example vpmacsdd:
-	 xmm1, xmm2, xmm3/mem, xmm1 */
-      if (commutative)
-	return (mem_mask == (1 << 2) || mem_mask == (1 << 1));
-      else
-	return (mem_mask == (1 << 2));
-    }
-
-  else if (num == 4 && num_memory == 2)
-    {
-      /* If there are two memory operations, we can load one of the memory ops
-	 into the destination register.  This is for optimizing the
-	 multiply/add ops, which the combiner has optimized both the multiply
-	 and the add insns to have a memory operation.  We have to be careful
-	 that the destination doesn't overlap with the inputs.  */
-      rtx op0 = operands[0];
-
-      if (reg_mentioned_p (op0, operands[1])
-	  || reg_mentioned_p (op0, operands[2])
-	  || reg_mentioned_p (op0, operands[3]))
-	return false;
-
-      /* formats (destination is the first argument), example vfmaddss:
-	 xmm1, xmm1, xmm2, xmm3/mem
-	 xmm1, xmm1, xmm2/mem, xmm3
-	 xmm1, xmm2, xmm3/mem, xmm1
-	 xmm1, xmm2/mem, xmm3, xmm1
-
-         For the oc0 case, we will load either operands[1] or operands[3] into
-         operands[0], so any combination of 2 memory operands is ok.  */
-      if (uses_oc0)
-	return true;
-
-      /* format, example vpmacsdd:
-	 xmm1, xmm2, xmm3/mem, xmm1
-
-         For the integer multiply/add instructions be more restrictive and
-         require operands[2] and operands[3] to be the memory operands.  */
-      if (commutative)
-	return (mem_mask == ((1 << 1) | (1 << 3)) || ((1 << 2) | (1 << 3)));
-      else
-	return (mem_mask == ((1 << 2) | (1 << 3)));
-    }
-
-  else if (num == 3 && num_memory == 1)
-    {
-      /* formats, example vprotb:
-	 xmm1, xmm2, xmm3/mem
-	 xmm1, xmm2/mem, xmm3 */
-      if (uses_oc0)
-	return ((mem_mask == (1 << 1)) || (mem_mask == (1 << 2)));
-
-      /* format, example vpcomeq:
-	 xmm1, xmm2, xmm3/mem */
-      else
-	return (mem_mask == (1 << 2));
-    }
-
-  else
-    gcc_unreachable ();
-
-  return false;
-}
-
-
-/* Fixup an FMA4 instruction that has 2 memory input references into a form the
-   hardware will allow by using the destination register to load one of the
-   memory operations.  Presently this is used by the multiply/add routines to
-   allow 2 memory references.  */
-
-void
+bool
 ix86_expand_fma4_multiple_memory (rtx operands[],
 				  enum machine_mode mode)
 {
-  rtx op0 = operands[0];
+  rtx scratch = operands[0];
 
-  if (memory_operand (op0, mode)
-      || reg_mentioned_p (op0, operands[1])
-      || reg_mentioned_p (op0, operands[2])
-      || reg_mentioned_p (op0, operands[3]))
-    gcc_unreachable ();
+  gcc_assert (register_operand (operands[0], mode));
+  gcc_assert (register_operand (operands[1], mode));
+  gcc_assert (MEM_P (operands[2]) && MEM_P (operands[3]));
 
-  /* For 2 memory operands, pick either operands[1] or operands[3] to move into
-     the destination register.  */
-  if (memory_operand (operands[1], mode))
+  if (reg_mentioned_p (scratch, operands[1]))
     {
-      emit_move_insn (op0, operands[1]);
-      operands[1] = op0;
-    }
-  else if (memory_operand (operands[3], mode))
-    {
-      emit_move_insn (op0, operands[3]);
-      operands[3] = op0;
+      if (!can_create_pseudo_p ())
+	return false;
+      scratch = gen_reg_rtx (mode);
     }
-  else
-    gcc_unreachable ();
 
-  return;
+  emit_move_insn (scratch, operands[3]);
+  if (rtx_equal_p (operands[2], operands[3]))
+    operands[2] = operands[3] = scratch;
+  else
+    operands[3] = scratch;
+  return true;
 }
 
 /* Table of valid machine attributes.  */
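
The decision logic of the new `ix86_expand_fma4_multiple_memory` can be modeled outside GCC. The sketch below is hypothetical. `struct op`, `fixup_two_mems` and the `fixup_state` fields stand in for RTL operands, `can_create_pseudo_p ()` and `emit_move_insn`. The `reg_mentioned_p` overlap test is reduced to register equality. The essential rule is kept: operand 3 is loaded into the destination unless the destination is also operand 1. In that case the load would clobber an input that the fused instruction still reads. A fresh pseudo is used instead, and the split fails if no pseudo can be created.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for RTL operands: a register number or a
   memory reference identified by an integer.  */
enum op_kind { OP_REG, OP_MEM };

struct op
{
  enum op_kind kind;
  int id;
};

/* Hypothetical stand-ins for compiler state and the emitted move.  */
struct fixup_state
{
  bool can_create_pseudo;  /* plays the role of can_create_pseudo_p ()  */
  int next_pseudo;         /* next free pseudo register number  */
  struct op loaded_from;   /* source of the emitted move  */
  struct op scratch;       /* destination of the emitted move  */
};

static bool
op_equal (struct op a, struct op b)
{
  return a.kind == b.kind && a.id == b.id;
}

/* Model of the new ix86_expand_fma4_multiple_memory: OPS[0] and OPS[1]
   are registers, OPS[2] and OPS[3] are memory.  Returns false when the
   split must FAIL.  */
bool
fixup_two_mems (struct op ops[4], struct fixup_state *st)
{
  struct op scratch = ops[0];

  assert (ops[0].kind == OP_REG && ops[1].kind == OP_REG);
  assert (ops[2].kind == OP_MEM && ops[3].kind == OP_MEM);

  /* Loading into the destination would clobber operand 1, which the
     fused instruction still reads, so use a fresh pseudo instead.  */
  if (op_equal (scratch, ops[1]))
    {
      if (!st->can_create_pseudo)
        return false;
      scratch.id = st->next_pseudo++;
    }

  /* Record the move that emit_move_insn (scratch, operands[3]) emits.  */
  st->loaded_from = ops[3];
  st->scratch = scratch;

  /* If both memory inputs are the same reference, reuse the load.  */
  if (op_equal (ops[2], ops[3]))
    ops[2] = ops[3] = scratch;
  else
    ops[3] = scratch;
  return true;
}
```

Consider operands {reg 1, reg 2, mem 10, mem 11}. The load goes into reg 1 and only operand 3 is rewritten. When operands 2 and 3 are the same memory reference, both are replaced by the scratch register, which mirrors the `rtx_equal_p` case in the patch.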
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 851061d..1ef3025 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -19248,7 +19248,7 @@
 	  (match_operand:MODEF 1 "register_operand" "x")
 	  (match_operand:MODEF 2 "register_operand" "x")
 	  (match_operand:MODEF 3 "register_operand" "x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_XOP"
   "vpcmov\t{%1, %3, %2, %0|%0, %2, %3, %1}"
   [(set_attr "type" "sse4arg")])
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4899c0a..78e4b6a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1703,14 +1703,13 @@
 ;;	(set (reg3) (plus (reg2) (mem (addr3))))
 
 (define_insn "fma4_fmadd<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(plus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1720,34 +1719,29 @@
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
 	(plus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:FMA4MODEF4 1 "register_operand" "")
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (plus:FMA4MODEF4
+         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmadd<mode>4256 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+ if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+   FAIL;
 })
 
 ;; Floating multiply and subtract
 ;; Allow two memory operands the same as fmadd
 (define_insn "fma4_fmsub<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1757,21 +1751,17 @@
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:FMA4MODEF4 1 "register_operand" "")
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:FMA4MODEF4
+         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmsub<mode>4256 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+ if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+   FAIL;
 })
 
 ;; Floating point negative multiply and add
@@ -1779,14 +1769,13 @@
 ;; Note operands are out of order to simplify call to ix86_fma4_valid_p
 ;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1795,22 +1784,18 @@
 (define_split
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
 	(minus:FMA4MODEF4
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")
+	 (match_operand:FMA4MODEF4 3 "memory_operand" "")
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))))]
-  "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:FMA4MODEF4 1 "register_operand" "")
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:FMA4MODEF4
+	 (match_dup 3)
+         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmadd<mode>4256 (operands[0], operands[1],
-				     operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; Floating point negative multiply and subtract
@@ -1821,11 +1806,10 @@
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
 	  (neg:FMA4MODEF4
-	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x"))
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1836,33 +1820,30 @@
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
 	  (neg:FMA4MODEF4
-	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" ""))
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	   (match_operand:FMA4MODEF4 1 "register_operand" ""))
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:FMA4MODEF4
+         (mult:FMA4MODEF4
+	  (neg:FMA4MODEF4 (match_dup 1))
+	  (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmsub<mode>4256 (operands[0], operands[1],
-				        operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (define_insn "fma4_fmadd<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(plus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1872,21 +1853,17 @@
   [(set (match_operand:SSEMODEF4 0 "register_operand" "")
 	(plus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:SSEMODEF4 1 "register_operand" "")
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
+	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (plus:SSEMODEF4
+         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmadd<mode>4 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -1897,13 +1874,12 @@
 	(vec_merge:SSEMODEF2P
 	 (plus:SSEMODEF2P
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1911,14 +1887,13 @@
 ;; Floating multiply and subtract
 ;; Allow two memory operands the same as fmadd
 (define_insn "fma4_fmsub<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1928,21 +1903,17 @@
   [(set (match_operand:SSEMODEF4 0 "register_operand" "")
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:SSEMODEF4 1 "register_operand" "")
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
+	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:SSEMODEF4
+         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmsub<mode>4 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -1953,13 +1924,12 @@
 	(vec_merge:SSEMODEF2P
 	 (minus:SSEMODEF2P
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1969,14 +1939,13 @@
 ;; Note operands are out of order to simplify call to ix86_fma4_valid_p
 ;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1985,22 +1954,18 @@
 (define_split
   [(set (match_operand:SSEMODEF4 0 "register_operand" "")
 	(minus:SSEMODEF4
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")
+	 (match_operand:SSEMODEF4 3 "memory_operand" "")
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))))]
-  "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:SSEMODEF4 1 "register_operand" "")
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:SSEMODEF4
+	 (match_dup 3)
+         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmadd<mode>4 (operands[0], operands[1],
-				     operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -2012,12 +1977,11 @@
 	 (minus:SSEMODEF2P
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2030,11 +1994,10 @@
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
 	  (neg:SSEMODEF4
-	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x"))
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF4 1 "register_operand" "%x,x"))
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2045,21 +2008,19 @@
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
 	  (neg:SSEMODEF4
-	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" ""))
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	   (match_operand:SSEMODEF4 1 "register_operand" ""))
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
+	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:SSEMODEF4
+         (mult:SSEMODEF4
+	  (neg:SSEMODEF4 (match_dup 1))
+	  (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmsub<mode>4 (operands[0], operands[1],
-				     operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -2071,13 +2032,12 @@
 	 (minus:SSEMODEF2P
 	  (mult:SSEMODEF2P
 	   (neg:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x"))
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2089,11 +2049,11 @@
 	(unspec:FMA4MODEF4
 	 [(plus:FMA4MODEF4
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2103,11 +2063,11 @@
 	(unspec:FMA4MODEF4
 	 [(minus:FMA4MODEF4
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2118,10 +2078,10 @@
 	 [(minus:FMA4MODEF4
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm")))]
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2132,11 +2092,11 @@
 	 [(minus:FMA4MODEF4
 	   (mult:FMA4MODEF4
 	    (neg:FMA4MODEF4
-	     (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x"))
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2147,11 +2107,11 @@
 	(unspec:SSEMODEF2P
 	 [(plus:SSEMODEF2P
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2161,11 +2121,11 @@
 	(unspec:SSEMODEF2P
 	 [(minus:SSEMODEF2P
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2176,10 +2136,10 @@
 	 [(minus:SSEMODEF2P
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))]
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2190,11 +2150,11 @@
 	 [(minus:SSEMODEF2P
 	   (mult:SSEMODEF2P
 	    (neg:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x"))
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2207,13 +2167,13 @@
 	 [(vec_merge:SSEMODEF2P
 	   (plus:SSEMODEF2P
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "x,x")
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2224,13 +2184,13 @@
 	 [(vec_merge:SSEMODEF2P
 	   (minus:SSEMODEF2P
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "x,x")
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2242,12 +2202,12 @@
 	   (minus:SSEMODEF2P
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2259,13 +2219,13 @@
 	   (minus:SSEMODEF2P
 	    (mult:SSEMODEF2P
 	     (neg:SSEMODEF2P
-	      (match_operand:SSEMODEF2P 1 "register_operand" "x,x"))
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2281,8 +2241,8 @@
 	(vec_merge:V8SF
 	  (plus:V8SF
 	    (mult:V8SF
-	      (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V8SF 1 "register_operand" "%x,x")
+	      (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V8SF
 	    (mult:V8SF
@@ -2290,8 +2250,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 170)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2301,8 +2260,8 @@
 	(vec_merge:V4DF
 	  (plus:V4DF
 	    (mult:V4DF
-	      (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4DF 1 "register_operand" "%x,x")
+	      (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4DF
 	    (mult:V4DF
@@ -2310,8 +2269,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2321,8 +2279,8 @@
 	(vec_merge:V4SF
 	  (plus:V4SF
 	    (mult:V4SF
-	      (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4SF 1 "register_operand" "%x,x")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4SF
 	    (mult:V4SF
@@ -2330,8 +2288,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2341,8 +2298,8 @@
 	(vec_merge:V2DF
 	  (plus:V2DF
 	    (mult:V2DF
-	      (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V2DF 1 "register_operand" "%x,x")
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V2DF
 	    (mult:V2DF
@@ -2350,8 +2307,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 2)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2361,8 +2317,8 @@
 	(vec_merge:V8SF
 	  (plus:V8SF
 	    (mult:V8SF
-	      (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V8SF 1 "register_operand" "%x,x")
+	      (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V8SF
 	    (mult:V8SF
@@ -2370,8 +2326,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 85)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2381,8 +2336,8 @@
 	(vec_merge:V4DF
 	  (plus:V4DF
 	    (mult:V4DF
-	      (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4DF 1 "register_operand" "%x,x")
+	      (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4DF
 	    (mult:V4DF
@@ -2390,8 +2345,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2401,8 +2355,8 @@
 	(vec_merge:V4SF
 	  (plus:V4SF
 	    (mult:V4SF
-	      (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4SF 1 "register_operand" "%x,x")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4SF
 	    (mult:V4SF
@@ -2410,8 +2364,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2421,8 +2374,8 @@
 	(vec_merge:V2DF
 	  (plus:V2DF
 	    (mult:V2DF
-	      (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V2DF 1 "register_operand" "%x,x")
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V2DF
 	    (mult:V2DF
@@ -2430,8 +2383,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2444,8 +2396,8 @@
 	 [(vec_merge:V8SF
 	   (plus:V8SF
 	     (mult:V8SF
-	       (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V8SF 1 "register_operand" "%x,x")
+	       (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V8SF
 	     (mult:V8SF
@@ -2454,8 +2406,7 @@
 	     (match_dup 3))
 	   (const_int 170))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2466,8 +2417,8 @@
 	 [(vec_merge:V4DF
 	   (plus:V4DF
 	     (mult:V4DF
-	       (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4DF 1 "register_operand" "%x,x")
+	       (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4DF
 	     (mult:V4DF
@@ -2476,8 +2427,7 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2488,8 +2438,8 @@
 	 [(vec_merge:V4SF
 	   (plus:V4SF
 	     (mult:V4SF
-	       (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4SF 1 "register_operand" "%x,x")
+	       (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4SF
 	     (mult:V4SF
@@ -2498,8 +2448,7 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2510,8 +2459,8 @@
 	 [(vec_merge:V2DF
 	   (plus:V2DF
 	     (mult:V2DF
-	       (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V2DF 1 "register_operand" "%x,x")
+	       (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V2DF
 	     (mult:V2DF
@@ -2520,8 +2469,7 @@
 	     (match_dup 3))
 	   (const_int 2))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2532,8 +2480,8 @@
 	 [(vec_merge:V8SF
 	   (plus:V8SF
 	     (mult:V8SF
-	       (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V8SF 1 "register_operand" "%x,x")
+	       (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V8SF
 	     (mult:V8SF
@@ -2542,8 +2490,7 @@
 	     (match_dup 3))
 	   (const_int 85))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2554,8 +2501,8 @@
 	 [(vec_merge:V4DF
 	   (plus:V4DF
 	     (mult:V4DF
-	       (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4DF 1 "register_operand" "%x,x")
+	       (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4DF
 	     (mult:V4DF
@@ -2564,8 +2511,7 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2576,8 +2522,8 @@
 	 [(vec_merge:V4SF
 	   (plus:V4SF
 	     (mult:V4SF
-	       (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4SF 1 "register_operand" "%x,x")
+	       (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4SF
 	     (mult:V4SF
@@ -2586,8 +2532,7 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2598,8 +2543,8 @@
 	 [(vec_merge:V2DF
 	   (plus:V2DF
 	     (mult:V2DF
-	       (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V2DF 1 "register_operand" "%x,x")
+	       (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V2DF
 	     (mult:V2DF
@@ -2608,8 +2553,7 @@
 	     (match_dup 3))
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -10356,16 +10300,14 @@
 ;; that it does and splitting it later allows the following to be recognized:
 ;;	a[i] = b[i] * c[i] + d[i];
 (define_insn "xop_pmacsww"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=x")
         (plus:V8HI
 	 (mult:V8HI
-	  (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
-	  (match_operand:V8HI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V8HI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)"
-  "@
-   vpmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	  (match_operand:V8HI 1 "register_operand" "%x")
+	  (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V8HI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
@@ -10373,33 +10315,27 @@
 (define_split
   [(set (match_operand:V8HI 0 "register_operand" "")
 	(plus:V8HI
-	 (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "")
-		    (match_operand:V8HI 2 "nonimmediate_operand" ""))
-	 (match_operand:V8HI 3 "nonimmediate_operand" "")))]
-  "TARGET_XOP
-   && !ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	 (mult:V8HI (match_operand:V8HI 1 "register_operand" "")
+		    (match_operand:V8HI 2 "memory_operand" ""))
+	 (match_operand:V8HI 3 "memory_operand" "")))]
+  "TARGET_XOP"
+  [(set (match_dup 0)
+        (plus:V8HI
+         (mult:V8HI (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, V8HImode);
-  emit_insn (gen_xop_pmacsww (operands[0], operands[1], operands[2],
-			      operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, V8HImode))
+    FAIL;
 })
 
 (define_insn "xop_pmacssww"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=x")
         (ss_plus:V8HI
-	 (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
-		    (match_operand:V8HI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V8HI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (mult:V8HI (match_operand:V8HI 1 "register_operand" "%x")
+		    (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V8HI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
@@ -10408,16 +10344,14 @@
 ;; that it does and splitting it later allows the following to be recognized:
 ;;	a[i] = b[i] * c[i] + d[i];
 (define_insn "xop_pmacsdd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
         (plus:V4SI
 	 (mult:V4SI
-	  (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
-	  (match_operand:V4SI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)"
-  "@
-   vpmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	  (match_operand:V4SI 1 "register_operand" "%x")
+	  (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
@@ -10425,117 +10359,105 @@
 (define_split
   [(set (match_operand:V4SI 0 "register_operand" "")
 	(plus:V4SI
-	 (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "")
-		    (match_operand:V4SI 2 "nonimmediate_operand" ""))
-	 (match_operand:V4SI 3 "nonimmediate_operand" "")))]
-  "TARGET_XOP
-   && !ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	 (mult:V4SI (match_operand:V4SI 1 "register_operand" "")
+		    (match_operand:V4SI 2 "memory_operand" ""))
+	 (match_operand:V4SI 3 "memory_operand" "")))]
+  "TARGET_XOP"
+  [(set (match_dup 0)
+        (plus:V4SI
+         (mult:V4SI (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, V4SImode);
-  emit_insn (gen_xop_pmacsdd (operands[0], operands[1], operands[2],
-			      operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, V4SImode))
+    FAIL;
 })
 
 (define_insn "xop_pmacssdd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
         (ss_plus:V4SI
-	 (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
-		    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (mult:V4SI (match_operand:V4SI 1 "register_operand" "%x")
+		    (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacssdql"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(ss_plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (vec_select:V2SI
-	   (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	   (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	   (parallel [(const_int 1)
 		      (const_int 3)])))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssdql\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacssdqh"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(ss_plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssdqh\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacsdql"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)]))))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsdql\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn_and_split "*xop_pmacsdql_mem"
-  [(set (match_operand:V2DI 0 "register_operand" "=&x,&x")
+  [(set (match_operand:V2DI 0 "register_operand" "=&x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)]))))
-	 (match_operand:V2DI 3 "memory_operand" "m,m")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, -1, true)"
+	 (match_operand:V2DI 3 "memory_operand" "m")))]
+  "TARGET_XOP"
   "#"
   "&& reload_completed"
   [(set (match_dup 0)
@@ -10564,7 +10486,7 @@
 	(mult:V2DI
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 1 "nonimmediate_operand" "%x")
+	      (match_operand:V4SI 1 "register_operand" "%x")
 	      (parallel [(const_int 1)
 			 (const_int 3)])))
 	  (sign_extend:V2DI
@@ -10598,43 +10520,41 @@
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacsdqh"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsdqh\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn_and_split "*xop_pmacsdqh_mem"
-  [(set (match_operand:V2DI 0 "register_operand" "=&x,&x")
+  [(set (match_operand:V2DI 0 "register_operand" "=&x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "memory_operand" "m,m")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, -1, true)"
+	 (match_operand:V2DI 3 "memory_operand" "m")))]
+  "TARGET_XOP"
   "#"
   "&& reload_completed"
   [(set (match_dup 0)
@@ -10663,7 +10583,7 @@
 	(mult:V2DI
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 1 "nonimmediate_operand" "%x")
+	      (match_operand:V4SI 1 "register_operand" "%x")
 	      (parallel [(const_int 0)
 			 (const_int 2)])))
 	  (sign_extend:V2DI
@@ -10698,72 +10618,68 @@
 
 ;; XOP parallel integer multiply/add instructions for the intrinisics
 (define_insn "xop_pmacsswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(ss_plus:V4SI
 	 (mult:V4SI
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V8HI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)])))
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)]))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(plus:V4SI
 	 (mult:V4SI
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V8HI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)])))
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)]))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmadcsswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(ss_plus:V4SI
 	 (plus:V4SI
 	  (mult:V4SI
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	     (match_operand:V8HI 1 "register_operand" "%x")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
 			(const_int 6)])))
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	     (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
@@ -10783,29 +10699,27 @@
 			(const_int 3)
 			(const_int 5)
 			(const_int 7)])))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmadcsswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmadcswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(plus:V4SI
 	 (plus:V4SI
 	  (mult:V4SI
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	     (match_operand:V8HI 1 "register_operand" "%x")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
 			(const_int 6)])))
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	     (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
@@ -10825,32 +10739,30 @@
 			(const_int 3)
 			(const_int 5)
 			(const_int 7)])))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmadcswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 ;; XOP parallel XMM conditional moves
 (define_insn "xop_pcmov_<mode>"
-  [(set (match_operand:SSEMODE 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODE 0 "register_operand" "=x,x")
 	(if_then_else:SSEMODE
-	  (match_operand:SSEMODE 3 "nonimmediate_operand" "x,x,m")
-	  (match_operand:SSEMODE 1 "vector_move_operand" "x,m,x")
-	  (match_operand:SSEMODE 2 "vector_move_operand" "xm,x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:SSEMODE 3 "nonimmediate_operand" "x,m")
+	  (match_operand:SSEMODE 1 "vector_move_operand" "x,x")
+	  (match_operand:SSEMODE 2 "vector_move_operand" "xm,x")))]
+  "TARGET_XOP"
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")])
 
 (define_insn "xop_pcmov_<mode>256"
-  [(set (match_operand:AVX256MODE 0 "register_operand" "=x,x,x")
+  [(set (match_operand:AVX256MODE 0 "register_operand" "=x,x")
 	(if_then_else:AVX256MODE
-	  (match_operand:AVX256MODE 3 "nonimmediate_operand" "x,x,m")
-	  (match_operand:AVX256MODE 1 "vector_move_operand" "x,m,x")
-	  (match_operand:AVX256MODE 2 "vector_move_operand" "xm,x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:AVX256MODE 3 "nonimmediate_operand" "x,m")
+	  (match_operand:AVX256MODE 1 "vector_move_operand" "x,x")
+	  (match_operand:AVX256MODE 2 "vector_move_operand" "xm,x")))]
+  "TARGET_XOP"
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")])
 
@@ -11296,53 +11208,53 @@
 
 ;; XOP permute instructions
 (define_insn "xop_pperm"
-  [(set (match_operand:V16QI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V16QI 0 "register_operand" "=x,x")
 	(unspec:V16QI
-	  [(match_operand:V16QI 1 "nonimmediate_operand" "x,x,m")
-	   (match_operand:V16QI 2 "nonimmediate_operand" "x,m,x")
-	   (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x")]
+	  [(match_operand:V16QI 1 "register_operand" "x,x")
+	   (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "xm,x")]
 	  UNSPEC_XOP_PERMUTE))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
 
 ;; XOP pack instructions that combine two vectors into a smaller vector
 (define_insn "xop_pperm_pack_v2di_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
 	(vec_concat:V4SI
 	 (truncate:V2SI
-	  (match_operand:V2DI 1 "nonimmediate_operand" "x,x,m"))
+	  (match_operand:V2DI 1 "register_operand" "x,x"))
 	 (truncate:V2SI
-	  (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x"))))
-   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:V2DI 2 "nonimmediate_operand" "x,m"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pperm_pack_v4si_v8hi"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
 	(vec_concat:V8HI
 	 (truncate:V4HI
-	  (match_operand:V4SI 1 "nonimmediate_operand" "x,x,m"))
+	  (match_operand:V4SI 1 "register_operand" "x,x"))
 	 (truncate:V4HI
-	  (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x"))))
-   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:V4SI 2 "nonimmediate_operand" "x,m"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pperm_pack_v8hi_v16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V16QI 0 "register_operand" "=x,x")
 	(vec_concat:V16QI
 	 (truncate:V8QI
-	  (match_operand:V8HI 1 "nonimmediate_operand" "x,x,m"))
+	  (match_operand:V8HI 1 "register_operand" "x,x"))
 	 (truncate:V8QI
-	  (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x"))))
-   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:V8HI 2 "nonimmediate_operand" "x,m"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
@@ -11471,7 +11383,7 @@
 	 (rotatert:SSEMODE1248
 	  (match_dup 1)
 	  (neg:SSEMODE1248 (match_dup 2)))))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vprot<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix_data16" "0")
@@ -11526,7 +11438,7 @@
 	 (ashiftrt:SSEMODE1248
 	  (match_dup 1)
 	  (neg:SSEMODE1248 (match_dup 2)))))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpsha<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix_data16" "0")
@@ -11545,7 +11457,7 @@
 	 (lshiftrt:SSEMODE1248
 	  (match_dup 1)
 	  (neg:SSEMODE1248 (match_dup 2)))))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpshl<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix_data16" "0")
-- 
1.6.0.4


[-- Attachment #5: 1755_Fix-FMA4-and-XOP-splitters.diff --]
[-- Type: text/x-patch, Size: 15115 bytes --]

From 54c19dd5ae7a7bc8c12d3433b3b2fbe8ff361e5e Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Thu, 3 Dec 2009 12:43:48 -0600
Subject: [PATCH] Fix FMA4 and XOP splitters.

---
 gcc/config/i386/i386-protos.h |    2 +-
 gcc/config/i386/i386.c        |   43 +++----
 gcc/config/i386/sse.md        |  248 ++++++++++++++++++-----------------------
 3 files changed, 126 insertions(+), 167 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 27fca86..cf29cc7 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -218,7 +218,7 @@ extern void ix86_expand_vector_set (bool, rtx, rtx, int);
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
-extern void ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
+extern bool ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
 
 extern void ix86_expand_vec_extract_even_odd (rtx, rtx, rtx, unsigned);
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a0a2001..ade3a7d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28808,39 +28808,34 @@ ix86_expand_round (rtx operand0, rtx operand1)
 }
 \f
 
-/* Fixup an FMA4 instruction that has 2 memory input references into a form the
-   hardware will allow by using the destination register to load one of the
-   memory operations.  Presently this is used by the multiply/add routines to
-   allow 2 memory references.  */
+/* Fixup an FMA4 or XOP instruction that has 2 memory input references
+   into a form the hardware will allow by using the destination
+   register to load one of the memory operations.  Presently this is
+   used by the multiply/add routines to allow 2 memory references.  */
 
-void
+bool
 ix86_expand_fma4_multiple_memory (rtx operands[],
 				  enum machine_mode mode)
 {
-  rtx op0 = operands[0];
+  rtx scratch = operands[0];
 
-  if (memory_operand (op0, mode)
-      || reg_mentioned_p (op0, operands[1])
-      || reg_mentioned_p (op0, operands[2])
-      || reg_mentioned_p (op0, operands[3]))
-    gcc_unreachable ();
+  gcc_assert (register_operand (operands[0], mode));
+  gcc_assert (register_operand (operands[1], mode));
+  gcc_assert (MEM_P (operands[2]) && MEM_P (operands[3]));
 
-  /* For 2 memory operands, pick either operands[1] or operands[3] to move into
-     the destination register.  */
-  if (memory_operand (operands[1], mode))
+  if (reg_mentioned_p (scratch, operands[1]))
     {
-      emit_move_insn (op0, operands[1]);
-      operands[1] = op0;
-    }
-  else if (memory_operand (operands[3], mode))
-    {
-      emit_move_insn (op0, operands[3]);
-      operands[3] = op0;
+      if (!can_create_pseudo_p ())
+	return false;
+      scratch = gen_reg_rtx (mode);
     }
-  else
-    gcc_unreachable ();
 
-  return;
+  emit_move_insn (scratch, operands[3]);
+  if (rtx_equal_p (operands[2], operands[3]))
+    operands[2] = operands[3] = scratch;
+  else
+    operands[3] = scratch;
+  return true;
 }
 
 /* Table of valid machine attributes.  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7bb4802..78e4b6a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1719,21 +1719,17 @@
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
 	(plus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:FMA4MODEF4 1 "register_operand" "")
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (plus:FMA4MODEF4
+         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmadd<mode>4256 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; Floating multiply and subtract
@@ -1755,21 +1751,17 @@
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:FMA4MODEF4 1 "register_operand" "")
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:FMA4MODEF4
+         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmsub<mode>4256 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; Floating point negative multiply and add
@@ -1792,22 +1784,18 @@
 (define_split
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
 	(minus:FMA4MODEF4
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")
+	 (match_operand:FMA4MODEF4 3 "memory_operand" "")
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:FMA4MODEF4 1 "register_operand" "")
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:FMA4MODEF4
+	 (match_dup 3)
+         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmadd<mode>4256 (operands[0], operands[1],
-				     operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; Floating point negative multiply and subtract
@@ -1832,21 +1820,19 @@
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
 	  (neg:FMA4MODEF4
-	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" ""))
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	   (match_operand:FMA4MODEF4 1 "register_operand" ""))
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:FMA4MODEF4
+         (mult:FMA4MODEF4
+	  (neg:FMA4MODEF4 (match_dup 1))
+	  (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmsub<mode>4256 (operands[0], operands[1],
-				        operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -1867,21 +1853,17 @@
   [(set (match_operand:SSEMODEF4 0 "register_operand" "")
 	(plus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:SSEMODEF4 1 "register_operand" "")
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
+	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (plus:SSEMODEF4
+         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmadd<mode>4 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -1921,21 +1903,17 @@
   [(set (match_operand:SSEMODEF4 0 "register_operand" "")
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:SSEMODEF4 1 "register_operand" "")
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
+	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:SSEMODEF4
+         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmsub<mode>4 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -1976,22 +1954,18 @@
 (define_split
   [(set (match_operand:SSEMODEF4 0 "register_operand" "")
 	(minus:SSEMODEF4
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")
+	 (match_operand:SSEMODEF4 3 "memory_operand" "")
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:SSEMODEF4 1 "register_operand" "")
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:SSEMODEF4
+	 (match_dup 3)
+         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmadd<mode>4 (operands[0], operands[1],
-				     operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -2034,21 +2008,19 @@
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
 	  (neg:SSEMODEF4
-	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" ""))
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	   (match_operand:SSEMODEF4 1 "register_operand" ""))
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
+	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:SSEMODEF4
+         (mult:SSEMODEF4
+	  (neg:SSEMODEF4 (match_dup 1))
+	  (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmsub<mode>4 (operands[0], operands[1],
-				     operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -10343,21 +10315,17 @@
 (define_split
   [(set (match_operand:V8HI 0 "register_operand" "")
 	(plus:V8HI
-	 (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "")
-		    (match_operand:V8HI 2 "nonimmediate_operand" ""))
-	 (match_operand:V8HI 3 "nonimmediate_operand" "")))]
-  "TARGET_XOP
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	 (mult:V8HI (match_operand:V8HI 1 "register_operand" "")
+		    (match_operand:V8HI 2 "memory_operand" ""))
+	 (match_operand:V8HI 3 "memory_operand" "")))]
+  "TARGET_XOP"
+  [(set (match_dup 0)
+        (plus:V8HI
+         (mult:V8HI (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, V8HImode);
-  emit_insn (gen_xop_pmacsww (operands[0], operands[1], operands[2],
-			      operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, V8HImode))
+    FAIL;
 })
 
 (define_insn "xop_pmacssww"
@@ -10391,21 +10359,17 @@
 (define_split
   [(set (match_operand:V4SI 0 "register_operand" "")
 	(plus:V4SI
-	 (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "")
-		    (match_operand:V4SI 2 "nonimmediate_operand" ""))
-	 (match_operand:V4SI 3 "nonimmediate_operand" "")))]
-  "TARGET_XOP
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	 (mult:V4SI (match_operand:V4SI 1 "register_operand" "")
+		    (match_operand:V4SI 2 "memory_operand" ""))
+	 (match_operand:V4SI 3 "memory_operand" "")))]
+  "TARGET_XOP"
+  [(set (match_dup 0)
+        (plus:V4SI
+         (mult:V4SI (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, V4SImode);
-  emit_insn (gen_xop_pmacsdd (operands[0], operands[1], operands[2],
-			      operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, V4SImode))
+    FAIL;
 })
 
 (define_insn "xop_pmacssdd"
-- 
1.6.0.4


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Vector permutation support for x86
  2009-12-03 19:30       ` Sebastian Pop
@ 2009-12-03 20:44         ` Richard Henderson
  2009-12-03 23:37           ` Sebastian Pop
  2009-12-05 17:07         ` Uros Bizjak
  1 sibling, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2009-12-03 20:44 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Uros Bizjak, GCC Patches

> @@ -1976,22 +1954,18 @@
>  (define_split
>    [(set (match_operand:SSEMODEF4 0 "register_operand" "")
>  	(minus:SSEMODEF4
> -	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")
> +	 (match_operand:SSEMODEF4 3 "register_operand" "")
>  	 (mult:SSEMODEF4
> -	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
> -	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))))]
> -  "TARGET_FMA4
> -   && MEM_P (operands[2])
> -   && (MEM_P (operands[1]) || MEM_P (operands[3]))
> -   && !reg_mentioned_p (operands[0], operands[1])
> -   && !reg_mentioned_p (operands[0], operands[2])
> -   && !reg_mentioned_p (operands[0], operands[3])"
> -  [(const_int 0)]
> +	  (match_operand:SSEMODEF4 1 "memory_operand" "")
> +	  (match_operand:SSEMODEF4 2 "memory_operand" ""))))]
> +  "TARGET_FMA4"
> +  [(set (match_dup 0)
> +        (minus:SSEMODEF4
> +	 (match_dup 3)
> +         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))))]

Careful -- op3 is in a different spot here.
You set op3 as the register operand, not op1 as intended.

> @@ -10346,18 +10318,14 @@
>  	 (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "")
>  		    (match_operand:V8HI 2 "nonimmediate_operand" ""))
>  	 (match_operand:V8HI 3 "nonimmediate_operand" "")))]
> -  "TARGET_XOP
> -   && MEM_P (operands[2])
> -   && (MEM_P (operands[1]) || MEM_P (operands[3]))
> -   && !reg_mentioned_p (operands[0], operands[1])
> -   && !reg_mentioned_p (operands[0], operands[2])
> -   && !reg_mentioned_p (operands[0], operands[3])"
> -  [(const_int 0)]
> +  "TARGET_XOP"
> +  [(set (match_dup 0)
> +        (plus:V8HI
> +         (mult:V8HI (match_dup 1) (match_dup 2))
> +         (match_dup 3)))]
>  {
> -  ix86_expand_fma4_multiple_memory (operands, V8HImode);
> -  emit_insn (gen_xop_pmacsww (operands[0], operands[1], operands[2],
> -			      operands[3]));
> -  DONE;
> +  if (!ix86_expand_fma4_multiple_memory (operands, V8HImode))
> +    FAIL;
>  })

You missed changing the operand constraints here.
You'll fail the assertions in ix86_expand_fma4_multiple_memory.

Likewise with xop_pmacsdd.

Otherwise ok.

r~
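The review above depends on the preconditions of ix86_expand_fma4_multiple_memory. The C sketch below is only a rough illustration. It models the function's control flow with a hypothetical two-kind operand type (a register number or a memory slot) in place of GCC's rtx. The enum, struct and function names are invented for this example. Emitting the load is reduced to recording its destination, and the can_create_pseudo flag stands in for can_create_pseudo_p (). A splitter whose operand 1 may still be memory reaches the first assertion, which is the failure the review predicts for the V8HI and V4SI XOP splitters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much-simplified operand model (not GCC's rtx).  */
enum op_kind { OP_REG, OP_MEM };

struct operand
{
  enum op_kind kind;
  int num;              /* register number or memory slot number */
};

static bool
same_operand (struct operand a, struct operand b)
{
  return a.kind == b.kind && a.num == b.num;
}

/* Mirrors the control flow of the proposed ix86_expand_fma4_multiple_memory.
   ops[0] and ops[1] must be registers; ops[2] and ops[3] must be memory.
   Returns false (the splitter would FAIL) when a fresh register is needed
   but none may be created.  On success *load_dest is the register that
   receives ops[3], and ops[] is rewritten to use it.  */
static bool
fixup_two_memory_inputs (struct operand ops[4], bool can_create_pseudo,
                         int next_pseudo, struct operand *load_dest)
{
  struct operand scratch = ops[0];

  /* Counterparts of the gcc_assert calls in the real function.  */
  assert (ops[0].kind == OP_REG && ops[1].kind == OP_REG);
  assert (ops[2].kind == OP_MEM && ops[3].kind == OP_MEM);

  /* Loading into the destination would clobber ops[1] when both are the
     same register, so a new pseudo is required in that case.  */
  if (same_operand (scratch, ops[1]))
    {
      if (!can_create_pseudo)
        return false;
      scratch.num = next_pseudo;
    }

  /* Stands in for emit_move_insn (scratch, ops[3]).  */
  *load_dest = scratch;

  /* If both memory inputs are the same location, both now read scratch.  */
  if (same_operand (ops[2], ops[3]))
    ops[2] = scratch;
  ops[3] = scratch;
  return true;
}
```

When the destination and operand 1 are the same register, loading operand 3 into the destination would clobber the multiplicand, so a fresh pseudo is required. After reload no pseudo can be created, and the splitter FAILs instead.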


* Re: Vector permutation support for x86
  2009-12-02 23:55         ` Richard Henderson
@ 2009-12-03 19:53           ` Sebastian Pop
  2009-12-04  6:50             ` Sebastian Pop
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-03 19:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches, Harle, Christophe

On Wed, Dec 2, 2009 at 17:53, Richard Henderson <rth@redhat.com> wrote:
> On 12/02/2009 02:57 PM, Sebastian Pop wrote:
>>
>> VFNMSUBPD dest, src1, src2, src3
>>
>> with the semantics: "dest = - (src1 * src2) - src3"
>>
>> that means that the above patterns are just wrong wrt. the XOP manual:
>> the pattern as implemented in sse.md is "dest = (-src1) * src2 - src3
>
> Actually, on second thought I don't think it's wrong at all.
>
> Unless A or B is zero, -A*B == -(A*B).  With A zero,
>
>   (-0*x) = +0
> but
>   -(0*x) = -(0) = -0.
>
> However, the following subtraction hides that because
>
>   -0 - (+-0) = +0
>
> so you can't actually tell the difference with this insn because you never
> get to see the wrong signed zero on the intermediate value.  And of course
> that also means that -A*B == A*-B, so of course the operands are still
> commutable.
>
> The comment
>
>> ;; Floating point negative multiply and subtract
>> ;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c
>
> suggests that someone's already determined that the latter form is more
> likely to be generated by GCC.  We probably ought to re-verify that, though
> that's not 100% necessary to do right away.
>

Ok, so I removed the neg patch.
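The following C check tests the conclusion above, that the two forms cannot be told apart in this insn. It assumes IEEE 754 binary64 doubles in the default round-to-nearest mode. It evaluates the canonical sse.md form ((-a) * b) - c and the VFNMSUBPD form -(a * b) - c on a few inputs that include both signed zeros, and compares the results bit for bit. The helper names are hypothetical and not part of GCC. Each product is stored in a volatile double so that it is rounded to double before the subtraction and cannot be contracted into a fused operation by the host compiler.

```c
#include <assert.h>   /* used by the usage checks */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Compare two doubles bit for bit, so that +0.0 and -0.0 differ.  */
static bool
same_bits (double x, double y)
{
  uint64_t bx, by;
  memcpy (&bx, &x, sizeof bx);
  memcpy (&by, &y, sizeof by);
  return bx == by;
}

/* The canonical form used in sse.md: ((-a) * b) - c.  */
static double
neg_a_times_b_minus_c (double a, double b, double c)
{
  volatile double prod = (-a) * b;   /* rounded to double before the subtract */
  return prod - c;
}

/* The form described for VFNMSUBPD: -(a * b) - c.  */
static double
neg_of_a_times_b_minus_c (double a, double b, double c)
{
  volatile double prod = -(a * b);   /* rounded to double before the subtract */
  return prod - c;
}

/* True if both forms give bit-identical results for every (a, b, c)
   drawn from a small set that includes +0.0 and -0.0.  */
static bool
forms_agree (void)
{
  static const double vals[] = { 0.0, -0.0, 1.0, -1.0, 2.5, -3.75 };
  const size_t n = sizeof vals / sizeof vals[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      for (size_t k = 0; k < n; k++)
        if (!same_bits (neg_a_times_b_minus_c (vals[i], vals[j], vals[k]),
                        neg_of_a_times_b_minus_c (vals[i], vals[j], vals[k])))
          return false;
  return true;
}
```

The check covers only the default rounding mode. Directed rounding modes and NaN payloads are outside what it exercises.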

> I do see from other ports that all of the versions of these patterns without
> the unspec should be protected by TARGET_FUSED_MADD and the -mfused-madd
> command-line option to control it.  Note that FUSED_MADD is enabled by
> default on all those targets that implement it.
>
> I see i386 used to have the option, but someone decided that -mfused-madd
> should imply -mavx instead.  Which is silly since that's not the same thing
> at all; -m{avx,xop} -mno-fused-madd is a very sensible combination of
> options if your numerical algorithm can't stand the fused operation.

I will prepare a patch to fix this.

Sebastian


* Re: Vector permutation support for x86
  2009-12-02 22:38     ` Richard Henderson
  2009-12-02 23:05       ` Sebastian Pop
@ 2009-12-03 19:30       ` Sebastian Pop
  2009-12-03 20:44         ` Richard Henderson
  2009-12-05 17:07         ` Uros Bizjak
  1 sibling, 2 replies; 45+ messages in thread
From: Sebastian Pop @ 2009-12-03 19:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2663 bytes --]

On Wed, Dec 2, 2009 at 16:23, Richard Henderson <rth@redhat.com> wrote:
>> @@ -1724,8 +1723,8 @@
>>          (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
>>         (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
>>   "TARGET_FMA4
>> -   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
>> -   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
>> +   && MEM_P (operands[2])
>> +   && (MEM_P (operands[1]) || MEM_P (operands[3]))
>>    && !reg_mentioned_p (operands[0], operands[1])
>>    && !reg_mentioned_p (operands[0], operands[2])
>>    && !reg_mentioned_p (operands[0], operands[3])"
>
> This is the splitter under fma4_fmadd<mode>4256, but the same comment
> applies to all of the fma4 splitters.
>
> First, MEM_P(operands[2]) would be better written as "memory_operand" in the
> match_operand for op2.  Second, two of the reg_mentioned_p tests are
> *always* going to be false for these patterns, for the simple reason that
> operands[0] is a vector float register and the only registers that can
> appear in a memory operand are address registers.
>
> So I think these splitters would be better written:
>
> ;; Split fmadd with two memory operands into a load and the fmadd.
> (define_split
>  [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
>        (plus:FMA4MODEF4
>         (mult:FMA4MODEF4
>          (match_operand:FMA4MODEF4 1 "register_operand" "")
>          (match_operand:FMA4MODEF4 2 "memory_operand" ""))
>         (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
>  "TARGET_FMA4"
>  [(set (match_dup 0)
>        (plus:FMA4MODEF4
>          (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))
>          (match_dup 3)))]
> {
>  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
>    FAIL;
> })
>
> bool
> ix86_expand_fma4_multiple_memory (rtx operands[],
>                                  enum machine_mode mode)
> {
>  rtx scratch = operands[0];
>
>  gcc_assert (register_operand (operands[0], mode));
>  gcc_assert (register_operand (operands[1], mode));
>  gcc_assert (MEM_P (operands[2]) && MEM_P (operands[3]));
>
>  if (reg_mentioned_p (scratch, operands[1]))
>    {
>      if (!can_create_pseudo_p ())
>        return false;
>      scratch = gen_reg_rtx (mode);
>    }
>
>  emit_move_insn (scratch, operands[3]);
>  if (rtx_equal_p (operands[2], operands[3]))
>    operands[2] = operands[3] = scratch;
>  else
>    operands[3] = scratch;
>  return true;
> }
>

Fixed like this.

Sebastian

[-- Attachment #2: 0001-Fix-FMA4-and-XOP-splitters.patch --]
[-- Type: text/x-patch, Size: 14187 bytes --]

From e79de1050d77d66bdf314542d474289c855f08e8 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Thu, 3 Dec 2009 12:43:48 -0600
Subject: [PATCH] Fix FMA4 and XOP splitters.

---
 gcc/config/i386/i386-protos.h |    2 +-
 gcc/config/i386/i386.c        |   35 +++----
 gcc/config/i386/sse.md        |  236 +++++++++++++++++-----------------------
 3 files changed, 116 insertions(+), 157 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 27fca86..cf29cc7 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -218,7 +218,7 @@ extern void ix86_expand_vector_set (bool, rtx, rtx, int);
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
-extern void ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
+extern bool ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
 
 extern void ix86_expand_vec_extract_even_odd (rtx, rtx, rtx, unsigned);
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a0a2001..9b2829b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28813,34 +28813,29 @@ ix86_expand_round (rtx operand0, rtx operand1)
    memory operations.  Presently this is used by the multiply/add routines to
    allow 2 memory references.  */
 
-void
+bool
 ix86_expand_fma4_multiple_memory (rtx operands[],
 				  enum machine_mode mode)
 {
-  rtx op0 = operands[0];
+  rtx scratch = operands[0];
 
-  if (memory_operand (op0, mode)
-      || reg_mentioned_p (op0, operands[1])
-      || reg_mentioned_p (op0, operands[2])
-      || reg_mentioned_p (op0, operands[3]))
-    gcc_unreachable ();
+  gcc_assert (register_operand (operands[0], mode));
+  gcc_assert (register_operand (operands[1], mode));
+  gcc_assert (MEM_P (operands[2]) && MEM_P (operands[3]));
 
-  /* For 2 memory operands, pick either operands[1] or operands[3] to move into
-     the destination register.  */
-  if (memory_operand (operands[1], mode))
+  if (reg_mentioned_p (scratch, operands[1]))
     {
-      emit_move_insn (op0, operands[1]);
-      operands[1] = op0;
-    }
-  else if (memory_operand (operands[3], mode))
-    {
-      emit_move_insn (op0, operands[3]);
-      operands[3] = op0;
+      if (!can_create_pseudo_p ())
+	return false;
+      scratch = gen_reg_rtx (mode);
     }
-  else
-    gcc_unreachable ();
 
-  return;
+  emit_move_insn (scratch, operands[3]);
+  if (rtx_equal_p (operands[2], operands[3]))
+    operands[2] = operands[3] = scratch;
+  else
+    operands[3] = scratch;
+  return true;
 }
 
 /* Table of valid machine attributes.  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7bb4802..2c9a6c8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1719,21 +1719,17 @@
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
 	(plus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:FMA4MODEF4 1 "register_operand" "")
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (plus:FMA4MODEF4
+         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmadd<mode>4256 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+ if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+   FAIL;
 })
 
 ;; Floating multiply and subtract
@@ -1755,21 +1751,17 @@
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:FMA4MODEF4 1 "register_operand" "")
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:FMA4MODEF4
+         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmsub<mode>4256 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+ if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+   FAIL;
 })
 
 ;; Floating point negative multiply and add
@@ -1792,22 +1784,18 @@
 (define_split
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
 	(minus:FMA4MODEF4
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")
+	 (match_operand:FMA4MODEF4 3 "register_operand" "")
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:FMA4MODEF4 1 "memory_operand" "")
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:FMA4MODEF4
+	 (match_dup 3)
+         (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmadd<mode>4256 (operands[0], operands[1],
-				     operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; Floating point negative multiply and subtract
@@ -1832,21 +1820,19 @@
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
 	  (neg:FMA4MODEF4
-	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" ""))
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	   (match_operand:FMA4MODEF4 1 "register_operand" ""))
+	  (match_operand:FMA4MODEF4 2 "memory_operand" ""))
+	 (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:FMA4MODEF4
+         (mult:FMA4MODEF4
+	  (neg:FMA4MODEF4 (match_dup 1))
+	  (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmsub<mode>4256 (operands[0], operands[1],
-				        operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -1867,21 +1853,17 @@
   [(set (match_operand:SSEMODEF4 0 "register_operand" "")
 	(plus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:SSEMODEF4 1 "register_operand" "")
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
+	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (plus:SSEMODEF4
+         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmadd<mode>4 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -1921,21 +1903,17 @@
   [(set (match_operand:SSEMODEF4 0 "register_operand" "")
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:SSEMODEF4 1 "register_operand" "")
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
+	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:SSEMODEF4
+         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fmsub<mode>4 (operands[0], operands[1],
-				    operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -1976,22 +1954,18 @@
 (define_split
   [(set (match_operand:SSEMODEF4 0 "register_operand" "")
 	(minus:SSEMODEF4
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")
+	 (match_operand:SSEMODEF4 3 "register_operand" "")
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	  (match_operand:SSEMODEF4 1 "memory_operand" "")
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:SSEMODEF4
+	 (match_dup 3)
+         (mult:SSEMODEF4 (match_dup 1) (match_dup 2))))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmadd<mode>4 (operands[0], operands[1],
-				     operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -2034,21 +2008,19 @@
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
 	  (neg:SSEMODEF4
-	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" ""))
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
-  "TARGET_FMA4
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+	   (match_operand:SSEMODEF4 1 "register_operand" ""))
+	  (match_operand:SSEMODEF4 2 "memory_operand" ""))
+	 (match_operand:SSEMODEF4 3 "memory_operand" "")))]
+  "TARGET_FMA4"
+  [(set (match_dup 0)
+        (minus:SSEMODEF4
+         (mult:SSEMODEF4
+	  (neg:SSEMODEF4 (match_dup 1))
+	  (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
-  emit_insn (gen_fma4_fnmsub<mode>4 (operands[0], operands[1],
-				     operands[2], operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
+    FAIL;
 })
 
 ;; For the scalar operations, use operand1 for the upper words that aren't
@@ -10346,18 +10318,14 @@
 	 (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "")
 		    (match_operand:V8HI 2 "nonimmediate_operand" ""))
 	 (match_operand:V8HI 3 "nonimmediate_operand" "")))]
-  "TARGET_XOP
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+  "TARGET_XOP"
+  [(set (match_dup 0)
+        (plus:V8HI
+         (mult:V8HI (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, V8HImode);
-  emit_insn (gen_xop_pmacsww (operands[0], operands[1], operands[2],
-			      operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, V8HImode))
+    FAIL;
 })
 
 (define_insn "xop_pmacssww"
@@ -10394,18 +10362,14 @@
 	 (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "")
 		    (match_operand:V4SI 2 "nonimmediate_operand" ""))
 	 (match_operand:V4SI 3 "nonimmediate_operand" "")))]
-  "TARGET_XOP
-   && MEM_P (operands[2])
-   && (MEM_P (operands[1]) || MEM_P (operands[3]))
-   && !reg_mentioned_p (operands[0], operands[1])
-   && !reg_mentioned_p (operands[0], operands[2])
-   && !reg_mentioned_p (operands[0], operands[3])"
-  [(const_int 0)]
+  "TARGET_XOP"
+  [(set (match_dup 0)
+        (plus:V4SI
+         (mult:V4SI (match_dup 1) (match_dup 2))
+         (match_dup 3)))]
 {
-  ix86_expand_fma4_multiple_memory (operands, V4SImode);
-  emit_insn (gen_xop_pmacsdd (operands[0], operands[1], operands[2],
-			      operands[3]));
-  DONE;
+  if (!ix86_expand_fma4_multiple_memory (operands, V4SImode))
+    FAIL;
 })
 
 (define_insn "xop_pmacssdd"
-- 
1.6.0.4



* Re: Vector permutation support for x86
  2009-12-02 23:05       ` Sebastian Pop
  2009-12-02 23:39         ` Sebastian Pop
@ 2009-12-02 23:55         ` Richard Henderson
  2009-12-03 19:53           ` Sebastian Pop
  1 sibling, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2009-12-02 23:55 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Uros Bizjak, GCC Patches, Harle, Christophe

On 12/02/2009 02:57 PM, Sebastian Pop wrote:
> VFNMSUBPD dest, src1, src2, src3
>
> with the semantics: "dest = - (src1 * src2) - src3"
>
> that means that the above patterns are just wrong wrt. the XOP manual:
> the pattern as implemented in sse.md is "dest = (-src1) * src2 - src3"

Actually, on second thought I don't think it's wrong at all.

Unless A or B is zero, -A*B == -(A*B).  With A zero,

    (-0*x) = +0
but
    -(0*x) = -(0) = -0.

However, the following subtraction hides that because

    -0 - (+-0) = +0

so you can't actually tell the difference with this insn because you 
never get to see the wrong signed zero on the intermediate value.  And 
of course that also means that -A*B == A*-B, so of course the operands 
are still commutable.
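A quick way to check the final-result claim is a small C program, assuming IEEE 754 binary64 arithmetic in the default round-to-nearest mode. It is only a sketch: the helper names (fnmsub_neg_outer, fnmsub_neg_inner, same_bits, count_mismatches) are made up for illustration. It compares the bit patterns of -(a*b) - c and (-a)*b - c, plus the form with a and b swapped, over signed zeros and a few small exactly representable values.

```c
#include <stdint.h>
#include <string.h>

/* The two RTL shapes under discussion, written as plain C double arithmetic.
   Assumes IEEE 754 binary64 with the default round-to-nearest mode.  */
static double fnmsub_neg_outer (double a, double b, double c)
{
  return -(a * b) - c;          /* (minus (neg (mult a b)) c) */
}

static double fnmsub_neg_inner (double a, double b, double c)
{
  return (-a) * b - c;          /* (minus (mult (neg a) b) c) */
}

/* Compare bit patterns, so that +0.0 and -0.0 count as different.  */
static int same_bits (double x, double y)
{
  uint64_t ux, uy;
  memcpy (&ux, &x, sizeof ux);
  memcpy (&uy, &y, sizeof uy);
  return ux == uy;
}

/* Count input triples for which the two shapes disagree, also checking
   the inner form with a and b swapped.  */
static int count_mismatches (void)
{
  static const double vals[] = { 0.0, -0.0, 1.0, -2.0, 3.5 };
  const int n = sizeof vals / sizeof vals[0];
  int mismatches = 0;

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      for (int k = 0; k < n; k++)
        {
          double a = vals[i], b = vals[j], c = vals[k];
          double ref = fnmsub_neg_outer (a, b, c);
          if (!same_bits (ref, fnmsub_neg_inner (a, b, c))
              || !same_bits (ref, fnmsub_neg_inner (b, a, c)))
            mismatches++;
        }
  return mismatches;
}
```

Under those assumptions count_mismatches () should return 0. The inputs are chosen so that every product is exact, which keeps the outcome independent of whether the compiler contracts the expressions into a fused multiply-add.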

The comment

> ;; Floating point negative multiply and subtract
> ;; Rewrite (- (a * b) - c) into the canonical form: ((-a) * b) - c

suggests that someone's already determined that the later form is more 
likely to be generated by GCC.  We probably ought to re-verify that, 
though that's not 100% necessary to do right away.

I do see from other ports that all of the versions of these patterns 
without the unspec should be protected by TARGET_FUSED_MADD and the 
-mfused-madd command-line option to control it.  Note that FUSED_MADD is 
enabled by default on all those targets that implement it.

I see i386 used to have the option, but someone decided that 
-mfused-madd should imply -mavx instead.  Which is silly since that's 
not the same thing at all; -m{avx,xop} -mno-fused-madd is a very 
sensible combination of options if your numerical algorithm can't stand 
the fused operation.


r~


* Re: Vector permutation support for x86
  2009-12-02 23:05       ` Sebastian Pop
@ 2009-12-02 23:39         ` Sebastian Pop
  2009-12-02 23:55         ` Richard Henderson
  1 sibling, 0 replies; 45+ messages in thread
From: Sebastian Pop @ 2009-12-02 23:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches, Harle, Christophe

[-- Attachment #1: Type: text/plain, Size: 1346 bytes --]

On Wed, Dec 2, 2009 at 16:57, Sebastian Pop <sebpop@gmail.com> wrote:
> On Wed, Dec 2, 2009 at 16:23, Richard Henderson <rth@redhat.com> wrote:
>>> @@ -2030,11 +2020,10 @@
>>>        (minus:SSEMODEF4
>>>         (mult:SSEMODEF4
>>>          (neg:SSEMODEF4
>>> -          (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x"))
>>> -         (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
>>> +          (match_operand:SSEMODEF4 1 "register_operand" "%x,x"))
>>> +         (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
>>>         (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
>>
>> Careful -- these with NEG are not commutative.  Five instances.
>
> This escaped me.
>
> But there is another problem: in the spec src1 and src2 are part
> of a multiplication and so op1 and op2 *are* commutative.
> For instance, in the XOP manual we have
>
> VFNMSUBPD dest, src1, src2, src3
>
> with the semantics: "dest = - (src1 * src2) - src3"
>
> that means that the above patterns are just wrong wrt. the XOP manual:
> the pattern as implemented in sse.md is "dest = (-src1) * src2 - src3"
>
> I should go over the semantics of all these insns.
>

Fixed like this.

Sebastian

[-- Attachment #2: 0001-Fix-VFNMSUB-insns.patch --]
[-- Type: text/x-patch, Size: 5285 bytes --]

From cd5b475525fe364e99675f366f80bb872d5db9cd Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Wed, 2 Dec 2009 17:09:51 -0600
Subject: [PATCH] Fix VFNMSUB insns.

---
 gcc/config/i386/sse.md |   64 ++++++++++++++++++++++++------------------------
 1 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7bb4802..e222143 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1816,10 +1816,10 @@
 (define_insn "fma4_fnmsub<mode>4256"
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
-	 (mult:FMA4MODEF4
-	  (neg:FMA4MODEF4
-	   (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
+	 (neg:FMA4MODEF4
+	  (mult:FMA4MODEF4
+	   (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	   (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m")))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
   "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
@@ -1830,10 +1830,10 @@
 (define_split
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
 	(minus:FMA4MODEF4
-	 (mult:FMA4MODEF4
-	  (neg:FMA4MODEF4
-	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" ""))
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
+	 (neg:FMA4MODEF4
+	  (mult:FMA4MODEF4
+	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
+	   (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "")))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
    && MEM_P (operands[2])
@@ -2018,10 +2018,10 @@
 (define_insn "fma4_fnmsub<mode>4"
   [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
-	 (mult:SSEMODEF4
-	  (neg:SSEMODEF4
-	   (match_operand:SSEMODEF4 1 "register_operand" "%x,x"))
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
+	 (neg:SSEMODEF4
+	  (mult:SSEMODEF4
+	   (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m")))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
   "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
@@ -2032,10 +2032,10 @@
 (define_split
   [(set (match_operand:SSEMODEF4 0 "register_operand" "")
 	(minus:SSEMODEF4
-	 (mult:SSEMODEF4
-	  (neg:SSEMODEF4
-	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" ""))
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
+	 (neg:SSEMODEF4
+	  (mult:SSEMODEF4
+	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
+	   (match_operand:SSEMODEF4 2 "nonimmediate_operand" "")))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
    && MEM_P (operands[2])
@@ -2058,10 +2058,10 @@
   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
 	(vec_merge:SSEMODEF2P
 	 (minus:SSEMODEF2P
-	  (mult:SSEMODEF2P
-	   (neg:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
+	  (neg:SSEMODEF2P
+	   (mult:SSEMODEF2P
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
@@ -2118,10 +2118,10 @@
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(unspec:FMA4MODEF4
 	 [(minus:FMA4MODEF4
-	   (mult:FMA4MODEF4
-	    (neg:FMA4MODEF4
-	     (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
+	   (neg:FMA4MODEF4
+	    (mult:FMA4MODEF4
+	     (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	     (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m")))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
   "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
@@ -2176,10 +2176,10 @@
   [(set (match_operand:SSEMODEF2P 0 "register_operand" "=x,x")
 	(unspec:SSEMODEF2P
 	 [(minus:SSEMODEF2P
-	   (mult:SSEMODEF2P
-	    (neg:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
+	   (neg:SSEMODEF2P
+	    (mult:SSEMODEF2P
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
   "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
@@ -2245,10 +2245,10 @@
 	(unspec:SSEMODEF2P
 	 [(vec_merge:SSEMODEF2P
 	   (minus:SSEMODEF2P
-	    (mult:SSEMODEF2P
-	     (neg:SSEMODEF2P
-	      (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
+	    (neg:SSEMODEF2P
+	     (mult:SSEMODEF2P
+	      (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	      (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
-- 
1.6.0.4



* Re: Vector permutation support for x86
  2009-12-02 22:38     ` Richard Henderson
@ 2009-12-02 23:05       ` Sebastian Pop
  2009-12-02 23:39         ` Sebastian Pop
  2009-12-02 23:55         ` Richard Henderson
  2009-12-03 19:30       ` Sebastian Pop
  1 sibling, 2 replies; 45+ messages in thread
From: Sebastian Pop @ 2009-12-02 23:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Uros Bizjak, GCC Patches, Harle, Christophe

On Wed, Dec 2, 2009 at 16:23, Richard Henderson <rth@redhat.com> wrote:
>> @@ -2030,11 +2020,10 @@
>>        (minus:SSEMODEF4
>>         (mult:SSEMODEF4
>>          (neg:SSEMODEF4
>> -          (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x"))
>> -         (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
>> +          (match_operand:SSEMODEF4 1 "register_operand" "%x,x"))
>> +         (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
>>         (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
>
> Careful -- these with NEG are not commutative.  Five instances.

This escaped me.

But there is another problem: in the spec src1 and src2 are part
of a multiplication and so op1 and op2 *are* commutative.
For instance, in the XOP manual we have

VFNMSUBPD dest, src1, src2, src3

with the semantics: "dest = - (src1 * src2) - src3"

that means that the above patterns are just wrong wrt. the XOP manual:
the pattern as implemented in sse.md is "dest = (-src1) * src2 - src3"

I should go over the semantics of all these insns.
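To make the quoted semantics concrete, here is a hypothetical per-lane model of VFNMSUBPD in plain C. The name vfnmsubpd_model is invented, and because it uses ordinary C operators it rounds the product before the subtraction (absent floating-point contraction), unlike the fused instruction. It is only meant to show the operand roles: src1 and src2 appear only inside the product, so swapping them cannot change the result.

```c
/* Hypothetical reference model of VFNMSUBPD as quoted from the XOP manual:
   dest = -(src1 * src2) - src3, applied to each of the two 64-bit lanes.
   Not bit-exact with the hardware: the product is rounded separately here
   unless the compiler contracts the expression into an FMA.  */
static void vfnmsubpd_model (double dest[2], const double src1[2],
                             const double src2[2], const double src3[2])
{
  for (int i = 0; i < 2; i++)
    dest[i] = -(src1[i] * src2[i]) - src3[i];
}
```

For example, with src1 = {2, 3}, src2 = {5, -1} and src3 = {1, 1}, both vfnmsubpd_model (d, src1, src2, src3) and vfnmsubpd_model (d, src2, src1, src3) give {-11, 2}. That symmetry is what the "%" commutativity modifier on operand 1 asserts.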

Sebastian


* Re: Vector permutation support for x86
  2009-12-02 22:23   ` Sebastian Pop
@ 2009-12-02 22:38     ` Richard Henderson
  2009-12-02 23:05       ` Sebastian Pop
  2009-12-03 19:30       ` Sebastian Pop
  0 siblings, 2 replies; 45+ messages in thread
From: Richard Henderson @ 2009-12-02 22:38 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Uros Bizjak, GCC Patches

> @@ -1724,8 +1723,8 @@
>  	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
>  	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
>    "TARGET_FMA4
> -   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
> -   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
> +   && MEM_P (operands[2])
> +   && (MEM_P (operands[1]) || MEM_P (operands[3]))
>     && !reg_mentioned_p (operands[0], operands[1])
>     && !reg_mentioned_p (operands[0], operands[2])
>     && !reg_mentioned_p (operands[0], operands[3])"

This is the splitter under "fma4_fmadd<mode>4256", but the same comment 
applies to all of the fma4 splitters.

First, MEM_P(operands[2]) would be better written as "memory_operand" in 
the match_operand for op2.  Second, two of the reg_mentioned_p tests are 
*always* going to be false for these patterns, for the simple reason 
that operands[0] is a vector float register and the only registers that 
would be present in a memory operand are address registers.

So I think these splitters would be better written:

;; Split fmadd with two memory operands into a load and the fmadd.
(define_split
   [(set (match_operand:FMA4MODEF4 0 "register_operand" "")
         (plus:FMA4MODEF4
          (mult:FMA4MODEF4
           (match_operand:FMA4MODEF4 1 "register_operand" "")
           (match_operand:FMA4MODEF4 2 "memory_operand" ""))
          (match_operand:FMA4MODEF4 3 "memory_operand" "")))]
   "TARGET_FMA4"
   [(set (match_dup 0)
	(plus:FMA4MODEF4
	  (mult:FMA4MODEF4 (match_dup 1) (match_dup 2))
	  (match_dup 3)))]
{
   if (!ix86_expand_fma4_multiple_memory (operands, <MODE>mode))
     FAIL;
})

bool
ix86_expand_fma4_multiple_memory (rtx operands[],
				  enum machine_mode mode)
{
   rtx scratch = operands[0];

   gcc_assert (register_operand (operands[0], mode));
   gcc_assert (register_operand (operands[1], mode));
   gcc_assert (MEM_P (operands[2]) && MEM_P (operands[3]));

   if (reg_mentioned_p (scratch, operands[1]))
     {
       if (!can_create_pseudo_p ())
	return false;
       scratch = gen_reg_rtx (mode);
     }

   emit_move_insn (scratch, operands[3]);
   if (rtx_equal_p (operands[2], operands[3]))
     operands[2] = operands[3] = scratch;
   else
     operands[3] = scratch;
   return true;
}
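The operand rewriting in the proposed helper can be modelled outside GCC. The sketch below is a hypothetical toy: struct opnd, struct toy_state and toy_expand_multiple_memory are invented stand-ins for rtx, can_create_pseudo_p () and gen_reg_rtx (), not GCC interfaces. It follows the same decision steps: reuse operands[0] as the scratch unless it is the same register as operands[1], fall back to a fresh register (or fail) in that case, load operands[3] into the scratch, and also substitute the scratch for operands[2] when both memory operands are identical.

```c
#include <stdbool.h>

/* Hypothetical toy model of RTL operands; not GCC's rtx type.  */
enum opnd_kind { OPND_REG, OPND_MEM };

struct opnd
{
  enum opnd_kind kind;
  int regno;        /* register number, for OPND_REG */
  int base_regno;   /* address (base) register, for OPND_MEM */
  long offset;      /* address offset, for OPND_MEM */
};

struct toy_state
{
  bool can_create_pseudo;  /* stands in for can_create_pseudo_p () */
  int next_pseudo;         /* next free register number */
  struct opnd load_dest;   /* destination of the single emitted load */
  struct opnd load_src;    /* source of the single emitted load */
};

static bool opnd_equal (struct opnd x, struct opnd y)
{
  if (x.kind != y.kind)
    return false;
  if (x.kind == OPND_REG)
    return x.regno == y.regno;
  return x.base_regno == y.base_regno && x.offset == y.offset;
}

/* ops[0] and ops[1] are registers, ops[2] and ops[3] are memory.
   Returns false (the split would FAIL) when a fresh register is needed
   but none can be created.  */
static bool toy_expand_multiple_memory (struct opnd ops[4],
                                        struct toy_state *st)
{
  struct opnd scratch = ops[0];

  /* Loading ops[3] into ops[0] would clobber ops[1] before the
     multiply-add reads it.  Only ops[1] needs checking: a vector
     destination register cannot appear inside a memory address.  */
  if (opnd_equal (scratch, ops[1]))
    {
      if (!st->can_create_pseudo)
        return false;
      scratch.regno = st->next_pseudo++;
    }

  /* Models emit_move_insn (scratch, operands[3]).  */
  st->load_dest = scratch;
  st->load_src = ops[3];

  if (opnd_equal (ops[2], ops[3]))
    ops[2] = ops[3] = scratch;
  else
    ops[3] = scratch;
  return true;
}
```

Returning false corresponds to FAIL in the define_split, so the insn is simply left unsplit when no new pseudo can be allocated (for example once register allocation has started).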

> @@ -2030,11 +2020,10 @@
>  	(minus:SSEMODEF4
>  	 (mult:SSEMODEF4
>  	  (neg:SSEMODEF4
> -	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x"))
> -	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
> +	   (match_operand:SSEMODEF4 1 "register_operand" "%x,x"))
> +	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
>  	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]

Careful -- these with NEG are not commutative.  Five instances.



r~


* Re: Vector permutation support for x86
  2009-12-02 21:22 ` Sebastian Pop
@ 2009-12-02 22:23   ` Sebastian Pop
  2009-12-02 22:38     ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-02 22:23 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 118 bytes --]

Hi,

Attached is the new patch set that I am testing on amd64-linux.
Ok for trunk after it passes?

Thanks,
Sebastian

[-- Attachment #2: 0001-Remove-unused-operand.patch --]
[-- Type: text/x-patch, Size: 5321 bytes --]

From 2222a3cd31590d183e328fb6634e3c5128af3f47 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 1 Dec 2009 14:10:50 -0600
Subject: [PATCH] Remove unused operand.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/i386.c (ix86_expand_fma4_multiple_memory): Remove unused
	parameter.
	* config/i386/i386-protos.h (ix86_expand_fma4_multiple_memory): Same.
	* config/i386/sse.md: Same.
---
 gcc/config/i386/i386-protos.h |    2 +-
 gcc/config/i386/i386.c        |    5 ++---
 gcc/config/i386/sse.md        |   20 ++++++++++----------
 3 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 1451e79..bb55da1 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -219,7 +219,7 @@ extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
 extern bool ix86_fma4_valid_op_p (rtx [], rtx, int, bool, int, bool);
-extern void ix86_expand_fma4_multiple_memory (rtx [], int, enum machine_mode);
+extern void ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
 
 extern void ix86_expand_vec_extract_even_odd (rtx, rtx, rtx, unsigned);
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 462f2d5..82ec08f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28962,12 +28962,11 @@ ix86_fma4_valid_op_p (rtx operands[], rtx insn ATTRIBUTE_UNUSED, int num,
 
 void
 ix86_expand_fma4_multiple_memory (rtx operands[],
-				  int num,
 				  enum machine_mode mode)
 {
   rtx op0 = operands[0];
-  if (num != 4
-      || memory_operand (op0, mode)
+
+  if (memory_operand (op0, mode)
       || reg_mentioned_p (op0, operands[1])
       || reg_mentioned_p (op0, operands[2])
       || reg_mentioned_p (op0, operands[3]))
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 08a3b5b..4899c0a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1731,7 +1731,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmadd<mode>4256 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1768,7 +1768,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmsub<mode>4256 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1807,7 +1807,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmadd<mode>4256 (operands[0], operands[1],
 				     operands[2], operands[3]));
   DONE;
@@ -1847,7 +1847,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmsub<mode>4256 (operands[0], operands[1],
 				        operands[2], operands[3]));
   DONE;
@@ -1883,7 +1883,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmadd<mode>4 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1939,7 +1939,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fmsub<mode>4 (operands[0], operands[1],
 				    operands[2], operands[3]));
   DONE;
@@ -1997,7 +1997,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmadd<mode>4 (operands[0], operands[1],
 				     operands[2], operands[3]));
   DONE;
@@ -2056,7 +2056,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, <MODE>mode);
+  ix86_expand_fma4_multiple_memory (operands, <MODE>mode);
   emit_insn (gen_fma4_fnmsub<mode>4 (operands[0], operands[1],
 				     operands[2], operands[3]));
   DONE;
@@ -10384,7 +10384,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, V8HImode);
+  ix86_expand_fma4_multiple_memory (operands, V8HImode);
   emit_insn (gen_xop_pmacsww (operands[0], operands[1], operands[2],
 			      operands[3]));
   DONE;
@@ -10436,7 +10436,7 @@
    && !reg_mentioned_p (operands[0], operands[3])"
   [(const_int 0)]
 {
-  ix86_expand_fma4_multiple_memory (operands, 4, V4SImode);
+  ix86_expand_fma4_multiple_memory (operands, V4SImode);
   emit_insn (gen_xop_pmacsdd (operands[0], operands[1], operands[2],
 			      operands[3]));
   DONE;
-- 
1.6.0.4


[-- Attachment #3: 0002-For-FMA4-force-all-operands-into-registers.patch --]
[-- Type: text/x-patch, Size: 1196 bytes --]

From e170ea82fd01e2698fadba1dbdd2ae8a82a5b816 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Wed, 2 Dec 2009 13:15:48 -0600
Subject: [PATCH] For FMA4, force all operands into registers.

2009-12-02  Richard Henderson  <rth@redhat.com>

	* config/i386/i386.c (ix86_fixup_binary_operands): For FMA4, force
	all operands into registers.
---
 gcc/config/i386/i386.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 82ec08f..436e935 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13384,6 +13384,16 @@ ix86_fixup_binary_operands (enum rtx_code code, enum machine_mode mode,
   if (MEM_P (src1) && !rtx_equal_p (dst, src1))
     src1 = force_reg (mode, src1);
 
+  /* In order for the multiply-add patterns to get matched, we need
+     to aid combine by forcing all operands into registers to start.  */
+  if (optimize && TARGET_FMA4)
+    {
+      if (MEM_P (src2))
+	src2 = force_reg (GET_MODE (src2), src2);
+      else if (MEM_P (src1))
+	src1 = force_reg (GET_MODE (src1), src1);
+    }
+
   operands[1] = src1;
   operands[2] = src2;
   return dst;
-- 
1.6.0.4


[-- Attachment #4: 0003-Fix-FMA4-and-XOP-insns.patch --]
[-- Type: text/x-patch, Size: 64083 bytes --]

From b8cce340ac571335fd517dcb5bced277bbb8f733 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Wed, 2 Dec 2009 15:39:15 -0600
Subject: [PATCH] Fix FMA4 and XOP insns.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/i386-protos.h (ix86_fma4_valid_op_p): Removed.
	* config/i386/i386.c (ix86_fma4_valid_op_p): Removed.
	* config/i386/i386.md: Do not use ix86_fma4_valid_op_p.

	* config/i386/sse.md (fma4_*): Remove alternative with operand 1
	matching a memory access.  Do not use ix86_fma4_valid_op_p.
	(xop_*): Same.
	Do not use ix86_fma4_valid_op_p in FMA4 and XOP splitters.
---
 gcc/config/i386/i386-protos.h |    1 -
 gcc/config/i386/i386.c        |  157 -----------
 gcc/config/i386/i386.md       |    2 +-
 gcc/config/i386/sse.md        |  590 +++++++++++++++++++----------------------
 4 files changed, 270 insertions(+), 480 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bb55da1..27fca86 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -218,7 +218,6 @@ extern void ix86_expand_vector_set (bool, rtx, rtx, int);
 extern void ix86_expand_vector_extract (bool, rtx, rtx, int);
 extern void ix86_expand_reduc_v4sf (rtx (*)(rtx, rtx, rtx), rtx, rtx);
 
-extern bool ix86_fma4_valid_op_p (rtx [], rtx, int, bool, int, bool);
 extern void ix86_expand_fma4_multiple_memory (rtx [], enum machine_mode);
 
 extern void ix86_expand_vec_extract_even_odd (rtx, rtx, rtx, unsigned);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 436e935..a0a2001 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -28807,163 +28807,6 @@ ix86_expand_round (rtx operand0, rtx operand1)
   emit_move_insn (operand0, res);
 }
 \f
-/* Validate whether a FMA4 instruction is valid or not.
-   OPERANDS is the array of operands.
-   NUM is the number of operands.
-   USES_OC0 is true if the instruction uses OC0 and provides 4 variants.
-   NUM_MEMORY is the maximum number of memory operands to accept.
-   NUM_MEMORY less than zero is a special case to allow an operand
-   of an instruction to be memory operation.
-   when COMMUTATIVE is set, operand 1 and 2 can be swapped.  */
-
-bool
-ix86_fma4_valid_op_p (rtx operands[], rtx insn ATTRIBUTE_UNUSED, int num,
-		      bool uses_oc0, int num_memory, bool commutative)
-{
-  int mem_mask;
-  int mem_count;
-  int i;
-
-  /* Count the number of memory arguments */
-  mem_mask = 0;
-  mem_count = 0;
-  for (i = 0; i < num; i++)
-    {
-      enum machine_mode mode = GET_MODE (operands[i]);
-      if (register_operand (operands[i], mode))
-	;
-
-      else if (memory_operand (operands[i], mode))
-	{
-	  mem_mask |= (1 << i);
-	  mem_count++;
-	}
-
-      else
-	{
-	  rtx pattern = PATTERN (insn);
-
-	  /* allow 0 for pcmov */
-	  if (GET_CODE (pattern) != SET
-	      || GET_CODE (SET_SRC (pattern)) != IF_THEN_ELSE
-	      || i < 2
-	      || operands[i] != CONST0_RTX (mode))
-	    return false;
-	}
-    }
-
-  /* Special case pmacsdq{l,h} where we allow the 3rd argument to be
-     a memory operation.  */
-  if (num_memory < 0)
-    {
-      num_memory = -num_memory;
-      if ((mem_mask & (1 << (num-1))) != 0)
-	{
-	  mem_mask &= ~(1 << (num-1));
-	  mem_count--;
-	}
-    }
-
-  /* If there were no memory operations, allow the insn */
-  if (mem_mask == 0)
-    return true;
-
-  /* Do not allow the destination register to be a memory operand.  */
-  else if (mem_mask & (1 << 0))
-    return false;
-
-  /* If there are too many memory operations, disallow the instruction.  While
-     the hardware only allows 1 memory reference, before register allocation
-     for some insns, we allow two memory operations sometimes in order to allow
-     code like the following to be optimized:
-
-	float fmadd (float *a, float *b, float *c) { return (*a * *b) + *c; }
-
-    or similar cases that are vectorized into using the vfmaddss
-    instruction.  */
-  else if (mem_count > num_memory)
-    return false;
-
-  /* Don't allow more than one memory operation if not optimizing.  */
-  else if (mem_count > 1 && !optimize)
-    return false;
-
-  else if (num == 4 && mem_count == 1)
-    {
-      /* formats (destination is the first argument), example vfmaddss:
-	 xmm1, xmm1, xmm2, xmm3/mem
-	 xmm1, xmm1, xmm2/mem, xmm3
-	 xmm1, xmm2, xmm3/mem, xmm1
-	 xmm1, xmm2/mem, xmm3, xmm1 */
-      if (uses_oc0)
-	return ((mem_mask == (1 << 1))
-		|| (mem_mask == (1 << 2))
-		|| (mem_mask == (1 << 3)));
-
-      /* format, example vpmacsdd:
-	 xmm1, xmm2, xmm3/mem, xmm1 */
-      if (commutative)
-	return (mem_mask == (1 << 2) || mem_mask == (1 << 1));
-      else
-	return (mem_mask == (1 << 2));
-    }
-
-  else if (num == 4 && num_memory == 2)
-    {
-      /* If there are two memory operations, we can load one of the memory ops
-	 into the destination register.  This is for optimizing the
-	 multiply/add ops, which the combiner has optimized both the multiply
-	 and the add insns to have a memory operation.  We have to be careful
-	 that the destination doesn't overlap with the inputs.  */
-      rtx op0 = operands[0];
-
-      if (reg_mentioned_p (op0, operands[1])
-	  || reg_mentioned_p (op0, operands[2])
-	  || reg_mentioned_p (op0, operands[3]))
-	return false;
-
-      /* formats (destination is the first argument), example vfmaddss:
-	 xmm1, xmm1, xmm2, xmm3/mem
-	 xmm1, xmm1, xmm2/mem, xmm3
-	 xmm1, xmm2, xmm3/mem, xmm1
-	 xmm1, xmm2/mem, xmm3, xmm1
-
-         For the oc0 case, we will load either operands[1] or operands[3] into
-         operands[0], so any combination of 2 memory operands is ok.  */
-      if (uses_oc0)
-	return true;
-
-      /* format, example vpmacsdd:
-	 xmm1, xmm2, xmm3/mem, xmm1
-
-         For the integer multiply/add instructions be more restrictive and
-         require operands[2] and operands[3] to be the memory operands.  */
-      if (commutative)
-	return (mem_mask == ((1 << 1) | (1 << 3)) || ((1 << 2) | (1 << 3)));
-      else
-	return (mem_mask == ((1 << 2) | (1 << 3)));
-    }
-
-  else if (num == 3 && num_memory == 1)
-    {
-      /* formats, example vprotb:
-	 xmm1, xmm2, xmm3/mem
-	 xmm1, xmm2/mem, xmm3 */
-      if (uses_oc0)
-	return ((mem_mask == (1 << 1)) || (mem_mask == (1 << 2)));
-
-      /* format, example vpcomeq:
-	 xmm1, xmm2, xmm3/mem */
-      else
-	return (mem_mask == (1 << 2));
-    }
-
-  else
-    gcc_unreachable ();
-
-  return false;
-}
-
 
 /* Fixup an FMA4 instruction that has 2 memory input references into a form the
    hardware will allow by using the destination register to load one of the
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 851061d..1ef3025 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -19248,7 +19248,7 @@
 	  (match_operand:MODEF 1 "register_operand" "x")
 	  (match_operand:MODEF 2 "register_operand" "x")
 	  (match_operand:MODEF 3 "register_operand" "x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_XOP"
   "vpcmov\t{%1, %3, %2, %0|%0, %2, %3, %1}"
   [(set_attr "type" "sse4arg")])
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4899c0a..7bb4802 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1703,14 +1703,13 @@
 ;;	(set (reg3) (plus (reg2) (mem (addr3))))
 
 (define_insn "fma4_fmadd<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(plus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1724,8 +1723,8 @@
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1740,14 +1739,13 @@
 ;; Floating multiply and subtract
 ;; Allow two memory operands the same as fmadd
 (define_insn "fma4_fmsub<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1761,8 +1759,8 @@
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1779,14 +1777,13 @@
 ;; Note operands are out of order to simplify call to ix86_fma4_valid_p
 ;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1800,8 +1797,8 @@
 	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "")
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1821,11 +1818,10 @@
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
 	  (neg:FMA4MODEF4
-	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x"))
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1840,8 +1836,8 @@
 	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1855,14 +1851,13 @@
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (define_insn "fma4_fmadd<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(plus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1876,8 +1871,8 @@
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1897,13 +1892,12 @@
 	(vec_merge:SSEMODEF2P
 	 (plus:SSEMODEF2P
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1911,14 +1905,13 @@
 ;; Floating multiply and subtract
 ;; Allow two memory operands the same as fmadd
 (define_insn "fma4_fmsub<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1932,8 +1925,8 @@
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -1953,13 +1946,12 @@
 	(vec_merge:SSEMODEF2P
 	 (minus:SSEMODEF2P
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1969,14 +1961,13 @@
 ;; Note operands are out of order to simplify call to ix86_fma4_valid_p
 ;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1990,8 +1981,8 @@
 	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "")
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -2012,12 +2003,11 @@
 	 (minus:SSEMODEF2P
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2030,11 +2020,10 @@
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
 	  (neg:SSEMODEF4
-	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x"))
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF4 1 "register_operand" "%x,x"))
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2049,8 +2038,8 @@
 	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" ""))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "")))]
   "TARGET_FMA4
-   && !ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -2071,13 +2060,12 @@
 	 (minus:SSEMODEF2P
 	  (mult:SSEMODEF2P
 	   (neg:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x"))
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2089,11 +2077,11 @@
 	(unspec:FMA4MODEF4
 	 [(plus:FMA4MODEF4
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2103,11 +2091,11 @@
 	(unspec:FMA4MODEF4
 	 [(minus:FMA4MODEF4
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2118,10 +2106,10 @@
 	 [(minus:FMA4MODEF4
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm")))]
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2132,11 +2120,11 @@
 	 [(minus:FMA4MODEF4
 	   (mult:FMA4MODEF4
 	    (neg:FMA4MODEF4
-	     (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x"))
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2147,11 +2135,11 @@
 	(unspec:SSEMODEF2P
 	 [(plus:SSEMODEF2P
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2161,11 +2149,11 @@
 	(unspec:SSEMODEF2P
 	 [(minus:SSEMODEF2P
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2176,10 +2164,10 @@
 	 [(minus:SSEMODEF2P
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))]
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2190,11 +2178,11 @@
 	 [(minus:SSEMODEF2P
 	   (mult:SSEMODEF2P
 	    (neg:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x"))
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2207,13 +2195,13 @@
 	 [(vec_merge:SSEMODEF2P
 	   (plus:SSEMODEF2P
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "x,x")
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2224,13 +2212,13 @@
 	 [(vec_merge:SSEMODEF2P
 	   (minus:SSEMODEF2P
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "x,x")
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2242,12 +2230,12 @@
 	   (minus:SSEMODEF2P
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2259,13 +2247,13 @@
 	   (minus:SSEMODEF2P
 	    (mult:SSEMODEF2P
 	     (neg:SSEMODEF2P
-	      (match_operand:SSEMODEF2P 1 "register_operand" "x,x"))
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2281,8 +2269,8 @@
 	(vec_merge:V8SF
 	  (plus:V8SF
 	    (mult:V8SF
-	      (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V8SF 1 "register_operand" "%x,x")
+	      (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V8SF
 	    (mult:V8SF
@@ -2290,8 +2278,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 170)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2301,8 +2288,8 @@
 	(vec_merge:V4DF
 	  (plus:V4DF
 	    (mult:V4DF
-	      (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4DF 1 "register_operand" "%x,x")
+	      (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4DF
 	    (mult:V4DF
@@ -2310,8 +2297,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2321,8 +2307,8 @@
 	(vec_merge:V4SF
 	  (plus:V4SF
 	    (mult:V4SF
-	      (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4SF 1 "register_operand" "%x,x")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4SF
 	    (mult:V4SF
@@ -2330,8 +2316,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2341,8 +2326,8 @@
 	(vec_merge:V2DF
 	  (plus:V2DF
 	    (mult:V2DF
-	      (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V2DF 1 "register_operand" "%x,x")
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V2DF
 	    (mult:V2DF
@@ -2350,8 +2335,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 2)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2361,8 +2345,8 @@
 	(vec_merge:V8SF
 	  (plus:V8SF
 	    (mult:V8SF
-	      (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V8SF 1 "register_operand" "%x,x")
+	      (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V8SF
 	    (mult:V8SF
@@ -2370,8 +2354,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 85)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2381,8 +2364,8 @@
 	(vec_merge:V4DF
 	  (plus:V4DF
 	    (mult:V4DF
-	      (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4DF 1 "register_operand" "%x,x")
+	      (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4DF
 	    (mult:V4DF
@@ -2390,8 +2373,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2401,8 +2383,8 @@
 	(vec_merge:V4SF
 	  (plus:V4SF
 	    (mult:V4SF
-	      (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4SF 1 "register_operand" "%x,x")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4SF
 	    (mult:V4SF
@@ -2410,8 +2392,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2421,8 +2402,8 @@
 	(vec_merge:V2DF
 	  (plus:V2DF
 	    (mult:V2DF
-	      (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V2DF 1 "register_operand" "%x,x")
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V2DF
 	    (mult:V2DF
@@ -2430,8 +2411,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2444,8 +2424,8 @@
 	 [(vec_merge:V8SF
 	   (plus:V8SF
 	     (mult:V8SF
-	       (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V8SF 1 "register_operand" "%x,x")
+	       (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V8SF
 	     (mult:V8SF
@@ -2454,8 +2434,7 @@
 	     (match_dup 3))
 	   (const_int 170))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2466,8 +2445,8 @@
 	 [(vec_merge:V4DF
 	   (plus:V4DF
 	     (mult:V4DF
-	       (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4DF 1 "register_operand" "%x,x")
+	       (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4DF
 	     (mult:V4DF
@@ -2476,8 +2455,7 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2488,8 +2466,8 @@
 	 [(vec_merge:V4SF
 	   (plus:V4SF
 	     (mult:V4SF
-	       (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4SF 1 "register_operand" "%x,x")
+	       (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4SF
 	     (mult:V4SF
@@ -2498,8 +2476,7 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2510,8 +2487,8 @@
 	 [(vec_merge:V2DF
 	   (plus:V2DF
 	     (mult:V2DF
-	       (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V2DF 1 "register_operand" "%x,x")
+	       (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V2DF
 	     (mult:V2DF
@@ -2520,8 +2497,7 @@
 	     (match_dup 3))
 	   (const_int 2))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2532,8 +2508,8 @@
 	 [(vec_merge:V8SF
 	   (plus:V8SF
 	     (mult:V8SF
-	       (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V8SF 1 "register_operand" "%x,x")
+	       (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V8SF
 	     (mult:V8SF
@@ -2542,8 +2518,7 @@
 	     (match_dup 3))
 	   (const_int 85))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2554,8 +2529,8 @@
 	 [(vec_merge:V4DF
 	   (plus:V4DF
 	     (mult:V4DF
-	       (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4DF 1 "register_operand" "%x,x")
+	       (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4DF
 	     (mult:V4DF
@@ -2564,8 +2539,7 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2576,8 +2550,8 @@
 	 [(vec_merge:V4SF
 	   (plus:V4SF
 	     (mult:V4SF
-	       (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4SF 1 "register_operand" "%x,x")
+	       (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4SF
 	     (mult:V4SF
@@ -2586,8 +2560,7 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2598,8 +2571,8 @@
 	 [(vec_merge:V2DF
 	   (plus:V2DF
 	     (mult:V2DF
-	       (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V2DF 1 "register_operand" "%x,x")
+	       (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V2DF
 	     (mult:V2DF
@@ -2608,8 +2581,7 @@
 	     (match_dup 3))
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -10356,16 +10328,14 @@
 ;; that it does and splitting it later allows the following to be recognized:
 ;;	a[i] = b[i] * c[i] + d[i];
 (define_insn "xop_pmacsww"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=x")
         (plus:V8HI
 	 (mult:V8HI
-	  (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
-	  (match_operand:V8HI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V8HI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)"
-  "@
-   vpmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	  (match_operand:V8HI 1 "register_operand" "%x")
+	  (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V8HI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
@@ -10377,8 +10347,8 @@
 		    (match_operand:V8HI 2 "nonimmediate_operand" ""))
 	 (match_operand:V8HI 3 "nonimmediate_operand" "")))]
   "TARGET_XOP
-   && !ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -10391,15 +10361,13 @@
 })
 
 (define_insn "xop_pmacssww"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=x")
         (ss_plus:V8HI
-	 (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
-		    (match_operand:V8HI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V8HI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssww\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (mult:V8HI (match_operand:V8HI 1 "register_operand" "%x")
+		    (match_operand:V8HI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V8HI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
@@ -10408,16 +10376,14 @@
 ;; that it does and splitting it later allows the following to be recognized:
 ;;	a[i] = b[i] * c[i] + d[i];
 (define_insn "xop_pmacsdd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
         (plus:V4SI
 	 (mult:V4SI
-	  (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
-	  (match_operand:V4SI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)"
-  "@
-   vpmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	  (match_operand:V4SI 1 "register_operand" "%x")
+	  (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
@@ -10429,8 +10395,8 @@
 		    (match_operand:V4SI 2 "nonimmediate_operand" ""))
 	 (match_operand:V4SI 3 "nonimmediate_operand" "")))]
   "TARGET_XOP
-   && !ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)
-   && ix86_fma4_valid_op_p (operands, insn, 4, false, 2, true)
+   && MEM_P (operands[2])
+   && (MEM_P (operands[1]) || MEM_P (operands[3]))
    && !reg_mentioned_p (operands[0], operands[1])
    && !reg_mentioned_p (operands[0], operands[2])
    && !reg_mentioned_p (operands[0], operands[3])"
@@ -10443,99 +10409,91 @@
 })
 
 (define_insn "xop_pmacssdd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
         (ss_plus:V4SI
-	 (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
-		    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x"))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssdd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (mult:V4SI (match_operand:V4SI 1 "register_operand" "%x")
+		    (match_operand:V4SI 2 "nonimmediate_operand" "xm"))
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacssdql"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(ss_plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (vec_select:V2SI
-	   (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	   (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	   (parallel [(const_int 1)
 		      (const_int 3)])))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssdql\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacssdqh"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(ss_plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacssdqh\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacsdql"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)]))))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsdql\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn_and_split "*xop_pmacsdql_mem"
-  [(set (match_operand:V2DI 0 "register_operand" "=&x,&x")
+  [(set (match_operand:V2DI 0 "register_operand" "=&x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)]))))
-	 (match_operand:V2DI 3 "memory_operand" "m,m")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, -1, true)"
+	 (match_operand:V2DI 3 "memory_operand" "m")))]
+  "TARGET_XOP"
   "#"
   "&& reload_completed"
   [(set (match_dup 0)
@@ -10564,7 +10522,7 @@
 	(mult:V2DI
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 1 "nonimmediate_operand" "%x")
+	      (match_operand:V4SI 1 "register_operand" "%x")
 	      (parallel [(const_int 1)
 			 (const_int 3)])))
 	  (sign_extend:V2DI
@@ -10598,43 +10556,41 @@
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacsdqh"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
+  [(set (match_operand:V2DI 0 "register_operand" "=x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsdqh\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V2DI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn_and_split "*xop_pmacsdqh_mem"
-  [(set (match_operand:V2DI 0 "register_operand" "=&x,&x")
+  [(set (match_operand:V2DI 0 "register_operand" "=&x")
 	(plus:V2DI
 	 (mult:V2DI
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V4SI 1 "register_operand" "%x")
 	    (parallel [(const_int 0)
 		       (const_int 2)])))
 	  (sign_extend:V2DI
 	   (vec_select:V2SI
-	    (match_operand:V4SI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V4SI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 0)
 		       (const_int 2)]))))
-	 (match_operand:V2DI 3 "memory_operand" "m,m")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, -1, true)"
+	 (match_operand:V2DI 3 "memory_operand" "m")))]
+  "TARGET_XOP"
   "#"
   "&& reload_completed"
   [(set (match_dup 0)
@@ -10663,7 +10619,7 @@
 	(mult:V2DI
 	  (sign_extend:V2DI
 	    (vec_select:V2SI
-	      (match_operand:V4SI 1 "nonimmediate_operand" "%x")
+	      (match_operand:V4SI 1 "register_operand" "%x")
 	      (parallel [(const_int 0)
 			 (const_int 2)])))
 	  (sign_extend:V2DI
@@ -10698,72 +10654,68 @@
 
 ;; XOP parallel integer multiply/add instructions for the intrinisics
 (define_insn "xop_pmacsswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(ss_plus:V4SI
 	 (mult:V4SI
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V8HI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)])))
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)]))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacsswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmacswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(plus:V4SI
 	 (mult:V4SI
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	    (match_operand:V8HI 1 "register_operand" "%x")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)])))
 	  (sign_extend:V4SI
 	   (vec_select:V4HI
-	    (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	    (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	    (parallel [(const_int 1)
 		       (const_int 3)
 		       (const_int 5)
 		       (const_int 7)]))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmacswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmadcsswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(ss_plus:V4SI
 	 (plus:V4SI
 	  (mult:V4SI
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	     (match_operand:V8HI 1 "register_operand" "%x")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
 			(const_int 6)])))
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	     (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
@@ -10783,29 +10735,27 @@
 			(const_int 3)
 			(const_int 5)
 			(const_int 7)])))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmadcsswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pmadcswd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x")
 	(plus:V4SI
 	 (plus:V4SI
 	  (mult:V4SI
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 1 "nonimmediate_operand" "%x,m")
+	     (match_operand:V8HI 1 "register_operand" "%x")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
 			(const_int 6)])))
 	   (sign_extend:V4SI
 	    (vec_select:V4HI
-	     (match_operand:V8HI 2 "nonimmediate_operand" "xm,x")
+	     (match_operand:V8HI 2 "nonimmediate_operand" "xm")
 	     (parallel [(const_int 0)
 			(const_int 2)
 			(const_int 4)
@@ -10825,32 +10775,30 @@
 			(const_int 3)
 			(const_int 5)
 			(const_int 7)])))))
-	 (match_operand:V4SI 3 "register_operand" "x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, false, 1, true)"
-  "@
-   vpmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}
-   vpmadcswd\t{%3, %1, %2, %0|%0, %2, %1, %3}"
+	 (match_operand:V4SI 3 "register_operand" "x")))]
+  "TARGET_XOP"
+  "vpmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "TI")])
 
 ;; XOP parallel XMM conditional moves
 (define_insn "xop_pcmov_<mode>"
-  [(set (match_operand:SSEMODE 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODE 0 "register_operand" "=x,x")
 	(if_then_else:SSEMODE
-	  (match_operand:SSEMODE 3 "nonimmediate_operand" "x,x,m")
-	  (match_operand:SSEMODE 1 "vector_move_operand" "x,m,x")
-	  (match_operand:SSEMODE 2 "vector_move_operand" "xm,x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:SSEMODE 3 "nonimmediate_operand" "x,m")
+	  (match_operand:SSEMODE 1 "vector_move_operand" "x,x")
+	  (match_operand:SSEMODE 2 "vector_move_operand" "xm,x")))]
+  "TARGET_XOP"
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")])
 
 (define_insn "xop_pcmov_<mode>256"
-  [(set (match_operand:AVX256MODE 0 "register_operand" "=x,x,x")
+  [(set (match_operand:AVX256MODE 0 "register_operand" "=x,x")
 	(if_then_else:AVX256MODE
-	  (match_operand:AVX256MODE 3 "nonimmediate_operand" "x,x,m")
-	  (match_operand:AVX256MODE 1 "vector_move_operand" "x,m,x")
-	  (match_operand:AVX256MODE 2 "vector_move_operand" "xm,x,x")))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:AVX256MODE 3 "nonimmediate_operand" "x,m")
+	  (match_operand:AVX256MODE 1 "vector_move_operand" "x,x")
+	  (match_operand:AVX256MODE 2 "vector_move_operand" "xm,x")))]
+  "TARGET_XOP"
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")])
 
@@ -11296,53 +11244,53 @@
 
 ;; XOP permute instructions
 (define_insn "xop_pperm"
-  [(set (match_operand:V16QI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V16QI 0 "register_operand" "=x,x")
 	(unspec:V16QI
-	  [(match_operand:V16QI 1 "nonimmediate_operand" "x,x,m")
-	   (match_operand:V16QI 2 "nonimmediate_operand" "x,m,x")
-	   (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x")]
+	  [(match_operand:V16QI 1 "register_operand" "x,x")
+	   (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
+	   (match_operand:V16QI 3 "nonimmediate_operand" "xm,x")]
 	  UNSPEC_XOP_PERMUTE))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
 
 ;; XOP pack instructions that combine two vectors into a smaller vector
 (define_insn "xop_pperm_pack_v2di_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
 	(vec_concat:V4SI
 	 (truncate:V2SI
-	  (match_operand:V2DI 1 "nonimmediate_operand" "x,x,m"))
+	  (match_operand:V2DI 1 "register_operand" "x,x"))
 	 (truncate:V2SI
-	  (match_operand:V2DI 2 "nonimmediate_operand" "x,m,x"))))
-   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:V2DI 2 "nonimmediate_operand" "x,m"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pperm_pack_v4si_v8hi"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
 	(vec_concat:V8HI
 	 (truncate:V4HI
-	  (match_operand:V4SI 1 "nonimmediate_operand" "x,x,m"))
+	  (match_operand:V4SI 1 "register_operand" "x,x"))
 	 (truncate:V4HI
-	  (match_operand:V4SI 2 "nonimmediate_operand" "x,m,x"))))
-   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:V4SI 2 "nonimmediate_operand" "x,m"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
 
 (define_insn "xop_pperm_pack_v8hi_v16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=x,x,x")
+  [(set (match_operand:V16QI 0 "register_operand" "=x,x")
 	(vec_concat:V16QI
 	 (truncate:V8QI
-	  (match_operand:V8HI 1 "nonimmediate_operand" "x,x,m"))
+	  (match_operand:V8HI 1 "register_operand" "x,x"))
 	 (truncate:V8QI
-	  (match_operand:V8HI 2 "nonimmediate_operand" "x,m,x"))))
-   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x,x"))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+	  (match_operand:V8HI 2 "nonimmediate_operand" "x,m"))))
+   (use (match_operand:V16QI 3 "nonimmediate_operand" "xm,x"))]
+  "TARGET_XOP && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vpperm\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")
    (set_attr "mode" "TI")])
@@ -11471,7 +11419,7 @@
 	 (rotatert:SSEMODE1248
 	  (match_dup 1)
 	  (neg:SSEMODE1248 (match_dup 2)))))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vprot<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix_data16" "0")
@@ -11526,7 +11474,7 @@
 	 (ashiftrt:SSEMODE1248
 	  (match_dup 1)
 	  (neg:SSEMODE1248 (match_dup 2)))))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpsha<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix_data16" "0")
@@ -11545,7 +11493,7 @@
 	 (lshiftrt:SSEMODE1248
 	  (match_dup 1)
 	  (neg:SSEMODE1248 (match_dup 2)))))]
-  "TARGET_XOP && ix86_fma4_valid_op_p (operands, insn, 3, true, 1, false)"
+  "TARGET_XOP && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpshl<ssevecsize>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseishft")
    (set_attr "prefix_data16" "0")
-- 
1.6.0.4



* Re: Vector permutation support for x86
  2009-12-02 20:04 Uros Bizjak
  2009-12-02 20:10 ` Sebastian Pop
@ 2009-12-02 21:22 ` Sebastian Pop
  2009-12-02 22:23   ` Sebastian Pop
  1 sibling, 1 reply; 45+ messages in thread
From: Sebastian Pop @ 2009-12-02 21:22 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 476 bytes --]

2009/12/2 Uros Bizjak <ubizjak@gmail.com>:
> Hello!
>
> +         (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
> +         (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
> +        (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
>
>
> You can use only "x,m" in the operand 2, all-registers alternative is
> already matched in alternative 0.
>

Attached is the updated patch that contains these patterns.

Sebastian

[-- Attachment #2: 0002-Fix-FMA4-insns.patch --]
[-- Type: text/x-patch, Size: 31623 bytes --]

From 01a17b441d000210af0d8d27f0df87eced196128 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 1 Dec 2009 13:05:57 -0600
Subject: [PATCH] Fix FMA4 insns.

2009-12-02  Sebastian Pop  <sebastian.pop@amd.com>

	* config/i386/sse.md (fma4_*): Remove alternative with operand 1
	matching a memory access.  Do not use ix86_fma4_valid_op_p.
---
 gcc/config/i386/sse.md |  292 ++++++++++++++++++++++--------------------------
 1 files changed, 132 insertions(+), 160 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 08a3b5b..0c6b77e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1703,14 +1703,13 @@
 ;;	(set (reg3) (plus (reg2) (mem (addr3))))
 
 (define_insn "fma4_fmadd<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(plus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1740,14 +1739,13 @@
 ;; Floating multiply and subtract
 ;; Allow two memory operands the same as fmadd
 (define_insn "fma4_fmsub<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1779,14 +1777,13 @@
 ;; Note operands are out of order to simplify call to ix86_fma4_valid_p
 ;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4256"
-  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMA4MODEF4 0 "register_operand" "=x,x")
 	(minus:FMA4MODEF4
-	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x,x")
+	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")
 	 (mult:FMA4MODEF4
-	  (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm,x"))))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1821,11 +1818,10 @@
 	(minus:FMA4MODEF4
 	 (mult:FMA4MODEF4
 	  (neg:FMA4MODEF4
-	   (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x"))
-	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
+	  (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1855,14 +1851,13 @@
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (define_insn "fma4_fmadd<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(plus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1897,13 +1892,12 @@
 	(vec_merge:SSEMODEF2P
 	 (plus:SSEMODEF2P
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1911,14 +1905,13 @@
 ;; Floating multiply and subtract
 ;; Allow two memory operands the same as fmadd
 (define_insn "fma4_fmsub<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1953,13 +1946,12 @@
 	(vec_merge:SSEMODEF2P
 	 (minus:SSEMODEF2P
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -1969,14 +1961,13 @@
 ;; Note operands are out of order to simplify call to ix86_fma4_valid_p
 ;; Allow two memory operands to help in optimizing.
 (define_insn "fma4_fnmadd<mode>4"
-  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x,x")
+  [(set (match_operand:SSEMODEF4 0 "register_operand" "=x,x")
 	(minus:SSEMODEF4
-	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x,x")
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")
 	 (mult:SSEMODEF4
-	  (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x,xm")
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm,x"))))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))))]
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2012,12 +2003,11 @@
 	 (minus:SSEMODEF2P
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	  (mult:SSEMODEF2P
-	   (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))
+	   (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2030,11 +2020,10 @@
 	(minus:SSEMODEF4
 	 (mult:SSEMODEF4
 	  (neg:SSEMODEF4
-	   (match_operand:SSEMODEF4 1 "nonimmediate_operand" "x,x"))
-	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
+	   (match_operand:SSEMODEF4 1 "register_operand" "%x,x"))
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,m"))
 	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2071,13 +2060,12 @@
 	 (minus:SSEMODEF2P
 	  (mult:SSEMODEF2P
 	   (neg:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x"))
-	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	   (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	  (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	 (match_dup 0)
 	 (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2089,11 +2077,11 @@
 	(unspec:FMA4MODEF4
 	 [(plus:FMA4MODEF4
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2103,11 +2091,11 @@
 	(unspec:FMA4MODEF4
 	 [(minus:FMA4MODEF4
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2118,10 +2106,10 @@
 	 [(minus:FMA4MODEF4
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x")
 	   (mult:FMA4MODEF4
-	    (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x")
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm")))]
+	    (match_operand:FMA4MODEF4 1 "register_operand" "%x,x")
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2132,11 +2120,11 @@
 	 [(minus:FMA4MODEF4
 	   (mult:FMA4MODEF4
 	    (neg:FMA4MODEF4
-	     (match_operand:FMA4MODEF4 1 "nonimmediate_operand" "x,x"))
-	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:FMA4MODEF4 1 "register_operand" "%x,x"))
+	    (match_operand:FMA4MODEF4 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:FMA4MODEF4 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<fma4modesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2147,11 +2135,11 @@
 	(unspec:SSEMODEF2P
 	 [(plus:SSEMODEF2P
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2161,11 +2149,11 @@
 	(unspec:SSEMODEF2P
 	 [(minus:SSEMODEF2P
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2176,10 +2164,10 @@
 	 [(minus:SSEMODEF2P
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	   (mult:SSEMODEF2P
-	    (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))]
+	    (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2190,11 +2178,11 @@
 	 [(minus:SSEMODEF2P
 	   (mult:SSEMODEF2P
 	    (neg:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x"))
-	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	    (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	   (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf4>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
@@ -2207,13 +2195,13 @@
 	 [(vec_merge:SSEMODEF2P
 	   (plus:SSEMODEF2P
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "x,x")
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2224,13 +2212,13 @@
 	 [(vec_merge:SSEMODEF2P
 	   (minus:SSEMODEF2P
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "register_operand" "x,x")
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2242,12 +2230,12 @@
 	   (minus:SSEMODEF2P
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x")
 	    (mult:SSEMODEF2P
-	     (match_operand:SSEMODEF2P 1 "nonimmediate_operand" "x,x")
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm")))
+	     (match_operand:SSEMODEF2P 1 "register_operand" "%x,x")
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m")))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmadd<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2259,13 +2247,13 @@
 	   (minus:SSEMODEF2P
 	    (mult:SSEMODEF2P
 	     (neg:SSEMODEF2P
-	      (match_operand:SSEMODEF2P 1 "register_operand" "x,x"))
-	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:SSEMODEF2P 1 "register_operand" "%x,x"))
+	     (match_operand:SSEMODEF2P 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:SSEMODEF2P 3 "nonimmediate_operand" "xm,x"))
 	   (match_dup 0)
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4 && ix86_fma4_valid_op_p (operands, insn, 4, true, 1, false)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfnmsub<ssemodesuffixf2s>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<ssescalarmode>")])
@@ -2281,8 +2269,8 @@
 	(vec_merge:V8SF
 	  (plus:V8SF
 	    (mult:V8SF
-	      (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V8SF 1 "register_operand" "%x,x")
+	      (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V8SF
 	    (mult:V8SF
@@ -2290,8 +2278,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 170)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2301,8 +2288,8 @@
 	(vec_merge:V4DF
 	  (plus:V4DF
 	    (mult:V4DF
-	      (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4DF 1 "register_operand" "%x,x")
+	      (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4DF
 	    (mult:V4DF
@@ -2310,8 +2297,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2321,8 +2307,8 @@
 	(vec_merge:V4SF
 	  (plus:V4SF
 	    (mult:V4SF
-	      (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4SF 1 "register_operand" "%x,x")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4SF
 	    (mult:V4SF
@@ -2330,8 +2316,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 10)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2341,8 +2326,8 @@
 	(vec_merge:V2DF
 	  (plus:V2DF
 	    (mult:V2DF
-	      (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V2DF 1 "register_operand" "%x,x")
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V2DF
 	    (mult:V2DF
@@ -2350,8 +2335,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 2)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2361,8 +2345,8 @@
 	(vec_merge:V8SF
 	  (plus:V8SF
 	    (mult:V8SF
-	      (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V8SF 1 "register_operand" "%x,x")
+	      (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V8SF
 	    (mult:V8SF
@@ -2370,8 +2354,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 85)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2381,8 +2364,8 @@
 	(vec_merge:V4DF
 	  (plus:V4DF
 	    (mult:V4DF
-	      (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4DF 1 "register_operand" "%x,x")
+	      (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4DF
 	    (mult:V4DF
@@ -2390,8 +2373,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2401,8 +2383,8 @@
 	(vec_merge:V4SF
 	  (plus:V4SF
 	    (mult:V4SF
-	      (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V4SF 1 "register_operand" "%x,x")
+	      (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V4SF
 	    (mult:V4SF
@@ -2410,8 +2392,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 5)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2421,8 +2402,8 @@
 	(vec_merge:V2DF
 	  (plus:V2DF
 	    (mult:V2DF
-	      (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	      (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	      (match_operand:V2DF 1 "register_operand" "%x,x")
+	      (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	    (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	  (minus:V2DF
 	    (mult:V2DF
@@ -2430,8 +2411,7 @@
 	      (match_dup 2))
 	    (match_dup 3))
 	  (const_int 1)))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2444,8 +2424,8 @@
 	 [(vec_merge:V8SF
 	   (plus:V8SF
 	     (mult:V8SF
-	       (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V8SF 1 "register_operand" "%x,x")
+	       (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V8SF
 	     (mult:V8SF
@@ -2454,8 +2434,7 @@
 	     (match_dup 3))
 	   (const_int 170))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2466,8 +2445,8 @@
 	 [(vec_merge:V4DF
 	   (plus:V4DF
 	     (mult:V4DF
-	       (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4DF 1 "register_operand" "%x,x")
+	       (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4DF
 	     (mult:V4DF
@@ -2476,8 +2455,7 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2488,8 +2466,8 @@
 	 [(vec_merge:V4SF
 	   (plus:V4SF
 	     (mult:V4SF
-	       (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4SF 1 "register_operand" "%x,x")
+	       (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4SF
 	     (mult:V4SF
@@ -2498,8 +2476,7 @@
 	     (match_dup 3))
 	   (const_int 10))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2510,8 +2487,8 @@
 	 [(vec_merge:V2DF
 	   (plus:V2DF
 	     (mult:V2DF
-	       (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V2DF 1 "register_operand" "%x,x")
+	       (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V2DF
 	     (mult:V2DF
@@ -2520,8 +2497,7 @@
 	     (match_dup 3))
 	   (const_int 2))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmaddsubpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
@@ -2532,8 +2508,8 @@
 	 [(vec_merge:V8SF
 	   (plus:V8SF
 	     (mult:V8SF
-	       (match_operand:V8SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V8SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V8SF 1 "register_operand" "%x,x")
+	       (match_operand:V8SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V8SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V8SF
 	     (mult:V8SF
@@ -2542,8 +2518,7 @@
 	     (match_dup 3))
 	   (const_int 85))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V8SF")])
@@ -2554,8 +2529,8 @@
 	 [(vec_merge:V4DF
 	   (plus:V4DF
 	     (mult:V4DF
-	       (match_operand:V4DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4DF 1 "register_operand" "%x,x")
+	       (match_operand:V4DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4DF
 	     (mult:V4DF
@@ -2564,8 +2539,7 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4DF")])
@@ -2576,8 +2550,8 @@
 	 [(vec_merge:V4SF
 	   (plus:V4SF
 	     (mult:V4SF
-	       (match_operand:V4SF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V4SF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V4SF 1 "register_operand" "%x,x")
+	       (match_operand:V4SF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V4SF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V4SF
 	     (mult:V4SF
@@ -2586,8 +2560,7 @@
 	     (match_dup 3))
 	   (const_int 5))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V4SF")])
@@ -2598,8 +2571,8 @@
 	 [(vec_merge:V2DF
 	   (plus:V2DF
 	     (mult:V2DF
-	       (match_operand:V2DF 1 "nonimmediate_operand" "x,x")
-	       (match_operand:V2DF 2 "nonimmediate_operand" "x,xm"))
+	       (match_operand:V2DF 1 "register_operand" "%x,x")
+	       (match_operand:V2DF 2 "nonimmediate_operand" "x,m"))
 	     (match_operand:V2DF 3 "nonimmediate_operand" "xm,x"))
 	   (minus:V2DF
 	     (mult:V2DF
@@ -2608,8 +2581,7 @@
 	     (match_dup 3))
 	   (const_int 1))]
 	 UNSPEC_FMA4_INTRINSIC))]
-  "TARGET_FMA4
-   && ix86_fma4_valid_op_p (operands, insn, 4, true, 2, true)"
+  "TARGET_FMA4 && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "vfmsubaddpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "V2DF")])
-- 
1.6.0.4
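The const_int operand of each vec_merge in the patterns above chooses between two arms lane by lane. Where the mask bit is set, the lane takes the plus arm (op1*op2 + op3). Otherwise it takes the minus arm (op1*op2 - op3). Below is a minimal scalar sketch of that selection in C. The function names are hypothetical, not GCC internals.

The sketch shows that the masks in these patterns follow one rule. The vfmaddsub* patterns add in the odd lanes (masks 2, 10, 170). The vfmsubadd* patterns add in the even lanes (masks 1, 5, 85).

```c
#include <assert.h>

/* Bit i set for every odd lane i < nlanes: the vfmaddsub* masks.  */
unsigned fmaddsub_mask (int nlanes)
{
  assert (nlanes >= 1 && nlanes <= 8);
  return 0xAAu & ((1u << nlanes) - 1);
}

/* Bit i set for every even lane i < nlanes: the vfmsubadd* masks.  */
unsigned fmsubadd_mask (int nlanes)
{
  assert (nlanes >= 1 && nlanes <= 8);
  return 0x55u & ((1u << nlanes) - 1);
}

/* Scalar model of
     (vec_merge (plus (mult a b) c) (minus (mult a b) c) (const_int mask)):
   lane i takes the plus arm when bit i of MASK is set, else the minus arm.
   Unlike a real fused multiply-add, this rounds the product separately.  */
void fma_vec_merge (const float *a, const float *b, const float *c,
                    float *out, int nlanes, unsigned mask)
{
  assert (nlanes >= 1 && nlanes <= 8);
  for (int i = 0; i < nlanes; i++)
    {
      float prod = a[i] * b[i];
      out[i] = ((mask >> i) & 1u) ? prod + c[i] : prod - c[i];
    }
}
```

Consider four lanes with a = {1,2,3,4}, b = {2,2,2,2} and c = {1,1,1,1}. Mask 5 (vfmsubaddps) yields {3,3,7,7}, and mask 10 (vfmaddsubps) yields {1,5,5,9}.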



* Re: Vector permutation support for x86
  2009-12-02 20:04 Uros Bizjak
@ 2009-12-02 20:10 ` Sebastian Pop
  2009-12-02 21:22 ` Sebastian Pop
  1 sibling, 0 replies; 45+ messages in thread
From: Sebastian Pop @ 2009-12-02 20:10 UTC
  To: Uros Bizjak; +Cc: GCC Patches, Richard Henderson

2009/12/2 Uros Bizjak <ubizjak@gmail.com>:
> Hello!
>
> +         (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
> +         (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
> +        (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]
>
>
> You can use only "x,m" for operand 2; the all-registers alternative is
> already matched by alternative 0.

Right!
I'm updating the patch set and will restart the bootstrap and test.

Sebastian


* Re: Vector permutation support for x86
@ 2009-12-02 20:04 Uros Bizjak
  2009-12-02 20:10 ` Sebastian Pop
  2009-12-02 21:22 ` Sebastian Pop
  0 siblings, 2 replies; 45+ messages in thread
From: Uros Bizjak @ 2009-12-02 20:04 UTC
  To: GCC Patches; +Cc: Sebastian Pop, Richard Henderson

Hello!

+	  (match_operand:SSEMODEF4 1 "register_operand" "%x,x")
+	  (match_operand:SSEMODEF4 2 "nonimmediate_operand" "x,xm"))
+	 (match_operand:SSEMODEF4 3 "nonimmediate_operand" "xm,x")))]


You can use only "x,m" for operand 2; the all-registers alternative is
already matched by alternative 0.
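You can check this point by enumerating operand kinds against the two alternatives. Below is a small C sketch with a made-up encoding, where register stands for "x" and memory for "m". It assumes operand 1 is always a register and ignores the "%" commutativity marker.

With operand 2 as "x,xm", an all-register insn matches both alternatives. With "x,m", each mixed combination matches exactly one. Neither version accepts two memory operands.

```c
#include <assert.h>

/* Hypothetical encoding of what a constraint accepts:
   "x" = CK_REG, "m" = CK_MEM, "xm" = CK_REG | CK_MEM.  */
enum { CK_REG = 1, CK_MEM = 2 };

/* Constraints of operands 2 and 3 in one alternative.  */
struct alt { unsigned op2, op3; };

/* Operand 2 "x,xm", operand 3 "xm,x", as posted.  */
const struct alt posted[2] = {
  { CK_REG, CK_REG | CK_MEM }, { CK_REG | CK_MEM, CK_REG }
};

/* Operand 2 "x,m", operand 3 "xm,x", as suggested.  */
const struct alt suggested[2] = {
  { CK_REG, CK_REG | CK_MEM }, { CK_MEM, CK_REG }
};

/* Number of alternatives accepting an operand 2 of kind K2 and an
   operand 3 of kind K3 (each exactly CK_REG or CK_MEM).  */
int count_matches (const struct alt *alts, int n, unsigned k2, unsigned k3)
{
  assert ((k2 == CK_REG || k2 == CK_MEM) && (k3 == CK_REG || k3 == CK_MEM));
  int hits = 0;
  for (int i = 0; i < n; i++)
    if ((alts[i].op2 & k2) && (alts[i].op3 & k3))
      hits++;
  return hits;
}
```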

Uros.


* Re: Vector permutation support for x86
  2009-11-27 10:17 Uros Bizjak
@ 2009-11-27 16:02 ` Richard Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2009-11-27 16:02 UTC
  To: Uros Bizjak; +Cc: gcc-patches

On 11/27/2009 02:14 AM, Uros Bizjak wrote:
> With longs and -mxop, I get substantial differences. With splitter:
>
> 	xorl	%eax, %eax
> .L2:
> 	vmovdqa	b(%rax), %xmm0
> 	vpshufd	$177, %xmm0, %xmm2
> 	vmovdqa	a(%rax), %xmm1
> 	vpmulld	%xmm1, %xmm2, %xmm2
> 	vphadddq	%xmm2, %xmm2
> 	vpsllq	$32, %xmm2, %xmm2
> 	vpmacsdql	%xmm2, %xmm1, %xmm0, %xmm1
> 	vmovdqa	%xmm1, x(%rax)
> 	vmovdqa	c(%rax), %xmm1
> 	vpshufd	$177, %xmm1, %xmm2
> 	vpmulld	%xmm0, %xmm2, %xmm2
> 	vphadddq	%xmm2, %xmm2
> 	vpsllq	$32, %xmm2, %xmm2
> 	vpmacsdql	%xmm2, %xmm0, %xmm1, %xmm0
> 	vmovdqa	%xmm0, y(%rax)
> 	addq	$16, %rax
> 	cmpq	$2048, %rax
> 	jne	.L2
> 	rep
> 	ret
>
> and with expander:
>
> 	vpxor	%xmm3, %xmm3, %xmm3
> 	xorl	%eax, %eax
> 	vmovdqa	%xmm3, %xmm2
> .L2:
> 	vmovdqa	b(%rax), %xmm0
> 	vpmacsdql	%xmm3, a(%rax), %xmm0, %xmm1
> 	vpmacsdql	%xmm2, c(%rax), %xmm0, %xmm0
> 	vmovdqa	%xmm1, x(%rax)
> 	vmovdqa	%xmm0, y(%rax)

Wow.  Something sure went wrong, we lost a lot of the important bits of 
the DImode multiplication.  What we've got left is mulsidi.
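In the expander listing, each vpmacsdql adds a zeroed register to the widened product of the low dwords of each quadword. Per 64-bit lane, only a signed 32x32->64 ("mulsidi") product remains. Below is a small C sketch of that per-lane computation, with hypothetical function names. It shows the result is correct only while both operands fit in 32 signed bits. The cross terms of the 64-bit multiply are lost.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of X without relying on
   implementation-defined narrowing conversions.  */
static int64_t sext_low32 (uint64_t x)
{
  int64_t lo = (int64_t) (x & 0xFFFFFFFFu);
  return lo >= 0x80000000LL ? lo - 0x100000000LL : lo;
}

/* What is left per lane: the widened product of the low dwords (the
   zero addend contributes nothing).  The a_hi*b_lo and a_lo*b_hi cross
   terms of a full 64-bit multiply are missing.  */
uint64_t low_dword_mul (uint64_t a, uint64_t b)
{
  return (uint64_t) (sext_low32 (a) * sext_low32 (b));
}
```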


r~


* Re: Vector permutation support for x86
@ 2009-11-27 10:17 Uros Bizjak
  2009-11-27 16:02 ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Uros Bizjak @ 2009-11-27 10:17 UTC
  To: gcc-patches; +Cc: Richard Henderson

Hello!

>>    I'm not sure when these were introduced; I probably would have made
>>    some objection at the time if I had noticed. However, I'll grant you
>>    that it'll be easy to notice if someone breaks this idiom. I don't
>>    agree that it's necessary, but since I can't prove it isn't I suppose
>>    I can revert the change to the define_expand for v16qi.


> I've committed the following, though I'm still not convinced.

> I think one could argue that this misses more cse opportunities
> than it finds.  Consider for instance:

>  x = a * b;
>  y = c * b;


> for mulv16qi, the expansion produces two sets of

>  sse2_punpckhbw (tmp, b, b)
>  sse2_punpcklbw (tmp, b, b)

> which could be cse'd...

I tried the following testcase with the attached patch, which
changes all define_insn_and_split mult sequences to expanders.

--cut here--
char a[256], b[256], c[256], x[256], y[256];

void bar(void)
{
  int i;

  for (i=0; i<256; ++i)
    {
      x[i] = a[i] * b[i];
      y[i] = c[i] * b[i];
    }
}
--cut here--

The difference was:

> grep unpck vect-p.s | wc -l
20
> grep unpck vect.s | wc -l
22

where vect-p.s was generated with the expanders. The full sequences were 45 vs. 48 insns.

The same test with ints/longs showed no noticeable differences.

With longs and -mxop, I get substantial differences. With the splitter:

	xorl	%eax, %eax
.L2:
	vmovdqa	b(%rax), %xmm0
	vpshufd	$177, %xmm0, %xmm2
	vmovdqa	a(%rax), %xmm1
	vpmulld	%xmm1, %xmm2, %xmm2
	vphadddq	%xmm2, %xmm2
	vpsllq	$32, %xmm2, %xmm2
	vpmacsdql	%xmm2, %xmm1, %xmm0, %xmm1
	vmovdqa	%xmm1, x(%rax)
	vmovdqa	c(%rax), %xmm1
	vpshufd	$177, %xmm1, %xmm2
	vpmulld	%xmm0, %xmm2, %xmm2
	vphadddq	%xmm2, %xmm2
	vpsllq	$32, %xmm2, %xmm2
	vpmacsdql	%xmm2, %xmm0, %xmm1, %xmm0
	vmovdqa	%xmm0, y(%rax)
	addq	$16, %rax
	cmpq	$2048, %rax
	jne	.L2
	rep
	ret

and with the expander:

	vpxor	%xmm3, %xmm3, %xmm3
	xorl	%eax, %eax
	vmovdqa	%xmm3, %xmm2
.L2:
	vmovdqa	b(%rax), %xmm0
	vpmacsdql	%xmm3, a(%rax), %xmm0, %xmm1
	vpmacsdql	%xmm2, c(%rax), %xmm0, %xmm0
	vmovdqa	%xmm1, x(%rax)
	vmovdqa	%xmm0, y(%rax)
	addq	$16, %rax
	cmpq	$2048, %rax
	jne	.L2
	rep
	ret

I will investigate this last issue a bit more.

Uros.


* Re: Vector permutation support for x86
  2009-11-26 20:57     ` Richard Henderson
@ 2009-11-26 23:24       ` Richard Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Henderson @ 2009-11-26 23:24 UTC
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 802 bytes --]

On 11/26/2009 12:35 PM, Richard Henderson wrote:
> I'm not sure when these were introduced; I probably would have made
> some objection at the time if I had noticed. However, I'll grant you
> that it'll be easy to notice if someone breaks this idiom. I don't
> agree that it's necessary, but since I can't prove it isn't I suppose
> I can revert the change to the define_expand for v16qi.

I've committed the following, though I'm still not convinced.

I think one could argue that this misses more cse opportunities
than it finds.  Consider for instance:

   x = a * b;
   y = c * b;

for mulv16qi, the expansion produces two sets of

   sse2_punpckhbw (tmp, b, b)
   sse2_punpcklbw (tmp, b, b)

which could be cse'd.  For mulv2di with XOP, we'll generate two
identical pshufd's that could be cse'd.
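The mulv16qi3 expansion referred to here works because the low byte of a 16-bit product depends only on the low bytes of its inputs. The expansion proceeds in three steps:

1. Interleaving a vector with itself (punpcklbw/punpckhbw x, x) puts each byte in both halves of a word.
2. A word multiply (pmullw) keeps the low 16 bits of each product.
3. Extracting the even bytes yields the byte products.

Below is a scalar C sketch of that per-lane arithmetic. The function name is hypothetical. It ignores which unpack half a lane passes through, since the arithmetic is identical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the SSE2 mulv16qi3 expansion, one byte lane at a time.  */
void mulv16qi_model (const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    {
      /* punpck[lh]bw x, x: each word holds the source byte in both
         halves; the high half is "don't care".  */
      uint16_t wa = (uint16_t) ((a[i] << 8) | a[i]);
      uint16_t wb = (uint16_t) ((b[i] << 8) | b[i]);
      /* pmullw: keep the low 16 bits of the 16x16 product.  */
      uint16_t prod = (uint16_t) ((uint32_t) wa * wb);
      /* Extract the even (low-order) byte of each word.  */
      out[i] = (uint8_t) prod;
    }
}
```

The widened copy of b (wb in the loop) is the same for x = a * b and y = c * b. That shared unpack is the cse opportunity mentioned above.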


r~

[-- Attachment #2: z --]
[-- Type: text/plain, Size: 5887 bytes --]

	* config/i386/sse.md (mulv16qi3): Change back from an expander
	to an insn-and-split.
	(mulv4si3): Mention AVX not XOP for AVX exception.
	(*sse2_mulv4si3): Likewise.
	(mulv2di3): Use vpmulld not vpmacsdd for XOP expansion.  Tidy.

testsuite/
        * gcc.target/i386/xop-imul64-vector.c: Look for vpmulld not vpmacsdd.


diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b4bcc5f..12c5b17 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5227,11 +5227,15 @@
    (set_attr "prefix_data16" "1")
    (set_attr "mode" "TI")])
 
-(define_expand "mulv16qi3"
+(define_insn_and_split "mulv16qi3"
   [(set (match_operand:V16QI 0 "register_operand" "")
 	(mult:V16QI (match_operand:V16QI 1 "register_operand" "")
 		    (match_operand:V16QI 2 "register_operand" "")))]
-  "TARGET_SSE2"
+  "TARGET_SSE2
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
 {
   rtx t[6];
   int i;
@@ -5592,7 +5596,7 @@
 		   (match_operand:V4SI 2 "register_operand" "")))]
   "TARGET_SSE2"
 {
-  if (TARGET_SSE4_1 || TARGET_XOP)
+  if (TARGET_SSE4_1 || TARGET_AVX)
     ix86_fixup_binary_operands_no_copy (MULT, V4SImode, operands);
 })
 
@@ -5621,7 +5625,7 @@
   [(set (match_operand:V4SI 0 "register_operand" "")
 	(mult:V4SI (match_operand:V4SI 1 "register_operand" "")
 		   (match_operand:V4SI 2 "register_operand" "")))]
-  "TARGET_SSE2 && !TARGET_SSE4_1 && !TARGET_XOP
+  "TARGET_SSE2 && !TARGET_SSE4_1 && !TARGET_AVX
    && can_create_pseudo_p ()"
   "#"
   "&& 1"
@@ -5683,17 +5687,20 @@
   rtx t1, t2, t3, t4, t5, t6, thirtytwo;
   rtx op0, op1, op2;
 
+  op0 = operands[0];
+  op1 = operands[1];
+  op2 = operands[2];
+
   if (TARGET_XOP)
     {
       /* op1: A,B,C,D, op2: E,F,G,H */
-      op0 = operands[0];
-      op1 = gen_lowpart (V4SImode, operands[1]);
-      op2 = gen_lowpart (V4SImode, operands[2]);
+      op1 = gen_lowpart (V4SImode, op1);
+      op2 = gen_lowpart (V4SImode, op2);
+
       t1 = gen_reg_rtx (V4SImode);
       t2 = gen_reg_rtx (V4SImode);
-      t3 = gen_reg_rtx (V4SImode);
+      t3 = gen_reg_rtx (V2DImode);
       t4 = gen_reg_rtx (V2DImode);
-      t5 = gen_reg_rtx (V2DImode);
 
       /* t1: B,A,D,C */
       emit_insn (gen_sse2_pshufd_1 (t1, op1,
@@ -5702,55 +5709,50 @@
 				    GEN_INT (3),
 				    GEN_INT (2)));
 
-      /* t2: 0 */
-      emit_move_insn (t2, CONST0_RTX (V4SImode));
-
-      /* t3: (B*E),(A*F),(D*G),(C*H) */
-      emit_insn (gen_xop_pmacsdd (t3, t1, op2, t2));
+      /* t2: (B*E),(A*F),(D*G),(C*H) */
+      emit_insn (gen_mulv4si3 (t2, t1, op2));
 
       /* t4: (B*E)+(A*F), (D*G)+(C*H) */
-      emit_insn (gen_xop_phadddq (t4, t3));
+      emit_insn (gen_xop_phadddq (t3, t2));
 
       /* t5: ((B*E)+(A*F))<<32, ((D*G)+(C*H))<<32 */
-      emit_insn (gen_ashlv2di3 (t5, t4, GEN_INT (32)));
+      emit_insn (gen_ashlv2di3 (t4, t3, GEN_INT (32)));
 
       /* op0: (((B*E)+(A*F))<<32)+(B*F), (((D*G)+(C*H))<<32)+(D*H) */
-      emit_insn (gen_xop_pmacsdql (op0, op1, op2, t5));
-      DONE;
+      emit_insn (gen_xop_pmacsdql (op0, op1, op2, t4));
+    }
+  else
+    {
+      t1 = gen_reg_rtx (V2DImode);
+      t2 = gen_reg_rtx (V2DImode);
+      t3 = gen_reg_rtx (V2DImode);
+      t4 = gen_reg_rtx (V2DImode);
+      t5 = gen_reg_rtx (V2DImode);
+      t6 = gen_reg_rtx (V2DImode);
+      thirtytwo = GEN_INT (32);
+
+      /* Multiply low parts.  */
+      emit_insn (gen_sse2_umulv2siv2di3 (t1, gen_lowpart (V4SImode, op1),
+				         gen_lowpart (V4SImode, op2)));
+
+      /* Shift input vectors right 32 bits so we can multiply high parts.  */
+      emit_insn (gen_lshrv2di3 (t2, op1, thirtytwo));
+      emit_insn (gen_lshrv2di3 (t3, op2, thirtytwo));
+
+      /* Multiply high parts by low parts.  */
+      emit_insn (gen_sse2_umulv2siv2di3 (t4, gen_lowpart (V4SImode, op1),
+					 gen_lowpart (V4SImode, t3)));
+      emit_insn (gen_sse2_umulv2siv2di3 (t5, gen_lowpart (V4SImode, op2),
+					 gen_lowpart (V4SImode, t2)));
+
+      /* Shift them back.  */
+      emit_insn (gen_ashlv2di3 (t4, t4, thirtytwo));
+      emit_insn (gen_ashlv2di3 (t5, t5, thirtytwo));
+
+      /* Add the three parts together.  */
+      emit_insn (gen_addv2di3 (t6, t1, t4));
+      emit_insn (gen_addv2di3 (op0, t6, t5));
     }
-
-  op0 = operands[0];
-  op1 = operands[1];
-  op2 = operands[2];
-  t1 = gen_reg_rtx (V2DImode);
-  t2 = gen_reg_rtx (V2DImode);
-  t3 = gen_reg_rtx (V2DImode);
-  t4 = gen_reg_rtx (V2DImode);
-  t5 = gen_reg_rtx (V2DImode);
-  t6 = gen_reg_rtx (V2DImode);
-  thirtytwo = GEN_INT (32);
-
-  /* Multiply low parts.  */
-  emit_insn (gen_sse2_umulv2siv2di3 (t1, gen_lowpart (V4SImode, op1),
-				     gen_lowpart (V4SImode, op2)));
-
-  /* Shift input vectors left 32 bits so we can multiply high parts.  */
-  emit_insn (gen_lshrv2di3 (t2, op1, thirtytwo));
-  emit_insn (gen_lshrv2di3 (t3, op2, thirtytwo));
-
-  /* Multiply high parts by low parts.  */
-  emit_insn (gen_sse2_umulv2siv2di3 (t4, gen_lowpart (V4SImode, op1),
-				     gen_lowpart (V4SImode, t3)));
-  emit_insn (gen_sse2_umulv2siv2di3 (t5, gen_lowpart (V4SImode, op2),
-				     gen_lowpart (V4SImode, t2)));
-
-  /* Shift them back.  */
-  emit_insn (gen_ashlv2di3 (t4, t4, thirtytwo));
-  emit_insn (gen_ashlv2di3 (t5, t5, thirtytwo));
-
-  /* Add the three parts together.  */
-  emit_insn (gen_addv2di3 (t6, t1, t4));
-  emit_insn (gen_addv2di3 (op0, t6, t5));
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/xop-imul64-vector.c b/gcc/testsuite/gcc.target/i386/xop-imul64-vector.c
index 738cac0..382677e 100644
--- a/gcc/testsuite/gcc.target/i386/xop-imul64-vector.c
+++ b/gcc/testsuite/gcc.target/i386/xop-imul64-vector.c
@@ -31,6 +31,6 @@ int main ()
   exit (0);
 }
 
-/* { dg-final { scan-assembler "vpmacsdd" } } */
+/* { dg-final { scan-assembler "vpmulld" } } */
 /* { dg-final { scan-assembler "vphadddq" } } */
 /* { dg-final { scan-assembler "vpmacsdql" } } */
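Both branches of the reworked mulv2di3 expander rely on the same identity. Write each 64-bit lane as lo + hi * 2^32. The product modulo 2^64 is then lo1*lo2 + ((hi1*lo2 + lo1*hi2) << 32), since the hi1*hi2 term is shifted out entirely.

The C sketch below models one lane of the SSE2 sequence and one register of the XOP sequence. The function names are hypothetical. The comments use the A..H naming, with A the lowest dword. In that naming, the term the final step adds comes from dword lanes 0 and 2 (A*E and C*G). The sketch treats that product as an unsigned 32x32->64 multiply, which is an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

/* pmuludq on one 64-bit lane: unsigned product of the low 32 bits.  */
static uint64_t mul_lo32 (uint64_t x, uint64_t y)
{
  return (uint64_t) (uint32_t) x * (uint32_t) y;
}

/* One lane of the SSE2 branch.  */
uint64_t mulv2di_sse2_lane (uint64_t op1, uint64_t op2)
{
  uint64_t t1 = mul_lo32 (op1, op2);        /* low parts */
  uint64_t t2 = op1 >> 32;                  /* lshrv2di3: high part of op1 */
  uint64_t t3 = op2 >> 32;                  /* lshrv2di3: high part of op2 */
  uint64_t t4 = mul_lo32 (op1, t3) << 32;   /* low(op1) * high(op2), shifted back */
  uint64_t t5 = mul_lo32 (op2, t2) << 32;   /* low(op2) * high(op1), shifted back */
  return t1 + t4 + t5;                      /* two addv2di3, wrapping mod 2^64 */
}

/* One V2DI register of the XOP branch, viewed as dwords A,B,C,D and
   E,F,G,H (lowest first).  */
void mulv2di_xop_model (const uint32_t op1[4], const uint32_t op2[4],
                        uint64_t out[2])
{
  /* t1: B,A,D,C (pshufd swapping the dwords of each quadword).  */
  const uint32_t t1[4] = { op1[1], op1[0], op1[3], op1[2] };
  /* t2: B*E,A*F,D*G,C*H (pmulld keeps the low 32 bits).  */
  uint32_t t2[4];
  for (int i = 0; i < 4; i++)
    t2[i] = (uint32_t) ((uint64_t) t1[i] * op2[i]);
  for (int q = 0; q < 2; q++)
    {
      /* phadddq, then shift left 32: only the low 32 bits of the
         cross-term sum survive.  */
      uint64_t cross = ((uint64_t) t2[2 * q] + t2[2 * q + 1]) << 32;
      /* Low-by-low term from dword lanes 0 and 2 (A*E, C*G), unsigned here.  */
      out[q] = cross + (uint64_t) op1[2 * q] * op2[2 * q];
    }
}
```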


* Re: Vector permutation support for x86
  2009-11-26 20:36   ` Uros Bizjak
@ 2009-11-26 20:57     ` Richard Henderson
  2009-11-26 23:24       ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2009-11-26 20:57 UTC
  To: Uros Bizjak; +Cc: gcc-patches

On 11/26/2009 12:21 PM, Uros Bizjak wrote:
> ... the splitting before reload was kept to help PR 33353 ...

Ah, a testcase that is now fully folded before rtl:

<bb 2>:
   tabs[0] = 0;
   tabs[1] = 2;
   tabs[2] = 4;
   tabs[3] = 6;
   tabs[4] = 8;
   tabs[5] = 10;
   tabs[6] = 12;
   tabs[7] = 14;
   g (&tabs[0]);

which is why I passed over it earlier.

>> I certainly don't like the can_create_pseudo_p in the insn predicate.
...
> It looks to me, that this approach is used generally through i386.md...

I'm not sure when these were introduced; I probably would have made
some objection at the time if I had noticed.  However, I'll grant you
that it'll be easy to notice if someone breaks this idiom.  I don't
agree that it's necessary, but since I can't prove it isn't I suppose
I can revert the change to the define_expand for v16qi.


r~


* Re: Vector permutation support for x86
  2009-11-26 19:18 ` Richard Henderson
@ 2009-11-26 20:36   ` Uros Bizjak
  2009-11-26 20:57     ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Uros Bizjak @ 2009-11-26 20:36 UTC
  To: Richard Henderson; +Cc: gcc-patches

On 11/26/2009 08:05 PM, Richard Henderson wrote:

> So can you elaborate on exactly what you think might fail by using 
> expanders instead of the splitters?

The core of the problem was actually in move_invariant_reg, which called
force_operand on the expanded multiplication sequence.  The first attempt to
fix this problem was a target-dependent patch [1] that simply deferred
expansion of the multiplication sequence to the split1 pass, after the loop
optimizers had finished.

After that, the correct patch, which removed force_operand from
move_invariant_reg, was committed [2], so [1] _should_ be redundant, but ...

... the splitting before reload was kept to help PR 33353 [3].  A testcase
attached to the PR shows that gcc is unable to simplify a vector
multiplication with all-constant operands to a vector constant.  This
missed-optimization PR shows the problem with sse4_1_mulv4si3, but the
situation is much worse when the multiplication is expanded early into a
long sequence of shifts, shuffles, etc.  With sse4_1_mulv4si3, gcc is
able to determine the compile-time constants (but for some strange reason
doesn't actually use them), whereas after early expansion there is no hope
of folding the sequence to a compile-time constant.

So the idea of the late split was simply to not hide the multiplication
from the RTL optimizers.

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26449#c14
[2] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26449#c21
[3] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33353

> I certainly don't like the can_create_pseudo_p in the insn predicate. 
> Given that can_create_pseudo_p is turned on and off during various 
> parts of the rtl optimizers, that really seems to be asking for 
> trouble if someone tries to re-recognize the instruction while the 
> flag is off...

It looks to me that this approach is used throughout the i386.md
file to expand patterns late in the game via the split1 pass, so the
true operation is not hidden from the RTL optimizers.  Examples are
fix_trunc<mode>_fisttp_i387_1 and similar patterns that expand to a
sequence of instructions.

Uros.


* Re: Vector permutation support for x86
  2009-11-26  9:14 Uros Bizjak
@ 2009-11-26 19:18 ` Richard Henderson
  2009-11-26 20:36   ` Uros Bizjak
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Henderson @ 2009-11-26 19:18 UTC
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1631 bytes --]

On 11/26/2009 12:17 AM, Uros Bizjak wrote:
> Please leave this pattern in its "mult" form up to split1 pass, so
> combine [and other passes] can process it as a multiplication
> operator.

Why wouldn't this have been properly processed at the gimple level?

> Also, please see PR33329 and PR26449 for some troubles in this area,
> when these patterns were split early.

I examined all of the PRs linked from these. Only one of them actually
had a test case related to multiplication; the others seemed to be 
related to builtin_vec_set. I couldn't get any of the test cases to 
fail, and I can't imagine why they would have.

I bootstrapped and tested with the additional patch below, which 
produces no failures except for

FAIL: gcc.target/i386/xop-imul64-vector.c scan-assembler vpmacsdd

which is correct; the vpmacsdd gets folded to vpmulld.  This also points
out that the XOP mulv2di expander should have used the plain multiply in
the first place instead of generating a multiply-add with a zero addend.

> (This change is not documented in ChangeLog, it looks like an
> unintended change.)

The form of the insn_and_split looked so odd that I thought it was a mistake.
Sorry for missing the change in the ChangeLog.

So can you elaborate on exactly what you think might fail by using 
expanders instead of the splitters?

I certainly don't like the can_create_pseudo_p in the insn predicate. 
Given that can_create_pseudo_p is turned on and off during various parts 
of the rtl optimizers, that really seems to be asking for trouble if 
someone tries to re-recognize the instruction while the flag is off...


r~

[-- Attachment #2: z --]
[-- Type: text/plain, Size: 1497 bytes --]

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b4bcc5f..5fa4d70 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5592,8 +5592,13 @@
 		   (match_operand:V4SI 2 "register_operand" "")))]
   "TARGET_SSE2"
 {
-  if (TARGET_SSE4_1 || TARGET_XOP)
+  if (TARGET_SSE4_1 || TARGET_AVX)
     ix86_fixup_binary_operands_no_copy (MULT, V4SImode, operands);
+  else
+    {
+      emit_insn (gen_sse2_mulv4si3 (operands[0], operands[1], operands[2]));
+      DONE;
+    }
 })
 
 (define_insn "*avx_mulv4si3"
@@ -5617,15 +5622,11 @@
    (set_attr "prefix_extra" "1")
    (set_attr "mode" "TI")])
 
-(define_insn_and_split "*sse2_mulv4si3"
+(define_expand "sse2_mulv4si3"
   [(set (match_operand:V4SI 0 "register_operand" "")
 	(mult:V4SI (match_operand:V4SI 1 "register_operand" "")
 		   (match_operand:V4SI 2 "register_operand" "")))]
-  "TARGET_SSE2 && !TARGET_SSE4_1 && !TARGET_XOP
-   && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
+  "TARGET_SSE2"
 {
   rtx t1, t2, t3, t4, t5, t6, thirtytwo;
   rtx op0, op1, op2;
@@ -5670,15 +5671,11 @@
   DONE;
 })
 
-(define_insn_and_split "mulv2di3"
+(define_expand "mulv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "")
 	(mult:V2DI (match_operand:V2DI 1 "register_operand" "")
 		   (match_operand:V2DI 2 "register_operand" "")))]
-  "TARGET_SSE2
-   && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
+  "TARGET_SSE2"
 {
   rtx t1, t2, t3, t4, t5, t6, thirtytwo;
   rtx op0, op1, op2;


* Re: Vector permutation support for x86
@ 2009-11-26  9:14 Uros Bizjak
  2009-11-26 19:18 ` Richard Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Uros Bizjak @ 2009-11-26  9:14 UTC
  To: gcc-patches; +Cc: Richard Henderson

Hello!

> -(define_insn_and_split "mulv16qi3"
> +(define_expand "mulv16qi3"
>    [(set (match_operand:V16QI 0 "register_operand" "")
>  	(mult:V16QI (match_operand:V16QI 1 "register_operand" "")
>  		    (match_operand:V16QI 2 "register_operand" "")))]
> -  "TARGET_SSE2
> -   && can_create_pseudo_p ()"
> -  "#"
> -  "&& 1"
> -  [(const_int 0)]
> +  "TARGET_SSE2"

	(mulv16qi3, vec_pack_trunc_v8hi, vec_pack_trunc_v4si,
	vec_pack_trunc_v2di): Use ix86_expand_vec_extract_even_odd.

Please leave this pattern in its "mult" form until the split1 pass, so
that combine [and other passes] can process it as a multiplication
operator. Also, please see PR33329 and PR26449 for some troubles in
this area when these patterns were split early.

(This change is not documented in the ChangeLog; it looks like an
unintended change.)

Uros.


end of thread

Thread overview: 45+ messages
2009-11-26  3:55 Vector permutation support for x86 Richard Henderson
2009-11-27 23:54 ` H.J. Lu
2010-04-17  5:27   ` H.J. Lu
2010-11-23  8:42     ` H.J. Lu
2009-11-30 18:36 ` Sebastian Pop
2009-11-30 20:40   ` Richard Henderson
2009-11-30 21:07     ` Sebastian Pop
2009-12-02 19:53       ` Sebastian Pop
2009-11-26  9:14 Uros Bizjak
2009-11-26 19:18 ` Richard Henderson
2009-11-26 20:36   ` Uros Bizjak
2009-11-26 20:57     ` Richard Henderson
2009-11-26 23:24       ` Richard Henderson
2009-11-27 10:17 Uros Bizjak
2009-11-27 16:02 ` Richard Henderson
2009-12-02 20:04 Uros Bizjak
2009-12-02 20:10 ` Sebastian Pop
2009-12-02 21:22 ` Sebastian Pop
2009-12-02 22:23   ` Sebastian Pop
2009-12-02 22:38     ` Richard Henderson
2009-12-02 23:05       ` Sebastian Pop
2009-12-02 23:39         ` Sebastian Pop
2009-12-02 23:55         ` Richard Henderson
2009-12-03 19:53           ` Sebastian Pop
2009-12-04  6:50             ` Sebastian Pop
2009-12-04 16:31               ` Richard Henderson
2009-12-04 16:40                 ` Sebastian Pop
2009-12-05 17:19                   ` Sebastian Pop
2009-12-05 17:55                     ` Richard Henderson
2009-12-03 19:30       ` Sebastian Pop
2009-12-03 20:44         ` Richard Henderson
2009-12-03 23:37           ` Sebastian Pop
2009-12-04  5:21             ` Richard Henderson
2009-12-05 17:07         ` Uros Bizjak
2009-12-05 17:49           ` Sebastian Pop
2009-12-05 20:40             ` Sebastian Pop
2009-12-05 21:51               ` Sebastian Pop
2009-12-06  8:42                 ` Sebastian Pop
2009-12-06 12:20               ` Uros Bizjak
2009-12-07  0:21               ` Richard Henderson
2009-12-07 17:35                 ` Sebastian Pop
2009-12-07 18:28                   ` Sebastian Pop
2009-12-07 19:14                     ` Richard Henderson
2009-12-07 20:10                       ` Sebastian Pop
2009-12-07 22:02                         ` Richard Henderson
