public inbox for gcc-patches@gcc.gnu.org
* PATCH: Simplify SSE builtin handling
@ 2008-04-16 21:10 H.J. Lu
  2008-04-17  7:32 ` H.J. Lu
  0 siblings, 1 reply; 4+ messages in thread
From: H.J. Lu @ 2008-04-16 21:10 UTC
  To: GCC Patches, Meissner, Michael, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 1255 bytes --]

Hi,

Various SSE builtins currently get special-case treatment in several places.
SSE5 introduced ix86_expand_multi_arg_builtin, which simplified SSE5 builtin
handling.  This patch adds ix86_expand_sse_operands_builtin, which is very
similar to ix86_expand_multi_arg_builtin but slightly more flexible: it can
handle any SSE builtin without a type mismatch.  Each builtin function type is
encoded as "return_func_args" in an enum.  OK for trunk?
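
To illustrate the "return_func_args" encoding, here is a minimal standalone
sketch, not the code in the patch.  The demo_* names, the three-entry table,
and the short builtin names are hypothetical; only the spelling of the enum
values and the idea of storing the code in an int flag field follow the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of signature codes, spelled "return_func_args".  */
enum demo_sig
{
  demo_sig_unknown,
  v2df_func_v2df_int,		/* one vector operand plus an immediate */
  v4sf_func_v4sf_v4sf_v4sf,	/* three vector operands */
  v4sf_func_v4sf_v4sf_int	/* two vector operands plus an immediate */
};

/* Hypothetical table entry; flag holds an enum demo_sig cast to int.  */
struct demo_builtin
{
  const char *name;
  int flag;
};

static const struct demo_builtin demo_table[] =
{
  { "roundpd", (int) v2df_func_v2df_int },
  { "blendvps", (int) v4sf_func_v4sf_v4sf_v4sf },
  { "shufps", (int) v4sf_func_v4sf_v4sf_int }
};

/* What an expander needs to know about a signature.  */
struct demo_shape
{
  unsigned int nargs;
  bool last_arg_constant;
};

/* Decode a signature code.  Unknown codes yield nargs == 0 here; the
   patch calls gcc_unreachable () in that case instead.  */
static struct demo_shape
demo_decode (enum demo_sig sig)
{
  struct demo_shape s = { 0, false };

  switch (sig)
    {
    case v2df_func_v2df_int:
      s.nargs = 2;
      s.last_arg_constant = true;
      break;
    case v4sf_func_v4sf_v4sf_v4sf:
      s.nargs = 3;
      break;
    case v4sf_func_v4sf_v4sf_int:
      s.nargs = 3;
      s.last_arg_constant = true;
      break;
    default:
      break;
    }

  assert (s.nargs <= 3);
  return s;
}

/* Look a builtin up by name and decode the code stored in its flag.  */
static struct demo_shape
demo_lookup (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++)
    if (strcmp (demo_table[i].name, name) == 0)
      return demo_decode ((enum demo_sig) demo_table[i].flag);

  return demo_decode (demo_sig_unknown);
}
```

With one code per table entry, the registration loop and the expander can
both switch on the same value rather than inferring the signature from an
operand mode and then overriding it for the variable blends.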

Thanks.


H.J.
---
2008-04-16  H.J. Lu  <hongjiu.lu@intel.com>

        * config/i386/i386.c (sse_builtin_type): New.
        (bdesc_sse_args): Likewise.
        (bdesc_sse_3arg): Removed.
        (bdesc_2arg): Remove IX86_BUILTIN_AESKEYGENASSIST128.
        (bdesc_1arg): Remove IX86_BUILTIN_ROUNDPD and
        IX86_BUILTIN_ROUNDPS.
        (ix86_init_mmx_sse_builtins): Handle bdesc_sse_args.  Remove
        bdesc_sse_3arg.  Remove IX86_BUILTIN_ROUNDPD and
        IX86_BUILTIN_ROUNDPS.
        (ix86_expand_sse_4_operands_builtin): Removed.
        (ix86_expand_sse_operands_builtin): New.
        (ix86_expand_unop_builtin): Remove CODE_FOR_sse4_1_roundpd
        and CODE_FOR_sse4_1_roundps.
        (ix86_expand_builtin): Remove IX86_BUILTIN_AESKEYGENASSIST128.
        Handle bdesc_sse_args.  Remove bdesc_sse_3arg.

[-- Attachment #2: sse.txt --]
[-- Type: text/plain, Size: 16564 bytes --]

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 2189)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18196,31 +18196,56 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_2, CODE_FOR_sse4_2_crc32di, 0, IX86_BUILTIN_CRC32DI, UNKNOWN, 0 },
 };
 
-/* SSE builtins with 3 arguments and the last argument must be an immediate or xmm0.  */
-static const struct builtin_description bdesc_sse_3arg[] =
+/* SSE */
+enum sse_builtin_type
+{
+  sse_func_unknown,
+  v4sf_func_v4sf_int,
+  v2di_func_v2di_int,
+  v2df_func_v2df_int,
+  v16qi_func_v16qi_v16qi_v16qi,
+  v4sf_func_v4sf_v4sf_v4sf,
+  v2df_func_v2df_v2df_v2df,
+  v16qi_func_v16qi_v16qi_int,
+  v8hi_func_v8hi_v8hi_int,
+  v4si_func_v4si_v4si_int,
+  v4sf_func_v4sf_v4sf_int,
+  v2di_func_v2di_v2di_int,
+  v2df_func_v2df_v2df_int,
+};
+
+/* SSE builtins with variable number of arguments.  */
+static const struct builtin_description bdesc_sse_args[] =
 {
   /* SSE */
-  { OPTION_MASK_ISA_SSE, CODE_FOR_sse_shufps, "__builtin_ia32_shufps", IX86_BUILTIN_SHUFPS, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE, CODE_FOR_sse_shufps, "__builtin_ia32_shufps", IX86_BUILTIN_SHUFPS, UNKNOWN, (int) v4sf_func_v4sf_v4sf_int },
 
   /* SSE2 */
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_shufpd, "__builtin_ia32_shufpd", IX86_BUILTIN_SHUFPD, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_shufpd, "__builtin_ia32_shufpd", IX86_BUILTIN_SHUFPD, UNKNOWN, (int) v2df_func_v2df_v2df_int },
 
   /* SSE4.1 */
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendpd, "__builtin_ia32_blendpd", IX86_BUILTIN_BLENDPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendps, "__builtin_ia32_blendps", IX86_BUILTIN_BLENDPS, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvpd, "__builtin_ia32_blendvpd", IX86_BUILTIN_BLENDVPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvps, "__builtin_ia32_blendvps", IX86_BUILTIN_BLENDVPS, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dppd, "__builtin_ia32_dppd", IX86_BUILTIN_DPPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_insertps, "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_mpsadbw, "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendvb, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, "__builtin_ia32_roundsd", IX86_BUILTIN_ROUNDSD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, "__builtin_ia32_roundss", IX86_BUILTIN_ROUNDSS, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendpd, "__builtin_ia32_blendpd", IX86_BUILTIN_BLENDPD, UNKNOWN, (int) v2df_func_v2df_v2df_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendps, "__builtin_ia32_blendps", IX86_BUILTIN_BLENDPS, UNKNOWN, (int) v4sf_func_v4sf_v4sf_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvpd, "__builtin_ia32_blendvpd", IX86_BUILTIN_BLENDVPD, UNKNOWN, (int) v2df_func_v2df_v2df_v2df },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvps, "__builtin_ia32_blendvps", IX86_BUILTIN_BLENDVPS, UNKNOWN, (int) v4sf_func_v4sf_v4sf_v4sf },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dppd, "__builtin_ia32_dppd", IX86_BUILTIN_DPPD, UNKNOWN, (int) v2df_func_v2df_v2df_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, (int) v4sf_func_v4sf_v4sf_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_insertps, "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, (int) v4sf_func_v4sf_v4sf_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_mpsadbw, "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, (int) v16qi_func_v16qi_v16qi_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendvb, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, (int) v16qi_func_v16qi_v16qi_v16qi },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, (int) v8hi_func_v8hi_v8hi_int },
+
+  /* SSE4.1 and SSE5 */
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_roundpd", IX86_BUILTIN_ROUNDPD, UNKNOWN, (int) v2df_func_v2df_int },
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, "__builtin_ia32_roundps", IX86_BUILTIN_ROUNDPS, UNKNOWN, (int) v4sf_func_v4sf_int },
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, "__builtin_ia32_roundsd", IX86_BUILTIN_ROUNDSD, UNKNOWN, (int) v2df_func_v2df_v2df_int },
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, "__builtin_ia32_roundss", IX86_BUILTIN_ROUNDSS, UNKNOWN, (int) v4sf_func_v4sf_v4sf_int },
+
+  /* AES */
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, (int) v2di_func_v2di_int },
 
   /* PCLMUL */
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_pclmulqdq, 0, IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_pclmulqdq, 0, IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, (int) v2di_func_v2di_v2di_int },
 };
 
 static const struct builtin_description bdesc_2arg[] =
@@ -18507,7 +18532,6 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesenclast, 0, IX86_BUILTIN_AESENCLAST128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesdec, 0, IX86_BUILTIN_AESDEC128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesdeclast, 0, IX86_BUILTIN_AESDECLAST128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -18582,10 +18606,6 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_zero_extendv2siv2di2, 0, IX86_BUILTIN_PMOVZXDQ128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_phminposuw, "__builtin_ia32_phminposuw128", IX86_BUILTIN_PHMINPOSUW128, UNKNOWN, 0 },
 
-  /* Fake 1 arg builtins with a constant smaller than 8 bits as the 2nd arg.  */
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundpd, 0, IX86_BUILTIN_ROUNDPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundps, 0, IX86_BUILTIN_ROUNDPS, UNKNOWN, 0 },
-
   /* AES */
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN, 0 },
 };
@@ -19387,61 +19407,58 @@ ix86_init_mmx_sse_builtins (void)
       def_builtin_const (OPTION_MASK_ISA_64BIT, "__builtin_copysignq", ftype, IX86_BUILTIN_COPYSIGNQ);
     }
 
-  /* Add all SSE builtins that are more or less simple operations on
-     three operands.  */
-  for (i = 0, d = bdesc_sse_3arg;
-       i < ARRAY_SIZE (bdesc_sse_3arg);
+  /* Add all SSE builtins with variable number of operands.  */
+  for (i = 0, d = bdesc_sse_args;
+       i < ARRAY_SIZE (bdesc_sse_args);
        i++, d++)
     {
-      /* Use one of the operands; the target can have a different mode for
-	 mask-generating compares.  */
-      enum machine_mode mode;
       tree type;
 
       if (d->name == 0)
 	continue;
-      mode = insn_data[d->icode].operand[1].mode;
 
-      switch (mode)
+      switch ((enum sse_builtin_type) d->flag)
 	{
-	case V16QImode:
+	case v4sf_func_v4sf_int:
+	  type = v4sf_ftype_v4sf_int;
+	  break;
+	case v2di_func_v2di_int:
+	  type = v2di_ftype_v2di_int;
+	  break;
+	case v2df_func_v2df_int:
+	  type = v2df_ftype_v2df_int;
+	  break;
+	case v16qi_func_v16qi_v16qi_v16qi:
+	  type = v16qi_ftype_v16qi_v16qi_v16qi;
+	  break;
+	case v4sf_func_v4sf_v4sf_v4sf:
+	  type = v4sf_ftype_v4sf_v4sf_v4sf;
+	  break;
+	case v2df_func_v2df_v2df_v2df:
+	  type = v2df_ftype_v2df_v2df_v2df;
+	  break;
+	case v16qi_func_v16qi_v16qi_int:
 	  type = v16qi_ftype_v16qi_v16qi_int;
 	  break;
-	case V8HImode:
+	case v8hi_func_v8hi_v8hi_int:
 	  type = v8hi_ftype_v8hi_v8hi_int;
 	  break;
-	case V4SImode:
+	case v4si_func_v4si_v4si_int:
 	  type = v4si_ftype_v4si_v4si_int;
 	  break;
-	case V2DImode:
+	case v4sf_func_v4sf_v4sf_int:
+	  type = v4sf_ftype_v4sf_v4sf_int;
+	  break;
+	case v2di_func_v2di_v2di_int:
 	  type = v2di_ftype_v2di_v2di_int;
 	  break;
-	case V2DFmode:
+	case v2df_func_v2df_v2df_int:
 	  type = v2df_ftype_v2df_v2df_int;
 	  break;
-	case V4SFmode:
-	  type = v4sf_ftype_v4sf_v4sf_int;
-	  break;
 	default:
 	  gcc_unreachable ();
 	}
 
-      /* Override for variable blends.  */
-      switch (d->icode)
-	{
-	case CODE_FOR_sse4_1_blendvpd:
-	  type = v2df_ftype_v2df_v2df_v2df;
-	  break;
-	case CODE_FOR_sse4_1_blendvps:
-	  type = v4sf_ftype_v4sf_v4sf_v4sf;
-	  break;
-	case CODE_FOR_sse4_1_pblendvb:
-	  type = v16qi_ftype_v16qi_v16qi_v16qi;
-	  break;
-	default:
-	  break;
-	}
-
       def_builtin_const (d->mask, d->name, type, d->code);
     }
 
@@ -19798,10 +19815,6 @@ ix86_init_mmx_sse_builtins (void)
   def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmovzxdq128", v2di_ftype_v4si, IX86_BUILTIN_PMOVZXDQ128);
   def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmuldq128", v2di_ftype_v4si_v4si, IX86_BUILTIN_PMULDQ128);
 
-  /* SSE4.1 and SSE5 */
-  def_builtin_const (OPTION_MASK_ISA_ROUND, "__builtin_ia32_roundpd", v2df_ftype_v2df_int, IX86_BUILTIN_ROUNDPD);
-  def_builtin_const (OPTION_MASK_ISA_ROUND, "__builtin_ia32_roundps", v4sf_ftype_v4sf_int, IX86_BUILTIN_ROUNDPS);
-
   /* SSE4.2. */
   ftype = build_function_type_list (unsigned_type_node,
 				    unsigned_type_node,
@@ -20019,71 +20032,128 @@ safe_vector_operand (rtx x, enum machine
 }
 
 /* Subroutine of ix86_expand_builtin to take care of SSE insns with
-   4 operands. The third argument must be a constant smaller than 8
-   bits or xmm0.  */
+   variable number of operands.  */
 
 static rtx
-ix86_expand_sse_4_operands_builtin (enum insn_code icode, tree exp,
-				    rtx target)
+ix86_expand_sse_operands_builtin (enum insn_code icode, tree exp,
+				  enum sse_builtin_type type,
+				  rtx target)
 {
   rtx pat;
-  tree arg0 = CALL_EXPR_ARG (exp, 0);
-  tree arg1 = CALL_EXPR_ARG (exp, 1);
-  tree arg2 = CALL_EXPR_ARG (exp, 2);
-  rtx op0 = expand_normal (arg0);
-  rtx op1 = expand_normal (arg1);
-  rtx op2 = expand_normal (arg2);
-  enum machine_mode tmode = insn_data[icode].operand[0].mode;
-  enum machine_mode mode1 = insn_data[icode].operand[1].mode;
-  enum machine_mode mode2 = insn_data[icode].operand[2].mode;
-  enum machine_mode mode3 = insn_data[icode].operand[3].mode;
+  int i, nargs;
+  int num_memory = 0;
+  struct
+    {
+      rtx op;
+      enum machine_mode mode;
+    } args[3];
+  bool last_arg_constant = false;
+  const struct insn_data *insn_p = &insn_data[icode];
+  enum machine_mode tmode = insn_p->operand[0].mode;
 
-  if (VECTOR_MODE_P (mode1))
-    op0 = safe_vector_operand (op0, mode1);
-  if (VECTOR_MODE_P (mode2))
-    op1 = safe_vector_operand (op1, mode2);
-  if (VECTOR_MODE_P (mode3))
-    op2 = safe_vector_operand (op2, mode3);
+  switch (type)
+    {
+    case v4sf_func_v4sf_int:
+    case v2di_func_v2di_int:
+    case v2df_func_v2df_int:
+      nargs = 2;
+      last_arg_constant = true;
+      break;
+    case v16qi_func_v16qi_v16qi_v16qi:
+    case v4sf_func_v4sf_v4sf_v4sf:
+    case v2df_func_v2df_v2df_v2df:
+      nargs = 3;
+      break;
+    case v16qi_func_v16qi_v16qi_int:
+    case v8hi_func_v8hi_v8hi_int:
+    case v4si_func_v4si_v4si_int:
+    case v4sf_func_v4sf_v4sf_int:
+    case v2di_func_v2di_v2di_int:
+    case v2df_func_v2df_v2df_int:
+      nargs = 3;
+      last_arg_constant = true;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  gcc_assert (nargs <= ARRAY_SIZE (args));
 
   if (optimize
       || target == 0
       || GET_MODE (target) != tmode
-      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+      || ! (*insn_p->operand[0].predicate) (target, tmode))
     target = gen_reg_rtx (tmode);
 
-  if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
-    op0 = copy_to_mode_reg (mode1, op0);
-  if ((optimize && !register_operand (op1, mode2))
-      || !(*insn_data[icode].operand[2].predicate) (op1, mode2))
-    op1 = copy_to_mode_reg (mode2, op1);
+  for (i = 0; i < nargs; i++)
+    {
+      tree arg = CALL_EXPR_ARG (exp, i);
+      rtx op = expand_normal (arg);
+      enum machine_mode mode = insn_p->operand[i + 1].mode;
+      bool match = (*insn_p->operand[i + 1].predicate) (op, mode);
 
-  if (! (*insn_data[icode].operand[3].predicate) (op2, mode3))
-    switch (icode)
-      {
-      case CODE_FOR_sse4_1_blendvpd:
-      case CODE_FOR_sse4_1_blendvps:
-      case CODE_FOR_sse4_1_pblendvb:
-	op2 = copy_to_mode_reg (mode3, op2);
-	break;
+      if (last_arg_constant && (i + 1) == nargs)
+	{
+	  if (!match)
+	    switch (icode)
+	      {
+	      case CODE_FOR_sse4_1_roundpd:
+	      case CODE_FOR_sse4_1_roundps:
+	      case CODE_FOR_sse4_1_roundsd:
+	      case CODE_FOR_sse4_1_roundss:
+	      case CODE_FOR_sse4_1_blendps:
+		error ("the last argument must be a 4-bit immediate");
+		return const0_rtx;
+
+	      case CODE_FOR_sse4_1_blendpd:
+		error ("the last argument must be a 2-bit immediate");
+		return const0_rtx;
+
+	     default:
+		error ("the last argument must be an 8-bit immediate");
+		return const0_rtx;
+	      }
+	}
+      else
+	{
+	  if (VECTOR_MODE_P (mode))
+	    op = safe_vector_operand (op, mode);
 
-      case CODE_FOR_sse4_1_roundsd:
-      case CODE_FOR_sse4_1_roundss:
-      case CODE_FOR_sse4_1_blendps:
-	error ("the third argument must be a 4-bit immediate");
-	return const0_rtx;
-
-      case CODE_FOR_sse4_1_blendpd:
-	error ("the third argument must be a 2-bit immediate");
-	return const0_rtx;
+	  /* If we aren't optimizing, only allow one memory operand to
+	     be generated.  */
+	  if (memory_operand (op, mode))
+	    num_memory++;
 
-      default:
-	error ("the third argument must be an 8-bit immediate");
-	return const0_rtx;
-      }
+	  gcc_assert (GET_MODE (op) == mode
+		      || GET_MODE (op) == VOIDmode);
+
+	  if (optimize || !match || num_memory > 1)
+	    op = copy_to_mode_reg (mode, op);
+	}
+
+      args[i].op = op;
+      args[i].mode = mode;
+    }
+
+  switch (nargs)
+    {
+    case 1:
+      pat = GEN_FCN (icode) (target, args[0].op);
+      break;
+    case 2:
+      pat = GEN_FCN (icode) (target, args[0].op, args[1].op);
+      break;
+    case 3:
+      pat = GEN_FCN (icode) (target, args[0].op, args[1].op,
+			     args[2].op);
+      break;
+    default:
+      gcc_unreachable ();
+    }
 
-  pat = GEN_FCN (icode) (target, op0, op1, op2);
   if (! pat)
     return 0;
+
   emit_insn (pat);
   return target;
 }
@@ -20453,28 +20523,7 @@ ix86_expand_unop_builtin (enum insn_code
 	op0 = copy_to_mode_reg (mode0, op0);
     }
 
-  switch (icode)
-    {
-    case CODE_FOR_sse4_1_roundpd:
-    case CODE_FOR_sse4_1_roundps:
-	{
-	  tree arg1 = CALL_EXPR_ARG (exp, 1);
-	  rtx op1 = expand_normal (arg1);
-	  enum machine_mode mode1 = insn_data[icode].operand[2].mode;
-
-	  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
-	    {
-	      error ("the second argument must be a 4-bit immediate");
-	      return const0_rtx;
-	    }
-	  pat = GEN_FCN (icode) (target, op0, op1);
-	}
-      break;
-    default:
-      pat = GEN_FCN (icode) (target, op0);
-      break;
-    }
-
+  pat = GEN_FCN (icode) (target, op0);
   if (! pat)
     return 0;
   emit_insn (pat);
@@ -21262,10 +21311,6 @@ ix86_expand_builtin (tree exp, rtx targe
 					     exp, target);
       break;
 
-    case IX86_BUILTIN_AESKEYGENASSIST128:
-      return ix86_expand_binop_imm_builtin (CODE_FOR_aeskeygenassist,
-					     exp, target);
-
     case IX86_BUILTIN_FEMMS:
       emit_insn (gen_mmx_femms ());
       return NULL_RTX;
@@ -21624,12 +21669,15 @@ ix86_expand_builtin (tree exp, rtx targe
       break;
     }
 
-  for (i = 0, d = bdesc_sse_3arg;
-       i < ARRAY_SIZE (bdesc_sse_3arg);
+  for (i = 0, d = bdesc_sse_args;
+       i < ARRAY_SIZE (bdesc_sse_args);
        i++, d++)
     if (d->code == fcode)
-      return ix86_expand_sse_4_operands_builtin (d->icode, exp,
-						 target);
+      {
+	enum sse_builtin_type type = (enum sse_builtin_type ) d->flag;
+	return ix86_expand_sse_operands_builtin (d->icode, exp,
+						 type, target);
+      }
 
   for (i = 0, d = bdesc_2arg; i < ARRAY_SIZE (bdesc_2arg); i++, d++)
     if (d->code == fcode)
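
The per-operand rule in the new expander loop (copy a non-immediate operand
into a register when optimizing, when its predicate rejects it, or when it
would be a second memory operand) can be sketched in isolation as follows.
This is only an illustration under simplifying assumptions: the function
name demo_needs_register_copy and its boolean parameters are hypothetical
stand-ins for the optimize flag, the predicate result, and memory_operand ().

```c
#include <stdbool.h>

/* Decide whether one non-immediate operand should be forced into a
   register.  *num_memory counts the memory operands seen so far and is
   updated before the test, as in the loop in the patch.  */
static bool
demo_needs_register_copy (bool optimize, bool predicate_ok,
			  bool is_memory, int *num_memory)
{
  if (is_memory)
    (*num_memory)++;

  return optimize || !predicate_ok || *num_memory > 1;
}
```

When not optimizing, the first acceptable memory operand is left in place
and any later one is copied; when optimizing, every such operand is copied.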


* Re: PATCH: Simplify SSE builtin handling
  2008-04-16 21:10 PATCH: Simplify SSE builtin handling H.J. Lu
@ 2008-04-17  7:32 ` H.J. Lu
       [not found]   ` <5787cf470804170046t66ef5a0dt10111c4f70abf8ef@mail.gmail.com>
  2008-04-17 19:04   ` Meissner, Michael
  0 siblings, 2 replies; 4+ messages in thread
From: H.J. Lu @ 2008-04-17  7:32 UTC
  To: GCC Patches, Meissner, Michael, Uros Bizjak

On Wed, Apr 16, 2008 at 11:51:21AM -0700, H.J. Lu wrote:
> Hi,
> 
> There are many special treatments for various SSE builtins. SSE5 introduced
> ix86_expand_multi_arg_builtin, which simplified SSE5 builtin handling.  This
> patch adds ix86_expand_sse_operands_builtin, which is very similar to
> ix86_expand_multi_arg_builtin, but a little bit more flexible. It can handle
> any SSE builtins without type mismatch. Each builtin function type is encoded
> as "return_func_args" in enum.  OK for trunk?
> 
> Thanks.
> 
> 

Here is the updated patch, which bootstrapped on Linux/ia32 and Linux/x86-64.


H.J.
---
2008-04-16  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (sse_builtin_type): New.
	(bdesc_sse_args): Likewise.
	(bdesc_sse_3arg): Removed.
	(bdesc_2arg): Remove IX86_BUILTIN_AESKEYGENASSIST128.
	(bdesc_1arg): Remove IX86_BUILTIN_ROUNDPD and
	IX86_BUILTIN_ROUNDPS.
	(ix86_init_mmx_sse_builtins): Handle bdesc_sse_args.  Remove
	bdesc_sse_3arg.  Remove IX86_BUILTIN_ROUNDPD and
	IX86_BUILTIN_ROUNDPS.
	(ix86_expand_sse_4_operands_builtin): Removed.
	(ix86_expand_sse_operands_builtin): New.
	(ix86_expand_unop_builtin): Remove CODE_FOR_sse4_1_roundpd
	and CODE_FOR_sse4_1_roundps.
	(ix86_expand_builtin): Remove IX86_BUILTIN_AESKEYGENASSIST128.
	Handle bdesc_sse_args.  Remove bdesc_sse_3arg.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 2189)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18196,31 +18196,56 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_2, CODE_FOR_sse4_2_crc32di, 0, IX86_BUILTIN_CRC32DI, UNKNOWN, 0 },
 };
 
-/* SSE builtins with 3 arguments and the last argument must be an immediate or xmm0.  */
-static const struct builtin_description bdesc_sse_3arg[] =
+/* SSE */
+enum sse_builtin_type
+{
+  sse_func_unknown,
+  v4sf_func_v4sf_int,
+  v2di_func_v2di_int,
+  v2df_func_v2df_int,
+  v16qi_func_v16qi_v16qi_v16qi,
+  v4sf_func_v4sf_v4sf_v4sf,
+  v2df_func_v2df_v2df_v2df,
+  v16qi_func_v16qi_v16qi_int,
+  v8hi_func_v8hi_v8hi_int,
+  v4si_func_v4si_v4si_int,
+  v4sf_func_v4sf_v4sf_int,
+  v2di_func_v2di_v2di_int,
+  v2df_func_v2df_v2df_int
+};
+
+/* SSE builtins with variable number of arguments.  */
+static const struct builtin_description bdesc_sse_args[] =
 {
   /* SSE */
-  { OPTION_MASK_ISA_SSE, CODE_FOR_sse_shufps, "__builtin_ia32_shufps", IX86_BUILTIN_SHUFPS, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE, CODE_FOR_sse_shufps, "__builtin_ia32_shufps", IX86_BUILTIN_SHUFPS, UNKNOWN, (int) v4sf_func_v4sf_v4sf_int },
 
   /* SSE2 */
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_shufpd, "__builtin_ia32_shufpd", IX86_BUILTIN_SHUFPD, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_shufpd, "__builtin_ia32_shufpd", IX86_BUILTIN_SHUFPD, UNKNOWN, (int) v2df_func_v2df_v2df_int },
 
   /* SSE4.1 */
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendpd, "__builtin_ia32_blendpd", IX86_BUILTIN_BLENDPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendps, "__builtin_ia32_blendps", IX86_BUILTIN_BLENDPS, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvpd, "__builtin_ia32_blendvpd", IX86_BUILTIN_BLENDVPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvps, "__builtin_ia32_blendvps", IX86_BUILTIN_BLENDVPS, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dppd, "__builtin_ia32_dppd", IX86_BUILTIN_DPPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_insertps, "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_mpsadbw, "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendvb, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, "__builtin_ia32_roundsd", IX86_BUILTIN_ROUNDSD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, "__builtin_ia32_roundss", IX86_BUILTIN_ROUNDSS, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendpd, "__builtin_ia32_blendpd", IX86_BUILTIN_BLENDPD, UNKNOWN, (int) v2df_func_v2df_v2df_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendps, "__builtin_ia32_blendps", IX86_BUILTIN_BLENDPS, UNKNOWN, (int) v4sf_func_v4sf_v4sf_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvpd, "__builtin_ia32_blendvpd", IX86_BUILTIN_BLENDVPD, UNKNOWN, (int) v2df_func_v2df_v2df_v2df },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvps, "__builtin_ia32_blendvps", IX86_BUILTIN_BLENDVPS, UNKNOWN, (int) v4sf_func_v4sf_v4sf_v4sf },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dppd, "__builtin_ia32_dppd", IX86_BUILTIN_DPPD, UNKNOWN, (int) v2df_func_v2df_v2df_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, (int) v4sf_func_v4sf_v4sf_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_insertps, "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, (int) v4sf_func_v4sf_v4sf_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_mpsadbw, "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, (int) v16qi_func_v16qi_v16qi_int },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendvb, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, (int) v16qi_func_v16qi_v16qi_v16qi },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, (int) v8hi_func_v8hi_v8hi_int },
+
+  /* SSE4.1 and SSE5 */
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_roundpd", IX86_BUILTIN_ROUNDPD, UNKNOWN, (int) v2df_func_v2df_int },
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, "__builtin_ia32_roundps", IX86_BUILTIN_ROUNDPS, UNKNOWN, (int) v4sf_func_v4sf_int },
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, "__builtin_ia32_roundsd", IX86_BUILTIN_ROUNDSD, UNKNOWN, (int) v2df_func_v2df_v2df_int },
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, "__builtin_ia32_roundss", IX86_BUILTIN_ROUNDSS, UNKNOWN, (int) v4sf_func_v4sf_v4sf_int },
+
+  /* AES */
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, (int) v2di_func_v2di_int },
 
   /* PCLMUL */
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_pclmulqdq, 0, IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_pclmulqdq, 0, IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, (int) v2di_func_v2di_v2di_int },
 };
 
 static const struct builtin_description bdesc_2arg[] =
@@ -18507,7 +18532,6 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesenclast, 0, IX86_BUILTIN_AESENCLAST128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesdec, 0, IX86_BUILTIN_AESDEC128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesdeclast, 0, IX86_BUILTIN_AESDECLAST128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -18582,10 +18606,6 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_zero_extendv2siv2di2, 0, IX86_BUILTIN_PMOVZXDQ128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_phminposuw, "__builtin_ia32_phminposuw128", IX86_BUILTIN_PHMINPOSUW128, UNKNOWN, 0 },
 
-  /* Fake 1 arg builtins with a constant smaller than 8 bits as the 2nd arg.  */
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundpd, 0, IX86_BUILTIN_ROUNDPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundps, 0, IX86_BUILTIN_ROUNDPS, UNKNOWN, 0 },
-
   /* AES */
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN, 0 },
 };
@@ -19387,61 +19407,58 @@ ix86_init_mmx_sse_builtins (void)
       def_builtin_const (OPTION_MASK_ISA_64BIT, "__builtin_copysignq", ftype, IX86_BUILTIN_COPYSIGNQ);
     }
 
-  /* Add all SSE builtins that are more or less simple operations on
-     three operands.  */
-  for (i = 0, d = bdesc_sse_3arg;
-       i < ARRAY_SIZE (bdesc_sse_3arg);
+  /* Add all SSE builtins with variable number of operands.  */
+  for (i = 0, d = bdesc_sse_args;
+       i < ARRAY_SIZE (bdesc_sse_args);
        i++, d++)
     {
-      /* Use one of the operands; the target can have a different mode for
-	 mask-generating compares.  */
-      enum machine_mode mode;
       tree type;
 
       if (d->name == 0)
 	continue;
-      mode = insn_data[d->icode].operand[1].mode;
 
-      switch (mode)
+      switch ((enum sse_builtin_type) d->flag)
 	{
-	case V16QImode:
+	case v4sf_func_v4sf_int:
+	  type = v4sf_ftype_v4sf_int;
+	  break;
+	case v2di_func_v2di_int:
+	  type = v2di_ftype_v2di_int;
+	  break;
+	case v2df_func_v2df_int:
+	  type = v2df_ftype_v2df_int;
+	  break;
+	case v16qi_func_v16qi_v16qi_v16qi:
+	  type = v16qi_ftype_v16qi_v16qi_v16qi;
+	  break;
+	case v4sf_func_v4sf_v4sf_v4sf:
+	  type = v4sf_ftype_v4sf_v4sf_v4sf;
+	  break;
+	case v2df_func_v2df_v2df_v2df:
+	  type = v2df_ftype_v2df_v2df_v2df;
+	  break;
+	case v16qi_func_v16qi_v16qi_int:
 	  type = v16qi_ftype_v16qi_v16qi_int;
 	  break;
-	case V8HImode:
+	case v8hi_func_v8hi_v8hi_int:
 	  type = v8hi_ftype_v8hi_v8hi_int;
 	  break;
-	case V4SImode:
+	case v4si_func_v4si_v4si_int:
 	  type = v4si_ftype_v4si_v4si_int;
 	  break;
-	case V2DImode:
+	case v4sf_func_v4sf_v4sf_int:
+	  type = v4sf_ftype_v4sf_v4sf_int;
+	  break;
+	case v2di_func_v2di_v2di_int:
 	  type = v2di_ftype_v2di_v2di_int;
 	  break;
-	case V2DFmode:
+	case v2df_func_v2df_v2df_int:
 	  type = v2df_ftype_v2df_v2df_int;
 	  break;
-	case V4SFmode:
-	  type = v4sf_ftype_v4sf_v4sf_int;
-	  break;
 	default:
 	  gcc_unreachable ();
 	}
 
-      /* Override for variable blends.  */
-      switch (d->icode)
-	{
-	case CODE_FOR_sse4_1_blendvpd:
-	  type = v2df_ftype_v2df_v2df_v2df;
-	  break;
-	case CODE_FOR_sse4_1_blendvps:
-	  type = v4sf_ftype_v4sf_v4sf_v4sf;
-	  break;
-	case CODE_FOR_sse4_1_pblendvb:
-	  type = v16qi_ftype_v16qi_v16qi_v16qi;
-	  break;
-	default:
-	  break;
-	}
-
       def_builtin_const (d->mask, d->name, type, d->code);
     }
 
@@ -19798,10 +19815,6 @@ ix86_init_mmx_sse_builtins (void)
   def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmovzxdq128", v2di_ftype_v4si, IX86_BUILTIN_PMOVZXDQ128);
   def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmuldq128", v2di_ftype_v4si_v4si, IX86_BUILTIN_PMULDQ128);
 
-  /* SSE4.1 and SSE5 */
-  def_builtin_const (OPTION_MASK_ISA_ROUND, "__builtin_ia32_roundpd", v2df_ftype_v2df_int, IX86_BUILTIN_ROUNDPD);
-  def_builtin_const (OPTION_MASK_ISA_ROUND, "__builtin_ia32_roundps", v4sf_ftype_v4sf_int, IX86_BUILTIN_ROUNDPS);
-
   /* SSE4.2. */
   ftype = build_function_type_list (unsigned_type_node,
 				    unsigned_type_node,
@@ -20019,71 +20032,128 @@ safe_vector_operand (rtx x, enum machine
 }
 
 /* Subroutine of ix86_expand_builtin to take care of SSE insns with
-   4 operands. The third argument must be a constant smaller than 8
-   bits or xmm0.  */
+   variable number of operands.  */
 
 static rtx
-ix86_expand_sse_4_operands_builtin (enum insn_code icode, tree exp,
-				    rtx target)
+ix86_expand_sse_operands_builtin (enum insn_code icode, tree exp,
+				  enum sse_builtin_type type,
+				  rtx target)
 {
   rtx pat;
-  tree arg0 = CALL_EXPR_ARG (exp, 0);
-  tree arg1 = CALL_EXPR_ARG (exp, 1);
-  tree arg2 = CALL_EXPR_ARG (exp, 2);
-  rtx op0 = expand_normal (arg0);
-  rtx op1 = expand_normal (arg1);
-  rtx op2 = expand_normal (arg2);
-  enum machine_mode tmode = insn_data[icode].operand[0].mode;
-  enum machine_mode mode1 = insn_data[icode].operand[1].mode;
-  enum machine_mode mode2 = insn_data[icode].operand[2].mode;
-  enum machine_mode mode3 = insn_data[icode].operand[3].mode;
+  unsigned int i, nargs;
+  int num_memory = 0;
+  struct
+    {
+      rtx op;
+      enum machine_mode mode;
+    } args[3];
+  bool last_arg_constant = false;
+  const struct insn_data *insn_p = &insn_data[icode];
+  enum machine_mode tmode = insn_p->operand[0].mode;
 
-  if (VECTOR_MODE_P (mode1))
-    op0 = safe_vector_operand (op0, mode1);
-  if (VECTOR_MODE_P (mode2))
-    op1 = safe_vector_operand (op1, mode2);
-  if (VECTOR_MODE_P (mode3))
-    op2 = safe_vector_operand (op2, mode3);
+  switch (type)
+    {
+    case v4sf_func_v4sf_int:
+    case v2di_func_v2di_int:
+    case v2df_func_v2df_int:
+      nargs = 2;
+      last_arg_constant = true;
+      break;
+    case v16qi_func_v16qi_v16qi_v16qi:
+    case v4sf_func_v4sf_v4sf_v4sf:
+    case v2df_func_v2df_v2df_v2df:
+      nargs = 3;
+      break;
+    case v16qi_func_v16qi_v16qi_int:
+    case v8hi_func_v8hi_v8hi_int:
+    case v4si_func_v4si_v4si_int:
+    case v4sf_func_v4sf_v4sf_int:
+    case v2di_func_v2di_v2di_int:
+    case v2df_func_v2df_v2df_int:
+      nargs = 3;
+      last_arg_constant = true;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  gcc_assert (nargs <= ARRAY_SIZE (args));
 
   if (optimize
       || target == 0
       || GET_MODE (target) != tmode
-      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+      || ! (*insn_p->operand[0].predicate) (target, tmode))
     target = gen_reg_rtx (tmode);
 
-  if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
-    op0 = copy_to_mode_reg (mode1, op0);
-  if ((optimize && !register_operand (op1, mode2))
-      || !(*insn_data[icode].operand[2].predicate) (op1, mode2))
-    op1 = copy_to_mode_reg (mode2, op1);
+  for (i = 0; i < nargs; i++)
+    {
+      tree arg = CALL_EXPR_ARG (exp, i);
+      rtx op = expand_normal (arg);
+      enum machine_mode mode = insn_p->operand[i + 1].mode;
+      bool match = (*insn_p->operand[i + 1].predicate) (op, mode);
 
-  if (! (*insn_data[icode].operand[3].predicate) (op2, mode3))
-    switch (icode)
-      {
-      case CODE_FOR_sse4_1_blendvpd:
-      case CODE_FOR_sse4_1_blendvps:
-      case CODE_FOR_sse4_1_pblendvb:
-	op2 = copy_to_mode_reg (mode3, op2);
-	break;
+      if (last_arg_constant && (i + 1) == nargs)
+	{
+	  if (!match)
+	    switch (icode)
+	      {
+	      case CODE_FOR_sse4_1_roundpd:
+	      case CODE_FOR_sse4_1_roundps:
+	      case CODE_FOR_sse4_1_roundsd:
+	      case CODE_FOR_sse4_1_roundss:
+	      case CODE_FOR_sse4_1_blendps:
+		error ("the last argument must be a 4-bit immediate");
+		return const0_rtx;
+
+	      case CODE_FOR_sse4_1_blendpd:
+		error ("the last argument must be a 2-bit immediate");
+		return const0_rtx;
+
+	     default:
+		error ("the last argument must be an 8-bit immediate");
+		return const0_rtx;
+	      }
+	}
+      else
+	{
+	  if (VECTOR_MODE_P (mode))
+	    op = safe_vector_operand (op, mode);
 
-      case CODE_FOR_sse4_1_roundsd:
-      case CODE_FOR_sse4_1_roundss:
-      case CODE_FOR_sse4_1_blendps:
-	error ("the third argument must be a 4-bit immediate");
-	return const0_rtx;
-
-      case CODE_FOR_sse4_1_blendpd:
-	error ("the third argument must be a 2-bit immediate");
-	return const0_rtx;
+	  /* If we aren't optimizing, only allow one memory operand to
+	     be generated.  */
+	  if (memory_operand (op, mode))
+	    num_memory++;
 
-      default:
-	error ("the third argument must be an 8-bit immediate");
-	return const0_rtx;
-      }
+	  gcc_assert (GET_MODE (op) == mode
+		      || GET_MODE (op) == VOIDmode);
+
+	  if (optimize || !match || num_memory > 1)
+	    op = copy_to_mode_reg (mode, op);
+	}
+
+      args[i].op = op;
+      args[i].mode = mode;
+    }
+
+  switch (nargs)
+    {
+    case 1:
+      pat = GEN_FCN (icode) (target, args[0].op);
+      break;
+    case 2:
+      pat = GEN_FCN (icode) (target, args[0].op, args[1].op);
+      break;
+    case 3:
+      pat = GEN_FCN (icode) (target, args[0].op, args[1].op,
+			     args[2].op);
+      break;
+    default:
+      gcc_unreachable ();
+    }
 
-  pat = GEN_FCN (icode) (target, op0, op1, op2);
   if (! pat)
     return 0;
+
   emit_insn (pat);
   return target;
 }
@@ -20453,28 +20523,7 @@ ix86_expand_unop_builtin (enum insn_code
 	op0 = copy_to_mode_reg (mode0, op0);
     }
 
-  switch (icode)
-    {
-    case CODE_FOR_sse4_1_roundpd:
-    case CODE_FOR_sse4_1_roundps:
-	{
-	  tree arg1 = CALL_EXPR_ARG (exp, 1);
-	  rtx op1 = expand_normal (arg1);
-	  enum machine_mode mode1 = insn_data[icode].operand[2].mode;
-
-	  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
-	    {
-	      error ("the second argument must be a 4-bit immediate");
-	      return const0_rtx;
-	    }
-	  pat = GEN_FCN (icode) (target, op0, op1);
-	}
-      break;
-    default:
-      pat = GEN_FCN (icode) (target, op0);
-      break;
-    }
-
+  pat = GEN_FCN (icode) (target, op0);
   if (! pat)
     return 0;
   emit_insn (pat);
@@ -21262,10 +21311,6 @@ ix86_expand_builtin (tree exp, rtx targe
 					     exp, target);
       break;
 
-    case IX86_BUILTIN_AESKEYGENASSIST128:
-      return ix86_expand_binop_imm_builtin (CODE_FOR_aeskeygenassist,
-					     exp, target);
-
     case IX86_BUILTIN_FEMMS:
       emit_insn (gen_mmx_femms ());
       return NULL_RTX;
@@ -21624,12 +21669,15 @@ ix86_expand_builtin (tree exp, rtx targe
       break;
     }
 
-  for (i = 0, d = bdesc_sse_3arg;
-       i < ARRAY_SIZE (bdesc_sse_3arg);
+  for (i = 0, d = bdesc_sse_args;
+       i < ARRAY_SIZE (bdesc_sse_args);
        i++, d++)
     if (d->code == fcode)
-      return ix86_expand_sse_4_operands_builtin (d->icode, exp,
-						 target);
+      {
+	enum sse_builtin_type type = (enum sse_builtin_type ) d->flag;
+	return ix86_expand_sse_operands_builtin (d->icode, exp,
+						 type, target);
+      }
 
   for (i = 0, d = bdesc_2arg; i < ARRAY_SIZE (bdesc_2arg); i++, d++)
     if (d->code == fcode)


* Re: PATCH: Simplify SSE builtin handling
       [not found]   ` <5787cf470804170046t66ef5a0dt10111c4f70abf8ef@mail.gmail.com>
@ 2008-04-17 15:21     ` H.J. Lu
  0 siblings, 0 replies; 4+ messages in thread
From: H.J. Lu @ 2008-04-17 15:21 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

On Thu, Apr 17, 2008 at 09:46:41AM +0200, Uros Bizjak wrote:
> On Thu, Apr 17, 2008 at 7:28 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> 
> >  2008-04-16  H.J. Lu  <hongjiu.lu@intel.com>
> >
> >         * config/i386/i386.c (sse_builtin_type): New.
> >         (bdesc_sse_args): Likewise.
> >         (bdesc_sse_3arg): Removed.
> >         (bdesc_2arg): Remove IX86_BUILTIN_AESKEYGENASSIST128.
> >         (bdesc_1arg): Remove IX86_BUILTIN_ROUNDPD and
> >         IX86_BUILTIN_ROUNDPS.
> >         (ix86_init_mmx_sse_builtins): Handle bdesc_sse_args.  Remove
> >         bdesc_sse_3arg.  Remove IX86_BUILTIN_ROUNDPD and
> >         IX86_BUILTIN_ROUNDPS.
> >         (ix86_expand_sse_4_operands_builtin): Removed.
> >         (ix86_expand_sse_operands_builtin): New.
> >         (ix86_expand_unop_builtin): Remove CODE_FOR_sse4_1_roundpd
> >         and CODE_FOR_sse4_1_roundps.
> >         (ix86_expand_builtin): Remove IX86_BUILTIN_AESKEYGENASSIST128.
> >         Handle bdesc_sse_args.  Remove bdesc_sse_3arg.
> 
> OK for mainline, but:
> 
> >  +enum sse_builtin_type
> >  +{
> >  +  sse_func_unknown,
> >  +  v4sf_func_v4sf_int,
> >  +  v2di_func_v2di_int,
> >  +  v2df_func_v2df_int,
> >  +  v16qi_func_v16qi_v16qi_v16qi,
> >  +  v4sf_func_v4sf_v4sf_v4sf,
> >  +  v2df_func_v2df_v2df_v2df,
> >  +  v16qi_func_v16qi_v16qi_int,
> >  +  v8hi_func_v8hi_v8hi_int,
> >  +  v4si_func_v4si_v4si_int,
> >  +  v4sf_func_v4sf_v4sf_int,
> >  +  v2di_func_v2di_v2di_int,
> >  +  v2df_func_v2df_v2df_int
> >  +};
> 
> Please change these names to ALL_CAPS, since they are members of an enum.
> 

This is the patch I am checking in. I also used the form V4SF_FTYPE_V4SF_INT
so that each enumerator matches the name of the corresponding function
type node, e.g. v4sf_ftype_v4sf_int.
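
The naming scheme can be sketched outside of GCC as follows.  This is a
minimal, self-contained illustration and not the code from the patch: the
enum is trimmed to three entries, and struct sse_builtin_shape and
sse_builtin_shape_for are hypothetical names introduced only for this
example.  It shows how the function type alone can tell the expander how
many arguments to expect and whether the last one must be an immediate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trimmed-down copy of the function-type enum (illustration only).  */
enum sse_builtin_type
{
  SSE_CTYPE_UNKNOWN,
  V4SF_FTYPE_V4SF_INT,
  V2DF_FTYPE_V2DF_V2DF_V2DF,
  V2DF_FTYPE_V2DF_V2DF_INT
};

/* Hypothetical helper type: what the expander derives from the enum.  */
struct sse_builtin_shape
{
  unsigned int nargs;
  bool last_arg_constant;
};

/* Hypothetical helper: fill in SHAPE for TYPE.  Returns false for a
   type this sketch does not know about.  */
static bool
sse_builtin_shape_for (enum sse_builtin_type type,
		       struct sse_builtin_shape *shape)
{
  assert (shape != NULL);

  switch (type)
    {
    case V4SF_FTYPE_V4SF_INT:
      /* One vector argument followed by an immediate.  */
      shape->nargs = 2;
      shape->last_arg_constant = true;
      return true;
    case V2DF_FTYPE_V2DF_V2DF_V2DF:
      /* Three vector arguments, no immediate.  */
      shape->nargs = 3;
      shape->last_arg_constant = false;
      return true;
    case V2DF_FTYPE_V2DF_V2DF_INT:
      /* Two vector arguments followed by an immediate.  */
      shape->nargs = 3;
      shape->last_arg_constant = true;
      return true;
    default:
      return false;
    }
}
```

For instance, sse_builtin_shape_for (V2DF_FTYPE_V2DF_V2DF_INT, &s) fills in
nargs = 3 and last_arg_constant = true, which corresponds to the third group
of cases in the switch in ix86_expand_sse_operands_builtin.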

Thanks.


H.J.
----
2008-04-16  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (sse_builtin_type): New.
	(bdesc_sse_args): Likewise.
	(bdesc_sse_3arg): Removed.
	(bdesc_2arg): Remove IX86_BUILTIN_AESKEYGENASSIST128.
	(bdesc_1arg): Remove IX86_BUILTIN_ROUNDPD and
	IX86_BUILTIN_ROUNDPS.
	(ix86_init_mmx_sse_builtins): Handle bdesc_sse_args.  Remove
	bdesc_sse_3arg.  Remove IX86_BUILTIN_ROUNDPD and
	IX86_BUILTIN_ROUNDPS.
	(ix86_expand_sse_4_operands_builtin): Removed.
	(ix86_expand_sse_operands_builtin): New.
	(ix86_expand_unop_builtin): Remove CODE_FOR_sse4_1_roundpd
	and CODE_FOR_sse4_1_roundps.
	(ix86_expand_builtin): Remove IX86_BUILTIN_AESKEYGENASSIST128.
	Handle bdesc_sse_args.  Remove bdesc_sse_3arg.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 2189)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18196,31 +18196,56 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_2, CODE_FOR_sse4_2_crc32di, 0, IX86_BUILTIN_CRC32DI, UNKNOWN, 0 },
 };
 
-/* SSE builtins with 3 arguments and the last argument must be an immediate or xmm0.  */
-static const struct builtin_description bdesc_sse_3arg[] =
+/* SSE */
+enum sse_builtin_type
+{
+  SSE_CTYPE_UNKNOWN,
+  V4SF_FTYPE_V4SF_INT,
+  V2DI_FTYPE_V2DI_INT,
+  V2DF_FTYPE_V2DF_INT,
+  V16QI_FTYPE_V16QI_V16QI_V16QI,
+  V4SF_FTYPE_V4SF_V4SF_V4SF,
+  V2DF_FTYPE_V2DF_V2DF_V2DF,
+  V16QI_FTYPE_V16QI_V16QI_INT,
+  V8HI_FTYPE_V8HI_V8HI_INT,
+  V4SI_FTYPE_V4SI_V4SI_INT,
+  V4SF_FTYPE_V4SF_V4SF_INT,
+  V2DI_FTYPE_V2DI_V2DI_INT,
+  V2DF_FTYPE_V2DF_V2DF_INT
+};
+
+/* SSE builtins with variable number of arguments.  */
+static const struct builtin_description bdesc_sse_args[] =
 {
   /* SSE */
-  { OPTION_MASK_ISA_SSE, CODE_FOR_sse_shufps, "__builtin_ia32_shufps", IX86_BUILTIN_SHUFPS, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE, CODE_FOR_sse_shufps, "__builtin_ia32_shufps", IX86_BUILTIN_SHUFPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
 
   /* SSE2 */
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_shufpd, "__builtin_ia32_shufpd", IX86_BUILTIN_SHUFPD, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_shufpd, "__builtin_ia32_shufpd", IX86_BUILTIN_SHUFPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
 
   /* SSE4.1 */
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendpd, "__builtin_ia32_blendpd", IX86_BUILTIN_BLENDPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendps, "__builtin_ia32_blendps", IX86_BUILTIN_BLENDPS, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvpd, "__builtin_ia32_blendvpd", IX86_BUILTIN_BLENDVPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvps, "__builtin_ia32_blendvps", IX86_BUILTIN_BLENDVPS, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dppd, "__builtin_ia32_dppd", IX86_BUILTIN_DPPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_insertps, "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_mpsadbw, "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendvb, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, "__builtin_ia32_roundsd", IX86_BUILTIN_ROUNDSD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, "__builtin_ia32_roundss", IX86_BUILTIN_ROUNDSS, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendpd, "__builtin_ia32_blendpd", IX86_BUILTIN_BLENDPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendps, "__builtin_ia32_blendps", IX86_BUILTIN_BLENDPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvpd, "__builtin_ia32_blendvpd", IX86_BUILTIN_BLENDVPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_blendvps, "__builtin_ia32_blendvps", IX86_BUILTIN_BLENDVPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dppd, "__builtin_ia32_dppd", IX86_BUILTIN_DPPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_insertps, "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_mpsadbw, "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendvb, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI },
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT },
+
+  /* SSE4.1 and SSE5 */
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_roundpd", IX86_BUILTIN_ROUNDPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_INT },
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, "__builtin_ia32_roundps", IX86_BUILTIN_ROUNDPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_INT },
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundsd, "__builtin_ia32_roundsd", IX86_BUILTIN_ROUNDSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
+  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundss, "__builtin_ia32_roundss", IX86_BUILTIN_ROUNDSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT },
+
+  /* AES */
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, (int) V2DI_FTYPE_V2DI_INT },
 
   /* PCLMUL */
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_pclmulqdq, 0, IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_pclmulqdq, 0, IX86_BUILTIN_PCLMULQDQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT },
 };
 
 static const struct builtin_description bdesc_2arg[] =
@@ -18507,7 +18532,6 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesenclast, 0, IX86_BUILTIN_AESENCLAST128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesdec, 0, IX86_BUILTIN_AESDEC128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesdeclast, 0, IX86_BUILTIN_AESDECLAST128, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_aeskeygenassist, 0, IX86_BUILTIN_AESKEYGENASSIST128, UNKNOWN, 0 },
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -18582,10 +18606,6 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_zero_extendv2siv2di2, 0, IX86_BUILTIN_PMOVZXDQ128, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_phminposuw, "__builtin_ia32_phminposuw128", IX86_BUILTIN_PHMINPOSUW128, UNKNOWN, 0 },
 
-  /* Fake 1 arg builtins with a constant smaller than 8 bits as the 2nd arg.  */
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundpd, 0, IX86_BUILTIN_ROUNDPD, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_roundps, 0, IX86_BUILTIN_ROUNDPS, UNKNOWN, 0 },
-
   /* AES */
   { OPTION_MASK_ISA_SSE2, CODE_FOR_aesimc, 0, IX86_BUILTIN_AESIMC128, UNKNOWN, 0 },
 };
@@ -19387,61 +19407,58 @@ ix86_init_mmx_sse_builtins (void)
       def_builtin_const (OPTION_MASK_ISA_64BIT, "__builtin_copysignq", ftype, IX86_BUILTIN_COPYSIGNQ);
     }
 
-  /* Add all SSE builtins that are more or less simple operations on
-     three operands.  */
-  for (i = 0, d = bdesc_sse_3arg;
-       i < ARRAY_SIZE (bdesc_sse_3arg);
+  /* Add all SSE builtins with variable number of operands.  */
+  for (i = 0, d = bdesc_sse_args;
+       i < ARRAY_SIZE (bdesc_sse_args);
        i++, d++)
     {
-      /* Use one of the operands; the target can have a different mode for
-	 mask-generating compares.  */
-      enum machine_mode mode;
       tree type;
 
       if (d->name == 0)
 	continue;
-      mode = insn_data[d->icode].operand[1].mode;
 
-      switch (mode)
+      switch ((enum sse_builtin_type) d->flag)
 	{
-	case V16QImode:
+	case V4SF_FTYPE_V4SF_INT:
+	  type = v4sf_ftype_v4sf_int;
+	  break;
+	case V2DI_FTYPE_V2DI_INT:
+	  type = v2di_ftype_v2di_int;
+	  break;
+	case V2DF_FTYPE_V2DF_INT:
+	  type = v2df_ftype_v2df_int;
+	  break;
+	case V16QI_FTYPE_V16QI_V16QI_V16QI:
+	  type = v16qi_ftype_v16qi_v16qi_v16qi;
+	  break;
+	case V4SF_FTYPE_V4SF_V4SF_V4SF:
+	  type = v4sf_ftype_v4sf_v4sf_v4sf;
+	  break;
+	case V2DF_FTYPE_V2DF_V2DF_V2DF:
+	  type = v2df_ftype_v2df_v2df_v2df;
+	  break;
+	case V16QI_FTYPE_V16QI_V16QI_INT:
 	  type = v16qi_ftype_v16qi_v16qi_int;
 	  break;
-	case V8HImode:
+	case V8HI_FTYPE_V8HI_V8HI_INT:
 	  type = v8hi_ftype_v8hi_v8hi_int;
 	  break;
-	case V4SImode:
+	case V4SI_FTYPE_V4SI_V4SI_INT:
 	  type = v4si_ftype_v4si_v4si_int;
 	  break;
-	case V2DImode:
+	case V4SF_FTYPE_V4SF_V4SF_INT:
+	  type = v4sf_ftype_v4sf_v4sf_int;
+	  break;
+	case V2DI_FTYPE_V2DI_V2DI_INT:
 	  type = v2di_ftype_v2di_v2di_int;
 	  break;
-	case V2DFmode:
+	case V2DF_FTYPE_V2DF_V2DF_INT:
 	  type = v2df_ftype_v2df_v2df_int;
 	  break;
-	case V4SFmode:
-	  type = v4sf_ftype_v4sf_v4sf_int;
-	  break;
 	default:
 	  gcc_unreachable ();
 	}
 
-      /* Override for variable blends.  */
-      switch (d->icode)
-	{
-	case CODE_FOR_sse4_1_blendvpd:
-	  type = v2df_ftype_v2df_v2df_v2df;
-	  break;
-	case CODE_FOR_sse4_1_blendvps:
-	  type = v4sf_ftype_v4sf_v4sf_v4sf;
-	  break;
-	case CODE_FOR_sse4_1_pblendvb:
-	  type = v16qi_ftype_v16qi_v16qi_v16qi;
-	  break;
-	default:
-	  break;
-	}
-
       def_builtin_const (d->mask, d->name, type, d->code);
     }
 
@@ -19798,10 +19815,6 @@ ix86_init_mmx_sse_builtins (void)
   def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmovzxdq128", v2di_ftype_v4si, IX86_BUILTIN_PMOVZXDQ128);
   def_builtin_const (OPTION_MASK_ISA_SSE4_1, "__builtin_ia32_pmuldq128", v2di_ftype_v4si_v4si, IX86_BUILTIN_PMULDQ128);
 
-  /* SSE4.1 and SSE5 */
-  def_builtin_const (OPTION_MASK_ISA_ROUND, "__builtin_ia32_roundpd", v2df_ftype_v2df_int, IX86_BUILTIN_ROUNDPD);
-  def_builtin_const (OPTION_MASK_ISA_ROUND, "__builtin_ia32_roundps", v4sf_ftype_v4sf_int, IX86_BUILTIN_ROUNDPS);
-
   /* SSE4.2. */
   ftype = build_function_type_list (unsigned_type_node,
 				    unsigned_type_node,
@@ -20019,71 +20032,128 @@ safe_vector_operand (rtx x, enum machine
 }
 
 /* Subroutine of ix86_expand_builtin to take care of SSE insns with
-   4 operands. The third argument must be a constant smaller than 8
-   bits or xmm0.  */
+   variable number of operands.  */
 
 static rtx
-ix86_expand_sse_4_operands_builtin (enum insn_code icode, tree exp,
-				    rtx target)
+ix86_expand_sse_operands_builtin (enum insn_code icode, tree exp,
+				  enum sse_builtin_type type,
+				  rtx target)
 {
   rtx pat;
-  tree arg0 = CALL_EXPR_ARG (exp, 0);
-  tree arg1 = CALL_EXPR_ARG (exp, 1);
-  tree arg2 = CALL_EXPR_ARG (exp, 2);
-  rtx op0 = expand_normal (arg0);
-  rtx op1 = expand_normal (arg1);
-  rtx op2 = expand_normal (arg2);
-  enum machine_mode tmode = insn_data[icode].operand[0].mode;
-  enum machine_mode mode1 = insn_data[icode].operand[1].mode;
-  enum machine_mode mode2 = insn_data[icode].operand[2].mode;
-  enum machine_mode mode3 = insn_data[icode].operand[3].mode;
+  unsigned int i, nargs;
+  int num_memory = 0;
+  struct
+    {
+      rtx op;
+      enum machine_mode mode;
+    } args[3];
+  bool last_arg_constant = false;
+  const struct insn_data *insn_p = &insn_data[icode];
+  enum machine_mode tmode = insn_p->operand[0].mode;
 
-  if (VECTOR_MODE_P (mode1))
-    op0 = safe_vector_operand (op0, mode1);
-  if (VECTOR_MODE_P (mode2))
-    op1 = safe_vector_operand (op1, mode2);
-  if (VECTOR_MODE_P (mode3))
-    op2 = safe_vector_operand (op2, mode3);
+  switch (type)
+    {
+    case V4SF_FTYPE_V4SF_INT:
+    case V2DI_FTYPE_V2DI_INT:
+    case V2DF_FTYPE_V2DF_INT:
+      nargs = 2;
+      last_arg_constant = true;
+      break;
+    case V16QI_FTYPE_V16QI_V16QI_V16QI:
+    case V4SF_FTYPE_V4SF_V4SF_V4SF:
+    case V2DF_FTYPE_V2DF_V2DF_V2DF:
+      nargs = 3;
+      break;
+    case V16QI_FTYPE_V16QI_V16QI_INT:
+    case V8HI_FTYPE_V8HI_V8HI_INT:
+    case V4SI_FTYPE_V4SI_V4SI_INT:
+    case V4SF_FTYPE_V4SF_V4SF_INT:
+    case V2DI_FTYPE_V2DI_V2DI_INT:
+    case V2DF_FTYPE_V2DF_V2DF_INT:
+      nargs = 3;
+      last_arg_constant = true;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  gcc_assert (nargs <= ARRAY_SIZE (args));
 
   if (optimize
       || target == 0
       || GET_MODE (target) != tmode
-      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+      || ! (*insn_p->operand[0].predicate) (target, tmode))
     target = gen_reg_rtx (tmode);
 
-  if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
-    op0 = copy_to_mode_reg (mode1, op0);
-  if ((optimize && !register_operand (op1, mode2))
-      || !(*insn_data[icode].operand[2].predicate) (op1, mode2))
-    op1 = copy_to_mode_reg (mode2, op1);
+  for (i = 0; i < nargs; i++)
+    {
+      tree arg = CALL_EXPR_ARG (exp, i);
+      rtx op = expand_normal (arg);
+      enum machine_mode mode = insn_p->operand[i + 1].mode;
+      bool match = (*insn_p->operand[i + 1].predicate) (op, mode);
 
-  if (! (*insn_data[icode].operand[3].predicate) (op2, mode3))
-    switch (icode)
-      {
-      case CODE_FOR_sse4_1_blendvpd:
-      case CODE_FOR_sse4_1_blendvps:
-      case CODE_FOR_sse4_1_pblendvb:
-	op2 = copy_to_mode_reg (mode3, op2);
-	break;
+      if (last_arg_constant && (i + 1) == nargs)
+	{
+	  if (!match)
+	    switch (icode)
+	      {
+	      case CODE_FOR_sse4_1_roundpd:
+	      case CODE_FOR_sse4_1_roundps:
+	      case CODE_FOR_sse4_1_roundsd:
+	      case CODE_FOR_sse4_1_roundss:
+	      case CODE_FOR_sse4_1_blendps:
+		error ("the last argument must be a 4-bit immediate");
+		return const0_rtx;
+
+	      case CODE_FOR_sse4_1_blendpd:
+		error ("the last argument must be a 2-bit immediate");
+		return const0_rtx;
+
+	     default:
+		error ("the last argument must be an 8-bit immediate");
+		return const0_rtx;
+	      }
+	}
+      else
+	{
+	  if (VECTOR_MODE_P (mode))
+	    op = safe_vector_operand (op, mode);
 
-      case CODE_FOR_sse4_1_roundsd:
-      case CODE_FOR_sse4_1_roundss:
-      case CODE_FOR_sse4_1_blendps:
-	error ("the third argument must be a 4-bit immediate");
-	return const0_rtx;
-
-      case CODE_FOR_sse4_1_blendpd:
-	error ("the third argument must be a 2-bit immediate");
-	return const0_rtx;
+	  /* If we aren't optimizing, only allow one memory operand to
+	     be generated.  */
+	  if (memory_operand (op, mode))
+	    num_memory++;
 
-      default:
-	error ("the third argument must be an 8-bit immediate");
-	return const0_rtx;
-      }
+	  gcc_assert (GET_MODE (op) == mode
+		      || GET_MODE (op) == VOIDmode);
+
+	  if (optimize || !match || num_memory > 1)
+	    op = copy_to_mode_reg (mode, op);
+	}
+
+      args[i].op = op;
+      args[i].mode = mode;
+    }
+
+  switch (nargs)
+    {
+    case 1:
+      pat = GEN_FCN (icode) (target, args[0].op);
+      break;
+    case 2:
+      pat = GEN_FCN (icode) (target, args[0].op, args[1].op);
+      break;
+    case 3:
+      pat = GEN_FCN (icode) (target, args[0].op, args[1].op,
+			     args[2].op);
+      break;
+    default:
+      gcc_unreachable ();
+    }
 
-  pat = GEN_FCN (icode) (target, op0, op1, op2);
   if (! pat)
     return 0;
+
   emit_insn (pat);
   return target;
 }
@@ -20453,28 +20523,7 @@ ix86_expand_unop_builtin (enum insn_code
 	op0 = copy_to_mode_reg (mode0, op0);
     }
 
-  switch (icode)
-    {
-    case CODE_FOR_sse4_1_roundpd:
-    case CODE_FOR_sse4_1_roundps:
-	{
-	  tree arg1 = CALL_EXPR_ARG (exp, 1);
-	  rtx op1 = expand_normal (arg1);
-	  enum machine_mode mode1 = insn_data[icode].operand[2].mode;
-
-	  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
-	    {
-	      error ("the second argument must be a 4-bit immediate");
-	      return const0_rtx;
-	    }
-	  pat = GEN_FCN (icode) (target, op0, op1);
-	}
-      break;
-    default:
-      pat = GEN_FCN (icode) (target, op0);
-      break;
-    }
-
+  pat = GEN_FCN (icode) (target, op0);
   if (! pat)
     return 0;
   emit_insn (pat);
@@ -21262,10 +21311,6 @@ ix86_expand_builtin (tree exp, rtx targe
 					     exp, target);
       break;
 
-    case IX86_BUILTIN_AESKEYGENASSIST128:
-      return ix86_expand_binop_imm_builtin (CODE_FOR_aeskeygenassist,
-					     exp, target);
-
     case IX86_BUILTIN_FEMMS:
       emit_insn (gen_mmx_femms ());
       return NULL_RTX;
@@ -21624,12 +21669,15 @@ ix86_expand_builtin (tree exp, rtx targe
       break;
     }
 
-  for (i = 0, d = bdesc_sse_3arg;
-       i < ARRAY_SIZE (bdesc_sse_3arg);
+  for (i = 0, d = bdesc_sse_args;
+       i < ARRAY_SIZE (bdesc_sse_args);
        i++, d++)
     if (d->code == fcode)
-      return ix86_expand_sse_4_operands_builtin (d->icode, exp,
-						 target);
+      {
+	enum sse_builtin_type type = (enum sse_builtin_type ) d->flag;
+	return ix86_expand_sse_operands_builtin (d->icode, exp,
+						 type, target);
+      }
 
   for (i = 0, d = bdesc_2arg; i < ARRAY_SIZE (bdesc_2arg); i++, d++)
     if (d->code == fcode)


* RE: PATCH: Simplify SSE builtin handling
  2008-04-17  7:32 ` H.J. Lu
       [not found]   ` <5787cf470804170046t66ef5a0dt10111c4f70abf8ef@mail.gmail.com>
@ 2008-04-17 19:04   ` Meissner, Michael
  1 sibling, 0 replies; 4+ messages in thread
From: Meissner, Michael @ 2008-04-17 19:04 UTC (permalink / raw)
  To: H.J. Lu, GCC Patches, Uros Bizjak


> -----Original Message-----
> From: H.J. Lu [mailto:hjl.tools@gmail.com]
> Sent: Thursday, April 17, 2008 1:28 AM
> To: GCC Patches; Meissner, Michael; Uros Bizjak
> Subject: Re: PATCH: Simplify SSE builtin handling
> 
> On Wed, Apr 16, 2008 at 11:51:21AM -0700, H.J. Lu wrote:
> > Hi,
> >
> > There are many special treatments for various SSE builtins. SSE5 introduced
> > ix86_expand_multi_arg_builtin, which simplified SSE5 builtin handling.  This
> > patch adds ix86_expand_sse_operands_builtin, which is very similar to
> > ix86_expand_multi_arg_builtin, but a little bit more flexible. It can handle
> > any SSE builtins without type mismatch. Each builtin function type is encoded
> > as "return_func_args" in enum.  OK for trunk?
> >
> > Thanks.
> >
> >
> 
> Here is the update which bootstrapped on Linux/ia32 and Linux/x86-64.
>

At some point I probably should move the calls to
ix86_expand_multi_arg_builtin to ix86_expand_sse_operands_builtin, but I
have another SSE5 patch that I would like to get out of the way first.
The patch looks good to me.
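
For anyone comparing the two expanders, the rule that
ix86_expand_sse_operands_builtin applies to its non-immediate operands
(copy to a register when optimizing, when the operand predicate rejects
the operand, or once more than one memory operand has been seen) can be
modelled in isolation.  The sketch below is a toy model under stated
assumptions, not GCC code: struct toy_operand and count_forced_to_reg are
hypothetical names, and the two flags stand in for the results of
memory_operand and the insn operand predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an expanded, non-immediate operand.  */
struct toy_operand
{
  bool is_memory;		/* models memory_operand (op, mode) */
  bool matches_predicate;	/* models the insn operand predicate */
};

/* Count how many of the N operands in OPS would be copied into a fresh
   register, following the condition used in the patch:
   optimize || !match || num_memory > 1.  */
static unsigned int
count_forced_to_reg (const struct toy_operand *ops, unsigned int n,
		     bool optimize)
{
  unsigned int i, forced = 0;
  int num_memory = 0;

  assert (ops != NULL || n == 0);

  for (i = 0; i < n; i++)
    {
      /* Track memory operands so that at most one is left in place.  */
      if (ops[i].is_memory)
	num_memory++;

      if (optimize || !ops[i].matches_predicate || num_memory > 1)
	forced++;
    }

  return forced;
}
```

With two memory operands that both satisfy their predicates, the model
copies only the second one when not optimizing, and both when optimizing.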



Thread overview: 4+ messages
2008-04-16 21:10 PATCH: Simplify SSE builtin handling H.J. Lu
2008-04-17  7:32 ` H.J. Lu
     [not found]   ` <5787cf470804170046t66ef5a0dt10111c4f70abf8ef@mail.gmail.com>
2008-04-17 15:21     ` H.J. Lu
2008-04-17 19:04   ` Meissner, Michael
