[PATCH, rs6000] gcc mainline, add builtin support for vec_float, vec_float2, vec_floate, vec


* [PATCH, rs6000] gcc mainline, add builtin support for vec_float, vec_float2, vec_floate, vec_floate, builtins
@ 2017-06-09 18:20 Carl E. Love
  2017-06-09 21:05 ` Segher Boessenkool
  0 siblings, 1 reply; 8+ messages in thread
From: Carl E. Love @ 2017-06-09 18:20 UTC (permalink / raw)
  To: gcc-patches, David Edelsohn, Segher Boessenkool; +Cc: Bill Schmidt, cel

GCC Maintainers:

This patch adds support for the vec_float, vec_float2, vec_floate and
vec_floato builtins.
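
As a rough guide to what the new builtins return for long long inputs,
here is a scalar reference model in plain C.  It is only a sketch based
on the expected values in the runnable test below: the ref_* names are
hypothetical, element indices are the ones the test uses with
vec_result[i], and the lanes the test does not check are simply zeroed.

```c
/* Scalar reference model (illustrative only; the ref_* names are not
   part of the patch).  */

/* vec_float2: { a[0], a[1], b[0], b[1] }, each converted to float.  */
void
ref_float2 (float out[4], const long long a[2], const long long b[2])
{
  out[0] = (float) a[0];
  out[1] = (float) a[1];
  out[2] = (float) b[0];
  out[3] = (float) b[1];
}

/* vec_floate: the converted values land in elements 0 and 2.  Elements
   1 and 3 are not checked by the test, so the model zeroes them.  */
void
ref_floate (float out[4], const long long a[2])
{
  out[0] = (float) a[0];
  out[1] = 0.0f;
  out[2] = (float) a[1];
  out[3] = 0.0f;
}

/* vec_floato: the converted values land in elements 1 and 3.  Elements
   0 and 2 are not checked by the test, so the model zeroes them.  */
void
ref_floato (float out[4], const long long a[2])
{
  out[0] = 0.0f;
  out[1] = (float) a[0];
  out[2] = 0.0f;
  out[3] = (float) a[1];
}
```

The unsigned and vector double variants follow the same lane layout with
a different source element type.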

The patch has been tested on powerpc64le-unknown-linux-gnu (Power 8 LE)
and on powerpc64-unknown-linux-gnu (Power 8 BE) with no regressions.

Is the patch OK for gcc mainline?

                  Carl Love
--------------------------------------------------------------

gcc/ChangeLog:

2017-06-09  Carl Love  <cel@us.ibm.com>

	* config/rs6000/rs6000-c.c: Add definitions for the vec_float,
	vec_float2, vec_floato, vec_floate built-ins.
	* config/rs6000/vsx.md: Add RTL code for instructions vsx_xvcvsxwsp,
	vsx_xvcvuxwsp, float2, floato and floate.
	* config/rs6000/rs6000-builtin.def: Add definitions for vsx_xvcvsxwsp,
	vsx_xvcvuxwsp, float2, floato and floate.
	* config/rs6000/altivec.md: Add version of p8_vmrgew that takes V4SF args and
	returns V4SF.
	* config/rs6000/altivec.h: Add builtin defines for vec_float,
	vec_float2, vec_floate and vec_floato.
	* doc/extend.texi: Update the built-in documentation file for the
	new built-in functions.

gcc/testsuite/ChangeLog:

2017-06-09  Carl Love  <cel@us.ibm.com>

	* gcc.target/powerpc/builtins-3-runnable.c: Add runnable tests for
	vec_float, vec_float2, vec_floate and vec_floato built-ins.
---
 gcc/config/rs6000/altivec.h                        |   4 +
 gcc/config/rs6000/altivec.md                       |  14 +-
 gcc/config/rs6000/rs6000-builtin.def               |  19 ++-
 gcc/config/rs6000/rs6000-c.c                       |  28 +++-
 gcc/config/rs6000/rs6000-protos.h                  |   1 +
 gcc/config/rs6000/rs6000.c                         |  44 ++++-
 gcc/config/rs6000/vsx.md                           | 177 +++++++++++++++++++++
 gcc/doc/extend.texi                                |  14 ++
 .../gcc.target/powerpc/builtins-3-runnable.c       |  82 ++++++++++
 9 files changed, 371 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 20050eb..d542315 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -133,6 +133,10 @@
 #define vec_doublel __builtin_vec_doublel
 #define vec_doubleh __builtin_vec_doubleh
 #define vec_expte __builtin_vec_expte
+#define vec_float __builtin_vec_float
+#define vec_float2 __builtin_vec_float2
+#define vec_floate __builtin_vec_floate
+#define vec_floato __builtin_vec_floato
 #define vec_floor __builtin_vec_floor
 #define vec_loge __builtin_vec_loge
 #define vec_madd __builtin_vec_madd
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 487b9a4..25b2768 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1316,13 +1316,13 @@
 }
   [(set_attr "type" "vecperm")])
 
-;; Power8 vector merge even/odd
-(define_insn "p8_vmrgew"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-	(vec_select:V4SI
-	  (vec_concat:V8SI
-	    (match_operand:V4SI 1 "register_operand" "v")
-	    (match_operand:V4SI 2 "register_operand" "v"))
+;; Power8 vector merge two V4SF/V4SI even words to V4SF
+(define_insn "p8_vmrgew_<mode>"
+  [(set (match_operand:VSX_W 0 "register_operand" "=v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 1 "register_operand" "v")
+	    (match_operand:VSX_W 2 "register_operand" "v"))
 	  (parallel [(const_int 0) (const_int 4)
 		     (const_int 2) (const_int 6)])))]
   "TARGET_P8_VECTOR"
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 241c439..4682628 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1591,6 +1591,8 @@ BU_VSX_2 (CMPLE_U16QI,        "cmple_u16qi",    CONST,  vector_ngtuv16qi)
 BU_VSX_2 (CMPLE_U8HI,         "cmple_u8hi",     CONST,  vector_ngtuv8hi)
 BU_VSX_2 (CMPLE_U4SI,         "cmple_u4si",     CONST,  vector_ngtuv4si)
 BU_VSX_2 (CMPLE_U2DI,         "cmple_u2di",     CONST,  vector_ngtuv2di)
+BU_VSX_2 (FLOAT2_V2DI,        "float2_v2di",    CONST,  float2_v2di)
+BU_VSX_2 (UNS_FLOAT2_V2DI,    "uns_float2_v2di",    CONST,  uns_float2_v2di)
 
 /* VSX abs builtin functions.  */
 BU_VSX_A (XVABSDP,	      "xvabsdp",	CONST,	absv2df2)
@@ -1648,6 +1650,16 @@ BU_VSX_1 (XVCVSPSXDS,	      "xvcvspsxds",	CONST,	vsx_xvcvspsxds)
 BU_VSX_1 (XVCVSPUXDS,	      "xvcvspuxds",	CONST,	vsx_xvcvspuxds)
 BU_VSX_1 (XVCVSXDSP,	      "xvcvsxdsp",	CONST,	vsx_xvcvsxdsp)
 BU_VSX_1 (XVCVUXDSP,	      "xvcvuxdsp",	CONST,	vsx_xvcvuxdsp)
+
+BU_VSX_1 (XVCVSXWSP_V4SF,  "vsx_xvcvsxwsp",   CONST,	vsx_xvcvsxwsp)
+BU_VSX_1 (XVCVUXWSP_V4SF,  "vsx_xvcvuxwsp",   CONST,	vsx_xvcvuxwsp)
+BU_VSX_1 (FLOATE_V2DI,     "floate_v2di",     CONST,	floatev2di)
+BU_VSX_1 (FLOATE_V2DF,     "floate_v2df",     CONST,	floatev2df)
+BU_VSX_1 (FLOATO_V2DI,     "floato_v2di",     CONST,	floatov2di)
+BU_VSX_1 (FLOATO_V2DF,     "floato_v2df",     CONST,	floatov2df)
+BU_VSX_1 (UNS_FLOATO_V2DI, "uns_floato_v2di", CONST,	unsfloatov2di)
+BU_VSX_1 (UNS_FLOATE_V2DI, "uns_floate_v2di", CONST,	unsfloatev2di)
+
 BU_VSX_1 (XVRSPI,	      "xvrspi",		CONST,	vsx_xvrspi)
 BU_VSX_1 (XVRSPIC,	      "xvrspic",	CONST,	vsx_xvrspic)
 BU_VSX_1 (XVRSPIM,	      "xvrspim",	CONST,	vsx_floorv4sf2)
@@ -1760,6 +1772,8 @@ BU_VSX_OVERLOAD_2 (XXMRGHW,  "xxmrghw")
 BU_VSX_OVERLOAD_2 (XXMRGLW,  "xxmrglw")
 BU_VSX_OVERLOAD_2 (XXSPLTD,  "xxspltd")
 BU_VSX_OVERLOAD_2 (XXSPLTW,  "xxspltw")
+BU_VSX_OVERLOAD_2 (FLOAT2,   "float2")
+BU_VSX_OVERLOAD_2 (UNS_FLOAT2,   "uns_float2")
 
 /* 1 argument VSX overloaded builtin functions.  */
 BU_VSX_OVERLOAD_1 (DOUBLE,   "double")
@@ -1771,6 +1785,9 @@ BU_VSX_OVERLOAD_1 (DOUBLEH,  "doubleh")
 BU_VSX_OVERLOAD_1 (UNS_DOUBLEH,  "uns_doubleh")
 BU_VSX_OVERLOAD_1 (DOUBLEL,  "doublel")
 BU_VSX_OVERLOAD_1 (UNS_DOUBLEL,  "uns_doublel")
+BU_VSX_OVERLOAD_1 (FLOAT,  "float")
+BU_VSX_OVERLOAD_1 (FLOATE,  "floate")
+BU_VSX_OVERLOAD_1 (FLOATO,  "floato")
 
 /* VSX builtins that are handled as special cases.  */
 BU_VSX_OVERLOAD_X (LD,	     "ld")
@@ -1812,7 +1829,7 @@ BU_P8V_AV_2 (VMINSD,		"vminsd",	CONST,	sminv2di3)
 BU_P8V_AV_2 (VMAXSD,		"vmaxsd",	CONST,	smaxv2di3)
 BU_P8V_AV_2 (VMINUD,		"vminud",	CONST,	uminv2di3)
 BU_P8V_AV_2 (VMAXUD,		"vmaxud",	CONST,	umaxv2di3)
-BU_P8V_AV_2 (VMRGEW,		"vmrgew",	CONST,	p8_vmrgew)
+BU_P8V_AV_2 (VMRGEW_V4SI,	"vmrgew_v4si",	CONST,	p8_vmrgew_v4si)
 BU_P8V_AV_2 (VMRGOW,		"vmrgow",	CONST,	p8_vmrgow)
 BU_P8V_AV_2 (VBPERMQ,		"vbpermq",	CONST,	altivec_vbpermq)
 BU_P8V_AV_2 (VBPERMQ2,		"vbpermq2",	CONST,	altivec_vbpermq2)
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index f1e8d3d..19f6d9c 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -1538,6 +1538,28 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { VSX_BUILTIN_VEC_DOUBLEL, VSX_BUILTIN_DOUBLEL_V4SF,
     RS6000_BTI_V2DF, RS6000_BTI_V4SF, 0, 0 },
 
+  { VSX_BUILTIN_VEC_FLOAT, VSX_BUILTIN_XVCVSXWSP_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SI, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOAT, VSX_BUILTIN_XVCVUXWSP_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOAT2, VSX_BUILTIN_FLOAT2_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { VSX_BUILTIN_VEC_FLOAT2, VSX_BUILTIN_UNS_FLOAT2_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_FLOATE, VSX_BUILTIN_FLOATE_V2DF,
+    RS6000_BTI_V4SF, RS6000_BTI_V2DF, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOATE, VSX_BUILTIN_FLOATE_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_V2DI, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOATE, VSX_BUILTIN_UNS_FLOATE_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_unsigned_V2DI, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOATO, VSX_BUILTIN_FLOATO_V2DF,
+    RS6000_BTI_V4SF, RS6000_BTI_V2DF, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOATO, VSX_BUILTIN_FLOATO_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_V2DI, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOATO, VSX_BUILTIN_UNS_FLOATO_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
   { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V2DF,
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
   { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V2DI,
@@ -5262,12 +5284,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0 },
 
-  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW_V4SI,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
-  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW_V4SI,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
-  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW_V4SI,
     RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, 0 },
 
   { P8V_BUILTIN_VEC_VMRGOW, P8V_BUILTIN_VMRGOW,
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 8a231f5..8165d04 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -72,6 +72,7 @@ extern void altivec_expand_stvex_be (rtx, rtx, machine_mode, unsigned);
 extern void rs6000_expand_extract_even (rtx, rtx, rtx);
 extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);
 extern void rs6000_scale_v2df (rtx, rtx, int);
+extern void rs6000_generate_float2_code (bool, rtx, rtx, rtx);
 extern int expand_block_clear (rtx[]);
 extern int expand_block_move (rtx[]);
 extern bool expand_block_compare (rtx[]);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 941c0c2..f193025 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -36798,7 +36798,7 @@ altivec_expand_vec_perm_const (rtx operands[4])
       (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct
        : CODE_FOR_altivec_vmrghw_direct),
       {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
-    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew,
+    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew_v4si,
       {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
     { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgow,
       {  4,  5,  6,  7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31 } }
@@ -42389,6 +42389,48 @@ rs6000_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
   *update = build2 (COMPOUND_EXPR, void_type_node, update_mffs, update_mtfsf);
 }
 
+void
+rs6000_generate_float2_code (bool signed_convert, rtx dst, rtx src1, rtx src2)
+{
+  rtx rtx_tmp0, rtx_tmp1, rtx_tmp2, rtx_tmp3;
+
+  rtx_tmp0 = gen_reg_rtx (V2DImode);
+  rtx_tmp1 = gen_reg_rtx (V2DImode);
+
+  /* The vector merge instruction vmrgew swaps the 2nd and 3rd words,
+     compensate by swapping the 64-bit elements around to negate the vmrgew
+     swap. */
+  if (VECTOR_ELT_ORDER_BIG)
+    {
+      emit_insn (gen_vsx_xxpermdi_v2di_be (rtx_tmp0, src1, src2, GEN_INT(0)));
+      emit_insn (gen_vsx_xxpermdi_v2di_be (rtx_tmp1, src1, src2, GEN_INT(3)));
+    }
+  else
+    {
+      emit_insn (gen_vsx_xxpermdi_v2di (rtx_tmp0, src1, src2, GEN_INT(3)));
+      emit_insn (gen_vsx_xxpermdi_v2di (rtx_tmp1, src1, src2, GEN_INT(0)));
+    }
+
+  rtx_tmp2 = gen_reg_rtx (V4SFmode);
+  rtx_tmp3 = gen_reg_rtx (V4SFmode);
+
+  if (signed_convert)
+    {
+      emit_insn (gen_vsx_xvcvsxdsp (rtx_tmp2, rtx_tmp0));
+      emit_insn (gen_vsx_xvcvsxdsp (rtx_tmp3, rtx_tmp1));
+    }
+  else
+    {
+       emit_insn (gen_vsx_xvcvuxdsp (rtx_tmp2, rtx_tmp0));
+       emit_insn (gen_vsx_xvcvuxdsp (rtx_tmp3, rtx_tmp1));
+    }
+
+  if (VECTOR_ELT_ORDER_BIG)
+    emit_insn (gen_p8_vmrgew_v4sf (dst, rtx_tmp2, rtx_tmp3));
+  else
+    emit_insn (gen_p8_vmrgew_v4sf (dst, rtx_tmp3, rtx_tmp2));
+}
+
 /* Implement the TARGET_OPTAB_SUPPORTED_P hook.  */
 
 static bool
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 141aa42..342cc3d 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -310,6 +310,10 @@
 ;; Iterator for the 2 short vector types to do a splat from an integer
 (define_mode_iterator VSX_SPLAT_I [V16QI V8HI])
 
+;; Mode iterator and attribute for vector floate and floato conversions
+(define_mode_iterator VFC [V2DI V2DF])
+(define_mode_attr VFC_inst [(V2DI "sxd") (V2DF "dp")])
+
 ;; Mode attribute to give the count for the splat instruction to splat
 ;; the value in the 64-bit integer slot
 (define_mode_attr VSX_SPLAT_COUNT [(V16QI "7") (V8HI "3")])
@@ -331,6 +335,14 @@
    UNSPEC_VSX_CVUXDSP
    UNSPEC_VSX_CVSPSXDS
    UNSPEC_VSX_CVSPUXDS
+   UNSPEC_VSX_CVSXWSP
+   UNSPEC_VSX_CVUXWSP
+   UNSPEC_VSX_FLOAT2
+   UNSPEC_VSX_UNS_FLOAT2
+   UNSPEC_VSX_FLOATE
+   UNSPEC_VSX_UNS_FLOATE
+   UNSPEC_VSX_FLOATO
+   UNSPEC_VSX_UNS_FLOATO
    UNSPEC_VSX_TDIV
    UNSPEC_VSX_TSQRT
    UNSPEC_VSX_SET
@@ -1976,6 +1988,171 @@
   "xvcvspuxds %x0,%x1"
   [(set_attr "type" "vecdouble")])
 
+(define_insn "vsx_xvcvsxwsp"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=v")
+	(unspec:V4SF[(match_operand:V4SI 1 "vsx_register_operand" "v")]
+		    UNSPEC_VSX_CVSXWSP))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "xvcvsxwsp %x0,%x1"
+  [(set_attr "type" "vecdouble")])
+
+(define_insn "vsx_xvcvuxwsp"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=v")
+	(unspec:V4SF[(match_operand:V4SI 1 "vsx_register_operand" "v")]
+		    UNSPEC_VSX_CVUXWSP))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "xvcvuxwsp %x0,%x1"
+  [(set_attr "type" "vecdouble")])
+
+;; Generate float2
+;; convert two long long signed ints to float
+(define_expand "float2_v2di"
+  [(match_operand:V4SF 0 "register_operand" "=v")
+   (unspec:V4SI [(match_operand:V2DI 1 "register_operand" "v")
+                 (match_operand:V2DI 2 "register_operand" "v")]
+  UNSPEC_VSX_FLOAT2)]
+
+  "TARGET_VSX"
+{
+  rtx rtx_src1, rtx_src2, rtx_dst;
+
+  rtx_dst = operands[0];
+  rtx_src1 = operands[1];
+  rtx_src2 = operands[2];
+
+  rs6000_generate_float2_code (true, rtx_dst, rtx_src1, rtx_src2);
+  DONE;
+})
+
+;; Generate uns_float2
+;; convert two long long unsigned ints to float
+(define_expand "uns_float2_v2di"
+  [(match_operand:V4SF 0 "register_operand" "=v")
+   (unspec:V4SI [(match_operand:V2DI 1 "register_operand" "v")
+                 (match_operand:V2DI 2 "register_operand" "v")]
+  UNSPEC_VSX_UNS_FLOAT2)]
+
+  "TARGET_VSX"
+{
+  rtx rtx_src1, rtx_src2, rtx_dst;
+
+  rtx_dst = operands[0];
+  rtx_src1 = operands[1];
+  rtx_src2 = operands[2];
+
+  rs6000_generate_float2_code (false, rtx_dst, rtx_src1, rtx_src2);
+  DONE;
+})
+
+;; Generate floate
+;; convert  double or long long signed to float
+;;(Only even words are valid, BE numbering)
+(define_expand "floate<mode>"
+  [(match_operand:V4SF 0 "register_operand" "=v")
+   (unspec:V4SF [(match_operand:VFC 1 "register_operand" "v")]
+   UNSPEC_VSX_FLOATE)]
+   "TARGET_VSX"
+{
+  if (VECTOR_ELT_ORDER_BIG)
+    {
+      /* Shift left one word to put even word correct location */
+	rtx rtx_tmp;
+	rtx rtx_val = GEN_INT (4);
+
+	rtx_tmp = gen_reg_rtx (V4SFmode);
+	emit_insn (gen_vsx_xvcv<VFC_inst>sp (rtx_tmp, operands[1]));
+	emit_insn (gen_altivec_vsldoi_v4sf (operands[0],
+		   rtx_tmp, rtx_tmp, rtx_val));
+    }
+  else
+    {
+	emit_insn (gen_vsx_xvcv<VFC_inst>sp (operands[0], operands[1]));
+    }
+DONE;
+})
+
+;; Generate uns_floate
+;; convert long long unsigned to float
+;;(Only even words are valid, BE numbering)
+(define_expand "unsfloatev2di"
+  [(match_operand:V4SF 0 "register_operand" "=v")
+   (unspec:V4SF [(match_operand:V2DI 1 "register_operand" "v")]
+   UNSPEC_VSX_UNS_FLOATE)]
+   "TARGET_VSX"
+{
+  if (VECTOR_ELT_ORDER_BIG)
+    {
+      /* Shift left one word to put even word correct location */
+      rtx rtx_tmp;
+      rtx rtx_val = GEN_INT (4);
+
+      rtx_tmp = gen_reg_rtx (V4SFmode);
+      emit_insn (gen_vsx_xvcvuxdsp (rtx_tmp, operands[1]));
+      emit_insn (gen_altivec_vsldoi_v4sf (operands[0],
+                 rtx_tmp, rtx_tmp, rtx_val));
+    }
+  else
+    {
+      emit_insn (gen_vsx_xvcvuxdsp (operands[0], operands[1]));
+    }
+  DONE;
+})
+
+;; Generate floato
+;; convert double or long long signed to float
+;; (Only odd words are valid, BE numbering)
+(define_expand "floato<mode>"
+  [(match_operand:V4SF 0 "register_operand" "=v")
+   (unspec:V4SF [(match_operand:VFC 1 "register_operand" "v")]
+   UNSPEC_VSX_FLOATO)]
+  "TARGET_VSX"
+{
+  if (VECTOR_ELT_ORDER_BIG)
+    {
+      emit_insn (gen_vsx_xvcv<VFC_inst>sp (operands[0], operands[1]));
+    }
+  else
+    {
+      /* Shift left one word to put odd word correct location */
+      rtx rtx_tmp;
+      rtx rtx_val = GEN_INT (4);
+
+      rtx_tmp = gen_reg_rtx (V4SFmode);
+      emit_insn (gen_vsx_xvcv<VFC_inst>sp (rtx_tmp, operands[1]));
+      emit_insn (gen_altivec_vsldoi_v4sf (operands[0],
+                 rtx_tmp, rtx_tmp, rtx_val));
+    }
+  DONE;
+})
+
+;; Generate uns_floato
+;; convert long long unsigned to float
+;; (Only odd words are valid, BE numbering)
+(define_expand "unsfloatov2di"
+  [(match_operand:V4SF 0 "register_operand" "=v")
+   (unspec:V4SF[(match_operand:V2DI 1 "register_operand" "v")]
+   UNSPEC_VSX_UNS_FLOATO)]
+
+  "TARGET_VSX"
+{
+  if (VECTOR_ELT_ORDER_BIG)
+    {
+      emit_insn (gen_vsx_xvcvuxdsp (operands[0], operands[1]));
+    }
+  else
+    {
+      /* Shift left one word to put odd word correct location */
+      rtx rtx_tmp;
+      rtx rtx_val = GEN_INT (4);
+
+      rtx_tmp = gen_reg_rtx (V4SFmode);
+      emit_insn (gen_vsx_xvcvuxdsp (rtx_tmp, operands[1]));
+      emit_insn (gen_altivec_vsldoi_v4sf (operands[0],
+                 rtx_tmp, rtx_tmp, rtx_val));
+    }
+  DONE;
+})
+
 ;; Only optimize (float (fix x)) -> frz if we are in fast-math mode, since
 ;; since the xvrdpiz instruction does not truncate the value if the floating
 ;; point value is < LONG_MIN or > LONG_MAX.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7d39335..a662aeb 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -16039,6 +16039,20 @@ vector float vec_expte (vector float);
 
 vector float vec_floor (vector float);
 
+vector float vec_float (vector signed int);
+vector float vec_float (vector unsigned int);
+
+vector float vec_float2 (vector signed long long, vector signed long long);
+vector float vec_float2 (vector unsigned long long, vector unsigned long long);
+
+vector float vec_floate (vector double);
+vector float vec_floate (vector signed long long);
+vector float vec_floate (vector unsigned long long);
+
+vector float vec_floato (vector double);
+vector float vec_floato (vector signed long long);
+vector float vec_floato (vector unsigned long long);
+
 vector float vec_ld (int, const vector float *);
 vector float vec_ld (int, const float *);
 vector bool int vec_ld (int, const vector bool int *);
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
index 60ec617..8e09a92 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
@@ -5,8 +5,37 @@
 
 #include <altivec.h> // vector
 
+#define ALL  1
+#define EVEN 2
+#define ODD  3
+
 void abort (void);
 
+void test_result_sp(int check, vector float vec_result, vector float vec_expected)
+{
+   int i;
+   for(i = 0; i<4; i++) {
+
+      switch (check) {
+      case ALL:
+         break;
+      case EVEN:
+         if (i%2 == 0)
+            break;
+         else
+            continue;
+      case ODD:
+         if (i%2 != 0)
+            break;
+         else
+            continue;
+      }
+
+      if (vec_result[i] != vec_expected[i])
+         abort();
+   }
+}
+
 void test_result_dp(vector double vec_result, vector double vec_expected)
 {
 	if (vec_result[0] != vec_expected[0])
@@ -21,11 +50,17 @@ int main()
 	int i;
 	vector unsigned int vec_unint;
 	vector signed int vec_int;
+	vector long long int vec_ll_int0, vec_ll_int1;
+	vector long long unsigned int vec_ll_uns_int0, vec_ll_uns_int1;
 	vector float  vec_flt, vec_flt_result, vec_flt_expected;
 	vector double vec_dble0, vec_dble1, vec_dble_result, vec_dble_expected;
 
 	vec_int = (vector signed int){ -1, 3, -5, 1234567 };
+	vec_ll_int0 = (vector long long int){ -12, -12345678901234 };
+	vec_ll_int1 = (vector long long int){ 12, 9876543210 };
 	vec_unint = (vector unsigned int){ 9, 11, 15, 2468013579 };
+	vec_ll_uns_int0 = (vector unsigned long long int){ 102, 9753108642 };
+	vec_ll_uns_int1 = (vector unsigned long long int){ 23, 29 };
 	vec_flt = (vector float){ -21., 3.5, -53., 78. };
 	vec_dble0 = (vector double){ 34.0, 97.0 };
 	vec_dble1 = (vector double){ 214.0, -5.5 };
@@ -81,4 +116,51 @@ int main()
 	vec_dble_result = vec_doubleh (vec_unint);
 	test_result_dp(vec_dble_result, vec_dble_expected);
 
+	vec_dble_expected = (vector double){-21.000000, 3.500000};
+	vec_dble_result = vec_doubleh (vec_flt);
+	test_result_dp(vec_dble_result, vec_dble_expected);
+
+	/* conversion of integer vector to single precision float vector */
+	vec_flt_expected = (vector float){-1.00, 3.00, -5.00, 1234567.00};
+	vec_flt_result = vec_float (vec_int);
+	test_result_sp(ALL, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){9.00, 11.00, 15.00, 2468013579.0};
+	vec_flt_result = vec_float (vec_unint);
+	test_result_sp(ALL, vec_flt_result, vec_flt_expected);
+   
+	/* conversion of two long long vectors to single precision vector */
+	vec_flt_expected = (vector float){-12.00, -12345678901234.00, 12.00, 9876543210.00};
+	vec_flt_result = vec_float2 (vec_ll_int0, vec_ll_int1);
+	test_result_sp(ALL, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){102.00, 9753108642.00, 23.00, 29.00};
+	vec_flt_result = vec_float2 (vec_ll_uns_int0, vec_ll_uns_int1);
+	test_result_sp(ALL, vec_flt_result, vec_flt_expected);
+
+	/* conversion of even words in double precision vector to single precision vector */
+	vec_flt_expected = (vector float){-12.00, 00.00, -12345678901234.00, 0.00};
+	vec_flt_result = vec_floate (vec_ll_int0);
+	test_result_sp(EVEN, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){102.00, 0.00, 9753108642.00, 0.00};
+	vec_flt_result = vec_floate (vec_ll_uns_int0);
+	test_result_sp(EVEN, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){34.00, 0.00, 97.00, 0.00};
+	vec_flt_result = vec_floate (vec_dble0);
+	test_result_sp(EVEN, vec_flt_result, vec_flt_expected);
+
+	/* conversion of odd words in double precision vector to single precision vector */
+	vec_flt_expected = (vector float){0.00, -12.00, 00.00, -12345678901234.00};
+	vec_flt_result = vec_floato (vec_ll_int0);
+	test_result_sp(ODD, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){0.00, 102.00, 0.00, 9753108642.00};
+	vec_flt_result = vec_floato (vec_ll_uns_int0);
+	test_result_sp(ODD, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){0.00, 34.00, 0.00, 97.00};
+	vec_flt_result = vec_floato (vec_dble0);
+	test_result_sp(ODD, vec_flt_result, vec_flt_expected);
 }
-- 
1.9.1




* Re: [PATCH, rs6000] gcc mainline, add builtin support for vec_float, vec_float2, vec_floate, vec_floate, builtins
  2017-06-09 18:20 [PATCH, rs6000] gcc mainline, add builtin support for vec_float, vec_float2, vec_floate, vec_floate, builtins Carl E. Love
@ 2017-06-09 21:05 ` Segher Boessenkool
  2017-06-09 23:12   ` [PATCH v2, " Carl E. Love
  0 siblings, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2017-06-09 21:05 UTC (permalink / raw)
  To: Carl E. Love; +Cc: gcc-patches, David Edelsohn, Bill Schmidt

Hi Carl,

A couple of issues, most small:

On Fri, Jun 09, 2017 at 11:20:43AM -0700, Carl E. Love wrote:
> +void
> +rs6000_generate_float2_code (bool signed_convert, rtx dst, rtx src1, rtx src2)
> +{
> +  rtx rtx_tmp0, rtx_tmp1, rtx_tmp2, rtx_tmp3;
> +
> +  rtx_tmp0 = gen_reg_rtx (V2DImode);
> +  rtx_tmp1 = gen_reg_rtx (V2DImode);
> +
> +  /* The vector merge instruction vmrgew swaps the 2nd and 3rd words,
> +     compensate by swapping the 64-bit elements around to negate the vmrgew
> +     swap. */

This comment isn't very clear to me...  Could you expand it a bit?

Oh, and dot space space.

> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -310,6 +310,10 @@
>  ;; Iterator for the 2 short vector types to do a splat from an integer
>  (define_mode_iterator VSX_SPLAT_I [V16QI V8HI])
>  
> +;; Mode iterator and attribute for vector floate and floato conversions
> +(define_mode_iterator VFC [V2DI V2DF])
> +(define_mode_attr VFC_inst [(V2DI "sxd") (V2DF "dp")])

.._sxddp, like VS_sxswp in altivec.md?  The iterator is just VSX_D.

Maybe some or all of these iterators/attrs should live in vector.md?

Is it really useful to have separate files altivec.md and vsx.md anymore?
Or should some things be moved?  This is a general question, not really
something to be handled in this patch ;-)

> +(define_insn "vsx_xvcvsxwsp"
> +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=v")
> +	(unspec:V4SF[(match_operand:V4SI 1 "vsx_register_operand" "v")]
> +		    UNSPEC_VSX_CVSXWSP))]
> +  "VECTOR_UNIT_VSX_P (V4SFmode)"
> +  "xvcvsxwsp %x0,%x1"
> +  [(set_attr "type" "vecdouble")])

"v" is only the VRs...  Do you want "wa" or similar instead?

(Same question for everything, the expanders as well).

> +(define_expand "floate<mode>"
> +  [(match_operand:V4SF 0 "register_operand" "=v")
> +   (unspec:V4SF [(match_operand:VFC 1 "register_operand" "v")]
> +   UNSPEC_VSX_FLOATE)]
> +   "TARGET_VSX"
> +{
> +  if (VECTOR_ELT_ORDER_BIG)
> +    {
> +      /* Shift left one word to put even word correct location */
> +	rtx rtx_tmp;

This indent is incorrect.  Indent once after {, not twice.

> +DONE;
> +})

And an indent before DONE.

> +;; Generate uns_floate
> +;; convert long long unsigned to float
> +;;(Only even words are valid, BE numbering)

Space after ;; like on the previous lines.

> +  else
> +    {
> +      emit_insn (gen_vsx_xvcvuxdsp (operands[0], operands[1]));
> +    }

Don't make blocks for single statements, unless that really helps reading
(say, if there is a big comment before that statement).
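
The formatting points above (one indent level after a brace, two spaces
after the period that ends a comment, no braces around a single
statement) can be seen together in a small C function.  This is only an
illustration: floate_shift_bytes is a hypothetical helper, not something
the patch defines, and it merely restates the byte count the floate
expander passes to vsldoi.

```c
#include <stdbool.h>

/* Hypothetical helper: return the vsldoi byte count used by the floate
   expander quoted above, or 0 when no shift is emitted.  */
int
floate_shift_bytes (bool elt_order_big)
{
  /* The expander shifts by one word (4 bytes) for big-endian element
     order and emits no shift otherwise.  */
  if (elt_order_big)
    return 4;
  else
    return 0;
}
```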

Thanks,


Segher


* Re: [PATCH v2, rs6000] gcc mainline, add builtin support for vec_float, vec_float2, vec_floate, vec_floate, builtins
  2017-06-09 21:05 ` Segher Boessenkool
@ 2017-06-09 23:12   ` Carl E. Love
  2017-06-10  0:19     ` Segher Boessenkool
  2017-06-12 18:09     ` Michael Meissner
  0 siblings, 2 replies; 8+ messages in thread
From: Carl E. Love @ 2017-06-09 23:12 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, David Edelsohn, Bill Schmidt

GCC Maintainers:

On Fri, 2017-06-09 at 16:05 -0500, Segher Boessenkool wrote:

Fixed the various formatting (spaces) issues.  Been toying with how to
write a space checker for patches.  Have to take some time to really
think about how to do that....
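
A minimal sketch of what such a checker could look at, assuming it is
fed one line of a unified diff at a time.  The function name and the
flag values are made up for illustration, and it covers only two rules:
trailing whitespace on added lines, and a comment that closes with one
space after the final period.  GCC's contrib directory also carries a
check_GNU_style script that covers similar ground.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values for the issues this sketch can report.  */
enum
{
  STYLE_TRAILING_WS = 1,
  STYLE_COMMENT_END = 2
};

/* Check one line of a unified diff and return a mask of STYLE_* flags.
   Only added lines are examined; "+++" file headers are skipped.  */
int
check_patch_line (const char *line)
{
  int issues = 0;
  size_t len = strlen (line);
  const char *end;

  if (line[0] != '+' || strncmp (line, "+++", 3) == 0)
    return 0;

  /* Ignore a trailing newline, if any.  */
  if (len > 0 && line[len - 1] == '\n')
    len--;

  /* A blank or tab at the end of the added text.  */
  if (len > 1 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
    issues |= STYLE_TRAILING_WS;

  /* A comment closed with one space after the final period instead of
     the two that GNU style asks for.  */
  end = strstr (line, "*/");
  if (end != NULL && end - line >= 3 && end[-1] == ' ' && end[-2] == '.')
    issues |= STYLE_COMMENT_END;

  return issues;
}
```

Indentation depth and brace placement need more context than a single
line, so they are left out of this sketch.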

> > +
> > +  /* The vector merge instruction vmrgew swaps the 2nd and 3rd words,
> > +     compensate by swapping the 64-bit elements around to negate the vmrgew
> > +     swap. */
> 
> This comment isn't very clear to me...  Could you expand it a bit?

Reworked it, hopefully it explains things better.
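
For reference, the data flow in rs6000_generate_float2_code for the
big-endian element order (VECTOR_ELT_ORDER_BIG) can be modelled in plain
C as below.  This is a sketch of one reading of the code, not the
reworked comment itself.  The model_* names are made up for
illustration.  It assumes the conversion leaves each result in the even
word of its doubleword; the odd words are zeroed in the model and never
read by the merge.  The little-endian path, where the selectors and the
merge operands are swapped, is not modelled.

```c
/* xxpermdi: the high selector bit picks the doubleword taken from A,
   the low selector bit picks the doubleword taken from B.  */
void
model_xxpermdi (long long out[2], const long long a[2],
		const long long b[2], int sel)
{
  out[0] = a[(sel >> 1) & 1];
  out[1] = b[sel & 1];
}

/* xvcvsxdsp: convert each doubleword to float and place it in the even
   word of that doubleword (words 0 and 2).  The odd words are zeroed
   here as a modelling choice.  */
void
model_xvcvsxdsp (float out[4], const long long in[2])
{
  out[0] = (float) in[0];
  out[1] = 0.0f;
  out[2] = (float) in[1];
  out[3] = 0.0f;
}

/* vmrgew: select words 0, 4, 2, 6 of the concatenation of A and B, as
   in the vec_select of the p8_vmrgew_<mode> pattern.  */
void
model_vmrgew (float out[4], const float a[4], const float b[4])
{
  out[0] = a[0];
  out[1] = b[0];
  out[2] = a[2];
  out[3] = b[2];
}

/* Signed float2, big-endian element order only.  */
void
model_float2 (float dst[4], const long long src1[2],
	      const long long src2[2])
{
  long long tmp0[2], tmp1[2];
  float tmp2[4], tmp3[4];

  model_xxpermdi (tmp0, src1, src2, 0);	/* { src1[0], src2[0] }  */
  model_xxpermdi (tmp1, src1, src2, 3);	/* { src1[1], src2[1] }  */
  model_xvcvsxdsp (tmp2, tmp0);
  model_xvcvsxdsp (tmp3, tmp1);
  model_vmrgew (dst, tmp2, tmp3);
}
```

Without the initial doubleword permute, merging the even words of the
two converted sources directly would give { src1[0], src2[0], src1[1],
src2[1] }, with the 2nd and 3rd words in the wrong order.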

> > +;; Mode iterator and attribute for vector floate and floato conversions
> > +(define_mode_iterator VFC [V2DI V2DF])
> > +(define_mode_attr VFC_inst [(V2DI "sxd") (V2DF "dp")])
> 
> .._sxddp, like VS_sxswp in altivec.md?  The iterator is just VSX_D.
> 
> Maybe some or all of these iterators/attrs should live in vector.md?
> 
> Is it really useful to have separate files altivec.md and vsx.md anymore?
> Or should some things be moved?  This is a general question, not really
> something to be handled in this patch ;-)
> 

Yea, I find searching through all the files rather hard.  Perhaps
putting all the definitions into a single "header" file?  That way it
could span across all of the .md files.  If you combined vsx.md and
altivec.md, you would have a really large file.  Big files can be
problematic in their own right.

> > +(define_insn "vsx_xvcvsxwsp"
> > +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=v")
> > +	(unspec:V4SF[(match_operand:V4SI 1 "vsx_register_operand" "v")]
> > +		    UNSPEC_VSX_CVSXWSP))]
> > +  "VECTOR_UNIT_VSX_P (V4SFmode)"
> > +  "xvcvsxwsp %x0,%x1"
> > +  [(set_attr "type" "vecdouble")])
> 
> "v" is only the VRs...  Do you want "wa" or similar instead?
> 

I went back and re-studied the Power register constraints.  I find them a
bit confusing, though I am sure they are perfectly clear to everyone else.
The instructions all take VSX registers, so "wa" should be fine if I
understand it correctly.  I am not sure there is any need to constrain
further with "vs" for doubles or "ww", but I think you could.

I retested the changes on powerpc64le-unknown-linux-gnu (Power 8 LE)
only. 

Please let me know if the updated patch is OK for gcc mainline.

                             Carl Love
-------------------------------------------------------------------
From 3378d779286284183a4dc30a7a5dd10fa30671ff Mon Sep 17 00:00:00 2001
From: Carl Love <carll@us.ibm.com>
Date: Fri, 9 Jun 2017 17:58:23 -0500
Subject: [PATCH] Add vec_float, vec_float2, vec_floate, vec_floate, builtin
 support.

gcc/ChangeLog:

2017-06-09  Carl Love  <cel@us.ibm.com>

	* config/rs6000/rs6000-c.c: Add definitions for the vec_float,
	vec_float2, vec_floato, vec_floate built-ins.
	* config/rs6000/vsx.md: Add RTL code for instructions vsx_xvcvsxwsp,
	vsx_xvcvuxwsp, float2, floato and floate.
	* config/rs6000/rs6000-builtin.def: Add definitions for vsx_xvcvsxwsp,
	vsx_xvcvuxwsp, float2, floato and floate.
	* config/rs6000/altivec.md: Add version of p8_vmrgew that takes V4SF args and
	returns V4SF.
	* config/rs6000/altivec.h: Add builtin defines for vec_float,
	vec_float2, vec_floate and vec_floato.
	* doc/extend.texi: Update the built-in documentation file for the
	new built-in functions.

gcc/testsuite/ChangeLog:

2017-06-09  Carl Love  <cel@us.ibm.com>

	* gcc.target/powerpc/builtins-3-runnable.c: Add runnable tests for
	vec_float, vec_float2, vec_floate and vec_floato built-ins.

Signed-off-by: Carl Love <carll@us.ibm.com>
---
 gcc/config/rs6000/altivec.h                        |   4 +
 gcc/config/rs6000/altivec.md                       |  14 +-
 gcc/config/rs6000/rs6000-builtin.def               |  19 ++-
 gcc/config/rs6000/rs6000-c.c                       |  28 +++-
 gcc/config/rs6000/rs6000-protos.h                  |   1 +
 gcc/config/rs6000/rs6000.c                         |  45 +++++-
 gcc/config/rs6000/vsx.md                           | 175 +++++++++++++++++++++
 gcc/doc/extend.texi                                |  14 ++
 .../gcc.target/powerpc/builtins-3-runnable.c       |  82 ++++++++++
 9 files changed, 370 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 20050eb..d542315 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -133,6 +133,10 @@
 #define vec_doublel __builtin_vec_doublel
 #define vec_doubleh __builtin_vec_doubleh
 #define vec_expte __builtin_vec_expte
+#define vec_float __builtin_vec_float
+#define vec_float2 __builtin_vec_float2
+#define vec_floate __builtin_vec_floate
+#define vec_floato __builtin_vec_floato
 #define vec_floor __builtin_vec_floor
 #define vec_loge __builtin_vec_loge
 #define vec_madd __builtin_vec_madd
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 487b9a4..25b2768 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1316,13 +1316,13 @@
 }
   [(set_attr "type" "vecperm")])
 
-;; Power8 vector merge even/odd
-(define_insn "p8_vmrgew"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-	(vec_select:V4SI
-	  (vec_concat:V8SI
-	    (match_operand:V4SI 1 "register_operand" "v")
-	    (match_operand:V4SI 2 "register_operand" "v"))
+;; Power8 vector merge two V4SF/V4SI even words to V4SF
+(define_insn "p8_vmrgew_<mode>"
+  [(set (match_operand:VSX_W 0 "register_operand" "=v")
+	(vec_select:VSX_W
+	  (vec_concat:<VS_double>
+	    (match_operand:VSX_W 1 "register_operand" "v")
+	    (match_operand:VSX_W 2 "register_operand" "v"))
 	  (parallel [(const_int 0) (const_int 4)
 		     (const_int 2) (const_int 6)])))]
   "TARGET_P8_VECTOR"
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 241c439..4682628 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1591,6 +1591,8 @@ BU_VSX_2 (CMPLE_U16QI,        "cmple_u16qi",    CONST,  vector_ngtuv16qi)
 BU_VSX_2 (CMPLE_U8HI,         "cmple_u8hi",     CONST,  vector_ngtuv8hi)
 BU_VSX_2 (CMPLE_U4SI,         "cmple_u4si",     CONST,  vector_ngtuv4si)
 BU_VSX_2 (CMPLE_U2DI,         "cmple_u2di",     CONST,  vector_ngtuv2di)
+BU_VSX_2 (FLOAT2_V2DI,        "float2_v2di",    CONST,  float2_v2di)
+BU_VSX_2 (UNS_FLOAT2_V2DI,    "uns_float2_v2di",    CONST,  uns_float2_v2di)
 
 /* VSX abs builtin functions.  */
 BU_VSX_A (XVABSDP,	      "xvabsdp",	CONST,	absv2df2)
@@ -1648,6 +1650,16 @@ BU_VSX_1 (XVCVSPSXDS,	      "xvcvspsxds",	CONST,	vsx_xvcvspsxds)
 BU_VSX_1 (XVCVSPUXDS,	      "xvcvspuxds",	CONST,	vsx_xvcvspuxds)
 BU_VSX_1 (XVCVSXDSP,	      "xvcvsxdsp",	CONST,	vsx_xvcvsxdsp)
 BU_VSX_1 (XVCVUXDSP,	      "xvcvuxdsp",	CONST,	vsx_xvcvuxdsp)
+
+BU_VSX_1 (XVCVSXWSP_V4SF,  "vsx_xvcvsxwsp",   CONST,	vsx_xvcvsxwsp)
+BU_VSX_1 (XVCVUXWSP_V4SF,  "vsx_xvcvuxwsp",   CONST,	vsx_xvcvuxwsp)
+BU_VSX_1 (FLOATE_V2DI,     "floate_v2di",     CONST,	floatev2di)
+BU_VSX_1 (FLOATE_V2DF,     "floate_v2df",     CONST,	floatev2df)
+BU_VSX_1 (FLOATO_V2DI,     "floato_v2di",     CONST,	floatov2di)
+BU_VSX_1 (FLOATO_V2DF,     "floato_v2df",     CONST,	floatov2df)
+BU_VSX_1 (UNS_FLOATO_V2DI, "uns_floato_v2di", CONST,	unsfloatov2di)
+BU_VSX_1 (UNS_FLOATE_V2DI, "uns_floate_v2di", CONST,	unsfloatev2di)
+
 BU_VSX_1 (XVRSPI,	      "xvrspi",		CONST,	vsx_xvrspi)
 BU_VSX_1 (XVRSPIC,	      "xvrspic",	CONST,	vsx_xvrspic)
 BU_VSX_1 (XVRSPIM,	      "xvrspim",	CONST,	vsx_floorv4sf2)
@@ -1760,6 +1772,8 @@ BU_VSX_OVERLOAD_2 (XXMRGHW,  "xxmrghw")
 BU_VSX_OVERLOAD_2 (XXMRGLW,  "xxmrglw")
 BU_VSX_OVERLOAD_2 (XXSPLTD,  "xxspltd")
 BU_VSX_OVERLOAD_2 (XXSPLTW,  "xxspltw")
+BU_VSX_OVERLOAD_2 (FLOAT2,   "float2")
+BU_VSX_OVERLOAD_2 (UNS_FLOAT2,   "uns_float2")
 
 /* 1 argument VSX overloaded builtin functions.  */
 BU_VSX_OVERLOAD_1 (DOUBLE,   "double")
@@ -1771,6 +1785,9 @@ BU_VSX_OVERLOAD_1 (DOUBLEH,  "doubleh")
 BU_VSX_OVERLOAD_1 (UNS_DOUBLEH,  "uns_doubleh")
 BU_VSX_OVERLOAD_1 (DOUBLEL,  "doublel")
 BU_VSX_OVERLOAD_1 (UNS_DOUBLEL,  "uns_doublel")
+BU_VSX_OVERLOAD_1 (FLOAT,  "float")
+BU_VSX_OVERLOAD_1 (FLOATE,  "floate")
+BU_VSX_OVERLOAD_1 (FLOATO,  "floato")
 
 /* VSX builtins that are handled as special cases.  */
 BU_VSX_OVERLOAD_X (LD,	     "ld")
@@ -1812,7 +1829,7 @@ BU_P8V_AV_2 (VMINSD,		"vminsd",	CONST,	sminv2di3)
 BU_P8V_AV_2 (VMAXSD,		"vmaxsd",	CONST,	smaxv2di3)
 BU_P8V_AV_2 (VMINUD,		"vminud",	CONST,	uminv2di3)
 BU_P8V_AV_2 (VMAXUD,		"vmaxud",	CONST,	umaxv2di3)
-BU_P8V_AV_2 (VMRGEW,		"vmrgew",	CONST,	p8_vmrgew)
+BU_P8V_AV_2 (VMRGEW_V4SI,	"vmrgew_v4si",	CONST,	p8_vmrgew_v4si)
 BU_P8V_AV_2 (VMRGOW,		"vmrgow",	CONST,	p8_vmrgow)
 BU_P8V_AV_2 (VBPERMQ,		"vbpermq",	CONST,	altivec_vbpermq)
 BU_P8V_AV_2 (VBPERMQ2,		"vbpermq2",	CONST,	altivec_vbpermq2)
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index f1e8d3d..19f6d9c 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -1538,6 +1538,28 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { VSX_BUILTIN_VEC_DOUBLEL, VSX_BUILTIN_DOUBLEL_V4SF,
     RS6000_BTI_V2DF, RS6000_BTI_V4SF, 0, 0 },
 
+  { VSX_BUILTIN_VEC_FLOAT, VSX_BUILTIN_XVCVSXWSP_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SI, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOAT, VSX_BUILTIN_XVCVUXWSP_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOAT2, VSX_BUILTIN_FLOAT2_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { VSX_BUILTIN_VEC_FLOAT2, VSX_BUILTIN_UNS_FLOAT2_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_FLOATE, VSX_BUILTIN_FLOATE_V2DF,
+    RS6000_BTI_V4SF, RS6000_BTI_V2DF, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOATE, VSX_BUILTIN_FLOATE_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_V2DI, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOATE, VSX_BUILTIN_UNS_FLOATE_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_unsigned_V2DI, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOATO, VSX_BUILTIN_FLOATO_V2DF,
+    RS6000_BTI_V4SF, RS6000_BTI_V2DF, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOATO, VSX_BUILTIN_FLOATO_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_V2DI, 0, 0 },
+  { VSX_BUILTIN_VEC_FLOATO, VSX_BUILTIN_UNS_FLOATO_V2DI,
+    RS6000_BTI_V4SF, RS6000_BTI_unsigned_V2DI, 0, 0 },
+
   { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V2DF,
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
   { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V2DI,
@@ -5262,12 +5284,12 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, 0 },
 
-  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW_V4SI,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
-  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW_V4SI,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
     RS6000_BTI_unsigned_V4SI, 0 },
-  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW,
+  { P8V_BUILTIN_VEC_VMRGEW, P8V_BUILTIN_VMRGEW_V4SI,
     RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, 0 },
 
   { P8V_BUILTIN_VEC_VMRGOW, P8V_BUILTIN_VMRGOW,
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 8a231f5..8165d04 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -72,6 +72,7 @@ extern void altivec_expand_stvex_be (rtx, rtx, machine_mode, unsigned);
 extern void rs6000_expand_extract_even (rtx, rtx, rtx);
 extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);
 extern void rs6000_scale_v2df (rtx, rtx, int);
+extern void rs6000_generate_float2_code (bool, rtx, rtx, rtx);
 extern int expand_block_clear (rtx[]);
 extern int expand_block_move (rtx[]);
 extern bool expand_block_compare (rtx[]);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 941c0c2..3e7ff03 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -36798,7 +36798,7 @@ altivec_expand_vec_perm_const (rtx operands[4])
       (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct
        : CODE_FOR_altivec_vmrghw_direct),
       {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
-    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew,
+    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew_v4si,
       {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
     { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgow,
       {  4,  5,  6,  7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31 } }
@@ -42389,6 +42389,49 @@ rs6000_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
   *update = build2 (COMPOUND_EXPR, void_type_node, update_mffs, update_mtfsf);
 }
 
+void
+rs6000_generate_float2_code (bool signed_convert, rtx dst, rtx src1, rtx src2)
+{
+  rtx rtx_tmp0, rtx_tmp1, rtx_tmp2, rtx_tmp3;
+
+  rtx_tmp0 = gen_reg_rtx (V2DImode);
+  rtx_tmp1 = gen_reg_rtx (V2DImode);
+
+  /* The destination of the vmrgew instruction layout is:
+     rtx_tmp2[0] rtx_tmp3[0] rtx_tmp2[1] rtx_tmp3[1].
+     Setup rtx_tmp0 and rtx_tmp1 to ensure the order of the elements after the
+     vmrgew instruction will be correct.  */
+  if (VECTOR_ELT_ORDER_BIG)
+    {
+      emit_insn (gen_vsx_xxpermdi_v2di_be (rtx_tmp0, src1, src2, GEN_INT (0)));
+      emit_insn (gen_vsx_xxpermdi_v2di_be (rtx_tmp1, src1, src2, GEN_INT (3)));
+    }
+  else
+    {
+      emit_insn (gen_vsx_xxpermdi_v2di (rtx_tmp0, src1, src2, GEN_INT (3)));
+      emit_insn (gen_vsx_xxpermdi_v2di (rtx_tmp1, src1, src2, GEN_INT (0)));
+    }
+
+  rtx_tmp2 = gen_reg_rtx (V4SFmode);
+  rtx_tmp3 = gen_reg_rtx (V4SFmode);
+
+  if (signed_convert)
+    {
+      emit_insn (gen_vsx_xvcvsxdsp (rtx_tmp2, rtx_tmp0));
+      emit_insn (gen_vsx_xvcvsxdsp (rtx_tmp3, rtx_tmp1));
+    }
+  else
+    {
+      emit_insn (gen_vsx_xvcvuxdsp (rtx_tmp2, rtx_tmp0));
+      emit_insn (gen_vsx_xvcvuxdsp (rtx_tmp3, rtx_tmp1));
+    }
+
+  if (VECTOR_ELT_ORDER_BIG)
+    emit_insn (gen_p8_vmrgew_v4sf (dst, rtx_tmp2, rtx_tmp3));
+  else
+    emit_insn (gen_p8_vmrgew_v4sf (dst, rtx_tmp3, rtx_tmp2));
+}
+
 /* Implement the TARGET_OPTAB_SUPPORTED_P hook.  */
 
 static bool
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 141aa42..dd88305 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -310,6 +310,9 @@
 ;; Iterator for the 2 short vector types to do a splat from an integer
 (define_mode_iterator VSX_SPLAT_I [V16QI V8HI])
 
+;; Mode attribute for vector floate and floato conversions
+(define_mode_attr VFC_inst [(V2DI "sxd") (V2DF "dp")])
+
 ;; Mode attribute to give the count for the splat instruction to splat
 ;; the value in the 64-bit integer slot
 (define_mode_attr VSX_SPLAT_COUNT [(V16QI "7") (V8HI "3")])
@@ -331,6 +334,14 @@
    UNSPEC_VSX_CVUXDSP
    UNSPEC_VSX_CVSPSXDS
    UNSPEC_VSX_CVSPUXDS
+   UNSPEC_VSX_CVSXWSP
+   UNSPEC_VSX_CVUXWSP
+   UNSPEC_VSX_FLOAT2
+   UNSPEC_VSX_UNS_FLOAT2
+   UNSPEC_VSX_FLOATE
+   UNSPEC_VSX_UNS_FLOATE
+   UNSPEC_VSX_FLOATO
+   UNSPEC_VSX_UNS_FLOATO
    UNSPEC_VSX_TDIV
    UNSPEC_VSX_TSQRT
    UNSPEC_VSX_SET
@@ -1976,6 +1987,170 @@
   "xvcvspuxds %x0,%x1"
   [(set_attr "type" "vecdouble")])
 
+(define_insn "vsx_xvcvsxwsp"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+	(unspec:V4SF[(match_operand:V4SI 1 "vsx_register_operand" "wa")]
+		    UNSPEC_VSX_CVSXWSP))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "xvcvsxwsp %x0,%x1"
+  [(set_attr "type" "vecdouble")])
+
+(define_insn "vsx_xvcvuxwsp"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+	(unspec:V4SF[(match_operand:V4SI 1 "vsx_register_operand" "wa")]
+		    UNSPEC_VSX_CVUXWSP))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "xvcvuxwsp %x0,%x1"
+  [(set_attr "type" "vecdouble")])
+
+;; Generate float2
+;; convert two long long signed ints to float
+(define_expand "float2_v2di"
+  [(match_operand:V4SF 0 "register_operand" "=wa")
+   (unspec:V4SI [(match_operand:V2DI 1 "register_operand" "wa")
+		 (match_operand:V2DI 2 "register_operand" "wa")]
+  UNSPEC_VSX_FLOAT2)]
+
+  "TARGET_VSX"
+{
+  rtx rtx_src1, rtx_src2, rtx_dst;
+
+  rtx_dst = operands[0];
+  rtx_src1 = operands[1];
+  rtx_src2 = operands[2];
+
+  rs6000_generate_float2_code (true, rtx_dst, rtx_src1, rtx_src2);
+  DONE;
+})
+
+;; Generate uns_float2
+;; convert two long long unsigned ints to float
+(define_expand "uns_float2_v2di"
+  [(match_operand:V4SF 0 "register_operand" "=wa")
+   (unspec:V4SI [(match_operand:V2DI 1 "register_operand" "wa")
+                 (match_operand:V2DI 2 "register_operand" "wa")]
+  UNSPEC_VSX_UNS_FLOAT2)]
+
+  "TARGET_VSX"
+{
+  rtx rtx_src1, rtx_src2, rtx_dst;
+
+  rtx_dst = operands[0];
+  rtx_src1 = operands[1];
+  rtx_src2 = operands[2];
+
+  rs6000_generate_float2_code (false, rtx_dst, rtx_src1, rtx_src2);
+  DONE;
+})
+
+;; Generate floate
+;; convert double or long long signed to float
+;; (Only even words are valid, BE numbering)
+(define_expand "floate<mode>"
+  [(match_operand:V4SF 0 "register_operand" "=wa")
+   (unspec:V4SF [(match_operand:VSX_D 1 "register_operand" "wa")]
+   UNSPEC_VSX_FLOATE)]
+   "TARGET_VSX"
+{
+  if (VECTOR_ELT_ORDER_BIG)
+    {
+      /* Shift left one word to put the even word in the correct location.  */
+      rtx rtx_tmp;
+      rtx rtx_val = GEN_INT (4);
+
+      rtx_tmp = gen_reg_rtx (V4SFmode);
+      emit_insn (gen_vsx_xvcv<VFC_inst>sp (rtx_tmp, operands[1]));
+      emit_insn (gen_altivec_vsldoi_v4sf (operands[0],
+		 rtx_tmp, rtx_tmp, rtx_val));
+    }
+  else
+    emit_insn (gen_vsx_xvcv<VFC_inst>sp (operands[0], operands[1]));
+
+  DONE;
+})
+
+;; Generate uns_floate
+;; convert long long unsigned to float
+;; (Only even words are valid, BE numbering)
+(define_expand "unsfloatev2di"
+  [(match_operand:V4SF 0 "register_operand" "=wa")
+   (unspec:V4SF [(match_operand:V2DI 1 "register_operand" "wa")]
+   UNSPEC_VSX_UNS_FLOATE)]
+   "TARGET_VSX"
+{
+  if (VECTOR_ELT_ORDER_BIG)
+    {
+      /* Shift left one word to put the even word in the correct location.  */
+      rtx rtx_tmp;
+      rtx rtx_val = GEN_INT (4);
+
+      rtx_tmp = gen_reg_rtx (V4SFmode);
+      emit_insn (gen_vsx_xvcvuxdsp (rtx_tmp, operands[1]));
+      emit_insn (gen_altivec_vsldoi_v4sf (operands[0],
+                 rtx_tmp, rtx_tmp, rtx_val));
+    }
+  else
+    {
+      emit_insn (gen_vsx_xvcvuxdsp (operands[0], operands[1]));
+    }
+  DONE;
+})
+
+;; Generate floato
+;; convert double or long long signed to float
+;; (Only odd words are valid, BE numbering)
+(define_expand "floato<mode>"
+  [(match_operand:V4SF 0 "register_operand" "=wa")
+   (unspec:V4SF [(match_operand:VSX_D 1 "register_operand" "wa")]
+   UNSPEC_VSX_FLOATO)]
+  "TARGET_VSX"
+{
+  if (VECTOR_ELT_ORDER_BIG)
+    {
+      emit_insn (gen_vsx_xvcv<VFC_inst>sp (operands[0], operands[1]));
+    }
+  else
+    {
+      /* Shift left one word to put the odd word in the correct location.  */
+      rtx rtx_tmp;
+      rtx rtx_val = GEN_INT (4);
+
+      rtx_tmp = gen_reg_rtx (V4SFmode);
+      emit_insn (gen_vsx_xvcv<VFC_inst>sp (rtx_tmp, operands[1]));
+      emit_insn (gen_altivec_vsldoi_v4sf (operands[0],
+                 rtx_tmp, rtx_tmp, rtx_val));
+    }
+  DONE;
+})
+
+;; Generate uns_floato
+;; convert long long unsigned to float
+;; (Only odd words are valid, BE numbering)
+(define_expand "unsfloatov2di"
+  [(match_operand:V4SF 0 "register_operand" "=wa")
+   (unspec:V4SF[(match_operand:V2DI 1 "register_operand" "wa")]
+   UNSPEC_VSX_UNS_FLOATO)]
+
+  "TARGET_VSX"
+{
+  if (VECTOR_ELT_ORDER_BIG)
+    {
+      emit_insn (gen_vsx_xvcvuxdsp (operands[0], operands[1]));
+    }
+  else
+    {
+      /* Shift left one word to put the odd word in the correct location.  */
+      rtx rtx_tmp;
+      rtx rtx_val = GEN_INT (4);
+
+      rtx_tmp = gen_reg_rtx (V4SFmode);
+      emit_insn (gen_vsx_xvcvuxdsp (rtx_tmp, operands[1]));
+      emit_insn (gen_altivec_vsldoi_v4sf (operands[0],
+                 rtx_tmp, rtx_tmp, rtx_val));
+    }
+  DONE;
+})
+
 ;; Only optimize (float (fix x)) -> frz if we are in fast-math mode, since
 ;; since the xvrdpiz instruction does not truncate the value if the floating
 ;; point value is < LONG_MIN or > LONG_MAX.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7d39335..a662aeb 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -16039,6 +16039,20 @@ vector float vec_expte (vector float);
 
 vector float vec_floor (vector float);
 
+vector float vec_float (vector signed int);
+vector float vec_float (vector unsigned int);
+
+vector float vec_float2 (vector signed long long, vector signed long long);
+vector float vec_float2 (vector unsigned long long, vector unsigned long long);
+
+vector float vec_floate (vector double);
+vector float vec_floate (vector signed long long);
+vector float vec_floate (vector unsigned long long);
+
+vector float vec_floato (vector double);
+vector float vec_floato (vector signed long long);
+vector float vec_floato (vector unsigned long long);
+
 vector float vec_ld (int, const vector float *);
 vector float vec_ld (int, const float *);
 vector bool int vec_ld (int, const vector bool int *);
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
index 60ec617..8e09a92 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
@@ -5,8 +5,37 @@
 
 #include <altivec.h> // vector
 
+#define ALL  1
+#define EVEN 2
+#define ODD  3
+
 void abort (void);
 
+void test_result_sp(int check, vector float vec_result, vector float vec_expected)
+{
+   int i;
+   for(i = 0; i<4; i++) {
+
+      switch (check) {
+      case ALL:
+         break;
+      case EVEN:
+         if (i%2 == 0)
+            break;
+         else
+            continue;
+      case ODD:
+         if (i%2 != 0)
+            break;
+         else
+            continue;
+      }
+
+      if (vec_result[i] != vec_expected[i])
+         abort();
+   }
+}
+
 void test_result_dp(vector double vec_result, vector double vec_expected)
 {
 	if (vec_result[0] != vec_expected[0])
@@ -21,11 +50,17 @@ int main()
 	int i;
 	vector unsigned int vec_unint;
 	vector signed int vec_int;
+	vector long long int vec_ll_int0, vec_ll_int1;
+	vector long long unsigned int vec_ll_uns_int0, vec_ll_uns_int1;
 	vector float  vec_flt, vec_flt_result, vec_flt_expected;
 	vector double vec_dble0, vec_dble1, vec_dble_result, vec_dble_expected;
 
 	vec_int = (vector signed int){ -1, 3, -5, 1234567 };
+	vec_ll_int0 = (vector long long int){ -12, -12345678901234 };
+	vec_ll_int1 = (vector long long int){ 12, 9876543210 };
 	vec_unint = (vector unsigned int){ 9, 11, 15, 2468013579 };
+	vec_ll_uns_int0 = (vector unsigned long long int){ 102, 9753108642 };
+	vec_ll_uns_int1 = (vector unsigned long long int){ 23, 29 };
 	vec_flt = (vector float){ -21., 3.5, -53., 78. };
 	vec_dble0 = (vector double){ 34.0, 97.0 };
 	vec_dble1 = (vector double){ 214.0, -5.5 };
@@ -81,4 +116,51 @@ int main()
 	vec_dble_result = vec_doubleh (vec_unint);
 	test_result_dp(vec_dble_result, vec_dble_expected);
 
+	vec_dble_expected = (vector double){-21.000000, 3.500000};
+	vec_dble_result = vec_doubleh (vec_flt);
+	test_result_dp(vec_dble_result, vec_dble_expected);
+
+	/* conversion of integer vector to single precision float vector */
+	vec_flt_expected = (vector float){-1.00, 3.00, -5.00, 1234567.00};
+	vec_flt_result = vec_float (vec_int);
+	test_result_sp(ALL, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){9.00, 11.00, 15.00, 2468013579.0};
+	vec_flt_result = vec_float (vec_unint);
+	test_result_sp(ALL, vec_flt_result, vec_flt_expected);
+
+	/* conversion of two long long vectors to single precision float vector */
+	vec_flt_expected = (vector float){-12.00, -12345678901234.00, 12.00, 9876543210.00};
+	vec_flt_result = vec_float2 (vec_ll_int0, vec_ll_int1);
+	test_result_sp(ALL, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){102.00, 9753108642.00, 23.00, 29.00};
+	vec_flt_result = vec_float2 (vec_ll_uns_int0, vec_ll_uns_int1);
+	test_result_sp(ALL, vec_flt_result, vec_flt_expected);
+
+	/* conversion of long long or double vector to even words of float vector */
+	vec_flt_expected = (vector float){-12.00, 00.00, -12345678901234.00, 0.00};
+	vec_flt_result = vec_floate (vec_ll_int0);
+	test_result_sp(EVEN, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){102.00, 0.00, 9753108642.00, 0.00};
+	vec_flt_result = vec_floate (vec_ll_uns_int0);
+	test_result_sp(EVEN, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){34.00, 0.00, 97.00, 0.00};
+	vec_flt_result = vec_floate (vec_dble0);
+	test_result_sp(EVEN, vec_flt_result, vec_flt_expected);
+
+	/* conversion of long long or double vector to odd words of float vector */
+	vec_flt_expected = (vector float){0.00, -12.00, 00.00, -12345678901234.00};
+	vec_flt_result = vec_floato (vec_ll_int0);
+	test_result_sp(ODD, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){0.00, 102.00, 0.00, 9753108642.00};
+	vec_flt_result = vec_floato (vec_ll_uns_int0);
+	test_result_sp(ODD, vec_flt_result, vec_flt_expected);
+
+	vec_flt_expected = (vector float){0.00, 34.00, 0.00, 97.00};
+	vec_flt_result = vec_floato (vec_dble0);
+	test_result_sp(ODD, vec_flt_result, vec_flt_expected);
 }
-- 
1.9.1




* Re: [PATCH v2, rs6000] gcc mainline, add builtin support for vec_float, vec_float2, vec_floate, vec_floate, builtins
  2017-06-09 23:12   ` [PATCH v2, " Carl E. Love
@ 2017-06-10  0:19     ` Segher Boessenkool
  2017-06-12 18:09     ` Michael Meissner
  1 sibling, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2017-06-10  0:19 UTC (permalink / raw)
  To: Carl E. Love; +Cc: gcc-patches, David Edelsohn, Bill Schmidt, meissner

Hi again,

On Fri, Jun 09, 2017 at 04:12:25PM -0700, Carl E. Love wrote:
> On Fri, 2017-06-09 at 16:05 -0500, Segher Boessenkool wrote:
> > > +;; Mode iterator and attribute for vector floate and floato conversions
> > > +(define_mode_iterator VFC [V2DI V2DF])
> > > +(define_mode_attr VFC_inst [(V2DI "sxd") (V2DF "dp")])
> > 
> > .._sxddp, like VS_sxswp in altivec.md?  The iterator is just VSX_D.
> > 
> > Maybe some or all of these iterators/attrs should live in vector.md?
> > 
> > Is it really useful to have separate files altivec.md and vsx.md anymore?
> > Or should some things be moved?  This is a general question, not really
> > something to be handled in this patch ;-)
> > 
> 
> Yea, I find searching through all the files to be rather hard.  Perhaps
> putting all the definitions into a single "header" file?  That way it
> could span across all of the .md files.  If you combined vsx.md and
> altivec.md, you would have a really large file.  Big files can be
> problematic in their own right.

All md files already *are* included into rs6000.md; see the very end of
rs6000.md for these.  vector.md is included before vsx.md and altivec.md,
so you can define all iterators etc. there.

Big files are problematic; arbitrary splits are worse.  Originally it
made sense to have vsx.md and altivec.md separate (and separate from
the integer stuff in rs6000.md), but now it is less clear.  Maybe we
should split differently?  vector, vector-int, vector-float, something
like that?

> > > +(define_insn "vsx_xvcvsxwsp"
> > > +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=v")
> > > +	(unspec:V4SF[(match_operand:V4SI 1 "vsx_register_operand" "v")]
> > > +		    UNSPEC_VSX_CVSXWSP))]
> > > +  "VECTOR_UNIT_VSX_P (V4SFmode)"
> > > +  "xvcvsxwsp %x0,%x1"
> > > +  [(set_attr "type" "vecdouble")])
> > 
> > "v" is only the VRs...  Do you want "wa" or similar instead?
> 
> I went back and re-studied the Power register constraints.  I find them a
> bit confusing, I am sure they are perfectly clear to everyone else.

Heh, good joke :-)

> So
> the instructions all take VSX registers so "wa" should be fine if I
> understand it correctly.  Not sure there is any need to further
> constrain with "vs" for doubles or "ww" but I think you could.

"ws" is all VSRs if TARGET_UPPER_REGS_DF but just FP regs otherwise.
"ww" is all VSRs if TARGET_P8_VECTOR and TARGET_UPPER_REGS_SF, just
the FP regs if only TARGET_VSX, and nothing otherwise.

I don't know what to use when.  Mike does.

> Please let me know if the updated patch is OK for gcc mainline?

Let's hear what Mike thinks about the constraints.  I *think* that
the VECTOR_UNIT_VSX_P (V4SFmode) makes "wa" just work.

> +;; Generate float2
> +;; convert two long long signed ints to float
> +(define_expand "float2_v2di"
> +  [(match_operand:V4SF 0 "register_operand" "=wa")
> +   (unspec:V4SI [(match_operand:V2DI 1 "register_operand" "wa")
> +		 (match_operand:V2DI 2 "register_operand" "wa")]
> +  UNSPEC_VSX_FLOAT2)]
> +
> +  "TARGET_VSX"

... but TARGET_VSX is probably not good enough for "wa".

Will this work at all?  The insns the expander expands to have a different
condition than the insns themselves...  VECTOR_UNIT_VSX_P (V4SFmode) is
actually the same thing as TARGET_VSX, but will it stay that way, and
more importantly, that isn't clear at all.

Segher


* Re: [PATCH v2, rs6000] gcc mainline, add builtin support for vec_float, vec_float2, vec_floate, vec_floate, builtins
  2017-06-09 23:12   ` [PATCH v2, " Carl E. Love
  2017-06-10  0:19     ` Segher Boessenkool
@ 2017-06-12 18:09     ` Michael Meissner
  2017-06-12 18:40       ` Carl E. Love
  1 sibling, 1 reply; 8+ messages in thread
From: Michael Meissner @ 2017-06-12 18:09 UTC (permalink / raw)
  To: Carl E. Love
  Cc: Segher Boessenkool, gcc-patches, David Edelsohn, Bill Schmidt

On Fri, Jun 09, 2017 at 04:12:25PM -0700, Carl E. Love wrote:
> GCC Maintainers:
> 
> On Fri, 2017-06-09 at 16:05 -0500, Segher Boessenkool wrote:
> 
> Fixed the various formatting (spaces) issues.  Been toying with how to
> write a space checker for patches.  Have to take some time to really
> think about how to do that....
> 
> > > +
> > > +  /* The vector merge instruction vmrgew swaps the 2nd and 3rd words,
> > > +     compensate by swapping the 64-bit elements around to negate the vmrgew
> > > +     swap. */
> > 
> > This comment isn't very clear to me...  Could you expand it a bit?
> 
> Reworked it, hopefully it explains things better
> 
> > > +;; Mode iterator and attribute for vector floate and floato conversions
> > > +(define_mode_iterator VFC [V2DI V2DF])
> > > +(define_mode_attr VFC_inst [(V2DI "sxd") (V2DF "dp")])
> > 
> > .._sxddp, like VS_sxswp in altivec.md?  The iterator is just VSX_D.
> > 
> > Maybe some or all of these iterators/attrs should live in vector.md?
> > 
> > Is it really useful to have separate files altivec.md and vsx.md anymore?
> > Or should some things be moved?  This is a general question, not really
> > something to be handled in this patch ;-)
> > 

Probably not, particularly since we've been adding new Altivec encoded
instructions.  Back in the ISA 2.06 (power7) days, it was much clearer, that
altivec.md was the musty old instructions and vsx.md were the new ones.

> 
> Yea, I find searching through all the files to be rather hard.  Perhaps
> putting all the definitions into a single "header" file?  That way it
> could span across all of the .md files.  If you combined vsx.md and
> altivec.md, you would have a really large file.  Big files can be
> problematic in their own right.

And doing large changes to 'simplify' things can lead to other problems.

> 
> > > +(define_insn "vsx_xvcvsxwsp"
> > > +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=v")
> > > +	(unspec:V4SF[(match_operand:V4SI 1 "vsx_register_operand" "v")]
> > > +		    UNSPEC_VSX_CVSXWSP))]
> > > +  "VECTOR_UNIT_VSX_P (V4SFmode)"
> > > +  "xvcvsxwsp %x0,%x1"
> > > +  [(set_attr "type" "vecdouble")])
> > 
> > "v" is only the VRs...  Do you want "wa" or similar instead?
> > 
> 
> I went back and re-studied the Power register constraints.  I find them a
> bit confusing, I am sure they are perfectly clear to everyone else.  So
> the instructions all take VSX registers so "wa" should be fine if I
> understand it correctly.  Not sure there is any need to further
> constrain with "vs" for doubles or "ww" but I think you could.

Well in the power7 days, it wasn't clear whether we wanted to reduce the
register set, so I added the general "wa", and then added the more specific
changes ("ws", "wf", "wd").  In hindsight it probably wasn't a good idea.  But
the trouble is we can't delete the old constraints, or we would break user asm
code.

Over time, I have been deleting things where you have the specific constraint
and the general one where I'm modifying code:

	(match_operand:V2DF 0 "=wd,?wa")

to

	(match_operand:V2DF 0 "=wa")

Now the second round of constraints are needed because of the
-mupper-regs-<xxx> debug switches.  You might/might not allow DFmode into the
Altivec registers, and so you need several constraints:

	d	Just the traditional FPRs
	ws	Any FPR/Altivec register DFmode can go in for ISA 2.06 insns
	wk	Like ws, but only if 64-bit direct moves are supported
	wv	Only altivec registers (used for 64-bit load/stores)

Note, you have to be careful not to allow a register constraint that the
current type cannot go into.  This is due to a 'feature' in the LRA register
allocator that it will trap if such a case occurs.  For example, for ISA 2.06,
we do not have 32-bit floating point instructions in the Altivec registers.
This means you can't use "v" (just the Altivec registers) on any code where
-mcpu=power7 (or -mno-upper-regs-sf) is allowed.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


* Re: [PATCH v2, rs6000] gcc mainline, add builtin support for vec_float, vec_float2, vec_floate, vec_floate, builtins
  2017-06-12 18:09     ` Michael Meissner
@ 2017-06-12 18:40       ` Carl E. Love
  2017-06-12 18:50         ` Michael Meissner
  2017-06-15 15:51         ` Segher Boessenkool
  0 siblings, 2 replies; 8+ messages in thread
From: Carl E. Love @ 2017-06-12 18:40 UTC (permalink / raw)
  To: Michael Meissner
  Cc: Segher Boessenkool, gcc-patches, David Edelsohn, Bill Schmidt

On Mon, 2017-06-12 at 14:09 -0400, Michael Meissner wrote:

> > 
> > > > +(define_insn "vsx_xvcvsxwsp"
> > > > +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=v")
> > > > +	(unspec:V4SF[(match_operand:V4SI 1 "vsx_register_operand" "v")]
> > > > +		    UNSPEC_VSX_CVSXWSP))]
> > > > +  "VECTOR_UNIT_VSX_P (V4SFmode)"
> > > > +  "xvcvsxwsp %x0,%x1"
> > > > +  [(set_attr "type" "vecdouble")])
> > > 
> > > "v" is only the VRs...  Do you want "wa" or similar instead?
> > > 
> > 
> > I went back and re-studied the Power register constraints.  I find them a
> > bit confusing, I am sure they are perfectly clear to everyone else.  So
> > the instructions all take VSX registers so "wa" should be fine if I
> > understand it correctly.  Not sure there is any need to further
> > constrain with "vs" for doubles or "ww" but I think you could.
> 
> Well in the power7 days, it wasn't clear whether we wanted to reduce the
> register set, so I added the general "wa", and then added the more specific
> changes ("ws", "wf", "wd").  In hindsight it probably wasn't a good idea.  But
> the trouble is we can't delete the old constraints, or we would break user asm
> code.
> 
> Over time, I have been deleting things where you have the specific constraint
> and the general one where I'm modifying code:
> 
> 	(match_operand:V2DF 0 "=wd,?wa")
> 
> to
> 
> 	(match_operand:V2DF 0 "=wa")
> 
> Now the second round of constraints are needed because of the
> -mupper-regs-<xxx> debug switches.  You might/might not allow DFmode into the
> Altivec registers, and so you need several constraints:
> 
> 	d	Just the traditional FPRs
> 	ws	Any FPR/Altivec register DFmode can go in for ISA 2.06 insns
> 	wk	Like ws, but only if 64-bit direct moves are supported
> 	wv	Only altivec registers (used for 64-bit load/stores)
> 
> Note, you have to be careful not to allow a register constraint that the
> current type cannot go into.  This is due to a 'feature' in the LRA register
> allocator that it will trap if such a case occurs.  For example, for ISA 2.06,
> we do not have 32-bit floating point instructions in the Altivec registers.
> This means you can't use "v" (just the Altivec registers) on any code where
> -mcpu=power7 (or -mno-upper-regs-sf) is allowed.
> 

Michael:

OK, so it sounds like I should stick to the general "wa" register
constraint.  In the third field of the define_expand, which I believe is
called the "condition string", I have "TARGET_VSX".  Is that the
appropriate condition string?  I also see the condition string
"VECTOR_UNIT_VSX_P (V4SFmode)" used.  Segher thinks that string has the
same effect as "TARGET_VSX".  How does one select the correct condition
string based on the register constraint?


Here is what I currently have for my define_expand. Is it correct?

;; Generate float2
;; convert two long long signed ints to float
(define_expand "float2_v2di"
  [(match_operand:V4SF 0 "register_operand" "=wa")
   (unspec:V4SI [(match_operand:V2DI 1 "register_operand" "wa")
                 (match_operand:V2DI 2 "register_operand" "wa")]
  UNSPEC_VSX_FLOAT2)]

  "TARGET_VSX"
{
  rtx rtx_src1, rtx_src2, rtx_dst;

  rtx_dst = operands[0];
  rtx_src1 = operands[1];
  rtx_src2 = operands[2];

  rs6000_generate_float2_code (true, rtx_dst, rtx_src1, rtx_src2);
  DONE;
})

Thanks for your help on this.

* Re: [PATCH v2, rs6000] gcc mainline, add builtin support for vec_float, vec_float2, vec_floate, vec_floate, builtins
  2017-06-12 18:40       ` Carl E. Love
@ 2017-06-12 18:50         ` Michael Meissner
  2017-06-15 15:51         ` Segher Boessenkool
  1 sibling, 0 replies; 8+ messages in thread
From: Michael Meissner @ 2017-06-12 18:50 UTC (permalink / raw)
  To: Carl E. Love
  Cc: Michael Meissner, Segher Boessenkool, gcc-patches,
	David Edelsohn, Bill Schmidt

On Mon, Jun 12, 2017 at 11:40:17AM -0700, Carl E. Love wrote:
> Michael:
> 
> OK, so it sounds like I should stick to the general "wa" register constraint.
> In the third field of the define_expand, which I believe is called the
> "condition string", I have "TARGET_VSX".  Is that the appropriate condition
> string?  I also see the condition string "VECTOR_UNIT_VSX_P (V4SFmode)"
> used.  Segher thinks that this string would have the same effect as
> "TARGET_VSX"; is that right?  How does one select the correct condition
> string based on the register constraint?

In general, the idea was to allow you to turn off VSX support for one type, so
we used VECTOR_UNIT_VSX_P (mode) or VECTOR_MEM_VSX_P (mode) to say where there
were VSX arithmetic operations or memory operations on the particular type.

Note, it becomes an issue for V2DImode, as VECTOR_UNIT_VSX_P (V2DImode) is not
enabled until ISA 2.07 (power8), since we didn't have vector arithmetic
operations on V2DImode until then.

> 
> Here is what I currently have for my define_expand. Is it correct?
> 
> ;; Generate float2
> ;; convert two long long signed ints to float
> (define_expand "float2_v2di"
>   [(match_operand:V4SF 0 "register_operand" "=wa")
>    (unspec:V4SI [(match_operand:V2DI 1 "register_operand" "wa")
>                  (match_operand:V2DI 2 "register_operand" "wa")]
>   UNSPEC_VSX_FLOAT2)]
>
>   "TARGET_VSX"
> {
>   rtx rtx_src1, rtx_src2, rtx_dst;
>
>   rtx_dst = operands[0];
>   rtx_src1 = operands[1];
>   rtx_src2 = operands[2];
>
>   rs6000_generate_float2_code (true, rtx_dst, rtx_src1, rtx_src2);
>   DONE;
> })
> 
> Thanks for your help on this.
> 

You can simplify this to:

(define_expand "float2_v2di"
  [(use (match_operand:V4SF 0 "register_operand"))
   (use (match_operand:V2DI 1 "register_operand"))
   (use (match_operand:V2DI 2 "register_operand"))]
  "TARGET_VSX"
{
  rtx rtx_src1, rtx_src2, rtx_dst;

  rtx_dst = operands[0];
  rtx_src1 = operands[1];
  rtx_src2 = operands[2];

  rs6000_generate_float2_code (true, rtx_dst, rtx_src1, rtx_src2);
  DONE;
})

This works because the RTL template in the gen* pattern is never emitted: the
DONE ends the expansion, and the call to rs6000_generate_float2_code generates
the code instead.  And since it is a define_expand, the constraints are not
used.

Now, if it had been a define_insn_and_split, then you would have needed the
unspec and the constraints.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

* Re: [PATCH v2, rs6000] gcc mainline, add builtin support for vec_float, vec_float2, vec_floate, vec_floate, builtins
  2017-06-12 18:40       ` Carl E. Love
  2017-06-12 18:50         ` Michael Meissner
@ 2017-06-15 15:51         ` Segher Boessenkool
  1 sibling, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2017-06-15 15:51 UTC (permalink / raw)
  To: Carl E. Love; +Cc: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt

On Mon, Jun 12, 2017 at 11:40:17AM -0700, Carl E. Love wrote:
> OK, so it sounds like I should stick to the general "wa" register constraint.
> In the third field of the define_expand, which I believe is called the
> "condition string", I have "TARGET_VSX".  Is that the appropriate condition
> string?  I also see the condition string "VECTOR_UNIT_VSX_P (V4SFmode)"
> used.  Segher thinks that this string would have the same effect as
> "TARGET_VSX"; is that right?

rs6000.c has

  if (TARGET_VSX)
    {
      rs6000_vector_unit[V4SFmode] = VECTOR_VSX;

which makes VECTOR_UNIT_VSX_P (V4SFmode) the same as TARGET_VSX.

> How does one select the correct condition string based
> on the register constraint?

You don't: you decide both of those based on what you need for the
insn at hand.


Segher

Thread overview: 8+ messages
2017-06-09 18:20 [PATCH, rs6000] gcc mainline, add builtin support for vec_float, vec_float2, vec_floate, vec_floate, builtins Carl E. Love
2017-06-09 21:05 ` Segher Boessenkool
2017-06-09 23:12   ` [PATCH v2, " Carl E. Love
2017-06-10  0:19     ` Segher Boessenkool
2017-06-12 18:09     ` Michael Meissner
2017-06-12 18:40       ` Carl E. Love
2017-06-12 18:50         ` Michael Meissner
2017-06-15 15:51         ` Segher Boessenkool
