[PATCH] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)

public inbox for gcc-patches@gcc.gnu.org

* [PATCH] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
@ 2017-06-15  0:02 Michael Meissner
  2017-06-15 23:39 ` Michael Meissner
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Meissner @ 2017-06-15  0:02 UTC
  To: GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 2066 bytes --]

While doing some vector insert tests, I noticed that I had not added support
for vec_insert on vector float data (PR target/79799) with ISA 3.0.  As a
result, the compiler did the insert by storing the vector to the stack, storing
the individual word, and then reloading the vector.
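
As an illustration only (this is not code from the patch), the memory round
trip described above behaves roughly like the portable C below.  The type
v4sf_model and the function insert_via_stack are hypothetical names; the
struct stands in for a vector register and the local array for its stack slot.

```c
#include <string.h>

/* Hypothetical model: four floats stand in for a V4SF vector register.  */
typedef struct { float elt[4]; } v4sf_model;

/* Model of the fallback path: go through memory to change one element.  */
v4sf_model
insert_via_stack (v4sf_model vec, float value, unsigned int index)
{
  float slot[4];		/* stands in for the stack slot */
  v4sf_model result;

  memcpy (slot, vec.elt, sizeof slot);		/* store the vector */
  slot[index % 4] = value;			/* store the individual word */
  memcpy (result.elt, slot, sizeof slot);	/* reload the vector */
  return result;
}
```

The store followed by a dependent reload is what the new pattern is meant to
avoid by keeping the whole operation in registers.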

While I was at it, I added an optimization so that if you did:

	vector float v1, v2, v3;
	v1 = vec_insert (vec_extract (v2, 0), v3, 3);

it would not extract the floating point value and convert it from vector format
to scalar format, and then convert the scalar format back to vector format.
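
To illustrate why no conversion is needed (a sketch under stated assumptions,
not code from the patch): the element can be moved between the two vectors as
a raw 32-bit word.  The names v4si_model, insert_extract_word and float_bits
are hypothetical, and the struct of four words merely models a vector
register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: four 32-bit words stand in for a vector register.  */
typedef struct { uint32_t word[4]; } v4si_model;

/* Copy element SRC_IDX of SRC into element DST_IDX of DST as an integer
   word, without ever treating it as a floating point value.  */
v4si_model
insert_extract_word (v4si_model dst, v4si_model src,
		     unsigned int src_idx, unsigned int dst_idx)
{
  dst.word[dst_idx % 4] = src.word[src_idx % 4];
  return dst;
}

/* Helper to obtain the bit pattern of a float.  */
uint32_t
float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

Because the word is only copied, its bit pattern in the destination is
identical to the one in the source.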

I have tested this on a little endian power8 system with bootstrap, and it had
no regressions.  I have tested the executable code on a little endian power9,
and it ran correctly.  I started a non-bootstrap make check on the power9
system, and assuming there is no regression, can I install the patch into the
trunk?

I would also like to backport it to GCC 7 after a burn-in period on the trunk,
assuming it applies cleanly.  The patch will not apply to earlier release
branches.  Can I install it on the GCC 7 branch after that burn-in period?

[gcc]
2017-06-14  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/79799
	* config/rs6000/rs6000.c (rs6000_expand_vector_set): Add support
	for doing vector set of SFmode on ISA 3.0.
	* config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
	(vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
	SFmode value into a V4SF variable that was extracted from another
	V4SF variable without converting the element to double precision
	and back to single precision vector format.
	(vsx_insert_extract_v4sf_p9_2): Likewise.

[gcc/testsuite]
2017-06-14  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/79799
	* gcc.target/powerpc/pr79799-1.c: New test.
	* gcc.target/powerpc/pr79799-2.c: New test.
	* gcc.target/powerpc/pr79799-3.c: New test.
	* gcc.target/powerpc/pr79799-4.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: pr79799.patch01b --]
[-- Type: text/plain, Size: 11224 bytes --]

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 249175)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -7442,6 +7442,9 @@ rs6000_expand_vector_set (rtx target, rt
       else if (mode == V2DImode)
 	insn = gen_vsx_set_v2di (target, target, val, elt_rtx);
 
+      else if (TARGET_P9_VECTOR && mode == V4SFmode)
+	insn = gen_vsx_set_v4sf_p9 (target, target, val, elt_rtx);
+
       else if (TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
 	       && TARGET_UPPER_REGS_DI && TARGET_POWERPC64)
 	{
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 249175)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -3012,6 +3012,105 @@ (define_insn "vsx_set_<mode>_p9"
 }
   [(set_attr "type" "vecperm")])
 
+(define_insn_and_split "vsx_set_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (match_operand:SF 2 "gpc_reg_operand" "ww")
+	  (match_operand:QI 3 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5)
+	(unspec:V4SF [(match_dup 2)]
+		     UNSPEC_VSX_CVDPSPN))
+   (parallel [(set (match_dup 4)
+		   (vec_select:SI (match_dup 6)
+				  (parallel [(match_dup 7)])))
+	      (clobber (scratch:SI))])
+   (set (match_dup 8)
+	(unspec:V4SI [(match_dup 8)
+		      (match_dup 4)
+		      (match_dup 3)]
+		     UNSPEC_VSX_SET))]
+{
+  unsigned int tmp_regno = reg_or_subregno (operands[4]);
+
+  operands[5] = gen_rtx_REG (V4SFmode, tmp_regno);
+  operands[6] = gen_rtx_REG (V4SImode, tmp_regno);
+  operands[7] = GEN_INT (VECTOR_ELT_ORDER_BIG ? 1 : 2);
+  operands[8] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "12")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian).  We just need to do an xxinsertw since the element is in the
+;; correct location.
+
+(define_insn "*vsx_insert_extract_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+			 (parallel
+			  [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+	  (match_operand:QI 4 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR
+   && (INTVAL (operands[3]) == (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+{
+  int ele = INTVAL (operands[4]);
+
+  if (!VECTOR_ELT_ORDER_BIG)
+    ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
+
+  operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
+  return "xxinsertw %x0,%x2,%4";
+}
+  [(set_attr "type" "vecperm")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is not the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian).  Convert the insert/extract to int and avoid doing the conversion.
+
+(define_insn_and_split "*vsx_insert_extract_v4sf_p9_2"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+			 (parallel
+			  [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+	  (match_operand:QI 4 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 5 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && VECTOR_MEM_VSX_P (V4SImode)
+   && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && (INTVAL (operands[3]) != (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 5)
+		   (vec_select:SI (match_dup 6)
+				  (parallel [(match_dup 3)])))
+	      (clobber (scratch:SI))])
+   (set (match_dup 7)
+	(unspec:V4SI [(match_dup 8)
+		      (match_dup 5)
+		      (match_dup 4)]
+		     UNSPEC_VSX_SET))]
+{
+  if (GET_CODE (operands[5]) == SCRATCH)
+    operands[5] = gen_reg_rtx (SImode);
+
+  operands[6] = gen_lowpart (V4SImode, operands[2]);
+  operands[7] = gen_lowpart (V4SImode, operands[0]);
+  operands[8] = gen_lowpart (V4SImode, operands[1]);
+}
+  [(set_attr "type" "vecperm")])
+
 ;; Expanders for builtins
 (define_expand "vsx_mergel_<mode>"
   [(use (match_operand:VSX_D 0 "vsx_register_operand" ""))
Index: gcc/testsuite/gcc.target/powerpc/pr79799-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(revision 0)
@@ -0,0 +1,41 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* GCC 7.1 did not have a specialized method for inserting 32-bit floating point on
+   ISA 3.0 (power9) systems.  */
+
+vector float
+insert_arg_0 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 0);
+}
+
+vector float
+insert_arg_1 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 1);
+}
+
+vector float
+insert_arg_2 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 2);
+}
+
+vector float
+insert_arg_3 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 3);
+}
+
+/* { dg-final { scan-assembler-not {\mlvewx\M}   } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}     } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}   } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}  } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}    } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}  } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-2.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode if N is the default
+   scalar position.  */
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define ELE 2
+#else
+#define ELE 1
+#endif
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, ELE), v1, 0);
+}
+
+/* { dg-final { scan-assembler     {\mxxinsertw\M}   } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}       } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}         } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}       } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}      } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}        } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}     } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-3.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode.  */
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 4), v1, 0);
+}
+
+/* { dg-final { scan-assembler     {\mxxinsertw\M}   } } */
+/* { dg-final { scan-assembler     {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}       } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}         } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}       } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}      } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}        } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}     } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-4.c	(revision 0)
@@ -0,0 +1,105 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target p9vector_hw } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+#include <stdlib.h>
+
+__attribute__ ((__noinline__))
+vector float
+insert_0 (vector float v, float f)
+{
+  return vec_insert (f, v, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_1 (vector float v, float f)
+{
+  return vec_insert (f, v, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_2 (vector float v, float f)
+{
+  return vec_insert (f, v, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_3 (vector float v, float f)
+{
+  return vec_insert (f, v, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+
+  v1 = insert_0 (v1, 5.0f);
+  v1 = insert_1 (v1, 6.0f);
+  v1 = insert_2 (v1, 7.0f);
+  v1 = insert_3 (v1, 8.0f);
+
+  if (vec_any_ne (v1, v2))
+    abort ();
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_0_3 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 3), v1, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_1_2 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 2), v1, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_2_1 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 1), v1, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_3_0 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 0), v1, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert_extract (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+  vector float v3 = { 8.0f, 7.0f, 6.0f, 5.0f };
+
+  v1 = insert_extract_0_3 (v1, v2);
+  v1 = insert_extract_1_2 (v1, v2);
+  v1 = insert_extract_2_1 (v1, v2);
+  v1 = insert_extract_3_0 (v1, v2);
+
+  if (vec_any_ne (v1, v3))
+    abort ();
+}
+
+int
+main (void)
+{
+  test_insert ();
+  test_insert_extract ();
+  return 0;
+}


* Re: [PATCH] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
  2017-06-15  0:02 [PATCH] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9) Michael Meissner
@ 2017-06-15 23:39 ` Michael Meissner
  2017-06-16  2:10   ` [PATCH, rev 2] " Michael Meissner
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Meissner @ 2017-06-15 23:39 UTC
  To: Segher Boessenkool
  Cc: GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt,
	Michael Meissner

I thought the patch was fine as I posted.  I had an optimization I thought
about (optimizing for inserting 0.0f) and I noticed some problems with it.
However, even in backing out the change, there are some problems.  So, I will
hopefully reissue the patch tomorrow.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


* [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
  2017-06-15 23:39 ` Michael Meissner
@ 2017-06-16  2:10   ` Michael Meissner
  2017-06-16 19:52     ` Segher Boessenkool
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Meissner @ 2017-06-16  2:10 UTC
  To: Michael Meissner, Segher Boessenkool, GCC Patches,
	David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 2446 bytes --]

On Thu, Jun 15, 2017 at 07:39:39PM -0400, Michael Meissner wrote:
> I thought the patch was fine as I posted.  I had an optimization I thought
> about (optimizing for inserting 0.0f) and I noticed some problems with it.
> However, even in backing out the change, there are some problems.  So, I will
> hopefully reissue the patch tomorrow.

Ok, the problem was that I need to patch the compiler with a workaround to run
code on the current alpha hardware, and in backing out the patches for the code
I was working on, I backed out the workaround as well.

This patch replaces the first patch.  It adds an optimization so that if you
set a field in a V4SFmode vector to 0.0f, the compiler knows it can just
clear the field, and it doesn't have to convert the 0.0 from internal scalar
format to vector format with the XSCVDPSPN instruction.
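
As a small illustration of why clearing the field is enough (a sketch, not
part of the patch, and assuming IEEE 754 single precision): +0.0f has an
all-zero bit pattern, so writing integer zero into the word gives the same
result as inserting the converted value.  The helper names sf_bits and
can_clear_word_for are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return the bit pattern of a float (assumes a 32-bit IEEE 754 float).  */
uint32_t
sf_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Nonzero if storing F into a vector element is the same as clearing the
   32-bit word, which holds only when every bit of F is zero.  */
int
can_clear_word_for (float f)
{
  return sf_bits (f) == 0;
}
```

Note that -0.0f has its sign bit set, so it does not qualify under this
model; how the patch's zero_fp_constant predicate treats that value is not
something this sketch tries to answer.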

As before, I have bootstrapped this patch on a little endian power8 system, and
I had no regressions in the test suite.  The new tests pr79799-{1,2,3,5}.c all
generate the appropriate code.  I have also done a non-bootstrap build and make
check on the alpha power9 hardware with --with-cpu=power9, and there are no
regressions.  The executable test (pr79799-4.c) runs fine.
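
For anyone checking the xxinsertw operands in the generated code, the byte
offset computed by the output code of the insert pattern can be sketched as
below.  This is an illustrative helper, not code from the patch: the name
xxinsertw_byte_offset and its little_endian_order parameter are hypothetical,
and the constants stand in for GET_MODE_NUNITS (V4SFmode) and
GET_MODE_SIZE (SFmode).

```c
/* Hypothetical helper: ELE is the vec_insert element number (0..3) and
   LITTLE_ENDIAN_ORDER is nonzero when elements are numbered in little
   endian order (that is, when VECTOR_ELT_ORDER_BIG is false).  */
int
xxinsertw_byte_offset (int ele, int little_endian_order)
{
  const int nunits = 4;		/* GET_MODE_NUNITS (V4SFmode) */
  const int unit_size = 4;	/* GET_MODE_SIZE (SFmode) */

  /* The instruction numbers bytes in big endian order, so the element
     number is mirrored for little endian element order.  */
  if (little_endian_order)
    ele = nunits - 1 - ele;

  return unit_size * ele;
}
```

With this model, inserting into element 0 gives offset 0 for big endian
element order and offset 12 for little endian element order.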

Can I install this change to the trunk?  After a week of burn-in, can I install
this on the GCC 7.x branch?  Note, it will not work on previous branches.

[gcc]
2017-06-15  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/79799
	* config/rs6000/rs6000.c (rs6000_expand_vector_set): Add support
	for doing vector set of SFmode on ISA 3.0.
	* config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
	(vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
	element.
	(vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
	SFmode value into a V4SF variable that was extracted from another
	V4SF variable without converting the element to double precision
	and back to single precision vector format.
	(vsx_insert_extract_v4sf_p9_2): Likewise.

[gcc/testsuite]
2017-06-15  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/79799
	* gcc.target/powerpc/pr79799-1.c: New test.
	* gcc.target/powerpc/pr79799-2.c: Likewise.
	* gcc.target/powerpc/pr79799-3.c: Likewise.
	* gcc.target/powerpc/pr79799-4.c: Likewise.
	* gcc.target/powerpc/pr79799-5.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: pr79799.patch02b --]
[-- Type: text/plain, Size: 13473 bytes --]

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 249175)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -7442,6 +7442,9 @@ rs6000_expand_vector_set (rtx target, rt
       else if (mode == V2DImode)
 	insn = gen_vsx_set_v2di (target, target, val, elt_rtx);
 
+      else if (TARGET_P9_VECTOR && mode == V4SFmode)
+	insn = gen_vsx_set_v4sf_p9 (target, target, val, elt_rtx);
+
       else if (TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
 	       && TARGET_UPPER_REGS_DI && TARGET_POWERPC64)
 	{
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 249175)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -3012,6 +3012,130 @@ (define_insn "vsx_set_<mode>_p9"
 }
   [(set_attr "type" "vecperm")])
 
+(define_insn_and_split "vsx_set_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (match_operand:SF 2 "gpc_reg_operand" "ww")
+	  (match_operand:QI 3 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5)
+	(unspec:V4SF [(match_dup 2)]
+		     UNSPEC_VSX_CVDPSPN))
+   (parallel [(set (match_dup 4)
+		   (vec_select:SI (match_dup 6)
+				  (parallel [(match_dup 7)])))
+	      (clobber (scratch:SI))])
+   (set (match_dup 8)
+	(unspec:V4SI [(match_dup 8)
+		      (match_dup 4)
+		      (match_dup 3)]
+		     UNSPEC_VSX_SET))]
+{
+  unsigned int tmp_regno = reg_or_subregno (operands[4]);
+
+  operands[5] = gen_rtx_REG (V4SFmode, tmp_regno);
+  operands[6] = gen_rtx_REG (V4SImode, tmp_regno);
+  operands[7] = GEN_INT (VECTOR_ELT_ORDER_BIG ? 1 : 2);
+  operands[8] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "12")])
+
+;; Special case setting 0.0f to a V4SF element
+(define_insn_and_split "*vsx_set_v4sf_p9_zero"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (match_operand:SF 2 "zero_fp_constant" "j")
+	  (match_operand:QI 3 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4)
+	(const_int 0))
+   (set (match_dup 5)
+	(unspec:V4SI [(match_dup 5)
+		      (match_dup 4)
+		      (match_dup 3)]
+		     UNSPEC_VSX_SET))]
+{
+  operands[5] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "8")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian).  We just need to do an xxinsertw since the element is in the
+;; correct location.
+
+(define_insn "*vsx_insert_extract_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+			 (parallel
+			  [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+	  (match_operand:QI 4 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR
+   && (INTVAL (operands[3]) == (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+{
+  int ele = INTVAL (operands[4]);
+
+  if (!VECTOR_ELT_ORDER_BIG)
+    ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
+
+  operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
+  return "xxinsertw %x0,%x2,%4";
+}
+  [(set_attr "type" "vecperm")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is not the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian).  Convert the insert/extract to int and avoid doing the conversion.
+
+(define_insn_and_split "*vsx_insert_extract_v4sf_p9_2"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+			 (parallel
+			  [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+	  (match_operand:QI 4 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 5 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && VECTOR_MEM_VSX_P (V4SImode)
+   && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && (INTVAL (operands[3]) != (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 5)
+		   (vec_select:SI (match_dup 6)
+				  (parallel [(match_dup 3)])))
+	      (clobber (scratch:SI))])
+   (set (match_dup 7)
+	(unspec:V4SI [(match_dup 8)
+		      (match_dup 5)
+		      (match_dup 4)]
+		     UNSPEC_VSX_SET))]
+{
+  if (GET_CODE (operands[5]) == SCRATCH)
+    operands[5] = gen_reg_rtx (SImode);
+
+  operands[6] = gen_lowpart (V4SImode, operands[2]);
+  operands[7] = gen_lowpart (V4SImode, operands[0]);
+  operands[8] = gen_lowpart (V4SImode, operands[1]);
+}
+  [(set_attr "type" "vecperm")])
+
 ;; Expanders for builtins
 (define_expand "vsx_mergel_<mode>"
   [(use (match_operand:VSX_D 0 "vsx_register_operand" ""))
Index: gcc/testsuite/gcc.target/powerpc/pr79799-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(working copy)
@@ -0,0 +1,43 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* GCC 7.1 did not have a specialized method for inserting 32-bit floating point on
+   ISA 3.0 (power9) systems.  */
+
+vector float
+insert_arg_0 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 0);
+}
+
+vector float
+insert_arg_1 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 1);
+}
+
+vector float
+insert_arg_2 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 2);
+}
+
+vector float
+insert_arg_3 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 3);
+}
+
+/* { dg-final { scan-assembler     {\mxscvdpspn\M} } } */
+/* { dg-final { scan-assembler     {\mxxinsertw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}     } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}       } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}     } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}    } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}    } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}   } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-2.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-2.c	(working copy)
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode if N is the default
+   scalar position.  */
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define ELE 2
+#else
+#define ELE 1
+#endif
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, ELE), v1, 0);
+}
+
+/* { dg-final { scan-assembler     {\mxxinsertw\M}   } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}       } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}         } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}       } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}      } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}        } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}     } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-3.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-3.c	(working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode.  */
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 4), v1, 0);
+}
+
+/* { dg-final { scan-assembler     {\mxxinsertw\M}   } } */
+/* { dg-final { scan-assembler     {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}       } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}         } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}       } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}      } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}        } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}     } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-4.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-4.c	(working copy)
@@ -0,0 +1,105 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target p9vector_hw } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+#include <stdlib.h>
+
+__attribute__ ((__noinline__))
+vector float
+insert_0 (vector float v, float f)
+{
+  return vec_insert (f, v, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_1 (vector float v, float f)
+{
+  return vec_insert (f, v, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_2 (vector float v, float f)
+{
+  return vec_insert (f, v, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_3 (vector float v, float f)
+{
+  return vec_insert (f, v, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+
+  v1 = insert_0 (v1, 5.0f);
+  v1 = insert_1 (v1, 6.0f);
+  v1 = insert_2 (v1, 7.0f);
+  v1 = insert_3 (v1, 8.0f);
+
+  if (vec_any_ne (v1, v2))
+    abort ();
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_0_3 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 3), v1, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_1_2 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 2), v1, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_2_1 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 1), v1, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_3_0 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 0), v1, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert_extract (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+  vector float v3 = { 8.0f, 7.0f, 6.0f, 5.0f };
+
+  v1 = insert_extract_0_3 (v1, v2);
+  v1 = insert_extract_1_2 (v1, v2);
+  v1 = insert_extract_2_1 (v1, v2);
+  v1 = insert_extract_3_0 (v1, v2);
+
+  if (vec_any_ne (v1, v3))
+    abort ();
+}
+
+int
+main (void)
+{
+  test_insert ();
+  test_insert_extract ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr79799-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-5.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-5.c	(working copy)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Ensure setting 0.0f to a V4SFmode element does not do a FP conversion.  */
+
+vector float
+insert_arg_0 (vector float vf)
+{
+  return vec_insert (0.0f, vf, 0);
+}
+
+/* { dg-final { scan-assembler     {\mxxinsertw\M}   } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}       } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}         } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}       } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}      } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}        } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}     } } */
+/* { dg-final { scan-assembler-not {\mxscvdpspn\M}   } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */


* Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
  2017-06-16  2:10   ` [PATCH, rev 2] " Michael Meissner
@ 2017-06-16 19:52     ` Segher Boessenkool
  2017-06-16 20:27       ` Michael Meissner
  0 siblings, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2017-06-16 19:52 UTC
  To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

Hi Mike,

On Thu, Jun 15, 2017 at 10:10:28PM -0400, Michael Meissner wrote:
> +(define_insn_and_split "vsx_set_v4sf_p9"
> +  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
> +	(unspec:V4SF
> +	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
> +	  (match_operand:SF 2 "gpc_reg_operand" "ww")
> +	  (match_operand:QI 3 "const_0_to_3_operand" "n")]
> +	 UNSPEC_VSX_SET))
> +   (clobber (match_scratch:SI 4 "=&wJwK"))]
> +  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
> +  "#"
> +  "&& reload_completed"

I still don't think it is such a good idea to do all of this not until
after reload.  It does of course allow you to play tricks with changing
register mode at will, like you do ;-)

All these unspecs are a similar problem: the RTL optimisers cannot do
much at all with it.

> +  [(set_attr "type" "vecperm")

Is that a good type for this?  I think the convert is more expensive
than the permutes?  If so, that would be better (of course it only
matters for sched1, not super important).

> --- gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(working copy)
> @@ -0,0 +1,43 @@
> +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */

Why not powerpc*-*-*?

> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-mcpu=power9 -O2" } */
> +
> +#include <altivec.h>
> +
> +/* GCC 7.1 did not have a specialized method for inserting 32-bit floating point on
> +   ISA 3.0 (power9) systems.  */

That first line is a bit long.


The patch is okay for trunk and 7 with the testsuite nits taken care of.

Thanks,


Segher


* Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
  2017-06-16 19:52     ` Segher Boessenkool
@ 2017-06-16 20:27       ` Michael Meissner
  2017-06-16 21:30         ` Segher Boessenkool
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Meissner @ 2017-06-16 20:27 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

On Fri, Jun 16, 2017 at 02:52:46PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Thu, Jun 15, 2017 at 10:10:28PM -0400, Michael Meissner wrote:
> > +(define_insn_and_split "vsx_set_v4sf_p9"
> > +  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
> > +	(unspec:V4SF
> > +	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
> > +	  (match_operand:SF 2 "gpc_reg_operand" "ww")
> > +	  (match_operand:QI 3 "const_0_to_3_operand" "n")]
> > +	 UNSPEC_VSX_SET))
> > +   (clobber (match_scratch:SI 4 "=&wJwK"))]
> > +  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
> > +  "#"
> > +  "&& reload_completed"
> 
> I still don't think it is such a good idea to do all of this not until
> after reload.  It does of course allow you to play tricks with changing
> register mode at will, like you do ;-)

The problem is MODES_TIEABLE_P.  V4S{I,F}mode and SImode cannot be tied
together (i.e. you cannot use gen_lowpart to change the mode via a SUBREG).
After reload, we can just use gen_rtx_REG (...) to change the register's mode,
but before reload, creating the SUBREG can lead to various aborts if RTL
checking is turned on.

> All these unspecs are a similar problem: the RTL optimisers cannot do
> much at all with it.

I don't think there is a good way to represent a vec_insert.  And vec_extract
can't represent a variable extract either.

> > +  [(set_attr "type" "vecperm")
> 
> Is that a good type for this?  I think the convert is more expensive
> than the permutes?  If so, that would be better (of course it only
> matters for sched1, not super important).

I generally use the type of the last insn.  I am open to other suggestions.

> > --- gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(nonexistent)
> > +++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(working copy)
> > @@ -0,0 +1,43 @@
> > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
> 
> Why not powerpc*-*-*?

Well, as it turns out, it aborts in 32-bit mode, because -mvsx-small-integer is
not enabled there, and we can't have SImode in vector registers.  I'll have to
add some additional tests and resubmit the patch.

> 
> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-options "-mcpu=power9 -O2" } */
> > +
> > +#include <altivec.h>
> > +
> > +/* GCC 7.1 did not have a specialized method for inserting 32-bit floating point on
> > +   ISA 3.0 (power9) systems.  */
> 
> That first line is a bit long.

Ok.

> The patch is okay for trunk and 7 with the testsuite nits taken care of.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


* Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
  2017-06-16 20:27       ` Michael Meissner
@ 2017-06-16 21:30         ` Segher Boessenkool
  2017-06-16 21:55           ` Michael Meissner
  0 siblings, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2017-06-16 21:30 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

On Fri, Jun 16, 2017 at 04:26:58PM -0400, Michael Meissner wrote:
> > > +  "&& reload_completed"
> > 
> > I still don't think it is such a good idea to do all of this not until
> > after reload.  It does of course allow you to play tricks with changing
> > register mode at will, like you do ;-)
> 
> The problem is MODES_TIEABLE_P.  V4S{I,F}mode and SImode cannot be tied
> together (i.e. use gen_lowpart to change the mode and use a SUBREG).  So after
> reload, we can just use gen_rtx_REG (...) to change the register type, but
> before reload, by creating the SUBREG, it can lead to various aborts if rtl
> checking is turned on.

That sounds like a problem elsewhere?  Hrm.

> > All these unspecs are a similar problem: the RTL optimisers cannot do
> > much at all with it.
> 
> I don't think there is a good way to represent a vec_insert.  And vec_extract
> can't represent a variable extract either.

Yeah.  But especially for all this lane shuffling etc. the generic
optimisers could do a good job, if only they knew how.  Maybe we need
some new RTL codes.

> > > +  [(set_attr "type" "vecperm")
> 
> > Is that a good type for this?  I think the convert is more expensive
> > than the permutes?  If so, that would be better (of course it only
> > matters for sched1, not super important).
> 
> I generally use the type of the last insn.  I am open to other suggestions.

It should describe the resulting insns as a whole.  Picking the type of
the most expensive insn is often a reasonable approximation; for integer
insns "two" or "three" can be okay.

I don't think we can do much better currently.


Segher


* Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
  2017-06-16 21:30         ` Segher Boessenkool
@ 2017-06-16 21:55           ` Michael Meissner
  2017-06-19 23:16             ` Segher Boessenkool
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Meissner @ 2017-06-16 21:55 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 3552 bytes --]

On Fri, Jun 16, 2017 at 04:30:48PM -0500, Segher Boessenkool wrote:
> On Fri, Jun 16, 2017 at 04:26:58PM -0400, Michael Meissner wrote:
> > > > +  "&& reload_completed"
> > > 
> > > I still don't think it is such a good idea to do all of this not until
> > > after reload.  It does of course allow you to play tricks with changing
> > > register mode at will, like you do ;-)
> > 
> > The problem is MODES_TIEABLE_P.  V4S{I,F}mode and SImode cannot be tied
> > together (i.e. use gen_lowpart to change the mode and use a SUBREG).  So after
> > reload, we can just use gen_rtx_REG (...) to change the register type, but
> > before reload, by creating the SUBREG, it can lead to various aborts if rtl
> > checking is turned on.
> 
> That sounds like a problem elsewhere?  Hrm.
> 
> > > All these unspecs are a similar problem: the RTL optimisers cannot do
> > > much at all with it.
> > 
> > I don't think there is a good way to represent a vec_insert.  And vec_extract
> > can't represent a variable extract either.
> 
> Yeah.  But especially for all this lane shuffling etc. the generic
> optimisers could do a good job, if only they knew how.  Maybe we need
> some new RTL codes.
> 
> > > > +  [(set_attr "type" "vecperm")
> > 
> > > Is that a good type for this?  I think the convert is more expensive
> > > than the permutes?  If so, that would be better (of course it only
> > > matters for sched1, not super important).
> > 
> > I generally use the type of the last insn.  I am open to other suggestions.
> 
> It should describe the resulting insns as a whole.  Picking the type of
> the most expensive insn is often a reasonable approximation; for integer
> insns "two" or "three" can be okay.
> 
> I don't think we can do much better currently.

Here is the latest patch that restricts the optimization to 64-bit (due to
needing VSX small integers).  I've done a full bootstrap/make check on a little
endian power8 system, and a build without bootstrap and make check on a little
endian power9 system.  Neither the power8 nor the power9 systems had any
regressions.  I'm also running a test on a big endian power7 system for
completeness.

Assuming the power7 test finishes without any regressions, can I check this
patch into the trunk and later the GCC 7 branch?

The main change was to restrict the optimization to 64-bit PowerPC systems that
have VSX small integer support turned on (the default for 64-bit).  I also
shortened the one line in the testsuite that you mentioned.
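
For readers following the element numbering, here is a minimal C sketch of the
index arithmetic that the insert/extract patterns in the attached patch use.
The helper names (default_scalar_element, xxinsertw_byte_offset,
needs_word_extract) are made up for illustration and are not functions in GCC;
the boolean parameter stands in for VECTOR_ELT_ORDER_BIG.

```c
#include <stdbool.h>

/* Illustrative constants only: a V4SF vector has 4 elements of 4 bytes.  */
enum { V4SF_NUNITS = 4, SF_SIZE = 4 };

/* Element number that the patch treats as the default scalar position:
   1 for big endian element order, 2 for little endian element order.  */
int
default_scalar_element (bool elt_order_big)
{
  return elt_order_big ? 1 : 2;
}

/* Byte offset passed to xxinsertw when inserting into element ELE.
   With little endian element order the element number is mirrored
   first, then scaled by the size of a float.  */
int
xxinsertw_byte_offset (int ele, bool elt_order_big)
{
  if (!elt_order_big)
    ele = V4SF_NUNITS - 1 - ele;

  return SF_SIZE * ele;
}

/* For vec_insert (vec_extract (v2, N), v1, M): returns true when N is
   not the default scalar position, i.e. the case where the patch splits
   into a word extract followed by the insert instead of a lone insert.  */
bool
needs_word_extract (int n, bool elt_order_big)
{
  return n != default_scalar_element (elt_order_big);
}
```

For example, with little endian element order, inserting into element 0 gives
byte offset 12, and extracting element 2 needs no separate word extract, which
is the case that pr79799-2.c checks.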

[gcc]
2017-06-16  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/79799
	* config/rs6000/rs6000.c (rs6000_expand_vector_set): Add support
	for doing vector set of SFmode on ISA 3.0.
	* config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
	(vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
	element.
	(vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
	SFmode value into a V4SF variable that was extracted from another
	V4SF variable without converting the element to double precision
	and back to single precision vector format.
	(vsx_insert_extract_v4sf_p9_2): Likewise.

[gcc/testsuite]
2017-06-16  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/79799
	* gcc.target/powerpc/pr79799-1.c: New test.
	* gcc.target/powerpc/pr79799-2.c: Likewise.
	* gcc.target/powerpc/pr79799-3.c: Likewise.
	* gcc.target/powerpc/pr79799-4.c: Likewise.
	* gcc.target/powerpc/pr79799-5.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: pr79799.patch03b --]
[-- Type: text/plain, Size: 13686 bytes --]

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 249175)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -7451,6 +7451,8 @@ rs6000_expand_vector_set (rtx target, rt
 	    insn = gen_vsx_set_v8hi_p9 (target, target, val, elt_rtx);
 	  else if (mode == V16QImode)
 	    insn = gen_vsx_set_v16qi_p9 (target, target, val, elt_rtx);
+	  else if (mode == V4SFmode)
+	    insn = gen_vsx_set_v4sf_p9 (target, target, val, elt_rtx);
 	}
 
       if (insn)
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 249175)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -3012,6 +3012,134 @@ (define_insn "vsx_set_<mode>_p9"
 }
   [(set_attr "type" "vecperm")])
 
+(define_insn_and_split "vsx_set_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (match_operand:SF 2 "gpc_reg_operand" "ww")
+	  (match_operand:QI 3 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5)
+	(unspec:V4SF [(match_dup 2)]
+		     UNSPEC_VSX_CVDPSPN))
+   (parallel [(set (match_dup 4)
+		   (vec_select:SI (match_dup 6)
+				  (parallel [(match_dup 7)])))
+	      (clobber (scratch:SI))])
+   (set (match_dup 8)
+	(unspec:V4SI [(match_dup 8)
+		      (match_dup 4)
+		      (match_dup 3)]
+		     UNSPEC_VSX_SET))]
+{
+  unsigned int tmp_regno = reg_or_subregno (operands[4]);
+
+  operands[5] = gen_rtx_REG (V4SFmode, tmp_regno);
+  operands[6] = gen_rtx_REG (V4SImode, tmp_regno);
+  operands[7] = GEN_INT (VECTOR_ELT_ORDER_BIG ? 1 : 2);
+  operands[8] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "12")])
+
+;; Special case setting 0.0f to a V4SF element
+(define_insn_and_split "*vsx_set_v4sf_p9_zero"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (match_operand:SF 2 "zero_fp_constant" "j")
+	  (match_operand:QI 3 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4)
+	(const_int 0))
+   (set (match_dup 5)
+	(unspec:V4SI [(match_dup 5)
+		      (match_dup 4)
+		      (match_dup 3)]
+		     UNSPEC_VSX_SET))]
+{
+  operands[5] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "8")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian).  We just need to do an xxinsertw since the element is in the
+;; correct location.
+
+(define_insn "*vsx_insert_extract_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+			 (parallel
+			  [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+	  (match_operand:QI 4 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64
+   && (INTVAL (operands[3]) == (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+{
+  int ele = INTVAL (operands[4]);
+
+  if (!VECTOR_ELT_ORDER_BIG)
+    ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
+
+  operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
+  return "xxinsertw %x0,%x2,%4";
+}
+  [(set_attr "type" "vecperm")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is not the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian).  Convert the insert/extract to int and avoid doing the conversion.
+
+(define_insn_and_split "*vsx_insert_extract_v4sf_p9_2"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+	(unspec:V4SF
+	 [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+	  (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+			 (parallel
+			  [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+	  (match_operand:QI 4 "const_0_to_3_operand" "n")]
+	 UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 5 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && VECTOR_MEM_VSX_P (V4SImode)
+   && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64
+   && (INTVAL (operands[3]) != (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 5)
+		   (vec_select:SI (match_dup 6)
+				  (parallel [(match_dup 3)])))
+	      (clobber (scratch:SI))])
+   (set (match_dup 7)
+	(unspec:V4SI [(match_dup 8)
+		      (match_dup 5)
+		      (match_dup 4)]
+		     UNSPEC_VSX_SET))]
+{
+  if (GET_CODE (operands[5]) == SCRATCH)
+    operands[5] = gen_reg_rtx (SImode);
+
+  operands[6] = gen_lowpart (V4SImode, operands[2]);
+  operands[7] = gen_lowpart (V4SImode, operands[0]);
+  operands[8] = gen_lowpart (V4SImode, operands[1]);
+}
+  [(set_attr "type" "vecperm")])
+
 ;; Expanders for builtins
 (define_expand "vsx_mergel_<mode>"
   [(use (match_operand:VSX_D 0 "vsx_register_operand" ""))
Index: gcc/testsuite/gcc.target/powerpc/pr79799-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(revision 0)
@@ -0,0 +1,43 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* GCC 7.1 did not have a specialized method for inserting 32-bit floating
+   point on ISA 3.0 (power9) systems.  */
+
+vector float
+insert_arg_0 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 0);
+}
+
+vector float
+insert_arg_1 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 1);
+}
+
+vector float
+insert_arg_2 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 2);
+}
+
+vector float
+insert_arg_3 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 3);
+}
+
+/* { dg-final { scan-assembler     {\mxscvdpspn\M} } } */
+/* { dg-final { scan-assembler     {\mxxinsertw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}     } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}       } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}     } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}    } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}    } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}   } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-2.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode if N is the
+   default scalar position.  */
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define ELE 2
+#else
+#define ELE 1
+#endif
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, ELE), v1, 0);
+}
+
+/* { dg-final { scan-assembler     {\mxxinsertw\M}   } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}       } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}         } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}       } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}      } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}        } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}     } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-3.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode.  */
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 4), v1, 0);
+}
+
+/* { dg-final { scan-assembler     {\mxxinsertw\M}   } } */
+/* { dg-final { scan-assembler     {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}       } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}         } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}       } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}      } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}        } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}     } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-4.c	(revision 0)
@@ -0,0 +1,105 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target p9vector_hw } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+#include <stdlib.h>
+
+__attribute__ ((__noinline__))
+vector float
+insert_0 (vector float v, float f)
+{
+  return vec_insert (f, v, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_1 (vector float v, float f)
+{
+  return vec_insert (f, v, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_2 (vector float v, float f)
+{
+  return vec_insert (f, v, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_3 (vector float v, float f)
+{
+  return vec_insert (f, v, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+
+  v1 = insert_0 (v1, 5.0f);
+  v1 = insert_1 (v1, 6.0f);
+  v1 = insert_2 (v1, 7.0f);
+  v1 = insert_3 (v1, 8.0f);
+
+  if (vec_any_ne (v1, v2))
+    abort ();
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_0_3 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 3), v1, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_1_2 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 2), v1, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_2_1 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 1), v1, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_3_0 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 0), v1, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert_extract (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+  vector float v3 = { 8.0f, 7.0f, 6.0f, 5.0f };
+
+  v1 = insert_extract_0_3 (v1, v2);
+  v1 = insert_extract_1_2 (v1, v2);
+  v1 = insert_extract_2_1 (v1, v2);
+  v1 = insert_extract_3_0 (v1, v2);
+
+  if (vec_any_ne (v1, v3))
+    abort ();
+}
+
+int
+main (void)
+{
+  test_insert ();
+  test_insert_extract ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr79799-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-5.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Ensure setting 0.0f to a V4SFmode element does not do a FP conversion.  */
+
+vector float
+insert_arg_0 (vector float vf)
+{
+  return vec_insert (0.0f, vf, 0);
+}
+
+/* { dg-final { scan-assembler     {\mxxinsertw\M}   } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M}       } } */
+/* { dg-final { scan-assembler-not {\mlvx\M}         } } */
+/* { dg-final { scan-assembler-not {\mvperm\M}       } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M}      } } */
+/* { dg-final { scan-assembler-not {\mstfs\M}        } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M}      } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M}     } } */
+/* { dg-final { scan-assembler-not {\mxscvdpspn\M}   } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */


* Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
  2017-06-16 21:55           ` Michael Meissner
@ 2017-06-19 23:16             ` Segher Boessenkool
  0 siblings, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2017-06-19 23:16 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

On Fri, Jun 16, 2017 at 05:55:35PM -0400, Michael Meissner wrote:
> Here is the latest patch that restricts the optimization to 64-bit (due to
> needing VSX small integers).  I've done a full bootstrap/make check on a little
> endian power8 system, and a build without bootstrap and make check on a little
> endian power9 system.  Neither the power8 nor the power9 systems had any
> regressions.  I'm also running a test on a big endian power7 system for
> completeness.
> 
> Assuming the power7 test finishes without any regressions, can I check this
> patch into the trunk and later the GCC 7 branch?
> 
> The main change was to restrict the optimization to 64-bit PowerPC that have
> VSX small integer support turned on (default for 64-bit).  I did shorten the
> one line in the testsuite that you mentioned.

Okay for both.  Thanks!


Segher


> 2017-06-16  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	PR target/79799
> 	* config/rs6000/rs6000.c (rs6000_expand_vector_set): Add support
> 	for doing vector set of SFmode on ISA 3.0.
> 	* config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
> 	(vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
> 	element.
> 	(vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
> 	SFmode value into a V4SF variable that was extracted from another
> 	V4SF variable without converting the element to double precision
> 	and back to single precision vector format.
> 	(vsx_insert_extract_v4sf_p9_2): Likewise.
> 
> [gcc/testsuite]
> 2017-06-16  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	PR target/79799
> 	* gcc.target/powerpc/pr79799-1.c: New test.
> 	* gcc.target/powerpc/pr79799-2.c: Likewise.
> 	* gcc.target/powerpc/pr79799-3.c: Likewise.
> 	* gcc.target/powerpc/pr79799-4.c: Likewise.
> 	* gcc.target/powerpc/pr79799-5.c: Likewise.


Thread overview: 8+ messages
2017-06-15  0:02 [PATCH] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9) Michael Meissner
2017-06-15 23:39 ` Michael Meissner
2017-06-16  2:10   ` [PATCH, rev 2] " Michael Meissner
2017-06-16 19:52     ` Segher Boessenkool
2017-06-16 20:27       ` Michael Meissner
2017-06-16 21:30         ` Segher Boessenkool
2017-06-16 21:55           ` Michael Meissner
2017-06-19 23:16             ` Segher Boessenkool
