[PATCH], Improve vector int/long initialization on PowerPC

public inbox for gcc-patches@gcc.gnu.org

* [PATCH], Improve vector int/long initialization on PowerPC
@ 2016-08-04  4:34 Michael Meissner
  2016-08-04 15:03 ` Segher Boessenkool
  2016-08-05 22:00 ` [PATCH], Improve vector int/long " Pat Haugen
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Meissner @ 2016-08-04  4:34 UTC
  To: gcc-patches, Segher Boessenkool, David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 3187 bytes --]

This is a set of 3 patches to improve initializing vectors on the PowerPC.

The first patch changes the initialization of vector int where the
initialization part was not constant.  Previously, the compiler would create
the vector initialization on the stack and load it up, and now on 64-bit power8
and newer systems, it will create the parts in the GPRs.

Before the switch from using the old RELOAD register allocator to the newer LRA
register allocator, this patch had a problem with one of the fortran benchmarks
(cray_pointers_2) on a Power8 system (it works on power7 because the
optimization is not done there, and on power9 because power9 has d-form vector
addressing).  This was due to TImode not being allowed in vector registers.
So, I added a test to disable the optimization on such a system.  Since LRA
enables TImode to go into vector registers, most users will see the benefits of
this optimization.
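
As a rough illustration of what the first patch generates, here is a plain C
model of the packing step (it is not the RTL itself, and the names
v4si_halves and pack_v4si are hypothetical).  Each 32-bit element is
zero-extended to 64 bits, and each pair is combined with a shift and an OR to
form one doubleword of the vector:

```c
#include <stdint.h>

/* Hypothetical model of the two 64-bit halves of a V4SImode vector.  */
struct v4si_halves
{
  uint64_t hi;	/* Most significant doubleword.  */
  uint64_t lo;	/* Least significant doubleword.  */
};

/* E0..E3 are the elements in big endian element order; on little endian
   the expander reverses the element order before this step.  Each element
   is zero-extended, the first of each pair is shifted left by 32 bits, and
   the second is ORed in.  */
static struct v4si_halves
pack_v4si (int32_t e0, int32_t e1, int32_t e2, int32_t e3)
{
  struct v4si_halves r;

  r.hi = ((uint64_t) (uint32_t) e0 << 32) | (uint32_t) e1;
  r.lo = ((uint64_t) (uint32_t) e2 << 32) | (uint32_t) e3;
  return r;
}
```

The zero extension matters: sign-extending a negative second element would
overwrite the upper 32 bits that hold the first element of the pair.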

The second part is cosmetic, in that it moves the determination of the true
register number for a REG or SUBREG to a helper function from the
rs6000_adjust_vec_address function.  Previous versions of this patch had
additional callers to regno_or_subregno, but those uses were removed at this
time.  However, as I work on further optimizing vector initialization, set, and
extract, I may wind up using the helper function.

The third patch improves formation of vector long on ISA 3.0 systems, to use the
new MTVSRDD instruction (that builds a vector from two 64-bit GPRs).
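
The operand order used for MTVSRDD can be sketched with a small C model
(hypothetical names; it assumes the ISA 3.0 definition in which the first GPR
operand lands in doubleword 0, the most significant half, and the second in
doubleword 1).  The endian-dependent swap mirrors the two output templates in
the vsx_concat_<mode> pattern:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit VSX register; dw0 is the most
   significant doubleword.  */
struct vsr
{
  uint64_t dw0;
  uint64_t dw1;
};

/* Model of "mtvsrdd XT,RA,RB": RA goes to doubleword 0, RB to
   doubleword 1.  */
static struct vsr
mtvsrdd_model (uint64_t ra, uint64_t rb)
{
  struct vsr r = { ra, rb };
  return r;
}

/* Build a two element vector { ELT0, ELT1 }.  On big endian, element 0
   lives in doubleword 0; on little endian it lives in doubleword 1, so
   the GPR operands are swapped.  */
static struct vsr
concat_v2di (uint64_t elt0, uint64_t elt1, bool big_endian)
{
  return (big_endian
	  ? mtvsrdd_model (elt0, elt1)
	  : mtvsrdd_model (elt1, elt0));
}
```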

I built spec 2006 with these patches on a little endian power8 system, and at
least 18 of the benchmarks had vector initializations replaced.  Most
benchmarks only used the initialization in a few places, but gamess, dealII,
h264ref, and wrf each had over 100 initializations changed.

I have tried these patches on a big endian power7 system (both 32-bit and
64-bit targets), on a big endian power8 system (just 64-bit targets), and a
little endian power8 system (just 64-bit targets).  There were no regressions
on any of the systems.  Can I install these patches to the trunk?

[gcc]
2016-08-03  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_expand_vector_init): On 64-bit
	systems with direct move and TImode registers allowed in VSX,
	initialize a V4SImode vector in the GPRs, rather than creating a
	temporary vector on the stack, doing 4 stores to that temporary
	vector, and then doing a vector load (which causes a pipeline
	bubble between the stores and the load).
	(regno_or_subregno): New helper function to get the register
	number of a REG or SUBREG rtx.
	(rs6000_adjust_vec_address): Use regno_or_subregno.
	* config/rs6000/vsx.md (vsx_concat_<mode>): Add support for the
	ISA 3.0 mtvsrdd instruction if we are moving two gpr registers to
	create one vector register.

[gcc/testsuite]
2016-08-03  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/vec-init-1.c: New tests for vector init.
	* gcc.target/powerpc/vec-init-2.c: Likewise.
	* gcc.target/powerpc/vec-init-3.c: Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-stage7.init001b --]
[-- Type: text/plain, Size: 7670 bytes --]

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239098)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -6736,6 +6736,38 @@ rs6000_expand_vector_init (rtx target, r
       return;
     }
 
+  /* Special case initializing vector int if we are on 64-bit systems with
+     direct move.  This bug tickles a bug in reload for fortran's
+     cray_pointers_2 test unless -mvsx-timode is enabled.  */
+  if (mode == V4SImode && TARGET_DIRECT_MOVE_64BIT && TARGET_VSX_TIMODE)
+    {
+      rtx di_hi, di_lo, elements[4], tmp;
+      size_t i;
+
+      for (i = 0; i < 4; i++)
+	{
+	  rtx element_si = XVECEXP (vals, 0, VECTOR_ELT_ORDER_BIG ? i : 3 - i);
+	  element_si = copy_to_mode_reg (SImode, element_si);
+	  elements[i] = gen_reg_rtx (DImode);
+	  convert_move (elements[i], element_si, true);
+	}
+
+      di_hi = gen_reg_rtx (DImode);
+      tmp = gen_reg_rtx (DImode);
+      emit_insn (gen_ashldi3 (tmp, elements[0], GEN_INT (32)));
+      emit_insn (gen_iordi3 (di_hi, tmp, elements[1]));
+
+      di_lo = gen_reg_rtx (DImode);
+      tmp = gen_reg_rtx (DImode);
+      emit_insn (gen_ashldi3 (tmp, elements[2], GEN_INT (32)));
+      emit_insn (gen_iordi3 (di_lo, tmp, elements[3]));
+
+      emit_insn (gen_rtx_CLOBBER (VOIDmode, target));
+      emit_move_insn (gen_highpart (DImode, target), di_hi);
+      emit_move_insn (gen_lowpart (DImode, target), di_lo);
+      return;
+    }
+
   /* With single precision floating point on VSX, know that internally single
      precision is actually represented as a double, and either make 2 V2DF
      vectors, and convert these vectors to single precision, or do one
@@ -7021,6 +7053,18 @@ rs6000_expand_vector_extract (rtx target
   emit_move_insn (target, adjust_address_nv (mem, inner_mode, 0));
 }
 
+/* Helper function to return the register number of a RTX.  */
+static inline int
+regno_or_subregno (rtx op)
+{
+  if (REG_P (op))
+    return REGNO (op);
+  else if (SUBREG_P (op))
+    return subreg_regno (op);
+  else
+    gcc_unreachable ();
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
    within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
    temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -7136,14 +7180,7 @@ rs6000_adjust_vec_address (rtx scalar_re
     {
       rtx op1 = XEXP (new_addr, 1);
       addr_mask_type addr_mask;
-      int scalar_regno;
-
-      if (REG_P (scalar_reg))
-	scalar_regno = REGNO (scalar_reg);
-      else if (SUBREG_P (scalar_reg))
-	scalar_regno = subreg_regno (scalar_reg);
-      else
-	gcc_unreachable ();
+      int scalar_regno = regno_or_subregno (scalar_reg);
 
       gcc_assert (scalar_regno < FIRST_PSEUDO_REGISTER);
       if (INT_REGNO_P (scalar_regno))
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239098)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -1899,18 +1899,28 @@ (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?<VSa>")
+  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
 	(vec_concat:VSX_D
-	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "<VS_64reg>,<VSa>")
-	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")))]
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,r")
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,r")))
+   (clobber (match_scratch:DI 3 "=X,X"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  if (BYTES_BIG_ENDIAN)
-    return "xxpermdi %x0,%x1,%x2,0";
+  if (which_alternative == 0)
+    return (BYTES_BIG_ENDIAN
+	    ? "xxpermdi %x0,%x1,%x2,0"
+	    : "xxpermdi %x0,%x2,%x1,0");
+
+  else if (which_alternative == 1)
+    return (BYTES_BIG_ENDIAN
+	    ? "mtvsrdd %x0,%1,%2"
+	    : "mtvsrdd %x0,%2,%1");
+
   else
-    return "xxpermdi %x0,%x2,%x1,0";
+    gcc_unreachable ();
 }
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,mftgpr")
+   (set_attr "length" "4")])
 
 ;; Special purpose concat using xxpermdi to glue two single precision values
 ;; together, relying on the fact that internally scalar floats are represented

Index: gcc/testsuite/gcc.target/powerpc/vec-init-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,36 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+extern void check (vector int a)                    __attribute__((__noinline__));
+extern vector int pack (int a, int b, int c, int d) __attribute__((__noinline__));
+
+void
+check (vector int a)
+{
+  static const int expected[] = { -1, 2, 0, -3 };
+  size_t i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+vector int
+pack (int a, int b, int c, int d)
+{
+  return (vector int) { a, b, c, d };
+}
+
+vector int sv = (vector int) { -1, 2, 0, -3 };
+
+int main (void)
+{
+  check (sv);
+  check (pack (-1, 2, 0, -3));
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,36 @@
+/* { dg-do run { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+extern void check (vector long a)        __attribute__((__noinline__));
+extern vector long pack (long a, long b) __attribute__((__noinline__));
+
+void
+check (vector long a)
+{
+  static const long expected[] = { 2L, -3L };
+  size_t i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+vector long
+pack (long a, long b)
+{
+  return (vector long) { a, b };
+}
+
+vector long sv = (vector long) { 2L, -3L };
+
+int main (void)
+{
+  check (sv);
+  check (pack (2L, -3L));
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-3.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-3.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -mupper-regs-di" } */
+
+vector long
+merge (long a, long b)
+{
+  return (vector long) { a, b };
+}
+
+/* { dg-final { scan-assembler "mtvsrdd" } } */


* Re: [PATCH], Improve vector int/long initialization on PowerPC
  2016-08-04  4:34 [PATCH], Improve vector int/long initialization on PowerPC Michael Meissner
@ 2016-08-04 15:03 ` Segher Boessenkool
  2016-08-08 22:55   ` Michael Meissner
  2016-08-05 22:00 ` [PATCH], Improve vector int/long " Pat Haugen
  1 sibling, 1 reply; 16+ messages in thread
From: Segher Boessenkool @ 2016-08-04 15:03 UTC
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt

Hi Mike,

On Thu, Aug 04, 2016 at 12:33:44AM -0400, Michael Meissner wrote:
> I built spec 2006 with these patches on a little endian power8 system, and at
> least 18 of the benchmarks had vector initializations replaced.  Most
> benchmarks only used the initialization in a few places, but gamess, dealII,
> h264ref, and wrf each had over 100 initializations changed.

Did performance change?

> I have tried these patches on a big endian power7 system (both 32-bit and
> 64-bit targets), on a big endian power8 system (just 64-bit targets), and a
> little endian power8 system (just 64-bit targets).  There were no regressions
> on any of the systems.  Can I install these patches to the trunk?

Some questions below, okay for trunk with those taken care of.  Thanks.


> --- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239098)
> +++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
> @@ -6736,6 +6736,38 @@ rs6000_expand_vector_init (rtx target, r
>        return;
>      }
>  
> +  /* Special case initializing vector int if we are on 64-bit systems with
> +     direct move.  This bug tickles a bug in reload for fortran's
> +     cray_pointers_2 test unless -mvsx-timode is enabled.  */

"This bug"?  It's not clear to me what this says, could you rephrase?
Just say what the code does, not what would happen without the code.  Or
say both.

> +static inline int
> +regno_or_subregno (rtx op)
> +{
> +  if (REG_P (op))
> +    return REGNO (op);
> +  else if (SUBREG_P (op))
> +    return subreg_regno (op);
> +  else
> +    gcc_unreachable ();
> +}

Maybe this should check the subreg is lowpart, too?  For robustness.

>  ;; Build a V2DF/V2DI vector from two scalars
>  (define_insn "vsx_concat_<mode>"
> -  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?<VSa>")
> +  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
>  	(vec_concat:VSX_D
> -	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "<VS_64reg>,<VSa>")
> -	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")))]
> +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,r")
> +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,r")))
> +   (clobber (match_scratch:DI 3 "=X,X"))]

X,X?  How is that useful?

> +   (set_attr "length" "4")])

One insn is the default.


Segher


* Re: [PATCH], Improve vector int/long initialization on PowerPC
  2016-08-04 15:03 ` Segher Boessenkool
@ 2016-08-08 22:55   ` Michael Meissner
  2016-08-11 23:15     ` [PATCH], Patch #4, " Michael Meissner
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Meissner @ 2016-08-08 22:55 UTC
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt

On Thu, Aug 04, 2016 at 10:03:36AM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Thu, Aug 04, 2016 at 12:33:44AM -0400, Michael Meissner wrote:
> > I built spec 2006 with these patches on a little endian power8 system, and at
> > least 18 of the benchmarks had vector initializations replaced.  Most
> > benchmarks only used the initialization in a few places, but gamess, dealII,
> > h264ref, and wrf each had over 100 initializations changed.
> 
> Did performance change?

I ran a selected set of spec benchmarks, and I saw one regression (3.5% in
tonto).  I'll run the full suite, and see if I can find what the slowdown is.

> > I have tried these patches on a big endian power7 system (both 32-bit and
> > 64-bit targets), on a big endian power8 system (just 64-bit targets), and a
> > little endian power8 system (just 64-bit targets).  There were no regressions
> > on any of the systems.  Can I install these patches to the trunk?
> 
> Some questions below, okay for trunk with those taken care of.  Thanks.
> 
> 
> > --- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239098)
> > +++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
> > @@ -6736,6 +6736,38 @@ rs6000_expand_vector_init (rtx target, r
> >        return;
> >      }
> >  
> > +  /* Special case initializing vector int if we are on 64-bit systems with
> > +     direct move.  This bug tickles a bug in reload for fortran's
> > +     cray_pointers_2 test unless -mvsx-timode is enabled.  */
> 
> "This bug"?  It's not clear to me what this says, could you rephrase?
> Just say what the code does, not what would happen without the code.  Or
> say both.

Here is the comment I tried to reword to be clearer.  Since the bug only occurs
if you do -mno-lra -mno-vsx-timode, I made the test to be those conditions.  I
don't think it is worth the time at this point to track down what the reload
issue is.

  /* Special case initializing vector int if we are on 64-bit systems with
     direct move.  This optimization tickles a bug in RELOAD for fortran's
     cray_pointers_2 test unless -mvsx-timode is enabled (the register
     allocator is trying to load up a V4SImode vector in GPRs with a TImode
     address using a SUBREG).  Since RELOAD is no longer the default register
     allocator, just don't do the optimization.  */
  if (mode == V4SImode && TARGET_DIRECT_MOVE_64BIT
      && (TARGET_LRA || TARGET_VSX_TIMODE))


> > +static inline int
> > +regno_or_subregno (rtx op)
> > +{
> > +  if (REG_P (op))
> > +    return REGNO (op);
> > +  else if (SUBREG_P (op))
> > +    return subreg_regno (op);
> > +  else
> > +    gcc_unreachable ();
> > +}
> 
> Maybe this should check the subreg is lowpart, too?  For robustness.

No, subreg_regno already does that.  Since this is just infrastructure for a
potential future change, I can remove this change until the next patch.

> >  ;; Build a V2DF/V2DI vector from two scalars
> >  (define_insn "vsx_concat_<mode>"
> > -  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?<VSa>")
> > +  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
> >  	(vec_concat:VSX_D
> > -	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "<VS_64reg>,<VSa>")
> > -	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")))]
> > +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,r")
> > +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,r")))
> > +   (clobber (match_scratch:DI 3 "=X,X"))]
> 
> X,X?  How is that useful?

Because I had a more elaborate version for vsx_concat_<mode> that did need a
base register for memory addresses.  I missed deleting the useless clobber.  I
deleted it in my source tree that will become the next iteration of the patch.
When I add support for vector insert to/from memory, I will probably need the
clobbers (and this BTW, was the reason to split off regno_or_subregno).

> > +   (set_attr "length" "4")])
> 
> One insn is the default.

Yep.  I had been working on a larger change, and some of the other alternatives
did have other lengths.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


* [PATCH], Patch #4, Improve vector int/long initialization on PowerPC
  2016-08-08 22:55   ` Michael Meissner
@ 2016-08-11 23:15     ` Michael Meissner
  2016-08-12  0:21       ` Segher Boessenkool
  2016-08-19 22:18       ` [PATCH], Patch #5, Improve vector int " Michael Meissner
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Meissner @ 2016-08-11 23:15 UTC
  To: Michael Meissner, Segher Boessenkool, gcc-patches,
	David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

This patch was originally part of patch #3, but I separated it out as I rework
what used to be part of patch #3 to fix some issues.

This patch adds support for using the ISA 3.0 MTVSRDD instruction when
initializing vector long vectors with variables.  I also changed the CPU type
of the other use of MTVSRDD to be vecperm as Pat suggested.

I added two general tests (vec-init-1.c and vec-init-2.c) that test various
forms of vector initialization to make sure the compiler generates the correct
code.  These tests will test optimizations that the future patches will
enhance.  I also added a third test (vec-init-3.c) to specifically test whether
MTVSRDD is generated.

I did a bootstrap and make check on a little endian power8 system, and there
were no regressions.  Can I install this patch to the trunk?

[gcc]
2016-08-11  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_concat_<mode>): Add support for the
	ISA 3.0 MTVSRDD instruction.
	(vsx_splat_<mode>): Change cpu type of MTVSRDD instruction to
	vecperm.

[gcc/testsuite]
2016-08-11  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/vec-init-1.c: New tests to test various
	vector initialization options.
	* gcc.target/powerpc/vec-init-2.c: Likewise.
	* gcc.target/powerpc/vec-init-3.c: New test to make sure MTVSRDD
	is generated on ISA 3.0.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-stage7.init004b --]
[-- Type: text/plain, Size: 9591 bytes --]

Index: gcc/config/rs6000/rs6000.c
===================================================================
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239334)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -1911,16 +1911,24 @@ (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?<VSa>")
+  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
 	(vec_concat:VSX_D
-	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "<VS_64reg>,<VSa>")
-	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")))]
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,r")
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,r")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  if (BYTES_BIG_ENDIAN)
-    return "xxpermdi %x0,%x1,%x2,0";
+  if (which_alternative == 0)
+    return (BYTES_BIG_ENDIAN
+	    ? "xxpermdi %x0,%x1,%x2,0"
+	    : "xxpermdi %x0,%x2,%x1,0");
+
+  else if (which_alternative == 1)
+    return (BYTES_BIG_ENDIAN
+	    ? "mtvsrdd %x0,%1,%2"
+	    : "mtvsrdd %x0,%2,%1");
+
   else
-    return "xxpermdi %x0,%x2,%x1,0";
+    gcc_unreachable ();
 }
   [(set_attr "type" "vecperm")])
 
@@ -2650,7 +2658,7 @@ (define_insn "vsx_splat_<mode>"
    xxpermdi %x0,%x1,%x1,0
    lxvdsx %x0,%y1
    mtvsrdd %x0,%1,%1"
-  [(set_attr "type" "vecperm,vecload,mftgpr")])
+  [(set_attr "type" "vecperm,vecload,vecperm")])
 
 ;; V4SI splat (ISA 3.0)
 ;; When SI's are allowed in VSX registers, add XXSPLTW support
Index: gcc/testsuite/gcc.target/powerpc/vec-init-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239334)
@@ -0,0 +1,169 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+#define ELEMENTS -1, 2, 0, -123456
+#define SPLAT 0x01234567
+
+vector int sv = (vector int) { ELEMENTS };
+vector int splat = (vector int) { SPLAT, SPLAT, SPLAT, SPLAT };
+vector int sv_global, sp_global;
+static vector int sv_static, sp_static;
+static const int expected[] = { ELEMENTS };
+
+extern void check (vector int a)
+  __attribute__((__noinline__));
+
+extern void check_splat (vector int a)
+  __attribute__((__noinline__));
+
+extern vector int pack_reg (int a, int b, int c, int d)
+  __attribute__((__noinline__));
+
+extern vector int pack_const (void)
+  __attribute__((__noinline__));
+
+extern void pack_ptr (vector int *p, int a, int b, int c, int d)
+  __attribute__((__noinline__));
+
+extern void pack_static (int a, int b, int c, int d)
+  __attribute__((__noinline__));
+
+extern void pack_global (int a, int b, int c, int d)
+  __attribute__((__noinline__));
+
+extern vector int splat_reg (int a)
+  __attribute__((__noinline__));
+
+extern vector int splat_const (void)
+  __attribute__((__noinline__));
+
+extern void splat_ptr (vector int *p, int a)
+  __attribute__((__noinline__));
+
+extern void splat_static (int a)
+  __attribute__((__noinline__));
+
+extern void splat_global (int a)
+  __attribute__((__noinline__));
+
+void
+check (vector int a)
+{
+  size_t i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+void
+check_splat (vector int a)
+{
+  size_t i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_extract (a, i) != SPLAT)
+      abort ();
+}
+
+vector int
+pack_reg (int a, int b, int c, int d)
+{
+  return (vector int) { a, b, c, d };
+}
+
+vector int
+pack_const (void)
+{
+  return (vector int) { ELEMENTS };
+}
+
+void
+pack_ptr (vector int *p, int a, int b, int c, int d)
+{
+  *p = (vector int) { a, b, c, d };
+}
+
+void
+pack_static (int a, int b, int c, int d)
+{
+  sv_static = (vector int) { a, b, c, d };
+}
+
+void
+pack_global (int a, int b, int c, int d)
+{
+  sv_global = (vector int) { a, b, c, d };
+}
+
+vector int
+splat_reg (int a)
+{
+  return (vector int) { a, a, a, a };
+}
+
+vector int
+splat_const (void)
+{
+  return (vector int) { SPLAT, SPLAT, SPLAT, SPLAT };
+}
+
+void
+splat_ptr (vector int *p, int a)
+{
+  *p = (vector int) { a, a, a, a };
+}
+
+void
+splat_static (int a)
+{
+  sp_static = (vector int) { a, a, a, a };
+}
+
+void
+splat_global (int a)
+{
+  sp_global = (vector int) { a, a, a, a };
+}
+
+int main (void)
+{
+  vector int sv2, sv3;
+
+  check (sv);
+
+  check (pack_reg (ELEMENTS));
+
+  check (pack_const ());
+
+  pack_ptr (&sv2, ELEMENTS);
+  check (sv2);
+
+  pack_static (ELEMENTS);
+  check (sv_static);
+
+  pack_global (ELEMENTS);
+  check (sv_global);
+
+  check_splat (splat);
+
+  check_splat (splat_reg (SPLAT));
+
+  check_splat (splat_const ());
+
+  splat_ptr (&sv2, SPLAT);
+  check_splat (sv2);
+
+  splat_static (SPLAT);
+  check_splat (sp_static);
+
+  splat_global (SPLAT);
+  check_splat (sp_global);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239334)
@@ -0,0 +1,169 @@
+/* { dg-do run { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+#define ELEMENTS -12345678L, 9L
+#define SPLAT 0x0123456789ABCDE
+
+vector long sv = (vector long) { ELEMENTS };
+vector long splat = (vector long) { SPLAT, SPLAT };
+vector long sv_global, sp_global;
+static vector long sv_static, sp_static;
+static const long expected[] = { ELEMENTS };
+
+extern void check (vector long a)
+  __attribute__((__noinline__));
+
+extern void check_splat (vector long a)
+  __attribute__((__noinline__));
+
+extern vector long pack_reg (long a, long b)
+  __attribute__((__noinline__));
+
+extern vector long pack_const (void)
+  __attribute__((__noinline__));
+
+extern void pack_ptr (vector long *p, long a, long b)
+  __attribute__((__noinline__));
+
+extern void pack_static (long a, long b)
+  __attribute__((__noinline__));
+
+extern void pack_global (long a, long b)
+  __attribute__((__noinline__));
+
+extern vector long splat_reg (long a)
+  __attribute__((__noinline__));
+
+extern vector long splat_const (void)
+  __attribute__((__noinline__));
+
+extern void splat_ptr (vector long *p, long a)
+  __attribute__((__noinline__));
+
+extern void splat_static (long a)
+  __attribute__((__noinline__));
+
+extern void splat_global (long a)
+  __attribute__((__noinline__));
+
+void
+check (vector long a)
+{
+  size_t i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+void
+check_splat (vector long a)
+{
+  size_t i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_extract (a, i) != SPLAT)
+      abort ();
+}
+
+vector long
+pack_reg (long a, long b)
+{
+  return (vector long) { a, b };
+}
+
+vector long
+pack_const (void)
+{
+  return (vector long) { ELEMENTS };
+}
+
+void
+pack_ptr (vector long *p, long a, long b)
+{
+  *p = (vector long) { a, b };
+}
+
+void
+pack_static (long a, long b)
+{
+  sv_static = (vector long) { a, b };
+}
+
+void
+pack_global (long a, long b)
+{
+  sv_global = (vector long) { a, b };
+}
+
+vector long
+splat_reg (long a)
+{
+  return (vector long) { a, a };
+}
+
+vector long
+splat_const (void)
+{
+  return (vector long) { SPLAT, SPLAT };
+}
+
+void
+splat_ptr (vector long *p, long a)
+{
+  *p = (vector long) { a, a };
+}
+
+void
+splat_static (long a)
+{
+  sp_static = (vector long) { a, a };
+}
+
+void
+splat_global (long a)
+{
+  sp_global = (vector long) { a, a };
+}
+
+int  main (void)
+{
+  vector long sv2, sv3;
+
+  check (sv);
+
+  check (pack_reg (ELEMENTS));
+
+  check (pack_const ());
+
+  pack_ptr (&sv2, ELEMENTS);
+  check (sv2);
+
+  pack_static (ELEMENTS);
+  check (sv_static);
+
+  pack_global (ELEMENTS);
+  check (sv_global);
+
+  check_splat (splat);
+
+  check_splat (splat_reg (SPLAT));
+
+  check_splat (splat_const ());
+
+  splat_ptr (&sv2, SPLAT);
+  check_splat (sv2);
+
+  splat_static (SPLAT);
+  check_splat (sp_static);
+
+  splat_global (SPLAT);
+  check_splat (sp_global);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-3.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-3.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239334)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -mupper-regs-di" } */
+
+vector long
+merge (long a, long b)
+{
+  return (vector long) { a, b };
+}
+
+/* { dg-final { scan-assembler "mtvsrdd" } } */


* Re: [PATCH], Patch #4, Improve vector int/long initialization on PowerPC
  2016-08-11 23:15     ` [PATCH], Patch #4, " Michael Meissner
@ 2016-08-12  0:21       ` Segher Boessenkool
  2016-08-19 22:18       ` [PATCH], Patch #5, Improve vector int " Michael Meissner
  1 sibling, 0 replies; 16+ messages in thread
From: Segher Boessenkool @ 2016-08-12  0:21 UTC
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt

On Thu, Aug 11, 2016 at 07:15:17PM -0400, Michael Meissner wrote:
> This patch was originally part of patch #3, but I separated it out as I rework
> what used to be part of patch #3 to fix some issues.
> 
> This patch adds support for using the ISA 3.0 MTVSRDD instruction when
> initializing vector long vectors with variables.  I also changed the CPU type
> of the other use of MTVSRDD to be vecperm as Pat suggested.
> 
> I added two general tests (vec-init-1.c and vec-init-2.c) that test various
> forms of vector initialization to make sure the compiler generates the correct
> code.  These tests will test optimizations that the future patches will
> enhance.  I also added a third test (vec-init-3.c) to specifically test whether
> MTVSRDD is generated.
> 
> I did a bootstrap and make check on a little endian power8 system, and there
> were no regressions.  Can I install this patch to the trunk?

Yes, please do.  Thanks,


Segher


> 2016-08-11  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	* config/rs6000/vsx.md (vsx_concat_<mode>): Add support for the
> 	ISA 3.0 MTVSRDD instruction.
> 	(vsx_splat_<mode>): Change cpu type of MTVSRDD instruction to
> 	vecperm.
> 
> [gcc/testsuite]
> 2016-08-11  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	* gcc.target/powerpc/vec-init-1.c: New tests to test various
> 	vector initialization options.
> 	* gcc.target/powerpc/vec-init-2.c: Likewise.
> 	* gcc.target/powerpc/vec-init-3.c: New test to make sure MTVSRDD
> 	is generated on ISA 3.0.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH], Patch #5, Improve vector int initialization on PowerPC
  2016-08-11 23:15     ` [PATCH], Patch #4, " Michael Meissner
  2016-08-12  0:21       ` Segher Boessenkool
@ 2016-08-19 22:18       ` Michael Meissner
  2016-08-19 23:59         ` [PATCH], Patch #6, Improve vector short/char splat " Michael Meissner
  2016-08-22 16:38         ` [PATCH], Patch #5, Improve vector int initialization on PowerPC Segher Boessenkool
  1 sibling, 2 replies; 16+ messages in thread
From: Michael Meissner @ 2016-08-19 22:18 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool, gcc-patches,
	David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 4258 bytes --]

This is a rewrite of patch #3 to improve vector int initialization on the
PowerPC 64-bit systems with direct move (power8, and forthcoming power9).

This patch adds full support for doing vector int initialization in the GPR and
vector registers, rather than creating a stack temporary, doing 4 stores, and
then a vector load (including having an interlock due to having different sizes
of stores vs. loads being done).
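
As a rough illustration of the GPR path, the sketch below models in portable C
how two 32-bit elements are combined into one 64-bit doubleword, which is the
job rs6000_split_v4si_init_di_reg does with RTL.  The function names pack_pair
and build_v4si_be are hypothetical and exist only for this example; only the
big endian element order is shown, and the patch reverses the order for little
endian.

```c
#include <stdint.h>

/* Combine two 32-bit elements into one 64-bit doubleword.  HI lands in the
   upper 32 bits and LO in the lower 32 bits.  Both values are masked first
   so that a negative element does not sign-extend into the other half.  */
uint64_t
pack_pair (int32_t hi, int32_t lo)
{
  const uint64_t mask_32bit = 0xffffffffu;

  return (((uint64_t) hi & mask_32bit) << 32) | ((uint64_t) lo & mask_32bit);
}

/* Build the two doublewords of a 4 x int vector in big endian element
   order: elements 0 and 1 form the first doubleword, elements 2 and 3 the
   second.  */
void
build_v4si_be (uint64_t dword[2], const int32_t elt[4])
{
  dword[0] = pack_pair (elt[0], elt[1]);
  dword[1] = pack_pair (elt[2], elt[3]);
}
```

The two doublewords are independent of each other, which is why the patch
tries to use a separate temporary for each one so the scheduler can overlap
them.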

In addition to the vector int initialization changes, I separated vector int
from vector float initialization insns.  In looking at vector float, I noticed
that there were places that used the old preferred register class mechanism
that was never used, and I eliminated the preferred register class
alternatives.  I also noticed that the scalar alternatives for float were not
modified to allow float scalar variables to be in Altivec registers.

Finally, in editing the code, I noticed that we were using an explicit XOR to
initialize a register to all 0's.  I changed this to set the vector to
CONST0_RTX (<mode>), which mirrors similar changes I made on May 15th to the
normal vector moves.

I ran all of the Spec 2006 benchmark suite that I normally run and there were
no significant timing differences between using this patch and the base
compiler.  Originally there was a regression in tonto, but it was fixed when
Alan's patch on August 18th was applied to the trunk.

I wrote a program that did a lot of vector initializations and some simple
vector adds, and it is 5.7% faster for vector initialization of 4 independent
variables, and 7.7% faster if all of the elements are the same.
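
The benchmark program itself is not included in this message.  A minimal
sketch of that kind of workload, assuming the GCC generic vector extension
rather than the Altivec vector keyword so that it builds on any target, might
look like the following.  The names init_4, init_splat, and run_workload are
made up for this example and no timing code is shown.

```c
/* GCC/Clang generic vector extension: four 32-bit ints in 16 bytes.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Initialize a vector from four independent, non-constant values.  */
__attribute__ ((noinline)) v4si
init_4 (int a, int b, int c, int d)
{
  return (v4si) { a, b, c, d };
}

/* Initialize a vector with the same non-constant value in every element.  */
__attribute__ ((noinline)) v4si
init_splat (int a)
{
  return (v4si) { a, a, a, a };
}

/* Do N vector initializations, each followed by a simple vector add, and
   return the sum of the four elements of the accumulated vector.  */
long
run_workload (int n, int use_splat)
{
  v4si acc = { 0, 0, 0, 0 };
  int i;

  for (i = 0; i < n; i++)
    {
      if (use_splat)
	acc += init_splat (i);
      else
	acc += init_4 (i, i + 1, i + 2, i + 3);
    }

  return (long) acc[0] + acc[1] + acc[2] + acc[3];
}
```

The noinline attribute keeps the compiler from folding the initializations
into constants, so each call has to build the vector from register values.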

I have built bootstrap compilers and have run make check on these patches on a
big endian Power8 system and a little endian Power8 system with no regressions.
Previous versions of the patch did bootstrap and had no regressions on a big
endian Power7 system.  Are these patches ok to install on the trunk?

[gcc]
2016-08-19  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000-protos.h (rs6000_split_v4si_init): Add
	declaration.
	* config/rs6000/rs6000.c (rs6000_expand_vector_init): Set
	initialization of all 0's to the 0 constant, instead of directly
	generating XOR.  Add support for V4SImode vector initialization on
	64-bit systems with direct move, and rework the ISA 3.0 V4SImode
	initialization.  Change variables used in V4SFmode vector
	initialization.  For V4SFmode vector splat on ISA 3.0, make sure
	any memory addresses are in index form.
	(regno_or_subregno): New helper function to return a register
	number for either REG or SUBREG.
	(rs6000_adjust_vec_address): Do not generate ADDI <reg>,R0,<num>.
	Use regno_or_subregno where possible.
	(rs6000_split_v4si_init_di_reg): New helper function to build up a
	DImode value from two SImode values in order to generate V4SImode
	vector initialization on 64-bit systems with direct move.
	(rs6000_split_v4si_init): Split up the insns for a V4SImode vector
	initialization.
	(rtx_is_swappable_p): V4SImode vector initialization insn is not
	swappable.
	* config/rs6000/vsx.md (UNSPEC_VSX_VEC_INIT): New unspec.
	(vsx_concat_v2sf): Eliminate using 'preferred' register classes.
	Allow SFmode values to come from Altivec registers.
	(vsx_init_v4si): New insn/split for V4SImode vector initialization
	on 64-bit systems with direct move.
	(vsx_splat_<mode>, VSX_W iterator): Rework V4SImode and V4SFmode
	vector initializations, to allow V4SImode vector initializations
	on 64-bit systems with direct move.
	(vsx_splat_v4si): Likewise.
	(vsx_splat_v4si_di): Likewise.
	(vsx_splat_v4sf): Likewise.
	(vsx_splat_v4sf_internal): Likewise.
	(vsx_xxspltw_<mode>, VSX_W iterator): Eliminate using 'preferred'
	register classes.
	(vsx_xxspltw_<mode>_direct, VSX_W iterator): Likewise.
	* config/rs6000/rs6000.h (TARGET_DIRECT_MOVE_64BIT): Disallow
	optimization if -maltivec=be.

[gcc/testsuite]
2016-08-19  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/vec-init-1.c: Add tests where the vector is
	being created from pointers to memory locations.
	* gcc.target/powerpc/vec-init-2.c: Likewise.
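
A note on the rs6000_adjust_vec_address entry above: ADDI treats an RA field
of 0 as the literal value 0 rather than as the contents of R0, so R0 cannot be
the register being added.  The following is a small C model of that rule, not
compiler code; model_addi and model_add are hypothetical names used only for
this illustration.

```c
#include <stdint.h>

/* Model of "ADDI RT,RA,SI".  When the RA field is 0, the immediate is added
   to the literal value 0, not to the contents of R0.  */
int64_t
model_addi (const int64_t gpr[32], unsigned ra, int16_t si)
{
  int64_t base = (ra == 0) ? 0 : gpr[ra];

  return base + si;
}

/* Model of "ADD RT,RA,RB".  Here R0 is an ordinary register, which is why
   loading the offset into a temporary and doing a register add is safe.  */
int64_t
model_add (const int64_t gpr[32], unsigned ra, unsigned rb)
{
  return gpr[ra] + gpr[rb];
}
```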

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-stage7.init005b --]
[-- Type: text/plain, Size: 19597 bytes --]

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239554)
+++ gcc/config/rs6000/rs6000-protos.h	(.../gcc/config/rs6000)	(working copy)
@@ -65,6 +65,7 @@ extern void rs6000_expand_vector_set (rt
 extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
 extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
 extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
+extern void rs6000_split_v4si_init (rtx []);
 extern bool altivec_expand_vec_perm_const (rtx op[4]);
 extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern bool rs6000_expand_vec_perm_const (rtx op[4]);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239554)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -6692,7 +6692,7 @@ rs6000_expand_vector_init (rtx target, r
       if ((int_vector_p || TARGET_VSX) && all_const_zero)
 	{
 	  /* Zero register.  */
-	  emit_insn (gen_rtx_SET (target, gen_rtx_XOR (mode, target, target)));
+	  emit_insn (gen_rtx_SET (target, CONST0_RTX (mode)));
 	  return;
 	}
       else if (int_vector_p && easy_vector_constant (const_vec, mode))
@@ -6735,32 +6735,69 @@ rs6000_expand_vector_init (rtx target, r
       return;
     }
 
-  /* Word values on ISA 3.0 can use mtvsrws, lxvwsx, or vspltisw.  V4SF is
-     complicated since scalars are stored as doubles in the registers.  */
-  if (TARGET_P9_VECTOR && mode == V4SImode && all_same
-      && VECTOR_MEM_VSX_P (mode))
+  /* Special case initializing vector int if we are on 64-bit systems with
+     direct move or we have the ISA 3.0 instructions.  */
+  if (mode == V4SImode  && VECTOR_MEM_VSX_P (V4SImode)
+      && TARGET_DIRECT_MOVE_64BIT)
     {
-      emit_insn (gen_vsx_splat_v4si (target, XVECEXP (vals, 0, 0)));
-      return;
+      if (all_same)
+	{
+	  rtx element0 = XVECEXP (vals, 0, 0);
+	  if (MEM_P (element0))
+	    element0 = rs6000_address_for_fpconvert (element0);
+	  else if (!REG_P (element0))
+	    element0 = force_reg (SImode, element0);
+
+	  if (TARGET_P9_VECTOR)
+	    emit_insn (gen_vsx_splat_v4si (target, element0));
+	  else
+	    {
+	      rtx tmp = gen_reg_rtx (DImode);
+	      emit_insn (gen_zero_extendsidi2 (tmp, element0));
+	      emit_insn (gen_vsx_splat_v4si_di (target, tmp));
+	    }
+	  return;
+	}
+      else
+	{
+	  rtx elements[4];
+	  size_t i;
+
+	  for (i = 0; i < 4; i++)
+	    {
+	      elements[i] = XVECEXP (vals, 0, i);
+	      if (!CONST_INT_P (elements[i]) && !REG_P (elements[i]))
+		elements[i] = copy_to_mode_reg (SImode, elements[i]);
+	    }
+
+	  emit_insn (gen_vsx_init_v4si (target, elements[0], elements[1],
+					elements[2], elements[3]));
+	  return;
+	}
     }
 
   /* With single precision floating point on VSX, know that internally single
      precision is actually represented as a double, and either make 2 V2DF
      vectors, and convert these vectors to single precision, or do one
      conversion, and splat the result to the other elements.  */
-  if (mode == V4SFmode && VECTOR_MEM_VSX_P (mode))
+  if (mode == V4SFmode && VECTOR_MEM_VSX_P (V4SFmode))
     {
       if (all_same)
 	{
-	  rtx op0 = XVECEXP (vals, 0, 0);
+	  rtx element0 = XVECEXP (vals, 0, 0);
 
 	  if (TARGET_P9_VECTOR)
-	    emit_insn (gen_vsx_splat_v4sf (target, op0));
+	    {
+	      if (MEM_P (element0))
+		element0 = rs6000_address_for_fpconvert (element0);
+
+	      emit_insn (gen_vsx_splat_v4sf (target, element0));
+	    }
 
 	  else
 	    {
 	      rtx freg = gen_reg_rtx (V4SFmode);
-	      rtx sreg = force_reg (SFmode, op0);
+	      rtx sreg = force_reg (SFmode, element0);
 	      rtx cvt  = (TARGET_XSCVDPSPN
 			  ? gen_vsx_xscvdpspn_scalar (freg, sreg)
 			  : gen_vsx_xscvdpsp_scalar (freg, sreg));
@@ -7029,6 +7066,18 @@ rs6000_expand_vector_extract (rtx target
   emit_move_insn (target, adjust_address_nv (mem, inner_mode, 0));
 }
 
+/* Helper function to return the register number of a RTX.  */
+static inline int
+regno_or_subregno (rtx op)
+{
+  if (REG_P (op))
+    return REGNO (op);
+  else if (SUBREG_P (op))
+    return subreg_regno (op);
+  else
+    gcc_unreachable ();
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
    within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
    temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -7108,14 +7157,22 @@ rs6000_adjust_vec_address (rtx scalar_re
 	}
       else
 	{
-	  if (REG_P (op1) || SUBREG_P (op1))
+	  bool op1_reg_p = (REG_P (op1) || SUBREG_P (op1));
+	  bool ele_reg_p = (REG_P (element_offset) || SUBREG_P (element_offset));
+
+	  /* Note, ADDI requires the register being added to be a base
+	     register.  If the register was R0, load it up into the temporary
+	     and do the add.  */
+	  if (op1_reg_p
+	      && (ele_reg_p || reg_or_subregno (op1) != FIRST_GPR_REGNO))
 	    {
 	      insn = gen_add3_insn (base_tmp, op1, element_offset);
 	      gcc_assert (insn != NULL_RTX);
 	      emit_insn (insn);
 	    }
 
-	  else if (REG_P (element_offset) || SUBREG_P (element_offset))
+	  else if (ele_reg_p
+		   && reg_or_subregno (element_offset) != FIRST_GPR_REGNO)
 	    {
 	      insn = gen_add3_insn (base_tmp, element_offset, op1);
 	      gcc_assert (insn != NULL_RTX);
@@ -7144,14 +7201,7 @@ rs6000_adjust_vec_address (rtx scalar_re
     {
       rtx op1 = XEXP (new_addr, 1);
       addr_mask_type addr_mask;
-      int scalar_regno;
-
-      if (REG_P (scalar_reg))
-	scalar_regno = REGNO (scalar_reg);
-      else if (SUBREG_P (scalar_reg))
-	scalar_regno = subreg_regno (scalar_reg);
-      else
-	gcc_unreachable ();
+      int scalar_regno = regno_or_subregno (scalar_reg);
 
       gcc_assert (scalar_regno < FIRST_PSEUDO_REGISTER);
       if (INT_REGNO_P (scalar_regno))
@@ -7318,6 +7368,93 @@ rs6000_split_vec_extract_var (rtx dest, 
     gcc_unreachable ();
  }
 
+/* Helper function for rs6000_split_v4si_init to build up a DImode value from
+   two SImode values.  */
+
+static void
+rs6000_split_v4si_init_di_reg (rtx dest, rtx si1, rtx si2, rtx tmp)
+{
+  const unsigned HOST_WIDE_INT mask_32bit = HOST_WIDE_INT_C (0xffffffff);
+
+  if (CONST_INT_P (si1) && CONST_INT_P (si2))
+    {
+      unsigned HOST_WIDE_INT const1 = (UINTVAL (si1) & mask_32bit) << 32;
+      unsigned HOST_WIDE_INT const2 = UINTVAL (si2) & mask_32bit;
+
+      emit_move_insn (dest, GEN_INT (const1 | const2));
+      return;
+    }
+
+  /* Put si1 into upper 32-bits of dest.  */
+  if (CONST_INT_P (si1))
+    emit_move_insn (dest, GEN_INT ((UINTVAL (si1) & mask_32bit) << 32));
+  else
+    {
+      /* Generate RLDIC.  */
+      rtx si1_di = gen_rtx_REG (DImode, regno_or_subregno (si1));
+      rtx shift_rtx = gen_rtx_ASHIFT (DImode, si1_di, GEN_INT (32));
+      rtx mask_rtx = GEN_INT (mask_32bit << 32);
+      rtx and_rtx = gen_rtx_AND (DImode, shift_rtx, mask_rtx);
+      gcc_assert (!reg_overlap_mentioned_p (dest, si1));
+      emit_insn (gen_rtx_SET (dest, and_rtx));
+    }
+
+  /* Put si2 into the temporary.  */
+  gcc_assert (!reg_overlap_mentioned_p (dest, tmp));
+  if (CONST_INT_P (si2))
+    emit_move_insn (tmp, GEN_INT (UINTVAL (si2) & mask_32bit));
+  else
+    emit_insn (gen_zero_extendsidi2 (tmp, si2));
+
+  /* Combine the two parts.  */
+  emit_insn (gen_iordi3 (dest, dest, tmp));
+  return;
+}
+
+/* Split a V4SI initialization.  */
+
+void
+rs6000_split_v4si_init (rtx operands[])
+{
+  rtx dest = operands[0];
+
+  /* Destination is a GPR, build up the two DImode parts in place.  */
+  if (REG_P (dest) || SUBREG_P (dest))
+    {
+      int d_regno = regno_or_subregno (dest);
+      rtx scalar1 = operands[1];
+      rtx scalar2 = operands[2];
+      rtx scalar3 = operands[3];
+      rtx scalar4 = operands[4];
+      rtx tmp1 = operands[5];
+      rtx tmp2 = operands[6];
+
+      /* Even though we only need one temporary (plus the destination, which
+	 has an early clobber constraint), try to use two temporaries, one for
+	 each double word created.  That way the 2nd insn scheduling pass can
+	 rearrange things so the two parts are done in parallel.  */
+      if (BYTES_BIG_ENDIAN)
+	{
+	  rtx di_lo = gen_rtx_REG (DImode, d_regno);
+	  rtx di_hi = gen_rtx_REG (DImode, d_regno + 1);
+	  rs6000_split_v4si_init_di_reg (di_lo, scalar1, scalar2, tmp1);
+	  rs6000_split_v4si_init_di_reg (di_hi, scalar3, scalar4, tmp2);
+	}
+      else
+	{
+	  rtx di_lo = gen_rtx_REG (DImode, d_regno + 1);
+	  rtx di_hi = gen_rtx_REG (DImode, d_regno);
+	  gcc_assert (!VECTOR_ELT_ORDER_BIG);
+	  rs6000_split_v4si_init_di_reg (di_lo, scalar4, scalar3, tmp1);
+	  rs6000_split_v4si_init_di_reg (di_hi, scalar2, scalar1, tmp2);
+	}
+      return;
+    }
+
+  else
+    gcc_unreachable ();
+}
+
 /* Return TRUE if OP is an invalid SUBREG operation on the e500.  */
 
 bool
@@ -39006,6 +39143,7 @@ rtx_is_swappable_p (rtx op, unsigned int
 	  case UNSPEC_VSX_CVSPDPN:
 	  case UNSPEC_VSX_EXTRACT:
 	  case UNSPEC_VSX_VSLO:
+	  case UNSPEC_VSX_VEC_INIT:
 	    return 0;
 	  case UNSPEC_VSPLT_DIRECT:
 	    *special = SH_SPLAT;
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239554)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -323,6 +323,7 @@ (define_c_enum "unspec"
    UNSPEC_VSX_VXSIG
    UNSPEC_VSX_VIEXP
    UNSPEC_VSX_VTSTDC
+   UNSPEC_VSX_VEC_INIT
   ])
 
 ;; VSX moves
@@ -1950,10 +1951,10 @@ (define_insn "vsx_concat_<mode>"
 ;; together, relying on the fact that internally scalar floats are represented
 ;; as doubles.  This is used to initialize a V4SF vector with 4 floats
 (define_insn "vsx_concat_v2sf"
-  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wd,?wa")
+  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
 	(unspec:V2DF
-	 [(match_operand:SF 1 "vsx_register_operand" "f,f")
-	  (match_operand:SF 2 "vsx_register_operand" "f,f")]
+	 [(match_operand:SF 1 "vsx_register_operand" "ww")
+	  (match_operand:SF 2 "vsx_register_operand" "ww")]
 	 UNSPEC_VSX_CONCAT))]
   "VECTOR_MEM_VSX_P (V2DFmode)"
 {
@@ -1964,6 +1965,26 @@ (define_insn "vsx_concat_v2sf"
 }
   [(set_attr "type" "vecperm")])
 
+;; V4SImode initialization splitter
+(define_insn_and_split "vsx_init_v4si"
+  [(set (match_operand:V4SI 0 "gpc_reg_operand" "=&r")
+	(unspec:V4SI
+	 [(match_operand:SI 1 "reg_or_cint_operand" "rn")
+	  (match_operand:SI 2 "reg_or_cint_operand" "rn")
+	  (match_operand:SI 3 "reg_or_cint_operand" "rn")
+	  (match_operand:SI 4 "reg_or_cint_operand" "rn")]
+	 UNSPEC_VSX_VEC_INIT))
+   (clobber (match_scratch:DI 5 "=&r"))
+   (clobber (match_scratch:DI 6 "=&r"))]
+   "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+   "#"
+   "&& reload_completed"
+   [(const_int 0)]
+{
+  rs6000_split_v4si_init (operands);
+  DONE;
+})
+
 ;; xxpermdi for little endian loads and stores.  We need several of
 ;; these since the form of the PARALLEL differs by mode.
 (define_insn "*vsx_xxpermdi2_le_<mode>"
@@ -2674,32 +2695,33 @@ (define_insn "vsx_splat_<mode>"
    mtvsrdd %x0,%1,%1"
   [(set_attr "type" "vecperm,vecload,vecperm")])
 
-;; V4SI splat (ISA 3.0)
-;; When SI's are allowed in VSX registers, add XXSPLTW support
-(define_expand "vsx_splat_<mode>"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "")
-	(vec_duplicate:VSX_W
-	 (match_operand:<VS_scalar> 1 "splat_input_operand" "")))]
-  "TARGET_P9_VECTOR"
-{
-  if (MEM_P (operands[1]))
-    operands[1] = rs6000_address_for_fpconvert (operands[1]);
-  else if (!REG_P (operands[1]))
-    operands[1] = force_reg (<VS_scalar>mode, operands[1]);
-})
-
-(define_insn "*vsx_splat_v4si_internal"
-  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
+;; V4SI splat support
+(define_insn "vsx_splat_v4si"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=we,we")
 	(vec_duplicate:V4SI
 	 (match_operand:SI 1 "splat_input_operand" "r,Z")))]
   "TARGET_P9_VECTOR"
   "@
    mtvsrws %x0,%1
    lxvwsx %x0,%y1"
-  [(set_attr "type" "mftgpr,vecload")])
+  [(set_attr "type" "vecperm,vecload")])
+
+;; SImode is not currently allowed in vector registers.  This pattern
+;; allows us to use direct move to get the value in a vector register
+;; so that we can use XXSPLTW
+(define_insn "vsx_splat_v4si_di"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,we")
+	(vec_duplicate:V4SI
+	 (truncate:SI
+	  (match_operand:DI 1 "gpc_reg_operand" "wj,r"))))]
+  "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+  "@
+   xxspltw %x0,%x1,1
+   mtvsrws %x0,%1"
+  [(set_attr "type" "vecperm")])
 
 ;; V4SF splat (ISA 3.0)
-(define_insn_and_split "*vsx_splat_v4sf_internal"
+(define_insn_and_split "vsx_splat_v4sf"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,wa")
 	(vec_duplicate:V4SF
 	 (match_operand:SF 1 "splat_input_operand" "Z,wy,r")))]
@@ -2720,12 +2742,12 @@ (define_insn_and_split "*vsx_splat_v4sf_
 
 ;; V4SF/V4SI splat from a vector element
 (define_insn "vsx_xxspltw_<mode>"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wf,?<VSa>")
+  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=<VSa>")
 	(vec_duplicate:VSX_W
 	 (vec_select:<VS_scalar>
-	  (match_operand:VSX_W 1 "vsx_register_operand" "wf,<VSa>")
+	  (match_operand:VSX_W 1 "vsx_register_operand" "<VSa>")
 	  (parallel
-	   [(match_operand:QI 2 "u5bit_cint_operand" "i,i")]))))]
+	   [(match_operand:QI 2 "u5bit_cint_operand" "n")]))))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
   if (!BYTES_BIG_ENDIAN)
@@ -2736,9 +2758,9 @@ (define_insn "vsx_xxspltw_<mode>"
   [(set_attr "type" "vecperm")])
 
 (define_insn "vsx_xxspltw_<mode>_direct"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wf,?<VSa>")
-        (unspec:VSX_W [(match_operand:VSX_W 1 "vsx_register_operand" "wf,<VSa>")
-                       (match_operand:QI 2 "u5bit_cint_operand" "i,i")]
+  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=<VSa>")
+        (unspec:VSX_W [(match_operand:VSX_W 1 "vsx_register_operand" "<VSa>")
+                       (match_operand:QI 2 "u5bit_cint_operand" "i")]
                       UNSPEC_VSX_XXSPLTW))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
   "xxspltw %x0,%x1,%2"
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239554)
+++ gcc/config/rs6000/rs6000.h	(.../gcc/config/rs6000)	(working copy)
@@ -760,13 +760,15 @@ extern int rs6000_vector_align[];
 				 && TARGET_SINGLE_FLOAT			\
 				 && TARGET_DOUBLE_FLOAT)
 
-/* Macro to say whether we can do optimization where we need to do parts of the
-   calculation in 64-bit GPRs and then is transfered to the vector
-   registers.  */
+/* Macro to say whether we can do optimizations where we need to do parts of
+   the calculation in 64-bit GPRs and then transfer the result to the vector
+   registers.  Do not allow -maltivec=be for these optimizations, because it
+   adds to the complexity of the code.  */
 #define TARGET_DIRECT_MOVE_64BIT	(TARGET_DIRECT_MOVE		\
 					 && TARGET_P8_VECTOR		\
 					 && TARGET_POWERPC64		\
-					 && TARGET_UPPER_REGS_DI)
+					 && TARGET_UPPER_REGS_DI	\
+					 && (rs6000_altivec_element_order != 2))
 
 /* Whether the various reciprocal divide/square root estimate instructions
    exist, and whether we should automatically generate code for the instruction

Index: gcc/testsuite/gcc.target/powerpc/vec-init-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 239554)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../gcc/testsuite/gcc.target/powerpc)	(working copy)
@@ -24,6 +24,9 @@ extern void check_splat (vector int a)
 extern vector int pack_reg (int a, int b, int c, int d)
   __attribute__((__noinline__));
 
+extern vector int pack_from_ptr (int *p_a, int *p_b, int *p_c, int *p_d)
+  __attribute__((__noinline__));
+
 extern vector int pack_const (void)
   __attribute__((__noinline__));
 
@@ -39,6 +42,9 @@ extern void pack_global (int a, int b, i
 extern vector int splat_reg (int a)
   __attribute__((__noinline__));
 
+extern vector int splat_from_ptr (int *p)
+  __attribute__((__noinline__));
+
 extern vector int splat_const (void)
   __attribute__((__noinline__));
 
@@ -78,6 +84,12 @@ pack_reg (int a, int b, int c, int d)
 }
 
 vector int
+pack_from_ptr (int *p_a, int *p_b, int *p_c, int *p_d)
+{
+  return (vector int) { *p_a, *p_b, *p_c, *p_d };
+}
+
+vector int
 pack_const (void)
 {
   return (vector int) { ELEMENTS };
@@ -108,6 +120,12 @@ splat_reg (int a)
 }
 
 vector int
+splat_from_ptr (int *p)
+{
+  return (vector int) { *p, *p, *p, *p };
+}
+
+vector int
 splat_const (void)
 {
   return (vector int) { SPLAT, SPLAT, SPLAT, SPLAT };
@@ -134,11 +152,15 @@ splat_global (int a)
 int main (void)
 {
   vector int sv2, sv3;
+  int mem = SPLAT;
+  int mem2[4] = { ELEMENTS };
 
   check (sv);
 
   check (pack_reg (ELEMENTS));
 
+  check (pack_from_ptr (&mem2[0], &mem2[1], &mem2[2], &mem2[3]));
+
   check (pack_const ());
 
   pack_ptr (&sv2, ELEMENTS);
@@ -154,6 +176,8 @@ int main (void)
 
   check_splat (splat_reg (SPLAT));
 
+  check_splat (splat_from_ptr (&mem));
+
   check_splat (splat_const ());
 
   splat_ptr (&sv2, SPLAT);
Index: gcc/testsuite/gcc.target/powerpc/vec-init-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 239554)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../gcc/testsuite/gcc.target/powerpc)	(working copy)
@@ -24,6 +24,9 @@ extern void check_splat (vector long a)
 extern vector long pack_reg (long a, long b)
   __attribute__((__noinline__));
 
+extern vector long pack_from_ptr (long *p_a, long *p_b)
+  __attribute__((__noinline__));
+
 extern vector long pack_const (void)
   __attribute__((__noinline__));
 
@@ -39,6 +42,9 @@ extern void pack_global (long a, long b)
 extern vector long splat_reg (long a)
   __attribute__((__noinline__));
 
+extern vector long splat_from_ptr (long *p)
+  __attribute__((__noinline__));
+
 extern vector long splat_const (void)
   __attribute__((__noinline__));
 
@@ -78,6 +84,12 @@ pack_reg (long a, long b)
 }
 
 vector long
+pack_from_ptr (long *p_a, long *p_b)
+{
+  return (vector long) { *p_a, *p_b };
+}
+
+vector long
 pack_const (void)
 {
   return (vector long) { ELEMENTS };
@@ -108,6 +120,12 @@ splat_reg (long a)
 }
 
 vector long
+splat_from_ptr (long *p)
+{
+  return (vector long) { *p, *p };
+}
+
+vector long
 splat_const (void)
 {
   return (vector long) { SPLAT, SPLAT };
@@ -134,11 +152,15 @@ splat_global (long a)
 int  main (void)
 {
   vector long sv2, sv3;
+  long mem = SPLAT;
+  long mem2[2] = { ELEMENTS };
 
   check (sv);
 
   check (pack_reg (ELEMENTS));
 
+  check (pack_from_ptr (&mem2[0], &mem2[1]));
+
   check (pack_const ());
 
   pack_ptr (&sv2, ELEMENTS);
@@ -154,6 +176,8 @@ int  main (void)
 
   check_splat (splat_reg (SPLAT));
 
+  check_splat (splat_from_ptr (&mem));
+
   check_splat (splat_const ());
 
   splat_ptr (&sv2, SPLAT);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH], Patch #6, Improve vector short/char splat initialization on PowerPC
  2016-08-19 22:18       ` [PATCH], Patch #5, Improve vector int " Michael Meissner
@ 2016-08-19 23:59         ` Michael Meissner
  2016-08-22 16:47           ` Segher Boessenkool
  2016-08-26 19:30           ` [PATCH], Patch #7, Add PowerPC vector initialization tests Michael Meissner
  2016-08-22 16:38         ` [PATCH], Patch #5, Improve vector int initialization on PowerPC Segher Boessenkool
  1 sibling, 2 replies; 16+ messages in thread
From: Michael Meissner @ 2016-08-19 23:59 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool, gcc-patches,
	David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 1774 bytes --]

This patch is a follow up to patch #5.  It adds the support to use the Altivec
VSPLTB/VSPLTH instructions if you are creating a vector char or vector short
where each element is the same (but not constant) on 64-bit systems with direct
move.
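
To show why the splat element numbers in the patch are 7 for bytes and 3 for
halfwords, here is a portable C model of the sequence: zero extend the value
to 64 bits, place it in doubleword 0 of the vector register, then splat the
element that holds the low-order part.  It assumes big endian element
numbering within the register, and the helper names move_to_dword0,
splat_byte, and splat_half are invented for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Place a 64-bit value in doubleword 0 of a 16-byte register image.  Byte 0
   is the most significant byte, so the low-order byte of VALUE ends up in
   byte 7.  The other doubleword is cleared here for simplicity.  */
void
move_to_dword0 (uint8_t reg[16], uint64_t value)
{
  int i;

  memset (reg, 0, 16);
  for (i = 0; i < 8; i++)
    reg[i] = (uint8_t) (value >> (8 * (7 - i)));
}

/* Model of VSPLTB: copy byte element INDEX to all 16 bytes.  */
void
splat_byte (uint8_t dest[16], const uint8_t src[16], unsigned index)
{
  uint8_t b = src[index];

  memset (dest, b, 16);
}

/* Model of VSPLTH: copy halfword element INDEX to all 8 halfwords.  */
void
splat_half (uint8_t dest[16], const uint8_t src[16], unsigned index)
{
  uint8_t hi = src[2 * index];
  uint8_t lo = src[2 * index + 1];
  int i;

  for (i = 0; i < 8; i++)
    {
      dest[2 * i] = hi;
      dest[2 * i + 1] = lo;
    }
}
```

With the value in doubleword 0, byte element 7 and halfword element 3 are the
ones that contain the zero-extended char or short.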

The patch has been part of the larger set of patches for vector initialization
that I've been testing for a while.  Most of those patches were submitted in
patch #5, and in this patch (#6).

There are a few patches remaining that cause a 4% performance degradation in
the zeusmp benchmark (everything else with the larger set of patches is about
the same performance).  I built and ran zeusmp, and these particular patches do
not cause the degradation.  I will submit a full run over the weekend just to
be sure.

I tested these patches on a big endian Power8 system and a little endian Power8
system, and previous versions have run on a big endian Power7 system.  There
were no regressions caused by these patches.  Can I install these patches in
the GCC 7 trunk after the patches in patch #5 are installed?

[gcc]
2016-08-19  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_expand_vector_init): Add support
	for using VSPLTH/VSPLTB to initialize vector short and vector char
	vectors with all of the same element.

	* config/rs6000/vsx.md (VSX_SPLAT_I): New mode iterators and
	attributes to initialize V8HImode and V16QImode vectors with the
	same element.
	(VSX_SPLAT_COUNT): Likewise.
	(VSX_SPLAT_SUFFIX): Likewise.
	(vsx_vsplt<VSX_SPLAT_SUFFIX>_di): New insns to support
	initializing V8HImode and V16QImode vectors with the same
	element.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-stage7.init006b --]
[-- Type: text/plain, Size: 2666 bytes --]

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 239627)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6827,6 +6827,32 @@ rs6000_expand_vector_init (rtx target, r
       return;
     }
 
+  /* Special case initializing vector short/char that are splats if we are on
+     64-bit systems with direct move.  */
+  if (all_same && TARGET_DIRECT_MOVE_64BIT
+      && (mode == V16QImode || mode == V8HImode))
+    {
+      rtx op0 = XVECEXP (vals, 0, 0);
+      rtx di_tmp = gen_reg_rtx (DImode);
+
+      if (!REG_P (op0))
+	op0 = force_reg (GET_MODE_INNER (mode), op0);
+
+      if (mode == V16QImode)
+	{
+	  emit_insn (gen_zero_extendqidi2 (di_tmp, op0));
+	  emit_insn (gen_vsx_vspltb_di (target, di_tmp));
+	  return;
+	}
+
+      if (mode == V8HImode)
+	{
+	  emit_insn (gen_zero_extendhidi2 (di_tmp, op0));
+	  emit_insn (gen_vsx_vsplth_di (target, di_tmp));
+	  return;
+	}
+    }
+
   /* Store value to stack temp.  Load vector element.  Splat.  However, splat
      of 64-bit items is not supported on Altivec.  */
   if (all_same && GET_MODE_SIZE (inner_mode) <= 4)
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 239588)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -281,6 +281,16 @@ (define_mode_attr VSX_EX [(V16QI "v")
 			  (V8HI  "v")
 			  (V4SI  "wa")])
 
+;; Iterator for the 2 short vector types to do a splat from an integer
+(define_mode_iterator VSX_SPLAT_I [V16QI V8HI])
+
+;; Mode attribute to give the count for the splat instruction to splat
+;; the value in the 64-bit integer slot
+(define_mode_attr VSX_SPLAT_COUNT [(V16QI "7") (V8HI "3")])
+
+;; Mode attribute to give the suffix for the splat instruction
+(define_mode_attr VSX_SPLAT_SUFFIX [(V16QI "b") (V8HI "h")])
+
 ;; Constants for creating unspecs
 (define_c_enum "unspec"
   [UNSPEC_VSX_CONCAT
@@ -2766,6 +2776,16 @@ (define_insn "vsx_xxspltw_<mode>_direct"
   "xxspltw %x0,%x1,%2"
   [(set_attr "type" "vecperm")])
 
+;; V16QI/V8HI splat support on ISA 2.07
+(define_insn "vsx_vsplt<VSX_SPLAT_SUFFIX>_di"
+  [(set (match_operand:VSX_SPLAT_I 0 "altivec_register_operand" "=v")
+	(vec_duplicate:VSX_SPLAT_I
+	 (truncate:<VS_scalar>
+	  (match_operand:DI 1 "altivec_register_operand" "v"))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
+  "vsplt<VSX_SPLAT_SUFFIX> %0,%1,<VSX_SPLAT_COUNT>"
+  [(set_attr "type" "vecperm")])
+
 ;; V2DF/V2DI splat for use by vec_splat builtin
 (define_insn "vsx_xxspltd_<mode>"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], Patch #6, Improve vector short/char splat initialization on PowerPC
  2016-08-19 23:59         ` [PATCH], Patch #6, Improve vector short/char splat " Michael Meissner
@ 2016-08-22 16:47           ` Segher Boessenkool
  2016-08-26 19:30           ` [PATCH], Patch #7, Add PowerPC vector initialization tests Michael Meissner
  1 sibling, 0 replies; 16+ messages in thread
From: Segher Boessenkool @ 2016-08-22 16:47 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt

Hi Mike,

Okay for trunk.  Comment below...

On Fri, Aug 19, 2016 at 07:59:39PM -0400, Michael Meissner wrote:
> 	* config/rs6000/rs6000.c (rs6000_expand_vector_init): Add support
> 	for using VSPLTH/VSPLTB to initialize vector short and vector char
> 	vectors with all of the same element.
> 
> 	* config/rs6000/vsx.md (VSX_SPLAT_I): New mode iterators and
> 	attributes to initialize V8HImode and V16QImode vectors with the
> 	same element.
> 	(VSX_SPLAT_COUNT): Likewise.
> 	(VSX_SPLAT_SUFFIX): Likewise.
> 	(vsx_vsplt<VSX_SPLAT_SUFFIX>_di): New insns to support
> 	initializing V8HImode and V16QImode vectors with the same
> 	element.

> +  /* Special case initializing vector short/char that are splats if we are on
> +     64-bit systems with direct move.  */
> +  if (all_same && TARGET_DIRECT_MOVE_64BIT
> +      && (mode == V16QImode || mode == V8HImode))
> +    {
> +      rtx op0 = XVECEXP (vals, 0, 0);
> +      rtx di_tmp = gen_reg_rtx (DImode);
> +
> +      if (!REG_P (op0))
> +	op0 = force_reg (GET_MODE_INNER (mode), op0);

Always using force_reg is easier to read imo.

Thanks,


Segher


* [PATCH], Patch #7, Add PowerPC vector initialization tests
  2016-08-19 23:59         ` [PATCH], Patch #6, Improve vector short/char splat " Michael Meissner
  2016-08-22 16:47           ` Segher Boessenkool
@ 2016-08-26 19:30           ` Michael Meissner
  2016-08-27  1:24             ` Segher Boessenkool
  1 sibling, 1 reply; 16+ messages in thread
From: Michael Meissner @ 2016-08-26 19:30 UTC (permalink / raw)
  To: Michael Meissner
  Cc: Segher Boessenkool, gcc-patches, David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 1200 bytes --]

These patches add more tests to the PowerPC vector initialization tests.  Four
of the tests added (#4, #5, #8, and #9) just try to do a bunch of vector
initializations for the different vector types (char, short, float, and
double).

The other two tests (#6, #7) check the code generated by patches #5 and #6.

I have run these tests on a big endian power7 system (with both 32-bit and
64-bit test runs), a big endian power8 system (just 64-bit tests), and a
little endian power8 system.  There were no regressions.  Are these tests ok
to install?

2016-08-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/vec-init-4.c: New runtime tests for various
	vector short/char initializations.
	* gcc.target/powerpc/vec-init-5.c: Likewise.
	* gcc.target/powerpc/vec-init-6.c: New compile time test for
	vector initialization optimizations.
	* gcc.target/powerpc/vec-init-7.c: Likewise.
	* gcc.target/powerpc/vec-init-8.c: New runtime tests for various
	vector float/double initializations.
	* gcc.target/powerpc/vec-init-9.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-stage7.init009b --]
[-- Type: text/plain, Size: 22705 bytes --]

Index: gcc/testsuite/gcc.target/powerpc/vec-init-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-4.c	(working copy)
@@ -0,0 +1,212 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+#define ELEMENTS -1, 2, 0, -32768, 32767, 53, 1, 16000
+#define SPLAT 0x0123
+
+vector short sv = (vector short) { ELEMENTS };
+vector short splat = (vector short) { SPLAT, SPLAT, SPLAT, SPLAT,
+				      SPLAT, SPLAT, SPLAT, SPLAT };
+vector short sv_global, sp_global;
+static vector short sv_static, sp_static;
+static short expected[] = { ELEMENTS };
+static short splat_expected = SPLAT;
+
+extern void check (vector short a)
+  __attribute__((__noinline__));
+
+extern void check_splat (vector short a)
+  __attribute__((__noinline__));
+
+extern vector short pack_reg (short a, short b, short c, short d,
+			      short e, short f, short g, short h)
+  __attribute__((__noinline__));
+
+extern vector short pack_from_ptr (short *p_a, short *p_b,
+				   short *p_c, short *p_d,
+				   short *p_e, short *p_f,
+				   short *p_g, short *p_h)
+  __attribute__((__noinline__));
+
+extern vector short pack_const (void)
+  __attribute__((__noinline__));
+
+extern void pack_ptr (vector short *p,
+		      short a, short b, short c, short d,
+		      short e, short f, short g, short h)
+  __attribute__((__noinline__));
+
+extern void pack_static (short a, short b, short c, short d,
+			 short e, short f, short g, short h)
+  __attribute__((__noinline__));
+
+extern void pack_global (short a, short b, short c, short d,
+			 short e, short f, short g, short h)
+  __attribute__((__noinline__));
+
+extern vector short splat_reg (short a)
+  __attribute__((__noinline__));
+
+extern vector short splat_from_ptr (short *p_a)
+  __attribute__((__noinline__));
+
+extern vector short splat_const (void)
+  __attribute__((__noinline__));
+
+extern void splat_ptr (vector short *p, short a)
+  __attribute__((__noinline__));
+
+extern void splat_static (short a)
+  __attribute__((__noinline__));
+
+extern void splat_global (short a)
+  __attribute__((__noinline__));
+
+void
+check (vector short a)
+{
+  size_t i;
+
+  for (i = 0; i < 8; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+void
+check_splat (vector short a)
+{
+  size_t i;
+
+  for (i = 0; i < 8; i++)
+    if (vec_extract (a, i) != SPLAT)
+      abort ();
+}
+
+vector short
+pack_reg (short a, short b, short c, short d,
+	  short e, short f, short g, short h)
+{
+  return (vector short) { a, b, c, d, e, f, g, h };
+}
+
+vector short
+pack_from_ptr (short *p_a, short *p_b, short *p_c, short *p_d,
+	       short *p_e, short *p_f, short *p_g, short *p_h)
+{
+  return (vector short) { *p_a, *p_b, *p_c, *p_d,
+			  *p_e, *p_f, *p_g, *p_h };
+}
+
+vector short
+pack_const (void)
+{
+  return (vector short) { ELEMENTS };
+}
+
+void
+pack_ptr (vector short *p,
+	  short a, short b, short c, short d,
+	  short e, short f, short g, short h)
+{
+  *p = (vector short) { a, b, c, d, e, f, g, h };
+}
+
+void
+pack_static (short a, short b, short c, short d,
+	     short e, short f, short g, short h)
+{
+  sv_static = (vector short) { a, b, c, d, e, f, g, h };
+}
+
+void
+pack_global (short a, short b, short c, short d,
+	     short e, short f, short g, short h)
+{
+  sv_global = (vector short) { a, b, c, d, e, f, g, h };
+}
+
+vector short
+splat_reg (short a)
+{
+  return (vector short) { a, a, a, a, a, a, a, a };
+}
+
+vector short
+splat_from_ptr (short *p_a)
+{
+  return (vector short) { *p_a, *p_a, *p_a, *p_a,
+			  *p_a, *p_a, *p_a, *p_a };
+}
+
+vector short
+splat_const (void)
+{
+  return (vector short) { SPLAT, SPLAT, SPLAT, SPLAT,
+			  SPLAT, SPLAT, SPLAT, SPLAT };
+}
+
+void
+splat_ptr (vector short *p, short a)
+{
+  *p = (vector short) { a, a, a, a, a, a, a, a };
+}
+
+void
+splat_static (short a)
+{
+  sp_static = (vector short) { a, a, a, a, a, a, a, a };
+}
+
+void
+splat_global (short a)
+{
+  sp_global = (vector short) { a, a, a, a, a, a, a, a };
+}
+
+int main (void)
+{
+  vector short sv2, sv3;
+
+  check (sv);
+
+  check (pack_reg (ELEMENTS));
+
+  check (pack_from_ptr (&expected[0], &expected[1], &expected[2],
+			&expected[3], &expected[4], &expected[5],
+			&expected[6], &expected[7]));
+
+  check (pack_const ());
+
+  pack_ptr (&sv2, ELEMENTS);
+  check (sv2);
+
+  pack_static (ELEMENTS);
+  check (sv_static);
+
+  pack_global (ELEMENTS);
+  check (sv_global);
+
+  check_splat (splat);
+
+  check_splat (splat_reg (SPLAT));
+
+  check_splat (splat_from_ptr (&splat_expected));
+
+  check_splat (splat_const ());
+
+  splat_ptr (&sv2, SPLAT);
+  check_splat (sv2);
+
+  splat_static (SPLAT);
+  check_splat (sp_static);
+
+  splat_global (SPLAT);
+  check_splat (sp_global);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-5.c	(working copy)
@@ -0,0 +1,258 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+#define ELEMENTS 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 127, -1, -128
+#define SPLAT 0x12
+
+vector signed char sv = (vector signed char) { ELEMENTS };
+vector signed char splat = (vector signed char) { SPLAT, SPLAT, SPLAT, SPLAT,
+						  SPLAT, SPLAT, SPLAT, SPLAT,
+						  SPLAT, SPLAT, SPLAT, SPLAT,
+						  SPLAT, SPLAT, SPLAT, SPLAT };
+vector signed char sv_global, sp_global;
+static vector signed char sv_static, sp_static;
+static signed char expected[] = { ELEMENTS };
+static signed char splat_expected = SPLAT;
+
+extern void check (vector signed char a)
+  __attribute__((__noinline__));
+
+extern void check_splat (vector signed char a)
+  __attribute__((__noinline__));
+
+extern vector signed char pack_reg (signed char a, signed char b,
+				    signed char c, signed char d,
+				    signed char e, signed char f,
+				    signed char g, signed char h,
+				    signed char i, signed char j,
+				    signed char k, signed char l,
+				    signed char m, signed char n,
+				    signed char o, signed char p)
+  __attribute__((__noinline__));
+
+extern vector signed char pack_from_ptr (signed char *p_a, signed char *p_b,
+					 signed char *p_c, signed char *p_d,
+					 signed char *p_e, signed char *p_f,
+					 signed char *p_g, signed char *p_h,
+					 signed char *p_i, signed char *p_j,
+					 signed char *p_k, signed char *p_l,
+					 signed char *p_m, signed char *p_n,
+					 signed char *p_o, signed char *p_p)
+  __attribute__((__noinline__));
+
+extern vector signed char pack_const (void)
+  __attribute__((__noinline__));
+
+extern void pack_ptr (vector signed char *q,
+		      signed char a, signed char b, signed char c, signed char d,
+		      signed char e, signed char f, signed char g, signed char h,
+		      signed char i, signed char j, signed char k, signed char l,
+		      signed char m, signed char n, signed char o, signed char p)
+  __attribute__((__noinline__));
+
+extern void pack_static (signed char a, signed char b, signed char c, signed char d,
+			 signed char e, signed char f, signed char g, signed char h,
+			 signed char i, signed char j, signed char k, signed char l,
+			 signed char m, signed char n, signed char o, signed char p)
+  __attribute__((__noinline__));
+
+extern void pack_global (signed char a, signed char b, signed char c, signed char d,
+			 signed char e, signed char f, signed char g, signed char h,
+			 signed char i, signed char j, signed char k, signed char l,
+			 signed char m, signed char n, signed char o, signed char p)
+  __attribute__((__noinline__));
+
+extern vector signed char splat_reg (signed char a)
+  __attribute__((__noinline__));
+
+extern vector signed char splat_from_ptr (signed char *p_a)
+  __attribute__((__noinline__));
+
+extern vector signed char splat_const (void)
+  __attribute__((__noinline__));
+
+extern void splat_ptr (vector signed char *p, signed char a)
+  __attribute__((__noinline__));
+
+extern void splat_static (signed char a)
+  __attribute__((__noinline__));
+
+extern void splat_global (signed char a)
+  __attribute__((__noinline__));
+
+void
+check (vector signed char a)
+{
+  size_t i;
+
+  for (i = 0; i < 16; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+void
+check_splat (vector signed char a)
+{
+  size_t i;
+
+  for (i = 0; i < 16; i++)
+    if (vec_extract (a, i) != SPLAT)
+      abort ();
+}
+
+vector signed char
+pack_reg (signed char a, signed char b, signed char c, signed char d,
+	  signed char e, signed char f, signed char g, signed char h,
+	  signed char i, signed char j, signed char k, signed char l,
+	  signed char m, signed char n, signed char o, signed char p)
+{
+  return (vector signed char) { a, b, c, d, e, f, g, h,
+				i, j, k, l, m, n, o, p };
+}
+
+vector signed char
+pack_from_ptr (signed char *p_a, signed char *p_b, signed char *p_c, signed char *p_d,
+	       signed char *p_e, signed char *p_f, signed char *p_g, signed char *p_h,
+	       signed char *p_i, signed char *p_j, signed char *p_k, signed char *p_l,
+	       signed char *p_m, signed char *p_n, signed char *p_o, signed char *p_p)
+{
+  return (vector signed char) { *p_a, *p_b, *p_c, *p_d,
+				*p_e, *p_f, *p_g, *p_h,
+				*p_i, *p_j, *p_k, *p_l,
+				*p_m, *p_n, *p_o, *p_p };
+
+}
+
+vector signed char
+pack_const (void)
+{
+  return (vector signed char) { ELEMENTS };
+}
+
+void
+pack_ptr (vector signed char *q,
+	  signed char a, signed char b, signed char c, signed char d,
+	  signed char e, signed char f, signed char g, signed char h,
+	  signed char i, signed char j, signed char k, signed char l,
+	  signed char m, signed char n, signed char o, signed char p)
+{
+  *q = (vector signed char) { a, b, c, d, e, f, g, h,
+			      i, j, k, l, m, n, o, p };
+}
+
+void
+pack_static (signed char a, signed char b, signed char c, signed char d,
+	     signed char e, signed char f, signed char g, signed char h,
+	     signed char i, signed char j, signed char k, signed char l,
+	     signed char m, signed char n, signed char o, signed char p)
+{
+  sv_static = (vector signed char) { a, b, c, d, e, f, g, h,
+				     i, j, k, l, m, n, o, p };
+}
+
+void
+pack_global (signed char a, signed char b, signed char c, signed char d,
+	     signed char e, signed char f, signed char g, signed char h,
+	     signed char i, signed char j, signed char k, signed char l,
+	     signed char m, signed char n, signed char o, signed char p)
+{
+  sv_global = (vector signed char) { a, b, c, d, e, f, g, h,
+				     i, j, k, l, m, n, o, p };
+}
+
+vector signed char
+splat_reg (signed char a)
+{
+  return (vector signed char) { a, a, a, a, a, a, a, a,
+				a, a, a, a, a, a, a, a };
+}
+
+vector signed char
+splat_from_ptr (signed char *p_a)
+{
+  return (vector signed char) { *p_a, *p_a, *p_a, *p_a,
+				*p_a, *p_a, *p_a, *p_a,
+				*p_a, *p_a, *p_a, *p_a,
+				*p_a, *p_a, *p_a, *p_a };
+}
+
+vector signed char
+splat_const (void)
+{
+  return (vector signed char) { SPLAT, SPLAT, SPLAT, SPLAT,
+				SPLAT, SPLAT, SPLAT, SPLAT,
+				SPLAT, SPLAT, SPLAT, SPLAT,
+				SPLAT, SPLAT, SPLAT, SPLAT };
+}
+
+void
+splat_ptr (vector signed char *p, signed char a)
+{
+  *p = (vector signed char) { a, a, a, a, a, a, a, a,
+			      a, a, a, a, a, a, a, a };
+}
+
+void
+splat_static (signed char a)
+{
+  sp_static = (vector signed char) { a, a, a, a, a, a, a, a,
+				     a, a, a, a, a, a, a, a };
+}
+
+void
+splat_global (signed char a)
+{
+  sp_global = (vector signed char) { a, a, a, a, a, a, a, a,
+				     a, a, a, a, a, a, a, a };
+}
+
+int main (void)
+{
+  vector signed char sv2, sv3;
+
+  check (sv);
+
+  check (pack_reg (ELEMENTS));
+
+  check (pack_from_ptr (&expected[0],  &expected[1],  &expected[2],
+			&expected[3],  &expected[4],  &expected[5],
+			&expected[6],  &expected[7],  &expected[8],
+			&expected[9],  &expected[10], &expected[11],
+			&expected[12], &expected[13], &expected[14],
+			&expected[15]));
+
+  check (pack_const ());
+
+  pack_ptr (&sv2, ELEMENTS);
+  check (sv2);
+
+  pack_static (ELEMENTS);
+  check (sv_static);
+
+  pack_global (ELEMENTS);
+  check (sv_global);
+
+  check_splat (splat);
+
+  check_splat (splat_reg (SPLAT));
+
+  check_splat (splat_from_ptr (&splat_expected));
+
+  check_splat (splat_const ());
+
+  splat_ptr (&sv2, SPLAT);
+  check_splat (sv2);
+
+  splat_static (SPLAT);
+  check_splat (sp_static);
+
+  splat_global (SPLAT);
+  check_splat (sp_global);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-6.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-6.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2 -mupper-regs-di" } */
+
+vector int
+merge (int a, int b, int c, int d)
+{
+  return (vector int) { a, b, c, d };
+}
+
+/* { dg-final { scan-assembler     "rldicr" } } */
+/* { dg-final { scan-assembler     "rldicl" } } */
+/* { dg-final { scan-assembler     "mtvsrd" } } */
+/* { dg-final { scan-assembler-not "stw"    } } */
+/* { dg-final { scan-assembler-not "lxvw4x" } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-init-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-7.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2 -mupper-regs-di" } */
+
+vector int
+splat (int a)
+{
+  return (vector int) { a, a, a, a };
+}
+
+/* { dg-final { scan-assembler "mtvsrwz" } } */
+/* { dg-final { scan-assembler "xxspltw" } } */
Index: gcc/testsuite/gcc.target/powerpc/vec-init-8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-8.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-8.c	(working copy)
@@ -0,0 +1,194 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+#define ELEMENTS -1.0f, 2.0f, 0.0f, -1234.0f
+#define SPLAT 2345.0f
+
+vector float sv = (vector float) { ELEMENTS };
+vector float splat = (vector float) { SPLAT, SPLAT, SPLAT, SPLAT };
+vector float sv_global, sp_global;
+static vector float sv_static, sp_static;
+static const float expected[] = { ELEMENTS };
+
+extern void check (vector float a)
+  __attribute__((__noinline__));
+
+extern void check_splat (vector float a)
+  __attribute__((__noinline__));
+
+extern vector float pack_reg (float a, float b, float c, float d)
+  __attribute__((__noinline__));
+
+extern vector float pack_from_ptr (float *p_a, float *p_b,
+				   float *p_c, float *p_d)
+  __attribute__((__noinline__));
+
+extern vector float pack_const (void)
+  __attribute__((__noinline__));
+
+extern void pack_ptr (vector float *p, float a, float b, float c, float d)
+  __attribute__((__noinline__));
+
+extern void pack_static (float a, float b, float c, float d)
+  __attribute__((__noinline__));
+
+extern void pack_global (float a, float b, float c, float d)
+  __attribute__((__noinline__));
+
+extern vector float splat_reg (float a)
+  __attribute__((__noinline__));
+
+extern vector float splat_from_ptr (float *p)
+  __attribute__((__noinline__));
+
+extern vector float splat_const (void)
+  __attribute__((__noinline__));
+
+extern void splat_ptr (vector float *p, float a)
+  __attribute__((__noinline__));
+
+extern void splat_static (float a)
+  __attribute__((__noinline__));
+
+extern void splat_global (float a)
+  __attribute__((__noinline__));
+
+void
+check (vector float a)
+{
+  size_t i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+void
+check_splat (vector float a)
+{
+  size_t i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_extract (a, i) != SPLAT)
+      abort ();
+}
+
+vector float
+pack_reg (float a, float b, float c, float d)
+{
+  return (vector float) { a, b, c, d };
+}
+
+vector float
+pack_from_ptr (float *p_a, float *p_b, float *p_c, float *p_d)
+{
+  return (vector float) { *p_a, *p_b, *p_c, *p_d };
+}
+
+vector float
+pack_const (void)
+{
+  return (vector float) { ELEMENTS };
+}
+
+void
+pack_ptr (vector float *p, float a, float b, float c, float d)
+{
+  *p = (vector float) { a, b, c, d };
+}
+
+void
+pack_static (float a, float b, float c, float d)
+{
+  sv_static = (vector float) { a, b, c, d };
+}
+
+void
+pack_global (float a, float b, float c, float d)
+{
+  sv_global = (vector float) { a, b, c, d };
+}
+
+vector float
+splat_reg (float a)
+{
+  return (vector float) { a, a, a, a };
+}
+
+vector float
+splat_from_ptr (float *p)
+{
+  return (vector float) { *p, *p, *p, *p };
+}
+
+vector float
+splat_const (void)
+{
+  return (vector float) { SPLAT, SPLAT, SPLAT, SPLAT };
+}
+
+void
+splat_ptr (vector float *p, float a)
+{
+  *p = (vector float) { a, a, a, a };
+}
+
+void
+splat_static (float a)
+{
+  sp_static = (vector float) { a, a, a, a };
+}
+
+void
+splat_global (float a)
+{
+  sp_global = (vector float) { a, a, a, a };
+}
+
+int main (void)
+{
+  vector float sv2, sv3;
+  float mem = SPLAT;
+  float mem2[4] = { ELEMENTS };
+
+  check (sv);
+
+  check (pack_reg (ELEMENTS));
+
+  check (pack_from_ptr (&mem2[0], &mem2[1], &mem2[2], &mem2[3]));
+
+  check (pack_const ());
+
+  pack_ptr (&sv2, ELEMENTS);
+  check (sv2);
+
+  pack_static (ELEMENTS);
+  check (sv_static);
+
+  pack_global (ELEMENTS);
+  check (sv_global);
+
+  check_splat (splat);
+
+  check_splat (splat_reg (SPLAT));
+
+  check_splat (splat_from_ptr (&mem));
+
+  check_splat (splat_const ());
+
+  splat_ptr (&sv2, SPLAT);
+  check_splat (sv2);
+
+  splat_static (SPLAT);
+  check_splat (sp_static);
+
+  splat_global (SPLAT);
+  check_splat (sp_global);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-9.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-9.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-9.c	(working copy)
@@ -0,0 +1,193 @@
+/* { dg-do run { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+#define ELEMENTS -12345.0, 23456.0
+#define SPLAT 34567.0
+
+vector double sv = (vector double) { ELEMENTS };
+vector double splat = (vector double) { SPLAT, SPLAT };
+vector double sv_global, sp_global;
+static vector double sv_static, sp_static;
+static const double expected[] = { ELEMENTS };
+
+extern void check (vector double a)
+  __attribute__((__noinline__));
+
+extern void check_splat (vector double a)
+  __attribute__((__noinline__));
+
+extern vector double pack_reg (double a, double b)
+  __attribute__((__noinline__));
+
+extern vector double pack_from_ptr (double *p_a, double *p_b)
+  __attribute__((__noinline__));
+
+extern vector double pack_const (void)
+  __attribute__((__noinline__));
+
+extern void pack_ptr (vector double *p, double a, double b)
+  __attribute__((__noinline__));
+
+extern void pack_static (double a, double b)
+  __attribute__((__noinline__));
+
+extern void pack_global (double a, double b)
+  __attribute__((__noinline__));
+
+extern vector double splat_reg (double a)
+  __attribute__((__noinline__));
+
+extern vector double splat_from_ptr (double *p)
+  __attribute__((__noinline__));
+
+extern vector double splat_const (void)
+  __attribute__((__noinline__));
+
+extern void splat_ptr (vector double *p, double a)
+  __attribute__((__noinline__));
+
+extern void splat_static (double a)
+  __attribute__((__noinline__));
+
+extern void splat_global (double a)
+  __attribute__((__noinline__));
+
+void
+check (vector double a)
+{
+  size_t i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+void
+check_splat (vector double a)
+{
+  size_t i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_extract (a, i) != SPLAT)
+      abort ();
+}
+
+vector double
+pack_reg (double a, double b)
+{
+  return (vector double) { a, b };
+}
+
+vector double
+pack_from_ptr (double *p_a, double *p_b)
+{
+  return (vector double) { *p_a, *p_b };
+}
+
+vector double
+pack_const (void)
+{
+  return (vector double) { ELEMENTS };
+}
+
+void
+pack_ptr (vector double *p, double a, double b)
+{
+  *p = (vector double) { a, b };
+}
+
+void
+pack_static (double a, double b)
+{
+  sv_static = (vector double) { a, b };
+}
+
+void
+pack_global (double a, double b)
+{
+  sv_global = (vector double) { a, b };
+}
+
+vector double
+splat_reg (double a)
+{
+  return (vector double) { a, a };
+}
+
+vector double
+splat_from_ptr (double *p)
+{
+  return (vector double) { *p, *p };
+}
+
+vector double
+splat_const (void)
+{
+  return (vector double) { SPLAT, SPLAT };
+}
+
+void
+splat_ptr (vector double *p, double a)
+{
+  *p = (vector double) { a, a };
+}
+
+void
+splat_static (double a)
+{
+  sp_static = (vector double) { a, a };
+}
+
+void
+splat_global (double a)
+{
+  sp_global = (vector double) { a, a };
+}
+
+int main (void)
+{
+  vector double sv2, sv3;
+  double mem = SPLAT;
+  double mem2[2] = { ELEMENTS };
+
+  check (sv);
+
+  check (pack_reg (ELEMENTS));
+
+  check (pack_from_ptr (&mem2[0], &mem2[1]));
+
+  check (pack_const ());
+
+  pack_ptr (&sv2, ELEMENTS);
+  check (sv2);
+
+  pack_static (ELEMENTS);
+  check (sv_static);
+
+  pack_global (ELEMENTS);
+  check (sv_global);
+
+  check_splat (splat);
+
+  check_splat (splat_reg (SPLAT));
+
+  check_splat (splat_from_ptr (&mem));
+
+  check_splat (splat_const ());
+
+  splat_ptr (&sv2, SPLAT);
+  check_splat (sv2);
+
+  splat_static (SPLAT);
+  check_splat (sp_static);
+
+  splat_global (SPLAT);
+  check_splat (sp_global);
+
+  return 0;
+}


* Re: [PATCH], Patch #7, Add PowerPC vector initialization tests
  2016-08-26 19:30           ` [PATCH], Patch #7, Add PowerPC vector initialization tests Michael Meissner
@ 2016-08-27  1:24             ` Segher Boessenkool
  2016-08-29 19:35               ` Michael Meissner
  0 siblings, 1 reply; 16+ messages in thread
From: Segher Boessenkool @ 2016-08-27  1:24 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt

On Fri, Aug 26, 2016 at 03:29:50PM -0400, Michael Meissner wrote:
> These patches add more tests to the PowerPC vector initialization tests.  Four
> of the tests added (#4, #5, #8, and #9) just try to do a bunch of vector
> initializations for the different vector types (char, short, float, and
> double).
> 
> The other two tests (#6, #7) check the code generated by patches #5 and #6.
> 
> I have run these tests on a big endian power7 system (with both 32-bit and
> 64-bit test runs), a big endian power8 system (just 64-bit tests), and a
> little endian power8 system.  There were no regressions.  Are these tests ok
> to install?

This is okay for trunk; one comment:

> +/* { dg-final { scan-assembler     "mtvsrd" } } */

This also matches mtvsrdd; if you don't want that, you can avoid it by
writing it as {\mmtvsrd\M} (the {} instead of "" to avoid toothpickeritus).
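
To illustrate the difference between the two patterns, here is a minimal
sketch in C.  It is not how DejaGnu or Tcl implement matching; the function
names matches_substring and matches_whole_word are hypothetical, and the
second one only approximates the \m and \M word anchors by treating letters,
digits, and underscore as word characters.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Word characters for this sketch: letters, digits, and underscore.  */
static bool
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Plain substring search, comparable to scan-assembler "mtvsrd".  */
bool
matches_substring (const char *line, const char *mnemonic)
{
  return strstr (line, mnemonic) != NULL;
}

/* Whole-word search, approximating the pattern {\mmtvsrd\M}.  */
bool
matches_whole_word (const char *line, const char *mnemonic)
{
  size_t len = strlen (mnemonic);

  if (len == 0)
    return false;

  for (const char *p = strstr (line, mnemonic);
       p != NULL;
       p = strstr (p + 1, mnemonic))
    {
      /* The match must not be preceded or followed by a word character.  */
      bool start_ok = (p == line) || !is_word_char (p[-1]);
      bool end_ok = !is_word_char (p[len]);

      if (start_ok && end_ok)
	return true;
    }

  return false;
}
```

With a line containing only mtvsrdd, the substring form reports a match and
the whole-word form does not, which is the distinction the comment above is
about.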

Thanks,


Segher


* Re: [PATCH], Patch #7, Add PowerPC vector initialization tests
  2016-08-27  1:24             ` Segher Boessenkool
@ 2016-08-29 19:35               ` Michael Meissner
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Meissner @ 2016-08-29 19:35 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt

On Fri, Aug 26, 2016 at 08:24:40PM -0500, Segher Boessenkool wrote:
> On Fri, Aug 26, 2016 at 03:29:50PM -0400, Michael Meissner wrote:
> > These patches add more tests to the PowerPC vector initialization tests.  Four
> > of the tests added (#4, #5, #8, and #9) just try to do a bunch of vector
> > initializations for the different vector types (char, short, float, and
> > double).
> > 
> > The other two tests (#6, #7) check the code generated by patches #5 and #6.
> > 
> > I have run these tests on a big endian power7 system (with both 32-bit and
> > 64-bit test runs), a big endian power8 system (just 64-bit tests), and a
> > little endian power8 system.  There were no regressions.  Are these tests ok
> > to install?
> 
> This is okay for trunk; one comment:
> 
> > +/* { dg-final { scan-assembler     "mtvsrd" } } */
> 
> This also matches mtvsrdd; if you don't want that, you can avoid it by
> writing it as {\mmtvsrd\M} (the {} instead of "" to avoid toothpickeritus).

In this case, mtvsrd and mtvsrdd are both fine.  However, given the test has an
explicit -mcpu=power8, it should never see mtvsrdd.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


* Re: [PATCH], Patch #5, Improve vector int initialization on PowerPC
  2016-08-19 22:18       ` [PATCH], Patch #5, Improve vector int " Michael Meissner
  2016-08-19 23:59         ` [PATCH], Patch #6, Improve vector short/char splat " Michael Meissner
@ 2016-08-22 16:38         ` Segher Boessenkool
  2016-08-22 22:01           ` Michael Meissner
  1 sibling, 1 reply; 16+ messages in thread
From: Segher Boessenkool @ 2016-08-22 16:38 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt

[ seems this mail never arrived, resending, sorry if it turns out a duplicate ]

Hi Mike,

Okay for trunk.  A few comments...

On Fri, Aug 19, 2016 at 06:17:54PM -0400, Michael Meissner wrote:
> --- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239554)
> +++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
> @@ -6692,7 +6692,7 @@ rs6000_expand_vector_init (rtx target, r
>        if ((int_vector_p || TARGET_VSX) && all_const_zero)
>  	{
>  	  /* Zero register.  */
> -	  emit_insn (gen_rtx_SET (target, gen_rtx_XOR (mode, target, target)));
> +	  emit_insn (gen_rtx_SET (target, CONST0_RTX (mode)));

Can you use emit_move_insn here?  If so, please do.

> +  /* Special case initializing vector int if we are on 64-bit systems with
> +     direct move or we have the ISA 3.0 instructions.  */
> +  if (mode == V4SImode  && VECTOR_MEM_VSX_P (V4SImode)
> +      && TARGET_DIRECT_MOVE_64BIT)
>      {
> -      emit_insn (gen_vsx_splat_v4si (target, XVECEXP (vals, 0, 0)));
> -      return;
> +      if (all_same)
> +	{
> +	  rtx element0 = XVECEXP (vals, 0, 0);
> +	  if (MEM_P (element0))
> +	    element0 = rs6000_address_for_fpconvert (element0);
> +	  else if (!REG_P (element0))
> +	    element0 = force_reg (SImode, element0);

You can call force_reg if REG_P holds as well (it immediately returns
that reg itself).
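
A toy model in C of why the REG_P guard is redundant.  This is only a sketch:
struct toy_rtx, toy_force_reg, guarded, and unguarded are hypothetical
stand-ins and not GCC's rtx or force_reg; the one property modelled is that a
value already in a register is handed back unchanged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an RTL expression; not GCC's real rtx.  */
struct toy_rtx
{
  bool is_reg;	/* Models REG_P (x).  */
  int value;	/* Register number or constant value.  */
};

#define TOY_POOL_SIZE 16
static struct toy_rtx toy_pool[TOY_POOL_SIZE];
static size_t toy_pool_used;

/* Models force_reg: a register is returned unchanged, anything else is
   copied into a freshly allocated pseudo register.  */
struct toy_rtx *
toy_force_reg (struct toy_rtx *x)
{
  if (x->is_reg)
    return x;

  if (toy_pool_used == TOY_POOL_SIZE)
    abort ();

  struct toy_rtx *reg = &toy_pool[toy_pool_used++];
  reg->is_reg = true;
  reg->value = x->value;
  return reg;
}

/* The guarded form, as written in the patch.  */
struct toy_rtx *
guarded (struct toy_rtx *x)
{
  if (!x->is_reg)
    x = toy_force_reg (x);
  return x;
}

/* The unguarded form suggested in review.  */
struct toy_rtx *
unguarded (struct toy_rtx *x)
{
  return toy_force_reg (x);
}
```

In this model both forms return the same pointer for a register operand and
a new register for anything else, so dropping the guard changes nothing but
the amount of code to read.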

> +static void
> +rs6000_split_v4si_init_di_reg (rtx dest, rtx si1, rtx si2, rtx tmp)
> +{
> +  const unsigned HOST_WIDE_INT mask_32bit = HOST_WIDE_INT_C (0xffffffff);

Does using that macro buy us anything?  Won't the plain number work just
as well?

> +      /* Generate RLDIC.  */
> +      rtx si1_di = gen_rtx_REG (DImode, regno_or_subregno (si1));
> +      rtx shift_rtx = gen_rtx_ASHIFT (DImode, si1_di, GEN_INT (32));
> +      rtx mask_rtx = GEN_INT (mask_32bit << 32);
> +      rtx and_rtx = gen_rtx_AND (DImode, shift_rtx, mask_rtx);
> +      gcc_assert (!reg_overlap_mentioned_p (dest, si1));
> +      emit_insn (gen_rtx_SET (dest, and_rtx));

Maybe gen_rotldi3_mask (after taking the "*" off of that)?  Is that too
unfriendly to use?  We could add another helper.

Thanks,


Segher


* Re: [PATCH], Patch #5, Improve vector int initialization on PowerPC
  2016-08-22 16:38         ` [PATCH], Patch #5, Improve vector int initialization on PowerPC Segher Boessenkool
@ 2016-08-22 22:01           ` Michael Meissner
  2016-08-22 22:57             ` Segher Boessenkool
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Meissner @ 2016-08-22 22:01 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt

On Mon, Aug 22, 2016 at 11:37:55AM -0500, Segher Boessenkool wrote:
> [ seems this mail never arrived, resending, sorry if it turns out a duplicate ]
> 
> Hi Mike,
> 
> Okay for trunk.  A few comments...
> 
> On Fri, Aug 19, 2016 at 06:17:54PM -0400, Michael Meissner wrote:
> > --- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239554)
> > +++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
> > @@ -6692,7 +6692,7 @@ rs6000_expand_vector_init (rtx target, r
> >        if ((int_vector_p || TARGET_VSX) && all_const_zero)
> >  	{
> >  	  /* Zero register.  */
> > -	  emit_insn (gen_rtx_SET (target, gen_rtx_XOR (mode, target, target)));
> > +	  emit_insn (gen_rtx_SET (target, CONST0_RTX (mode)));
> 
> Can you use emit_move_insn here?  If so, please do.

Ok.

> > +  /* Special case initializing vector int if we are on 64-bit systems with
> > +     direct move or we have the ISA 3.0 instructions.  */
> > +  if (mode == V4SImode  && VECTOR_MEM_VSX_P (V4SImode)
> > +      && TARGET_DIRECT_MOVE_64BIT)
> >      {
> > -      emit_insn (gen_vsx_splat_v4si (target, XVECEXP (vals, 0, 0)));
> > -      return;
> > +      if (all_same)
> > +	{
> > +	  rtx element0 = XVECEXP (vals, 0, 0);
> > +	  if (MEM_P (element0))
> > +	    element0 = rs6000_address_for_fpconvert (element0);
> > +	  else if (!REG_P (element0))
> > +	    element0 = force_reg (SImode, element0);
> 
> You can call force_reg if REG_P holds as well (it immediately returns
> that reg itself).

Ok.

> > +static void
> > +rs6000_split_v4si_init_di_reg (rtx dest, rtx si1, rtx si2, rtx tmp)
> > +{
> > +  const unsigned HOST_WIDE_INT mask_32bit = HOST_WIDE_INT_C (0xffffffff);
> 
> Does using that macro buy us anything?  Won't the plain number work just
> as well?

I would imagine you don't want to use a bare 0xffffffff if the compiler is
being built in a 32-bit environment.  Also, using mask_32bit as a const allowed
me to code the following lines without breaking them into smaller lines due to
the archaic 79 character column limit.

> > +      /* Generate RLDIC.  */
> > +      rtx si1_di = gen_rtx_REG (DImode, regno_or_subregno (si1));
> > +      rtx shift_rtx = gen_rtx_ASHIFT (DImode, si1_di, GEN_INT (32));
> > +      rtx mask_rtx = GEN_INT (mask_32bit << 32);
> > +      rtx and_rtx = gen_rtx_AND (DImode, shift_rtx, mask_rtx);
> > +      gcc_assert (!reg_overlap_mentioned_p (dest, si1));
> > +      emit_insn (gen_rtx_SET (dest, and_rtx));
> 
> Maybe gen_rotldi3_mask (after taking the "*" off of that)?  Is that too
> unfriendly to use?  We could add another helper.

The problem is that rotldi3_mask takes a match_operator, so you would pretty
much have to construct the AND part of the expression yourself.  That makes
calling the generator function kind of useless: you first have to create the
AND rtx, the generator function only does GET_CODE on it and then ignores it,
and the extra rtx takes up space until it is garbage collected.
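
For readers following the arithmetic, here is a minimal C sketch of what the
quoted shift-and-mask RTL computes.  It is not GCC code: uint64_t stands in
for a DImode register, and pack_two_si is a hypothetical helper that assumes
the second 32-bit value is then OR-ed into the lower half, a step the quoted
fragment does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the quoted RTL: (si1 << 32) & 0xffffffff00000000.  Whatever was
   in the upper half of the 64-bit register holding si1 is shifted out.  */
static uint64_t
shift_into_upper_half (uint64_t si1_di)
{
  const uint64_t mask_32bit = 0xffffffff;
  return (si1_di << 32) & (mask_32bit << 32);
}

/* Hypothetical follow-up step (an assumption, not taken from the patch):
   place si2 in the lower 32 bits to form one 64-bit value.  */
static uint64_t
pack_two_si (uint32_t si1, uint32_t si2)
{
  uint64_t upper = shift_into_upper_half (si1);
  uint64_t lower = si2;

  /* The two halves never overlap, so OR and ADD would give the same result.  */
  assert ((upper & lower) == 0);
  return upper | lower;
}
```

In this model the AND does not change the value, because the shift already
clears the lower 32 bits; in the RTL it is there so that the expression has
the shift-and-mask shape the comment labels RLDIC.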

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

* Re: [PATCH], Patch #5, Improve vector int initialization on PowerPC
  2016-08-22 22:01           ` Michael Meissner
@ 2016-08-22 22:57             ` Segher Boessenkool
  0 siblings, 0 replies; 16+ messages in thread
From: Segher Boessenkool @ 2016-08-22 22:57 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt

On Mon, Aug 22, 2016 at 06:01:22PM -0400, Michael Meissner wrote:
> > > +static void
> > > +rs6000_split_v4si_init_di_reg (rtx dest, rtx si1, rtx si2, rtx tmp)
> > > +{
> > > +  const unsigned HOST_WIDE_INT mask_32bit = HOST_WIDE_INT_C (0xffffffff);
> > 
> > Does using that macro buy us anything?  Won't the plain number work just
> > as well?
> 
> I would imagine you don't want to use a bare 0xffffffff if the compiler is
> being built in a 32-bit environment.  Also, using mask_32bit as a const allowed
> me to code the following lines without breaking them into smaller lines due to
> the archaic 79 character column limit.

HOST_WIDE_INT is always at least 64-bit nowadays.  I didn't mean to use the
number everywhere; I meant just deleting the HOST_WIDE_INT_C.
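
A minimal C sketch of that point, for illustration only.  It is not GCC code:
the wide_uint typedef is a stand-in for unsigned HOST_WIDE_INT and is assumed
to be 64 bits wide, as stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for GCC's unsigned HOST_WIDE_INT (assumed to be 64 bits).  */
typedef uint64_t wide_uint;

/* Build the mask for the upper 32 bits from a plain constant, without a
   width-forcing macro around the literal.  */
static wide_uint
upper_32bit_mask (void)
{
  /* The declared type of the variable widens the constant to 64 bits.  */
  const wide_uint mask_32bit = 0xffffffff;

  /* The shift is therefore done in 64 bits and no bits are lost.  */
  wide_uint shifted = mask_32bit << 32;
  assert ((shifted & mask_32bit) == 0);
  return shifted;
}
```

The width only becomes a concern if the bare literal itself is shifted: where
int is 32 bits wide, 0xffffffff has type unsigned int, and shifting it by 32
is undefined behavior in C.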

> > > +      /* Generate RLDIC.  */
> > > +      rtx si1_di = gen_rtx_REG (DImode, regno_or_subregno (si1));
> > > +      rtx shift_rtx = gen_rtx_ASHIFT (DImode, si1_di, GEN_INT (32));
> > > +      rtx mask_rtx = GEN_INT (mask_32bit << 32);
> > > +      rtx and_rtx = gen_rtx_AND (DImode, shift_rtx, mask_rtx);
> > > +      gcc_assert (!reg_overlap_mentioned_p (dest, si1));
> > > +      emit_insn (gen_rtx_SET (dest, and_rtx));
> > 
> > Maybe gen_rotldi3_mask (after taking the "*" off of that)?  Is that too
> > unfriendly to use?  We could add another helper.
> 
> The problem is rotldi3_mask takes a match_operator, and you pretty much would
> have to construct the AND part of the expression,

Ah yes.  I'll see if I can do another helper.

Thanks,


Segher

* Re: [PATCH], Improve vector int/long initialization on PowerPC
  2016-08-04  4:34 [PATCH], Improve vector int/long initialization on PowerPC Michael Meissner
  2016-08-04 15:03 ` Segher Boessenkool
@ 2016-08-05 22:00 ` Pat Haugen
  2016-08-08 22:56   ` Michael Meissner
  1 sibling, 1 reply; 16+ messages in thread
From: Pat Haugen @ 2016-08-05 22:00 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt

On 08/03/2016 11:33 PM, Michael Meissner wrote:
>  {
> -  if (BYTES_BIG_ENDIAN)
> -    return "xxpermdi %x0,%x1,%x2,0";
> +  if (which_alternative == 0)
> +    return (BYTES_BIG_ENDIAN
> +	    ? "xxpermdi %x0,%x1,%x2,0"
> +	    : "xxpermdi %x0,%x2,%x1,0");
> +
> +  else if (which_alternative == 1)
> +    return (BYTES_BIG_ENDIAN
> +	    ? "mtvsrdd %x0,%1,%2"
> +	    : "mtvsrdd %x0,%2,%1");
> +
>    else
> -    return "xxpermdi %x0,%x2,%x1,0";
> +    gcc_unreachable ();
>  }
> -  [(set_attr "type" "vecperm")])
> +  [(set_attr "type" "vecperm,mftgpr")
> +   (set_attr "length" "4")])

mtvsrdd actually behaves like a permute, so vecperm would be the best insn type for it.

-Pat

* Re: [PATCH], Improve vector int/long initialization on PowerPC
  2016-08-05 22:00 ` [PATCH], Improve vector int/long " Pat Haugen
@ 2016-08-08 22:56   ` Michael Meissner
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Meissner @ 2016-08-08 22:56 UTC (permalink / raw)
  To: Pat Haugen
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt

On Fri, Aug 05, 2016 at 05:00:39PM -0500, Pat Haugen wrote:
> On 08/03/2016 11:33 PM, Michael Meissner wrote:
> >  {
> > -  if (BYTES_BIG_ENDIAN)
> > -    return "xxpermdi %x0,%x1,%x2,0";
> > +  if (which_alternative == 0)
> > +    return (BYTES_BIG_ENDIAN
> > +	    ? "xxpermdi %x0,%x1,%x2,0"
> > +	    : "xxpermdi %x0,%x2,%x1,0");
> > +
> > +  else if (which_alternative == 1)
> > +    return (BYTES_BIG_ENDIAN
> > +	    ? "mtvsrdd %x0,%1,%2"
> > +	    : "mtvsrdd %x0,%2,%1");
> > +
> >    else
> > -    return "xxpermdi %x0,%x2,%x1,0";
> > +    gcc_unreachable ();
> >  }
> > -  [(set_attr "type" "vecperm")])
> > +  [(set_attr "type" "vecperm,mftgpr")
> > +   (set_attr "length" "4")])
> 
> mtvsrdd actually behaves like a permute, so vecperm would be the best insn type for it.

Ok, when I submit the patch again, I will change the type to "vecperm".  I will
also change it in "vsx_splat_<mode>" which also generates MTVSRDD.  Thanks.
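
As background on why the quoted output code swaps the two source operands for
little endian, here is a small C model.  It is a sketch under stated
assumptions, not GCC code: v2di_model, model_mtvsrdd and concat_elements are
hypothetical names, and the model only covers the ordinary case where the
first source register lands in the most significant doubleword and the second
in the least significant one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register; dw[0] is the most
   significant doubleword and dw[1] the least significant.  */
typedef struct
{
  uint64_t dw[2];
} v2di_model;

/* Simplified model of MTVSRDD: build a vector from two 64-bit GPR values.  */
static v2di_model
model_mtvsrdd (uint64_t ra, uint64_t rb)
{
  v2di_model v;
  v.dw[0] = ra;
  v.dw[1] = rb;
  return v;
}

/* Pick the operand order so that vector element 0 ends up where the
   endianness expects it: the most significant doubleword for big endian,
   the least significant doubleword for little endian.  */
static v2di_model
concat_elements (uint64_t elt0, uint64_t elt1, int big_endian)
{
  v2di_model v = big_endian
		 ? model_mtvsrdd (elt0, elt1)
		 : model_mtvsrdd (elt1, elt0);

  /* Check that element 0 landed in the expected doubleword.  */
  assert (v.dw[big_endian ? 0 : 1] == elt0);
  return v;
}
```

The same operand swap appears in the xxpermdi alternative of the quoted code,
which is consistent with treating both alternatives as permute-like.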

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
