public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH], Improve vector int/long initialization on PowerPC
@ 2016-08-04  4:34 Michael Meissner
  2016-08-04 15:03 ` Segher Boessenkool
  2016-08-05 22:00 ` [PATCH], Improve vector int/long " Pat Haugen
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Meissner @ 2016-08-04  4:34 UTC (permalink / raw)
  To: gcc-patches, Segher Boessenkool, David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 3187 bytes --]

This is a set of 3 patches to improve initializing vectors on the PowerPC.

The first patch changes the initialization of vector int where the
initialization part was not constant.  Previously, the compiler would create
the vector initialization on the stack and load it up, and now on 64-bit power8
and newer systems, it will create the parts in the GPRs.

Before the switch from using the old RELOAD register allocator to the newer LRA
register allocator, this patch had a problem with one of the fortran benchmarks
(cray_pointers_2) on a Power8 system (it works on power7 because the
optimization is not done there, and on power9 because power9 has d-form vector
addressing).  This was due to TImode not being allowed in vector registers.
So, I added a test to disable the optimization on such a system.  Since LRA
enables TImode to go into vector registers, most users will see the benefits of
this optimization.

The second part is cosmetic, in that moves the determination of the true
register number for a REG or SUBREG to a helper function from the
rs6000_adjust_vec_address function.  Previous versions of this patch had
additional callers to regno_or_subregno, but those uses were removed at this
time.  However, as I work on further optimizing vector initialization, set, and
extract, I may wind up using the helper function.

The third patch improves formation of vector long on ISA 3.0 system, to use the
new MTVSRDD instruction (that builds a vector from two 64-bit GPRs).

I built spec 2006 with these patches on a little endian power8 system, and at
least 18 of the benchmarks had vector initializations replaced.  Most
benchmarks only used the initialization in a few places, but gamess, dealII,
h264ref, and wrf each had over 100 initializations changed.

I have tried these patches on a big endian power7 system (both 32-bit and
64-bit targets), on a big endian power8 system (just 64-bit targets), and a
little endian power8 system (just 64-bit targets).  There were no regressions
on any of the systems.  Can I install these patches to the trunk?

[gcc]
2016-08-03  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_expand_vector): On 64-bit systems
	with direct move and TImode registers allowed in VSX, initialize a
	V4SImode vector in the GPRs, rather than creating a temporary
	vector on the stack, doing 4 stores to that temporary vector, and
	then doing a vector load (which causes a pipeline bubble between
	the stores and the load).
	(regno_or_subregno): New helper function to get the register
	number of a REG or SUBREG rtx.
	(rs6000_adjust_vec_address): Use regno_or_subregno.
	* config/rs6000/vsx.md (vsx_concat_<mode>): Add support for the
	ISA 3.0 mtvsrdd instruction if we are moving two gpr registers to
	create on vector register.

[gcc/testsuite]
2016-08-03  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/vec-init-1.c: New tests for vector init.
	* gcc.target/powerpc/vec-init-2.c: Likewise.
	* gcc.target/powerpc/vec-init-3.c: Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-stage7.init001b --]
[-- Type: text/plain, Size: 7670 bytes --]

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239098)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -6736,6 +6736,38 @@ rs6000_expand_vector_init (rtx target, r
       return;
     }
 
+  /* Special case initializing vector int if we are on 64-bit systems with
+     direct move.  This bug tickles a bug in reload for fortran's
+     cray_pointers_2 test unless -mvsx-timode is enabled.  */
+  if (mode == V4SImode && TARGET_DIRECT_MOVE_64BIT && TARGET_VSX_TIMODE)
+    {
+      rtx di_hi, di_lo, elements[4], tmp;
+      size_t i;
+
+      for (i = 0; i < 4; i++)
+	{
+	  rtx element_si = XVECEXP (vals, 0, VECTOR_ELT_ORDER_BIG ? i : 3 - i);
+	  element_si = copy_to_mode_reg (SImode, element_si);
+	  elements[i] = gen_reg_rtx (DImode);
+	  convert_move (elements[i], element_si, true);
+	}
+
+      di_hi = gen_reg_rtx (DImode);
+      tmp = gen_reg_rtx (DImode);
+      emit_insn (gen_ashldi3 (tmp, elements[0], GEN_INT (32)));
+      emit_insn (gen_iordi3 (di_hi, tmp, elements[1]));
+
+      di_lo = gen_reg_rtx (DImode);
+      tmp = gen_reg_rtx (DImode);
+      emit_insn (gen_ashldi3 (tmp, elements[2], GEN_INT (32)));
+      emit_insn (gen_iordi3 (di_lo, tmp, elements[3]));
+
+      emit_insn (gen_rtx_CLOBBER (VOIDmode, target));
+      emit_move_insn (gen_highpart (DImode, target), di_hi);
+      emit_move_insn (gen_lowpart (DImode, target), di_lo);
+      return;
+    }
+
   /* With single precision floating point on VSX, know that internally single
      precision is actually represented as a double, and either make 2 V2DF
      vectors, and convert these vectors to single precision, or do one
@@ -7021,6 +7053,18 @@ rs6000_expand_vector_extract (rtx target
   emit_move_insn (target, adjust_address_nv (mem, inner_mode, 0));
 }
 
+/* Helper function to return the register number of a RTX.  */
+static inline int
+regno_or_subregno (rtx op)
+{
+  if (REG_P (op))
+    return REGNO (op);
+  else if (SUBREG_P (op))
+    return subreg_regno (op);
+  else
+    gcc_unreachable ();
+}
+
 /* Adjust a memory address (MEM) of a vector type to point to a scalar field
    within the vector (ELEMENT) with a mode (SCALAR_MODE).  Use a base register
    temporary (BASE_TMP) to fixup the address.  Return the new memory address
@@ -7136,14 +7180,7 @@ rs6000_adjust_vec_address (rtx scalar_re
     {
       rtx op1 = XEXP (new_addr, 1);
       addr_mask_type addr_mask;
-      int scalar_regno;
-
-      if (REG_P (scalar_reg))
-	scalar_regno = REGNO (scalar_reg);
-      else if (SUBREG_P (scalar_reg))
-	scalar_regno = subreg_regno (scalar_reg);
-      else
-	gcc_unreachable ();
+      int scalar_regno = regno_or_subregno (scalar_reg);
 
       gcc_assert (scalar_regno < FIRST_PSEUDO_REGISTER);
       if (INT_REGNO_P (scalar_regno))
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 239098)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -1899,18 +1899,28 @@ (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?<VSa>")
+  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
 	(vec_concat:VSX_D
-	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "<VS_64reg>,<VSa>")
-	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")))]
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,r")
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,r")))
+   (clobber (match_scratch:DI 3 "=X,X"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  if (BYTES_BIG_ENDIAN)
-    return "xxpermdi %x0,%x1,%x2,0";
+  if (which_alternative == 0)
+    return (BYTES_BIG_ENDIAN
+	    ? "xxpermdi %x0,%x1,%x2,0"
+	    : "xxpermdi %x0,%x2,%x1,0");
+
+  else if (which_alternative == 1)
+    return (BYTES_BIG_ENDIAN
+	    ? "mtvsrdd %x0,%1,%2"
+	    : "mtvsrdd %x0,%2,%1");
+
   else
-    return "xxpermdi %x0,%x2,%x1,0";
+    gcc_unreachable ();
 }
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,mftgpr")
+   (set_attr "length" "4")])
 
 ;; Special purpose concat using xxpermdi to glue two single precision values
 ;; together, relying on the fact that internally scalar floats are represented

Index: gcc/testsuite/gcc.target/powerpc/vec-init-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-1.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,36 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+extern void check (vector int a)                    __attribute__((__noinline__));
+extern vector int pack (int a, int b, int c, int d) __attribute__((__noinline__));
+
+void
+check (vector int a)
+{
+  static const int expected[] = { -1, 2, 0, -3 };
+  size_t i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+vector int
+pack (int a, int b, int c, int d)
+{
+  return (vector int) { a, b, c, d };
+}
+
+vector int sv = (vector int) { -1, 2, 0, -3 };
+
+int main (void)
+{
+  check (sv);
+  check (pack (-1, 2, 0, -3));
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-2.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,36 @@
+/* { dg-do run { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <altivec.h>
+
+extern void check (vector long a)        __attribute__((__noinline__));
+extern vector long pack (long a, long b) __attribute__((__noinline__));
+
+void
+check (vector long a)
+{
+  static const long expected[] = { 2L, -3L };
+  size_t i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_extract (a, i) != expected[i])
+      abort ();
+}
+
+vector long
+pack (long a, long b)
+{
+  return (vector long) { a, b };
+}
+
+vector long sv = (vector long) { 2L, -3L };
+
+int main (void)
+{
+  check (sv);
+  check (pack (2L, -3L));
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-init-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-init-3.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-init-3.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 239099)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -mupper-regs-di" } */
+
+vector long
+merge (long a, long b)
+{
+  return (vector long) { a, b };
+}
+
+/* { dg-final { scan-assembler "mtvsrdd" } } */

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-08-29 19:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-04  4:34 [PATCH], Improve vector int/long initialization on PowerPC Michael Meissner
2016-08-04 15:03 ` Segher Boessenkool
2016-08-08 22:55   ` Michael Meissner
2016-08-11 23:15     ` [PATCH], Patch #4, " Michael Meissner
2016-08-12  0:21       ` Segher Boessenkool
2016-08-19 22:18       ` [PATCH], Patch #5, Improve vector int " Michael Meissner
2016-08-19 23:59         ` [PATCH], Patch #6, Improve vector short/char splat " Michael Meissner
2016-08-22 16:47           ` Segher Boessenkool
2016-08-26 19:30           ` [PATCH], Patch #7, Add PowerPC vector initialization tests Michael Meissner
2016-08-27  1:24             ` Segher Boessenkool
2016-08-29 19:35               ` Michael Meissner
2016-08-22 16:38         ` [PATCH], Patch #5, Improve vector int initialization on PowerPC Segher Boessenkool
2016-08-22 22:01           ` Michael Meissner
2016-08-22 22:57             ` Segher Boessenkool
2016-08-05 22:00 ` [PATCH], Improve vector int/long " Pat Haugen
2016-08-08 22:56   ` Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).