[PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
@ 2017-07-27 23:21 Michael Meissner
  2017-07-28  7:51 ` Richard Biener
  2017-07-28 21:09 ` Segher Boessenkool
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Meissner @ 2017-07-27 23:21 UTC (permalink / raw)
  To: GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 2105 bytes --]

This patches optimizes the PowerPC vector set operation for 64-bit doubles and
longs where the elements in the vector set may have been extracted from another
vector (PR target/81593):

Here an an example:

	vector double
	test_vpasted (vector double high, vector double low)
	{
	  vector double res;
	  res[1] = high[1];
	  res[0] = low[0];
	  return res;
	}

Previously it would generate:

        xxpermdi 12,34,34,2
        vspltisw 2,0
        xxlor 0,35,35
        xxpermdi 34,34,12,0
        xxpermdi 34,0,34,1

and with these patches, it now generates:

        xxpermdi 34,35,34,1

I have tested it on a little endian power8 system and a big endian power7
system with the usual bootstrap and make checks with no regressions.  Can I
check this into the trunk?

I also built Spec 2006 with the compiler, and saw no changes in the code
generated.  This isn't surprising because it isn't something that auto
vectorization might generate by default.

[gcc]
2017-07-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/81593
	* config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
	declaration.
	* config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
	emit XXPERMDI accessing either double word in either vector
	register inputs.
	* config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
	Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
	the constraints with the removal of the -mupper-regs-* switches.
	(vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
	where either register might have come from VEC_SELECT.
	(vsx_concat_<mode>_2): Likewise.
	(vsx_concat_<mode>_3): Likewise.
	(vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
	VEC_CONCAT rather than use an UNSPEC to specify the option.

[gcc/testsuite]
2017-07-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/81593
	* gcc.target/powerpc/vsx-extract-6.c: New test.
	* gcc.target/powerpc/vsx-extract-7.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: pr81593.patch01b --]
[-- Type: text/plain, Size: 8484 bytes --]

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/rs6000-protos.h)	(revision 250577)
+++ gcc/config/rs6000/rs6000-protos.h	(.../gcc/config/rs6000/rs6000-protos.h)	(working copy)
@@ -233,6 +233,7 @@ extern void rs6000_asm_output_dwarf_pcre
 					   const char *label);
 extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
 					     const char *label);
+extern const char *rs6000_emit_xxpermdi (rtx[], rtx, rtx);
 
 /* Declare functions in rs6000-c.c */
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/rs6000.c)	(revision 250577)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000/rs6000.c)	(working copy)
@@ -39167,6 +39167,38 @@ rs6000_optab_supported_p (int op, machin
       return true;
     }
 }
+
+\f
+/* Emit a XXPERMDI instruction that can extract from either double word of the
+   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
+   which double word to be used for the operand.  */
+
+const char *
+rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
+{
+  int op1_dword = (!element1) ? 0 : INTVAL (element1);
+  int op2_dword = (!element2) ? 0 : INTVAL (element2);
+
+  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
+
+  if (BYTES_BIG_ENDIAN)
+    {
+      operands[3] = GEN_INT (2*op1_dword + op2_dword);
+      return "xxpermdi %x0,%x1,%x2,%3";
+    }
+  else
+    {
+      if (element1)
+	op1_dword = 1 - op1_dword;
+
+      if (element2)
+	op2_dword = 1 - op2_dword;
+
+      operands[3] = GEN_INT (op1_dword + 2*op2_dword);
+      return "xxpermdi %x0,%x2,%x1,%3";
+    }
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/vsx.md)	(revision 250577)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000/vsx.md)	(working copy)
@@ -2366,19 +2366,17 @@ (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_<mode>"
-  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
 	(vec_concat:VSX_D
-	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
-	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
   if (which_alternative == 0)
-    return (BYTES_BIG_ENDIAN
-	    ? "xxpermdi %x0,%x1,%x2,0"
-	    : "xxpermdi %x0,%x2,%x1,0");
+    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
 
   else if (which_alternative == 1)
-    return (BYTES_BIG_ENDIAN
+    return (VECTOR_ELT_ORDER_BIG
 	    ? "mtvsrdd %x0,%1,%2"
 	    : "mtvsrdd %x0,%2,%1");
 
@@ -2387,6 +2385,47 @@ (define_insn "vsx_concat_<mode>"
 }
   [(set_attr "type" "vecperm")])
 
+;; Combiner patterns to allow creating XXPERMDI's to access either double
+;; register in a vector register.  Note, rs6000_emit_xxpermdi expects
+;; operands[0..2] to be the vector registers.
+(define_insn "*vsx_concat_<mode>_1"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  return rs6000_emit_xxpermdi (operands, operands[3], NULL_RTX);
+})
+
+(define_insn "*vsx_concat_<mode>_2"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa")
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  return rs6000_emit_xxpermdi (operands, NULL_RTX, operands[3]);
+})
+
+(define_insn "*vsx_concat_<mode>_3"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 4 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  return rs6000_emit_xxpermdi (operands, operands[3], operands[4]);
+})
+
 ;; Special purpose concat using xxpermdi to glue two single precision values
 ;; together, relying on the fact that internally scalar floats are represented
 ;; as doubles.  This is used to initialize a V4SF vector with 4 floats
@@ -2587,25 +2626,35 @@ (define_expand "vsx_set_v1ti"
   DONE;
 })
 
-;; Set the element of a V2DI/VD2F mode
-(define_insn "vsx_set_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?<VSa>")
-	(unspec:VSX_D
-	 [(match_operand:VSX_D 1 "vsx_register_operand" "wd,<VSa>")
-	  (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")
-	  (match_operand:QI 3 "u5bit_cint_operand" "i,i")]
-	 UNSPEC_VSX_SET))]
+;; Rewrite V2DF/V2DI set in terms of VEC_CONCAT
+(define_expand "vsx_set_<mode>"
+  [(use (match_operand:VSX_D 0 "vsx_register_operand"))
+   (use (match_operand:VSX_D 1 "vsx_register_operand"))
+   (use (match_operand:<VS_scalar> 2 "gpc_reg_operand"))
+   (use (match_operand:QI 3 "const_0_to_1_operand"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
-  if (INTVAL (operands[3]) == idx_first)
-    return \"xxpermdi %x0,%x2,%x1,1\";
-  else if (INTVAL (operands[3]) == 1 - idx_first)
-    return \"xxpermdi %x0,%x1,%x2,0\";
+  rtx dest = operands[0];
+  rtx vec_reg = operands[1];
+  rtx value = operands[2];
+  rtx ele = operands[3];
+  rtx tmp = gen_reg_rtx (<VS_scalar>mode);
+
+  if (ele == const0_rtx)
+    {
+      emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const1_rtx));
+      emit_insn (gen_vsx_concat_<mode> (dest, value, tmp));
+      DONE;
+    }
+  else if (ele == const1_rtx)
+    {
+      emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const0_rtx));
+      emit_insn (gen_vsx_concat_<mode> (dest, tmp, value));
+      DONE;
+    }
   else
     gcc_unreachable ();
-}
-  [(set_attr "type" "vecperm")])
+})
 
 ;; Extract a DF/DI element from V2DF/V2DI
 ;; Optimize cases were we can do a simple or direct move.
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c	(.../gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c)	(revision 250640)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector unsigned long
+test_vpasted (vector unsigned long high, vector unsigned long low)
+{
+  vector unsigned long res;
+  res[1] = high[1];
+  res[0] = low[0];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 250640)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector double
+test_vpasted (vector double high, vector double low)
+{
+  vector double res;
+  res[1] = high[1];
+  res[0] = low[0];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-07-27 23:21 [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts Michael Meissner
@ 2017-07-28  7:51 ` Richard Biener
  2017-07-28  8:02   ` Andrew Pinski
  2017-07-28 14:44   ` Michael Meissner
  2017-07-28 21:09 ` Segher Boessenkool
  1 sibling, 2 replies; 16+ messages in thread
From: Richard Biener @ 2017-07-28  7:51 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt

On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
> longs where the elements in the vector set may have been extracted from another
> vector (PR target/81593):
>
> Here an an example:
>
>         vector double
>         test_vpasted (vector double high, vector double low)
>         {
>           vector double res;
>           res[1] = high[1];
>           res[0] = low[0];
>           return res;
>         }

Interesting.  We expand from

  <bb 2> [100.00%] [count: INV]:
  _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
  res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
  _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
  res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
  return res_8;

but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
pass looks like the canonical pass for this even though it's quite awkward
to fill this in.

So a match.pd rule would work as well here - your ppc backend patterns
are v2df specific, right?

> Previously it would generate:
>
>         xxpermdi 12,34,34,2
>         vspltisw 2,0
>         xxlor 0,35,35
>         xxpermdi 34,34,12,0
>         xxpermdi 34,0,34,1
>
> and with these patches, it now generates:
>
>         xxpermdi 34,35,34,1
>
> I have tested it on a little endian power8 system and a big endian power7
> system with the usual bootstrap and make checks with no regressions.  Can I
> check this into the trunk?
>
> I also built Spec 2006 with the compiler, and saw no changes in the code
> generated.  This isn't surprising because it isn't something that auto
> vectorization might generate by default.
>
> [gcc]
> 2017-07-27  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         PR target/81593
>         * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
>         declaration.
>         * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>         emit XXPERMDI accessing either double word in either vector
>         register inputs.
>         * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
>         Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
>         the constraints with the removal of the -mupper-regs-* switches.
>         (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
>         where either register might have come from VEC_SELECT.
>         (vsx_concat_<mode>_2): Likewise.
>         (vsx_concat_<mode>_3): Likewise.
>         (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
>         VEC_CONCAT rather than use an UNSPEC to specify the option.
>
> [gcc/testsuite]
> 2017-07-27  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         PR target/81593
>         * gcc.target/powerpc/vsx-extract-6.c: New test.
>         * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>
> --
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-07-28  7:51 ` Richard Biener
@ 2017-07-28  8:02   ` Andrew Pinski
  2017-07-28  8:20     ` Richard Biener
  2017-07-28  8:42     ` Marc Glisse
  2017-07-28 14:44   ` Michael Meissner
  1 sibling, 2 replies; 16+ messages in thread
From: Andrew Pinski @ 2017-07-28  8:02 UTC (permalink / raw)
  To: Richard Biener
  Cc: Michael Meissner, GCC Patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt

On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
>> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
>> longs where the elements in the vector set may have been extracted from another
>> vector (PR target/81593):
>>
>> Here an an example:
>>
>>         vector double
>>         test_vpasted (vector double high, vector double low)
>>         {
>>           vector double res;
>>           res[1] = high[1];
>>           res[0] = low[0];
>>           return res;
>>         }
>
> Interesting.  We expand from
>
>   <bb 2> [100.00%] [count: INV]:
>   _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
>   res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
>   _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
>   res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
>   return res_8;
>
> but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
> pass looks like the canonical pass for this even though it's quite awkward
> to fill this in.

I was thinking about this exactly.  Though for the scale use of
BIT_INSERT_EXPR/BIT_FIELD_REF.
I have a case where someone writes (this shows up in GCC too):
a->b = c->b;
a->d = c->d;
a->e = c->e;
a->f = c->f;
a->g = c->g;
a->h = c->h;

Where b,d,e,f,g,h are adjacent bit-fields after I lowered the bit-fields I have:
_1 = BIT_FIELD_REF <a.1_3, 2, 0>;
_8 = BIT_INSERT_EXPR <c.1_4, _1, 0 (2 bits)>;
_2 = BIT_FIELD_REF <a.1_3, 4, 2>;
_9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>;
....

For the vector case, can't we write it as:
_1 = BIT_FIELD_REF <high_4(D), 64, 64>;
_2 = BIT_FIELD_REF <low_7(D), 64, 0>;
res_8 = {_1, _2};

And then have some match.pd patterns (which might get complex), to
rewrite that into VEC_PERM_EXPR?
The reason why I ask that is because say someone who wrote:
vector double
test_vpasted (vector double high, vector double low)
{
  vector double res = { high[1], low[0] };
  return res;
}


Thanks,
Andrew Pinski

>
> So a match.pd rule would work as well here - your ppc backend patterns
> are v2df specific, right?
>
>> Previously it would generate:
>>
>>         xxpermdi 12,34,34,2
>>         vspltisw 2,0
>>         xxlor 0,35,35
>>         xxpermdi 34,34,12,0
>>         xxpermdi 34,0,34,1
>>
>> and with these patches, it now generates:
>>
>>         xxpermdi 34,35,34,1
>>
>> I have tested it on a little endian power8 system and a big endian power7
>> system with the usual bootstrap and make checks with no regressions.  Can I
>> check this into the trunk?
>>
>> I also built Spec 2006 with the compiler, and saw no changes in the code
>> generated.  This isn't surprising because it isn't something that auto
>> vectorization might generate by default.
>>
>> [gcc]
>> 2017-07-27  Michael Meissner  <meissner@linux.vnet.ibm.com>
>>
>>         PR target/81593
>>         * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
>>         declaration.
>>         * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>>         emit XXPERMDI accessing either double word in either vector
>>         register inputs.
>>         * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
>>         Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
>>         the constraints with the removal of the -mupper-regs-* switches.
>>         (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
>>         where either register might have come from VEC_SELECT.
>>         (vsx_concat_<mode>_2): Likewise.
>>         (vsx_concat_<mode>_3): Likewise.
>>         (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
>>         VEC_CONCAT rather than use an UNSPEC to specify the option.
>>
>> [gcc/testsuite]
>> 2017-07-27  Michael Meissner  <meissner@linux.vnet.ibm.com>
>>
>>         PR target/81593
>>         * gcc.target/powerpc/vsx-extract-6.c: New test.
>>         * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>>
>> --
>> Michael Meissner, IBM
>> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
>> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-07-28  8:02   ` Andrew Pinski
@ 2017-07-28  8:20     ` Richard Biener
  2017-07-28  8:42     ` Marc Glisse
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Biener @ 2017-07-28  8:20 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Michael Meissner, GCC Patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt

On Fri, Jul 28, 2017 at 10:02 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
>> <meissner@linux.vnet.ibm.com> wrote:
>>> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
>>> longs where the elements in the vector set may have been extracted from another
>>> vector (PR target/81593):
>>>
>>> Here an an example:
>>>
>>>         vector double
>>>         test_vpasted (vector double high, vector double low)
>>>         {
>>>           vector double res;
>>>           res[1] = high[1];
>>>           res[0] = low[0];
>>>           return res;
>>>         }
>>
>> Interesting.  We expand from
>>
>>   <bb 2> [100.00%] [count: INV]:
>>   _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
>>   res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
>>   _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
>>   res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
>>   return res_8;
>>
>> but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
>> pass looks like the canonical pass for this even though it's quite awkward
>> to fill this in.
>
> I was thinking about this exactly.  Though for the scale use of
> BIT_INSERT_EXPR/BIT_FIELD_REF.
> I have a case where someone writes (this shows up in GCC too):
> a->b = c->b;
> a->d = c->d;
> a->e = c->e;
> a->f = c->f;
> a->g = c->g;
> a->h = c->h;
>
> Where b,d,e,f,g,h are adjacent bit-fields after I lowered the bit-fields I have:
> _1 = BIT_FIELD_REF <a.1_3, 2, 0>;
> _8 = BIT_INSERT_EXPR <c.1_4, _1, 0 (2 bits)>;
> _2 = BIT_FIELD_REF <a.1_3, 4, 2>;
> _9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>;
> ....
>
> For the vector case, can't we write it as:
> _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
> _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
> res_8 = {_1, _2};
>
> And then have some match.pd patterns (which might get complex), to
> rewrite that into VEC_PERM_EXPR?
> The reason why I ask that is because say someone who wrote:
> vector double
> test_vpasted (vector double high, vector double low)
> {
>   vector double res = { high[1], low[0] };
>   return res;
> }

I still believe a proper pass is better than match.pd patterns (which are
awkward when dealing with "variable operand number" cases).

I believe in the end we want to "unify" SRA, bswap and store-merging
at least.  Analyze memory/component accesses, their flow and then
pattern-match the result.  bswap is good with the flow stuff but its
memory/component access analysis is too ad-hoc.

"unify" in the sense of using common infrastructure.

Richard.

>
> Thanks,
> Andrew Pinski
>
>>
>> So a match.pd rule would work as well here - your ppc backend patterns
>> are v2df specific, right?
>>
>>> Previously it would generate:
>>>
>>>         xxpermdi 12,34,34,2
>>>         vspltisw 2,0
>>>         xxlor 0,35,35
>>>         xxpermdi 34,34,12,0
>>>         xxpermdi 34,0,34,1
>>>
>>> and with these patches, it now generates:
>>>
>>>         xxpermdi 34,35,34,1
>>>
>>> I have tested it on a little endian power8 system and a big endian power7
>>> system with the usual bootstrap and make checks with no regressions.  Can I
>>> check this into the trunk?
>>>
>>> I also built Spec 2006 with the compiler, and saw no changes in the code
>>> generated.  This isn't surprising because it isn't something that auto
>>> vectorization might generate by default.
>>>
>>> [gcc]
>>> 2017-07-27  Michael Meissner  <meissner@linux.vnet.ibm.com>
>>>
>>>         PR target/81593
>>>         * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
>>>         declaration.
>>>         * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>>>         emit XXPERMDI accessing either double word in either vector
>>>         register inputs.
>>>         * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
>>>         Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
>>>         the constraints with the removal of the -mupper-regs-* switches.
>>>         (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
>>>         where either register might have come from VEC_SELECT.
>>>         (vsx_concat_<mode>_2): Likewise.
>>>         (vsx_concat_<mode>_3): Likewise.
>>>         (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
>>>         VEC_CONCAT rather than use an UNSPEC to specify the option.
>>>
>>> [gcc/testsuite]
>>> 2017-07-27  Michael Meissner  <meissner@linux.vnet.ibm.com>
>>>
>>>         PR target/81593
>>>         * gcc.target/powerpc/vsx-extract-6.c: New test.
>>>         * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>>>
>>> --
>>> Michael Meissner, IBM
>>> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
>>> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-07-28  8:02   ` Andrew Pinski
  2017-07-28  8:20     ` Richard Biener
@ 2017-07-28  8:42     ` Marc Glisse
  1 sibling, 0 replies; 16+ messages in thread
From: Marc Glisse @ 2017-07-28  8:42 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Richard Biener, Michael Meissner, GCC Patches,
	Segher Boessenkool, David Edelsohn, Bill Schmidt

On Fri, 28 Jul 2017, Andrew Pinski wrote:

> For the vector case, can't we write it as:
> _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
> _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
> res_8 = {_1, _2};
>
> And then have some match.pd patterns (which might get complex), to
> rewrite that into VEC_PERM_EXPR?

For this last part, we have simplify_vector_constructor in 
tree-ssa-forwprop.c, which currently only recognizes VEC_PERM_EXPR of a 
single vector, but I guess it could be extended to 2 vectors. Not as good 
as a bswap revamp (which will be needed anyway at some point), but less 
work.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-07-28  7:51 ` Richard Biener
  2017-07-28  8:02   ` Andrew Pinski
@ 2017-07-28 14:44   ` Michael Meissner
  1 sibling, 0 replies; 16+ messages in thread
From: Michael Meissner @ 2017-07-28 14:44 UTC (permalink / raw)
  To: Richard Biener
  Cc: Michael Meissner, GCC Patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt

On Fri, Jul 28, 2017 at 09:51:30AM +0200, Richard Biener wrote:
> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> > This patches optimizes the PowerPC vector set operation for 64-bit doubles and
> > longs where the elements in the vector set may have been extracted from another
> > vector (PR target/81593):
> >
> > Here an an example:
> >
> >         vector double
> >         test_vpasted (vector double high, vector double low)
> >         {
> >           vector double res;
> >           res[1] = high[1];
> >           res[0] = low[0];
> >           return res;
> >         }
> 
> Interesting.  We expand from
> 
>   <bb 2> [100.00%] [count: INV]:
>   _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
>   res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
>   _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
>   res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
>   return res_8;
> 
> but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
> pass looks like the canonical pass for this even though it's quite awkward
> to fill this in.
> 
> So a match.pd rule would work as well here - your ppc backend patterns
> are v2df specific, right?

Both V2DF and V2DI.

While it would be great to have a machine independent optimization, my patches
would also work for PowerPC specific built-ins for vector extract and vector
insert.

Also my patches replaces an UNSPEC to create the vector with VEC_CONCAT.

Thus work going on in for machine independent support should not preclude
this patch from being accepted in the PowerPC backend.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-07-27 23:21 [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts Michael Meissner
  2017-07-28  7:51 ` Richard Biener
@ 2017-07-28 21:09 ` Segher Boessenkool
  2017-07-30 14:01   ` Bill Schmidt
                     ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: Segher Boessenkool @ 2017-07-28 21:09 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

Hi!

On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
> longs where the elements in the vector set may have been extracted from another
> vector (PR target/81593):

> 	* config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
> 	emit XXPERMDI accessing either double word in either vector
> 	register inputs.

"emit" is not a good name for this: that is generally used for something
that does emit_insn, i.e. put an insn in the instruction stream.  This
function returns a string a define_insn can return.  For the rl* insns
I called the similar functions rs6000_insn_for_*, maybe something like
that is better here?

> +/* Emit a XXPERMDI instruction that can extract from either double word of the
> +   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
> +   which double word to be used for the operand.  */
> +
> +const char *
> +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
> +{
> +  int op1_dword = (!element1) ? 0 : INTVAL (element1);
> +  int op2_dword = (!element2) ? 0 : INTVAL (element2);
> +
> +  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
> +
> +  if (BYTES_BIG_ENDIAN)
> +    {
> +      operands[3] = GEN_INT (2*op1_dword + op2_dword);
> +      return "xxpermdi %x0,%x1,%x2,%3";
> +    }
> +  else
> +    {
> +      if (element1)
> +	op1_dword = 1 - op1_dword;
> +
> +      if (element2)
> +	op2_dword = 1 - op2_dword;
> +
> +      operands[3] = GEN_INT (op1_dword + 2*op2_dword);
> +      return "xxpermdi %x0,%x2,%x1,%3";
> +    }
> +}

I think calling this with the rtx elementN args makes this only more
complicated (the function comment doesn't say what they are or what
NULL means, btw).

>  (define_insn "vsx_concat_<mode>"
> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
>  	(vec_concat:VSX_D
> -	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
> -	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
> +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
> +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
>    if (which_alternative == 0)
> -    return (BYTES_BIG_ENDIAN
> -	    ? "xxpermdi %x0,%x1,%x2,0"
> -	    : "xxpermdi %x0,%x2,%x1,0");
> +    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
>  
>    else if (which_alternative == 1)
> -    return (BYTES_BIG_ENDIAN
> +    return (VECTOR_ELT_ORDER_BIG
>  	    ? "mtvsrdd %x0,%1,%2"
>  	    : "mtvsrdd %x0,%2,%1");

This one could be

{
  if (!BYTES_BIG_ENDIAN)
    std::swap (operands[1], operands[2]);

  switch (which_alternative)
    {
    case 0:
      return "xxpermdi %x0,%x1,%x2,0";
    case 1:
      return "mtvsrdd %x0,%1,%2";
    default:
      gcc_unreachable ();
    }
}

(Could/should we use xxmrghd instead?  Do all supported assemblers know
that extended mnemonic, is it actually more readable?)

> --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 250640)
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +vector double
> +test_vpasted (vector double high, vector double low)
> +{
> +  vector double res;
> +  res[1] = high[1];
> +  res[0] = low[0];
> +  return res;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */

In this and the other testcase, should you test no other insns at all
are generated?


Segher

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-07-28 21:09 ` Segher Boessenkool
@ 2017-07-30 14:01   ` Bill Schmidt
  2017-07-31 17:41     ` Michael Meissner
  2017-07-31 17:38   ` Michael Meissner
  2017-08-02 14:29   ` Michael Meissner
  2 siblings, 1 reply; 16+ messages in thread
From: Bill Schmidt @ 2017-07-30 14:01 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michael Meissner, GCC Patches, David Edelsohn


> On Jul 28, 2017, at 4:08 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
>> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
>> longs where the elements in the vector set may have been extracted from another
>> vector (PR target/81593):
> 
>> 	* config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>> 	emit XXPERMDI accessing either double word in either vector
>> 	register inputs.
> 
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream.  This
> function returns a string a define_insn can return.  For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?
> 
>> +/* Emit a XXPERMDI instruction that can extract from either double word of the
>> +   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
>> +   which double word to be used for the operand.  */
>> +
>> +const char *
>> +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
>> +{
>> +  int op1_dword = (!element1) ? 0 : INTVAL (element1);
>> +  int op2_dword = (!element2) ? 0 : INTVAL (element2);
>> +
>> +  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
>> +
>> +  if (BYTES_BIG_ENDIAN)
>> +    {
>> +      operands[3] = GEN_INT (2*op1_dword + op2_dword);
>> +      return "xxpermdi %x0,%x1,%x2,%3";
>> +    }
>> +  else
>> +    {
>> +      if (element1)
>> +	op1_dword = 1 - op1_dword;
>> +
>> +      if (element2)
>> +	op2_dword = 1 - op2_dword;
>> +
>> +      operands[3] = GEN_INT (op1_dword + 2*op2_dword);
>> +      return "xxpermdi %x0,%x2,%x1,%3";
>> +    }
>> +}
> 
> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).
> 
>> (define_insn "vsx_concat_<mode>"
>> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
>> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
>> 	(vec_concat:VSX_D
>> -	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
>> -	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
>> +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
>> +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
>>   "VECTOR_MEM_VSX_P (<MODE>mode)"
>> {
>>   if (which_alternative == 0)
>> -    return (BYTES_BIG_ENDIAN
>> -	    ? "xxpermdi %x0,%x1,%x2,0"
>> -	    : "xxpermdi %x0,%x2,%x1,0");
>> +    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
>> 
>>   else if (which_alternative == 1)
>> -    return (BYTES_BIG_ENDIAN
>> +    return (VECTOR_ELT_ORDER_BIG
>> 	    ? "mtvsrdd %x0,%1,%2"
>> 	    : "mtvsrdd %x0,%2,%1");
> 
> This one could be
> 
> {
>  if (!BYTES_BIG_ENDIAN)

!VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be).  (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether.  Still of two minds about this.)

Bill

>    std::swap (operands[1], operands[2]);
> 
>  switch (which_alternative)
>    {
>    case 0:
>      return "xxpermdi %x0,%x1,%x2,0";
>    case 1:
>      return "mtvsrdd %x0,%1,%2";
>    default:
>      gcc_unreachable ();
>    }
> }
> 
> (Could/should we use xxmrghd instead?  Do all supported assemblers know
> that extended mnemonic, is it actually more readable?)
> 
>> --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 0)
>> +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 250640)
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile { target { powerpc*-*-* } } } */
>> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
>> +/* { dg-require-effective-target powerpc_vsx_ok } */
>> +/* { dg-options "-O2 -mvsx" } */
>> +
>> +vector double
>> +test_vpasted (vector double high, vector double low)
>> +{
>> +  vector double res;
>> +  res[1] = high[1];
>> +  res[0] = low[0];
>> +  return res;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
> 
> In this and the other testcase, should you test no other insns at all
> are generated?
> 
> 
> Segher
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-07-30 14:01   ` Bill Schmidt
@ 2017-07-31 17:41     ` Michael Meissner
  2017-07-31 17:42       ` Bill Schmidt
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Meissner @ 2017-07-31 17:41 UTC (permalink / raw)
  To: Bill Schmidt
  Cc: Segher Boessenkool, Michael Meissner, GCC Patches, David Edelsohn

On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote:
> >> (define_insn "vsx_concat_<mode>"
> >> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
> >> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
> >> 	(vec_concat:VSX_D
> >> -	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
> >> -	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
> >> +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
> >> +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
> >>   "VECTOR_MEM_VSX_P (<MODE>mode)"
> >> {
> >>   if (which_alternative == 0)
> >> -    return (BYTES_BIG_ENDIAN
> >> -	    ? "xxpermdi %x0,%x1,%x2,0"
> >> -	    : "xxpermdi %x0,%x2,%x1,0");
> >> +    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
> >> 
> >>   else if (which_alternative == 1)
> >> -    return (BYTES_BIG_ENDIAN
> >> +    return (VECTOR_ELT_ORDER_BIG
> >> 	    ? "mtvsrdd %x0,%1,%2"
> >> 	    : "mtvsrdd %x0,%2,%1");
> > 
> > This one could be
> > 
> > {
> >  if (!BYTES_BIG_ENDIAN)
> 
> !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be).  (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether.  Still of two minds about this.)
> 
> Bill

In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct
(I originally had it using VECTOR_ELT_ORDER_BIG and got failures).  But I need
to look at it again.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-07-31 17:41     ` Michael Meissner
@ 2017-07-31 17:42       ` Bill Schmidt
  0 siblings, 0 replies; 16+ messages in thread
From: Bill Schmidt @ 2017-07-31 17:42 UTC (permalink / raw)
  To: Michael Meissner; +Cc: Segher Boessenkool, GCC Patches, David Edelsohn


> On Jul 31, 2017, at 12:40 PM, Michael Meissner <meissner@linux.vnet.ibm.com> wrote:
> 
> On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote:
>>>> (define_insn "vsx_concat_<mode>"
>>>> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
>>>> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
>>>> 	(vec_concat:VSX_D
>>>> -	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
>>>> -	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
>>>> +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
>>>> +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
>>>>  "VECTOR_MEM_VSX_P (<MODE>mode)"
>>>> {
>>>>  if (which_alternative == 0)
>>>> -    return (BYTES_BIG_ENDIAN
>>>> -	    ? "xxpermdi %x0,%x1,%x2,0"
>>>> -	    : "xxpermdi %x0,%x2,%x1,0");
>>>> +    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
>>>> 
>>>>  else if (which_alternative == 1)
>>>> -    return (BYTES_BIG_ENDIAN
>>>> +    return (VECTOR_ELT_ORDER_BIG
>>>> 	    ? "mtvsrdd %x0,%1,%2"
>>>> 	    : "mtvsrdd %x0,%2,%1");
>>> 
>>> This one could be
>>> 
>>> {
>>> if (!BYTES_BIG_ENDIAN)
>> 
>> !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be).  (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether.  Still of two minds about this.)
>> 
>> Bill
> 
> In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct
> (I originally had it using VECTOR_ELT_ORDER_BIG and got failures).  But I need
> to look at it again.

Hi Mike,

You misunderstand me, I think you had it right (you did move to VECTOR_ELT_ORDER_BIG here)
but I just wanted to clarify that Segher's suggestion would also need to use VECTOR_ELT_ORDER_BIG.

Thanks,
Bill

> 
> -- 
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-07-28 21:09 ` Segher Boessenkool
  2017-07-30 14:01   ` Bill Schmidt
@ 2017-07-31 17:38   ` Michael Meissner
  2017-08-02 14:29   ` Michael Meissner
  2 siblings, 0 replies; 16+ messages in thread
From: Michael Meissner @ 2017-07-31 17:38 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
> > This patches optimizes the PowerPC vector set operation for 64-bit doubles and
> > longs where the elements in the vector set may have been extracted from another
> > vector (PR target/81593):
> 
> > 	* config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
> > 	emit XXPERMDI accessing either double word in either vector
> > 	register inputs.
> 
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream.  This
> function returns a string a define_insn can return.  For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?

Yeah, I should have used rs6000_output_xxpermdi or similar (or output_xxpermdi,
etc.), which is what the other functions used.

> > +/* Emit a XXPERMDI instruction that can extract from either double word of the
> > +   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
> > +   which double word to be used for the operand.  */
> > +
> > +const char *
> > +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
> > +{
> > +  int op1_dword = (!element1) ? 0 : INTVAL (element1);
> > +  int op2_dword = (!element2) ? 0 : INTVAL (element2);
> > +
> > +  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
> > +
> > +  if (BYTES_BIG_ENDIAN)
> > +    {
> > +      operands[3] = GEN_INT (2*op1_dword + op2_dword);
> > +      return "xxpermdi %x0,%x1,%x2,%3";
> > +    }
> > +  else
> > +    {
> > +      if (element1)
> > +	op1_dword = 1 - op1_dword;
> > +
> > +      if (element2)
> > +	op2_dword = 1 - op2_dword;
> > +
> > +      operands[3] = GEN_INT (op1_dword + 2*op2_dword);
> > +      return "xxpermdi %x0,%x2,%x1,%3";
> > +    }
> > +}
> 
> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).

Ok, let me think on it.

> 
> >  (define_insn "vsx_concat_<mode>"
> > -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
> > +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
> >  	(vec_concat:VSX_D
> > -	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
> > -	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
> > +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
> > +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
> >    "VECTOR_MEM_VSX_P (<MODE>mode)"
> >  {
> >    if (which_alternative == 0)
> > -    return (BYTES_BIG_ENDIAN
> > -	    ? "xxpermdi %x0,%x1,%x2,0"
> > -	    : "xxpermdi %x0,%x2,%x1,0");
> > +    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
> >  
> >    else if (which_alternative == 1)
> > -    return (BYTES_BIG_ENDIAN
> > +    return (VECTOR_ELT_ORDER_BIG
> >  	    ? "mtvsrdd %x0,%1,%2"
> >  	    : "mtvsrdd %x0,%2,%1");
> 
> This one could be
> 
> {
>   if (!BYTES_BIG_ENDIAN)
>     std::swap (operands[1], operands[2]);
> 
>   switch (which_alternative)
>     {
>     case 0:
>       return "xxpermdi %x0,%x1,%x2,0";
>     case 1:
>       return "mtvsrdd %x0,%1,%2";
>     default:
>       gcc_unreachable ();
>     }
> }

> (Could/should we use xxmrghd instead?  Do all supported assemblers know
> that extended mnemonic, is it actually more readable?)

For me no, xxpermdi is clearer.  But if you want xxmrghd, I can do it.

> > --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 250640)
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile { target { powerpc*-*-* } } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -mvsx" } */
> > +
> > +vector double
> > +test_vpasted (vector double high, vector double low)
> > +{
> > +  vector double res;
> > +  res[1] = high[1];
> > +  res[0] = low[0];
> > +  return res;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
> 
> In this and the other testcase, should you test no other insns at all
> are generated?

It is kind of hard to test a negative, without trying to guess what possible
instructions could be generated.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-07-28 21:09 ` Segher Boessenkool
  2017-07-30 14:01   ` Bill Schmidt
  2017-07-31 17:38   ` Michael Meissner
@ 2017-08-02 14:29   ` Michael Meissner
  2017-08-03 15:01     ` Segher Boessenkool
  2 siblings, 1 reply; 16+ messages in thread
From: Michael Meissner @ 2017-08-02 14:29 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 1951 bytes --]

On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> 
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream.  This
> function returns a string a define_insn can return.  For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?

...

> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).

...

> In this and the other testcase, should you test no other insns at all
> are generated?

Here are the revised patches.  I tested on a little endian power8 system and a
big endian power7 system.  Are these patches ok for the trunk?

[gcc]
2017-08-02  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/81593
	* config/rs6000/rs6000-protos.h (rs6000_output_xxpermdi): New
	declaration.
	* config/rs6000/rs6000.c (rs6000_output_xxpermdi): New function to
	emit XXPERMDI accessing either double word in either vector
	register inputs.
	* config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
	Rewrite VEC_CONCAT insn to call rs6000_output_xxpermdi.  Simplify
	the constraints with the removal of the -mupper-regs-* switches.
	(vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
	where either register might have come from VEC_SELECT.
	(vsx_concat_<mode>_2): Likewise.
	(vsx_concat_<mode>_3): Likewise.
	(vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
	VEC_CONCAT rather than use an UNSPEC to specify the option.

[gcc/testsuite]
2017-08-02  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/81593
	* gcc.target/powerpc/vsx-extract-6.c: New test.
	* gcc.target/powerpc/vsx-extract-7.c: Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: pr81593.patch02b --]
[-- Type: text/plain, Size: 10486 bytes --]

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 250793)
+++ gcc/config/rs6000/rs6000-protos.h	(.../gcc/config/rs6000)	(working copy)
@@ -233,6 +233,7 @@ extern void rs6000_asm_output_dwarf_pcre
 					   const char *label);
 extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
 					     const char *label);
+extern const char *rs6000_output_xxpermdi (rtx, rtx, rtx, rtx, rtx);
 
 /* Declare functions in rs6000-c.c */
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 250793)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -39007,6 +39007,60 @@ rs6000_optab_supported_p (int op, machin
       return true;
     }
 }
+
+\f
+/* Output a xxpermdi instruction that sets a 128-bit vector DEST combining two
+   inputs SRC1 and SRC2.
+
+   If ELEMENT1 is null, use the top 64-bit double word of ARG1.  If it is
+   non-NULL, it is a 0 or 1 constant that gives the vector element number to
+   use for extracting the 64-bit double word from ARG1.
+
+   If ELEMENT2 is null, use the top 64-bit double word of ARG2.  If it is
+   non-NULL, it is a 0 or 1 constant that gives the vector element number to
+   use for extracting the 64-bit double word from ARG2.
+
+   The element number is based on the user element ordering, set by the
+   endianess and by the -maltivec={le,be} options.  */
+
+const char *
+rs6000_output_xxpermdi (rtx dest,
+			rtx src1,
+			rtx src2,
+			rtx element1,
+			rtx element2)
+{
+  int op1_dword = (!element1) ? 0 : INTVAL (element1);
+  int op2_dword = (!element2) ? 0 : INTVAL (element2);
+  rtx xops[10];
+  const char *insn_string;
+
+  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
+  xops[0] = dest;
+  xops[1] = src1;
+  xops[2] = src2;
+
+  if (BYTES_BIG_ENDIAN)
+    {
+      xops[3] = GEN_INT (2*op1_dword + op2_dword);
+      insn_string = "xxpermdi %x0,%x1,%x2,%3";
+    }
+  else
+    {
+      if (element1)
+	op1_dword = 1 - op1_dword;
+
+      if (element2)
+	op2_dword = 1 - op2_dword;
+
+      xops[3] = GEN_INT (op1_dword + 2*op2_dword);
+      insn_string = "xxpermdi %x0,%x2,%x1,%3";
+    }
+
+  output_asm_insn (insn_string, xops);
+  return "";
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 250793)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -2364,19 +2364,18 @@ (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_<mode>"
-  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
 	(vec_concat:VSX_D
-	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
-	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
   if (which_alternative == 0)
-    return (BYTES_BIG_ENDIAN
-	    ? "xxpermdi %x0,%x1,%x2,0"
-	    : "xxpermdi %x0,%x2,%x1,0");
+    return rs6000_output_xxpermdi (operands[0], operands[1], operands[2],
+				   NULL_RTX, NULL_RTX);
 
   else if (which_alternative == 1)
-    return (BYTES_BIG_ENDIAN
+    return (VECTOR_ELT_ORDER_BIG
 	    ? "mtvsrdd %x0,%1,%2"
 	    : "mtvsrdd %x0,%2,%1");
 
@@ -2385,6 +2384,50 @@ (define_insn "vsx_concat_<mode>"
 }
   [(set_attr "type" "vecperm")])
 
+;; Combiner patterns to allow creating XXPERMDI's to access either double
+;; register in a vector register.  Note, rs6000_output_xxpermdi expects
+;; operands[0..2] to be the vector registers.
+(define_insn "*vsx_concat_<mode>_1"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  return rs6000_output_xxpermdi (operands[0], operands[1], operands[2],
+				 operands[3], NULL_RTX);
+})
+
+(define_insn "*vsx_concat_<mode>_2"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa")
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  return rs6000_output_xxpermdi (operands[0], operands[1], operands[2],
+				 NULL_RTX, operands[3]);
+})
+
+(define_insn "*vsx_concat_<mode>_3"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 4 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  return rs6000_output_xxpermdi (operands[0], operands[1], operands[2],
+				 operands[3], operands[4]);
+})
+
 ;; Special purpose concat using xxpermdi to glue two single precision values
 ;; together, relying on the fact that internally scalar floats are represented
 ;; as doubles.  This is used to initialize a V4SF vector with 4 floats
@@ -2585,25 +2628,35 @@ (define_expand "vsx_set_v1ti"
   DONE;
 })
 
-;; Set the element of a V2DI/VD2F mode
-(define_insn "vsx_set_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?<VSa>")
-	(unspec:VSX_D
-	 [(match_operand:VSX_D 1 "vsx_register_operand" "wd,<VSa>")
-	  (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")
-	  (match_operand:QI 3 "u5bit_cint_operand" "i,i")]
-	 UNSPEC_VSX_SET))]
+;; Rewrite V2DF/V2DI set in terms of VEC_CONCAT
+(define_expand "vsx_set_<mode>"
+  [(use (match_operand:VSX_D 0 "vsx_register_operand"))
+   (use (match_operand:VSX_D 1 "vsx_register_operand"))
+   (use (match_operand:<VS_scalar> 2 "gpc_reg_operand"))
+   (use (match_operand:QI 3 "const_0_to_1_operand"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
-  if (INTVAL (operands[3]) == idx_first)
-    return \"xxpermdi %x0,%x2,%x1,1\";
-  else if (INTVAL (operands[3]) == 1 - idx_first)
-    return \"xxpermdi %x0,%x1,%x2,0\";
+  rtx dest = operands[0];
+  rtx vec_reg = operands[1];
+  rtx value = operands[2];
+  rtx ele = operands[3];
+  rtx tmp = gen_reg_rtx (<VS_scalar>mode);
+
+  if (ele == const0_rtx)
+    {
+      emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const1_rtx));
+      emit_insn (gen_vsx_concat_<mode> (dest, value, tmp));
+      DONE;
+    }
+  else if (ele == const1_rtx)
+    {
+      emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const0_rtx));
+      emit_insn (gen_vsx_concat_<mode> (dest, tmp, value));
+      DONE;
+    }
   else
     gcc_unreachable ();
-}
-  [(set_attr "type" "vecperm")])
+})
 
 ;; Extract a DF/DI element from V2DF/V2DI
 ;; Optimize cases were we can do a simple or direct move.
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250804)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector unsigned long
+test_vpasted (vector unsigned long high, vector unsigned long low)
+{
+  vector unsigned long res;
+  res[1] = high[1];
+  res[0] = low[0];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1    } } */
+/* { dg-final { scan-assembler-not   {\mvspltisw\M}      } } */
+/* { dg-final { scan-assembler-not   {\mxxlor\M}         } } */
+/* { dg-final { scan-assembler-not   {\mxxlxor\M}        } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}      } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}         } } */
+/* { dg-final { scan-assembler-not   {\mlxv[dw][24]x\M}  } } */
+/* { dg-final { scan-assembler-not   {\mlvx\M}           } } */
+/* { dg-final { scan-assembler-not   {\mstxvx?\M}        } } */
+/* { dg-final { scan-assembler-not   {\mstxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not   {\mstvx\M}          } } */
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250804)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector double
+test_vpasted (vector double high, vector double low)
+{
+  vector double res;
+  res[1] = high[1];
+  res[0] = low[0];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1    } } */
+/* { dg-final { scan-assembler-not   {\mvspltisw\M}      } } */
+/* { dg-final { scan-assembler-not   {\mxxlor\M}         } } */
+/* { dg-final { scan-assembler-not   {\mxxlxor\M}        } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}      } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}         } } */
+/* { dg-final { scan-assembler-not   {\mlxv[dw][24]x\M}  } } */
+/* { dg-final { scan-assembler-not   {\mlvx\M}           } } */
+/* { dg-final { scan-assembler-not   {\mstxvx?\M}        } } */
+/* { dg-final { scan-assembler-not   {\mstxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not   {\mstvx\M}          } } */

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-08-02 14:29   ` Michael Meissner
@ 2017-08-03 15:01     ` Segher Boessenkool
  2017-08-03 17:23       ` Michael Meissner
  2017-08-07 13:18       ` Michael Meissner
  0 siblings, 2 replies; 16+ messages in thread
From: Segher Boessenkool @ 2017-08-03 15:01 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

Hi Mike,

On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote:
> On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> > I think calling this with the rtx elementN args makes this only more
> > complicated (the function comment doesn't say what they are or what
> > NULL means, btw).

You didn't handle the first part of this as far as I see?  It's the
big complicating issue here.

> +   If ELEMENT1 is null, use the top 64-bit double word of ARG1.  If it is
> +   non-NULL, it is a 0 or 1 constant that gives the vector element number to
> +   use for extracting the 64-bit double word from ARG1.
> +
> +   If ELEMENT2 is null, use the top 64-bit double word of ARG2.  If it is
> +   non-NULL, it is a 0 or 1 constant that gives the vector element number to
> +   use for extracting the 64-bit double word from ARG2.
> +
> +   The element number is based on the user element ordering, set by the
> +   endianess and by the -maltivec={le,be} options.  */

("endianness", two n's).

I don't like using NULL as a magic value at all; it does not simplify
this interface, it complicates it instead.

Can you move the "which half is high" decision to the callers?


Segher

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-08-03 15:01     ` Segher Boessenkool
@ 2017-08-03 17:23       ` Michael Meissner
  2017-08-07 13:18       ` Michael Meissner
  1 sibling, 0 replies; 16+ messages in thread
From: Michael Meissner @ 2017-08-03 17:23 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

On Thu, Aug 03, 2017 at 10:01:41AM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote:
> > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> > > I think calling this with the rtx elementN args makes this only more
> > > complicated (the function comment doesn't say what they are or what
> > > NULL means, btw).
> 
> You didn't handle the first part of this as far as I see?  It's the
> big complicating issue here.

I am not sure exactly what you are asking for.  This is like the other output
functions that take the rtx insns.

> > +   If ELEMENT1 is null, use the top 64-bit double word of ARG1.  If it is
> > +   non-NULL, it is a 0 or 1 constant that gives the vector element number to
> > +   use for extracting the 64-bit double word from ARG1.
> > +
> > +   If ELEMENT2 is null, use the top 64-bit double word of ARG2.  If it is
> > +   non-NULL, it is a 0 or 1 constant that gives the vector element number to
> > +   use for extracting the 64-bit double word from ARG2.
> > +
> > +   The element number is based on the user element ordering, set by the
> > +   endianess and by the -maltivec={le,be} options.  */
> 
> ("endianness", two n's).
> 
> I don't like using NULL as a magic value at all; it does not simplify
> this interface, it complicates it instead.
>
> Can you move the "which half is high" decision to the callers?

And then essentially there is no need for the function, since each of the 4
concat variants have to have the logic to support big endian, -maltivec=be, and
-maltivec=le.

Let me see what I can do about it.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-08-03 15:01     ` Segher Boessenkool
  2017-08-03 17:23       ` Michael Meissner
@ 2017-08-07 13:18       ` Michael Meissner
  2017-08-07 21:54         ` Segher Boessenkool
  1 sibling, 1 reply; 16+ messages in thread
From: Michael Meissner @ 2017-08-07 13:18 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 3765 bytes --]

On Thu, Aug 03, 2017 at 10:01:41AM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote:
> > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> > > I think calling this with the rtx elementN args makes this only more
> > > complicated (the function comment doesn't say what they are or what
> > > NULL means, btw).
> 
> You didn't handle the first part of this as far as I see?  It's the
> big complicating issue here.
> 
> > +   If ELEMENT1 is null, use the top 64-bit double word of ARG1.  If it is
> > +   non-NULL, it is a 0 or 1 constant that gives the vector element number to
> > +   use for extracting the 64-bit double word from ARG1.
> > +
> > +   If ELEMENT2 is null, use the top 64-bit double word of ARG2.  If it is
> > +   non-NULL, it is a 0 or 1 constant that gives the vector element number to
> > +   use for extracting the 64-bit double word from ARG2.
> > +
> > +   The element number is based on the user element ordering, set by the
> > +   endianess and by the -maltivec={le,be} options.  */
> 
> ("endianness", two n's).
> 
> I don't like using NULL as a magic value at all; it does not simplify
> this interface, it complicates it instead.
> 
> Can you move the "which half is high" decision to the callers?

I rewrote the patch to eliminate the rs6000_output_xxpermdi function, and do
the calculation of the XXPERMDI mask in each of the vsx_concat_<mask>_{1,2,3}
insns.  Just to be sure I got things correct, I wrote a new executable test
that tests various methods of creating/inserting 2 element vectors with double
word elements, and tested in BE, LE -maltivec=be, and LE, and the results match
previous compilers.

I have done bootstrap/build checks on a big endian power7, a little endian
power8 system, and I have done a non-bootstrap/check on a power9 prototype (I
have script issues that prevents a bootstrap build on power9 that I need to
look into).  There are no regressions in the tests and the new tests were run
on each of the systems.  Can I check this into the trunk?

I would also like to backport it to all open branches (particularly GCC 7, but
GCC 6 if possible).  Note, the patch will need a slight tweak on the older
systems due to GCC 7 still supporting -mupper-regs-{df,di} and I have to adjust
the constraints to accomidate this, and under GCC 6 DImode not being allowed in
traditional Altivec registers.

[gcc]
2017-08-07  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/81593
	* config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D): Cleanup
	constraints since the -mupper-regs-* switches have been
	eliminated.
	(vsx_concat_<mode>_1): New combiner insns to recognize inserting
	into a vector from a double word element that was extracted from
	another vector, and eliminate extra XXPERMDI instructions.
	(vsx_concat_<mode>_2): Likewise.
	(vsx_concat_<mode>_3): Likewise.
	(vsx_set_<mode>, VSX_D): Rewrite vector set in terms of vector
	concat to allow optimizing inserts from previous extracts.

[gcc/testsuite]
2017-08-07  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/81593
	* gcc.target/powerpc/vec-setup.h: New tests to test various
	combinations of setting up vectors of 2 double word elements.
	* gcc.target/powerpc/vec-setup-long.c: Likewise.
	* gcc.target/powerpc/vec-setup-double.c: Likewise.
	* gcc.target/powerpc/vec-setup-be-long.c: Likewise.
	* gcc.target/powerpc/vec-setup-be-double.c: Likewise.
	* gcc.target/powerpc/vsx-extract-6.c: New tests for optimzing
	vector inserts from vector extracts.
	* gcc.target/powerpc/vsx-extract-7.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: pr81593.patch05b --]
[-- Type: text/plain, Size: 21249 bytes --]

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 250858)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -2364,10 +2364,10 @@ (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_<mode>"
-  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
 	(vec_concat:VSX_D
-	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
-	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
   if (which_alternative == 0)
@@ -2385,6 +2385,80 @@ (define_insn "vsx_concat_<mode>"
 }
   [(set_attr "type" "vecperm")])
 
+;; Combiner patterns to allow creating XXPERMDI's to access either double
+;; word element in a vector register.
+(define_insn "*vsx_concat_<mode>_1"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 2 "const_0_to_1_operand" "n")]))
+	 (match_operand:<VS_scalar> 3 "gpc_reg_operand" "wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  HOST_WIDE_INT dword = INTVAL (operands[2]);
+  if (BYTES_BIG_ENDIAN)
+    {
+      operands[4] = GEN_INT (2*dword);
+      return "xxpermdi %x0,%x1,%x3,%4";
+    }
+  else
+    {
+      operands[4] = GEN_INT (!dword);
+      return "xxpermdi %x0,%x3,%x1,%4";
+    }
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_concat_<mode>_2"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa")
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  HOST_WIDE_INT dword = INTVAL (operands[3]);
+  if (BYTES_BIG_ENDIAN)
+    {
+      operands[4] = GEN_INT (dword);
+      return "xxpermdi %x0,%x1,%x2,%4";
+    }
+  else
+    {
+      operands[4] = GEN_INT (2 * !dword);
+      return "xxpermdi %x0,%x2,%x1,%4";
+    }
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_concat_<mode>_3"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 2 "const_0_to_1_operand" "n")]))
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 3 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 4 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  HOST_WIDE_INT dword1 = INTVAL (operands[2]);
+  HOST_WIDE_INT dword2 = INTVAL (operands[4]);
+  if (BYTES_BIG_ENDIAN)
+    {
+      operands[5] = GEN_INT ((2 * dword1) + dword2);
+      return "xxpermdi %x0,%x1,%x3,%5";
+    }
+  else
+    {
+      operands[5] = GEN_INT ((2 * !dword2) + !dword1);
+      return "xxpermdi %x0,%x3,%x1,%5";
+    }
+}
+  [(set_attr "type" "vecperm")])
+
 ;; Special purpose concat using xxpermdi to glue two single precision values
 ;; together, relying on the fact that internally scalar floats are represented
 ;; as doubles.  This is used to initialize a V4SF vector with 4 floats
@@ -2585,25 +2659,35 @@ (define_expand "vsx_set_v1ti"
   DONE;
 })
 
-;; Set the element of a V2DI/VD2F mode
-(define_insn "vsx_set_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?<VSa>")
-	(unspec:VSX_D
-	 [(match_operand:VSX_D 1 "vsx_register_operand" "wd,<VSa>")
-	  (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")
-	  (match_operand:QI 3 "u5bit_cint_operand" "i,i")]
-	 UNSPEC_VSX_SET))]
+;; Rewrite V2DF/V2DI set in terms of VEC_CONCAT
+(define_expand "vsx_set_<mode>"
+  [(use (match_operand:VSX_D 0 "vsx_register_operand"))
+   (use (match_operand:VSX_D 1 "vsx_register_operand"))
+   (use (match_operand:<VS_scalar> 2 "gpc_reg_operand"))
+   (use (match_operand:QI 3 "const_0_to_1_operand"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
-  if (INTVAL (operands[3]) == idx_first)
-    return \"xxpermdi %x0,%x2,%x1,1\";
-  else if (INTVAL (operands[3]) == 1 - idx_first)
-    return \"xxpermdi %x0,%x1,%x2,0\";
+  rtx dest = operands[0];
+  rtx vec_reg = operands[1];
+  rtx value = operands[2];
+  rtx ele = operands[3];
+  rtx tmp = gen_reg_rtx (<VS_scalar>mode);
+
+  if (ele == const0_rtx)
+    {
+      emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const1_rtx));
+      emit_insn (gen_vsx_concat_<mode> (dest, value, tmp));
+      DONE;
+    }
+  else if (ele == const1_rtx)
+    {
+      emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const0_rtx));
+      emit_insn (gen_vsx_concat_<mode> (dest, tmp, value));
+      DONE;
+    }
   else
     gcc_unreachable ();
-}
-  [(set_attr "type" "vecperm")])
+})
 
 ;; Extract a DF/DI element from V2DF/V2DI
 ;; Optimize cases were we can do a simple or direct move.
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250878)
@@ -0,0 +1,11 @@
+/* { dg-do run { target { powerpc64le*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx -maltivec=be" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+   elements.  This test uses the long (on 64-bit systems) or long long datatype
+   (on 32-bit systems).
+
+   This test explicitly tests -maltivec=be to make sure things are correct.  */
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vec-setup.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup.h	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup.h	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250878)
@@ -0,0 +1,366 @@
+#include <altivec.h>
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+   elements.  This include files supports:
+
+	 testing double
+	 testing long on 64-bit systems
+	 testing long long on 32-bit systems.
+
+   The endian support is:
+
+	big endian
+	little endian with little endian element ordering
+	little endian with big endian element ordering.  */
+
+#ifdef DEBUG
+#include <stdio.h>
+#define DEBUG0(STR)		fputs (STR, stdout)
+#define DEBUG2(STR,A,B)		printf (STR, A, B)
+
+static int errors = 0;
+
+#else
+#include <stdlib.h>
+#define DEBUG0(STR)
+#define DEBUG2(STR,A,B)
+#endif
+
+#if defined(DO_DOUBLE)
+#define TYPE	double
+#define STYPE	"double"
+#define ZERO	0.0
+#define ONE	1.0
+#define TWO	2.0
+#define THREE	3.0
+#define FOUR	4.0
+#define FIVE	5.0
+#define SIX	6.0
+#define FMT	"g"
+
+#elif defined(_ARCH_PPC64)
+#define TYPE	long
+#define STYPE	"long"
+#define ZERO	0L
+#define ONE	1L
+#define TWO	2L
+#define THREE	3L
+#define FOUR	4L
+#define FIVE	5L
+#define SIX	6L
+#define FMT	"ld"
+
+#else
+#define TYPE	long long
+#define STYPE	"long long"
+#define ZERO	0LL
+#define ONE	1LL
+#define TWO	2LL
+#define THREE	3LL
+#define FOUR	4LL
+#define FIVE	5LL
+#define SIX	6LL
+#define FMT	"lld"
+#endif
+
+/* Macros to order the left/right values correctly.  Note, -maltivec=be does
+   not change the order for static initializations, so we have to handle it
+   specially.  */
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define INIT_ORDER(A, B)	(TYPE) A, (TYPE) B
+#define ELEMENT_ORDER(A, B)	(TYPE) A, (TYPE) B
+#define ENDIAN			"-mbig"
+
+#elif __VEC_ELEMENT_REG_ORDER__ == __ORDER_BIG_ENDIAN__
+#define NO_ARRAY
+#define INIT_ORDER(A, B)	(TYPE) B, (TYPE) A
+#define ELEMENT_ORDER(A, B)	(TYPE) A, (TYPE) B
+#define ENDIAN			"-mlittle -maltivec=be"
+
+#else
+#define INIT_ORDER(A, B)	(TYPE) B, (TYPE) A
+#define ELEMENT_ORDER(A, B)	(TYPE) B, (TYPE) A
+#define ENDIAN			"-mlittle"
+#endif
+
+static volatile TYPE		five	= FIVE;
+static volatile TYPE		six	= SIX;
+static volatile vector TYPE	s_v12 = { ONE,   TWO };
+static volatile vector TYPE	g_v34 = { THREE, FOUR };
+
+
+__attribute__((__noinline__))
+static void
+vector_check (vector TYPE v, TYPE expect_hi, TYPE expect_lo)
+{
+  TYPE actual_hi, actual_lo;
+#ifdef DEBUG
+  const char *pass_fail;
+#endif
+
+  __asm__ ("xxlor %x0,%x1,%x1"		: "=&wa" (actual_hi) : "wa" (v));
+  __asm__ ("xxpermdi %x0,%x1,%x1,3"	: "=&wa" (actual_lo) : "wa" (v));
+
+#ifdef DEBUG
+  if ((actual_hi == expect_hi) && (actual_lo == expect_lo))
+    pass_fail = ", pass";
+  else
+    {
+      pass_fail = ", fail";
+      errors++;
+    }
+
+  printf ("Expected %" FMT ", %" FMT ", got %" FMT ", %" FMT "%s\n",
+	  expect_hi, expect_lo,
+	  actual_hi, actual_lo,
+	  pass_fail);
+#else
+  if ((actual_hi != expect_hi) || (actual_lo != expect_lo))
+    abort ();
+#endif
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+combine (TYPE op0, TYPE op1)
+{
+  return (vector TYPE) { op0, op1 };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+combine_insert (TYPE op0, TYPE op1)
+{
+  vector TYPE ret = (vector TYPE) { ZERO, ZERO };
+  ret = vec_insert (op0, ret, 0);
+  ret = vec_insert (op1, ret, 1);
+  return ret;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_00 (vector TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 0), vec_extract (b, 0) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_01 (vector TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 0), vec_extract (b, 1) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_10 (vector TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 1), vec_extract (b, 0) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_11 (vector TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 1), vec_extract (b, 1) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_0s (vector TYPE a, TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 0), b };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_1s (vector TYPE a, TYPE b)
+{
+  return (vector TYPE) { vec_extract (a, 1), b };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_s0 (TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { a, vec_extract (b, 0) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_s1 (TYPE a, vector TYPE b)
+{
+  return (vector TYPE) { a, vec_extract (b, 1) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_nn (vector TYPE a, vector TYPE b, size_t i, size_t j)
+{
+  return (vector TYPE) { vec_extract (a, i), vec_extract (b, j) };
+}
+
+#ifndef NO_ARRAY
+__attribute__((__noinline__))
+static vector TYPE
+array_0 (vector TYPE v, TYPE a)
+{
+  v[0] = a;
+  return v;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+array_1 (vector TYPE v, TYPE a)
+{
+  v[1] = a;
+  return v;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+array_01 (vector TYPE v, TYPE a, TYPE b)
+{
+  v[0] = a;
+  v[1] = b;
+  return v;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+array_01b (TYPE a, TYPE b)
+{
+  vector TYPE v = (vector TYPE) { 0, 0 };
+  v[0] = a;
+  v[1] = b;
+  return v;
+}
+#endif
+
+int
+main (void)
+{
+  vector TYPE a = (vector TYPE) { ONE,   TWO  };
+  vector TYPE b = (vector TYPE) { THREE, FOUR };
+  size_t i, j;
+
+#ifndef NO_ARRAY
+  vector TYPE z = (vector TYPE) { ZERO,  ZERO };
+#endif
+
+  DEBUG2 ("Endian: %s, type: %s\n", ENDIAN, STYPE);
+  DEBUG0 ("\nStatic/global initialization\n");
+  vector_check (s_v12, INIT_ORDER (1, 2));
+  vector_check (g_v34, INIT_ORDER (3, 4));
+
+  DEBUG0 ("\nVector via constant runtime intiialization\n");
+  vector_check (a, INIT_ORDER (1, 2));
+  vector_check (b, INIT_ORDER (3, 4));
+
+  DEBUG0 ("\nCombine scalars using vector initialization\n");
+  vector_check (combine (1, 2), INIT_ORDER (1, 2));
+  vector_check (combine (3, 4), INIT_ORDER (3, 4));
+
+  DEBUG0 ("\nSetup with vec_insert\n");
+  a = combine_insert (1, 2);
+  b = combine_insert (3, 4);
+  vector_check (a, ELEMENT_ORDER (1, 2));
+  vector_check (b, ELEMENT_ORDER (3, 4));
+
+#ifndef NO_ARRAY
+  DEBUG0 ("\nTesting array syntax\n");
+  vector_check (array_0   (a, FIVE),      ELEMENT_ORDER (5, 2));
+  vector_check (array_1   (b, SIX),       ELEMENT_ORDER (3, 6));
+  vector_check (array_01  (z, FIVE, SIX), ELEMENT_ORDER (5, 6));
+  vector_check (array_01b (FIVE, SIX),    ELEMENT_ORDER (5, 6));
+
+  vector_check (array_0   (a, five),      ELEMENT_ORDER (5, 2));
+  vector_check (array_1   (b, six),       ELEMENT_ORDER (3, 6));
+  vector_check (array_01  (z, five, six), ELEMENT_ORDER (5, 6));
+  vector_check (array_01b (five, six),    ELEMENT_ORDER (5, 6));
+#else
+  DEBUG0 ("\nSkipping array syntax on -maltivec=be\n");
+#endif
+
+  DEBUG0 ("\nTesting concat and extract\n");
+  vector_check (concat_extract_00 (a, b), INIT_ORDER (1, 3));
+  vector_check (concat_extract_01 (a, b), INIT_ORDER (1, 4));
+  vector_check (concat_extract_10 (a, b), INIT_ORDER (2, 3));
+  vector_check (concat_extract_11 (a, b), INIT_ORDER (2, 4));
+
+  DEBUG0 ("\nTesting concat and extract #2\n");
+  vector_check (concat_extract2_0s (a, FIVE), INIT_ORDER (1, 5));
+  vector_check (concat_extract2_1s (a, FIVE), INIT_ORDER (2, 5));
+  vector_check (concat_extract2_s0 (SIX, a),  INIT_ORDER (6, 1));
+  vector_check (concat_extract2_s1 (SIX, a),  INIT_ORDER (6, 2));
+
+  DEBUG0 ("\nTesting variable concat and extract\n");
+  for (i = 0; i < 2; i++)
+    {
+      for (j = 0; j < 2; j++)
+	{
+	  static struct {
+	    TYPE hi;
+	    TYPE lo;
+	  } hilo[2][2] =
+	      { { { ONE, THREE }, { ONE, FOUR } },
+		{ { TWO, THREE }, { TWO, FOUR } } };
+
+	  vector_check (concat_extract_nn (a, b, i, j),
+			INIT_ORDER (hilo[i][j].hi, hilo[i][j].lo));
+	}
+    }
+
+  DEBUG0 ("\nTesting separate function\n");
+  vector_check (combine (vec_extract (a, 0), vec_extract (b, 0)),
+		INIT_ORDER (1, 3));
+
+  vector_check (combine (vec_extract (a, 0), vec_extract (b, 1)),
+		INIT_ORDER (1, 4));
+
+  vector_check (combine (vec_extract (a, 1), vec_extract (b, 0)),
+		INIT_ORDER (2, 3));
+
+  vector_check (combine (vec_extract (a, 1), vec_extract (b, 1)),
+		INIT_ORDER (2, 4));
+
+  vector_check (combine_insert (vec_extract (a, 0), vec_extract (b, 0)),
+		ELEMENT_ORDER (1, 3));
+
+  vector_check (combine_insert (vec_extract (a, 0), vec_extract (b, 1)),
+		ELEMENT_ORDER (1, 4));
+
+  vector_check (combine_insert (vec_extract (a, 1), vec_extract (b, 0)),
+		ELEMENT_ORDER (2, 3));
+
+  vector_check (combine_insert (vec_extract (a, 1), vec_extract (b, 1)),
+		ELEMENT_ORDER (2, 4));
+
+
+#if defined(DO_DOUBLE)
+  DEBUG0 ("\nTesting explicit 2df concat\n");
+  vector_check (__builtin_vsx_concat_2df (FIVE, SIX), INIT_ORDER (5, 6));
+  vector_check (__builtin_vsx_concat_2df (five, six), INIT_ORDER (5, 6));
+
+#elif defined(_ARCH_PPC64)
+  DEBUG0 ("\nTesting explicit 2di concat\n");
+  vector_check (__builtin_vsx_concat_2di (FIVE, SIX), INIT_ORDER (5, 6));
+  vector_check (__builtin_vsx_concat_2di (five, six), INIT_ORDER (5, 6));
+
+#else
+  DEBUG0 ("\nSkip explicit 2di concat on 32-bit\n");
+#endif
+
+#ifdef DEBUG
+  if (errors)
+    printf ("\n%d error%s were found", errors, (errors == 1) ? "" : "s");
+  else
+    printf ("\nNo errors were found.\n");
+
+  return errors;
+
+#else
+  return 0;
+#endif
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250878)
@@ -0,0 +1,12 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+   elements.  This test uses the double datatype.
+
+   This test explicitly tests -maltivec=be to make sure things are correct.  */
+
+#define DO_DOUBLE
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-double.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-double.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-double.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250878)
@@ -0,0 +1,11 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+   elements.  This test uses the double datatype and the default endian
+   order.  */
+
+#define DO_DOUBLE
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-long.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-long.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-long.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250878)
@@ -0,0 +1,9 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+   elements.  This test uses the long (on 64-bit systems) or long long datatype
+   (on 32-bit systems).  The default endian order is used.  */
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250858)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector unsigned long
+test_vpasted (vector unsigned long high, vector unsigned long low)
+{
+  vector unsigned long res;
+  res[1] = high[1];
+  res[0] = low[0];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1    } } */
+/* { dg-final { scan-assembler-not   {\mvspltisw\M}      } } */
+/* { dg-final { scan-assembler-not   {\mxxlor\M}         } } */
+/* { dg-final { scan-assembler-not   {\mxxlxor\M}        } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}      } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}         } } */
+/* { dg-final { scan-assembler-not   {\mlxv[dw][24]x\M}  } } */
+/* { dg-final { scan-assembler-not   {\mlvx\M}           } } */
+/* { dg-final { scan-assembler-not   {\mstxvx?\M}        } } */
+/* { dg-final { scan-assembler-not   {\mstxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not   {\mstvx\M}          } } */
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 250858)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector double
+test_vpasted (vector double high, vector double low)
+{
+  vector double res;
+  res[1] = high[1];
+  res[0] = low[0];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1    } } */
+/* { dg-final { scan-assembler-not   {\mvspltisw\M}      } } */
+/* { dg-final { scan-assembler-not   {\mxxlor\M}         } } */
+/* { dg-final { scan-assembler-not   {\mxxlxor\M}        } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}      } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}         } } */
+/* { dg-final { scan-assembler-not   {\mlxv[dw][24]x\M}  } } */
+/* { dg-final { scan-assembler-not   {\mlvx\M}           } } */
+/* { dg-final { scan-assembler-not   {\mstxvx?\M}        } } */
+/* { dg-final { scan-assembler-not   {\mstxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not   {\mstvx\M}          } } */

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
  2017-08-07 13:18       ` Michael Meissner
@ 2017-08-07 21:54         ` Segher Boessenkool
  0 siblings, 0 replies; 16+ messages in thread
From: Segher Boessenkool @ 2017-08-07 21:54 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt

On Mon, Aug 07, 2017 at 09:18:30AM -0400, Michael Meissner wrote:
> > I don't like using NULL as a magic value at all; it does not simplify
> > this interface, it complicates it instead.
> > 
> > Can you move the "which half is high" decision to the callers?
> 
> I rewrote the patch to eliminate the rs6000_output_xxpermdi function, and do
> the calculation of the XXPERMDI mask in each of the vsx_concat_<mask>_{1,2,3}
> insns.  Just to be sure I got things correct, I wrote a new executable test
> that tests various methods of creating/inserting 2 element vectors with double
> word elements, and tested in BE, LE -maltivec=be, and LE, and the results match
> previous compilers.
> 
> I have done bootstrap/build checks on a big endian power7, a little endian
> power8 system, and I have done a non-bootstrap/check on a power9 prototype (I
> have script issues that prevents a bootstrap build on power9 that I need to
> look into).  There are no regressions in the tests and the new tests were run
> on each of the systems.  Can I check this into the trunk?
> 
> I would also like to backport it to all open branches (particularly GCC 7, but
> GCC 6 if possible).  Note, the patch will need a slight tweak on the older
> systems due to GCC 7 still supporting -mupper-regs-{df,di} and I have to adjust
> the constraints to accomidate this, and under GCC 6 DImode not being allowed in
> traditional Altivec registers.

Thanks!  Okay for trunk.  The 7 branch is frozen; okay for 7 after the
release, and for 6 too.


Segher

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-08-07 21:54 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-27 23:21 [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts Michael Meissner
2017-07-28  7:51 ` Richard Biener
2017-07-28  8:02   ` Andrew Pinski
2017-07-28  8:20     ` Richard Biener
2017-07-28  8:42     ` Marc Glisse
2017-07-28 14:44   ` Michael Meissner
2017-07-28 21:09 ` Segher Boessenkool
2017-07-30 14:01   ` Bill Schmidt
2017-07-31 17:41     ` Michael Meissner
2017-07-31 17:42       ` Bill Schmidt
2017-07-31 17:38   ` Michael Meissner
2017-08-02 14:29   ` Michael Meissner
2017-08-03 15:01     ` Segher Boessenkool
2017-08-03 17:23       ` Michael Meissner
2017-08-07 13:18       ` Michael Meissner
2017-08-07 21:54         ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).