public inbox for gcc-patches@gcc.gnu.org
* [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available
@ 2016-06-21 20:15 Bill Schmidt
  2016-06-21 22:34 ` Segher Boessenkool
  0 siblings, 1 reply; 6+ messages in thread
From: Bill Schmidt @ 2016-06-21 20:15 UTC (permalink / raw)
  To: GCC Patches; +Cc: Segher Boessenkool, David Edelsohn

Hi,

I discovered recently that, with -mcpu=power9, an attempt to generate a vspltish instruction resulted instead in an xxspltib followed by a vupkhsb.  This is semantically correct but the extra instruction is not optimal.  I found that there was some logic in xxspltib_constant_p to do special casing for const_vector with small constants, but not for vec_duplicate with small constants.  This patch duplicates that logic so we can generate the single instruction when possible.

When I did this, I ran into a problem with an existing test case.  We end up matching the *vsx_splat_v4si_internal pattern instead of falling back to the altivec_vspltisw pattern.  The constraints don't match for constant input.  To avoid this, I added a pattern ahead of this one that will match for VMX output registers and produce the vspltisw as desired.  This corrected the failing test and produces the expected code.

I've added a test case to demonstrate the code works properly now in the usual case.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  OK for trunk, and for 6.2 after suitable burn-in?

Thanks!

Bill


[gcc]

2016-06-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (xxspltib_constant_p): Prefer vspltisw/h
	for vec_duplicate when this is cheaper.
	* config/rs6000/vsx.md (*vsx_splat_v4si_altivec): New define_insn.

[gcc/testsuite]

2016-06-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/splat-p9-1.c: New test.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 237619)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6329,6 +6329,13 @@ xxspltib_constant_p (rtx op,
       value = INTVAL (element);
       if (!IN_RANGE (value, -128, 127))
 	return false;
+
+      /* See if we could generate vspltisw/vspltish directly instead of
+	 xxspltib + sign extend.  Special case 0/-1 to allow getting
+         any VSX register instead of an Altivec register.  */
+      if (!IN_RANGE (value, -1, 0) && EASY_VECTOR_15 (value)
+	  && (mode == V4SImode || mode == V8HImode))
+	return false;
     }
 
   /* Handle (const_vector [...]).  */
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 237619)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -2400,6 +2400,17 @@
     operands[1] = force_reg (<VS_scalar>mode, operands[1]);
 })
 
+;; The pattern following this one hides altivec_vspltisw, which we
+;; prefer to match when possible, so duplicate that here for
+;; TARGET_P9_VECTOR.
+(define_insn "*vsx_splat_v4si_altivec"
+  [(set (match_operand:V4SI 0 "altivec_register_operand" "=v")
+        (vec_duplicate:V4SI
+	 (match_operand:QI 1 "s5bit_cint_operand" "i")))]
+  "TARGET_P9_VECTOR"
+  "vspltisw %0,%1"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "*vsx_splat_v4si_internal"
   [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
 	(vec_duplicate:V4SI
Index: gcc/testsuite/gcc.target/powerpc/splat-p9-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/splat-p9-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/splat-p9-1.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-maltivec -mcpu=power9" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-final { scan-assembler "vspltish" } } */
+/* { dg-final { scan-assembler-not "xxspltib" } } */
+
+/* Make sure we don't use an inefficient sequence for small integer splat.  */
+
+#include <altivec.h>
+
+vector short
+foo ()
+{
+  return vec_splat_s16 (5);
+}
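
The range check described in this message can be sketched as standalone C. This is only an illustration, not the GCC source: the enum and the function names are hypothetical, the helper easy_vector_15 assumes that EASY_VECTOR_15 accepts the 5-bit signed range -16..15, and only the vec_duplicate path of xxspltib_constant_p is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the vector modes named in the patch.  */
enum vec_mode { MODE_V16QI, MODE_V8HI, MODE_V4SI };

/* Inclusive bounds check, in the spirit of GCC's IN_RANGE.  */
bool in_range (long value, long lo, long hi)
{
  return value >= lo && value <= hi;
}

/* Assumption: EASY_VECTOR_15 accepts the 5-bit signed immediates
   (-16..15) that vspltisw/vspltish can encode.  */
bool easy_vector_15 (long value)
{
  return in_range (value, -16, 15);
}

/* True when one vspltisw/vspltish is preferred over xxspltib plus a
   sign extend.  0 and -1 are excluded so that xxspltib can still
   target any VSX register for those values.  */
bool prefer_vspltis (long value, enum vec_mode mode)
{
  return !in_range (value, -1, 0)
	 && easy_vector_15 (value)
	 && (mode == MODE_V4SI || mode == MODE_V8HI);
}

/* Simplified model of the vec_duplicate path: the value must fit in a
   signed byte, and the splat must not be better served by vspltis.  */
bool use_xxspltib (long value, enum vec_mode mode)
{
  if (!in_range (value, -128, 127))
    return false;
  return !prefer_vspltis (value, mode);
}

/* The case from the new test: splatting 5 into a vector of shorts.  */
void check_splat_example (void)
{
  assert (prefer_vspltis (5, MODE_V8HI));
  assert (!use_xxspltib (5, MODE_V8HI));
}
```

Under these assumptions, a byte-sized value outside -16..15 (for example 100) still goes through xxspltib, and so do 0 and -1 in any mode.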




* Re: [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available
  2016-06-21 20:15 [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available Bill Schmidt
@ 2016-06-21 22:34 ` Segher Boessenkool
  2016-06-21 23:47   ` Bill Schmidt
  0 siblings, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2016-06-21 22:34 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: GCC Patches, David Edelsohn

On Tue, Jun 21, 2016 at 03:14:51PM -0500, Bill Schmidt wrote:
> I discovered recently that, with -mcpu=power9, an attempt to generate a vspltish instruction resulted instead in an xxspltib followed by a vupkhsb.  This is semantically correct but the extra instruction is not optimal.  I found that there was some logic in xxspltib_constant_p to do special casing for const_vector with small constants, but not for vec_duplicate with small constants.  This patch duplicates that logic so we can generate the single instruction when possible.

This part is okay.

> When I did this, I ran into a problem with an existing test case.  We end up matching the *vsx_splat_v4si_internal pattern instead of falling back to the altivec_vspltisw pattern.  The constraints don't match for constant input.  To avoid this, I added a pattern ahead of this one that will match for VMX output registers and produce the vspltisw as desired.  This corrected the failing test and produces the expected code.

Why does the predicate allow constant input, while the constraints do not?

> I've added a test case to demonstrate the code works properly now in the usual case.

Thanks :-)


Segher


* Re: [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available
  2016-06-21 22:34 ` Segher Boessenkool
@ 2016-06-21 23:47   ` Bill Schmidt
  2016-06-22 14:22     ` Segher Boessenkool
  0 siblings, 1 reply; 6+ messages in thread
From: Bill Schmidt @ 2016-06-21 23:47 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches, David Edelsohn


> On Jun 21, 2016, at 5:34 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Tue, Jun 21, 2016 at 03:14:51PM -0500, Bill Schmidt wrote:
>> I discovered recently that, with -mcpu=power9, an attempt to generate a vspltish instruction resulted instead in an xxspltib followed by a vupkhsb.  This is semantically correct but the extra instruction is not optimal.  I found that there was some logic in xxspltib_constant_p to do special casing for const_vector with small constants, but not for vec_duplicate with small constants.  This patch duplicates that logic so we can generate the single instruction when possible.
> 
> This part is okay.
> 
>> When I did this, I ran into a problem with an existing test case.  We end up matching the *vsx_splat_v4si_internal pattern instead of falling back to the altivec_vspltisw pattern.  The constraints don't match for constant input.  To avoid this, I added a pattern ahead of this one that will match for VMX output registers and produce the vspltisw as desired.  This corrected the failing test and produces the expected code.
> 
> Why does the predicate allow constant input, while the constraints do not?

I have no idea why it was built that way.  The predicate seems to provide for all sorts of things, but this and the subsequent pattern both handle only a subset of the constraints implied by it.  To be honest, I didn't feel competent to try to fix the existing patterns.  Do you have any suggestions for what to do instead?

Thanks!
Bill

> 
>> I've added a test case to demonstrate the code works properly now in the usual case.
> 
> Thanks :-)
> 
> 
> Segher
> 


* Re: [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available
  2016-06-21 23:47   ` Bill Schmidt
@ 2016-06-22 14:22     ` Segher Boessenkool
  2016-06-22 21:30       ` Michael Meissner
  0 siblings, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2016-06-22 14:22 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: GCC Patches, David Edelsohn

On Tue, Jun 21, 2016 at 06:46:57PM -0500, Bill Schmidt wrote:
> >> When I did this, I ran into a problem with an existing test case.  We end up matching the *vsx_splat_v4si_internal pattern instead of falling back to the altivec_vspltisw pattern.  The constraints don't match for constant input.  To avoid this, I added a pattern ahead of this one that will match for VMX output registers and produce the vspltisw as desired.  This corrected the failing test and produces the expected code.
> > 
> > Why does the predicate allow constant input, while the constraints do not?
> 
> I have no idea why it was built that way.  The predicate seems to provide for all sorts of things, but this and the subsequent pattern both handle only a subset of the constraints implied by it.  To be honest, I didn't feel competent to try to fix the existing patterns.  Do you have any suggestions for what to do instead?

Don't give up so easily?  ;-)

The predicate should be tightened, the expander should use a new predicate
that allows all those other things.  The hardest part is figuring a good
name for it ;-)


Segher


* Re: [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available
  2016-06-22 14:22     ` Segher Boessenkool
@ 2016-06-22 21:30       ` Michael Meissner
  2016-06-22 23:51         ` Segher Boessenkool
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Meissner @ 2016-06-22 21:30 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Bill Schmidt, GCC Patches, David Edelsohn

[-- Attachment #1: Type: text/plain, Size: 1765 bytes --]

On Wed, Jun 22, 2016 at 09:22:22AM -0500, Segher Boessenkool wrote:
> Don't give up so easily?  ;-)
> 
> The predicate should be tightened, the expander should use a new predicate
> that allows all those other things.  The hardest part is figuring a good
> name for it ;-)

This code should fix the problem.  It does not allow constants in the
arguments.  Combine will create one of the vec_duplicate patterns with a
constant integer that will generate VSPLTIS<x> or XXSPLTIB/etc.  I also
tightened the memory requirements to only allow indexed memory forms
during/after register allocation, since the instruction only uses indexed
addressing.

I bootstrapped the compiler and ran make check with no regressions on a little
endian power8 system.  Can I check it into trunk, and after an appropriate
waiting period check it into GCC 6.x if there were no issues?

[gcc]
2016-06-22  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/predicates.md (splat_input_operand): Rework.
	Don't allow constants, since the caller insns don't support
	constants.  During and after register allocation, only allow
	indexed or indirect addresses, and not general addresses.  Only
	allow modes supported by the hardware.
	* config/rs6000/rs6000.c (xxspltib_constant_p): Update usage
	comment.  Move check for using VSPLTIS<x> to a common location,
	instead of doing it in two different places.

[gcc/testsuite]
2016-06-22  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/p9-splat-5.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: gcc-power9.patch132b --]
[-- Type: text/plain, Size: 4324 bytes --]

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 237715)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -1056,27 +1056,34 @@ (define_predicate "input_operand"
 
 ;; Return 1 if this operand is a valid input for a vsx_splat insn.
 (define_predicate "splat_input_operand"
-  (match_code "symbol_ref,const,reg,subreg,mem,
-	       const_double,const_wide_int,const_vector,const_int")
+  (match_code "reg,subreg,mem")
 {
+  machine_mode vmode;
+
+  if (mode == DFmode)
+    vmode = V2DFmode;
+  else if (mode == DImode)
+    vmode = V2DImode;
+  else if (mode == SImode && TARGET_P9_VECTOR)
+    vmode = V4SImode;
+  else if (mode == SFmode && TARGET_P9_VECTOR)
+    vmode = V4SFmode;
+  else
+    return false;
+
   if (MEM_P (op))
     {
+      rtx addr = XEXP (op, 0);
+
       if (! volatile_ok && MEM_VOLATILE_P (op))
 	return 0;
-      if (mode == DFmode)
-	mode = V2DFmode;
-      else if (mode == DImode)
-	mode = V2DImode;
-      else if (mode == SImode && TARGET_P9_VECTOR)
-	mode = V4SImode;
-      else if (mode == SFmode && TARGET_P9_VECTOR)
-	mode = V4SFmode;
+
+      if (reload_in_progress || lra_in_progress || reload_completed)
+	return indexed_or_indirect_address (addr, vmode);
       else
-	gcc_unreachable ();
-      return memory_address_addr_space_p (mode, XEXP (op, 0),
-					  MEM_ADDR_SPACE (op));
+	return memory_address_addr_space_p (vmode, addr, MEM_ADDR_SPACE (op));
     }
-  return input_operand (op, mode);
+  return gpc_reg_operand (op, mode);
 })
 
 ;; Return true if OP is a non-immediate operand and not an invalid
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 237715)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6282,10 +6282,7 @@ gen_easy_altivec_constant (rtx op)
    Return the number of instructions needed (1 or 2) into the address pointed
    via NUM_INSNS_PTR.
 
-   If NOSPLIT_P, only return true for constants that only generate the XXSPLTIB
-   instruction and can go in any VSX register.  If !NOSPLIT_P, only return true
-   for constants that generate XXSPLTIB and need a sign extend operation, which
-   restricts us to the Altivec registers.
+   Return the constant that is being split via CONSTANT_PTR.
 
    Allow either (vec_const [...]) or (vec_duplicate <const>).  If OP is a valid
    XXSPLTIB constant, return the constant being set via the CONST_PTR
@@ -6355,13 +6352,6 @@ xxspltib_constant_p (rtx op,
 	  if (value != INTVAL (element))
 	    return false;
 	}
-
-      /* See if we could generate vspltisw/vspltish directly instead of
-	 xxspltib + sign extend.  Special case 0/-1 to allow getting
-         any VSX register instead of an Altivec register.  */
-      if (!IN_RANGE (value, -1, 0) && EASY_VECTOR_15 (value)
-	  && (mode == V4SImode || mode == V8HImode))
-	return false;
     }
 
   /* Handle integer constants being loaded into the upper part of the VSX
@@ -6389,6 +6379,13 @@ xxspltib_constant_p (rtx op,
   else
     return false;
 
+  /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
+     sign extend.  Special case 0/-1 to allow getting any VSX register instead
+     of an Altivec register.  */
+  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
+      && EASY_VECTOR_15 (value))
+    return false;
+
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
   if (mode == V16QImode)
     *num_insns_ptr = 1;
Index: gcc/testsuite/gcc.target/powerpc/p9-splat-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p9-splat-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p9-splat-5.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-final { scan-assembler "vspltish" } } */
+/* { dg-final { scan-assembler-not "xxspltib" } } */
+
+/* Make sure we don't use an inefficient sequence for small integer splat.  */
+
+#include <altivec.h>
+
+vector short
+foo ()
+{
+  return vec_splat_s16 (5);
+}
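
The acceptance rules of the reworked splat_input_operand predicate can be modeled in standalone C roughly as follows. This is a sketch under stated assumptions, not the machine description itself: every type and name below is hypothetical, register operands are treated as always valid (standing in for gpc_reg_operand), any address is treated as legitimate before register allocation, and indexed_or_indirect_address is reduced to "register or register+register".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical models of the scalar modes and operand shapes.  */
enum scalar_mode { MODE_QI, MODE_HI, MODE_SI, MODE_DI, MODE_SF, MODE_DF };
enum operand_kind { OP_REG, OP_MEM, OP_CONST };
enum addr_form { ADDR_INDIRECT, ADDR_INDEXED, ADDR_OFFSET };

struct operand
{
  enum operand_kind kind;
  enum addr_form addr;	/* Only meaningful for OP_MEM.  */
  bool is_volatile;
};

struct target_state
{
  bool p9_vector;	/* Models TARGET_P9_VECTOR.  */
  bool ra_started;	/* Models reload_in_progress || lra_in_progress
			   || reload_completed.  */
  bool volatile_ok;
};

/* DF and DI are always splattable; SI and SF only with Power9 vector
   support.  Everything else is rejected.  */
bool splat_mode_supported (enum scalar_mode mode,
			   const struct target_state *t)
{
  switch (mode)
    {
    case MODE_DF:
    case MODE_DI:
      return true;
    case MODE_SI:
    case MODE_SF:
      return t->p9_vector;
    default:
      return false;
    }
}

bool splat_input_ok (const struct operand *op, enum scalar_mode mode,
		     const struct target_state *t)
{
  /* Constants are no longer accepted at all.  */
  if (op->kind == OP_CONST)
    return false;

  if (!splat_mode_supported (mode, t))
    return false;

  if (op->kind == OP_MEM)
    {
      if (!t->volatile_ok && op->is_volatile)
	return false;

      /* During and after register allocation, only indirect or
	 indexed addresses are allowed.  */
      if (t->ra_started)
	return op->addr != ADDR_OFFSET;

      /* Before that, any legitimate address (assumed valid here).  */
      return true;
    }

  /* Register operand, assumed valid in this model.  */
  return true;
}

/* A register+offset memory operand is fine early but not late.  */
void check_offset_address_example (void)
{
  struct operand mem_off = { OP_MEM, ADDR_OFFSET, false };
  struct target_state early = { true, false, false };
  struct target_state late = { true, true, false };

  assert (splat_input_ok (&mem_off, MODE_DI, &early));
  assert (!splat_input_ok (&mem_off, MODE_DI, &late));
}
```

With constants rejected by the predicate, a constant splat has to be matched by one of the vec_duplicate patterns instead, which is the behavior the message above relies on.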


* Re: [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available
  2016-06-22 21:30       ` Michael Meissner
@ 2016-06-22 23:51         ` Segher Boessenkool
  0 siblings, 0 replies; 6+ messages in thread
From: Segher Boessenkool @ 2016-06-22 23:51 UTC (permalink / raw)
  To: Michael Meissner, Bill Schmidt, GCC Patches, David Edelsohn

On Wed, Jun 22, 2016 at 05:29:59PM -0400, Michael Meissner wrote:
> This code should fix the problem.  It does not allow constants in the
> arguments.  Combine will create one of the vec_duplicate patterns with a
> constant integer that will generate VSPLTIS<x> or XXSPLTIB/etc.  I also
> tightened the memory requirements to only allow indexed memory forms
> during/after register allocation, since the instruction only uses indexed
> addressing.
> 
> I bootstrapped the compiler and ran make check with no regressions on a little
> endian power8 system.  Can I check it into trunk, and after an appropriate
> waiting period check it into GCC 6.x if there were no issues?

If this works, that is marvelous.  Okay for trunk and 6 later, thanks!

Some tiny things...

> 	* config/rs6000/predicates.md (splat_input_operand): Rework.
> 	Don't allow constants, since the caller insns don't support
> 	constants.  During and after register allocation, only allow
> 	indexed or indirect addresses, and not general addresses.  Only
> 	allow modes supported by the hardware.

"caller insns"?

> --- gcc/config/rs6000/rs6000.c	(revision 237715)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -6282,10 +6282,7 @@ gen_easy_altivec_constant (rtx op)
>     Return the number of instructions needed (1 or 2) into the address pointed
>     via NUM_INSNS_PTR.
>  
> -   If NOSPLIT_P, only return true for constants that only generate the XXSPLTIB
> -   instruction and can go in any VSX register.  If !NOSPLIT_P, only return true
> -   for constants that generate XXSPLTIB and need a sign extend operation, which
> -   restricts us to the Altivec registers.
> +   Return the constant that is being split via CONSTANT_PTR.
>  
>     Allow either (vec_const [...]) or (vec_duplicate <const>).  If OP is a valid
>     XXSPLTIB constant, return the constant being set via the CONST_PTR

The CONST_PTR in that last line here is a misspelling of CONSTANT_PTR;
this last part should be removed I think?


Segher
