[PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.
@ 2021-08-25 16:37 Michael Meissner
  2021-08-25 17:44 ` Segher Boessenkool
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Meissner @ 2021-08-25 16:37 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Make xxsplti*, xpermx, xxeval be vecperm type.

I noticed that the built-functions for xxspltiw, xxspltidp, xxsplti32dx,
xxpermx, and xxeval all used the 'vecsimple' type.  These instructions are
permute instructions (3 cycle latency) and should use 'vecperm' instead.

While I was at it, I changed the UNSPEC name for xxspltidp to be
UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.

I've built bootstrap compilers with this fix applied on little endian power9,
little endian power10, and big endian power8.  Can I check this patch into the
master branch.

I would also like to check this patch into the gcc-11 branch after a suitable
period of time.  Can I do this also?  Note, the patch for gcc-11 will be
different because the patch to move the xx* built-ins from altivec.md to vsx.md
has not been back ported to gcc 11.

2021-08-24  Michael Meissner  <meissner@linux.ibm.com>

gcc/
	* config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from
	UNSPEC_XXSPLTID.
	(xxspltiw_v4si): Use vecperm type attribute.
	(xxspltiw_v4si_inst): Use vecperm type attribute.
	(xxspltiw_v4sf_inst): Likewise.
	(xxspltidp_v2df): Use vecperm type attribute.  Use
	UUNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
	(xxspltidp_v2df_inst): Likewise.
	(xxsplti32dx_v4si): Use vecperm type attribute.
	(xxsplti32dx_v4si_inst): Likewise.
	(xxsplti32dx_v4sf_inst): Likewise.
	(xxblend_<mode>): Likewise.
	(xxpermx): Likewise.
	(xxpermx_inst): Likewise.
	(xxeval): Likewise.
---
 gcc/config/rs6000/vsx.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e4ca6e94d49..bf033e31c1c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -374,7 +374,7 @@ (define_c_enum "unspec"
    UNSPEC_VDIVEU
    UNSPEC_XXEVAL
    UNSPEC_XXSPLTIW
-   UNSPEC_XXSPLTID
+   UNSPEC_XXSPLTIDP
    UNSPEC_XXSPLTI32DX
    UNSPEC_XXBLEND
    UNSPEC_XXPERMX
@@ -6414,7 +6414,7 @@ (define_insn "xxspltiw_v4si"
 		     UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
+ [(set_attr "type" "vecperm")
   (set_attr "prefixed" "yes")])
 
 (define_expand "xxspltiw_v4sf"
@@ -6434,14 +6434,14 @@ (define_insn "xxspltiw_v4sf_inst"
 		     UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
+ [(set_attr "type" "vecperm")
   (set_attr "prefixed" "yes")])
 
 ;; XXSPLTIDP built-in function support
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
 	(unspec:V2DF [(match_operand:SF 1 "const_double_operand")]
-		     UNSPEC_XXSPLTID))]
+		     UNSPEC_XXSPLTIDP))]
  "TARGET_POWER10"
 {
   long value = rs6000_const_f32_to_i32 (operands[1]);
@@ -6452,10 +6452,10 @@ (define_expand "xxspltidp_v2df"
 (define_insn "xxspltidp_v2df_inst"
   [(set (match_operand:V2DF 0 "register_operand" "=wa")
 	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTID))]
+		     UNSPEC_XXSPLTIDP))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
-  [(set_attr "type" "vecsimple")
+  [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
 ;; XXSPLTI32DX built-in function support
@@ -6476,7 +6476,7 @@ (define_expand "xxsplti32dx_v4si"
 					 GEN_INT (index), operands[3]));
    DONE;
 }
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecperm")])
 
 (define_insn "xxsplti32dx_v4si_inst"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
@@ -6486,7 +6486,7 @@ (define_insn "xxsplti32dx_v4si_inst"
 		     UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")
+  [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
 (define_expand "xxsplti32dx_v4sf"
@@ -6515,7 +6515,7 @@ (define_insn "xxsplti32dx_v4sf_inst"
 		     UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")
+  [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
 ;; XXBLEND built-in function support
@@ -6527,7 +6527,7 @@ (define_insn "xxblend_<mode>"
 		    UNSPEC_XXBLEND))]
   "TARGET_POWER10"
   "xxblendv<VM3_char> %x0,%x1,%x2,%x3"
-  [(set_attr "type" "vecsimple")
+  [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
 ;; XXPERMX built-in function support
@@ -6562,7 +6562,7 @@ (define_expand "xxpermx"
 
   DONE;
 }
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")])
 
 (define_insn "xxpermx_inst"
   [(set (match_operand:V2DI 0 "register_operand" "+v")
@@ -6573,7 +6573,7 @@ (define_insn "xxpermx_inst"
 		     UNSPEC_XXPERMX))]
   "TARGET_POWER10"
   "xxpermx %x0,%x1,%x2,%x3,%4"
-  [(set_attr "type" "vecsimple")
+  [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
 ;; XXEVAL built-in function support
@@ -6586,6 +6586,6 @@ (define_insn "xxeval"
 		     UNSPEC_XXEVAL))]
    "TARGET_POWER10"
    "xxeval %0,%1,%2,%3,%4"
-   [(set_attr "type" "vecsimple")
+   [(set_attr "type" "vecperm")
     (set_attr "prefixed" "yes")])
 
-- 
2.31.1


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.
  2021-08-25 16:37 [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type Michael Meissner
@ 2021-08-25 17:44 ` Segher Boessenkool
  2021-08-25 18:22   ` Michael Meissner
  0 siblings, 1 reply; 5+ messages in thread
From: Segher Boessenkool @ 2021-08-25 17:44 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt,
	Peter Bergner, Will Schmidt

Hi Mike,

On Wed, Aug 25, 2021 at 12:37:14PM -0400, Michael Meissner wrote:
> I noticed that the built-functions for xxspltiw, xxspltidp, xxsplti32dx,
> xxpermx, and xxeval all used the 'vecsimple' type.  These instructions are
> permute instructions (3 cycle latency) and should use 'vecperm' instead.

They are all executed on the PM pipe currently, yup.  If this changes
later we'll have to fix it, but that is for then :-)

> While I was at it, I changed the UNSPEC name for xxspltidp to be
> UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.

In the future please do separate things as separate patches.

> 	* config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from
> 	UNSPEC_XXSPLTID.

	* config/rs6000/vsx.md (UNSPEC_XXSPLTID): Rename to...
	(UNSPEC_XXSPLTIDP): ... this.

> 	(xxspltidp_v2df): Use vecperm type attribute.  Use
> 	UUNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.

Typo ("UU").

Okay for trunk with those trivial fixes.  Also okay for backport to 11,
it is trivial enough.  Thanks!

Out of interest, did you notice any scheduling differences with this?


Segher

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.
  2021-08-25 17:44 ` Segher Boessenkool
@ 2021-08-25 18:22   ` Michael Meissner
  2021-08-25 22:29     ` Segher Boessenkool
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Meissner @ 2021-08-25 18:22 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt,
	Peter Bergner, Will Schmidt

On Wed, Aug 25, 2021 at 12:44:16PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Wed, Aug 25, 2021 at 12:37:14PM -0400, Michael Meissner wrote:
> > I noticed that the built-functions for xxspltiw, xxspltidp, xxsplti32dx,
> > xxpermx, and xxeval all used the 'vecsimple' type.  These instructions are
> > permute instructions (3 cycle latency) and should use 'vecperm' instead.
> 
> They are all executed on the PM pipe currently, yup.  If this changes
> later we'll have to fix it, but that is for then :-)
> 
> > While I was at it, I changed the UNSPEC name for xxspltidp to be
> > UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
> 
> In the future please do separate things as separate patches.
> 
> > 	* config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from
> > 	UNSPEC_XXSPLTID.
> 
> 	* config/rs6000/vsx.md (UNSPEC_XXSPLTID): Rename to...
> 	(UNSPEC_XXSPLTIDP): ... this.
> 
> > 	(xxspltidp_v2df): Use vecperm type attribute.  Use
> > 	UUNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
> 
> Typo ("UU").
> 
> Okay for trunk with those trivial fixes.  Also okay for backport to 11,
> it is trivial enough.  Thanks!

Thanks.

> Out of interest, did you notice any scheduling differences with this?

I don't use the built-ins so I wouldn't notice a difference.  I noticed this as
part of the next patch to add support for XXSPLTIDP (and ultimately XXSPLTIW in
a future patch).  The XXSPLTIDP instruction allows loading up many SFmode,
DFmode, and V2DFmode constants.  The XXSPLTIW instruction allows loading up
certain V16QImode, V8HImode, V4SImode, and V4SFmode constants.

I'm trying to iron out the slow-downs, and I wanted the scheduler to know it
needed to add insns between the XXSPLTIDP to load the constant and its use if
it can.  Right now, I'm seeing a slight boost in blender_r with XXSPLTIDP (over
doing a load).  However, I suspect if you aren't running spec on an otherwise
idle machine, things will change where XXSPLTIDP will be more of a win by
eliminating the loads.

While XXSPLTIDP by itself is positive, unfortunately, there is a regression in
cactuBSSN_r (3%) when I add XXSPLTIW (but not XXSPLTIDP) that I'm trying to
track down.

If I add both instructions, several of the benchmarks improve (including
xalancbmk by 11% and x264_r by 27%), but cactuBSSN_r has the 3% regression and
fotonik3d_r also has a new 3% regression.

Given that many more programs use floating point constants than vector
constants (66,000 XXSPLTID's created vs. 5,000 XXSPLTIW's), I figure to push
the XXSPLTIDP now, and try to figure out the differences before submitting the
XXSPLTIW patch.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.
  2021-08-25 18:22   ` Michael Meissner
@ 2021-08-25 22:29     ` Segher Boessenkool
  2021-08-25 22:42       ` Michael Meissner
  0 siblings, 1 reply; 5+ messages in thread
From: Segher Boessenkool @ 2021-08-25 22:29 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt,
	Peter Bergner, Will Schmidt

On Wed, Aug 25, 2021 at 02:22:06PM -0400, Michael Meissner wrote:
> On Wed, Aug 25, 2021 at 12:44:16PM -0500, Segher Boessenkool wrote:
> > Out of interest, did you notice any scheduling differences with this?
> 
> I don't use the built-ins so I wouldn't notice a difference.  I noticed this as
> part of the next patch to add support for XXSPLTIDP (and ultimately XXSPLTIW in
> a future patch).  The XXSPLTIDP instruction allows loading up many SFmode,
> DFmode, and V2DFmode constants.  The XXSPLTIW instruction allows loading up
> certain V16QImode, V8HImode, V4SImode, and V4SFmode constants.

Yeah, you might notice scheduling differences when "normal" code starts
using this.  Builtins are meh :-)

> However, I suspect if you aren't running spec on an otherwise
> idle machine, things will change where XXSPLTIDP will be more of a win by
> eliminating the loads.

It has more bandwidth and it is less contested, yup.  It does have the
usual "prefixed" gotchas though.

> While XXSPLTIDP by itself is positive, unfortunately, there is a regression in
> cactuBSSN_r (3%) when I add XXSPLTIW (but not XXSPLTIDP) that I'm trying to
> track down.
> 
> If I add both instructions, several of the benchmarks improve (including
> xalancbmk by 11% and x264_r by 27%), but cactuBSSN_r has the 3% regression and
> fotonik3d_r also has a new 3% regression.
> 
> Given that many more programs use floating point constants than vector
> constants (66,000 XXSPLTID's created vs. 5,000 XXSPLTIW's), I figure to push
> the XXSPLTIDP now, and try to figure out the differences before submitting the
> XXSPLTIW patch.

It would be interesting to figure out a pattern behind the regressions :-)


Segher

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type.
  2021-08-25 22:29     ` Segher Boessenkool
@ 2021-08-25 22:42       ` Michael Meissner
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Meissner @ 2021-08-25 22:42 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt,
	Peter Bergner, Will Schmidt

On Wed, Aug 25, 2021 at 05:29:22PM -0500, Segher Boessenkool wrote:
> On Wed, Aug 25, 2021 at 02:22:06PM -0400, Michael Meissner wrote:
> > On Wed, Aug 25, 2021 at 12:44:16PM -0500, Segher Boessenkool wrote:
> > > Out of interest, did you notice any scheduling differences with this?
> > 
> > I don't use the built-ins so I wouldn't notice a difference.  I noticed this as
> > part of the next patch to add support for XXSPLTIDP (and ultimately XXSPLTIW in
> > a future patch).  The XXSPLTIDP instruction allows loading up many SFmode,
> > DFmode, and V2DFmode constants.  The XXSPLTIW instruction allows loading up
> > certain V16QImode, V8HImode, V4SImode, and V4SFmode constants.
> 
> Yeah, you might notice scheduling differences when "normal" code starts
> using this.  Builtins are meh :-)

Yep, that's my thought.  It just noticed the vecsimple stuff as I was adding
the xxsplti{w,dp,32dx} support.

> > However, I suspect if you aren't running spec on an otherwise
> > idle machine, things will change where XXSPLTIDP will be more of a win by
> > eliminating the loads.
> 
> It has more bandwidth and it is less contested, yup.  It does have the
> usual "prefixed" gotchas though.

Note, it replaces one prefixed instruction (plfd) with another (xxspltidp).

> > While XXSPLTIDP by itself is positive, unfortunately, there is a regression in
> > cactuBSSN_r (3%) when I add XXSPLTIW (but not XXSPLTIDP) that I'm trying to
> > track down.
> > 
> > If I add both instructions, several of the benchmarks improve (including
> > xalancbmk by 11% and x264_r by 27%), but cactuBSSN_r has the 3% regression and
> > fotonik3d_r also has a new 3% regression.
> > 
> > Given that many more programs use floating point constants than vector
> > constants (66,000 XXSPLTID's created vs. 5,000 XXSPLTIW's), I figure to push
> > the XXSPLTIDP now, and try to figure out the differences before submitting the
> > XXSPLTIW patch.
> 
> It would be interesting to figure out a pattern behind the regressions :-)

Definately.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-08-25 22:42 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-25 16:37 [PATCH] Make xxsplti*, xpermx, xxeval be vecperm type Michael Meissner
2021-08-25 17:44 ` Segher Boessenkool
2021-08-25 18:22   ` Michael Meissner
2021-08-25 22:29     ` Segher Boessenkool
2021-08-25 22:42       ` Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).