[PATCH, 0/3] Disable generating store vector pair.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH, 0/3] Disable generating store vector pair.
@ 2022-06-07  0:53 Michael Meissner
  2022-06-07  0:55 ` [PATCH 1/3] " Michael Meissner
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Michael Meissner @ 2022-06-07  0:53 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt

[PATCH 0/3] Disable generating store vector pair.

Testing has revealed that the power10 has some slowdowns if the store vector
pair instruction is generated in some cases.  This patch disables generating
the store vector pair instructions (stxvp, pstxvp, and stxvpx) unless an
undocumented switch (-mstore-vector-pair) is used.  It is anticipated that
perhaps with future machines we can generate the store vector pair instruction.

This patch does a split after reload to convert a store vector pair
instruction into a pair of store vector instructions.

We do continue to generate the load vector pair instructions (lxvp, plxvp,
and lxvpx), since we have found that in code that heavily uses MMA, it is
still a win to generate the load vector pair instructions.

There are 3 patches in this set:

    1)	Disable the generation of the stxvp, stxvpx, and pstxvp instructions
	for stores of OOmode and XOmodes.

    2)	Disable block moves from generating load/store vector pair
        instructions unless the the store vector pair instructions are
        being generted.  With patch #1 installed, the block move code will
        generate a load vector pair and store vector pair combination, but
        after reload, the store vector pair instructions are split into two
        separate store vector instructions.

    2)  Fix up the mma test suite to deal with store vector pair not being
	generated by default.  In most of the tests, I just deleted the lines
	that counted the store vector pair instructions.  In a few of the
	tests, I explicitly passed the -mstore-vector-pair instruction since
	the point of the test was to generate store vector pair instructions.

There is a 4th patch that Peter Bergner will be developing.  This patch will
update the built-in functions for load and store vector pair, so that these
built-ins will always generate the lxvp and stxvp instructions.

I have built bootstrap compilers and run the regression tests on three
different systems:

    1)	Little endian power10 using the --with-cpu=power10 option.

    2)	Little endian power9 using the --with-cpu=power9 option.

    3)	Big endian power8 using the --with-cpu=power8 option.  On this system,
	both 64-bit and 32-bit code generation was tested.

Once all 3 patches have been applied, there are no regressions in the runs.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] Disable generating store vector pair.
  2022-06-07  0:53 [PATCH, 0/3] Disable generating store vector pair Michael Meissner
@ 2022-06-07  0:55 ` Michael Meissner
  2022-06-07 20:28   ` will schmidt
  2022-06-07 21:17   ` Peter Bergner
  2022-06-07  0:55 ` [PATCH 2/3] Disable generating load/store vector pairs for block copies Michael Meissner
  2022-06-07  0:56 ` [PATCH 3/3] Adjust MMA tests to account for no store vector pair Michael Meissner
  2 siblings, 2 replies; 14+ messages in thread
From: Michael Meissner @ 2022-06-07  0:55 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt

[PATCH 1/3] Disable generating store vector pair.

Testing has revealed that the power10 has some slowdowns if the store
vector pair instruction is generated in some cases.  This patch disables
generating the store vector pair instructions (stxvp, pstxvp, and stxvpx)
unless an undocumented switch is used.  It is anticipated that perhaps
with future machines we can generate the store vector pair instruction.

This patch does a split after reload to convert a store vector pair
instruction into a pair of store vector instructions.

We do continue to generate the load vector pair instructions (lxvp, plxvp,
and lxvpx), since we have found that in code that heavily uses MMA, it is
still a win to generate the load vector pair instructions.

There are two future patches planed:

    1)	Disable block moves from generating load/store vector pair
	instructions unless the the store vector pair instructions are
	being generted.

    2)	Make the built-in functions for generating store vector pair
	always generate those instructions even if store vector pair
	instructions are disabled.

I have built bootstrap compilers and run the regression tests on three
different systems:

    1)	Little endian power10 using the --with-cpu=power10 option.

    2)	Little endian power9 using the --with-cpu=power9 option.

    3)	Big endian power8 using the --with-cpu=power8 option.  On this system,
	both 64-bit and 32-bit code generation was tested.

There were no regressions in the runs except for the tests that are
modified in patch #3 in these series of patches.  Can I check this patch
into the trunk?  If there are no changes needed for the backports, can I
check this code into the active branches after a burn-in period?

2022-06-06   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	* config/rs6000/mma.md (movoo): Disable generating store vector
	pair instructions unless these are enabled by the user.
	(movxo): Likewise.
	* config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): If store
	vector pair instructions are disabled, do not allow vector pair
	addresses to be indexed.
	(rs6000_split_multireg_move): Do not split XOmode stores into two
	store vector pair instructions unless store vector pair
	instructions are enabled.
	* config/rs6000/rs6000.md (isa attribute): Add stxvp attribute.
	(enabled attribute): Disable alternative using store vector pair
	instructions unless they are enabled.
	* config/rs6000/rs6000.opt (-mstore-vector-pair): New option.

gcc/testsuite/

	* gcc.target/powerpc/p10-store-vector-pair-1.c: New test.
	* gcc.target/powerpc/p10-store-vector-pair-2.c: New test.
---
 gcc/config/rs6000/mma.md                      | 41 ++++++----
 gcc/config/rs6000/rs6000.cc                   |  9 +-
 gcc/config/rs6000/rs6000.md                   |  8 +-
 gcc/config/rs6000/rs6000.opt                  |  4 +
 .../powerpc/p10-store-vector-pair-1.c         | 82 +++++++++++++++++++
 .../powerpc/p10-store-vector-pair-2.c         | 81 ++++++++++++++++++
 6 files changed, 206 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index a183b6a168a..9b5f243b88d 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -274,26 +274,35 @@ (define_expand "movoo"
   DONE;
 })
 
+;; By default for power10, do not generate the stxvp/pstxvp/stxvpx
+;; instructions.  Instead, split these instructions into two separate store
+;; vector instructions.  We do always generate a lxvp/plxvp/lxvpx instruction.
+;; We leave in the support for generating stxvp/pstxvp/stxvpx in future
+;; machines.
 (define_insn_and_split "*movoo"
-  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,m,wa")
-	(match_operand:OO 1 "input_operand" "m,wa,wa"))]
+  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,m, o, wa")
+        (match_operand:OO 1 "input_operand"         "m, wa,wa,wa"))]
   "TARGET_MMA
    && (gpc_reg_operand (operands[0], OOmode)
        || gpc_reg_operand (operands[1], OOmode))"
   "@
    lxvp%X1 %x0,%1
    stxvp%X0 %x1,%0
+   #
    #"
   "&& reload_completed
-   && (!MEM_P (operands[0]) && !MEM_P (operands[1]))"
+   && ((MEM_P (operands[0]) && !TARGET_STORE_VECTOR_PAIR)
+       || (!MEM_P (operands[0]) && !MEM_P (operands[1])))"
   [(const_int 0)]
 {
   rs6000_split_multireg_move (operands[0], operands[1]);
   DONE;
 }
-  [(set_attr "type" "vecload,vecstore,veclogical")
-   (set_attr "size" "256")
-   (set_attr "length" "*,*,8")])
+  [(set_attr "type"               "vecload,vecstore,vecstore,veclogical")
+   (set_attr "max_prefixed_insns" "*,      *,       2,       *")
+   (set_attr "length"             "*,      *,       *,       8")
+   (set_attr "isa"                "*,      stxvp,   *,       *")])
+   (set_attr "size"               "256")
 
 \f
 ;; Vector quad support.  XOmode can only live in FPRs.
@@ -306,25 +315,27 @@ (define_expand "movxo"
   DONE;
 })
 
+;; By default for power10, do not generate two stxvp/pstxvp instructions.
+;; Instead, split these instructions into four separate store vector
+;; instructions.  We do always generate two lxvp/plxvp instructions.  We leave
+;; in the support for generating stxvp/pstxvp in future machines.
 (define_insn_and_split "*movxo"
-  [(set (match_operand:XO 0 "nonimmediate_operand" "=d,m,d")
-	(match_operand:XO 1 "input_operand" "m,d,d"))]
+  [(set (match_operand:XO 0 "nonimmediate_operand" "=d,m,o,d")
+	(match_operand:XO 1 "input_operand"         "m,d,d,d"))]
   "TARGET_MMA
    && (gpc_reg_operand (operands[0], XOmode)
        || gpc_reg_operand (operands[1], XOmode))"
-  "@
-   #
-   #
-   #"
+  "#"
   "&& reload_completed"
   [(const_int 0)]
 {
   rs6000_split_multireg_move (operands[0], operands[1]);
   DONE;
 }
-  [(set_attr "type" "vecload,vecstore,veclogical")
-   (set_attr "length" "*,*,16")
-   (set_attr "max_prefixed_insns" "2,2,*")])
+  [(set_attr "type"               "vecload,vecstore,vecstore,veclogical")
+   (set_attr "length"             "*,      *,       *,       16")
+   (set_attr "max_prefixed_insns" "2,      2,       4,       *")
+   (set_attr "isa"                "*,      stxvp,   *,       *")])
 
 (define_expand "vsx_assemble_pair"
   [(match_operand:OO 0 "vsx_register_operand")
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 0af2085adc0..30ed24fff30 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -2714,7 +2714,8 @@ rs6000_setup_reg_addr_masks (void)
 	  /* Vector pairs can do both indexed and offset loads if the
 	     instructions are enabled, otherwise they can only do offset loads
 	     since it will be broken into two vector moves.  Vector quads can
-	     only do offset loads.  */
+	     only do offset loads.  If stxvp is disabled, we can't do indexed
+	     arithmetic.  */
 	  else if ((addr_mask != 0) && TARGET_MMA
 		   && (m2 == OOmode || m2 == XOmode))
 	    {
@@ -2722,7 +2723,8 @@ rs6000_setup_reg_addr_masks (void)
 	      if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
 		{
 		  addr_mask |= RELOAD_REG_QUAD_OFFSET;
-		  if (m2 == OOmode)
+		  if (m2 == OOmode
+		      && TARGET_STORE_VECTOR_PAIR)
 		    addr_mask |= RELOAD_REG_INDEXED;
 		}
 	    }
@@ -26992,7 +26994,8 @@ rs6000_split_multireg_move (rtx dst, rtx src)
   /* If we have a vector quad register for MMA, and this is a load or store,
      see if we can use vector paired load/stores.  */
   if (mode == XOmode && TARGET_MMA
-      && (MEM_P (dst) || MEM_P (src)))
+      && ((MEM_P (dst) && TARGET_STORE_VECTOR_PAIR)
+	  || MEM_P (src)))
     {
       reg_mode = OOmode;
       nregs /= 2;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 3eca448a262..7eb107148ca 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -354,7 +354,7 @@ (define_attr "cpu"
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,stxvp"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -402,6 +402,12 @@ (define_attr "enabled" ""
      (and (eq_attr "isa" "p10")
 	  (match_test "TARGET_POWER10"))
      (const_int 1)
+
+     (and (eq_attr "isa" "stxvp")
+	  (match_test "TARGET_POWER10")
+	  (match_test "TARGET_STORE_VECTOR_PAIR"))
+     (const_int 1)
+
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 4931d781c4e..79ceec6e6a5 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -624,6 +624,10 @@ mieee128-constant
 Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the LXVKQ instruction.
 
+; Generate (do not generate) code that uses the store vector pair instruction.
+mstore-vector-pair
+Target Undocumented Var(TARGET_STORE_VECTOR_PAIR) Init(0) Save
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c b/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c
new file mode 100644
index 00000000000..c1a36bf5fff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c
@@ -0,0 +1,82 @@
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mstore-vector-pair -mmma" } */
+
+/* Test if we generate store vector pair instructions if the user uses the
+   -mstore-vector-pair option.  */
+static __vector_quad sq;
+static __vector_pair sp;
+
+void
+load_store_pair (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* lxvp, stxvp.  */
+}
+
+void
+load_store_pair_1 (__vector_pair *p, __vector_pair *q)
+{
+  p[1] = q[1];			/* lxvp, stxvp.  */
+}
+
+void
+load_store_pair_0x10000 (__vector_pair *p, __vector_pair *q)
+{
+  p[0x10000] = q[0x10000];	/* plxvp, pstxvp.  */
+}
+
+void
+load_store_pair_n (__vector_pair *p, __vector_pair *q, unsigned long n)
+{
+  p[n] = q[n];			/* lxvpx, 2x stxvp.  */
+}
+
+void
+load_pair_static (__vector_pair *p)
+{
+  *p = sp;			/* plxvp, stxvp.  */
+}
+
+void
+store_pair_static (__vector_pair *p)
+{
+  sp = *p;			/* lxvp, pstxvp.  */
+}
+
+void
+load_store_quad (__vector_quad *p, __vector_quad *q)
+{
+  *p = *q;			/* 2x lxvp, 2x stxvp.  */
+}
+
+void
+load_store_quad_1 (__vector_quad *p, __vector_quad *q)
+{
+  p[1] = q[1];			/* 2x lxvp, 2x stxvp.  */
+}
+
+void
+load_store_quad_0x10000 (__vector_quad *p, __vector_quad *q)
+{
+  p[0x10000] = q[0x10000];	/* 2x plxvp, 2x pstxvp.  */
+}
+
+void
+load_store_quad_n (__vector_quad *p, __vector_quad *q, unsigned long n)
+{
+  p[n] = q[n];			/* 2x lxvp, 2x stxv.  */
+}
+
+void
+load_quad_static (__vector_quad *p)
+{
+  *p = sq;			/* 2x plxvp, 2x stxvp.  */
+}
+
+void
+store_quad_static (__vector_quad *p)
+{
+  sq = *p;			/* 2x lxvp, 2x stxvp.  */
+}
+
+/* { dg-final { scan-assembler {\mp?stxvpx?\M}  } } */
+
diff --git a/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c b/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c
new file mode 100644
index 00000000000..b8c3bdbfd89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c
@@ -0,0 +1,81 @@
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mno-store-vector-pair -mmma" } */
+
+/* Test if we do not generate store vector pair instructions if the user uses
+   the -mno-store-vector-pair option.  */
+static __vector_quad sq;
+static __vector_pair sp;
+
+void
+load_store_pair (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* lxvp, 2x stxv.  */
+}
+
+void
+load_store_pair_1 (__vector_pair *p, __vector_pair *q)
+{
+  p[1] = q[1];			/* lxvp, 2x stxv.  */
+}
+
+void
+load_store_pair_0x10000 (__vector_pair *p, __vector_pair *q)
+{
+  p[0x10000] = q[0x10000];	/* plxvp, 2x pstxv.  */
+}
+
+void
+load_store_pair_n (__vector_pair *p, __vector_pair *q, unsigned long n)
+{
+  p[n] = q[n];			/* lxvpx, 2x stxv.  */
+}
+
+void
+load_pair_static (__vector_pair *p)
+{
+  *p = sp;			/* plxvp, 2x stxv.  */
+}
+
+void
+store_pair_static (__vector_pair *p)
+{
+  sp = *p;			/* lxvp, 2x pstxv.  */
+}
+
+void
+load_store_quad (__vector_quad *p, __vector_quad *q)
+{
+  *p = *q;			/* 2x lxvp, 4x stxv.  */
+}
+
+void
+load_store_quad_1 (__vector_quad *p, __vector_quad *q)
+{
+  p[1] = q[1];			/* 2x lxvp, 4x stxv.  */
+}
+
+void
+load_store_quad_0x10000 (__vector_quad *p, __vector_quad *q)
+{
+  p[0x10000] = q[0x10000];	/* 2x plxvp, 4x pstxv.  */
+}
+
+void
+load_store_quad_n (__vector_quad *p, __vector_quad *q, unsigned long n)
+{
+  p[n] = q[n];			/* 2x lxvp, 4x stxv.  */
+}
+
+void
+load_quad_static (__vector_quad *p)
+{
+  *p = sq;			/* 2x plxvp, 4x stxv.  */
+}
+
+void
+store_quad_static (__vector_quad *p)
+{
+  sq = *p;			/* 2x lxvp, 4x pstxv.  */
+}
+
+/* { dg-final { scan-assembler-not {\mp?vstxvpx?\M} } } */
-- 
2.35.3


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/3] Disable generating load/store vector pairs for block copies.
  2022-06-07  0:53 [PATCH, 0/3] Disable generating store vector pair Michael Meissner
  2022-06-07  0:55 ` [PATCH 1/3] " Michael Meissner
@ 2022-06-07  0:55 ` Michael Meissner
  2022-06-07 20:28   ` will schmidt
  2022-06-07  0:56 ` [PATCH 3/3] Adjust MMA tests to account for no store vector pair Michael Meissner
  2 siblings, 1 reply; 14+ messages in thread
From: Michael Meissner @ 2022-06-07  0:55 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt

[PATCH 2/3] Disable generating load/store vector pairs for block copies.

If the store vector pair instruction is disabled, do not generate block
copies that use load and store vector pair instructions.

I have built bootstrap compilers and run the regression tests on three
different systems:

    1)	Little endian power10 using the --with-cpu=power10 option.

    2)	Little endian power9 using the --with-cpu=power9 option.

    3)	Big endian power8 using the --with-cpu=power8 option.  On this system,
	both 64-bit and 32-bit code generation was tested.

There were no regressions in the runs.  Can I check this patch into the
trunk?  If there are no changes needed for the backports, can I check this
code into the active branches after a burn-in period?

2022-06-06   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	* config/rs6000/rs6000-string.cc (expand_block_move): If the store
	vector pair instructions are disabled, do not generate block
	copies using load and store vector pairs.
---
 gcc/config/rs6000/rs6000-string.cc | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 59d901ac68d..1b18e043269 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -2787,14 +2787,16 @@ expand_block_move (rtx operands[], bool might_overlap)
       rtx src, dest;
       bool move_with_length = false;
 
-      /* Use OOmode for paired vsx load/store.  Use V2DI for single
-	 unaligned vsx load/store, for consistency with what other
-	 expansions (compare) already do, and so we can use lxvd2x on
-	 p8.  Order is VSX pair unaligned, VSX unaligned, Altivec, VSX
-	 with length < 16 (if allowed), then gpr load/store.  */
+      /* Use OOmode for paired vsx load/store unless the store vector pair
+	 instructions are disabled.  Use V2DI for single unaligned vsx
+	 load/store, for consistency with what other expansions (compare)
+	 already do, and so we can use lxvd2x on p8.  Order is VSX pair
+	 unaligned, VSX unaligned, Altivec, VSX with length < 16 (if allowed),
+	 then gpr load/store.  */
 
       if (TARGET_MMA && TARGET_BLOCK_OPS_UNALIGNED_VSX
 	  && TARGET_BLOCK_OPS_VECTOR_PAIR
+	  && TARGET_STORE_VECTOR_PAIR
 	  && bytes >= 32
 	  && (align >= 256 || !STRICT_ALIGNMENT))
 	{
-- 
2.35.3


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/3] Adjust MMA tests to account for no store vector pair.
  2022-06-07  0:53 [PATCH, 0/3] Disable generating store vector pair Michael Meissner
  2022-06-07  0:55 ` [PATCH 1/3] " Michael Meissner
  2022-06-07  0:55 ` [PATCH 2/3] Disable generating load/store vector pairs for block copies Michael Meissner
@ 2022-06-07  0:56 ` Michael Meissner
  2022-06-07 20:28   ` will schmidt
  2 siblings, 1 reply; 14+ messages in thread
From: Michael Meissner @ 2022-06-07  0:56 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt

[PATCH 3/3] Adjust MMA tests to account for no store vector pair.

In changing the default for generating the store vector pair instructions,
I had to adjust several of the MMA tests to remove checking for these
instructions.  Mostly I just deleted the scan-assembler lines checking for
stxvp.  In two of the tests, I added the -mstore-vector-pair option since
the point of the test was to check for specific cases with store vector
pair instructions.

I have built bootstrap compilers and run the regression tests on three
different systems:

    1)	Little endian power10 using the --with-cpu=power10 option.

    2)	Little endian power9 using the --with-cpu=power9 option.

    3)	Big endian power8 using the --with-cpu=power8 option.  On this system,
	both 64-bit and 32-bit code generation was tested.

There were no regressions in the runs.  Can I check this patch into the
trunk?  If there are no changes needed for the backports, can I check this
code into the active branches after a burn-in period?

2022-06-06   Michael Meissner  <meissner@linux.ibm.com>

gcc/testsuite/

	* gcc.target/powerpc/mma-builtin-1.c: Eliminate checking for store
	vector pair instructions.
	* gcc.target/powerpc/mma-builtin-10-pair.c: Likewise.
	* gcc.target/powerpc/mma-builtin-10-quit.c: Likewise.
	* gcc.target/powerpc/mma-builtin-2.c: Likewise.
	* gcc.target/powerpc/mma-builtin-3.c: Likewise.
	* gcc.target/powerpc/mma-builtin-4.c: Likewise.
	* gcc.target/powerpc/mma-builtin-5.c: Likewise.
	* gcc.target/powerpc/mma-builtin-6.c: Likewise.
	* gcc.target/powerpc/mma-builtin-7.c: Likewise.
	* gcc.target/powerpc/mma-builtin-9.c: Likewise.
	* gcc.target/powerpc/mma-builtin-8.c: Add -mstore-vector-pair.
	* gcc.target/powerpc/pr102976.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c       | 1 -
 gcc/testsuite/gcc.target/powerpc/mma-builtin-10-pair.c | 2 --
 gcc/testsuite/gcc.target/powerpc/mma-builtin-10-quad.c | 2 --
 gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c       | 1 -
 gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c       | 1 -
 gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c       | 2 --
 gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c       | 2 --
 gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c       | 1 -
 gcc/testsuite/gcc.target/powerpc/mma-builtin-7.c       | 2 --
 gcc/testsuite/gcc.target/powerpc/mma-builtin-8.c       | 2 +-
 gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c       | 2 --
 gcc/testsuite/gcc.target/powerpc/pr102976.c            | 6 +++++-
 12 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
index 69ee826e1be..47b45b00403 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
@@ -260,7 +260,6 @@ foo13b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
 
 /* { dg-final { scan-assembler-times {\mlxv\M} 40 } } */
 /* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 40 } } */
 /* { dg-final { scan-assembler-times {\mxxmfacc\M} 20 } } */
 /* { dg-final { scan-assembler-times {\mxxmtacc\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mxvbf16ger2\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-10-pair.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-10-pair.c
index d8748d8e7d0..9522673d83e 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-10-pair.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-10-pair.c
@@ -16,6 +16,4 @@ foo (__vector_pair *dst, vec_t *src)
 }
 
 /* { dg-final { scan-assembler-not {\mlxv\M} } } */
-/* { dg-final { scan-assembler-not {\mstxv\M} } } */
 /* { dg-final { scan-assembler-times {\mlxvp\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-10-quad.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-10-quad.c
index 02342c76f5f..3cbdffc15ba 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-10-quad.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-10-quad.c
@@ -16,8 +16,6 @@ foo (__vector_quad *dst, vec_t *src)
 }
 
 /* { dg-final { scan-assembler-not {\mlxv\M} } } */
-/* { dg-final { scan-assembler-not {\mstxv\M} } } */
 /* { dg-final { scan-assembler-times {\mlxvp\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxmfacc\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c
index 0230d727657..5943702d8f3 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c
@@ -59,7 +59,6 @@ foo3 (__vector_quad *dst, __vector_quad *src, vec_t *vec, __vector_pair *pvecp)
 /* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mlxvp\M} 8 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 8 } } */
 /* { dg-final { scan-assembler-times {\mxvf64ger\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxvf64gerpp\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxvf64gerpn\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c
index 9bec78d333f..ee65ef9d96f 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c
@@ -26,6 +26,5 @@ foo1 (vec_t *vec)
 /* { dg-final { scan-assembler-times {\mlxv\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mstxv\M} 2 } } */
 /* { dg-final { scan-assembler-not {\mlxvp\M} } } */
-/* { dg-final { scan-assembler-not {\mstxvp\M} } } */
 /* { dg-final { scan-assembler-times {\mxvcvspbf16\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxvcvbf16spn\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
index a9fb0107d12..aa6e6136f4f 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
@@ -68,6 +68,4 @@ bar2 (vec_t *dst, __vector_pair *src)
 
 /* { dg-final { scan-assembler-times {\mlxv\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 3 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
index 00503b7343d..0d332acee93 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
@@ -41,7 +41,5 @@ bar (vec_t *dst, __vector_quad *src)
 
 /* { dg-final { scan-assembler-times {\mlxv\M} 8 } } */
 /* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mxxmfacc\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxmtacc\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c
index 715b28138e9..2f5747da070 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c
@@ -17,4 +17,3 @@ foo (__vector_quad *dst)
 /* { dg-final { scan-assembler-not {\mxxmtacc\M} } } */
 /* { dg-final { scan-assembler-times {\mxxsetaccz\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxmfacc\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-7.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-7.c
index c661a4b84bc..6eba0a34e8a 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-7.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-7.c
@@ -19,8 +19,6 @@ foo (__vector_pair *dst, __vector_pair *src, long idx)
 #endif
 
 /* { dg-final { scan-assembler-not {\mlxv\M} } } */
-/* { dg-final { scan-assembler-not {\mstxv\M} } } */
 /* { dg-final { scan-assembler-times {\mlxvp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mlxvpx\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mplxvp\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-8.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-8.c
index af29e479f83..cbd2e6dbae1 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-8.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-8.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target power10_ok } */
-/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mstore-vector-pair" } */
 
 void
 foo (__vector_pair *dst, __vector_pair *src, long idx)
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c
index 397d0f1db35..7232e840204 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c
@@ -23,6 +23,4 @@ bar (__vector_quad *dst, vec_t *src)
 }
 
 /* { dg-final { scan-assembler-not {\mlxv\M} } } */
-/* { dg-final { scan-assembler-not {\mstxv\M} } } */
 /* { dg-final { scan-assembler-times {\mlxvp\M} 3 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102976.c b/gcc/testsuite/gcc.target/powerpc/pr102976.c
index 5a4320f8e0a..c975eba86da 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr102976.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr102976.c
@@ -1,7 +1,11 @@
 /* { dg-require-effective-target power10_ok } */
-/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -mstore-vector-pair" } */
 
 #include <altivec.h>
+
+/* The test relies on store vector pair being generated.  Otherwise, it
+   will generate 2 stxv instructions.  */
+
 void
 bug (__vector_pair *dst)
 {
-- 
2.35.3


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] Disable generating store vector pair.
  2022-06-07  0:55 ` [PATCH 1/3] " Michael Meissner
@ 2022-06-07 20:28   ` will schmidt
  2022-06-07 21:17   ` Peter Bergner
  1 sibling, 0 replies; 14+ messages in thread
From: will schmidt @ 2022-06-07 20:28 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner

On Mon, 2022-06-06 at 20:55 -0400, Michael Meissner wrote:
> [PATCH 1/3] Disable generating store vector pair.
> 
> Testing has revealed that the power10 has some slowdowns if the store
> vector pair instruction is generated in some cases.  This patch disables
> generating the store vector pair instructions (stxvp, pstxvp, and stxvpx)
> unless an undocumented switch is used.  It is anticipated that perhaps
> with future machines we can generate the store vector pair instruction.
> 
> This patch does a split after reload to convert a store vector pair
> instruction into a pair of store vector instructions.
> 
> We do continue to generate the load vector pair instructions (lxvp, plxvp,
> and lxvpx), since we have found that in code that heavily uses MMA, it is
> still a win to generate the load vector pair instructions.
> 
> There are two future patches planed:
> 
>     1)	Disable block moves from generating load/store vector pair
> 	instructions unless the the store vector pair instructions are
> 	being generted.
> 
>     2)	Make the built-in functions for generating store vector pair
> 	always generate those instructions even if store vector pair
> 	instructions are disabled.
> 
> I have built bootstrap compilers and run the regression tests on three
> different systems:
> 
>     1)	Little endian power10 using the --with-cpu=power10 option.
> 
>     2)	Little endian power9 using the --with-cpu=power9 option.
> 
>     3)	Big endian power8 using the --with-cpu=power8 option.  On this system,
> 	both 64-bit and 32-bit code generation was tested.
> 
> There were no regressions in the runs except for the tests that are
> modified in patch #3 in these series of patches.  Can I check this patch
> into the trunk?  If there are no changes needed for the backports, can I
> check this code into the active branches after a burn-in period?
> 
> 2022-06-06   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 
> 	* config/rs6000/mma.md (movoo): Disable generating store vector
> 	pair instructions unless these are enabled by the user.
> 	(movxo): Likewise.
> 	* config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): If store
> 	vector pair instructions are disabled, do not allow vector pair
> 	addresses to be indexed.
> 	(rs6000_split_multireg_move): Do not split XOmode stores into two
> 	store vector pair instructions unless store vector pair
> 	instructions are enabled.
> 	* config/rs6000/rs6000.md (isa attribute): Add stxvp attribute.
> 	(enabled attribute): Disable alternative using store vector pair
> 	instructions unless they are enabled.
> 	* config/rs6000/rs6000.opt (-mstore-vector-pair): New option.
> 
> gcc/testsuite/
> 
> 	* gcc.target/powerpc/p10-store-vector-pair-1.c: New test.
> 	* gcc.target/powerpc/p10-store-vector-pair-2.c: New test.
> ---
>  gcc/config/rs6000/mma.md                      | 41 ++++++----
>  gcc/config/rs6000/rs6000.cc                   |  9 +-
>  gcc/config/rs6000/rs6000.md                   |  8 +-
>  gcc/config/rs6000/rs6000.opt                  |  4 +
>  .../powerpc/p10-store-vector-pair-1.c         | 82 +++++++++++++++++++
>  .../powerpc/p10-store-vector-pair-2.c         | 81 ++++++++++++++++++
>  6 files changed, 206 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index a183b6a168a..9b5f243b88d 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -274,26 +274,35 @@ (define_expand "movoo"
>    DONE;
>  })
> 
> +;; By default for power10, do not generate the stxvp/pstxvp/stxvpx
> +;; instructions.  Instead, split these instructions into two separate store
> +;; vector instructions.  We do always generate a lxvp/plxvp/lxvpx instruction.
> +;; We leave in the support for generating stxvp/pstxvp/stxvpx in future
> +;; machines.

... and if (undocumented) STORE_VECTOR_PAIR option is indicated ?

Nothing else jumps out at me.  

Thanks
-Will



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] Disable generating load/store vector pairs for block copies.
  2022-06-07  0:55 ` [PATCH 2/3] Disable generating load/store vector pairs for block copies Michael Meissner
@ 2022-06-07 20:28   ` will schmidt
  0 siblings, 0 replies; 14+ messages in thread
From: will schmidt @ 2022-06-07 20:28 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner

On Mon, 2022-06-06 at 20:55 -0400, Michael Meissner wrote:
> [PATCH 2/3] Disable generating load/store vector pairs for block copies.
> 
> If the store vector pair instruction is disabled, do not generate block
> copies that use load and store vector pair instructions.
> 
> I have built bootstrap compilers and run the regression tests on three
> different systems:
> 
>     1)	Little endian power10 using the --with-cpu=power10 option.
> 
>     2)	Little endian power9 using the --with-cpu=power9 option.
> 
>     3)	Big endian power8 using the --with-cpu=power8 option.  On this system,
> 	both 64-bit and 32-bit code generation was tested.
> 
> There were no regressions in the runs.  Can I check this patch into the
> trunk?  If there are no changes needed for the backports, can I check this
> code into the active branches after a burn-in period?
> 
> 2022-06-06   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 
> 	* config/rs6000/rs6000-string.cc (expand_block_move): If the store
> 	vector pair instructions are disabled, do not generate block
> 	copies using load and store vector pairs.
> ---
>  gcc/config/rs6000/rs6000-string.cc | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
> index 59d901ac68d..1b18e043269 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -2787,14 +2787,16 @@ expand_block_move (rtx operands[], bool might_overlap)
>        rtx src, dest;
>        bool move_with_length = false;
> 
> -      /* Use OOmode for paired vsx load/store.  Use V2DI for single
> -	 unaligned vsx load/store, for consistency with what other
> -	 expansions (compare) already do, and so we can use lxvd2x on
> -	 p8.  Order is VSX pair unaligned, VSX unaligned, Altivec, VSX
> -	 with length < 16 (if allowed), then gpr load/store.  */
> +      /* Use OOmode for paired vsx load/store unless the store vector pair
> +	 instructions are disabled.  Use V2DI for single unaligned vsx
> +	 load/store, for consistency with what other expansions (compare)
> +	 already do, and so we can use lxvd2x on p8.  Order is VSX pair
> +	 unaligned, VSX unaligned, Altivec, VSX with length < 16 (if allowed),
> +	 then gpr load/store.  */
> 
>        if (TARGET_MMA && TARGET_BLOCK_OPS_UNALIGNED_VSX
>  	  && TARGET_BLOCK_OPS_VECTOR_PAIR
> +	  && TARGET_STORE_VECTOR_PAIR
>  	  && bytes >= 32
>  	  && (align >= 256 || !STRICT_ALIGNMENT))


Seems straightforward.  LGTM, 
Thanks
-Will




>  	{
> -- 
> 2.35.3
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] Adjust MMA tests to account for no store vector pair.
  2022-06-07  0:56 ` [PATCH 3/3] Adjust MMA tests to account for no store vector pair Michael Meissner
@ 2022-06-07 20:28   ` will schmidt
  0 siblings, 0 replies; 14+ messages in thread
From: will schmidt @ 2022-06-07 20:28 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner

On Mon, 2022-06-06 at 20:56 -0400, Michael Meissner wrote:
> [PATCH 3/3] Adjust MMA tests to account for no store vector pair.
> 
> In changing the default for generating the store vector pair instructions,
> I had to adjust several of the MMA tests to remove checking for these
> instructions.  Mostly I just deleted the scan-assembler lines checking for
> stxvp.  In two of the tests, I added the -mstore-vector-pair option since
> the point of the test was to check for specific cases with store vector
> pair instructions.
> 
> I have built bootstrap compilers and run the regression tests on three
> different systems:
> 
>     1)	Little endian power10 using the --with-cpu=power10 option.
> 
>     2)	Little endian power9 using the --with-cpu=power9 option.
> 
>     3)	Big endian power8 using the --with-cpu=power8 option.  On this system,
> 	both 64-bit and 32-bit code generation was tested.
> 
> There were no regressions in the runs.  Can I check this patch into the
> trunk?  If there are no changes needed for the backports, can I check this
> code into the active branches after a burn-in period?
> 
> 2022-06-06   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/testsuite/
> 
> 	* gcc.target/powerpc/mma-builtin-1.c: Eliminate checking for store
> 	vector pair instructions.
> 	* gcc.target/powerpc/mma-builtin-10-pair.c: Likewise.
> 	* gcc.target/powerpc/mma-builtin-10-quit.c: Likewise.
> 	* gcc.target/powerpc/mma-builtin-2.c: Likewise.
> 	* gcc.target/powerpc/mma-builtin-3.c: Likewise.
> 	* gcc.target/powerpc/mma-builtin-4.c: Likewise.
> 	* gcc.target/powerpc/mma-builtin-5.c: Likewise.
> 	* gcc.target/powerpc/mma-builtin-6.c: Likewise.
> 	* gcc.target/powerpc/mma-builtin-7.c: Likewise.
> 	* gcc.target/powerpc/mma-builtin-9.c: Likewise.
> 	* gcc.target/powerpc/mma-builtin-8.c: Add -mstore-vector-pair.
> 	* gcc.target/powerpc/pr102976.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c       | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-10-pair.c | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-10-quad.c | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c       | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c       | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c       | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c       | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c       | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-7.c       | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-8.c       | 2 +-
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c       | 2 --
>  gcc/testsuite/gcc.target/powerpc/pr102976.c            | 6 +++++-
>  12 files changed, 6 insertions(+), 18 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> index 69ee826e1be..47b45b00403 100644
> --- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> @@ -260,7 +260,6 @@ foo13b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
> 
>  /* { dg-final { scan-assembler-times {\mlxv\M} 40 } } */
>  /* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
> -/* { dg-final { scan-assembler-times {\mstxvp\M} 40 } } */
>  /* { dg-final { scan-assembler-times {\mxxmfacc\M} 20 } } */
>  /* { dg-final { scan-assembler-times {\mxxmtacc\M} 6 } } */
>  /* { dg-final { scan-assembler-times {\mxvbf16ger2\M} 1 } } */


<snip>

This all seems straightforward.   LGTM, thanks. 
-Will



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] Disable generating store vector pair.
  2022-06-07  0:55 ` [PATCH 1/3] " Michael Meissner
  2022-06-07 20:28   ` will schmidt
@ 2022-06-07 21:17   ` Peter Bergner
  2022-06-07 21:24     ` Segher Boessenkool
  2022-06-07 23:20     ` Michael Meissner
  1 sibling, 2 replies; 14+ messages in thread
From: Peter Bergner @ 2022-06-07 21:17 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Will Schmidt

On 6/6/22 7:55 PM, Michael Meissner wrote:
> gcc/
[snip]
> 	* config/rs6000/rs6000.opt (-mstore-vector-pair): New option.
[snip]
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 4931d781c4e..79ceec6e6a5 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -624,6 +624,10 @@ mieee128-constant
>  Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
>  Generate (do not generate) code that uses the LXVKQ instruction.
>  
> +; Generate (do not generate) code that uses the store vector pair instruction.
> +mstore-vector-pair
> +Target Undocumented Var(TARGET_STORE_VECTOR_PAIR) Init(0) Save
> +
>  -param=rs6000-density-pct-threshold=
>  Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
>  When costing for loop vectorization, we probably need to penalize the loop body

I think I mentioned this offline, but I'd prefer a negative target flag,
something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd
generate stxvp by default.  Then I'd like to see MASK_NO_STORE_VECTOR_PAIR
added to power10's rs6000-cpu.def definition.  That way, stxvp isn't generated
on Power10, but would be by default on any possible future cpus without
having to add a flag to those cpus rs6000-cpu.def entries.

Peter


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] Disable generating store vector pair.
  2022-06-07 21:17   ` Peter Bergner
@ 2022-06-07 21:24     ` Segher Boessenkool
  2022-06-08  0:59       ` Peter Bergner
  2022-06-07 23:20     ` Michael Meissner
  1 sibling, 1 reply; 14+ messages in thread
From: Segher Boessenkool @ 2022-06-07 21:24 UTC (permalink / raw)
  To: Peter Bergner
  Cc: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn, Will Schmidt

On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
> I think I mentioned this offline, but I'd prefer a negative target flag,
> something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd
> generate stxvp by default.

NAK.  All negatives should be -mno-xxx with -mxxx the corresponding
positive.  All of them.

>  Then I'd like to see MASK_NO_STORE_VECTOR_PAIR
> added to power10's rs6000-cpu.def definition.  That way, stxvp isn't generated
> on Power10, but would be by default on any possible future cpus without
> having to add a flag to those cpus rs6000-cpu.def entries.

The p10 cpu support should simply not enable the option by default.
There is no reason to play games.


Segher

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] Disable generating store vector pair.
  2022-06-07 21:17   ` Peter Bergner
  2022-06-07 21:24     ` Segher Boessenkool
@ 2022-06-07 23:20     ` Michael Meissner
  1 sibling, 0 replies; 14+ messages in thread
From: Michael Meissner @ 2022-06-07 23:20 UTC (permalink / raw)
  To: Peter Bergner
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Will Schmidt

On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
> On 6/6/22 7:55 PM, Michael Meissner wrote:
> > gcc/
> [snip]
> > 	* config/rs6000/rs6000.opt (-mstore-vector-pair): New option.
> [snip]
> > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> > index 4931d781c4e..79ceec6e6a5 100644
> > --- a/gcc/config/rs6000/rs6000.opt
> > +++ b/gcc/config/rs6000/rs6000.opt
> > @@ -624,6 +624,10 @@ mieee128-constant
> >  Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
> >  Generate (do not generate) code that uses the LXVKQ instruction.
> >  
> > +; Generate (do not generate) code that uses the store vector pair instruction.
> > +mstore-vector-pair
> > +Target Undocumented Var(TARGET_STORE_VECTOR_PAIR) Init(0) Save
> > +
> >  -param=rs6000-density-pct-threshold=
> >  Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
> >  When costing for loop vectorization, we probably need to penalize the loop body
> 
> I think I mentioned this offline, but I'd prefer a negative target flag,
> something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd
> generate stxvp by default.  Then I'd like to see MASK_NO_STORE_VECTOR_PAIR
> added to power10's rs6000-cpu.def definition.  That way, stxvp isn't generated
> on Power10, but would be by default on any possible future cpus without
> having to add a flag to those cpus rs6000-cpu.def entries.

I don't much care when the option is spelled, but I'm happy to go with whatever
name people want.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] Disable generating store vector pair.
  2022-06-07 21:24     ` Segher Boessenkool
@ 2022-06-08  0:59       ` Peter Bergner
  2022-06-08  3:16         ` Michael Meissner
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Bergner @ 2022-06-08  0:59 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn, Will Schmidt

On 6/7/22 4:24 PM, Segher Boessenkool wrote:
> On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
>> I think I mentioned this offline, but I'd prefer a negative target flag,
>> something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd
>> generate stxvp by default.
> 
> NAK.  All negatives should be -mno-xxx with -mxxx the corresponding
> positive.  All of them.

That's not what I was asking for.  I totally agree that -mno-store-vector-pair
should disable generating stxvp and that -mstore-vector-pair should enable
generating it.  What I asked for was that the internal flag we use to enable
and disable it should be a negative flag, where TARGET_NO_STORE_VECTOR_PAIR is
true when we use -mno-store-vector-pair and false when using -mstore-vector-pair.
That way we can add that flag to power10's rs6000-cpu.def entry and then we're
done.  What I don't want to have to do is that if/when power87 is released, we
still have to add TARGET_STORE_VECTOR_PAIR its rs6000-cpu.def entry just to
get stxvp insns generated.  That adds a cost to every cpu after power10 since
we'd have to remember to add that flag to every follow-on cpu.

Peter

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] Disable generating store vector pair.
  2022-06-08  0:59       ` Peter Bergner
@ 2022-06-08  3:16         ` Michael Meissner
  2022-06-08 14:49           ` will schmidt
  2022-06-08 14:58           ` Peter Bergner
  0 siblings, 2 replies; 14+ messages in thread
From: Michael Meissner @ 2022-06-08  3:16 UTC (permalink / raw)
  To: Peter Bergner
  Cc: Segher Boessenkool, Michael Meissner, gcc-patches, Kewen.Lin,
	David Edelsohn, Will Schmidt

On Tue, Jun 07, 2022 at 07:59:34PM -0500, Peter Bergner wrote:
> On 6/7/22 4:24 PM, Segher Boessenkool wrote:
> > On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
> >> I think I mentioned this offline, but I'd prefer a negative target flag,
> >> something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd
> >> generate stxvp by default.
> > 
> > NAK.  All negatives should be -mno-xxx with -mxxx the corresponding
> > positive.  All of them.
> 
> That's not what I was asking for.  I totally agree that -mno-store-vector-pair
> should disable generating stxvp and that -mstore-vector-pair should enable
> generating it.  What I asked for was that the internal flag we use to enable
> and disable it should be a negative flag, where TARGET_NO_STORE_VECTOR_PAIR is
> true when we use -mno-store-vector-pair and false when using -mstore-vector-pair.
> That way we can add that flag to power10's rs6000-cpu.def entry and then we're
> done.  What I don't want to have to do is that if/when power87 is released, we
> still have to add TARGET_STORE_VECTOR_PAIR its rs6000-cpu.def entry just to
> get stxvp insns generated.  That adds a cost to every cpu after power10 since
> we'd have to remember to add that flag to every follow-on cpu.

FWIW, I really dislike having negative flags like that (just talking about the
option mask internals, not the user option).

I don't view the cost to add one postive flag to the next CPU as bad, as it
will be a one time cost.  Presumably it would be set also next++ CPU.  This is
like power8 is all of the power7 flags + new flags.  Power9 is all of the
power8 flags + new flags.  I.e. in general it is cumulative.  Yes, I'm aware
there are times when there are breaks, but hopefully those are rare.

Otherwise it is like the mess with -mpower8-fusion, where going from power8 to
power9 we have to clear the fusion flag.  If store vector pair is a postive
flag, then it isn't set in power10 flags, but it might be set in next cpu
flags.  But if it is a negative flag, we have to explicitly clear it.

We can do it, but I just prefer to go with the positive flag approach.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] Disable generating store vector pair.
  2022-06-08  3:16         ` Michael Meissner
@ 2022-06-08 14:49           ` will schmidt
  2022-06-08 14:58           ` Peter Bergner
  1 sibling, 0 replies; 14+ messages in thread
From: will schmidt @ 2022-06-08 14:49 UTC (permalink / raw)
  To: Michael Meissner, Peter Bergner
  Cc: Segher Boessenkool, gcc-patches, Kewen.Lin, David Edelsohn

On Tue, 2022-06-07 at 23:16 -0400, Michael Meissner wrote:
> On Tue, Jun 07, 2022 at 07:59:34PM -0500, Peter Bergner wrote:
> > On 6/7/22 4:24 PM, Segher Boessenkool wrote:
> > > On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
> > > > I think I mentioned this offline, but I'd prefer a negative target flag,
> > > > something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd
> > > > generate stxvp by default.
> > > 
> > > NAK.  All negatives should be -mno-xxx with -mxxx the corresponding
> > > positive.  All of them.
> > 
> > That's not what I was asking for.  I totally agree that -mno-store-vector-pair
> > should disable generating stxvp and that -mstore-vector-pair should enable
> > generating it.  What I asked for was that the internal flag we use to enable
> > and disable it should be a negative flag, where TARGET_NO_STORE_VECTOR_PAIR is
> > true when we use -mno-store-vector-pair and false when using -mstore-vector-pair.
> > That way we can add that flag to power10's rs6000-cpu.def entry and then we're
> > done.  What I don't want to have to do is that if/when power87 is released, we
> > still have to add TARGET_STORE_VECTOR_PAIR its rs6000-cpu.def entry just to
> > get stxvp insns generated.  That adds a cost to every cpu after power10 since
> > we'd have to remember to add that flag to every follow-on cpu.
> 
> FWIW, I really dislike having negative flags like that (just talking about the
> option mask internals, not the user option).

I can't tell there is agreement in either direction, i'll throw some
comments out and see if that helps make a decision. 

I agree with avoiding the negative flags.  Whenever I run across a code
snippet reading  "if (! TARGET_NOT_FOO) ... " it's time to double-check 
everything.  :-)      

If the proposal is to have "TARGET_NO_STORE_VECTOR_PAIR" set to "off",
I'd counter propose whatever variation possible to drop the "NO" from
the string. i.e. "TARGET_STORE_VECTOR_PAIR" set to however it makes
sense to indicate enabled, or not.

All that said, .. with a strong preference to have the internal flags
matching the option flags as closely as possible.


> 
> I don't view the cost to add one postive flag to the next CPU as bad, as it
> will be a one time cost.  Presumably it would be set also next++ CPU.  This is
> like power8 is all of the power7 flags + new flags.  Power9 is all of the
> power8 flags + new flags.  I.e. in general it is cumulative.  Yes, I'm aware
> there are times when there are breaks, but hopefully those are rare.

This sounds reasonable.   Some weight could be added for which way to
bias the flag based on a guess of what the 'power87' release will
allow, but ultimately that shouldn't really matter. 

And no, power87 isnt' real AFAIK,.. I'm just repeating the example
provided by Peter :-) 

Thanks
-Will

> 
> Otherwise it is like the mess with -mpower8-fusion, where going from power8 to
> power9 we have to clear the fusion flag.  If store vector pair is a postive
> flag, then it isn't set in power10 flags, but it might be set in next cpu
> flags.  But if it is a negative flag, we have to explicitly clear it.
> 
> We can do it, but I just prefer to go with the positive flag approach.
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] Disable generating store vector pair.
  2022-06-08  3:16         ` Michael Meissner
  2022-06-08 14:49           ` will schmidt
@ 2022-06-08 14:58           ` Peter Bergner
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Bergner @ 2022-06-08 14:58 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool, gcc-patches, Kewen.Lin,
	David Edelsohn, Will Schmidt

On 6/7/22 10:16 PM, Michael Meissner wrote:
> Otherwise it is like the mess with -mpower8-fusion, where going from power8 to
> power9 we have to clear the fusion flag.  If store vector pair is a postive
> flag, then it isn't set in power10 flags, but it might be set in next cpu
> flags.  But if it is a negative flag, we have to explicitly clear it.

Ok, I completely forgot about this specific issue and its negative affect on
inlining, so I agree it's a bad idea.  Request withdrawn. :-) 

Peter



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-06-08 14:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-07  0:53 [PATCH, 0/3] Disable generating store vector pair Michael Meissner
2022-06-07  0:55 ` [PATCH 1/3] " Michael Meissner
2022-06-07 20:28   ` will schmidt
2022-06-07 21:17   ` Peter Bergner
2022-06-07 21:24     ` Segher Boessenkool
2022-06-08  0:59       ` Peter Bergner
2022-06-08  3:16         ` Michael Meissner
2022-06-08 14:49           ` will schmidt
2022-06-08 14:58           ` Peter Bergner
2022-06-07 23:20     ` Michael Meissner
2022-06-07  0:55 ` [PATCH 2/3] Disable generating load/store vector pairs for block copies Michael Meissner
2022-06-07 20:28   ` will schmidt
2022-06-07  0:56 ` [PATCH 3/3] Adjust MMA tests to account for no store vector pair Michael Meissner
2022-06-07 20:28   ` will schmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).