public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work090)] Update ChangeLog.meissner.
@ 2022-06-03 16:26 Michael Meissner
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Meissner @ 2022-06-03 16:26 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:a0dce5a1b1b2332fce6d4e1f271725329978e356

commit a0dce5a1b1b2332fce6d4e1f271725329978e356
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Jun 3 12:26:40 2022 -0400

    Update ChangeLog.meissner.
    
    2022-06-03   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
            * ChangeLog.meissner: Update.

Diff:
---
 gcc/ChangeLog.meissner | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index aa5e262ea6e..b21240d0afc 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,32 @@
+==================== work090 patch #3
+
+Adjust MMA tests to account for no store vector pair.
+
+In changing the default for generating the store vector pair instructions,
+I had to adjust several of the MMA tests to remove checking for these
+instructions.  Mostly I just deleted the scan-assembler lines checking for
+stxvp.  In two of the tests, I added the -mstore-vector-pair option since
+the point of the test was to check for specific cases with store vector
+pair instructions.
+
+2022-06-03   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/mma-builtin-1.c: Eliminate checking for store
+	vector pair instructions.
+	* gcc.target/powerpc/mma-builtin-10-pair.c: Likewise.
+	* gcc.target/powerpc/mma-builtin-10-quad.c: Likewise.
+	* gcc.target/powerpc/mma-builtin-2.c: Likewise.
+	* gcc.target/powerpc/mma-builtin-3.c: Likewise.
+	* gcc.target/powerpc/mma-builtin-4.c: Likewise.
+	* gcc.target/powerpc/mma-builtin-5.c: Likewise.
+	* gcc.target/powerpc/mma-builtin-6.c: Likewise.
+	* gcc.target/powerpc/mma-builtin-7.c: Likewise.
+	* gcc.target/powerpc/mma-builtin-9.c: Likewise.
+	* gcc.target/powerpc/mma-builtin-8.c: Add -mstore-vector-pair.
+	* gcc.target/powerpc/pr102976.c: Likewise.
+
 ==================== work090 patch #2
 
 Disable generating load/store vector pairs for block copies.



* [gcc(refs/users/meissner/heads/work090)] Update ChangeLog.meissner.
@ 2022-06-06 20:16 Michael Meissner
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Meissner @ 2022-06-06 20:16 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:348ae6d2898798b31f136f8d16a12b3d1c84af68

commit 348ae6d2898798b31f136f8d16a12b3d1c84af68
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon Jun 6 16:15:50 2022 -0400

    Update ChangeLog.meissner.
    
    2022-06-06   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
            * ChangeLog.meissner: Update.

Diff:
---
 gcc/ChangeLog.meissner | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index b21240d0afc..421d8fbff94 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,93 @@
+==================== work090 patch #4
+
+Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
+
+This is version 3 of the patch.  The original patch was:
+
+| Date: Mon, 28 Mar 2022 12:26:02 -0400
+| Subject: [PATCH 1/4] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
+| Message-ID: <YkHhmvwSJF7DUDhJ@toto.the-meissners.org>
+| https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592420.html
+
+Version 2 of the patch was:
+
+| Date: Fri, 13 May 2022 10:49:26 -0400
+| Subject: [PATCH] Optimize vec_splats of constant V2DI/V2DF vec_extract, PR target/99293
+| Message-ID: <Yn5v9kqBaETg0roR@toto.the-meissners.org>
+| https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594797.html
+
+In PR target/99293, it was pointed out that doing:
+
+	vector long long dest0, dest1, src;
+	/* ... */
+	dest0 = vec_splats (vec_extract (src, 0));
+	dest1 = vec_splats (vec_extract (src, 1));
+
+would generate slower code.
+
+It generates the following code on power8:
+
+	;; vec_splats (vec_extract (src, 0))
+	xxpermdi 0,34,34,3
+	xxpermdi 34,0,0,0
+
+	;; vec_splats (vec_extract (src, 1))
+	xxlor 0,34,34
+	xxpermdi 34,0,0,0
+
+However on power9 and power10 it generates:
+
+	;; vec_splats (vec_extract (src, 0))
+	mfvsrld 3,34
+	mtvsrdd 34,9,9
+
+	;; vec_splats (vec_extract (src, 1))
+	mfvsrd 9,34
+	mtvsrdd 34,9,9
+
+This is due to the power9 having the mfvsrld instruction which can extract
+either 64-bit element into a GPR.  While there are alternatives for both
+vector registers and GPR registers, the register allocator prefers to put
+DImode into GPR registers.
+
+In this case, it is better to have a single combiner pattern that can generate
+a single xxpermdi, instead of 2 insns (the extract and then the concat).
+This is true if the two operations are move from vector register and move to
+vector register.  As Segher pointed out in a previous version of the patch, the
+combiner already tries creating a (vec_duplicate (vec_select ...))
+pattern, but we didn't provide one.
+
+This patch reworks vsx_xxspltd_<mode> for V2DImode and V2DFmode so that it now
+uses VEC_DUPLICATE, which the combiner checks for.
+
+I have built Spec 2017 with this patch installed, and the cam4_r benchmark
+is the only benchmark that generated different code (3 mfvsrld/mtvsrdd
+pairs of instructions were replaced with xxpermdi).
+
+I have built bootstrap versions on the following systems and I have run
+the regression tests.  There were no regressions in the runs:
+
+	Power9 little endian, --with-cpu=power9
+	Power10 little endian, --with-cpu=power10
+	Power8 big endian, --with-cpu=power8 (both 32-bit & 64-bit tests)
+
+Can I install this into the trunk?  After a burn-in period, can I backport
+and install this into GCC 11 and GCC 10 branches?
+
+2022-06-06   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+	PR target/99293
+	* config/rs6000/rs6000-p8swap.cc (rtx_is_swappable_p): Remove
+	UNSPEC_VSX_XXSPLTD case.
+	* config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTD): Delete.
+	(vsx_xxspltd_<mode>): Rewrite to use VEC_DUPLICATE.
+
+gcc/testsuite/
+	PR target/99293
+	* gcc.target/powerpc/builtins-1.c: Update insn count.
+	* gcc.target/powerpc/pr99293.c: New test.
+
 ==================== work090 patch #3
 
 Adjust MMA tests to account for no store vector pair.



* [gcc(refs/users/meissner/heads/work090)] Update ChangeLog.meissner.
@ 2022-06-06 18:24 Michael Meissner
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Meissner @ 2022-06-06 18:24 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:140c3bb4a687ae40068f686a99836e1c50a8b3de

commit 140c3bb4a687ae40068f686a99836e1c50a8b3de
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon Jun 6 14:23:50 2022 -0400

    Update ChangeLog.meissner.
    
    2022-06-06   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
            * ChangeLog.meissner: Update.

Diff:
---
 gcc/ChangeLog.meissner | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index b21240d0afc..421d8fbff94 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,93 @@
+==================== work090 patch #4
+
+Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
+
+This is version 3 of the patch.  The original patch was:
+
+| Date: Mon, 28 Mar 2022 12:26:02 -0400
+| Subject: [PATCH 1/4] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
+| Message-ID: <YkHhmvwSJF7DUDhJ@toto.the-meissners.org>
+| https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592420.html
+
+Version 2 of the patch was:
+
+| Date: Fri, 13 May 2022 10:49:26 -0400
+| Subject: [PATCH] Optimize vec_splats of constant V2DI/V2DF vec_extract, PR target/99293
+| Message-ID: <Yn5v9kqBaETg0roR@toto.the-meissners.org>
+| https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594797.html
+
+In PR target/99293, it was pointed out that doing:
+
+	vector long long dest0, dest1, src;
+	/* ... */
+	dest0 = vec_splats (vec_extract (src, 0));
+	dest1 = vec_splats (vec_extract (src, 1));
+
+would generate slower code.
+
+It generates the following code on power8:
+
+	;; vec_splats (vec_extract (src, 0))
+	xxpermdi 0,34,34,3
+	xxpermdi 34,0,0,0
+
+	;; vec_splats (vec_extract (src, 1))
+	xxlor 0,34,34
+	xxpermdi 34,0,0,0
+
+However on power9 and power10 it generates:
+
+	;; vec_splats (vec_extract (src, 0))
+	mfvsrld 3,34
+	mtvsrdd 34,9,9
+
+	;; vec_splats (vec_extract (src, 1))
+	mfvsrd 9,34
+	mtvsrdd 34,9,9
+
+This is due to the power9 having the mfvsrld instruction which can extract
+either 64-bit element into a GPR.  While there are alternatives for both
+vector registers and GPR registers, the register allocator prefers to put
+DImode into GPR registers.
+
+In this case, it is better to have a single combiner pattern that can generate
+a single xxpermdi, instead of 2 insns (the extract and then the concat).
+This is true if the two operations are move from vector register and move to
+vector register.  As Segher pointed out in a previous version of the patch, the
+combiner already tries creating a (vec_duplicate (vec_select ...))
+pattern, but we didn't provide one.
+
+This patch reworks vsx_xxspltd_<mode> for V2DImode and V2DFmode so that it now
+uses VEC_DUPLICATE, which the combiner checks for.
+
+I have built Spec 2017 with this patch installed, and the cam4_r benchmark
+is the only benchmark that generated different code (3 mfvsrld/mtvsrdd
+pairs of instructions were replaced with xxpermdi).
+
+I have built bootstrap versions on the following systems and I have run
+the regression tests.  There were no regressions in the runs:
+
+	Power9 little endian, --with-cpu=power9
+	Power10 little endian, --with-cpu=power10
+	Power8 big endian, --with-cpu=power8 (both 32-bit & 64-bit tests)
+
+Can I install this into the trunk?  After a burn-in period, can I backport
+and install this into GCC 11 and GCC 10 branches?
+
+2022-06-06   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+	PR target/99293
+	* config/rs6000/rs6000-p8swap.cc (rtx_is_swappable_p): Remove
+	UNSPEC_VSX_XXSPLTD case.
+	* config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTD): Delete.
+	(vsx_xxspltd_<mode>): Rewrite to use VEC_DUPLICATE.
+
+gcc/testsuite/
+	PR target/99293
+	* gcc.target/powerpc/builtins-1.c: Update insn count.
+	* gcc.target/powerpc/pr99293.c: New test.
+
 ==================== work090 patch #3
 
 Adjust MMA tests to account for no store vector pair.



* [gcc(refs/users/meissner/heads/work090)] Update ChangeLog.meissner.
@ 2022-06-03  2:44 Michael Meissner
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Meissner @ 2022-06-03  2:44 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:3ac3752b2522ed246527eef3a4578b3bc5c75e15

commit 3ac3752b2522ed246527eef3a4578b3bc5c75e15
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Jun 2 22:43:53 2022 -0400

    Update ChangeLog.meissner.
    
    2022-06-02   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
            * ChangeLog.meissner: Update.

Diff:
---
 gcc/ChangeLog.meissner | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index fb34f5c5a97..aa5e262ea6e 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,70 @@
+==================== work090 patch #2
+
+Disable generating load/store vector pairs for block copies.
+
+If the store vector pair instruction is disabled, do not generate block
+copies that use load and store vector pair instructions.
+
+2022-06-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-string.cc (expand_block_move): If the store
+	vector pair instructions are disabled, do not generate block
+	copies using load and store vector pairs.
+
+Disable generating store vector pair.
+
+Testing has revealed that the power10 has some slowdowns if the store
+vector pair instruction is generated in some cases.  This patch disables
+generating the store vector pair instructions (stxvp, pstxvp, and stxvpx)
+unless an undocumented switch is used.  It is anticipated that perhaps
+with future machines we can generate the store vector pair instruction.
+
+This patch does a split after reload to convert a store vector pair
+instruction into a pair of store vector instructions.
+
+We do continue to generate the load vector pair instructions (lxvp, plxvp,
+and lxvpx), since we have found that in code that heavily uses MMA, it is
+still a win to generate the load vector pair instructions.
+
+There are two future patches planned:
+
+    1)	Disable block moves from generating load/store vector pair
+	instructions unless the store vector pair instructions are
+	being generated.
+
+    2)	Make the built-in functions for generating store vector pair
+	always generate those instructions even if store vector pair
+	instructions are disabled.
+
+==================== work090 patch #1
+
+2022-06-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/mma.md (movoo): Disable generating store vector
+	pair instructions unless these are enabled by the user.
+	(movxo): Likewise.
+	* config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): If store
+	vector pair instructions are disabled, do not allow vector pair
+	addresses to be indexed.
+	(rs6000_split_multireg_move): Do not split XOmode stores into two
+	store vector pair instructions unless store vector pair
+	instructions are enabled.
+	* config/rs6000/rs6000.md (isa attribute): Add stxvp attribute.
+	(enabled attribute): Disable alternative using store vector pair
+	instructions unless they are enabled.
+	* config/rs6000/rs6000.opt (-mstore-vector-pair): New option.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/p10-store-vector-pair-1.c: New test.
+	* gcc.target/powerpc/p10-store-vector-pair-2.c: New test.
+
+==================== work090 branch creation
+
 2022-06-02   Michael Meissner  <meissner@linux.ibm.com>
 
 	Clone branch



