public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-04-01 22:19 Michael Meissner
0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2022-04-01 22:19 UTC
To: gcc-cvs
https://gcc.gnu.org/g:982211f6a06ff1050f05394f8d5ace85f0b15fa8
commit 982211f6a06ff1050f05394f8d5ace85f0b15fa8
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Fri Apr 1 18:18:54 2022 -0400
Update ChangeLog.meissner.
2022-04-01 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index c7e98861615..e6ad03fd49c 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,27 @@
+==================== Work084, patch #5:
+
+Add zero_extendditi2. Improve lxvr*x code generation.
+
+This pattern adds zero_extendditi2 so that if we are extending DImode to
+TImode, and we want the result in a vector register, the compiler can
+generate MTVSRDD.
+
+In addition, the patterns for generating lxvr{b,h,w,d}x and
+stxvr{b,h,w,d}x were tuned to allow loading into GPRs and storing from
+GPRs. This avoids needless direct moves of the value into the vector
+registers when a GPR was already selected.
+
+2022-04-01 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+ * config/rs6000/vsx.md (vsx_lxvr<wd>x): Add support for loading to
+ GPR registers.
+ (vsx_stxvr<wd>x): Add support for storing from GPR registers.
+ (zero_extendditi2): New insn.
+
+gcc/testsuite/
+ * gcc.target/powerpc/zero-extend-di-ti.c: New test.
+
==================== Work084, patch #4:
Add zero_extendditi2.
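For readers unfamiliar with the instructions named in patch #5, here is a rough C model of their value semantics. lxvr{b,h,w,d}x load one 1-, 2-, 4-, or 8-byte element and leave it zero-extended in a 128-bit register; for the 8-byte case that is the same value a DImode-to-TImode zero extension produces. This is a sketch, not GCC code: the function names are hypothetical, `unsigned __int128` is a GCC/Clang extension available on 64-bit hosts, and the model ignores vector lane placement and uses the host's byte order.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128;

/* Hypothetical model of lxvrbx/lxvrhx/lxvrwx/lxvrdx: load one element of
   SIZE bytes from ADDR and zero-extend it to 128 bits.  Returns 0 for an
   invalid SIZE (the real instructions have no such case).  */
u128 model_lxvr (const void *addr, unsigned size)
{
  switch (size)
    {
    case 1: { uint8_t v;  memcpy (&v, addr, 1); return v; }
    case 2: { uint16_t v; memcpy (&v, addr, 2); return v; }
    case 4: { uint32_t v; memcpy (&v, addr, 4); return v; }
    case 8: { uint64_t v; memcpy (&v, addr, 8); return v; }
    default: return 0;
    }
}

/* The C-level operation behind zero_extendditi2 when the source is in
   memory.  Per the ChangeLog text, on power10 this may be done with
   lxvrdx if the result is wanted in a vector register, or with an
   ordinary GPR load if a GPR was selected.  */
u128 load_zext_di_ti (const uint64_t *p)
{
  return *p;	/* implicit zero extension from 64 to 128 bits */
}
```

Which instruction GCC actually picks depends on where the register allocator wants the result, as the patch description explains.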
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-04-05 18:46 Michael Meissner
From: Michael Meissner @ 2022-04-05 18:46 UTC
To: gcc-cvs
https://gcc.gnu.org/g:f1ee76626c2b89504804e6406d6e152ca8991abe
commit f1ee76626c2b89504804e6406d6e152ca8991abe
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Tue Apr 5 14:45:45 2022 -0400
Update ChangeLog.meissner.
2022-04-05 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index 452690fe771..d3932454b00 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,58 @@
+==================== Work084, patch #7:
+Optimize multiply/add of DImode extended to TImode.
+
+On power9 and power10 systems, the maddld, maddhd, and maddhdu
+instructions multiply two 64-bit integers and add a third 64-bit value,
+producing the low or high half of the 128-bit result. This patch adds
+support to generate these instructions.
+
+Previously GCC had define_expands to handle extending 64-bit values to
+128 bits and multiplying them. This patch changes these define_expands
+to define_insn_and_split and then provides combiner patterns to
+generate these multiply/add instructions.
+
+To support using this optimization on power9, this patch extends the
+sign extension of DImode to TImode (added for PR target/104698) so that
+it also runs on power9.
+
+This patch needs the previous patch to add unsigned DImode to TImode
+conversion so that the combiner can combine the extend, multiply, and add
+instructions.
+
+
+2022-04-05 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+ PR target/103109
+ * config/rs6000/rs6000.md (su_int32): New code attribute.
+ (<u>mul<mode><dmode>3): Convert from define_expand to
+ define_insn_and_split.
+ (maddld<mode>4): Add generator function.
+ (<u>mulditi3_<u>adddi3): New insn.
+ (<u>mulditi3_add_const): New insn.
+ (<u>mulditi3_<u>adddi3_upper): New insn.
+
+gcc/testsuite/
+ PR target/103109
+ * gcc.target/powerpc/pr103109.c: New test.
+
+==================== Work084, patch #6:
+Make addti3/subti3 be define_insn_and_split, instead of define_expand
+
+This patch makes addti3 and subti3 define_insn_and_split instead of
+define_expand. It is a building block for a future patch for PR
+target/103109, which optimizes some 128-bit integer multiply-add
+combinations to use the power9 maddld, maddhd, and maddhdu
+instructions. In order to recognize the multiply and add combination,
+we need to keep addti3 and subti3 as complete insns through the
+combiner phase.
+
+2022-04-05 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+ * config/rs6000/rs6000.md (addti3): Don't immediately expand the
+ insn, delay expansion until the split passes.
+ (subti3): Likewise.
+
==================== Work084, patch #5:
Add zero_extendditi2. Improve lxvr*x code generation.
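A sketch of the arithmetic behind patch #7, which is also why patch #6 keeps addti3 whole until after combine: the C functions `smuladd` and `umuladd` have the shape the new combiner patterns target, a 64-bit multiply widened to 128 bits plus a 64-bit addend. The `model_*` functions restate maddld, maddhd, and maddhdu as described in the Power ISA 3.0 (worth checking against the ISA itself): maddld yields the low doubleword of a*b + c, and maddhd (signed) or maddhdu (unsigned) yields the high doubleword. All names are hypothetical; this is not the rs6000.md code.

```c
#include <stdint.h>

typedef __int128 s128;
typedef unsigned __int128 u128;

/* Source shapes of the kind patch #7 aims to optimize.  */
s128 smuladd (int64_t a, int64_t b, int64_t c)
{
  return (s128) a * b + c;
}

u128 umuladd (uint64_t a, uint64_t b, uint64_t c)
{
  return (u128) a * b + c;
}

/* maddld: low 64 bits of a*b + c; the bits are the same for signed and
   unsigned operands, so unsigned wraparound modulo 2^64 models it.  */
uint64_t model_maddld (uint64_t a, uint64_t b, uint64_t c)
{
  return a * b + c;
}

/* maddhd: high 64 bits of the signed 128-bit sum a*b + c.
   (Right shift of a negative __int128 is arithmetic in GCC.)  */
int64_t model_maddhd (int64_t a, int64_t b, int64_t c)
{
  return (int64_t) (((s128) a * b + c) >> 64);
}

/* maddhdu: high 64 bits of the unsigned 128-bit sum a*b + c.
   Cannot overflow: (2^64-1)^2 + (2^64-1) < 2^128.  */
uint64_t model_maddhdu (uint64_t a, uint64_t b, uint64_t c)
{
  return (uint64_t) (((u128) a * b + c) >> 64);
}

/* Reassemble a 128-bit value from its high and low doublewords.  */
u128 join_dw (uint64_t hi, uint64_t lo)
{
  return ((u128) hi << 64) | lo;
}
```

So a signed 128-bit multiply-add maps onto a maddld/maddhd pair and an unsigned one onto maddld/maddhdu, provided the combiner can still see the plus and mult together.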
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-04-02 0:50 Michael Meissner
From: Michael Meissner @ 2022-04-02 0:50 UTC
To: gcc-cvs
https://gcc.gnu.org/g:2aea504015b39db038d72e8f6c13842ff6598c44
commit 2aea504015b39db038d72e8f6c13842ff6598c44
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Fri Apr 1 20:50:41 2022 -0400
Update ChangeLog.meissner.
2022-04-01 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 3 +++
1 file changed, 3 insertions(+)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index e6ad03fd49c..452690fe771 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -20,6 +20,9 @@ gcc/
(zero_extendditi2): New insn.
gcc/testsuite/
+ * gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2
+ instead of -O0 and update insn counts.
+ * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
* gcc.target/powerpc/zero-extend-di-ti.c: New test.
==================== Work084, patch #4:
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-03-31 19:55 Michael Meissner
From: Michael Meissner @ 2022-03-31 19:55 UTC
To: gcc-cvs
https://gcc.gnu.org/g:1aaa770e86c0ef6595f85f14245e19275e489215
commit 1aaa770e86c0ef6595f85f14245e19275e489215
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Thu Mar 31 15:54:51 2022 -0400
Update ChangeLog.meissner.
2022-03-31 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index 4633953a09f..c7e98861615 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,18 @@
+==================== Work084, patch #4:
+
+Add zero_extendditi2.
+
+This pattern adds zero_extendditi2 so that if we are extending DImode to
+TImode, and we want the result in a vector register, the compiler can
+generate MTVSRDD. In addition, on power10, it can generate LXVRDX when
+the value is loaded from memory and is wanted in a vector register.
+
+2022-03-31 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+ * config/rs6000/vsx.md (zero_extendditi2): New insn.
+
==================== Work084, patch #3:
Replace UNSPEC with RTL code for extendditi2.
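Why a single MTVSRDD can implement zero_extendditi2 into a vector register: per the Power ISA 3.0 description of mtvsrdd (worth double-checking), an RA field of 0 supplies zero for the high doubleword instead of reading r0, while RB supplies the low doubleword. The C model below assumes exactly that; `ra_is_r0` stands in for the RA=0 encoding and all names are hypothetical.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Model of mtvsrdd XT,RA,RB: high doubleword from GPR RA (or zero when
   the RA field is 0), low doubleword from GPR RB.  */
u128 model_mtvsrdd (int ra_is_r0, uint64_t ra, uint64_t rb)
{
  uint64_t hi = ra_is_r0 ? 0 : ra;
  return ((u128) hi << 64) | rb;
}

/* zero_extendditi2 at the C level.  */
u128 zext_di_ti (uint64_t x)
{
  return x;
}
```

With RA=0 the model reduces to a plain 64-to-128-bit zero extension, which is the property the new pattern relies on.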
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-03-31 16:34 Michael Meissner
From: Michael Meissner @ 2022-03-31 16:34 UTC
To: gcc-cvs
https://gcc.gnu.org/g:f5099c0334b5b0d5561027e80dfac9815b056821
commit f5099c0334b5b0d5561027e80dfac9815b056821
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Thu Mar 31 12:33:40 2022 -0400
Update ChangeLog.meissner.
2022-03-31 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index dbe5df13eec..4633953a09f 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,20 @@
+==================== Work084, patch #3:
+
+Replace UNSPEC with RTL code for extendditi2.
+
+When I submitted my patch on March 12th for extendditi2, Segher wished I
+had removed the use of the UNSPEC for the vextsd2q instruction. This
+patch rewrites extendditi2_vector to use VEC_SELECT rather than UNSPEC.
+
+
+2022-03-30 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+ * config/rs6000/vsx.md (UNSPEC_EXTENDDITI2): Delete.
+ (extendditi2_vector): Rewrite to use VEC_SELECT as a
+ define_expand.
+ (extendditi2_vector2): New insn.
+
==================== Work084, patch #2:
Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
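In words, the non-UNSPEC form of extendditi2_vector means "select one doubleword from the vector, then sign-extend it to 128 bits"; VEC_SELECT expresses the first step. The C model below mirrors those two steps. It assumes, based on the ISA description of vextsd2q, that the source is doubleword 1 in big-endian numbering (the low-order doubleword). The struct layout and function names are hypothetical, and converting an out-of-range `uint64_t` to `int64_t` relies on GCC's modulo behavior.

```c
#include <stdint.h>

typedef __int128 s128;

/* Hypothetical V2DI register model: dw[0] is the high (left) doubleword,
   dw[1] the low (right) one, in big-endian numbering.  */
typedef struct { uint64_t dw[2]; } v2di_model;

/* Step 1, the VEC_SELECT part: pick one doubleword as a signed value.  */
int64_t select_dw (v2di_model v, int elt)
{
  return (int64_t) v.dw[elt];
}

/* Step 2, the sign extension: widen the selected doubleword to 128 bits.
   Assumed vextsd2q behavior: the source is doubleword 1.  */
s128 model_vextsd2q (v2di_model v)
{
  return (s128) select_dw (v, 1);
}
```

Spelling the operation this way, rather than as an UNSPEC, is what lets the RTL optimizers see what the instruction computes.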
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-03-31 14:01 Michael Meissner
From: Michael Meissner @ 2022-03-31 14:01 UTC
To: gcc-cvs
https://gcc.gnu.org/g:b3657557ead4e256255c25efba6478775dd77093
commit b3657557ead4e256255c25efba6478775dd77093
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Wed Mar 30 14:00:50 2022 -0400
Update ChangeLog.meissner.
2022-03-30 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index f646c26cfe5..dbe5df13eec 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -69,7 +69,7 @@ the regression tests. There were no regressions in the runs:
Can I install this into the trunk? After a burn-in period, can I backport
and install this into GCC 11 and GCC 10 branches?
-2022-03-29 Michael Meissner <meissner@linux.ibm.com>
+2022-03-30 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/99293
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-03-31 14:00 Michael Meissner
From: Michael Meissner @ 2022-03-31 14:00 UTC
To: gcc-cvs
https://gcc.gnu.org/g:d09dcf3bcb6b162e71e11ea6e5db86cdb2a38557
commit d09dcf3bcb6b162e71e11ea6e5db86cdb2a38557
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Tue Mar 29 19:57:09 2022 -0400
Update ChangeLog.meissner.
2022-03-29 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index cc395f2f000..f646c26cfe5 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,7 +1,14 @@
-==================== Work084, patch #1:
+==================== Work084, patch #2:
Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
+This is version 2 of the patch. The original patch was:
+
+| Date: Mon, 28 Mar 2022 12:26:02 -0400
+| Subject: [PATCH 1/4] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
+| Message-ID: <YkHhmvwSJF7DUDhJ@toto.the-meissners.org>
+| https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592420.html
+
In PR target/99293, it was pointed out that doing:
vector long long dest0, dest1, src;
@@ -44,15 +51,16 @@ out in a previous version of the patch, the combiner already tries doing
creating a (vec_duplicate (vec_select ...)) pattern, but we didn't provide
one.
-I rewrote the existing pattern vsx_xxspltd_<mode> to have a VEC_DUPLCIATE
-so that the case would match for the PR instead of using UNSPEC.
+This patch reworks vsx_xxspltd_<mode> for V2DImode and V2DFmode so that it
+no longer uses an UNSPEC. Instead it uses VEC_DUPLICATE, which the
+combiner checks for.
I have built Spec 2017 with this patch installed, and the cam4_r benchmark
-is the only benchmark that generated different code. On a power9, I did
-not notice any significant changes in the runtime of cam4_r.
+is the only benchmark that generated different code (3 mfvsrld/mtvsrdd
+pairs of instructions were replaced with xxpermdi).
-I have built bootstrap versions on the following systems. There were no
-regressions in the runs:
+I have built bootstrap versions on the following systems and I have run
+the regression tests. There were no regressions in the runs:
Power9 little endian, --with-cpu=power9
Power10 little endian, --with-cpu=power10
@@ -61,7 +69,7 @@ regressions in the runs:
Can I install this into the trunk? After a burn-in period, can I backport
and install this into GCC 11 and GCC 10 branches?
-2022-03-28 Michael Meissner <meissner@linux.ibm.com>
+2022-03-29 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/99293
@@ -75,10 +83,7 @@ gcc/testsuite:
* gcc.target/powerpc/builtins-1.c: Update insn count.
* gcc.target/powerpc/pr99293.c: New test.
-2022-03-28 Michael Meissner <meissner@linux.ibm.com>
-
-gcc/
- * ChangeLog.meissner: Update.
+==================== Work084, patch #1 (reverted):
==================== Work084, branch start
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-03-31 14:00 Michael Meissner
From: Michael Meissner @ 2022-03-31 14:00 UTC
To: gcc-cvs
https://gcc.gnu.org/g:44a156212f83ee928f771feb543c22642b2ea80e
commit 44a156212f83ee928f771feb543c22642b2ea80e
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Mon Mar 28 23:06:00 2022 -0400
Update ChangeLog.meissner.
2022-03-28 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 79 ++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 77 insertions(+), 2 deletions(-)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index 5293370cb48..cc395f2f000 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,12 +1,87 @@
==================== Work084, patch #1:
-Update ChangeLog.meissner.
+
+Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
+
+In PR target/99293, it was pointed out that doing:
+
+ vector long long dest0, dest1, src;
+ /* ... */
+ dest0 = vec_splats (vec_extract (src, 0));
+ dest1 = vec_splats (vec_extract (src, 1));
+
+would generate slower code.
+
+It generates the following code on power8:
+
+ ;; vec_splats (vec_extract (src, 0))
+ xxpermdi 0,34,34,3
+ xxpermdi 34,0,0,0
+
+ ;; vec_splats (vec_extract (src, 1))
+ xxlor 0,34,34
+ xxpermdi 34,0,0,0
+
+However on power9 and power10 it generates:
+
+ ;; vec_splats (vec_extract (src, 0))
+ mfvsrld 9,34
+ mtvsrdd 34,9,9
+
+ ;; vec_splats (vec_extract (src, 1))
+ mfvsrd 9,34
+ mtvsrdd 34,9,9
+
+This is because power9 added the mfvsrld instruction, which (together
+with mfvsrd) can extract either 64-bit element into a GPR. While there
+are alternatives for both vector registers and GPRs, the register
+allocator prefers to put DImode into GPRs.
+
+However, in this case it is better to have a single combiner pattern that
+can generate a single xxpermdi, instead of doing 2 insns (the extract
+and then the concat). This is particularly true when the two operations
+are a move from a vector register and a move to a vector register. As
+Segher pointed out in a previous version of the patch, the combiner
+already tries creating a (vec_duplicate (vec_select ...)) pattern, but
+we didn't provide one.
+
+I rewrote the existing pattern vsx_xxspltd_<mode> to use VEC_DUPLICATE
+instead of an UNSPEC, so that the combiner can match the case in the PR.
+
+I have built Spec 2017 with this patch installed, and the cam4_r benchmark
+is the only benchmark that generated different code. On a power9, I did
+not notice any significant changes in the runtime of cam4_r.
+
+I have built bootstrap versions on the following systems. There were no
+regressions in the runs:
+
+ Power9 little endian, --with-cpu=power9
+ Power10 little endian, --with-cpu=power10
+ Power8 big endian, --with-cpu=power8 (both 32-bit & 64-bit tests)
+
+Can I install this into the trunk? After a burn-in period, can I backport
+and install this into GCC 11 and GCC 10 branches?
+
+2022-03-28 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+ PR target/99293
+ * config/rs6000/rs6000-p8swap.cc (rtx_is_swappable_p): Remove
+ UNSPEC_VSX_XXSPLTD case.
+ * config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTD): Delete.
+ (vsx_xxspltd_<mode>): Rewrite to use VEC_DUPLICATE.
+
+gcc/testsuite:
+ PR target/99293
+ * gcc.target/powerpc/builtins-1.c: Update insn count.
+ * gcc.target/powerpc/pr99293.c: New test.
2022-03-28 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
+==================== Work084, branch start
+
2022-03-28 Michael Meissner <meissner@linux.ibm.com>
Clone branch
-
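To see why a single xxpermdi can replace the mfvsrld/mfvsrd plus mtvsrdd round trip in PR target/99293, here is a C model of the instructions involved. Doublewords are numbered big-endian style (dw[0] is the high doubleword), so on a little-endian target vec_extract element 0 corresponds to dw[1], which is why the power9 listing uses the mfvsrld form for element 0. The instruction semantics are paraphrased from the Power ISA and the function names are hypothetical; this illustrates the idea, not the vsx_xxspltd_<mode> pattern itself.

```c
#include <stdint.h>

/* Hypothetical 128-bit VSX register model, big-endian doubleword order.  */
typedef struct { uint64_t dw[2]; } vsr_model;

/* xxpermdi XT,XA,XB,DM: the high bit of DM picks which doubleword of XA
   goes to XT.dw[0]; the low bit picks which doubleword of XB goes to
   XT.dw[1].  */
vsr_model model_xxpermdi (vsr_model xa, vsr_model xb, unsigned dm)
{
  vsr_model xt;
  xt.dw[0] = xa.dw[(dm >> 1) & 1];
  xt.dw[1] = xb.dw[dm & 1];
  return xt;
}

/* mfvsrd / mfvsrld: move the high / low doubleword to a GPR.  */
uint64_t model_mfvsrd (vsr_model x)  { return x.dw[0]; }
uint64_t model_mfvsrld (vsr_model x) { return x.dw[1]; }

/* mtvsrdd XT,RA,RB with RA != 0: build a vector from two GPRs.  */
vsr_model model_mtvsrdd (uint64_t ra, uint64_t rb)
{
  vsr_model xt = { { ra, rb } };
  return xt;
}

/* Splat doubleword ELT via a GPR: two cross-register-file moves.  */
vsr_model splat_via_gpr (vsr_model src, int elt)
{
  uint64_t r9 = elt ? model_mfvsrld (src) : model_mfvsrd (src);
  return model_mtvsrdd (r9, r9);
}

/* Splat doubleword ELT with one permute: both DM bits equal ELT.  */
vsr_model splat_via_xxpermdi (vsr_model src, int elt)
{
  unsigned dm = elt ? 3 : 0;
  return model_xxpermdi (src, src, dm);
}
```

Both paths produce the same vector for either element; the difference is one VSX permute versus two moves between the register files, which is what the VEC_DUPLICATE form lets the combiner exploit.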
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-03-31 14:00 Michael Meissner
From: Michael Meissner @ 2022-03-31 14:00 UTC
To: gcc-cvs
https://gcc.gnu.org/g:b0b2e77a142f82f3361506ea0f1e02d24f3c93bb
commit b0b2e77a142f82f3361506ea0f1e02d24f3c93bb
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Mon Mar 28 20:12:27 2022 -0400
Update ChangeLog.meissner.
2022-03-28 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index d3cf56499f6..5293370cb48 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,11 @@
+==================== Work084, patch #1:
+Update ChangeLog.meissner.
+
+2022-03-28 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+ * ChangeLog.meissner: Update.
+
2022-03-28 Michael Meissner <meissner@linux.ibm.com>
Clone branch
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-03-30 18:01 Michael Meissner
From: Michael Meissner @ 2022-03-30 18:01 UTC
To: gcc-cvs
https://gcc.gnu.org/g:8f5781257ce998948f363aa6c9bbf72097b3f275
commit 8f5781257ce998948f363aa6c9bbf72097b3f275
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Wed Mar 30 14:00:50 2022 -0400
Update ChangeLog.meissner.
2022-03-30 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index f646c26cfe5..dbe5df13eec 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -69,7 +69,7 @@ the regression tests. There were no regressions in the runs:
Can I install this into the trunk? After a burn-in period, can I backport
and install this into GCC 11 and GCC 10 branches?
-2022-03-29 Michael Meissner <meissner@linux.ibm.com>
+2022-03-30 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/99293
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-03-29 23:57 Michael Meissner
From: Michael Meissner @ 2022-03-29 23:57 UTC
To: gcc-cvs
https://gcc.gnu.org/g:91455bff1f0cb862215c27bc3432e5181ae91a03
commit 91455bff1f0cb862215c27bc3432e5181ae91a03
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Tue Mar 29 19:57:09 2022 -0400
Update ChangeLog.meissner.
2022-03-29 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index cc395f2f000..f646c26cfe5 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,7 +1,14 @@
-==================== Work084, patch #1:
+==================== Work084, patch #2:
Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
+This is version 2 of the patch. The original patch was:
+
+| Date: Mon, 28 Mar 2022 12:26:02 -0400
+| Subject: [PATCH 1/4] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
+| Message-ID: <YkHhmvwSJF7DUDhJ@toto.the-meissners.org>
+| https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592420.html
+
In PR target/99293, it was pointed out that doing:
vector long long dest0, dest1, src;
@@ -44,15 +51,16 @@ out in a previous version of the patch, the combiner already tries doing
creating a (vec_duplicate (vec_select ...)) pattern, but we didn't provide
one.
-I rewrote the existing pattern vsx_xxspltd_<mode> to have a VEC_DUPLCIATE
-so that the case would match for the PR instead of using UNSPEC.
+This patch reworks vsx_xxspltd_<mode> for V2DImode and V2DFmode so that it
+no longer uses an UNSPEC. Instead it uses VEC_DUPLICATE, which the
+combiner checks for.
I have built Spec 2017 with this patch installed, and the cam4_r benchmark
-is the only benchmark that generated different code. On a power9, I did
-not notice any significant changes in the runtime of cam4_r.
+is the only benchmark that generated different code (3 mfvsrld/mtvsrdd
+pairs of instructions were replaced with xxpermdi).
-I have built bootstrap versions on the following systems. There were no
-regressions in the runs:
+I have built bootstrap versions on the following systems and I have run
+the regression tests. There were no regressions in the runs:
Power9 little endian, --with-cpu=power9
Power10 little endian, --with-cpu=power10
@@ -61,7 +69,7 @@ regressions in the runs:
Can I install this into the trunk? After a burn-in period, can I backport
and install this into GCC 11 and GCC 10 branches?
-2022-03-28 Michael Meissner <meissner@linux.ibm.com>
+2022-03-29 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/99293
@@ -75,10 +83,7 @@ gcc/testsuite:
* gcc.target/powerpc/builtins-1.c: Update insn count.
* gcc.target/powerpc/pr99293.c: New test.
-2022-03-28 Michael Meissner <meissner@linux.ibm.com>
-
-gcc/
- * ChangeLog.meissner: Update.
+==================== Work084, patch #1 (reverted):
==================== Work084, branch start
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-03-29 3:06 Michael Meissner
From: Michael Meissner @ 2022-03-29 3:06 UTC
To: gcc-cvs
https://gcc.gnu.org/g:955759242cc723ffecb0b47202e07335c9e21032
commit 955759242cc723ffecb0b47202e07335c9e21032
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Mon Mar 28 23:06:00 2022 -0400
Update ChangeLog.meissner.
2022-03-28 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 79 ++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 77 insertions(+), 2 deletions(-)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index 5293370cb48..cc395f2f000 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,12 +1,87 @@
==================== Work084, patch #1:
-Update ChangeLog.meissner.
+
+Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
+
+In PR target/99293, it was pointed out that doing:
+
+ vector long long dest0, dest1, src;
+ /* ... */
+ dest0 = vec_splats (vec_extract (src, 0));
+ dest1 = vec_splats (vec_extract (src, 1));
+
+would generate slower code.
+
+It generates the following code on power8:
+
+ ;; vec_splats (vec_extract (src, 0))
+ xxpermdi 0,34,34,3
+ xxpermdi 34,0,0,0
+
+ ;; vec_splats (vec_extract (src, 1))
+ xxlor 0,34,34
+ xxpermdi 34,0,0,0
+
+However on power9 and power10 it generates:
+
+ ;; vec_splats (vec_extract (src, 0))
+ mfvsrld 9,34
+ mtvsrdd 34,9,9
+
+ ;; vec_splats (vec_extract (src, 1))
+ mfvsrd 9,34
+ mtvsrdd 34,9,9
+
+This is because power9 added the mfvsrld instruction, which (together
+with mfvsrd) can extract either 64-bit element into a GPR. While there
+are alternatives for both vector registers and GPRs, the register
+allocator prefers to put DImode into GPRs.
+
+However, in this case it is better to have a single combiner pattern that
+can generate a single xxpermdi, instead of doing 2 insns (the extract
+and then the concat). This is particularly true when the two operations
+are a move from a vector register and a move to a vector register. As
+Segher pointed out in a previous version of the patch, the combiner
+already tries creating a (vec_duplicate (vec_select ...)) pattern, but
+we didn't provide one.
+
+I rewrote the existing pattern vsx_xxspltd_<mode> to use VEC_DUPLICATE
+instead of an UNSPEC, so that the combiner can match the case in the PR.
+
+I have built Spec 2017 with this patch installed, and the cam4_r benchmark
+is the only benchmark that generated different code. On a power9, I did
+not notice any significant changes in the runtime of cam4_r.
+
+I have built bootstrap versions on the following systems. There were no
+regressions in the runs:
+
+ Power9 little endian, --with-cpu=power9
+ Power10 little endian, --with-cpu=power10
+ Power8 big endian, --with-cpu=power8 (both 32-bit & 64-bit tests)
+
+Can I install this into the trunk? After a burn-in period, can I backport
+and install this into GCC 11 and GCC 10 branches?
+
+2022-03-28 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+ PR target/99293
+ * config/rs6000/rs6000-p8swap.cc (rtx_is_swappable_p): Remove
+ UNSPEC_VSX_XXSPLTD case.
+ * config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTD): Delete.
+ (vsx_xxspltd_<mode>): Rewrite to use VEC_DUPLICATE.
+
+gcc/testsuite:
+ PR target/99293
+ * gcc.target/powerpc/builtins-1.c: Update insn count.
+ * gcc.target/powerpc/pr99293.c: New test.
2022-03-28 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
+==================== Work084, branch start
+
2022-03-28 Michael Meissner <meissner@linux.ibm.com>
Clone branch
-
* [gcc(refs/users/meissner/heads/work084)] Update ChangeLog.meissner.
@ 2022-03-29 0:12 Michael Meissner
From: Michael Meissner @ 2022-03-29 0:12 UTC
To: gcc-cvs
https://gcc.gnu.org/g:98834384d9b299e60f24dba6ebf672599d4da925
commit 98834384d9b299e60f24dba6ebf672599d4da925
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Mon Mar 28 20:12:27 2022 -0400
Update ChangeLog.meissner.
2022-03-28 Michael Meissner <meissner@linux.ibm.com>
gcc/
* ChangeLog.meissner: Update.
Diff:
---
gcc/ChangeLog.meissner | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index d3cf56499f6..5293370cb48 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,11 @@
+==================== Work084, patch #1:
+Update ChangeLog.meissner.
+
+2022-03-28 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+ * ChangeLog.meissner: Update.
+
2022-03-28 Michael Meissner <meissner@linux.ibm.com>
Clone branch