public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner
@ 2023-02-01  6:28 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2023-02-01  6:28 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:3d0eb0096c1513f0fdf6ea7bea8721255a21860d

commit 3d0eb0096c1513f0fdf6ea7bea8721255a21860d
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Feb 1 01:27:58 2023 -0500

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index c361afa4260..d7a41436152 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,13 @@
+==================== dmf007, patch #39 ====================
+
+Fix dmr test.
+
+2023-02-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/dm-double-test.s: Fix test to use dmsetdmrz.
+
 ==================== dmf007, patch #38 ====================
 
 Bump up precision size to 11 bits.


* [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner
@ 2023-02-02  2:59 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2023-02-02  2:59 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:5c95330904db0dd2808c3717d1b4394c06fe7332

commit 5c95330904db0dd2808c3717d1b4394c06fe7332
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Feb 1 21:59:23 2023 -0500

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index be0f215b9a5..a62040e495d 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,5 +1,20 @@
 ==================== dmf007, patch #40 ====================
 
+Update lxvrl test for 32-bit.
+
+2023-02-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-string.c (expand_block_move): Only generate LXVL
+	and STXVL moves on 64-bit, since the pattern is not enabled on 32-bit.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/lxvrl.c: Restrict test to 64-bit.
+
+==================== dmf007, patch #40 ====================
+
 Bump up precision size to 16 bits.
 
 The new __dmr type that is being added as a possible future PowerPC instruction
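
The LXVL and STXVL moves named in the entry above take their byte count in an awkward position: as the patch #37 notes in this ChangeLog put it, the count must sit in the top 8 bits of the GPR, which means a shift left by 56 bits and therefore presupposes a 64-bit register.  The following is a minimal sketch of that arithmetic only, not GCC code.  The helper names len_for_lxvl and len_for_lxvrl are hypothetical, and the "right length" form reflects the proposed instructions, which may or may not ship.

```c
#include <stdint.h>

/* Hypothetical helper: length operand as the existing lxvl/stxvl forms
   expect it, with the byte count in the top 8 bits of a 64-bit GPR.  */
static uint64_t
len_for_lxvl (uint8_t nbytes)
{
  return (uint64_t) nbytes << 56;
}

/* Hypothetical helper: length operand for the proposed "right length"
   forms, with the byte count in the bottom 8 bits and no shift needed.  */
static uint64_t
len_for_lxvrl (uint8_t nbytes)
{
  return (uint64_t) nbytes;
}
```

For a 16-byte move, len_for_lxvl (16) is 0x1000000000000000, while len_for_lxvrl (16) is simply 16.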


* [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner
@ 2023-02-01 17:35 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2023-02-01 17:35 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:6569abbe7968409a5433508c8d618d90e9013f9e

commit 6569abbe7968409a5433508c8d618d90e9013f9e
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Feb 1 12:35:28 2023 -0500

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index d7a41436152..be0f215b9a5 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,16 +1,6 @@
-==================== dmf007, patch #39 ====================
-
-Fix dmr test.
+==================== dmf007, patch #40 ====================
 
-2023-02-01   Michael Meissner  <meissner@linux.ibm.com>
-
-gcc/testsuite/
-
-	* gcc.target/powerpc/dm-double-test.s: Fix test to use dmsetdmrz.
-
-==================== dmf007, patch #38 ====================
-
-Bump up precision size to 11 bits.
+Bump up precision size to 16 bits.
 
 The new __dmr type that is being added as a possible future PowerPC instruction
 set bumps into a structure field size issue.  The size of the __dmr type is 1024 bits.
@@ -28,13 +18,27 @@ before the two shifts, while the PowerPC always returns 0.  In the ccp pass, the
 original input is -1, and so it worked.  When I did the runs on the PowerPC, the
 result was 0, which ultimately led to the failure.
 
-2023-01-31   Michael Meissner  <meissner@linux.ibm.com>
+2023-02-01   Richard Biener  <rguenther@suse.de>
+	     Michael Meissner  <meissner@linux.ibm.com>
 
 gcc/
 
+	PR middle-end/108623
 	* hwint.h (sext_hwi): Add assertion against precision 0.
-	* tree-core.h (tree_type_common): Bump up precision field by 1 bit, and
-	reduce contains_placeholder_bits to 1 bit.
+	* tree-core.h (tree_type_common): Bump up precision field to 16 bits.
+	Align bit fields > 1 bit to at least an 8-bit boundary.
+
+==================== dmf007, patch #39 ====================
+
+Fix dmr test.
+
+2023-02-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/dm-double-test.s: Fix test to use dmsetdmrz.
+
+==================== dmf007, patch #38 was reverted ====================
 
 ==================== dmf007, patch #37 ====================
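
The effect of widening the precision field to 16 bits can be sketched in isolation.  This is an illustration under assumptions, not the real tree_type_common layout: struct type_bits16, its flag member, and store_precision16 are hypothetical names chosen for the example.

```c
/* Hypothetical stand-in for a type record with a 16-bit precision field.
   The real tree_type_common has many more members.  */
struct type_bits16
{
  unsigned precision : 16;	/* wide enough for a 1,024 bit type */
  unsigned flag : 1;		/* placeholder for a neighbouring 1-bit field */
};

/* Store BITS in the precision field and read the value back.  Unsigned
   bit-fields wrap modulo 2^16, so anything below 65,536 survives.  */
static unsigned
store_precision16 (unsigned bits)
{
  struct type_bits16 t;
  t.flag = 0;
  t.precision = bits;
  return t.precision;
}
```

With this layout, store_precision16 (1024) returns 1024 rather than wrapping to 0.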


* [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner
@ 2023-02-01  3:10 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2023-02-01  3:10 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:3f0725b61d02f647c03378cd536e83796212aef0

commit 3f0725b61d02f647c03378cd536e83796212aef0
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Jan 31 22:10:23 2023 -0500

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index a0d78e0f5aa..c361afa4260 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,31 @@
+==================== dmf007, patch #38 ====================
+
+Bump up precision size to 11 bits.
+
+The new __dmr type that is being added as a possible future PowerPC instruction
+set bumps into a structure field size issue.  The size of the __dmr type is 1024 bits.
+The precision field in tree_type_common is currently 10 bits, so if you store
+1,024 into the field, you get a 0 back.  When you get 0 in the precision field, the
+ccp pass passes this 0 to sext_hwi in hwint.h.  That function in turn generates
+a shift that is equal to the host wide int bit size, which is undefined behavior
+in C/C++, so the result is machine dependent.
+
+      int shift = HOST_BITS_PER_WIDE_INT - prec;
+      return ((HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) src << shift)) >> shift;
+
+It turns out that x86_64, where I first did my tests, returns the original input
+before the two shifts, while the PowerPC always returns 0.  In the ccp pass, the
+original input is -1, and so it worked.  When I did the runs on the PowerPC, the
+result was 0, which ultimately led to the failure.
+
+2023-01-31   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* hwint.h (sext_hwi): Add assertion against precision 0.
+	* tree-core.h (tree_type_common): Bump up precision field by 1 bit, and
+	reduce contains_placeholder_bits to 1 bit.
+
 ==================== dmf007, patch #37 ====================
 
 Support load/store vector with right length.
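
The precision wrap-around described under patch #38 can be reproduced outside GCC.  The following is a minimal sketch, not GCC code: struct type_bits, store_precision, and sext64 are hypothetical names, and int64_t stands in for HOST_WIDE_INT.  It models the 10-bit field and a sign-extension helper that, like the patched sext_hwi, asserts against a precision of 0 instead of shifting by the full word width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 10-bit precision field.  */
struct type_bits
{
  unsigned precision : 10;
};

/* Store BITS in the 10-bit field and read the value back.  Unsigned
   bit-fields wrap modulo 2^10, so 1,024 comes back as 0.  */
static unsigned
store_precision (unsigned bits)
{
  struct type_bits t;
  t.precision = bits;
  return t.precision;
}

/* Sign-extend the low PREC bits of SRC, modeled on the two-shift snippet
   quoted above.  Like the original, it relies on the compiler using an
   arithmetic right shift for signed values.  */
static int64_t
sext64 (int64_t src, unsigned prec)
{
  /* prec == 0 would give shift == 64, which is undefined behavior.  */
  assert (prec != 0 && prec <= 64);
  if (prec == 64)
    return src;
  int shift = 64 - (int) prec;
  return (int64_t) ((uint64_t) src << shift) >> shift;
}
```

Here store_precision (1024) yields 0, and handing that 0 to sext64 trips the assertion instead of silently producing a host-dependent value.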


* [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner
@ 2023-01-28  7:14 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2023-01-28  7:14 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:84d98611a2a43e289908e8ba0977101a218f19c3

commit 84d98611a2a43e289908e8ba0977101a218f19c3
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Sat Jan 28 02:14:38 2023 -0500

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index 36efa83d3a7..a0d78e0f5aa 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,4 +1,4 @@
-==================== dmf007, patch #37 was reverted ====================
+==================== dmf007, patch #37 ====================
 
 Support load/store vector with right length.
 
@@ -37,7 +37,7 @@ gcc/testsuite/
 
 	* gcc.target/powerpc/lxvrl.c: New test.
 
-==================== dmf007, patch #36 was reverted ====================
+==================== dmf007, patch #36 ====================
 
 Add saturating subtract built-ins.
 
@@ -87,7 +87,7 @@ gcc/testsuite/
 	* lib/target-supports.exp (check_effective_target_powerpc_future_ok):
 	New effective target.
 
-==================== dmf007, patch #35 was reverted ====================
+==================== dmf007, patch #35 ====================
 
 PowerPC: Add support for 1,024 bit DMR registers.
 
@@ -106,11 +106,6 @@ since there are no load/store dense math instructions.  I added the new keyword
 '__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
 don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.
 
-At present, the tree constant propagation patch does not work with 1,024 bit
-DMRs.  I believe this is due to the CCP pass not skipping opaque modes.  I hope
-once this patch is committed, we can work on the machine independent changes to
-allow the CCP pass not to issue an internal error when a DMR is used.
-
 The patches have been tested on the following platforms.  I added the patches
 for PR target/107299 that I submitted on November 2nd before doing the builds so
 that GCC would build on systems using IEEE 128-bit long double.
@@ -126,7 +121,7 @@ tests:
 
 Can I check this patch into the GCC 13 master branch?
 
-2023-01-27   Michael Meissner  <meissner@linux.ibm.com>
+2023-01-28   Michael Meissner  <meissner@linux.ibm.com>
 
 gcc/
 
@@ -175,7 +170,7 @@ gcc/testsuite/
 
 	* gcc.target/powerpc/dm-1024bit.c: New test.
 
-==================== dmf007, patch #34 was reverted ====================
+==================== dmf007, patch #34 ====================
 
 PowerPC: Switch to dense math names for all MMA operations.
 
@@ -235,7 +230,7 @@ gcc/testsuite/
 	* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
 	target test.
 
-==================== dmf007, patch #33 was reverted ====================
+==================== dmf007, patch #33 ====================
 
 PowerPC: Make MMA insns support DMR registers.
 
@@ -294,7 +289,7 @@ gcc/
 	prime the DMR registers or the xxmfacc instruction to de-prime
 	instructions if we have dense math register support.
 
-==================== dmf007, patch #32 was reverted ====================
+==================== dmf007, patch #32 ====================
 
 PowerPC: Add support for accumulators in DMR registers.
 
@@ -422,7 +417,7 @@ gcc/
 	* config/rs6000/rs6000.opt (-mdense-math): New switch.
 	* doc/md.texi (PowerPC constraints): Document wD constraint.
 
-==================== dmf007, patch #31 was reverted ====================
+==================== dmf007, patch #31 ====================
 
 PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.
 
@@ -455,7 +450,7 @@ gcc/
 	-mblock-ops-vector-pair.
 	(POWERPC_MASKS): Likewise.
 
-==================== dmf007, patch #30 was reverted ====================
+==================== dmf007, patch #30 ====================
 
 PowerPC: Add -mcpu=future.


* [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner
@ 2023-01-28  3:21 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2023-01-28  3:21 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:314be43dcc8dc1357519fe370c626fb8959f1aaf

commit 314be43dcc8dc1357519fe370c626fb8959f1aaf
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Jan 27 22:21:09 2023 -0500

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 598 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 598 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index b2df16891af..36efa83d3a7 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,601 @@
+==================== dmf007, patch #37 was reverted ====================
+
+Support load/store vector with right length.
+
+This patch adds support for new instructions that may be added to the PowerPC
+architecture in the future to enhance the load and store vector with length
+instructions.
+
+The current instructions (lxvl, lxvll, stxvl, and stxvll) are inconvenient to use
+since the count for the number of bytes must be in the top 8 bits of the GPR
+register, instead of the bottom 8 bits.  This meant that code generating these
+instructions typically had to do a shift left by 56 bits to get the count into
+the right position.  In a future version of the PowerPC architecture, new
+variants of these instructions might be added that expect the count to be in
+the bottom 8 bits of the GPR register.  These patches add this support to GCC
+if the user uses the -mcpu=future option.
+
+I tested this patch on a little endian power10 system with long double using
+the traditional IBM double double format.  Assuming the other 6 patches for
+-mcpu=future are checked in (or at least the first patch), can I check this
+patch into the master branch for GCC 13?
+
+2023-01-27   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/vsx.md (lxvl): If -mcpu=future, generate the lxvl with
+	the shift count automatically used in the insn.
+	(lxvrl): New insn for -mcpu=future.
+	(lxvrll): Likewise.
+	(stxvl): If -mcpu=future, generate the stxvl with the shift count
+	automatically used in the insn.
+	(stxvrl): New insn for -mcpu=future.
+	(stxvrll): Likewise.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/lxvrl.c: New test.
+
+==================== dmf007, patch #36 was reverted ====================
+
+Add saturating subtract built-ins.
+
+This patch adds support for a saturating subtract built-in function that may be
+added to a future PowerPC processor.  Note, if it is added, the name of the
+built-in function may change before GCC 13 is released.  If the name changes,
+we will submit a patch changing the name.
+
+I also added support for providing dense math built-in functions, even though
+at present, we have not added any new built-in functions for dense math.  It is
+likely we will want to add new dense math built-in functions as the dense math
+support is fleshed out.
+
+I tested this patch on a little endian power10 system with long double using
+the traditional IBM double double format.  Assuming the other 6 patches for
+-mcpu=future are checked in (or at least the first patch), can I check this
+patch into the master branch for GCC 13?
+
+2023-01-27   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Add support
+	for flagging invalid use of future built-in functions.
+	(rs6000_builtin_is_supported): Add support for future built-in
+	functions.
+	* config/rs6000/rs6000-builtins.def (__builtin_saturate_subtract32): New
+	built-in function for -mcpu=future.
+	(__builtin_saturate_subtract64): Likewise.
+	* config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add stanzas
+	for -mcpu=future built-ins.
+	(stanza_map): Likewise.
+	(enable_string): Likewise.
+	(struct attrinfo): Likewise.
+	(parse_bif_attrs): Likewise.
+	(write_decls): Likewise.
+	* config/rs6000/rs6000.md (sat_sub<mode>3): Add saturating subtract
+	built-in insn declarations.
+	(sat_sub<mode>3_dot): Likewise.
+	(sat_sub<mode>3_dot2): Likewise.
+	* doc/extend.texi (Future PowerPC built-ins): New section.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/subfus-1.c: New test.
+	* gcc.target/powerpc/subfus-2.c: Likewise.
+	* lib/target-supports.exp (check_effective_target_powerpc_future_ok):
+	New effective target.
+
+==================== dmf007, patch #35 was reverted ====================
+
+PowerPC: Add support for 1,024 bit DMR registers.
+
+This patch is a preliminary patch to add the full 1,024 bit dense math register
+(DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the top of the
+DMR register.
+
+This patch only adds the new 1,024 bit register support.  It does not add
+support for any instructions that need 1,024 bit registers instead of 512 bit
+registers.
+
+I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
+registers.  The 'wD' constraint added in previous patches is used for these
+registers.  I added support to do load and store of DMRs via the VSX registers,
+since there are no load/store dense math instructions.  I added the new keyword
+'__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
+don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.
+
+At present, the tree constant propagation patch does not work with 1,024 bit
+DMRs.  I believe this is due to the CCP pass not skipping opaque modes.  I hope
+once this patch is committed, we can work on the machine independent changes to
+allow the CCP pass not to issue an internal error when a DMR is used.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2023-01-27   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
+	(UNSPEC_DM_INSERT512_LOWER): Likewise.
+	(UNSPEC_DM_EXTRACT512): Likewise.
+	(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
+	(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
+	(movtdo): New define_expand and define_insn_and_split to implement 1,024
+	bit DMR registers.
+	(movtdo_insert512_upper): New insn.
+	(movtdo_insert512_lower): Likewise.
+	(movtdo_extract512): Likewise.
+	(reload_dmr_from_memory): Likewise.
+	(reload_dmr_to_memory): Likewise.
+	* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
+	support.
+	(rs6000_init_builtins): Add support for __dmr keyword.
+	* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
+	for TDOmode.
+	(rs6000_function_arg): Likewise.
+	* config/rs6000/rs6000-modes.def (TDOmode): New mode.
+	* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
+	support for TDOmode.
+	(rs6000_hard_regno_mode_ok_uncached): Likewise.
+	(rs6000_hard_regno_mode_ok): Likewise.
+	(rs6000_modes_tieable_p): Likewise.
+	(rs6000_debug_reg_global): Likewise.
+	(rs6000_setup_reg_addr_masks): Likewise.
+	(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
+	hooks for DMR mode.
+	(reg_offset_addressing_ok_p): Add support for TDOmode.
+	(rs6000_emit_move): Likewise.
+	(rs6000_secondary_reload_simple_move): Likewise.
+	(rs6000_secondary_reload_class): Likewise.
+	(rs6000_mangle_type): Add mangling for __dmr type.
+	(rs6000_dmr_register_move_cost): Add support for TDOmode.
+	(rs6000_split_multireg_move): Likewise.
+	(rs6000_invalid_conversion): Likewise.
+	* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
+	(enum rs6000_builtin_type_index): Add DMR type nodes.
+	(dmr_type_node): Likewise.
+	(ptr_dmr_type_node): Likewise.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/dm-1024bit.c: New test.
+
+==================== dmf007, patch #34 was reverted ====================
+
+PowerPC: Switch to dense math names for all MMA operations.
+
+This patch changes the assembler instruction names for MMA instructions from
+the original name used in power10 to the new name when used with the dense math
+system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
+same bits for either spelling.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2023-01-27   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/mma.md (vvi4i4i8_dm): New int attribute.
+	(avvi4i4i8_dm): Likewise.
+	(vvi4i4i2_dm): Likewise.
+	(avvi4i4i2_dm): Likewise.
+	(vvi4i4_dm): Likewise.
+	(avvi4i4_dm): Likewise.
+	(pvi4i2_dm): Likewise.
+	(apvi4i2_dm): Likewise.
+	(vvi4i4i4_dm): Likewise.
+	(avvi4i4i4_dm): Likewise.
+	(mma_<vv>): Add support for running on DMF systems, generating the dense
+	math instruction and using the dense math accumulators.
+	(mma_<avv>): Likewise.
+	(mma_<pv>): Likewise.
+	(mma_<apv>): Likewise.
+	(mma_<vvi4i4i8>): Likewise.
+	(mma_<avvi4i4i8>): Likewise.
+	(mma_<vvi4i4i2>): Likewise.
+	(mma_<avvi4i4i2>): Likewise.
+	(mma_<vvi4i4>): Likewise.
+	(mma_<avvi4i4>): Likewise.
+	(mma_<pvi4i2>): Likewise.
+	(mma_<apvi4i2>): Likewise.
+	(mma_<vvi4i4i4>): Likewise.
+	(mma_<avvi4i4i4>): Likewise.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/dm-double-test.c: New test.
+	* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
+	target test.
+
+==================== dmf007, patch #33 was reverted ====================
+
+PowerPC: Make MMA insns support DMR registers.
+
+This patch changes the MMA instructions to use either FPR registers
+(-mcpu=power10) or DMRs (-mcpu=future).  In this patch, the existing MMA
+instruction names are used.
+
+A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2023-01-27   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/mma.md (mma_<acc>): New define_expand to handle
+	mma_<acc> for dense math and non dense math.
+	(mma_<acc> insn): Restrict to non dense math.
+	(mma_xxsetaccz): Convert to define_expand to handle non dense math and
+	dense math.
+	(mma_xxsetaccz_p10): Rename from mma_xxsetaccz and restrict usage to non
+	dense math.
+	(mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
+	(mma_<vv>): Add support for dense math.
+	(mma_<avv>): Likewise.
+	(mma_<pv>): Likewise.
+	(mma_<apv>): Likewise.
+	(mma_<vvi4i4i8>): Likewise.
+	(mma_<avvi4i4i8>): Likewise.
+	(mma_<vvi4i4i2>): Likewise.
+	(mma_<avvi4i4i2>): Likewise.
+	(mma_<vvi4i4>): Likewise.
+	(mma_<avvi4i4>): Likewise.
+	(mma_<pvi4i2>): Likewise.
+	(mma_<apvi4i2>): Likewise.
+	(mma_<vvi4i4i4>): Likewise.
+	(mma_<avvi4i4i4>): Likewise.
+	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
+	__PPC_DMR__ if we have dense math instructions.
+	* config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
+	dense math and only FPRs if not dense math.
+	(rs6000_split_multireg_move): Do not generate the xxmtacc instruction to
+	prime the DMR registers or the xxmfacc instruction to de-prime
+	instructions if we have dense math register support.
+
+==================== dmf007, patch #32 was reverted ====================
+
+PowerPC: Add support for accumulators in DMR registers.
+
+The MMA subsystem added the notion of accumulator registers as an optional
+feature of ISA 3.1.  In ISA 3.1, these accumulators overlapped with the VSX
+vector registers 0..31, but logically the accumulator registers were separate
+from the FPR registers.  In ISA 3.1, it was anticipated that in future systems,
+the accumulator registers may not overlap with the FPR registers.  This patch
+adds the support for dense math registers as separate registers.
+
+These changes are preliminary.  They are expected to change over time.
+
+This particular patch does not change the MMA support to use the accumulators
+within the dense math registers.  This patch just adds the basic support for
+having separate DMRs.  The next patch will switch the MMA support to use the
+accumulators if -mcpu=future is used.
+
+For testing purposes, I added an undocumented option '-mdense-math' to enable
+or disable the dense math support.
+
+This patch adds a new constraint (wD).  If MMA is selected but dense math is
+not selected (i.e. -mcpu=power10), the wD constraint will allow access to
+accumulators that overlap with the VSX vector registers 0..31.  If both MMA and
+dense math are selected (i.e. -mcpu=future), the wD constraint will only allow
+dense math registers.
+
+This patch modifies the existing %A output modifier.  If MMA is selected but
+dense math is not selected, then %A output modifier converts the VSX register
+number to the accumulator number, by dividing it by 4.  If both MMA and dense
+math are selected, then %A will map the separate DMR registers into 0..7.
+
+The intention is that user code using extended asm can be modified to run on
+both MMA without dense math and MMA with dense math:
+
+    1)	If possible, don't use extended asm, but instead use the MMA built-in
+	functions;
+
+    2)	If you do need to write extended asm, change the d constraints
+	targeting accumulators to wD;
+
+    3)	Only use the built-in zero, assemble and disassemble functions to
+	move data between vector quad types and dense math accumulators.
+	I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz directly in the
+	extended asm code.  The reason is these instructions assume there is a
+	1-to-1 correspondence between 4 adjacent FPR registers and an
+	accumulator that overlaps with those registers.  With accumulators
+	now being separate registers, there no longer is a 1-to-1
+	correspondence.
+
+It is possible that the mangling for DMRs and the GDB register numbers may
+change in the future.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2023-01-27   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/constraints.md (wD constraint): New constraint.
+	* config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE_ACC): New unspec.
+	(movxo): Convert into define_expand.
+	(movxo_fpr): Version of movxo where accumulators overlap with FPRs.
+	(movxo_dm): Dense math version of movxo.
+	(mma_assemble_acc): Add dense math support to define_expand.
+	(mma_assemble_acc_fpr): Rename from mma_assemble_acc, and restrict it to
+	non dense math.
+	(mma_assemble_acc_dm): Dense math version of mma_assemble_acc.
+	(mma_disassemble_acc): Add dense math support to define_expand.
+	(mma_disassemble_acc_fpr): Rename from mma_disassemble_acc, and restrict
+	it to non dense math.
+	(mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc.
+	* config/rs6000/predicates.md (dmr_operand): New predicate.
+	(accumulator_operand): Likewise.
+	* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mdense-math.
+	(POWERPC_MASKS): Likewise.
+	* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
+	(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
+	(LAST_RELOAD_REG_CLASS): Add support for DMR registers.
+	(reload_reg_map): Likewise.
+	(rs6000_reg_names): Likewise.
+	(alt_reg_names): Likewise.
+	(rs6000_hard_regno_nregs_internal): Likewise.
+	(rs6000_hard_regno_mode_ok_uncached): Likewise.
+	(rs6000_debug_reg_global): Likewise.
+	(rs6000_setup_reg_addr_masks): Likewise.
+	(rs6000_init_hard_regno_mode_ok): Likewise.
+	(rs6000_option_override_internal): Add checking for -mdense-math.
+	(rs6000_secondary_reload_memory): Add support for DMR registers.
+	(rs6000_secondary_reload_simple_move): Likewise.
+	(rs6000_preferred_reload_class): Likewise.
+	(rs6000_secondary_reload_class): Likewise.
+	(print_operand): Make %A handle both FPRs and DMRs.
+	(rs6000_dmr_register_move_cost): New helper function.
+	(rs6000_register_move_cost): Add support for DMR registers.
+	(rs6000_memory_move_cost): Likewise.
+	(rs6000_compute_pressure_classes): Likewise.
+	(rs6000_debugger_regno): Likewise.
+	(rs6000_opt_masks): Add -mdense-math.
+	(rs6000_split_multireg_move): Add support for DMRs.
+	* config/rs6000/rs6000.h (UNITS_PER_DMR_WORD): New macro.
+	(FIRST_PSEUDO_REGISTER): Update for DMRs.
+	(FIXED_REGISTERS): Add DMRs.
+	(CALL_REALLY_USED_REGISTERS): Likewise.
+	(REG_ALLOC_ORDER): Likewise.
+	(enum reg_class): Add DM_REGS.
+	(REG_CLASS_NAMES): Likewise.
+	(REG_CLASS_CONTENTS): Likewise.
+	* config/rs6000/rs6000.md (FIRST_DMR_REGNO): New constant.
+	(LAST_DMR_REGNO): Likewise.
+	(isa attribute): Add 'dm' and 'not_dm' attributes.
+	(enabled attribute): Support 'dm' and 'not_dm' attributes.
+	* config/rs6000/rs6000.opt (-mdense-math): New switch.
+	* doc/md.texi (PowerPC constraints): Document wD constraint.
+
+==================== dmf007, patch #31 was reverted ====================
+
+PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.
+
+This patch enables generating load and store vector pair instructions when
+doing certain memory copy operations when -mcpu=future is used.  In doing tests
+on power10, it was determined that using these instructions was problematic
+in a few cases, so we disabled generating them by default.  This patch
+re-enables generating these instructions if -mcpu=future is used.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2023-01-27   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add
+	-mblock-ops-vector-pair.
+	(POWERPC_MASKS): Likewise.
+
+==================== dmf007, patch #30 was reverted ====================
+
+PowerPC: Add -mcpu=future.
+
+These patches implement support for potential future PowerPC cpus.  At this
+time, features enabled with -mcpu=future may or may not be in actual PowerPCs
+that will be delivered in the future.
+
+This patch adds support for the -mcpu=future and -mtune=future options.
+If you use -mcpu=future, the macro __ARCH_PWR_FUTURE__ is defined, and the
+assembler .machine directive "future" is used.  Future patches in this
+series will add support for new instructions that may be present in future
+PowerPC processors.
+
+At present, we do not have any specific differences in terms of cpu tuning for
+future machines, so we make -mtune=future act the same as -mtune=power10.  The
+costs are also cloned from the power10.  It is anticipated that we may change
+the tuning characteristics for -mtune=future at a later time.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+	* https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2023-01-27   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/power10.md (power10-load): Temporarily treat
+	-mcpu=future the same as -mcpu=power10.
+	(power10-fused-load): Likewise.
+	(power10-prefixed-load): Likewise.
+	(power10-prefixed-load): Likewise.
+	(power10-load-update): Likewise.
+	(power10-fpload-double): Likewise.
+	(power10-fpload-double): Likewise.
+	(power10-prefixed-fpload-double): Likewise.
+	(power10-prefixed-fpload-double): Likewise.
+	(power10-fpload-update-double): Likewise.
+	(power10-fpload-single): Likewise.
+	(power10-fpload-update-single): Likewise.
+	(power10-vecload): Likewise.
+	(power10-vecload-pair): Likewise.
+	(power10-store): Likewise.
+	(power10-fused-store): Likewise.
+	(power10-prefixed-store): Likewise.
+	(power10-prefixed-store): Likewise.
+	(power10-store-update): Likewise.
+	(power10-vecstore-pair): Likewise.
+	(power10-larx): Likewise.
+	(power10-lq): Likewise.
+	(power10-stcx): Likewise.
+	(power10-stq): Likewise.
+	(power10-sync): Likewise.
+	(power10-sync): Likewise.
+	(power10-alu): Likewise.
+	(power10-fused_alu): Likewise.
+	(power10-paddi): Likewise.
+	(power10-rot): Likewise.
+	(power10-rot-compare): Likewise.
+	(power10-alu2): Likewise.
+	(power10-cmp): Likewise.
+	(power10-two): Likewise.
+	(power10-three): Likewise.
+	(power10-mul): Likewise.
+	(power10-mul-compare): Likewise.
+	(power10-div): Likewise.
+	(power10-div-compare): Likewise.
+	(power10-crlogical): Likewise.
+	(power10-mfcrf): Likewise.
+	(power10-mfcr): Likewise.
+	(power10-mtcr): Likewise.
+	(power10-mtjmpr): Likewise.
+	(power10-mfjmpr): Likewise.
+	(power10-mfjmpr): Likewise.
+	(power10-fpsimple): Likewise.
+	(power10-fp): Likewise.
+	(power10-fpcompare): Likewise.
+	(power10-sdiv): Likewise.
+	(power10-ddiv): Likewise.
+	(power10-sqrt): Likewise.
+	(power10-dsqrt): Likewise.
+	(power10-vec-2cyc): Likewise.
+	(power10-fused-vec): Likewise.
+	(power10-veccmp): Likewise.
+	(power10-vecsimple): Likewise.
+	(power10-vecnormal): Likewise.
+	(power10-qp): Likewise.
+	(power10-vecperm): Likewise.
+	(power10-vecperm-compare): Likewise.
+	(power10-prefixed-vecperm): Likewise.
+	(power10-veccomplex): Likewise.
+	(power10-vecfdiv): Likewise.
+	(power10-vecdiv): Likewise.
+	(power10-qpdiv): Likewise.
+	(power10-qpmul): Likewise.
+	(power10-mtvsr): Likewise.
+	(power10-mfvsr): Likewise.
+	(power10-mfvsr): Likewise.
+	(power10-branch): Likewise.
+	(power10-fused-branch): Likewise.
+	(power10-crypto): Likewise.
+	(power10-htm): Likewise.
+	(power10-htm): Likewise.
+	(power10-dfp): Likewise.
+	(power10-dfpq): Likewise.
+	(power10-mma): Likewise.
+	(power10-prefixed-mma): Likewise.
+	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
+	__ARCH_PWR_FUTURE__ if -mcpu=future.
+	* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro.
+	(POWERPC_MASKS): Add -mcpu=future.
+	* config/rs6000/rs6000-opts.h (enum processor_type): Add
+	PROCESSOR_FUTURE.
+	* config/rs6000/rs6000-tables.opt: Regenerate.
+	* config/rs6000/rs6000.cc (future_costs): Add -mcpu=future support.
+	Make -mtune=future act like -mtune=power10 for now.
+	(rs6000_option_override_internal): Likewise.
+	(rs6000_machine_from_flags): Likewise.
+	(rs6000_reassociation_width): Likewise.
+	(rs6000_adjust_cost): Likewise.
+	(rs6000_issue_rate): Likewise.
+	(rs6000_sched_reorder): Likewise.
+	(rs6000_sched_reorder2): Likewise.
+	(rs6000_register_move_cost): Likewise.
+	(rs6000_opt_masks): Add -mfuture.
+	* config/rs6000/rs6000.h (ASM_CPU_SUPPORT): Likewise.
+	* config/rs6000/rs6000.md (cpu attribute): Add -mcpu=future support.
+	* config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch.
+	* doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document -mcpu=future.
+
 ==================== dmf007, rebase 1/23 ====================
 
 ==================== dmf007, patch #17 was reverted ====================

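The description of patch #30 above says that -mcpu=future defines the macro
__ARCH_PWR_FUTURE__.  A minimal sketch of how user code might test for it; the
helper name built_for_future_cpu is hypothetical and is not part of GCC:

```c
/* Compile-time check for the macro that patch #30 says -mcpu=future defines.
   BUILT_FOR_FUTURE_CPU and built_for_future_cpu are illustrative names.  */
#if defined(__ARCH_PWR_FUTURE__)
#define BUILT_FOR_FUTURE_CPU 1
#else
#define BUILT_FOR_FUTURE_CPU 0
#endif

/* Returns 1 if this translation unit was compiled with the macro defined,
   and 0 otherwise (for example with -mcpu=power10 or on another target).  */
int built_for_future_cpu(void)
{
    return BUILT_FOR_FUTURE_CPU;
}
```

Since the features behind -mcpu=future may or may not ship, code guarded this
way should keep a fallback path for existing processors.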
^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner
@ 2023-01-23 21:19 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2023-01-23 21:19 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:6ea543413e3bb89da9c74683613bb1aa30daed6b

commit 6ea543413e3bb89da9c74683613bb1aa30daed6b
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon Jan 23 16:19:23 2023 -0500

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index 501730114f7..796b3527b53 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,5 @@
+==================== dmf007, rebase 1/23 ====================
+
 ==================== dmf007, patch #17 ====================
 
 Support load/store vector with right length.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner
@ 2023-01-23 21:19 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2023-01-23 21:19 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:1c5640bece352dc465547b204d5d327a4f83e0ee

commit 1c5640bece352dc465547b204d5d327a4f83e0ee
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Jan 20 22:26:57 2023 -0500

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 514 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 514 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index 83e9ee06ffc..501730114f7 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,517 @@
+==================== dmf007, patch #17 ====================
+
+Support load/store vector with right length.
+
+This patch adds support for new instructions that may be added to the PowerPC
+architecture in the future to enhance the load and store vector with length
+instructions.
+
+The current instructions (lxvl, lxvll, stxvl, and stxvll) are inconvenient to use
+since the count for the number of bytes must be in the top 8 bits of the GPR
+register, instead of the bottom 8 bits.  This meant that code generating these
+instructions typically had to do a shift left by 56 bits to get the count into
+the right position.  In a future version of the PowerPC architecture, new
+variants of these instructions might be added that expect the count to be in
+the bottom 8 bits of the GPR register.  These patches add this support to GCC
+if the user uses the -mcpu=future option.
+
+I tested this patch on a little endian power10 system with long double using
+the traditional IBM double double format.  Assuming the other 6 patches for
+-mcpu=future are checked in (or at least the first patch), can I check this
+patch into the master branch for GCC 13?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/vsx.md (lxvl): If -mcpu=future, generate the lxvl with
+	the shift count automatically used in the insn.
+	(lxvrl): New insn for -mcpu=future.
+	(lxvrll): Likewise.
+	(stxvl): If -mcpu=future, generate the stxvl with the shift count
+	automatically used in the insn.
+	(stxvrl): New insn for -mcpu=future.
+	(stxvrll): Likewise.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/lxvrl.c: New test.
+
+==================== dmf007, patch #16 ====================
+
+Add saturating subtract built-ins.
+
+This patch adds support for a saturating subtract built-in function that may be
+added to a future PowerPC processor.  Note, if it is added, the name of the
+built-in function may change before GCC 13 is released.  If the name changes,
+we will submit a patch changing the name.
+
+I also added support for providing dense math built-in functions, even though
+at present, we have not added any new built-in functions for dense math.  It is
+likely we will want to add new dense math built-in functions as the dense math
+support is fleshed out.
+
+I tested this patch on a little endian power10 system with long double using
+the traditional IBM double double format.  Assuming the other 6 patches for
+-mcpu=future are checked in (or at least the first patch), can I check this
+patch into the master branch for GCC 13?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Add support
+	for flagging invalid use of future built-in functions.
+	(rs6000_builtin_is_supported): Add support for future built-in
+	functions.
+	* config/rs6000/rs6000-builtins.def (__builtin_saturate_subtract32): New
+	built-in function for -mcpu=future.
+	(__builtin_saturate_subtract64): Likewise.
+	* config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add stanzas
+	for -mcpu=future built-ins.
+	(stanza_map): Likewise.
+	(enable_string): Likewise.
+	(struct attrinfo): Likewise.
+	(parse_bif_attrs): Likewise.
+	(write_decls): Likewise.
+	* config/rs6000/rs6000.md (sat_sub<mode>3): Add saturating subtract
+	built-in insn declarations.
+	(sat_sub<mode>3_dot): Likewise.
+	(sat_sub<mode>3_dot2): Likewise.
+	* doc/extend.texi (Future PowerPC built-ins): New section.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/subfus-1.c: New test.
+	* gcc.target/powerpc/subfus-2.c: Likewise.
+	* lib/target-supports.exp (check_effective_target_powerpc_future_ok):
+	New effective target.
+
+==================== dmf007, patch #15 ====================
+
+PowerPC: Add support for 1,024 bit DMR registers.
+
+This patch is a preliminary patch to add the full 1,024 bit dense math registers
+(DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the top of the
+DMR register.
+
+This patch only adds the new 1,024 bit register support.  It does not add
+support for any instructions that need 1,024 bit registers instead of 512 bit
+registers.
+
+I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
+registers.  The 'wD' constraint added in previous patches is used for these
+registers.  I added support to do load and store of DMRs via the VSX registers,
+since there are no load/store dense math instructions.  I added the new keyword
+'__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
+don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.
+
+At present, the tree constant propagation pass does not work with 1,024 bit
+DMRs.  I believe this is due to the CCP pass not skipping opaque modes.  I hope
+once this patch is committed, we can work on the machine independent changes to
+allow the CCP pass not to issue an internal error when a DMR is used.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
+	(UNSPEC_DM_INSERT512_LOWER): Likewise.
+	(UNSPEC_DM_EXTRACT512): Likewise.
+	(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
+	(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
+	(movtdo): New define_expand and define_insn_and_split to implement 1,024
+	bit DMR registers.
+	(movtdo_insert512_upper): New insn.
+	(movtdo_insert512_lower): Likewise.
+	(movtdo_extract512): Likewise.
+	(reload_dmr_from_memory): Likewise.
+	(reload_dmr_to_memory): Likewise.
+	* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
+	support.
+	(rs6000_init_builtins): Add support for __dmr keyword.
+	* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
+	for TDOmode.
+	(rs6000_function_arg): Likewise.
+	* config/rs6000/rs6000-modes.def (TDOmode): New mode.
+	* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
+	support for TDOmode.
+	(rs6000_hard_regno_mode_ok_uncached): Likewise.
+	(rs6000_hard_regno_mode_ok): Likewise.
+	(rs6000_modes_tieable_p): Likewise.
+	(rs6000_debug_reg_global): Likewise.
+	(rs6000_setup_reg_addr_masks): Likewise.
+	(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
+	hooks for DMR mode.
+	(reg_offset_addressing_ok_p): Add support for TDOmode.
+	(rs6000_emit_move): Likewise.
+	(rs6000_secondary_reload_simple_move): Likewise.
+	(rs6000_secondary_reload_class): Likewise.
+	(rs6000_mangle_type): Add mangling for __dmr type.
+	(rs6000_dmr_register_move_cost): Add support for TDOmode.
+	(rs6000_split_multireg_move): Likewise.
+	(rs6000_invalid_conversion): Likewise.
+	* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
+	(enum rs6000_builtin_type_index): Add DMR type nodes.
+	(dmr_type_node): Likewise.
+	(ptr_dmr_type_node): Likewise.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/dm-1024bit.c: New test.
+
+==================== dmf007, patch #14 ====================
+
+PowerPC: Switch to dense math names for all MMA operations.
+
+This patch changes the assembler instruction names for MMA instructions from
+the original name used in power10 to the new name when used with the dense math
+system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
+same bits for either spelling.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-11-09   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/mma.md (vvi4i4i8_dm): New int attribute.
+	(avvi4i4i8_dm): Likewise.
+	(vvi4i4i2_dm): Likewise.
+	(avvi4i4i2_dm): Likewise.
+	(vvi4i4_dm): Likewise.
+	(avvi4i4_dm): Likewise.
+	(pvi4i2_dm): Likewise.
+	(apvi4i2_dm): Likewise.
+	(vvi4i4i4_dm): Likewise.
+	(avvi4i4i4_dm): Likewise.
+	(mma_<vv>): Add support for running on DMF systems, generating the dense
+	math instruction and using the dense math accumulators.
+	(mma_<avv>): Likewise.
+	(mma_<pv>): Likewise.
+	(mma_<apv>): Likewise.
+	(mma_<vvi4i4i8>): Likewise.
+	(mma_<avvi4i4i8>): Likewise.
+	(mma_<vvi4i4i2>): Likewise.
+	(mma_<avvi4i4i2>): Likewise.
+	(mma_<vvi4i4>): Likewise.
+	(mma_<avvi4i4>): Likewise.
+	(mma_<pvi4i2>): Likewise.
+	(mma_<apvi4i2>): Likewise.
+	(mma_<vvi4i4i4>): Likewise.
+	(mma_<avvi4i4i4>): Likewise.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/dm-double-test.c: New test.
+	* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
+	target test.
+
+==================== dmf007, patch #13 ====================
+
+PowerPC: Make MMA insns support DMR registers.
+
+This patch changes the MMA instructions to use either FPR registers
+(-mcpu=power10) or DMRs (-mcpu=future).  In this patch, the existing MMA
+instruction names are used.
+
+A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/mma.md (mma_<acc>): New define_expand to handle
+	mma_<acc> for dense math and non dense math.
+	(mma_<acc> insn): Restrict to non dense math.
+	(mma_xxsetaccz): Convert to define_expand to handle non dense math and
+	dense math.
+	(mma_xxsetaccz_p10): Rename from mma_xxsetaccz and restrict usage to non
+	dense math.
+	(mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
+	(mma_<vv>): Add support for dense math.
+	(mma_<avv>): Likewise.
+	(mma_<pv>): Likewise.
+	(mma_<apv>): Likewise.
+	(mma_<vvi4i4i8>): Likewise.
+	(mma_<avvi4i4i8>): Likewise.
+	(mma_<vvi4i4i2>): Likewise.
+	(mma_<avvi4i4i2>): Likewise.
+	(mma_<vvi4i4>): Likewise.
+	(mma_<avvi4i4>): Likewise.
+	(mma_<pvi4i2>): Likewise.
+	(mma_<apvi4i2>): Likewise.
+	(mma_<vvi4i4i4>): Likewise.
+	(mma_<avvi4i4i4>): Likewise.
+	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
+	__PPC_DMR__ if we have dense math instructions.
+	* config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
+	dense math and only FPRs if not dense math.
+	(rs6000_split_multireg_move): Do not generate accumulator prime or
+	de-prime instructions if dense math.
+
+==================== dmf007, patch #12 ====================
+
+PowerPC: Add support for accumulators in DMR registers.
+
+The MMA system added the notion of accumulator registers.  In power10, these
+accumulators overlapped with the FPR registers, but logically the accumulators
+were separate from the FPR registers.  It is anticipated that in future
+systems, we may have a separate dense math unit and the accumulators will be
+mapped onto the new dense math registers (DMRs).  This patch adds the support
+for dense math registers.
+
+These changes are preliminary.  They are expected to change over time.
+
+This particular patch does not change the MMA support to use the accumulators
+within the dense math registers.  This patch just adds the basic support for
+having separate DMRs.  The next patch will switch the MMA support to use the
+accumulators if -mcpu=future is used.
+
+For testing purposes, I added an undocumented option '-mdense-math' to enable
+or disable the dense math support.
+
+This patch adds a new constraint (wD).  If MMA is selected but dense math is
+not selected (i.e. -mcpu=power10), the wD constraint will match accumulators
+that overlap with the FPRs.  If both MMA and dense math are selected
+(i.e. -mcpu=future), the wD constraint will only match DMRs.
+
+This patch modifies the existing %A output modifier.  If MMA is selected but
+dense math is not selected, then %A converts the FPR register number to the
+accumulator number.  If both MMA and dense math are selected, then %A will only
+work if the register is an accumulator mapped onto a DMR.
+
+The intention is that user code using extended asm can be modified to run on
+both MMA without dense math and MMA with dense math:
+
+    1)	If possible, don't use extended asm, but instead use the MMA built-in
+	functions;
+
+    2)	If you do need to write extended asm, change the d constraints
+	targeting accumulators to use wD;
+
+    3)	Only use the built-in zero, assemble and disassemble functions to
+	move data between vector quad types and dense math accumulators.
+	I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz directly in the
+	extended asm code.  The reason is these instructions assume there is a
+	1-to-1 correspondence between 4 adjacent FPR registers and an
+	accumulator that overlaps with those registers.  With accumulators
+	now being separate registers, there no longer is a 1-to-1
+	correspondence.
+
+It is possible that the mangling for DMRs and the GDB register numbers may
+change in the future.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/constraints.md (wD constraint): New constraint.
+	* config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE_ACC): New unspec.
+	(movxo): Convert into define_expand.
+	(movxo_fpr): Version of movxo where accumulators overlap with FPRs.
+	(movxo_dm): Dense math version of movxo.
+	(mma_assemble_acc): Add dense math support to define_expand.
+	(mma_assemble_acc_fpr): Rename from mma_assemble_acc, and restrict it to
+	non dense math.
+	(mma_assemble_acc_dm): Dense math version of mma_assemble_acc.
+	(mma_disassemble_acc): Add dense math support to define_expand.
+	(mma_disassemble_acc_fpr): Rename from mma_disassemble_acc, and restrict
+	it to non dense math.
+	(mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc.
+	* config/rs6000/predicates.md (dmr_operand): New predicate.
+	(accumulator_operand): Likewise.
+	* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mdense-math.
+	(POWERPC_MASKS): Likewise.
+	* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
+	(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
+	(LAST_RELOAD_REG_CLASS): Add support for DMR registers.
+	(reload_reg_map): Likewise.
+	(rs6000_reg_names): Likewise.
+	(alt_reg_names): Likewise.
+	(rs6000_hard_regno_nregs_internal): Likewise.
+	(rs6000_hard_regno_mode_ok_uncached): Likewise.
+	(rs6000_debug_reg_global): Likewise.
+	(rs6000_setup_reg_addr_masks): Likewise.
+	(rs6000_init_hard_regno_mode_ok): Likewise.
+	(rs6000_option_override_internal): Add checking for -mdense-math.
+	(rs6000_secondary_reload_memory): Add support for DMR registers.
+	(rs6000_secondary_reload_simple_move): Likewise.
+	(rs6000_preferred_reload_class): Likewise.
+	(rs6000_secondary_reload_class): Likewise.
+	(print_operand): Make %A handle both FPRs and DMRs.
+	(rs6000_dmr_register_move_cost): New helper function.
+	(rs6000_register_move_cost): Add support for DMR registers.
+	(rs6000_memory_move_cost): Likewise.
+	(rs6000_compute_pressure_classes): Likewise.
+	(rs6000_debugger_regno): Likewise.
+	(rs6000_opt_masks): Add -mdense-math.
+	(rs6000_split_multireg_move): Add support for DMRs.
+	* config/rs6000/rs6000.h (UNITS_PER_DMR_WORD): New macro.
+	(FIRST_PSEUDO_REGISTER): Update for DMRs.
+	(FIXED_REGISTERS): Add DMRs.
+	(CALL_REALLY_USED_REGISTERS): Likewise.
+	(REG_ALLOC_ORDER): Likewise.
+	(enum reg_class): Add DM_REGS.
+	(REG_CLASS_NAMES): Likewise.
+	(REG_CLASS_CONTENTS): Likewise.
+	* config/rs6000/rs6000.md (FIRST_DMR_REGNO): New constant.
+	(LAST_DMR_REGNO): Likewise.
+	(isa attribute): Add 'dm' and 'not_dm' attributes.
+	(enabled attribute): Support 'dm' and 'not_dm' attributes.
+	* config/rs6000/rs6000.opt (-mdense-math): New switch.
+	* doc/md.texi (PowerPC constraints): Document wD constraint.
+
+==================== dmf007, patch #10 ====================
+
+PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.
+
+This patch enables generating load and store vector pair instructions when
+doing certain memory copy operations when -mcpu=future is used.  In doing tests
+on power10, it was determined that using these instructions was problematic
+in a few cases, so we disabled generating them by default.  This patch
+re-enables generating these instructions if -mcpu=future is used.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add
+	-mblock-ops-vector-pair.
+	(POWERPC_MASKS): Likewise.
+
+==================== dmf007, patch #10 ====================
+
+PowerPC: Add -mcpu=future.
+
+This patch adds support for the -mcpu=future and -mtune=future options.
+Besides defining __ARCH_PWR_FUTURE__ this particular patch does not enable any
+new features.
+
+At present, we do not have any specific differences in terms of cpu tuning for
+future machines, so we make -mtune=future act the same as -mtune=power10.  It
+is anticipated that we may add support for changing the tuning characteristics
+for -mtune=future at a later time.
+
+These patches implement support for potential future PowerPC cpus.  At this
+time, features enabled with -mcpu=future may or may not be in actual PowerPCs
+that will be delivered in the future.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
+	__ARCH_PWR_FUTURE__ if -mcpu=future.
+	* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro.
+	(POWERPC_MASKS): Add -mfuture.
+	* config/rs6000/rs6000-opts.h (enum processor_type): Add
+	PROCESSOR_FUTURE.
+	* config/rs6000/rs6000-tables.opt: Regenerate.
+	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Add
+	-mcpu=future support.  Make -mtune=future act like -mtune=power10 for
+	now.
+	(rs6000_machine_from_flags): Likewise.
+	(rs6000_reassociation_width): Likewise.
+	(rs6000_adjust_cost): Likewise.
+	(rs6000_issue_rate): Likewise.
+	(rs6000_sched_reorder): Likewise.
+	(rs6000_sched_reorder2): Likewise.
+	(rs6000_register_move_cost): Likewise.
+	(rs6000_opt_masks): Add -mfuture.
+	* config/rs6000/rs6000.h (ASM_CPU_SUPPORT): Likewise.
+	* config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch.
+	* config/rs6000/rs6000.md (cpu attribute): Add -mcpu=future support.
+	* doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document -mcpu=future.
+
 ==================== dmf007, patch #2 ====================
 
 Rework 128-bit complex multiply and divide.

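Patch #17 above notes that lxvl, lxvll, stxvl, and stxvll expect the byte count
in the top 8 bits of the GPR, while the possible new variants would read it
from the bottom 8 bits.  A sketch in plain C of the operand preparation only;
all function names are hypothetical, no actual instructions are emitted, and
the right-justified variants may or may not appear in a future architecture:

```c
#include <stdint.h>

/* Operand for the existing left-justified form: the count has to be moved
   into the top 8 bits of the 64-bit register, hence the shift left by 56.  */
uint64_t length_operand_left_justified(uint64_t nbytes)
{
    return nbytes << 56;
}

/* Operand for the possible right-justified form: the count already sits in
   the bottom 8 bits, so no shift is needed.  */
uint64_t length_operand_right_justified(uint64_t nbytes)
{
    return nbytes;
}

/* Models of which bits each form would read from the register.  */
uint64_t count_read_from_top(uint64_t reg)
{
    return reg >> 56;
}

uint64_t count_read_from_bottom(uint64_t reg)
{
    return reg & 0xff;
}
```

The extra shift in the first helper is the cost the right-justified variants
are meant to remove.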
^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner
@ 2023-01-23 21:18 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2023-01-23 21:18 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:121b1d3e9608a575e6c6d4c7ba6e6ed49b93340c

commit 121b1d3e9608a575e6c6d4c7ba6e6ed49b93340c
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Jan 20 17:22:47 2023 -0500

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 141 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 140 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index fda09a48c25..83e9ee06ffc 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,4 +1,143 @@
+==================== dmf007, patch #2 ====================
+
+Rework 128-bit complex multiply and divide.
+
+This patch reworks how the complex multiply and divide built-in functions are
+done.  Previously we created built-in declarations for doing long double complex
+multiply and divide when long double is IEEE 128-bit.  The old code also did not
+support __ibm128 complex multiply and divide if long double is IEEE 128-bit.
+
+In terms of history, I wrote the original code just as I was starting to test
+GCC on systems where IEEE 128-bit long double was the default.  At the time, we
+had not yet started mangling the built-in function names as a way to bridge
+going from a system with 128-bit IBM long double to 128-bit IEEE long double.
+
+The original code depends on there only being two 128-bit types involved.  With
+the next patch in this series, this assumption will no longer be true.  When
+long double is IEEE 128-bit, there will be 2 IEEE 128-bit types (one for the
+explicit __float128/_Float128 type and one for long double).
+
+The problem is we cannot create two separate built-in functions that resolve to
+the same name.  This is a requirement of add_builtin_function and the C front
+end.  That means for the 3 possible modes (IFmode, KFmode, and TFmode), you can
+only use 2 of them.
+
+This code does not create the built-in declaration with the changed name.
+Instead, it uses the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the name
+before it is written out to the assembler file like it now does for all of the
+other long double built-in functions.
+
+When I wrote these patches, I discovered that __ibm128 complex multiply and
+divide had originally not been supported if long double is IEEE 128-bit as it
+would generate calls to __mulic3 and __divic3.  I added tests in the testsuite
+to verify that the correct name (i.e. __multc3 and __divtc3) is used in this
+case.
+
+I had previously sent this patch out on November 1st.  Compared to that version,
+this version no longer disables the special mapping when you are building
+libgcc, as it turns out we don't need it.
+
+I tested all 3 patches for PR target/107299 on:
+
+    1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
+    2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
+    3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
+    4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm
+
+Once all 3 patches have been applied, we can once again build GCC when long
+double is IEEE 128-bit.  There were no other regressions with these patches.
+Can I check these patches into the trunk?
+
 2023-01-20   Michael Meissner  <meissner@linux.ibm.com>
 
-	Clone branch
+gcc/
+
+	PR target/107299
+	* config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
+	(init_float128_ieee): Delete code to switch complex multiply and divide
+	for long double.
+	(complex_multiply_builtin_code): New helper function.
+	(complex_divide_builtin_code): Likewise.
+	(rs6000_mangle_decl_assembler_name): Add support for mangling the name
+	of complex 128-bit multiply and divide built-in functions.
+
+gcc/testsuite/
+
+	PR target/107299
+	* gcc.target/powerpc/divic3-1.c: New test.
+	* gcc.target/powerpc/divic3-2.c: Likewise.
+	* gcc.target/powerpc/mulic3-1.c: Likewise.
+	* gcc.target/powerpc/mulic3-2.c: Likewise.
+
+==================== dmf007, patch #1 ====================
+
+PR target/107299: Fix build issue when long double is IEEE 128-bit
+
+This patch updates the IEEE 128-bit types used in libgcc.
 
+At the moment, we cannot build GCC when the target uses IEEE 128-bit long
+doubles, such as building the compiler for a native Fedora 36 system.  The
+build dies when it is trying to build the _mulkc3.c and _divkc3.c modules.
+
+This patch changes libgcc to use long double for the IEEE 128-bit base type if
+long double is IEEE 128-bit, and it uses _Float128 otherwise.  The built-in
+functions are adjusted to be the correct version based on the IEEE 128-bit base
+type used.
+
+While it is desirable to ultimately have __float128 and _Float128 use the same
+internal type and mode within GCC, at present if you use the option
+-mabi=ieeelongdouble, the __float128 type will use the long double type and not
+the _Float128 type.  We get an internal compiler error if we combine the
+signbitf128 built-in with a long double type.
+
+I've gone through several iterations of trying to fix this within GCC, and
+there are various problems that have come up.  I developed this alternative
+patch that changes libgcc so that it does not tickle the issue.  I hope we can
+fix the compiler at some point, but right now, this is preventing people on
+Fedora 36 systems from building compilers where the default long double is IEEE
+128-bit.
+
+I have built a GCC compiler tool chain on the following platforms and there
+were no regressions caused by these patches.
+
+    *	Power10 little endian, IBM long double, --with-cpu=power10
+
+    *	Power9 little endian, IBM long double, --with-cpu=power9
+
+    *	Power8 big endian, IBM long double, --with-cpu=power8, both
+	32-bit/64-bit tests.
+
+In addition, I have built a GCC compiler tool chain on the following systems
+with IEEE 128-bit long double as the default.  Comparing the test suite runs to
+the runs for the toolchain with IBM long double as the default, I only get the
+expected differences (C++ modules test fail on IEEE long double, 3 Fortran
+tests pass on IEEE long double that fail on IBM long double, C test pr105334.c
+fails, and C test fp128_conversions.c fails on power10):
+
+    *	Power10 little endian, IEEE long double, --with-cpu=power10
+
+    *	Power9 little endian, IEEE long double, --with-cpu=power9
+
+Can I check this change into the master branch?
+
+2023-01-20   Michael Meissner  <meissner@linux.ibm.com>
+
+	PR target/107299
+	* config/rs6000/_divkc3.c (COPYSIGN): Use the correct built-in based on
+	whether long double is IBM or IEEE.
+	(INFINITY): Likewise.
+	(FABS): Likewise.
+	* config/rs6000/_mulkc3.c (COPYSIGN): Likewise.
+	(INFINITY): Likewise.
+	* config/rs6000/quad-float128.h (TF): Remove definition.
+	(TFtype): Define to be long double or _Float128.
+	(TCtype): Define to be _Complex long double or _Complex _Float128.
+	* libgcc2.h (TFtype): Allow machine config files to override this.
+	(TCtype): Likewise.
+	* soft-fp/quad.h (TFtype): Likewise.
+
+==================== dmf007, branch head ====================
+
+2023-01-20   Michael Meissner  <meissner@linux.ibm.com>
+
+	Clone branch
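
As a minimal sketch of the operations that patch #2 above is concerned with,
the C fragment below multiplies and divides two _Complex long double values.
On PowerPC these operations are normally lowered to libgcc helper calls, and
the assembler name of the helper (__multc3/__divtc3 for IBM 128-bit long
double, __mulkc3/__divkc3 for IEEE 128-bit long double) is what the mangling
hook is expected to select.  The function names cmul_ld and cdiv_ld are
hypothetical and do not appear in the patch; the code itself is portable C.

```c
#include <complex.h>

/* Hypothetical helper: complex long double multiply.  When long double is
   a 128-bit type on PowerPC, GCC normally emits a call to a libgcc routine
   here rather than inline code.  */
_Complex long double
cmul_ld (_Complex long double a, _Complex long double b)
{
  return a * b;
}

/* Hypothetical helper: complex long double divide, lowered the same way.  */
_Complex long double
cdiv_ld (_Complex long double a, _Complex long double b)
{
  return a / b;
}
```

Compiling this with -S and searching the assembler output for the helper
names is one way to check which libgcc routine a given long double
configuration selects.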

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner
@ 2023-01-21  3:27 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2023-01-21  3:27 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:d4c1fec1b7ba80c4b3b1abde2155072830a81c60

commit d4c1fec1b7ba80c4b3b1abde2155072830a81c60
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Jan 20 22:26:57 2023 -0500

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 514 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 514 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index 83e9ee06ffc..501730114f7 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,517 @@
+==================== dmf007, patch #17 ====================
+
+Support load/store vector with right length.
+
+This patch adds support for new instructions that may be added to the PowerPC
+architecture in the future to enhance the load and store vector with length
+instructions.
+
+The current instructions (lxvl, lxvll, stxvl, and stxvll) are inconvenient to use
+since the count for the number of bytes must be in the top 8 bits of the GPR
+register, instead of the bottom 8 bits.  This meant that code generating these
+instructions typically had to do a shift left by 56 bits to get the count into
+the right position.  In a future version of the PowerPC architecture, new
+variants of these instructions might be added that expect the count to be in
+the bottom 8 bits of the GPR register.  These patches add this support to GCC
+if the user uses the -mcpu=future option.
+
+I tested this patch on a little endian power10 system with long double using
+the traditional IBM double double format.  Assuming the other 6 patches for
+-mcpu=future are checked in (or at least the first patch), can I check this
+patch into the master branch for GCC 13?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/vsx.md (lxvl): If -mcpu=future, generate the lxvl with
+	the shift count automatically used in the insn.
+	(lxvrl): New insn for -mcpu=future.
+	(lxvrll): Likewise.
+	(stxvl): If -mcpu=future, generate the stxvl with the shift count
+	automatically used in the insn.
+	(stxvrl): New insn for -mcpu=future.
+	(stxvrll): Likewise.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/lxvrl.c: New test.
+
+==================== dmf007, patch #16 ====================
+
+Add saturating subtract built-ins.
+
+This patch adds support for a saturating subtract built-in function that may be
+added to a future PowerPC processor.  Note, if it is added, the name of the
+built-in function may change before GCC 13 is released.  If the name changes,
+we will submit a patch changing the name.
+
+I also added support for providing dense math built-in functions, even though
+at present, we have not added any new built-in functions for dense math.  It is
+likely we will want to add new dense math built-in functions as the dense math
+support is fleshed out.
+
+I tested this patch on a little endian power10 system with long double using
+the traditional IBM double double format.  Assuming the other 6 patches for
+-mcpu=future are checked in (or at least the first patch), can I check this
+patch into the master branch for GCC 13?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Add support
+	for flagging invalid use of future built-in functions.
+	(rs6000_builtin_is_supported): Add support for future built-in
+	functions.
+	* config/rs6000/rs6000-builtins.def (__builtin_saturate_subtract32): New
+	built-in function for -mcpu=future.
+	(__builtin_saturate_subtract64): Likewise.
+	* config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add stanzas
+	for -mcpu=future built-ins.
+	(stanza_map): Likewise.
+	(enable_string): Likewise.
+	(struct attrinfo): Likewise.
+	(parse_bif_attrs): Likewise.
+	(write_decls): Likewise.
+	* config/rs6000/rs6000.md (sat_sub<mode>3): Add saturating subtract
+	built-in insn declarations.
+	(sat_sub<mode>3_dot): Likewise.
+	(sat_sub<mode>3_dot2): Likewise.
+	* doc/extend.texi (Future PowerPC built-ins): New section.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/subfus-1.c: New test.
+	* gcc.target/powerpc/subfus-2.c: Likewise.
+	* lib/target-supports.exp (check_effective_target_powerpc_future_ok):
+	New effective target.
+
+==================== dmf007, patch #15 ====================
+
+PowerPC: Add support for 1,024 bit DMR registers.
+
+This patch is a preliminary patch to add the full 1,024 bit dense math
+registers (DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the
+top of the DMR register.
+
+This patch only adds the new 1,024 bit register support.  It does not add
+support for any instructions that need 1,024 bit registers instead of 512 bit
+registers.
+
+I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
+registers.  The 'wD' constraint added in previous patches is used for these
+registers.  I added support to do load and store of DMRs via the VSX registers,
+since there are no load/store dense math instructions.  I added the new keyword
+'__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
+don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.
+
+At present, the tree constant propagation pass does not work with 1,024 bit
+DMRs.  I believe this is due to the CCP pass not skipping opaque modes.  I hope
+once this patch is committed, we can work on the machine independent changes to
+allow the CCP pass not to issue an internal error when a DMR is used.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
+	(UNSPEC_DM_INSERT512_LOWER): Likewise.
+	(UNSPEC_DM_EXTRACT512): Likewise.
+	(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
+	(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
+	(movtdo): New define_expand and define_insn_and_split to implement 1,024
+	bit DMR registers.
+	(movtdo_insert512_upper): New insn.
+	(movtdo_insert512_lower): Likewise.
+	(movtdo_extract512): Likewise.
+	(reload_dmr_from_memory): Likewise.
+	(reload_dmr_to_memory): Likewise.
+	* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
+	support.
+	(rs6000_init_builtins): Add support for __dmr keyword.
+	* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
+	for TDOmode.
+	(rs6000_function_arg): Likewise.
+	* config/rs6000/rs6000-modes.def (TDOmode): New mode.
+	* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
+	support for TDOmode.
+	(rs6000_hard_regno_mode_ok_uncached): Likewise.
+	(rs6000_hard_regno_mode_ok): Likewise.
+	(rs6000_modes_tieable_p): Likewise.
+	(rs6000_debug_reg_global): Likewise.
+	(rs6000_setup_reg_addr_masks): Likewise.
+	(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
+	hooks for DMR mode.
+	(reg_offset_addressing_ok_p): Add support for TDOmode.
+	(rs6000_emit_move): Likewise.
+	(rs6000_secondary_reload_simple_move): Likewise.
+	(rs6000_secondary_reload_class): Likewise.
+	(rs6000_mangle_type): Add mangling for __dmr type.
+	(rs6000_dmr_register_move_cost): Add support for TDOmode.
+	(rs6000_split_multireg_move): Likewise.
+	(rs6000_invalid_conversion): Likewise.
+	* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
+	(enum rs6000_builtin_type_index): Add DMR type nodes.
+	(dmr_type_node): Likewise.
+	(ptr_dmr_type_node): Likewise.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/dm-1024bit.c: New test.
+
+==================== dmf007, patch #14 ====================
+
+PowerPC: Switch to dense math names for all MMA operations.
+
+This patch changes the assembler instruction names for MMA instructions from
+the original name used in power10 to the new name when used with the dense math
+system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
+same bits for either spelling.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-11-09   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/mma.md (vvi4i4i8_dm): New int attribute.
+	(avvi4i4i8_dm): Likewise.
+	(vvi4i4i2_dm): Likewise.
+	(avvi4i4i2_dm): Likewise.
+	(vvi4i4_dm): Likewise.
+	(avvi4i4_dm): Likewise.
+	(pvi4i2_dm): Likewise.
+	(apvi4i2_dm): Likewise.
+	(vvi4i4i4_dm): Likewise.
+	(avvi4i4i4_dm): Likewise.
+	(mma_<vv>): Add support for running on DMF systems, generating the dense
+	math instruction and using the dense math accumulators.
+	(mma_<avv>): Likewise.
+	(mma_<pv>): Likewise.
+	(mma_<apv>): Likewise.
+	(mma_<vvi4i4i8>): Likewise.
+	(mma_<avvi4i4i8>): Likewise.
+	(mma_<vvi4i4i2>): Likewise.
+	(mma_<avvi4i4i2>): Likewise.
+	(mma_<vvi4i4>): Likewise.
+	(mma_<avvi4i4>): Likewise.
+	(mma_<pvi4i2>): Likewise.
+	(mma_<apvi4i2>): Likewise.
+	(mma_<vvi4i4i4>): Likewise.
+	(mma_<avvi4i4i4>): Likewise.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/dm-double-test.c: New test.
+	* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
+	target test.
+
+==================== dmf007, patch #13 ====================
+
+PowerPC: Make MMA insns support DMR registers.
+
+This patch changes the MMA instructions to use either FPR registers
+(-mcpu=power10) or DMRs (-mcpu=future).  In this patch, the existing MMA
+instruction names are used.
+
+A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/mma.md (mma_<acc>): New define_expand to handle
+	mma_<acc> for dense math and non dense math.
+	(mma_<acc> insn): Restrict to non dense math.
+	(mma_xxsetaccz): Convert to define_expand to handle non dense math and
+	dense math.
+	(mma_xxsetaccz_p10): Rename from mma_xxsetaccz and restrict usage to non
+	dense math.
+	(mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
+	(mma_<vv>): Add support for dense math.
+	(mma_<avv>): Likewise.
+	(mma_<pv>): Likewise.
+	(mma_<apv>): Likewise.
+	(mma_<vvi4i4i8>): Likewise.
+	(mma_<avvi4i4i8>): Likewise.
+	(mma_<vvi4i4i2>): Likewise.
+	(mma_<avvi4i4i2>): Likewise.
+	(mma_<vvi4i4>): Likewise.
+	(mma_<avvi4i4>): Likewise.
+	(mma_<pvi4i2>): Likewise.
+	(mma_<apvi4i2>): Likewise.
+	(mma_<vvi4i4i4>): Likewise.
+	(mma_<avvi4i4i4>): Likewise.
+	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
+	__PPC_DMR__ if we have dense math instructions.
+	* config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
+	dense math and only FPRs if not dense math.
+	(rs6000_split_multireg_move): Do not generate accumulator prime or
+	de-prime instructions if dense math.
+
+==================== dmf007, patch #12 ====================
+
+PowerPC: Add support for accumulators in DMR registers.
+
+The MMA system added the notion of accumulator registers.  In power10, these
+accumulators overlapped with the FPR registers, but logically the accumulators
+were separate from the FPR registers.  It is anticipated that in future
+systems, we may have a separate dense math unit and the accumulators will be
+mapped onto the new dense math registers (DMRs).  This patch adds the support
+for dense math registers.
+
+These changes are preliminary.  They are expected to change over time.
+
+This particular patch does not change the MMA support to use the accumulators
+within the dense math registers.  This patch just adds the basic support for
+having separate DMRs.  The next patch will switch the MMA support to use the
+accumulators if -mcpu=future is used.
+
+For testing purposes, I added an undocumented option '-mdense-math' to enable
+or disable the dense math support.
+
+This patch adds a new constraint (wD).  If MMA is selected but dense math is
+not selected (i.e. -mcpu=power10), the wD constraint will match accumulators
+that overlap with the FPRs.  If both MMA and dense math are selected
+(i.e. -mcpu=future), the wD constraint will only match DMRs.
+
+This patch modifies the existing %A output modifier.  If MMA is selected but
+dense math is not selected, then %A converts the FPR register number to the
+accumulator number.  If both MMA and dense math are selected, then %A will only
+work if the register is an accumulator mapped onto a DMR.
+
+The intention is that user code using extended asm can be modified to run on
+both MMA without dense math and MMA with dense math:
+
+    1)	If possible, don't use extended asm, but instead use the MMA built-in
+	functions;
+
+    2)	If you do need to write extended asm, the d constraints
+	targeting accumulators should now use wD;
+
+    3)	Only use the built-in zero, assemble and disassemble functions to
+	move data between vector quad types and dense math accumulators.
+	I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz directly in the
+	extended asm code.  The reason is these instructions assume there is a
+	1-to-1 correspondence between 4 adjacent FPR registers and an
+	accumulator that overlaps with those registers.  With accumulators
+	now being separate registers, there no longer is a 1-to-1
+	correspondence.
+
+It is possible that the mangling for DMRs and the GDB register numbers may
+change in the future.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/constraints.md (wD constraint): New constraint.
+	* config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE_ACC): New unspec.
+	(movxo): Convert into define_expand.
+	(movxo_fpr): Version of movxo where accumulators overlap with FPRs.
+	(movxo_dm): Dense math version of movxo.
+	(mma_assemble_acc): Add dense math support to define_expand.
+	(mma_assemble_acc_fpr): Rename from mma_assemble_acc, and restrict it to
+	non dense math.
+	(mma_assemble_acc_dm): Dense math version of mma_assemble_acc.
+	(mma_disassemble_acc): Add dense math support to define_expand.
+	(mma_disassemble_acc_fpr): Rename from mma_disassemble_acc, and restrict
+	it to non dense math.
+	(mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc.
+	* config/rs6000/predicates.md (dmr_operand): New predicate.
+	(accumulator_operand): Likewise.
+	* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mdense-math.
+	(POWERPC_MASKS): Likewise.
+	* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
+	(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
+	(LAST_RELOAD_REG_CLASS): Add support for DMR registers.
+	(reload_reg_map): Likewise.
+	(rs6000_reg_names): Likewise.
+	(alt_reg_names): Likewise.
+	(rs6000_hard_regno_nregs_internal): Likewise.
+	(rs6000_hard_regno_mode_ok_uncached): Likewise.
+	(rs6000_debug_reg_global): Likewise.
+	(rs6000_setup_reg_addr_masks): Likewise.
+	(rs6000_init_hard_regno_mode_ok): Likewise.
+	(rs6000_option_override_internal): Add checking for -mdense-math.
+	(rs6000_secondary_reload_memory): Add support for DMR registers.
+	(rs6000_secondary_reload_simple_move): Likewise.
+	(rs6000_preferred_reload_class): Likewise.
+	(rs6000_secondary_reload_class): Likewise.
+	(print_operand): Make %A handle both FPRs and DMRs.
+	(rs6000_dmr_register_move_cost): New helper function.
+	(rs6000_register_move_cost): Add support for DMR registers.
+	(rs6000_memory_move_cost): Likewise.
+	(rs6000_compute_pressure_classes): Likewise.
+	(rs6000_debugger_regno): Likewise.
+	(rs6000_opt_masks): Add -mdense-math.
+	(rs6000_split_multireg_move): Add support for DMRs.
+	* config/rs6000/rs6000.h (UNITS_PER_DMR_WORD): New macro.
+	(FIRST_PSEUDO_REGISTER): Update for DMRs.
+	(FIXED_REGISTERS): Add DMRs.
+	(CALL_REALLY_USED_REGISTERS): Likewise.
+	(REG_ALLOC_ORDER): Likewise.
+	(enum reg_class): Add DM_REGS.
+	(REG_CLASS_NAMES): Likewise.
+	(REG_CLASS_CONTENTS): Likewise.
+	* config/rs6000/rs6000.md (FIRST_DMR_REGNO): New constant.
+	(LAST_DMR_REGNO): Likewise.
+	(isa attribute): Add 'dm' and 'not_dm' attributes.
+	(enabled attribute): Support 'dm' and 'not_dm' attributes.
+	* config/rs6000/rs6000.opt (-mdense-math): New switch.
+	* doc/md.texi (PowerPC constraints): Document wD constraint.
+
+==================== dmf007, patch #11 ====================
+
+PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.
+
+This patch enables generating load and store vector pair instructions when
+doing certain memory copy operations when -mcpu=future is used.  In doing tests
+on power10, it was determined that using these instructions was problematic
+in a few cases, so we disabled generating them by default.  This patch
+re-enables generating these instructions if -mcpu=future is used.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add
+	-mblock-ops-vector-pair.
+	(POWERPC_MASKS): Likewise.
+
+==================== dmf007, patch #10 ====================
+
+PowerPC: Add -mcpu=future.
+
+This patch adds support for the -mcpu=future and -mtune=future options.
+Besides defining __ARCH_PWR_FUTURE__ this particular patch does not enable any
+new features.
+
+At present, we do not have any specific differences in terms of cpu tuning for
+future machines, so we make -mtune=future act the same as -mtune=power10.  It
+is anticipated that we may add support for changing the tuning characteristics
+for -mtune=future at a later time.
+
+These patches implement support for potential future PowerPC cpus.  At this
+time, features enabled with -mcpu=future may or may not be in actual PowerPCs
+that will be delivered in the future.
+
+The patches have been tested on the following platforms.  I added the patches
+for PR target/107299 that I submitted on November 2nd before doing the builds so
+that GCC would build on systems using IEEE 128-bit long double.
+    *	https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html
+
+There were no regressions with doing bootstrap builds and running the regression
+tests:
+
+    1)	Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
+    2)	Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
+    3)	Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
+    4)	Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).
+
+Can I check this patch into the GCC 13 master branch?
+
+2022-12-02   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
+	__ARCH_PWR_FUTURE__ if -mcpu=future.
+	* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro.
+	(POWERPC_MASKS): Add -mfuture.
+	* config/rs6000/rs6000-opts.h (enum processor_type): Add
+	PROCESSOR_FUTURE.
+	* config/rs6000/rs6000-tables.opt: Regenerate.
+	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Add
+	-mcpu=future support.  Make -mtune=future act like -mtune=power10 for
+	now.
+	(rs6000_machine_from_flags): Likewise.
+	(rs6000_reassociation_width): Likewise.
+	(rs6000_adjust_cost): Likewise.
+	(rs6000_issue_rate): Likewise.
+	(rs6000_sched_reorder): Likewise.
+	(rs6000_sched_reorder2): Likewise.
+	(rs6000_register_move_cost): Likewise.
+	(rs6000_opt_masks): Add -mfuture.
+	* config/rs6000/rs6000.h (ASM_CPU_SUPPORT): Likewise.
+	* config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch.
+	* config/rs6000/rs6000.md (cpu attribute): Add -mcpu=future support.
+	* doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document -mcpu=future.
+
 ==================== dmf007, patch #2 ====================
 
 Rework 128-bit complex multiply and divide.
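
To illustrate the operand layout discussed in patch #17 above, here is a
minimal sketch in portable C that models where the byte count sits in the
64-bit GPR.  The existing lxvl/lxvll/stxvl/stxvll instructions read the count
from the top 8 bits, which is why callers shift left by 56; the possible
future variants would read it from the bottom 8 bits.  All four function
names are hypothetical and only model the encoding; they do not emit the
instructions.

```c
#include <stdint.h>

/* Hypothetical model: length operand for lxvl/stxvl, where the byte count
   must be in the top 8 bits of the 64-bit register.  */
static inline uint64_t
len_operand_top (unsigned nbytes)
{
  return (uint64_t) (nbytes & 0xff) << 56;
}

/* Hypothetical model: length operand for the possible lxvrl/stxvrl
   variants, where the byte count is in the bottom 8 bits.  */
static inline uint64_t
len_operand_bottom (unsigned nbytes)
{
  return (uint64_t) (nbytes & 0xff);
}

/* Recover the byte count from a register laid out for lxvl/stxvl.  */
static inline unsigned
decode_top (uint64_t gpr)
{
  return (unsigned) (gpr >> 56);
}

/* Recover the byte count from a register laid out for the new variants.  */
static inline unsigned
decode_bottom (uint64_t gpr)
{
  return (unsigned) (gpr & 0xff);
}
```

With the bottom-8-bit layout, a count that is already held in a register as
an ordinary integer can be used directly, so the extra shift is not needed.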

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner
@ 2023-01-20 22:22 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2023-01-20 22:22 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:696815fa224950eed9f7199e80044aba77fddbcf

commit 696815fa224950eed9f7199e80044aba77fddbcf
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Jan 20 17:22:47 2023 -0500

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 141 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 140 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index fda09a48c25..83e9ee06ffc 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,4 +1,143 @@
+==================== dmf007, patch #2 ====================
+
+Rework 128-bit complex multiply and divide.
+
+This patch reworks how the complex multiply and divide built-in functions are
+done.  Previously we created built-in declarations for doing long double complex
+multiply and divide when long double is IEEE 128-bit.  The old code also did not
+support __ibm128 complex multiply and divide if long double is IEEE 128-bit.
+
+In terms of history, I wrote the original code just as I was starting to test
+GCC on systems where IEEE 128-bit long double was the default.  At the time, we
+had not yet started mangling the built-in function names as a way to bridge
+going from a system with 128-bit IBM long double to 128-bit IEEE long double.
+
+The original code depends on there only being two 128-bit types involved.  With
+the next patch in this series, this assumption will no longer be true.  When
+long double is IEEE 128-bit, there will be 2 IEEE 128-bit types (one for the
+explicit __float128/_Float128 type and one for long double).
+
+The problem is we cannot create two separate built-in functions that resolve to
+the same name.  This is a requirement of add_builtin_function and the C front
+end.  That means for the 3 possible modes (IFmode, KFmode, and TFmode), you can
+only use 2 of them.
+
+This code does not create the built-in declaration with the changed name.
+Instead, it uses the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the name
+before it is written out to the assembler file like it now does for all of the
+other long double built-in functions.
+
+When I wrote these patches, I discovered that __ibm128 complex multiply and
+divide had originally not been supported if long double is IEEE 128-bit as it
+would generate calls to __mulic3 and __divic3.  I added tests in the testsuite
+to verify that the correct name (i.e. __multc3 and __divtc3) is used in this
+case.
+
+I had previously sent this patch out on November 1st.  Compared to that version,
+this version no longer disables the special mapping when you are building
+libgcc, as it turns out we don't need it.
+
+I tested all 3 patches for PR target/107299 on:
+
+    1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
+    2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
+    3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
+    4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm
+
+Once all 3 patches have been applied, we can once again build GCC when long
+double is IEEE 128-bit.  There were no other regressions with these patches.
+Can I check these patches into the trunk?
+
 2023-01-20   Michael Meissner  <meissner@linux.ibm.com>
 
-	Clone branch
+gcc/
+
+	PR target/107299
+	* config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
+	(init_float128_ieee): Delete code to switch complex multiply and divide
+	for long double.
+	(complex_multiply_builtin_code): New helper function.
+	(complex_divide_builtin_code): Likewise.
+	(rs6000_mangle_decl_assembler_name): Add support for mangling the name
+	of complex 128-bit multiply and divide built-in functions.
+
+gcc/testsuite/
+
+	PR target/107299
+	* gcc.target/powerpc/divic3-1.c: New test.
+	* gcc.target/powerpc/divic3-2.c: Likewise.
+	* gcc.target/powerpc/mulic3-1.c: Likewise.
+	* gcc.target/powerpc/mulic3-2.c: Likewise.
+
+==================== dmf007, patch #1 ====================
+
+PR target/107299: Fix build issue when long double is IEEE 128-bit
+
+This patch updates the IEEE 128-bit types used in libgcc.
 
+At the moment, we cannot build GCC when the target uses IEEE 128-bit long
+doubles, such as building the compiler for a native Fedora 36 system.  The
+build dies when it is trying to build the _mulkc3.c and _divkc3.c modules.
+
+This patch changes libgcc to use long double for the IEEE 128-bit base type if
+long double is IEEE 128-bit, and it uses _Float128 otherwise.  The built-in
+functions are adjusted to be the correct version based on the IEEE 128-bit base
+type used.
+
+While it is desirable to ultimately have __float128 and _Float128 use the same
+internal type and mode within GCC, at present if you use the option
+-mabi=ieeelongdouble, the __float128 type will use the long double type and not
+the _Float128 type.  We get an internal compiler error if we combine the
+signbitf128 built-in with a long double type.
+
+I've gone through several iterations of trying to fix this within GCC, and
+there are various problems that have come up.  I developed this alternative
+patch that changes libgcc so that it does not tickle the issue.  I hope we can
+fix the compiler at some point, but right now, this is preventing people on
+Fedora 36 systems from building compilers where the default long double is IEEE
+128-bit.
+
+I have built a GCC compiler tool chain on the following platforms and there
+were no regressions caused by these patches.
+
+    *	Power10 little endian, IBM long double, --with-cpu=power10
+
+    *	Power9 little endian, IBM long double, --with-cpu=power9
+
+    *	Power8 big endian, IBM long double, --with-cpu=power8, both
+	32-bit/64-bit tests.
+
+In addition, I have built a GCC compiler tool chain on the following systems
+with IEEE 128-bit long double as the default.  Comparing the test suite runs to
+the runs for the toolchain with IBM long double as the default, I only get the
+expected differences (C++ modules tests fail on IEEE long double, 3 Fortran
+tests pass on IEEE long double that fail on IBM long double, C test pr105334.c
+fails, and C test fp128_conversions.c fails on power10):
+
+    *	Power10 little endian, IEEE long double, --with-cpu=power10
+
+    *	Power9 little endian, IEEE long double, --with-cpu=power9
+
+Can I check this change into the master branch?
+
+2023-01-20   Michael Meissner  <meissner@linux.ibm.com>
+
+	PR target/107299
+	* config/rs6000/_divkc3.c (COPYSIGN): Use the correct built-in based on
+	whether long double is IBM or IEEE.
+	(INFINITY): Likewise.
+	(FABS): Likewise.
+	* config/rs6000/_mulkc3.c (COPYSIGN): Likewise.
+	(INFINITY): Likewise.
+	* config/rs6000/quad-float128.h (TF): Remove definition.
+	(TFtype): Define to be long double or _Float128.
+	(TCtype): Define to be _Complex long double or _Complex _Float128.
+	* libgcc2.h (TFtype): Allow machine config files to override this.
+	(TCtype): Likewise.
+	* soft-fp/quad.h (TFtype): Likewise.
+
+==================== dmf007, branch head ====================
+
+2023-01-20   Michael Meissner  <meissner@linux.ibm.com>
+
+	Clone branch
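
The type selection described for patch #1 (use long double as the IEEE 128-bit
base type when long double is IEEE 128-bit, and _Float128 otherwise, picking the
matching built-ins) can be sketched roughly as below.  This is only an
illustration, not the code from the patch: the names ieee128_base_t,
IEEE128_COPYSIGN, IEEE128_FABS, and ieee128_with_sign are hypothetical, the
sketch assumes the PowerPC predefined macros __LONG_DOUBLE_IEEE128__ and
__FLOAT128__, and the last branch is a fallback added only so the sketch
compiles on other targets.

```c
/* Sketch only: hypothetical names, not the definitions used in libgcc.  */

#if defined (__LONG_DOUBLE_IEEE128__)
/* long double is IEEE 128-bit: use it as the base type, together with the
   long double ('l') built-in functions.  */
typedef long double ieee128_base_t;
#define IEEE128_COPYSIGN(x, y)	__builtin_copysignl ((x), (y))
#define IEEE128_FABS(x)		__builtin_fabsl (x)

#elif defined (__FLOAT128__)
/* long double is not IEEE 128-bit, but the IEEE 128-bit type is available:
   use _Float128 and the 'f128' built-in functions.  */
typedef _Float128 ieee128_base_t;
#define IEEE128_COPYSIGN(x, y)	__builtin_copysignf128 ((x), (y))
#define IEEE128_FABS(x)		__builtin_fabsf128 (x)

#else
/* Assumption for illustration: on targets without either macro, fall back to
   long double so that the sketch still compiles.  */
typedef long double ieee128_base_t;
#define IEEE128_COPYSIGN(x, y)	__builtin_copysignl ((x), (y))
#define IEEE128_FABS(x)		__builtin_fabsl (x)
#endif

/* Return the absolute value of MAGNITUDE with the sign of SIGN_SOURCE, using
   whichever built-in functions match the base type selected above.  */
static ieee128_base_t
ieee128_with_sign (ieee128_base_t magnitude, ieee128_base_t sign_source)
{
  return IEEE128_COPYSIGN (IEEE128_FABS (magnitude), sign_source);
}
```

The point of keeping the type and the built-in functions in one conditional is
that the two always agree, so a long double value is never passed to an f128
built-in function (the combination reported above to give an internal compiler
error).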


end of thread, other threads:[~2023-02-02  2:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-01  6:28 [gcc(refs/users/meissner/heads/dmf007)] Update ChangeLog.meissner Michael Meissner
  -- strict thread matches above, loose matches on Subject: below --
2023-02-02  2:59 Michael Meissner
2023-02-01 17:35 Michael Meissner
2023-02-01  3:10 Michael Meissner
2023-01-28  7:14 Michael Meissner
2023-01-28  3:21 Michael Meissner
2023-01-23 21:19 Michael Meissner
2023-01-23 21:19 Michael Meissner
2023-01-23 21:18 Michael Meissner
2023-01-21  3:27 Michael Meissner
2023-01-20 22:22 Michael Meissner
