public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work162-dmf)] Update ChangeLog.*
@ 2024-03-13 6:36 Michael Meissner
From: Michael Meissner @ 2024-03-13 6:36 UTC (permalink / raw)
To: gcc-cvs
https://gcc.gnu.org/g:9e9f7da1148b547dd4aa1f2084cd7df1d407d2dd
commit 9e9f7da1148b547dd4aa1f2084cd7df1d407d2dd
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Wed Mar 13 02:36:19 2024 -0400
Update ChangeLog.*
Diff:
---
gcc/ChangeLog.dmf | 307 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 307 insertions(+)
diff --git a/gcc/ChangeLog.dmf b/gcc/ChangeLog.dmf
index 4bf550e6556..03ab4ad714c 100644
--- a/gcc/ChangeLog.dmf
+++ b/gcc/ChangeLog.dmf
@@ -1,5 +1,312 @@
+==================== Branch work162-dmf, patch #106 ====================
+
+PowerPC: Add support for 1,024 bit DMR registers.
+
+This is a preliminary patch to add the full 1,024 bit dense math registers
+(DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the
+DMR register.
+
+This patch only adds the new 1,024 bit register support. It does not add
+support for any instructions that need 1,024 bit registers instead of 512 bit
+registers.
+
+I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
+registers. The 'wD' constraint added in previous patches is used for these
+registers. I added support to do load and store of DMRs via the VSX registers,
+since there are no load/store dense math instructions. I added the new keyword
+'__dmr' to create 1,024 bit types that can be loaded into DMRs. At present, I
+don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.
+
+The patches have been tested on both little and big endian systems. Can I check
+it into the master branch?
+
+2024-03-13 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+
+ * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
+ (UNSPEC_DM_INSERT512_LOWER): Likewise.
+ (UNSPEC_DM_EXTRACT512): Likewise.
+ (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
+ (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
+ (movtdo): New define_expand and define_insn_and_split to implement 1,024
+ bit DMR registers.
+ (movtdo_insert512_upper): New insn.
+ (movtdo_insert512_lower): Likewise.
+ (movtdo_extract512): Likewise.
+ (reload_dmr_from_memory): Likewise.
+ (reload_dmr_to_memory): Likewise.
+ * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
+ support.
+ (rs6000_init_builtins): Add support for __dmr keyword.
+ * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
+ for TDOmode.
+ (rs6000_function_arg): Likewise.
+ * config/rs6000/rs6000-modes.def (TDOmode): New mode.
+ * config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
+ support for TDOmode.
+ (rs6000_hard_regno_mode_ok_uncached): Likewise.
+ (rs6000_hard_regno_mode_ok): Likewise.
+ (rs6000_modes_tieable_p): Likewise.
+ (rs6000_debug_reg_global): Likewise.
+ (rs6000_setup_reg_addr_masks): Likewise.
+ (rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Setup reload
+ hooks for DMR mode.
+ (reg_offset_addressing_ok_p): Add support for TDOmode.
+ (rs6000_emit_move): Likewise.
+ (rs6000_secondary_reload_simple_move): Likewise.
+ (rs6000_preferred_reload_class): Likewise.
+ (rs6000_secondary_reload_class): Likewise.
+ (rs6000_mangle_type): Add mangling for __dmr type.
+ (rs6000_dmr_register_move_cost): Add support for TDOmode.
+ (rs6000_split_multireg_move): Likewise.
+ (rs6000_invalid_conversion): Likewise.
+ * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
+ (enum rs6000_builtin_type_index): Add DMR type nodes.
+ (dmr_type_node): Likewise.
+ (ptr_dmr_type_node): Likewise.
+
+gcc/testsuite/
+
+ * gcc.target/powerpc/dm-1024bit.c: New test.
+
+==================== Branch work162-dmf, patch #105 ====================
+
+Add dense math test for new instruction names.
+
+2024-03-13 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/testsuite/
+
+ * gcc.target/powerpc/dm-double-test.c: New test.
+ * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
+ target test.
+
+==================== Branch work162-dmf, patch #104 ====================
+
+PowerPC: Switch to dense math names for all MMA operations.
+
+This patch changes the assembler instruction names for MMA instructions from
+the original name used in power10 to the new name when used with the dense math
+system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the
+same bits for either spelling.
+
+For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the
+instruction. However, the prefixed instructions have a 'pm' prefix, and we add
+the 'dm' prefix afterwards. To prevent having two sets of parallel int
+attributes, we remove the "pm" prefix from the instruction string in the
+attributes, and add it later, both in the insn name and in the output template.
+
+2024-03-13 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+
+ * config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a
+ "pm" prefix.
+ (avvi4i4i8): Likewise.
+ (vvi4i4i2): Likewise.
+ (avvi4i4i2): Likewise.
+ (vvi4i4): Likewise.
+ (avvi4i4): Likewise.
+ (pvi4i2): Likewise.
+ (apvi4i2): Likewise.
+ (vvi4i4i4): Likewise.
+ (avvi4i4i4): Likewise.
+ (mma_xxsetaccz): Add support for running on DMF systems, generating the
+ dense math instruction and using the dense math accumulators.
+ (mma_<vv>): Likewise.
+ (mma_<pv>): Likewise.
+ (mma_<avv>): Likewise.
+ (mma_<apv>): Likewise.
+ (mma_pm<vvi4i4i8>): Add support for running on DMF systems, generating
+ the dense math instruction and using the dense math accumulators.
+ Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm'
+ prefixes based on whether we have the original MMA specification or if
+ we have dense math support.
+ (mma_pm<avvi4i4i8>): Likewise.
+ (mma_pm<vvi4i4i2>): Likewise.
+ (mma_pm<avvi4i4i2>): Likewise.
+ (mma_pm<vvi4i4>): Likewise.
+ (mma_pm<avvi4i4>): Likewise.
+ (mma_pm<pvi4i2>): Likewise.
+ (mma_pm<apvi4i2>): Likewise.
+ (mma_pm<vvi4i4i4>): Likewise.
+ (mma_pm<avvi4i4i4>): Likewise.
+
+==================== Branch work162-dmf, patch #103 ====================
+
+Add support for dense math registers.
+
+The MMA subsystem added the notion of accumulator registers as an optional
+feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with
+the VSX registers 0..31, but logically the accumulator registers were separate
+from the FPR registers. In ISA 3.1, it was anticipated that in future systems,
+the accumulator registers may not overlap with the FPR registers. This patch
+adds the support for dense math registers as separate registers.
+
+This particular patch does not change the MMA support to use the accumulators
+within the dense math registers. This patch just adds the basic support for
+having separate DMRs. The next patch will switch the MMA support to use the
+accumulators if -mcpu=future is used.
+
+For testing purposes, I added an undocumented option '-mdense-math' to enable
+or disable the dense math support.
+
+This patch adds a new constraint (wD). If MMA is selected but dense math is
+not selected (i.e. -mcpu=power10), the wD constraint will allow access to
+accumulators that overlap with VSX registers 0..31. If both MMA and dense math
+are selected (i.e. -mcpu=future), the wD constraint will only allow dense math
+registers.
+
+This patch modifies the existing %A output modifier. If MMA is selected but
+dense math is not selected, then %A output modifier converts the VSX register
+number to the accumulator number, by dividing it by 4. If both MMA and dense
+math are selected, then %A will map the separate DMR registers into 0..7.
+
+The intention is that user code using extended asm can be modified to run on
+both MMA without dense math and MMA with dense math:
+
+ 1) If possible, don't use extended asm, but instead use the MMA built-in
+ functions;
+
+ 2) If you do need to write extended asm, change the d constraints
+ targeting accumulators to use wD;
+
+ 3) Only use the built-in zero, assemble and disassemble functions to
+ move data between vector quad types and dense math accumulators.
+ I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz directly in the
+ extended asm code. The reason is these instructions assume there is a
+ 1-to-1 correspondence between 4 adjacent FPR registers and an
+ accumulator that overlaps with those registers. With accumulators
+ now being separate registers, there no longer is a 1-to-1
+ correspondence.
+
+It is possible that the mangling for DMRs and the GDB register numbers may
+produce other changes in the future.
+
+2024-03-13 Michael Meissner <meissner@linux.ibm.com>
+
+ * config/rs6000/mma.md (movxo): Add comments about dense math registers.
+ (movxo_nodm): Rename from movxo and restrict the usage to machines
+ without dense math registers.
+ (movxo_dm): New insn for movxo support for machines with dense math
+ registers.
+ (mma_<acc>): Restrict usage to machines without dense math registers.
+ (mma_xxsetaccz): Make a define_expand, and add support for dense math
+ registers.
+ (mma_xxsetaccz_nodm): Rename from mma_xxsetaccz, and restrict to
+ machines without dense math registers.
+ (mma_dmsetaccz): New insn.
+ * config/rs6000/predicates.md (dmr_operand): New predicate.
+ (accumulator_operand): Add support for dense math registers.
+ * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
+ not de-prime accumulator when disassembling a vector quad.
+ * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
+ (enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
+ (LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
+ constraint.
+ (reload_reg_map): Likewise.
+ (rs6000_reg_names): Likewise.
+ (alt_reg_names): Likewise.
+ (rs6000_hard_regno_nregs_internal): Likewise.
+ (rs6000_hard_regno_mode_ok_uncached): Likewise.
+ (rs6000_debug_reg_global): Likewise.
+ (rs6000_setup_reg_addr_masks): Likewise.
+ (rs6000_init_hard_regno_mode_ok): Likewise.
+ (rs6000_secondary_reload_memory): Add support for DMR registers.
+ (rs6000_secondary_reload_simple_move): Likewise.
+ (rs6000_preferred_reload_class): Likewise.
+ (rs6000_secondary_reload_class): Likewise.
+ (print_operand): Make %A handle both FPRs and DMRs.
+ (rs6000_dmr_register_move_cost): New helper function.
+ (rs6000_register_move_cost): Add support for DMR registers.
+ (rs6000_memory_move_cost): Likewise.
+ (rs6000_compute_pressure_classes): Likewise.
+ (rs6000_debugger_regno): Likewise.
+ (rs6000_split_multireg_move): Add support for DMRs.
+ * config/rs6000/rs6000.h (TARGET_DENSE_MATH): New macro.
+ (TARGET_MMA_DENSE_MATH): Likewise.
+ (TARGET_MMA_NO_DENSE_MATH): Likewise.
+ (UNITS_PER_DMR_WORD): Likewise.
+ (FIRST_PSEUDO_REGISTER): Update for DMRs.
+ (FIXED_REGISTERS): Add DMRs.
+ (CALL_REALLY_USED_REGISTERS): Likewise.
+ (REG_ALLOC_ORDER): Likewise.
+ (DMR_REGNO_P): New macro.
+ (enum reg_class): Add DM_REGS.
+ (REG_CLASS_NAMES): Likewise.
+ (REG_CLASS_CONTENTS): Likewise.
+ (enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD.
+ (REGISTER_NAMES): Add DMR registers.
+ (ADDITIONAL_REGISTER_NAMES): Likewise.
+
+==================== Branch work162-dmf, patch #102 ====================
+
+Add wD constraint.
+
+This patch adds a new constraint ('wD') that matches the accumulator registers
+that overlap with VSX registers 0..31 on power10. Future patches will add the
+support for a separate accumulator register class that will be used when the
+support for dense math registers is added.
+
+2024-03-13 Michael Meissner <meissner@linux.ibm.com>
+
+ * config/rs6000/constraints.md (wD): New constraint.
+ * config/rs6000/mma.md (mma_disassemble_acc): Likewise.
+ (mma_<vv>): Likewise.
+ (mma_<avv>): Likewise.
+ (mma_<pv>): Likewise.
+ (mma_<apv>): Likewise.
+ (mma_<vvi4i4i8>): Likewise.
+ (mma_<avvi4i4i8>): Likewise.
+ (mma_<vvi4i4i2>): Likewise.
+ (mma_<avvi4i4i2>): Likewise.
+ (mma_<vvi4i4>): Likewise.
+ (mma_<avvi4i4>): Likewise.
+ (mma_<pvi4i2>): Likewise.
+ (mma_<apvi4i2>): Likewise.
+ (mma_<vvi4i4i4>): Likewise.
+ (mma_<avvi4i4i4>): Likewise.
+ * config/rs6000/predicates.md (accumulator_operand): New predicate.
+ * config/rs6000/rs6000.cc (rs6000_debug_reg_global): Print the register
+ class for the 'wD' constraint.
+ (rs6000_init_hard_regno_mode_ok): Set the 'wD' register constraint
+ class.
+ * config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add element for
+ the 'wD' constraint.
+ * doc/md.texi (PowerPC constraints): Document the 'wD' constraint.
+
+==================== Branch work162-dmf, patch #101 ====================
+
+Use vector pair load/store for memcpy with -mcpu=future
+
+In the development for the power10 processor, GCC did not enable using the load
+vector pair and store vector pair instructions when optimizing things like
+memory copy. This patch enables using those instructions if -mcpu=future is
+used.
+
+2024-03-12 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+
+ * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using
+ load vector pair and store vector pair instructions for memory copy
+ operations.
+ (POWERPC_MASKS): Make the bit for enabling using load vector pair and
+ store vector pair operations set and reset when the PowerPC processor is
+ changed.
+
==================== Branch work162-dmf, baseline ====================
+Add ChangeLog.dmf and update REVISION.
+
+2024-03-07 Michael Meissner <meissner@linux.ibm.com>
+
+gcc/
+
+ * ChangeLog.dmf: New file for branch.
+ * REVISION: Update.
+
2024-03-07 Michael Meissner <meissner@linux.ibm.com>
Clone branch
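
Illustration (not part of the commit): the %A output modifier behaviour
described under patch #103 can be modelled in a few lines of C. This is only
a sketch under stated assumptions. The register numbers (EX_FIRST_VSX_REGNO,
EX_FIRST_DMR_REGNO) and the function name ex_accumulator_number are
hypothetical and are not taken from rs6000.h or rs6000.cc. The requirement
that the VSX register be a multiple of 4 is likewise an assumption of this
model, based on each accumulator overlapping 4 adjacent registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register numbering, for illustration only.  The real
   values are defined by the rs6000 back end and may differ.  */
enum
{
  EX_FIRST_VSX_REGNO = 32,	/* assumed number of VSX register 0 */
  EX_FIRST_DMR_REGNO = 112,	/* assumed number of DMR 0 */
  EX_NUM_ACCUMULATORS = 8	/* accumulators are numbered 0..7 */
};

/* Model of the number printed by %A.  Return the accumulator number
   (0..7), or -1 if REGNO cannot name an accumulator in this model.  */
static int
ex_accumulator_number (int regno, bool dense_math)
{
  if (dense_math)
    {
      /* MMA with dense math: the separate DMRs map directly to 0..7.  */
      int n = regno - EX_FIRST_DMR_REGNO;
      return (n >= 0 && n < EX_NUM_ACCUMULATORS) ? n : -1;
    }

  /* MMA without dense math: an accumulator overlaps 4 adjacent VSX
     registers in 0..31, so divide the VSX register number by 4.  */
  int vsx = regno - EX_FIRST_VSX_REGNO;
  if (vsx < 0 || vsx >= 4 * EX_NUM_ACCUMULATORS || vsx % 4 != 0)
    return -1;
  return vsx / 4;
}

/* Small self-check showing how the model is used.  */
static void
ex_self_check (void)
{
  /* VSX register 4 is accumulator 1 when dense math is off.  */
  assert (ex_accumulator_number (EX_FIRST_VSX_REGNO + 4, false) == 1);
  /* DMR 3 is accumulator 3 when dense math is on.  */
  assert (ex_accumulator_number (EX_FIRST_DMR_REGNO + 3, true) == 3);
}
```

The point of the model is that extended asm written with the wD constraint
and %A sees the same 0..7 numbering in both configurations, which is why the
guidance above recommends them over hard-coded register numbers.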