From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 1005) id 0D30A3858D20; Sat, 3 Dec 2022 04:39:59 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0D30A3858D20 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1670042399; bh=dGTRtH3EWTAv3HUrPd+XWvv+OLGnHBrlz5gzxIjexV0=; h=From:To:Subject:Date:From; b=bRV83HVUggaoNKDSVMh8ewvBvhMI3CDZ/wVDH/EoIRKDXlQxCv4PGEbeqg58DrpUX 9HTWXXZy7rAAEGtFB6olJhc9yuPTV0V7L+HS8U8abTHxWYsI9tGSWRDDrKtoz2rqR9 dPtBR//XjI7aVBH+yTghUTeilmvlkquaQXbLCSCo= Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: Michael Meissner To: gcc-cvs@gcc.gnu.org Subject: [gcc(refs/users/meissner/heads/dmf005)] Update ChangeLog.meissner. X-Act-Checkin: gcc X-Git-Author: Michael Meissner X-Git-Refname: refs/users/meissner/heads/dmf005 X-Git-Oldrev: 76b94f28c88e78569c408fea65191e3b5678b76d X-Git-Newrev: 5799ab41a692e981881f915121ef719913f03dae Message-Id: <20221203043959.0D30A3858D20@sourceware.org> Date: Sat, 3 Dec 2022 04:39:59 +0000 (GMT) List-Id: https://gcc.gnu.org/g:5799ab41a692e981881f915121ef719913f03dae commit 5799ab41a692e981881f915121ef719913f03dae Author: Michael Meissner Date: Fri Dec 2 23:39:43 2022 -0500 Update ChangeLog.meissner. 2022-12-02 Michael Meissner gcc/ * ChangeLog.meissner: Update. Diff: --- gcc/ChangeLog.meissner | 876 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 876 insertions(+) diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner index 43a531f7b7e..e8eb5287acc 100644 --- a/gcc/ChangeLog.meissner +++ b/gcc/ChangeLog.meissner @@ -1,3 +1,879 @@ +==================== Branch dmf005, patch #20. + +Use lxvl and stxvl for small variable memcpy moves. + +This patch adds support to generate inline code for block copy with a variable +size if the size is 16 bytes or less. If the size is more than 16 bytes, just +call memcpy. 
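The size dispatch described above can be modeled in portable C. This is only a sketch of the behavior, not the code in rs6000-string.cc: the names copy_variable and INLINE_COPY_MAX are hypothetical, and the byte loop stands in for the single lxvl/stxvl pair that the compiler would emit on PowerPC.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold mirroring the 16-byte limit described above.  */
#define INLINE_COPY_MAX 16

/* Model of the dispatch: a variable-sized copy of 16 bytes or less is
   done inline, and anything larger falls back to memcpy.  The byte loop
   is an assumption standing in for one lxvl load and one stxvl store.  */
void *
copy_variable (void *dst, const void *src, size_t n)
{
  if (n <= INLINE_COPY_MAX)
    {
      unsigned char *d = dst;
      const unsigned char *s = src;

      for (size_t i = 0; i < n; i++)
        d[i] = s[i];
      return dst;
    }

  return memcpy (dst, src, n);
}
```

A caller would use copy_variable exactly like memcpy. The point of the small path is that it avoids the function call when the length is only known at run time but turns out to be short.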
+ +To handle variable sizes, I found we need DImode versions of the two insns for +copying memory (cpymem and movmem). + +2022-12-02 Michael Meissner + +gcc/ + + * config/rs6000/rs6000-string.cc (toplevel): Include optabs.h. + (expand_lxvl_stxvl): New helper function for variable sized moves. + (expand_block_move_variable): New function to optionally generate + variable sized block move insns. + (expand_block_move): Add support for using lxvl and stxvl to move bytes + inline if the variable length is small enough before calling memcpy. + * config/rs6000/rs6000.md (cpymem): Expand cpymemsi to also + provide cpymemdi to handle DImode sizes as well as SImode sizes. + (movmem): Expand movmemsi to also provide movmemdi to handle + DImode sizes as well as SImode sizes. + * config/rs6000/rs6000.opt (rs6000-memcpy-inline-bytes): New parameter. + * config/rs6000/vsx.md (lxvprl): New insns for -mcpu=future. + (stxvprl): Likewise. + +==================== Branch dmf005, patches #18 - 19 were skipped. + +==================== Branch dmf005, patch #17. + +Support load/store vector with right length. + +This patch adds support for new instructions that may be added to the PowerPC +architecture in the future to enhance the load and store vector with length +instructions. + +The current instructions (lxvl, lxvll, stxvl, and stxvll) are inconvenient to use +since the count for the number of bytes must be in the top 8 bits of the GPR +register, instead of the bottom 8 bits. This meant that code generating these +instructions typically had to do a shift left by 56 bits to get the count into +the right position. In a future version of the PowerPC architecture, new +variants of these instructions might be added that expect the count to be in +the bottom 8 bits of the GPR register. These patches add this support to GCC +if the user uses the -mcpu=future option. + +I tested this patch on a little endian power10 system with long double using +the traditional IBM double double format. 
Assuming the other 6 patches for +-mcpu=future are checked in (or at least the first patch), can I check this +patch into the master branch for GCC 13? + +2022-12-02 Michael Meissner + +gcc/ + + * config/rs6000/vsx.md (lxvl): If -mcpu=future, generate the lxvl with + the shift count automatically used in the insn. + (lxvrl): New insn for -mcpu=future. + (lxvrll): Likewise. + (stxvl): If -mcpu=future, generate the stxvl with the shift count + automatically used in the insn. + (stxvrl): New insn for -mcpu=future. + (stxvrll): Likewise. + +gcc/testsuite/ + + * gcc.target/powerpc/lxvrl.c: New test. + +==================== Branch dmf005, patch #16. + +Add saturating subtract built-ins. + +This patch adds support for a saturating subtract built-in function that may be +added to a future PowerPC processor. Note, if it is added, the name of the +built-in function may change before GCC 13 is released. If the name changes, +we will submit a patch changing the name. + +I also added support for providing dense math built-in functions, even though +at present, we have not added any new built-in functions for dense math. It is +likely we will want to add new dense math built-in functions as the dense math +support is fleshed out. + +I tested this patch on a little endian power10 system with long double using +the traditional IBM double double format. Assuming the other 6 patches for +-mcpu=future are checked in (or at least the first patch), can I check this +patch into the master branch for GCC 13? + +2022-12-02 Michael Meissner + +gcc/ + + * config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Add support + for flagging invalid use of future built-in functions. + (rs6000_builtin_is_supported): Add support for future built-in + functions. + * config/rs6000/rs6000-builtins.def (__builtin_saturate_subtract32): New + built-in function for -mcpu=future. + (__builtin_saturate_subtract64): Likewise. 
+ * config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add stanzas + for -mcpu=future built-ins. + (stanza_map): Likewise. + (enable_string): Likewise. + (struct attrinfo): Likewise. + (parse_bif_attrs): Likewise. + (write_decls): Likewise. + * config/rs6000/rs6000.md (sat_sub3): Add saturating subtract + built-in insn declarations. + (sat_sub3_dot): Likewise. + (sat_sub3_dot2): Likewise. + * doc/extend.texi (Future PowerPC built-ins): New section. + +gcc/testsuite/ + + * gcc.target/powerpc/subfus-1.c: New test. + * gcc.target/powerpc/subfus-2.c: Likewise. + * lib/target-supports.exp (check_effective_target_powerpc_future_ok): + New effective target. + +==================== Branch dmf005, patch #15. + +PowerPC: Add support for 1,024 bit DMR registers. + +This patch is a preliminary patch to add the full 1,024 bit dense math registers +(DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the +DMR register. + +This patch only adds the new 1,024 bit register support. It does not add +support for any instructions that need 1,024 bit registers instead of 512 bit +registers. + +I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit +registers. The 'wD' constraint added in previous patches is used for these +registers. I added support to do load and store of DMRs via the VSX registers, +since there are no load/store dense math instructions. I added the new keyword +'__dmr' to create 1,024 bit types that can be loaded into DMRs. At present, I +don't have aliases for __dmr512 and __dmr1024 that we've discussed internally. + +At present, the tree constant propagation patch does not work with 1,024 bit +DMRs. I believe this is due to the CCP pass not skipping opaque modes. I hope +once this patch is committed, we can work on the machine independent changes to +allow the CCP pass not to issue an internal error when a DMR is used. + +The patches have been tested on the following platforms. 
I added the patches +for PR target/107299 that I submitted on November 2nd before doing the builds so +that GCC would build on systems using IEEE 128-bit long double. + * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html + +There were no regressions with doing bootstrap builds and running the regression +tests: + + 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; + 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; + 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and + 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). + +Can I check this patch into the GCC 13 master branch? + +2022-12-02 Michael Meissner + +gcc/ + + * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec. + (UNSPEC_DM_INSERT512_LOWER): Likewise. + (UNSPEC_DM_EXTRACT512): Likewise. + (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise. + (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise. + (movtdo): New define_expand and define_insn_and_split to implement 1,024 + bit DMR registers. + (movtdo_insert512_upper): New insn. + (movtdo_insert512_lower): Likewise. + (movtdo_extract512): Likewise. + (reload_dmr_from_memory): Likewise. + (reload_dmr_to_memory): Likewise. + * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR + support. + (rs6000_init_builtins): Add support for __dmr keyword. + * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support + for TDOmode. + (rs6000_function_arg): Likewise. + * config/rs6000/rs6000-modes.def (TDOmode): New mode. + * config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add + support for TDOmode. + (rs6000_hard_regno_mode_ok_uncached): Likewise. + (rs6000_hard_regno_mode_ok): Likewise. + (rs6000_modes_tieable_p): Likewise. + (rs6000_debug_reg_global): Likewise. + (rs6000_setup_reg_addr_masks): Likewise. + (rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Setup reload + hooks for DMR mode. + (reg_offset_addressing_ok_p): Add support for TDOmode. 
+ (rs6000_emit_move): Likewise. + (rs6000_secondary_reload_simple_move): Likewise. + (rs6000_secondary_reload_class): Likewise. + (rs6000_mangle_type): Add mangling for __dmr type. + (rs6000_dmr_register_move_cost): Add support for TDOmode. + (rs6000_split_multireg_move): Likewise. + (rs6000_invalid_conversion): Likewise. + * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode. + (enum rs6000_builtin_type_index): Add DMR type nodes. + (dmr_type_node): Likewise. + (ptr_dmr_type_node): Likewise. + +gcc/testsuite/ + + * gcc.target/powerpc/dm-1024bit.c: New test. + +==================== Branch dmf005, patch #14. + +PowerPC: Switch to dense math names for all MMA operations. + +This patch changes the assembler instruction names for MMA instructions from +the original name used in power10 to the new name when used with the dense math +system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the +same bits for either spelling. + +The patches have been tested on the following platforms. I added the patches +for PR target/107299 that I submitted on November 2nd before doing the builds so +that GCC would build on systems using IEEE 128-bit long double. + * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html + +There were no regressions with doing bootstrap builds and running the regression +tests: + + 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; + 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; + 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and + 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). + +Can I check this patch into the GCC 13 master branch? + +2022-11-09 Michael Meissner + +gcc/ + + * config/rs6000/mma.md (vvi4i4i8_dm): New int attribute. + (avvi4i4i8_dm): Likewise. + (vvi4i4i2_dm): Likewise. + (avvi4i4i2_dm): Likewise. + (vvi4i4_dm): Likewise. + (avvi4i4_dm): Likewise. + (pvi4i2_dm): Likewise. + (apvi4i2_dm): Likewise. 
+ (vvi4i4i4_dm): Likewise. + (avvi4i4i4_dm): Likewise. + (mma_): Add support for running on DMF systems, generating the dense + math instruction and using the dense math accumulators. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + +gcc/testsuite/ + + * gcc.target/powerpc/dm-double-test.c: New test. + * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New + target test. + +==================== Branch dmf005, patch #13. + +PowerPC: Make MMA insns support DMR registers. + +This patch changes the MMA instructions to use either FPR registers +(-mcpu=power10) or DMRs (-mcpu=future). In this patch, the existing MMA +instruction names are used. + +A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs. + +The patches have been tested on the following platforms. I added the patches +for PR target/107299 that I submitted on November 2nd before doing the builds so +that GCC would build on systems using IEEE 128-bit long double. + * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html + +There were no regressions with doing bootstrap builds and running the regression +tests: + + 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; + 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; + 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and + 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). + +Can I check this patch into the GCC 13 master branch? + +2022-12-02 Michael Meissner + +gcc/ + + * config/rs6000/mma.md (mma_): New define_expand to handle + mma_ for dense math and non dense math. + (mma_ insn): Restrict to non dense math. + (mma_xxsetaccz): Convert to define_expand to handle non dense math and + dense math. 
+ (mma_xxsetaccz_p10): Rename from mma_xxsetaccz and restrict usage to non + dense math. + (mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz. + (mma_): Add support for dense math. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + (mma_): Likewise. + * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define + __PPC_DMR__ if we have dense math instructions. + * config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if + dense math and only FPRs if not dense math. + (rs6000_split_multireg_move): Do not generate accumulator prime or + de-prime instructions if dense math. + +==================== Branch dmf005, patch #12. + +PowerPC: Add support for accumulators in DMR registers. + +The MMA system added the notion of accumulator registers. In power10, these +accumulators overlapped with the FPR registers, but logically the accumulators +were separate from the FPR registers. It is anticipated that in future +systems, we may have a separate dense math unit and the accumulators will be +mapped onto the new dense math registers (DMRs). This patch adds the support +for dense math registers. + +These changes are preliminary. They are expected to change over time. + +This particular patch does not change the MMA support to use the accumulators +within the dense math registers. This patch just adds the basic support for +having separate DMRs. The next patch will switch the MMA support to use the +accumulators if -mcpu=future is used. + +For testing purposes, I added an undocumented option '-mdense-math' to enable +or disable the dense math support. + +This patch adds a new constraint (wD). If MMA is selected but dense math is +not selected (i.e. -mcpu=power10), the wD constraint will match accumulators +that overlap with the FPRs. 
If both MMA and dense math are selected +(i.e. -mcpu=future), the wD constraint will only match DMRs. + +This patch modifies the existing %A output modifier. If MMA is selected but +dense math is not selected, then %A converts the FPR register number to the +accumulator number. If both MMA and dense math are selected, then %A will only +work if the register is an accumulator mapped onto a DMR. + +The intention is that user code using extended asm can be modified to run on +both MMA without dense math and MMA with dense math: + + 1) If possible, don't use extended asm, but instead use the MMA built-in + functions; + + 2) If you do need to write extended asm, change the d constraints + targeting accumulators to wD; + + 3) Only use the built-in zero, assemble and disassemble functions to + move data between vector quad types and dense math accumulators. + I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz directly in the + extended asm code. The reason is these instructions assume there is a + 1-to-1 correspondence between 4 adjacent FPR registers and an + accumulator that overlaps with those registers. With accumulators + now being separate registers, there no longer is a 1-to-1 + correspondence. + +It is possible that the mangling for DMRs and the GDB register numbers may +change in the future. + +The patches have been tested on the following platforms. I added the patches +for PR target/107299 that I submitted on November 2nd before doing the builds so +that GCC would build on systems using IEEE 128-bit long double. 
+ * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html + +There were no regressions with doing bootstrap builds and running the regression +tests: + + 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; + 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; + 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and + 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). + +Can I check this patch into the GCC 13 master branch? + +2022-12-02 Michael Meissner + +gcc/ + + * config/rs6000/constraints.md (wD constraint): New constraint. + * config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE_ACC): New unspec. + (movxo): Convert into define_expand. + (movxo_fpr): Version of movxo where accumulators overlap with FPRs. + (movxo_dm): Dense math version of movxo. + (mma_assemble_acc): Add dense math support to define_expand. + (mma_assemble_acc_fpr): Rename from mma_assemble_acc, and restrict it to + non dense math. + (mma_assemble_acc_dm): Dense math version of mma_assemble_acc. + (mma_disassemble_acc): Add dense math support to define_expand. + (mma_disassemble_acc_fpr): Rename from mma_disassemble_acc, and restrict + it to non dense math. + (mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc. + * config/rs6000/predicates.md (dmr_operand): New predicate. + (accumulator_operand): Likewise. + * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mdense-math. + (POWERPC_MASKS): Likewise. + * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE. + (enum rs6000_reload_reg_type): Add RELOAD_REG_DMR. + (LAST_RELOAD_REG_CLASS): Add support for DMR registers. + (reload_reg_map): Likewise. + (rs6000_reg_names): Likewise. + (alt_reg_names): Likewise. + (rs6000_hard_regno_nregs_internal): Likewise. + (rs6000_hard_regno_mode_ok_uncached): Likewise. + (rs6000_debug_reg_global): Likewise. + (rs6000_setup_reg_addr_masks): Likewise. + (rs6000_init_hard_regno_mode_ok): Likewise. 
+ (rs6000_option_override_internal): Add checking for -mdense-math. + (rs6000_secondary_reload_memory): Add support for DMR registers. + (rs6000_secondary_reload_simple_move): Likewise. + (rs6000_preferred_reload_class): Likewise. + (rs6000_secondary_reload_class): Likewise. + (print_operand): Make %A handle both FPRs and DMRs. + (rs6000_dmr_register_move_cost): New helper function. + (rs6000_register_move_cost): Add support for DMR registers. + (rs6000_memory_move_cost): Likewise. + (rs6000_compute_pressure_classes): Likewise. + (rs6000_debugger_regno): Likewise. + (rs6000_opt_masks): Add -mdense-math. + (rs6000_split_multireg_move): Add support for DMRs. + * config/rs6000/rs6000.h (UNITS_PER_DMR_WORD): New macro. + (FIRST_PSEUDO_REGISTER): Update for DMRs. + (FIXED_REGISTERS): Add DMRs. + (CALL_REALLY_USED_REGISTERS): Likewise. + (REG_ALLOC_ORDER): Likewise. + (enum reg_class): Add DM_REGS. + (REG_CLASS_NAMES): Likewise. + (REG_CLASS_CONTENTS): Likewise. + * config/rs6000/rs6000.md (FIRST_DMR_REGNO): New constant. + (LAST_DMR_REGNO): Likewise. + (isa attribute): Add 'dm' and 'not_dm' attributes. + (enabled attribute): Support 'dm' and 'not_dm' attributes. + * config/rs6000/rs6000.opt (-mdense-math): New switch. + * doc/md.texi (PowerPC constraints): Document wD constraint. + +==================== Branch dmf005, patch #11. + +PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair. + +This patch enables generating load and store vector pair instructions when +doing certain memory copy operations when -mcpu=future is used. In doing tests +on power10, it was determined that using these instructions was problematic +in a few cases, so we disabled generating them by default. This patch +re-enables generating these instructions if -mcpu=future is used. + +The patches have been tested on the following platforms. 
I added the patches +for PR target/107299 that I submitted on November 2nd before doing the builds so +that GCC would build on systems using IEEE 128-bit long double. + * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html + +There were no regressions with doing bootstrap builds and running the regression +tests: + + 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; + 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; + 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and + 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). + +Can I check this patch into the GCC 13 master branch? + +2022-12-02 Michael Meissner + +gcc/ + + * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add + -mblock-ops-vector-pair. + (POWERPC_MASKS): Likewise. + +==================== Branch dmf005, patch #10. + +PowerPC: Add -mcpu=future. + +This patch adds support for the -mcpu=future and -mtune=future options. +Besides defining __ARCH_PWR_FUTURE__ this particular patch does not enable any +new features. + +At present, we do not have any specific differences in terms of cpu tuning for +future machines, so we make -mtune=future act the same as -mtune=power10. It +is anticipated that we may add support for changing the tuning characteristics +for -mtune=future at a later time. + +These patches implement support for potential future PowerPC cpus. At this +time, features enabled with -mcpu=future may or may not be in actual PowerPCs +that will be delivered in the future. + +The patches have been tested on the following platforms. I added the patches +for PR target/107299 that I submitted on November 2nd before doing the builds so +that GCC would build on systems using IEEE 128-bit long double. 
+ * https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html + +There were no regressions with doing bootstrap builds and running the regression +tests: + + 1) Power10 LE using --with-cpu=power10 --with-long-double-format=ieee; + 2) Power10 LE using --with-cpu=power10 --with-long-double-format=ibm; + 3) Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and + 4) Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested). + +Can I check this patch into the GCC 13 master branch? + +2022-12-02 Michael Meissner + +gcc/ + + * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define + __ARCH_PWR_FUTURE__ if -mcpu=future. + * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro. + (POWERPC_MASKS): Add -mfuture. + * config/rs6000/rs6000-opts.h (enum processor_type): Add + PROCESSOR_FUTURE. + * config/rs6000/rs6000-tables.opt: Regenerate. + * config/rs6000/rs6000.cc (rs6000_option_override_internal): Add + -mcpu=future support. Make -mtune=future act like -mtune=power10 for + now. + (rs6000_machine_from_flags): Likewise. + (rs6000_reassociation_width): Likewise. + (rs6000_adjust_cost): Likewise. + (rs6000_issue_rate): Likewise. + (rs6000_sched_reorder): Likewise. + (rs6000_sched_reorder2): Likewise. + (rs6000_register_move_cost): Likewise. + (rs6000_opt_masks): Add -mfuture. + * config/rs6000/rs6000.h (ASM_CPU_SUPPORT): Likewise. + * config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch. + * config/rs6000/rs6000.md (cpu attribute): Add -mcpu=future support. + * doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document -mcpu=future. + +==================== Branch dmf005, patches #5 - 9 were skipped. + +==================== Branch dmf005, patch #4. + +Patch libgcc to always use _Float128 and _Complex _Float128 on PowerPC. + +2022-12-02 Michael Meissner + +libgcc/ + + * config/rs6000/quad-float128.h (TF): Delete definition. + (TFtype): Define to be _Float128. + (TCtype): Change to be _Complex _Float128. 
+ * libgcc2.h (TFtype): Allow MD code to override definition. + (TCtype): Likewise. + * soft-fp/quad.h (TFtype): Likewise. + +==================== Branch dmf005, patch #3. + +Update float 128-bit conversions, PR target/107299. + +This patch fixes two tests that are still failing when long double is IEEE +128-bit after the previous 2 patches for PR target/107299 have been applied. +The tests are: + + gcc.target/powerpc/convert-fp-128.c + gcc.target/powerpc/pr85657-3.c + +This patch is a rewrite of the patch submitted on August 18th: + +| https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599988.html + +This patch reworks the conversions between 128-bit binary floating point types. +Previously, we would call rs6000_expand_float128_convert to do all conversions. +Now, we only define the conversions between the same representation that turn +into a NOP. The appropriate extend or truncate insn is generated, and after +register allocation, it is converted to a move. + +This patch also fixes two places where we want to override the external name +for the conversion function, and the wrong optab was used. Previously, +rs6000_expand_float128_convert would handle the move or generate the call as +needed. Now, it lets the machine independent code generate the call. But if +we use the machine independent code to generate the call, we need to update the +name for two optabs where a truncate would be used in terms of converting +between the modes. This patch updates those two optabs. 
+ +I tested this patch on: + + 1) LE Power10 using --with-cpu=power10 --with-long-double-format=ieee + 2) LE Power10 using --with-cpu=power10 --with-long-double-format=ibm + 3) LE Power9 using --with-cpu=power9 --with-long-double-format=ibm + 4) BE Power8 using --with-cpu=power8 --with-long-double-format=ibm + +In the past I have also tested this exact patch on the following systems: + + 1) LE Power10 using --with-cpu=power9 --with-long-double-format=ibm + 2) LE Power10 using --with-cpu=power8 --with-long-double-format=ibm + 3) LE Power10 using --with-cpu=power10 --with-long-double-format=ibm + +There were no regressions in the bootstrap process or running the tests (after +applying all 3 patches for PR target/107299). Can I check this patch into the +trunk? + +2022-12-02 Michael Meissner + +gcc/ + + PR target/107299 + * config/rs6000/rs6000.cc (init_float128_ieee): Use the correct + float_extend or float_truncate optab based on how the machine converts + between IEEE 128-bit and IBM 128-bit. + * config/rs6000/rs6000.md (IFKF): Delete. + (IFKF_reg): Delete. + (extendiftf2): Rewrite to be a move if IFmode and TFmode are both IBM + 128-bit. Do not run if TFmode is IEEE 128-bit. + (extendifkf2): Delete. + (extendtfkf2): Delete. + (extendtfif2): Delete. + (trunciftf2): Delete. + (truncifkf2): Delete. + (trunckftf2): Delete. + (extendkftf2): Implement conversion of IEEE 128-bit types as a move. + (trunctfif2): Delete. + (trunctfkf2): Implement conversion of IEEE 128-bit types as a move. + (extendtf2_internal): Delete. + (extendtf2_internal): Delete. + +==================== Branch dmf005, patch #2. + +Make __float128 use the _Float128 type, PR target/107299. + +This patch fixes the issue that GCC cannot build when the default long double +is IEEE 128-bit. It fails in building libgcc, specifically when it is trying +to build the __mulkc3 function in libgcc. It is failing in gimple-range-fold.cc +during the evrp pass. 
Ultimately it is failing because the code declared the +type to use TFmode but it used F128 functions (i.e. KFmode). + + typedef float TFtype __attribute__((mode (TF))); + typedef __complex float TCtype __attribute__((mode (TC))); + + TCtype + __mulkc3_sw (TFtype a, TFtype b, TFtype c, TFtype d) + { + TFtype ac, bd, ad, bc, x, y; + TCtype res; + + ac = a * c; + bd = b * d; + ad = a * d; + bc = b * c; + + x = ac - bd; + y = ad + bc; + + if (__builtin_isnan (x) && __builtin_isnan (y)) + { + _Bool recalc = 0; + if (__builtin_isinf (a) || __builtin_isinf (b)) + { + + a = __builtin_copysignf128 (__builtin_isinf (a) ? 1 : 0, a); + b = __builtin_copysignf128 (__builtin_isinf (b) ? 1 : 0, b); + if (__builtin_isnan (c)) + c = __builtin_copysignf128 (0, c); + if (__builtin_isnan (d)) + d = __builtin_copysignf128 (0, d); + recalc = 1; + } + if (__builtin_isinf (c) || __builtin_isinf (d)) + { + + c = __builtin_copysignf128 (__builtin_isinf (c) ? 1 : 0, c); + d = __builtin_copysignf128 (__builtin_isinf (d) ? 1 : 0, d); + if (__builtin_isnan (a)) + a = __builtin_copysignf128 (0, a); + if (__builtin_isnan (b)) + b = __builtin_copysignf128 (0, b); + recalc = 1; + } + if (!recalc + && (__builtin_isinf (ac) || __builtin_isinf (bd) + || __builtin_isinf (ad) || __builtin_isinf (bc))) + { + + if (__builtin_isnan (a)) + a = __builtin_copysignf128 (0, a); + if (__builtin_isnan (b)) + b = __builtin_copysignf128 (0, b); + if (__builtin_isnan (c)) + c = __builtin_copysignf128 (0, c); + if (__builtin_isnan (d)) + d = __builtin_copysignf128 (0, d); + recalc = 1; + } + if (recalc) + { + x = __builtin_inff128 () * (a * c - b * d); + y = __builtin_inff128 () * (a * d + b * c); + } + } + + __real__ res = x; + __imag__ res = y; + return res; + } + +Currently GCC uses the long double type node for __float128 if long double is +IEEE 128-bit. It did not use the node for _Float128. + +Originally this was noticed if you call the nansq function to make a signaling +NaN (nansq is mapped to nansf128). 
Because the type node for _Float128 is +different from __float128, the machine independent code converts signaling NaNs +to quiet NaNs if the types are not compatible. The following tests used to +fail when run on a system where long double is IEEE 128-bit: + + gcc.dg/torture/float128-nan.c + gcc.target/powerpc/nan128-1.c + +This patch makes both __float128 and _Float128 use the same type node. + +One side effect of not using the long double type node for __float128 is that we +must only use KFmode for _Float128/__float128. The libstdc++ library won't +build if we use TFmode for _Float128 and __float128 when long double is IEEE +128-bit. + +Another minor side effect is that the f128 round to odd fused multiply-add +function will not merge negation with the FMA operation when the type is long +double. If the type is __float128 or _Float128, then it will continue to do the +optimization. The round to odd functions are defined in terms of __float128 +arguments. For example: + + long double + do_fms (long double a, long double b, long double c) + { + return __builtin_fmaf128_round_to_odd (a, b, -c); + } + +will generate (assuming -mabi=ieeelongdouble): + + xsnegqp 4,4 + xsmaddqpo 4,2,3 + xxlor 34,36,36 + +while: + + __float128 + do_fms (__float128 a, __float128 b, __float128 c) + { + return __builtin_fmaf128_round_to_odd (a, b, -c); + } + +will generate: + + xsmsubqpo 4,2,3 + xxlor 34,36,36 + +I tested all 3 patches for PR target/107299 on: + + 1) LE Power10 using --with-cpu=power10 --with-long-double-format=ieee + 2) LE Power10 using --with-cpu=power10 --with-long-double-format=ibm + 3) LE Power9 using --with-cpu=power9 --with-long-double-format=ibm + 4) BE Power8 using --with-cpu=power8 --with-long-double-format=ibm + +Once all 3 patches have been applied, we can once again build GCC when long +double is IEEE 128-bit. There were no other regressions with these patches. +Can I check these patches into the trunk? 
 + +2022-12-02 Michael Meissner + +gcc/ + + PR target/107299 + * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Always use the + _Float128 type for __float128. + (rs6000_expand_builtin): Only change a KFmode built-in to TFmode, if the + built-in passes or returns TFmode. If the predicate failed because the + modes were different, use convert_move to load up the value instead of + copy_to_mode_reg. + * config/rs6000/rs6000.cc (rs6000_translate_mode_attribute): Don't + translate IEEE 128-bit floating point modes to explicit IEEE 128-bit + modes (KFmode or KCmode), even if long double is IEEE 128-bit. + (rs6000_libgcc_floating_mode_supported_p): Support KFmode all of the + time if we support IEEE 128-bit floating point. + (rs6000_floatn_mode): _Float128 and _Float128x always use KFmode. + +gcc/testsuite/ + + PR target/107299 + * gcc.target/powerpc/float128-hw12.c: New test. + * gcc.target/powerpc/float128-hw13.c: Likewise. + * gcc.target/powerpc/float128-hw4.c: Update insns. + +==================== Branch dmf005, patch #1. + +Rework 128-bit complex multiply and divide. + +This patch reworks how the complex multiply and divide built-in functions are +done. Previously we created built-in declarations for doing long double complex +multiply and divide when long double is IEEE 128-bit. The old code also did not +support __ibm128 complex multiply and divide if long double is IEEE 128-bit. + +In terms of history, I wrote the original code just as I was starting to test +GCC on systems where IEEE 128-bit long double was the default. At the time, we +had not yet started mangling the built-in function names as a way to bridge +going from a system with 128-bit IBM long double to 128-bit IEEE long double. + +The original code depends on there only being two 128-bit types involved. With +the next patch in this series, this assumption will no longer be true.
When +long double is IEEE 128-bit, there will be 2 IEEE 128-bit types (one for the +explicit __float128/_Float128 type and one for long double). + +The problem is we cannot create two separate built-in functions that resolve to +the same name. This is a requirement of add_builtin_function and the C front +end. That means for the 3 possible modes (IFmode, KFmode, and TFmode), you can +only use 2 of them. + +This code does not create the built-in declaration with the changed name. +Instead, it uses the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the name +before it is written out to the assembler file like it now does for all of the +other long double built-in functions. + +We need to disable using this mapping when we are building libgcc, specifically +when it is building the floating point 128-bit multiply and divide functions. +The flag that is used when libgcc is built (-fbuilding-libgcc) is only available +in the C/C++ front ends. We need to remember that we are building libgcc in the +rs6000-c.cc support to be able to use this later to decide whether to mangle +the decl assembler name or not. + +When I wrote these patches, I discovered that __ibm128 complex multiply and +divide had originally not been supported if long double is IEEE 128-bit as it +would generate calls to __mulic3 and __divic3. I added tests in the testsuite +to verify that the correct name (i.e. __multc3 and __divtc3) is used in this +case. + +I tested all 3 patches for PR target/107299 on: + + 1) LE Power10 using --with-cpu=power10 --with-long-double-format=ieee + 2) LE Power10 using --with-cpu=power10 --with-long-double-format=ibm + 3) LE Power9 using --with-cpu=power9 --with-long-double-format=ibm + 4) BE Power8 using --with-cpu=power8 --with-long-double-format=ibm + +Once all 3 patches have been applied, we can once again build GCC when long +double is IEEE 128-bit. There were no other regressions with these patches. +Can I check these patches into the trunk?
+ +2022-12-02 Michael Meissner + +gcc/ + + PR target/107299 + * config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Set + building_libgcc. + * config/rs6000/rs6000.cc (create_complex_muldiv): Delete. + (init_float128_ieee): Delete code to switch complex multiply and divide + for long double. + (complex_multiply_builtin_code): New helper function. + (complex_divide_builtin_code): Likewise. + (rs6000_mangle_decl_assembler_name): Add support for mangling the name + of complex 128-bit multiply and divide built-in functions. + * config/rs6000/rs6000.opt (building_libgcc): New target variable. + +gcc/testsuite/ + + PR target/107299 + * gcc.target/powerpc/divic3-1.c: New test. + * gcc.target/powerpc/divic3-2.c: Likewise. + * gcc.target/powerpc/mulic3-1.c: Likewise. + * gcc.target/powerpc/mulic3-2.c: Likewise. + +==================== Branch dmf005, base branch. + 2022-12-02 Michael Meissner Clone branch