public inbox for gcc-cvs@sourceware.org
From: Michael Meissner <meissner@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work161-dmf)] PowerPC: Add support for accumulators in DMR registers.
Date: Thu, 29 Feb 2024 19:18:11 +0000 (GMT)
Message-ID: <20240229191811.3A0AB3858016@sourceware.org>

https://gcc.gnu.org/g:419f8923b4faa644245a54be1c4e5242695ead39

commit 419f8923b4faa644245a54be1c4e5242695ead39
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Feb 29 13:50:16 2024 -0500

    PowerPC: Add support for accumulators in DMR registers.
    
    The MMA subsystem added the notion of accumulator registers as an optional
    feature of ISA 3.1 (power10).  In ISA 3.1, these accumulators overlapped with
    the VSX registers 0..31, but logically the accumulator registers were separate
    from the FPR registers.  In ISA 3.1, it was anticipated that in future systems,
    the accumulator registers may not overlap with the FPR registers.  This patch
    adds support for dense math registers (DMRs) as separate registers.
    
    This particular patch does not change the MMA support to use the accumulators
    within the dense math registers.  This patch just adds the basic support for
    having separate DMRs.  The next patch will switch the MMA support to use the
    accumulators if -mcpu=future is used.
    
    For testing purposes, I added an undocumented option '-mdense-math' to enable
    or disable the dense math support.
    
    This patch adds a new constraint (wD).  If MMA is selected but dense math is
    not selected (i.e. -mcpu=power10), the wD constraint will allow access to
    accumulators that overlap with VSX registers 0..31.  If both MMA and dense math
    are selected (i.e. -mcpu=future), the wD constraint will only allow dense math
    registers.
    
    This patch modifies the existing %A output modifier.  If MMA is selected but
    dense math is not selected, then the %A output modifier converts the VSX
    register number to the accumulator number by dividing it by 4.  If both MMA
    and dense math are selected, then %A will map the separate DMR registers into
    0..7.
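
The %A mapping described above can be sketched as a small stand-alone C
function.  This is only an illustration of the arithmetic, not the code in
print_operand: the function name accumulator_number, the enum constants, and
in particular the DMR base register number are assumptions made for this
example.

```c
#include <stdbool.h>

/* Hypothetical register numbering, for illustration only.  The FPR range
   and the DMR base number below are assumed values, not taken from the
   patch.  */
enum {
  FIRST_FPR = 32,
  LAST_FPR = 63,
  FIRST_DMR = 111,	/* assumption */
  NUM_DMRS = 8
};

/* Return the accumulator number that %A would print for hard register
   REGNO, or -1 where the compiler would report an invalid %A operand.  */
static int
accumulator_number (int regno, bool dense_math)
{
  if (dense_math)
    {
      /* Separate DMRs map directly onto 0..7.  */
      if (regno >= FIRST_DMR && regno < FIRST_DMR + NUM_DMRS)
	return regno - FIRST_DMR;
      return -1;
    }

  /* Without dense math, an accumulator overlaps 4 adjacent FPRs, so only
     every fourth FPR names an accumulator.  */
  if (regno < FIRST_FPR || regno > LAST_FPR)
    return -1;
  if ((regno - FIRST_FPR) % 4 != 0)
    return -1;
  return (regno - FIRST_FPR) / 4;
}
```

With these assumed numbers, accumulator_number (40, false) yields 2, because
FPR 8 is the first register of accumulator 2, and
accumulator_number (FIRST_DMR + 5, true) yields 5.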
    
    The intention is that user code using extended asm can be modified to run on
    both MMA without dense math and MMA with dense math:
    
        1)  If possible, don't use extended asm, but instead use the MMA built-in
            functions;
    
        2)  If you do need to write extended asm, change the d constraints
            targeting accumulators to wD;
    
        3)  Only use the built-in zero, assemble and disassemble functions to
            create and move data between vector quad types and dense math
            accumulators.  I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz
            instructions directly in the extended asm code.  The reason is these
            instructions assume there is a 1-to-1 correspondence between 4
            adjacent FPR registers and an accumulator that overlaps with those
            registers.  With accumulators now being separate registers, there
            no longer is a 1-to-1 correspondence.
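
As a rough illustration of points 1 and 3, the sketch below uses only the
existing GCC MMA built-in functions (__builtin_mma_xxsetaccz,
__builtin_mma_xvf32gerpp and __builtin_mma_disassemble_acc) and no extended
asm, so the compiler chooses where the accumulator lives.  The function name
outer_product_sum and the portable fallback used when __MMA__ is not defined
are assumptions added for this example and are not part of the patch.

```c
#if defined(__MMA__)
typedef __vector unsigned char vec_t;
#endif

/* Return the sum of all 16 elements of the 4x4 outer product of A and B.
   Summing every element keeps the result independent of the row order in
   which the accumulator is disassembled.  */
static float
outer_product_sum (const float a[4], const float b[4])
{
  float sum = 0.0f;

#if defined(__MMA__)
  __vector_quad acc;
  __vector float rows[4];
  __vector float va = { a[0], a[1], a[2], a[3] };
  __vector float vb = { b[0], b[1], b[2], b[3] };

  /* Zero the accumulator with the built-in, not with xxsetaccz in asm.  */
  __builtin_mma_xxsetaccz (&acc);

  /* acc += outer product of va and vb.  */
  __builtin_mma_xvf32gerpp (&acc, (vec_t) va, (vec_t) vb);

  /* Copy the accumulator back to four vectors with the built-in, not with
     xxmfacc in asm.  */
  __builtin_mma_disassemble_acc (rows, &acc);

  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      sum += rows[i][j];
#else
  /* Portable fallback so the sketch also builds without MMA support.  */
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      sum += a[i] * b[j];
#endif

  return sum;
}
```

Because no accumulator register is named anywhere in the source, code written
this way should not need to change between a target where accumulators
overlap the FPRs and one where they are separate DMRs.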
    
    The mangling for DMRs and the GDB register numbers may change in the
    future.
    
    2024-02-29   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (wD constraint): New constraint.
            * config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE): New unspec.
            (movxo): Convert into define_expand.
            (movxo_nodm): Version of movxo where accumulators overlap with VSX vector
            registers 0..31.
            (movxo_dm): Version of movxo that supports separate dense math
            accumulators.
            (mma_assemble_acc): Add dense math support to define_expand.
            (mma_assemble_acc_nodm): Rename from mma_assemble_acc, and restrict it
            to non dense math systems.
            (mma_assemble_acc_dm): Dense math version of mma_assemble_acc.
            (mma_disassemble_acc): Add dense math support to define_expand.
            (mma_disassemble_acc_nodm): Rename from mma_disassemble_acc, and
            restrict it to non dense math systems.
            (mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc.
            * config/rs6000/predicates.md (dmr_operand): New predicate.
            (accumulator_operand): Likewise.
            * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
            (enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
            (LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
            constraint.
            (reload_reg_map): Likewise.
            (rs6000_reg_names): Likewise.
            (alt_reg_names): Likewise.
            (rs6000_hard_regno_nregs_internal): Likewise.
            (rs6000_hard_regno_mode_ok_uncached): Likewise.
            (rs6000_debug_reg_global): Likewise.
            (rs6000_setup_reg_addr_masks): Likewise.
            (rs6000_init_hard_regno_mode_ok): Likewise.
            (rs6000_secondary_reload_memory): Add support for DMR registers.
            (rs6000_secondary_reload_simple_move): Likewise.
            (rs6000_preferred_reload_class): Likewise.
            (rs6000_secondary_reload_class): Likewise.
            (print_operand): Make %A handle both FPRs and DMRs.
            (rs6000_dmr_register_move_cost): New helper function.
            (rs6000_register_move_cost): Add support for DMR registers.
            (rs6000_memory_move_cost): Likewise.
            (rs6000_compute_pressure_classes): Likewise.
            (rs6000_debugger_regno): Likewise.
            (rs6000_split_multireg_move): Add support for DMRs.
            * config/rs6000/rs6000.h (TARGET_DENSE_MATH): New macro.
            (TARGET_MMA_NO_DENSE_MATH): Likewise.
            (UNITS_PER_DMR_WORD): New macro.
            (FIRST_PSEUDO_REGISTER): Update for DMRs.
            (FIXED_REGISTERS): Add DMRs.
            (CALL_REALLY_USED_REGISTERS): Likewise.
            (REG_ALLOC_ORDER): Likewise.
            (DMR_REGNO_P): New macro.
            (enum reg_class): Add DM_REGS.
            (REG_CLASS_NAMES): Likewise.
            (REG_CLASS_CONTENTS): Likewise.
            (enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD.
            (REGISTER_NAMES): Add DMR registers.
            (ADDITIONAL_REGISTER_NAMES): Likewise.
            * config/rs6000/rs6000.md (FIRST_DMR_REGNO): New constant.
            (LAST_DMR_REGNO): Likewise.
            (isa attribute): Add 'dm' and 'not_dm' attributes.
            (enabled attribute): Support 'dm' and 'not_dm' attributes.
            * doc/md.texi (PowerPC constraints): Document wD constraint.

Diff:
---
 gcc/config/rs6000/mma.md      | 251 ++++++++++++++++++++++++++----------------
 gcc/config/rs6000/rs6000-c.cc |   3 +
 gcc/config/rs6000/rs6000.cc   |  35 +++---
 gcc/config/rs6000/rs6000.h    |   3 +
 4 files changed, 181 insertions(+), 111 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 9bc9895bf2d..ac26de93143 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -552,190 +552,249 @@
   "dmxxextfdmr256 %0,%1,2"
   [(set_attr "type" "mma")])
 
-(define_insn "mma_<acc>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
-		    MMA_ACC))]
+;; MMA instructions that do not use their accumulators as an input, still must
+;; not allow their vector operands to overlap the registers used by the
+;; accumulator.  We enforce this by marking the output as early clobber.  If we
+;; have dense math, we don't need the whole prime/de-prime action, so just make
+;; these instructions be NOPs.
+
+(define_expand "mma_<acc>"
+  [(set (match_operand:XO 0 "accumulator_operand")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand")]
+		   MMA_ACC))]
   "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+    {
+      if (!rtx_equal_p (operands[0], operands[1]))
+	emit_move_insn (operands[0], operands[1]);
+      DONE;
+    }
+
+  /* Generate the prime/de-prime code.  */
+})
+
+(define_insn "*mma_<acc>_nodm"
+  [(set (match_operand:XO 0 "accumulator_operand" "=&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0")]
+		    MMA_ACC))]
+  "TARGET_MMA_NO_DENSE_MATH"
   "<acc> %A0"
   [(set_attr "type" "mma")])
 
 ;; We can't have integer constants in XOmode so we wrap this in an
-;; UNSPEC_VOLATILE.
+;; UNSPEC_VOLATILE for the non-dense math case.  For dense math, we don't need
+;; to disable optimization and we can do a normal UNSPEC.
 
-(define_insn "mma_xxsetaccz"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+(define_expand "mma_xxsetaccz"
+  [(set (match_operand:XO 0 "register_operand")
 	(unspec_volatile:XO [(const_int 0)]
 			    UNSPECV_MMA_XXSETACCZ))]
   "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+    {
+      emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
+      DONE;
+    }
+})
+
+(define_insn "*mma_xxsetaccz_nodm"
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+	(unspec_volatile:XO [(const_int 0)]
+			    UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_MMA_NO_DENSE_MATH"
   "xxsetaccz %A0"
   [(set_attr "type" "mma")])
 
+
+(define_insn "mma_xxsetaccz_dm"
+  [(set (match_operand:XO 0 "dmr_operand" "=wD")
+	(unspec:XO [(const_int 0)]
+		   UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_DENSE_MATH"
+  "dmsetdmrz %0"
+  [(set_attr "type" "mma")])
+
 (define_insn "mma_<vv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")]
 		    MMA_VV))]
   "TARGET_MMA"
   "<vv> %A0,%x1,%x2"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<avv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")]
 		    MMA_AVV))]
   "TARGET_MMA"
   "<avv> %A0,%x2,%x3"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<pv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")]
 		    MMA_PV))]
   "TARGET_MMA"
   "<pv> %A0,%x1,%x2"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<apv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:OO 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")]
 		    MMA_APV))]
   "TARGET_MMA"
   "<apv> %A0,%x2,%x3"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<vvi4i4i8>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "u8bit_cint_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "u8bit_cint_operand" "n,n,n")]
 		    MMA_VVI4I4I8))]
   "TARGET_MMA"
   "<vvi4i4i8> %A0,%x1,%x2,%3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<avvi4i4i8>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 6 "u8bit_cint_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 6 "u8bit_cint_operand" "n,n,n")]
 		    MMA_AVVI4I4I8))]
   "TARGET_MMA"
   "<avvi4i4i8> %A0,%x2,%x3,%4,%5,%6"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<vvi4i4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_3_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")]
 		    MMA_VVI4I4I2))]
   "TARGET_MMA"
   "<vvi4i4i2> %A0,%x1,%x2,%3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<avvi4i4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 6 "const_0_to_3_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 6 "const_0_to_3_operand" "n,n,n")]
 		    MMA_AVVI4I4I2))]
   "TARGET_MMA"
   "<avvi4i4i2> %A0,%x2,%x3,%4,%5,%6"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<vvi4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")]
 		    MMA_VVI4I4))]
   "TARGET_MMA"
   "<vvi4i4> %A0,%x1,%x2,%3,%4"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<avvi4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")]
 		    MMA_AVVI4I4))]
   "TARGET_MMA"
   "<avvi4i4> %A0,%x2,%x3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<pvi4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 4 "const_0_to_3_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 4 "const_0_to_3_operand" "n,n,n")]
 		    MMA_PVI4I2))]
   "TARGET_MMA"
   "<pvi4i2> %A0,%x1,%x2,%3,%4"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<apvi4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:OO 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_3_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")]
 		    MMA_APVI4I2))]
   "TARGET_MMA"
   "<apvi4i2> %A0,%x2,%x3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<vvi4i4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")]
 		    MMA_VVI4I4I4))]
   "TARGET_MMA"
   "<vvi4i4i4> %A0,%x1,%x2,%3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<avvi4i4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 6 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 6 "const_0_to_15_operand" "n,n,n")]
 		    MMA_AVVI4I4I4))]
   "TARGET_MMA"
   "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index d15bb85743c..ef65ed8c259 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -602,6 +602,9 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
   /* Tell the user if we support the MMA instructions.  */
   if ((flags & OPTION_MASK_MMA) != 0)
     rs6000_define_or_undefine_macro (define_p, "__MMA__");
+  /* Tell the user if we support the dense math instructions.  */
+  if ((flags & DENSE_MATH_FLAGS) == DENSE_MATH_FLAGS)
+    rs6000_define_or_undefine_macro (define_p, "__PPC_DMR__");
   /* Whether pc-relative code is being generated.  */
   if ((flags & OPTION_MASK_PCREL) != 0)
     rs6000_define_or_undefine_macro (define_p, "__PCREL__");
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 190d3828085..95ecf922143 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -14159,8 +14159,13 @@ print_operand (FILE *file, rtx x, int code)
 	 overlapping with the FPR registers.  */
       if (!REG_P (x))
 	output_operand_lossage ("invalid %%A value");
-      else if (TARGET_DENSE_MATH && DMR_REGNO_P (REGNO (x)))
-	fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO);
+      else if (TARGET_DENSE_MATH)
+	{
+	  if (DMR_REGNO_P (REGNO (x)))
+	    fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO);
+	  else
+	    output_operand_lossage ("%%A operand is not a DMR");
+	}
       else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
 	output_operand_lossage ("invalid %%A value");
       else
@@ -27609,7 +27614,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
 	  /* If we are reading an accumulator register, we have to
 	     deprime it before we can access it.  */
-	  if (TARGET_MMA
+	  if (TARGET_MMA && !TARGET_DENSE_MATH
 	      && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
 	    emit_insn (gen_mma_xxmfacc (src, src));
 
@@ -27641,9 +27646,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 	      emit_insn (gen_rtx_SET (dst2, src2));
 	    }
 
-	  /* If we are writing an accumulator register, we have to
-	     prime it after we've written it.  */
-	  if (TARGET_MMA
+	  /* If we are writing an accumulator register that overlaps with the
+	     FPR registers, we have to prime it after we've written it.  */
+	  if (TARGET_MMA && !TARGET_DENSE_MATH
 	      && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
 	    emit_insn (gen_mma_xxmtacc (dst, dst));
 
@@ -27712,9 +27717,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 	      emit_insn (gen_rtx_SET (dst_i, op));
 	    }
 
-	  /* We are writing an accumulator register, so we have to
-	     prime it after we've written it.  */
-	  if (GET_MODE (src) == XOmode)
+	  /* On systems without dense math where accumulators overlap with the
+	     vector registers, we have to prime it after we've written it.  */
+	  if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH)
 	    emit_insn (gen_mma_xxmtacc (dst, dst));
 
 	  return;
@@ -27725,9 +27730,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
   if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
     {
-      /* If we are reading an accumulator register, we have to
-	 deprime it before we can access it.  */
-      if (TARGET_MMA
+      /* If we are reading an accumulator register and we don't have dense
+	 math, we have to deprime it before we can access it.  */
+      if (TARGET_MMA && !TARGET_DENSE_MATH
 	  && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
 	emit_insn (gen_mma_xxmfacc (src, src));
 
@@ -27755,7 +27760,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
       /* If we are writing an accumulator register, we have to
 	 prime it after we've written it.  */
-      if (TARGET_MMA
+      if (TARGET_MMA && !TARGET_DENSE_MATH
 	  && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
 	emit_insn (gen_mma_xxmtacc (dst, dst));
     }
@@ -27892,7 +27897,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
       /* If we are reading an accumulator register, we have to
 	 deprime it before we can access it.  */
-      if (TARGET_MMA && REG_P (src)
+      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (src)
 	  && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
 	emit_insn (gen_mma_xxmfacc (src, src));
 
@@ -27924,7 +27929,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
       /* If we are writing an accumulator register, we have to
 	 prime it after we've written it.  */
-      if (TARGET_MMA && REG_P (dst)
+      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (dst)
 	  && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
 	emit_insn (gen_mma_xxmtacc (dst, dst));
 
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index bea5db90fab..caec2b3bf62 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -566,6 +566,9 @@ extern int rs6000_vector_align[];
 #define TARGET_DENSE_MATH		(TARGET_MMA && TARGET_FUTURE)
 #define TARGET_MMA_NO_DENSE_MATH	(TARGET_MMA && !TARGET_FUTURE)
 
+/* Dense math flags.  */
+#define DENSE_MATH_FLAGS		(OPTION_MASK_FUTURE | OPTION_MASK_MMA)
+
 /* Inlining allows targets to define the meanings of bits in target_info
    field of ipa_fn_summary by itself, the used bits for rs6000 are listed
    below.  */

