public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions.
@ 2020-06-18 20:42 Peter Bergner
  2020-06-18 20:44 ` [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins Peter Bergner
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Peter Bergner @ 2020-06-18 20:42 UTC
  To: Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

POWER ISA 3.1 added new Matrix-Multiply Assist (MMA) instructions.
The following patch set adds support for generating these instructions
through built-in functions which are enabled with the -mmma option.
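
As a rough illustration of how user code might detect the option: patch 1/3
defines the __MMA__ macro when MMA support is enabled, so a guard along the
following lines should be enough to keep code portable.  The helper names
mma_enabled and vector_quad_size are made up for this sketch and are not part
of the patch set.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when the compiler defines __MMA__
   (which patch 1/3 does when -mmma is in effect), otherwise 0.  */
int mma_enabled(void)
{
#ifdef __MMA__
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: size in bytes of __vector_quad when MMA is enabled,
   or 0 when the type is not available.  The type is only named inside the
   guard, so this compiles on targets without MMA support.  */
size_t vector_quad_size(void)
{
#ifdef __MMA__
  return sizeof(__vector_quad);
#else
  return 0;
#endif
}
```

Without -mmma (or on a non-POWER target) both helpers simply report that MMA
is unavailable.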

Patch1 alone and patch1+patch2+patch3 together have been bootstrapped and
regtested on powerpc64le-linux with no regressions.  In addition,
patch1+patch2+patch3 has been bootstrapped and regtested on powerpc64-linux
(BE), also without regressions.  I'll note that I split the testsuite changes
into their own patch for review purposes, but I plan on committing patch2 and
patch3 together.

Changes since v1:
  Patch 1/3:
    - Modified verbiage in mma.md per Will's suggestion.
    - Modified rs6000_split_multireg_move to correctly handle BE PXImode
      and POImode moves.
  Patch 2/3:
    - Updated ChangeLog entry per Segher's suggestion.
    - Updated doc/extend.texi with correct built-in names for
      __builtin_vsx_xvcvspbf16 and __builtin_vsx_xvcvbf16sp.
  Patch 3/3:
    - No changes.

Peter




* [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-18 20:42 [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
@ 2020-06-18 20:44 ` Peter Bergner
  2020-06-18 23:44   ` Segher Boessenkool
  2020-06-18 20:45 ` [PATCH 2/3, v2] rs6000: Add MMA built-in function definitions Peter Bergner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Peter Bergner @ 2020-06-18 20:44 UTC
  To: Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

Changes since v1:
  - Modified verbiage in mma.md per Will's suggestion.
  - Modified rs6000_split_multireg_move to correctly handle BE PXImode
    and POImode moves.
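
The BE handling in rs6000_split_multireg_move comes down to which piece of the
multi-register value is paired with each successive memory slot.  A standalone
sketch of that index calculation, mirroring the expression used in the patch,
might look like this.  The function name subreg_offset and its parameter list
are invented for illustration; the real code works on rtx operands.

```c
/* Hypothetical model of the subreg byte offset chosen for the i-th memory
   slot when splitting a POImode/PXImode load or store into NREGS pieces of
   SIZE bytes each.  On big-endian the pieces are used in order; on
   little-endian the order is reversed, so the last register piece is paired
   with the first memory location.  */
unsigned subreg_offset(int i, int nregs, unsigned size, int words_big_endian)
{
  return words_big_endian ? i * size : (nregs - 1 - i) * size;
}
```

For a vector pair split into two 16-byte pieces, slot 0 maps to offset 0 on BE
and to offset 16 on LE.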

This patch adds the new -mmma option as well as the initial MMA support,
which includes the target specific __vector_pair and __vector_quad types,
the POImode and PXImode partial integer modes they are mapped to, and their
associated move patterns.  Support for the restrictions on the registers
these modes can be assigned to has also been added.
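
To make the register restrictions concrete, here is a small standalone model
of the two checks added to rs6000_hard_regno_mode_ok_uncached: a vector pair
must start at an even VSX register, and a vector quad must start at an FPR
whose number is divisible by 4.  The register numbering constants and the
function names below are assumptions for this sketch only; the real
definitions live in rs6000.h.

```c
#include <stdbool.h>

/* Assumed register numbering for this sketch: FPRs occupy 32..63 and the
   Altivec registers occupy 64..95.  VSX registers are the union of both.  */
enum { SKETCH_FIRST_FPR = 32, SKETCH_LAST_FPR = 63,
       SKETCH_FIRST_VMX = 64, SKETCH_LAST_VMX = 95 };

static bool sketch_fpr_p(int regno)
{
  return regno >= SKETCH_FIRST_FPR && regno <= SKETCH_LAST_FPR;
}

static bool sketch_vsx_p(int regno)
{
  return sketch_fpr_p(regno)
	 || (regno >= SKETCH_FIRST_VMX && regno <= SKETCH_LAST_VMX);
}

/* POImode (__vector_pair): any VSX register with an even number.  */
bool pair_regno_ok(int regno)
{
  return sketch_vsx_p(regno) && (regno & 1) == 0;
}

/* PXImode (__vector_quad): an FPR whose number is a multiple of 4.  */
bool quad_regno_ok(int regno)
{
  return sketch_fpr_p(regno) && (regno & 3) == 0;
}
```

Under these assumed numbers, register 34 is acceptable for a pair but not for
a quad, and no Altivec register is acceptable for a quad.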

The v1 patch passed bootstrap and regtesting with no regressions on
powerpc64le-linux.  This updated patch is bootstrapping and regtesting
on powerpc64le-linux.  Ok for trunk if there are no regressions?

Peter

2020-06-18  Peter Bergner  <bergner@linux.ibm.com>
	    Michael Meissner  <meissner@linux.ibm.com>

gcc/
	* config/rs6000/mma.md: New file.
	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
	__MMA__ for mma.
	* config/rs6000/rs6000-call.c (rs6000_init_builtins): Add support
	for __vector_pair and __vector_quad types.
	* config/rs6000/rs6000-cpus.def (OTHER_FUTURE_MASKS): Add
	OPTION_MASK_MMA.
	(POWERPC_MASKS): Likewise.
	* config/rs6000/rs6000-modes.def (OI, XI): New integer modes.
	(POI, PXI): New partial integer modes.
	* config/rs6000/rs6000.c (TARGET_INVALID_CONVERSION): Define.
	(rs6000_hard_regno_nregs_internal): Use VECTOR_ALIGNMENT_P.
	(rs6000_hard_regno_mode_ok_uncached): Likewise.
	Add support for POImode being allowed in VSX registers and PXImode
	being allowed in FP registers.
	(rs6000_modes_tieable_p): Adjust comment.
	Add support for POImode and PXImode.
	(rs6000_debug_reg_global) <print_tieable_modes>: Add OImode, POImode,
	XImode and PXImode.
	(rs6000_setup_reg_addr_masks): Use VECTOR_ALIGNMENT_P.
	Set up appropriate addr_masks for vector pair and vector quad addresses.
	(rs6000_init_hard_regno_mode_ok): Add support for vector pair and
	vector quad registers.  Setup reload handlers for POImode and PXImode.
	(rs6000_builtin_mask_calculate): Add support for RS6000_BTM_MMA
	and RS6000_BTM_FUTURE.
	(rs6000_option_override_internal): Error if -mmma is specified
	without -mcpu=future.
	(rs6000_slow_unaligned_access): Use VECTOR_ALIGNMENT_P.
	(quad_address_p): Change size test to less than 16 bytes.
	(reg_offset_addressing_ok_p): Add support for ISA 3.1 vector pair
	and vector quad instructions.
	(avoiding_indexed_address_p): Likewise.
	(rs6000_emit_move): Disallow POImode and PXImode moves involving
	constants.
	(rs6000_preferred_reload_class): Prefer VSX registers for POImode
	and FP registers for PXImode.
	(rs6000_split_multireg_move): Support splitting POImode and PXImode
	move instructions.  Insert xxmtacc and xxmfacc instructions when
	setting a PXImode register and reading a PXImode register respectively.
	(rs6000_mangle_type): Adjust comment.  Add support for mangling
	__vector_pair and __vector_quad types.
	(rs6000_opt_masks): Add entry for mma.
	(rs6000_builtin_mask_names): Add RS6000_BTM_MMA and RS6000_BTM_FUTURE.
	(rs6000_function_value): Use VECTOR_ALIGNMENT_P.
	(address_to_insn_form): Likewise.
	(reg_to_non_prefixed): Likewise.
	(rs6000_invalid_conversion): New function.
	* config/rs6000/rs6000.h (MASK_MMA): Define.
	(BIGGEST_ALIGNMENT): Set to 512 if MMA support is enabled.
	(VECTOR_ALIGNMENT_P): New helper macro.
	(ALTIVEC_VECTOR_MODE): Use VECTOR_ALIGNMENT_P.
	(RS6000_BTM_MMA): Define.
	(RS6000_BTM_COMMON): Add RS6000_BTM_MMA and RS6000_BTM_FUTURE.
	(rs6000_builtin_type_index): Add RS6000_BTI_vector_pair and
	RS6000_BTI_vector_quad.
	(vector_pair_type_node): Define.
	(vector_quad_type_node): Likewise.
	* config/rs6000/rs6000.md (define_attr "isa"): Add mma.
	(define_attr "enabled"): Handle mma.
	(define_mode_iterator RELOAD): Add POI and PXI.
	Include mma.md.
	* config/rs6000/t-rs6000 (MD_INCLUDES): Add mma.md.
	* config/rs6000/rs6000.opt (-mmma): New.
	* doc/invoke.texi: Document -mmma.

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
new file mode 100644
index 00000000000..66c3cb5f2dc
--- /dev/null
+++ b/gcc/config/rs6000/mma.md
@@ -0,0 +1,126 @@
+;; Matrix-Multiply Assist (MMA) patterns.
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;; Contributed by Peter Bergner <bergner@linux.ibm.com> and
+;;		  Michael Meissner <meissner@linux.ibm.com>
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; The MMA patterns use the multi-register PXImode and POImode partial
+;; integer modes to implement the target specific __vector_quad and
+;; __vector_pair types that the MMA built-in functions reference.
+;; To use these modes, we must define XImode and OImode move patterns
+;; so the independent parts of the compiler can use our large partial
+;; integer modes.  However, if we enable the XImode and OImode move
+;; patterns, then the compiler will attempt to use them and this can
+;; cause byte swapping issues on little-endian systems.  We don't need
+;; the XImode and OImode move patterns for actual code generation,
+;; therefore, we define the XImode and OImode move patterns, but we
+;; disable their use with a "false" condition flag.
+
+;; Define a disabled OImode move pattern, so we can use POImode.
+(define_expand "movoi"
+  [(set (match_operand:OI 0 "nonimmediate_operand")
+	(match_operand:OI 1 "input_operand"))]
+  "0"
+{
+  gcc_unreachable ();
+})
+
+;; Vector pair support.  POImode is only defined for vector registers.
+(define_expand "movpoi"
+  [(set (match_operand:POI 0 "nonimmediate_operand")
+	(match_operand:POI 1 "input_operand"))]
+  "TARGET_MMA"
+{
+  rs6000_emit_move (operands[0], operands[1], POImode);
+  DONE;
+})
+
+(define_insn_and_split "*movpoi"
+  [(set (match_operand:POI 0 "nonimmediate_operand" "=wa,m,wa")
+	(match_operand:POI 1 "input_operand"	    "m,wa,wa"))]
+  "TARGET_MMA
+   && (gpc_reg_operand (operands[0], POImode)
+       || gpc_reg_operand (operands[1], POImode))"
+  "@
+   lxvp%X1 %x0,%1
+   stxvp%X0 %x1,%0
+   #"
+  "&& reload_completed
+   && (!MEM_P (operands[0]) && !MEM_P (operands[1]))"
+  [(const_int 0)]
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "type" "vecload,vecstore,veclogical")
+   (set_attr "length" "*,*,8")])
+
+;; Special pattern to prevent DSE from generating an internal error if it
+;; notices a structure copy that it wants to eliminate.  This generates pretty
+;; bad code, but at least it doesn't die.
+(define_insn_and_split "truncpoidi2"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+	(truncate:DI (match_operand:POI 1 "gpc_reg_operand" "wa")))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+	(vec_select:DI (match_dup 2)
+		       (parallel [(match_dup 3)])))]
+{
+  unsigned r = reg_or_subregno (operands[1]) + !BYTES_BIG_ENDIAN;
+  operands[2] = gen_rtx_REG (V2DImode, r);
+  operands[3] = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
+})
+
+\f
+;; Define a disabled XImode move pattern, so we can use PXImode.
+(define_expand "movxi"
+  [(set (match_operand:XI 0 "nonimmediate_operand")
+	(match_operand:XI 1 "input_operand"))]
+  "0"
+{
+  gcc_unreachable ();
+})
+
+;; Vector quad support.  PXImode is only defined for floating point registers.
+(define_expand "movpxi"
+  [(set (match_operand:PXI 0 "nonimmediate_operand")
+	(match_operand:PXI 1 "input_operand"))]
+  "TARGET_MMA"
+{
+  rs6000_emit_move (operands[0], operands[1], PXImode);
+  DONE;
+})
+
+(define_insn_and_split "*movpxi"
+  [(set (match_operand:PXI 0 "nonimmediate_operand" "=d,m,d")
+	(match_operand:PXI 1 "input_operand" "m,d,d"))]
+  "TARGET_MMA
+   && (gpc_reg_operand (operands[0], PXImode)
+       || gpc_reg_operand (operands[1], PXImode))"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "type" "vecload,vecstore,veclogical")
+   (set_attr "length" "8,8,16")
+   (set_attr "max_prefixed_insns" "2,2,*")])
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 07ca33a89b4..47514552449 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -593,6 +593,10 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
      PROCESSOR_CELL) (e.g. -mcpu=cell).  */
   if ((bu_mask & RS6000_BTM_CELL) != 0)
     rs6000_define_or_undefine_macro (define_p, "__PPU__");
+
+  /* Tell the user if we support the MMA instructions.  */
+  if ((flags & OPTION_MASK_MMA) != 0)
+    rs6000_define_or_undefine_macro (define_p, "__MMA__");
 }
 
 void
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 817a14c9c0d..eeb20e5200d 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -12205,6 +12205,24 @@ rs6000_init_builtins (void)
   else
     ieee128_float_type_node = ibm128_float_type_node = long_double_type_node;
 
+  /* Vector paired and vector quad support.  */
+  if (TARGET_MMA)
+    {
+      tree oi_uns_type = make_unsigned_type (256);
+      vector_pair_type_node = build_distinct_type_copy (oi_uns_type);
+      SET_TYPE_MODE (vector_pair_type_node, POImode);
+      layout_type (vector_pair_type_node);
+      lang_hooks.types.register_builtin_type (vector_pair_type_node,
+					      "__vector_pair");
+
+      tree xi_uns_type = make_unsigned_type (512);
+      vector_quad_type_node = build_distinct_type_copy (xi_uns_type);
+      SET_TYPE_MODE (vector_quad_type_node, PXImode);
+      layout_type (vector_quad_type_node);
+      lang_hooks.types.register_builtin_type (vector_quad_type_node,
+					      "__vector_quad");
+    }
+
   /* Initialize the modes for builtin_function_type, mapping a machine mode to
      tree type node.  */
   builtin_mode_to_type[QImode][0] = integer_type_node;
@@ -12236,6 +12254,8 @@ rs6000_init_builtins (void)
   builtin_mode_to_type[V8HImode][1] = unsigned_V8HI_type_node;
   builtin_mode_to_type[V16QImode][0] = V16QI_type_node;
   builtin_mode_to_type[V16QImode][1] = unsigned_V16QI_type_node;
+  builtin_mode_to_type[POImode][1] = vector_pair_type_node;
+  builtin_mode_to_type[PXImode][1] = vector_quad_type_node;
 
   tdecl = add_builtin_type ("__bool char", bool_char_type_node);
   TYPE_NAME (bool_char_type_node) = tdecl;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 83362e05b10..667c7ecefb8 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -76,7 +76,8 @@
 				 | OPTION_MASK_P9_VECTOR)
 
 /* Flags that need to be turned off if -mno-future.  */
-#define OTHER_FUTURE_MASKS	(OPTION_MASK_PCREL			\
+#define OTHER_FUTURE_MASKS	(OPTION_MASK_MMA			\
+				 | OPTION_MASK_PCREL			\
 				 | OPTION_MASK_PREFIXED)
 
 /* Support for a future processor's features.  */
@@ -132,6 +133,7 @@
 				 | OPTION_MASK_HTM			\
 				 | OPTION_MASK_ISEL			\
 				 | OPTION_MASK_MFCRF			\
+				 | OPTION_MASK_MMA			\
 				 | OPTION_MASK_MODULO			\
 				 | OPTION_MASK_MULHW			\
 				 | OPTION_MASK_NO_UPDATE		\
diff --git a/gcc/config/rs6000/rs6000-modes.def b/gcc/config/rs6000/rs6000-modes.def
index 5f43cadff80..ddb218b3fba 100644
--- a/gcc/config/rs6000/rs6000-modes.def
+++ b/gcc/config/rs6000/rs6000-modes.def
@@ -82,3 +82,13 @@ VECTOR_MODE (INT, SI, 2);     /*                 V2SI  */
    for quad memory atomic operations to force getting an even/odd register
    combination.  */
 PARTIAL_INT_MODE (TI, 128, PTI);
+
+/* Define, but don't use the larger integer modes.  We need an integer mode
+   defined that is the same size as the vector pair and vector quad modes.  */
+
+INT_MODE (OI, 32);
+INT_MODE (XI, 64);
+
+/* Modes used by __vector_pair and __vector_quad.  */
+PARTIAL_INT_MODE (OI, 256, POI);	/* __vector_pair.  */
+PARTIAL_INT_MODE (XI, 512, PXI);	/* __vector_quad.  */
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 58f5d780603..a0f4991d00a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1745,6 +1745,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P
 #define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P \
   rs6000_cannot_substitute_mem_equiv_p
+
+#undef TARGET_INVALID_CONVERSION
+#define TARGET_INVALID_CONVERSION rs6000_invalid_conversion
 \f
 
 /* Processor table.  */
@@ -1798,7 +1801,7 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
      128-bit floating point that can go in vector registers, which has VSX
      memory addressing.  */
   if (FP_REGNO_P (regno))
-    reg_size = (VECTOR_MEM_VSX_P (mode) || FLOAT128_VECTOR_P (mode)
+    reg_size = (VECTOR_MEM_VSX_P (mode) || VECTOR_ALIGNMENT_P (mode)
 		? UNITS_PER_VSX_WORD
 		: UNITS_PER_FP_WORD);
 
@@ -1821,6 +1824,20 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
   if (COMPLEX_MODE_P (mode))
     mode = GET_MODE_INNER (mode);
 
+  /* Vector pair modes need even/odd VSX register pairs.  Only allow vector
+     registers.  We need to allow OImode to have the same registers as POImode,
+     even though we do not enable the move pattern for OImode.  */
+  if (mode == POImode || mode == OImode)
+    return (TARGET_MMA && VSX_REGNO_P (regno)
+	    && (regno & 1) == 0);
+
+  /* MMA accumulator modes need FPR registers divisible by 4.  We need to allow
+     XImode to have the same registers as PXImode, even though we do not enable
+     the move pattern for XImode.  */
+  if (mode == PXImode || mode == XImode)
+    return (TARGET_MMA && FP_REGNO_P (regno)
+	    && (regno & 3) == 0);
+
   /* PTImode can only go in GPRs.  Quad word memory operations require even/odd
      register combinations, and use PTImode where we need to deal with quad
      word memory operations.  Don't allow quad words in the argument or frame
@@ -1836,7 +1853,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
      asked for it.  */
   if (TARGET_VSX && VSX_REGNO_P (regno)
       && (VECTOR_MEM_VSX_P (mode)
-	  || FLOAT128_VECTOR_P (mode)
+	  || VECTOR_ALIGNMENT_P (mode)
 	  || reg_addr[mode].scalar_in_vmx_p
 	  || mode == TImode
 	  || (TARGET_VADDUQM && mode == V1TImode)))
@@ -1846,7 +1863,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
 
       if (ALTIVEC_REGNO_P (regno))
 	{
-	  if (GET_MODE_SIZE (mode) != 16 && !reg_addr[mode].scalar_in_vmx_p)
+	  if (GET_MODE_SIZE (mode) < 16 && !reg_addr[mode].scalar_in_vmx_p)
 	    return 0;
 
 	  return ALTIVEC_REGNO_P (last_regno);
@@ -1862,7 +1879,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
      modes and DImode.  */
   if (FP_REGNO_P (regno))
     {
-      if (FLOAT128_VECTOR_P (mode))
+      if (VECTOR_ALIGNMENT_P (mode))
 	return false;
 
       if (SCALAR_FLOAT_MODE_P (mode)
@@ -1925,15 +1942,19 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
    GPR registers, and TImode can go in any GPR as well as VSX registers (PR
    57744).
 
+   Similarly, don't allow POImode (vector pair, restricted to even VSX
+   registers) or PXImode (vector quad, restricted to FPR registers divisible
+   by 4) to tie with other modes.
+
    Altivec/VSX vector tests were moved ahead of scalar float mode, so that IEEE
    128-bit floating point on VSX systems ties with other vectors.  */
 
 static bool
 rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
-  if (mode1 == PTImode)
-    return mode2 == PTImode;
-  if (mode2 == PTImode)
+  if (mode1 == PTImode || mode1 == POImode || mode1 == PXImode)
+    return mode1 == mode2;
+  if (mode2 == PTImode || mode2 == POImode || mode2 == PXImode)
     return false;
 
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
@@ -2206,6 +2227,8 @@ rs6000_debug_reg_global (void)
     SDmode,
     DDmode,
     TDmode,
+    V2SImode,
+    V2SFmode,
     V16QImode,
     V8HImode,
     V4SImode,
@@ -2220,9 +2243,14 @@ rs6000_debug_reg_global (void)
     V2DFmode,
     V8SFmode,
     V4DFmode,
+    OImode,
+    XImode,
+    POImode,
+    PXImode,
     CCmode,
     CCUNSmode,
     CCEQmode,
+    CCFPmode,
   };
 
   /* Virtual regs we are interested in.  */
@@ -2619,7 +2647,7 @@ rs6000_setup_reg_addr_masks (void)
 		  && (rc == RELOAD_REG_GPR || rc == RELOAD_REG_FPR)
 		  && msize <= 8
 		  && !VECTOR_MODE_P (m2)
-		  && !FLOAT128_VECTOR_P (m2)
+		  && !VECTOR_ALIGNMENT_P (m2)
 		  && !complex_p
 		  && (m != E_DFmode || !TARGET_VSX)
 		  && (m != E_SFmode || !TARGET_P8_VECTOR)
@@ -2675,6 +2703,22 @@ rs6000_setup_reg_addr_masks (void)
 		addr_mask |= RELOAD_REG_QUAD_OFFSET;
 	    }
 
+	  /* Vector pairs can do both indexed and offset loads if the
+	     instructions are enabled, otherwise they can only do offset loads
+	     since it will be broken into two vector moves.  Vector quads can
+	     only do offset loads.  */
+	  else if ((addr_mask != 0) && TARGET_MMA
+		   && (m2 == POImode || m2 == PXImode))
+	    {
+	      addr_mask |= RELOAD_REG_OFFSET;
+	      if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
+		{
+		  addr_mask |= RELOAD_REG_QUAD_OFFSET;
+		  if (m2 == POImode)
+		    addr_mask |= RELOAD_REG_INDEXED;
+		}
+	    }
+
 	  /* VMX registers can do (REG & -16) and ((REG+REG) & -16)
 	     addressing on 128-bit types.  */
 	  if (rc == RELOAD_REG_VMX && msize == 16
@@ -2876,6 +2920,18 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
       rs6000_vector_align[TImode] = align64;
     }
 
+  /* Add support for vector pairs and vector quad registers.  */
+  if (TARGET_MMA)
+    {
+      for (m = 0; m < NUM_MACHINE_MODES; ++m)
+	if (m == POImode || m == PXImode)
+	  {
+	    rs6000_vector_unit[m] = VECTOR_NONE;
+	    rs6000_vector_mem[m] = VECTOR_VSX;
+	    rs6000_vector_align[m] = (m == POImode) ? 256 : 512;
+	  }
+    }
+
   /* Register class constraints for the constraints that depend on compile
      switches. When the VSX code was added, different constraints were added
      based on the type (DFmode, V2DFmode, V4SFmode).  For the vector types, all
@@ -3007,6 +3063,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
 		  reg_addr[TFmode].reload_gpr_vsx = CODE_FOR_reload_gpr_from_vsxtf;
 		  reg_addr[TFmode].reload_vsx_gpr = CODE_FOR_reload_vsx_from_gprtf;
 		}
+
+	      if (TARGET_MMA)
+		{
+		  reg_addr[POImode].reload_store = CODE_FOR_reload_poi_di_store;
+		  reg_addr[POImode].reload_load = CODE_FOR_reload_poi_di_load;
+		  reg_addr[PXImode].reload_store = CODE_FOR_reload_pxi_di_store;
+		  reg_addr[PXImode].reload_load = CODE_FOR_reload_pxi_di_load;
+		}
 	    }
 	}
       else
@@ -3339,7 +3403,8 @@ rs6000_builtin_mask_calculate (void)
 	      && !TARGET_IEEEQUAD)	    ? RS6000_BTM_LDBL128   : 0)
 	  | ((TARGET_FLOAT128_TYPE)	    ? RS6000_BTM_FLOAT128  : 0)
 	  | ((TARGET_FLOAT128_HW)	    ? RS6000_BTM_FLOAT128_HW : 0)
-	  | ((TARGET_FUTURE)                ? RS6000_BTM_FUTURE    : 0));
+	  | ((TARGET_MMA)		    ? RS6000_BTM_MMA	   : 0)
+	  | ((TARGET_FUTURE)		    ? RS6000_BTM_FUTURE    : 0));
 }
 
 /* Implement TARGET_MD_ASM_ADJUST.  All asm statements are considered
@@ -4202,6 +4267,15 @@ rs6000_option_override_internal (bool global_init_p)
       rs6000_isa_flags &= ~OPTION_MASK_PCREL;
     }
 
+  /* Turn off vector pair/mma options on non-future systems.  */
+  if (!TARGET_FUTURE && TARGET_MMA)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_MMA) != 0)
+	error ("%qs requires %qs", "-mmma", "-mcpu=future");
+
+      rs6000_isa_flags &= ~OPTION_MASK_MMA;
+    }
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
@@ -7175,7 +7249,7 @@ rs6000_slow_unaligned_access (machine_mode mode, unsigned int align)
   return (STRICT_ALIGNMENT
 	  || (!TARGET_EFFICIENT_UNALIGNED_VSX
 	      && ((SCALAR_FLOAT_MODE_NOT_VECTOR_P (mode) && align < 32)
-		  || ((VECTOR_MODE_P (mode) || FLOAT128_VECTOR_P (mode))
+		  || ((VECTOR_MODE_P (mode) || VECTOR_ALIGNMENT_P (mode))
 		      && (int) align < VECTOR_ALIGN (mode)))));
 }
 
@@ -7360,7 +7434,7 @@ quad_address_p (rtx addr, machine_mode mode, bool strict)
 {
   rtx op0, op1;
 
-  if (GET_MODE_SIZE (mode) != 16)
+  if (GET_MODE_SIZE (mode) < 16)
     return false;
 
   if (legitimate_indirect_address_p (addr, strict))
@@ -7678,6 +7752,12 @@ reg_offset_addressing_ok_p (machine_mode mode)
 	return mode_supports_dq_form (mode);
       break;
 
+      /* The vector pair/quad types support offset addressing if the
+	 underlying vectors support offset addressing.  */
+    case E_POImode:
+    case E_PXImode:
+      return TARGET_MMA;
+
     case E_SDmode:
       /* If we can do direct load/stores of SDmode, restrict it to reg+reg
 	 addressing for the LFIWZX and STFIWX instructions.  */
@@ -8024,8 +8104,14 @@ legitimate_indexed_address_p (rtx x, int strict)
 bool
 avoiding_indexed_address_p (machine_mode mode)
 {
-  /* Avoid indexed addressing for modes that have non-indexed
-     load/store instruction forms.  */
+  unsigned int msize = GET_MODE_SIZE (mode);
+
+  /* Avoid indexed addressing for modes that have non-indexed load/store
+     instruction forms.  On the future system, vector pairs have an indexed
+     form, but vector quads don't.  */
+  if (msize > 16)
+    return msize != 32;
+
   return (TARGET_AVOID_XFORM && VECTOR_MEM_NONE_P (mode));
 }
 
@@ -9856,6 +9942,13 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
 	operands[1] = force_const_mem (mode, operands[1]);
       break;
 
+    case E_POImode:
+    case E_PXImode:
+      if (CONSTANT_P (operands[1]))
+	error ("%qs is an opaque type, and you can't set it to other values.",
+	       (mode == POImode) ? "__vector_pair" : "__vector_quad");
+      break;
+
     case E_SImode:
     case E_DImode:
       /* Use default pattern for address of ELF small data */
@@ -12117,8 +12210,20 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
       return NO_REGS;
     }
 
-  if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS)
-    return GENERAL_REGS;
+  /* For the vector pair and vector quad modes, prefer their natural register
+     (VSX or FPR) rather than GPR registers.  For other integer types, prefer
+     the GPR registers.  */
+  if (rclass == GEN_OR_FLOAT_REGS)
+    {
+      if (mode == POImode)
+	return VSX_REGS;
+
+      if (mode == PXImode)
+	return FLOAT_REGS;
+
+      if (GET_MODE_CLASS (mode) == MODE_INT)
+	return GENERAL_REGS;
+    }
 
   return rclass;
 }
@@ -15793,7 +15898,23 @@ rs6000_split_multireg_move (rtx dst, rtx src)
   reg = REG_P (dst) ? REGNO (dst) : REGNO (src);
   mode = GET_MODE (dst);
   nregs = hard_regno_nregs (reg, mode);
-  if (FP_REGNO_P (reg))
+  /* If we have a quad vector register for MMA, and this is a load or store,
+     see if we can use vector paired load/stores.  */
+  if (mode == PXImode && TARGET_MMA
+      && (MEM_P (dst) || MEM_P (src)))
+    {
+      reg_mode = POImode;
+      nregs /= hard_regno_nregs (reg, reg_mode);
+    }
+
+  /* If we have a vector pair/quad mode, split it into two/four separate
+     vectors.  */
+  else if (mode == POImode || mode == PXImode)
+    {
+      reg_mode = V1TImode;
+      nregs /= hard_regno_nregs (reg, reg_mode);
+    }
+  else if (FP_REGNO_P (reg))
     reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode :
 	(TARGET_HARD_FLOAT ? DFmode : SFmode);
   else if (ALTIVEC_REGNO_P (reg))
@@ -15837,6 +15958,51 @@ rs6000_split_multireg_move (rtx dst, rtx src)
       return;
     }
 
+  /* The __vector_pair and __vector_quad modes are multi-register modes,
+     so if we have to load or store the registers, we have to be careful to
+     properly swap them if we're in little endian mode below.  This means
+     the last register gets the first memory location.  */
+  if (mode == POImode || mode == PXImode)
+    {
+      if (MEM_P (dst))
+	{
+	  unsigned offset = 0;
+	  unsigned size = GET_MODE_SIZE (reg_mode);
+
+	  for (int i = 0; i < nregs; i++)
+	    {
+	      unsigned subreg = (WORDS_BIG_ENDIAN)
+				  ? i * size : (nregs - 1 - i) * size;
+	      rtx dst2 = adjust_address (dst, reg_mode, offset);
+	      rtx src2 = simplify_gen_subreg (reg_mode, src, mode, subreg);
+	      offset += size;
+	      emit_insn (gen_rtx_SET (dst2, src2));
+	    }
+
+	  return;
+	}
+
+      if (MEM_P (src))
+	{
+	  unsigned offset = 0;
+	  unsigned size = GET_MODE_SIZE (reg_mode);
+
+	  for (int i = 0; i < nregs; i++)
+	    {
+	      unsigned subreg = (WORDS_BIG_ENDIAN)
+				  ? i * size : (nregs - 1 - i) * size;
+	      rtx dst2 = simplify_gen_subreg (reg_mode, dst, mode, subreg);
+	      rtx src2 = adjust_address (src, reg_mode, offset);
+	      offset += size;
+	      emit_insn (gen_rtx_SET (dst2, src2));
+	    }
+
+	  return;
+	}
+
+      /* Register -> register moves can use common code.  */
+    }
+
   if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
     {
       /* Move register range backwards, if we might have destructive
@@ -19227,7 +19393,8 @@ rs6000_handle_altivec_attribute (tree *node,
 
 /* AltiVec defines five built-in scalar types that serve as vector
    elements; we must teach the compiler how to mangle them.  The 128-bit
-   floating point mangling is target-specific as well.  */
+   floating point mangling is target-specific as well.  MMA defines
+   two built-in types to be used as opaque vector types.  */
 
 static const char *
 rs6000_mangle_type (const_tree type)
@@ -19249,6 +19416,9 @@ rs6000_mangle_type (const_tree type)
   if (SCALAR_FLOAT_TYPE_P (type) && FLOAT128_IEEE_P (TYPE_MODE (type)))
     return ieee128_mangling_gcc_8_1 ? "U10__float128" : "u9__ieee128";
 
+  if (type == vector_pair_type_node) return "u13__vector_pair";
+  if (type == vector_quad_type_node) return "u13__vector_quad";
+
   /* For all other types, use the default mangling.  */
   return NULL;
 }
@@ -22506,7 +22676,7 @@ rs6000_function_value (const_tree valtype,
   /* VSX is a superset of Altivec and adds V2DImode/V2DFmode.  Since the same
      return register is used in both cases, and we won't see V2DImode/V2DFmode
      for pure altivec, combine the two cases.  */
-  else if ((TREE_CODE (valtype) == VECTOR_TYPE || FLOAT128_VECTOR_P (mode))
+  else if ((TREE_CODE (valtype) == VECTOR_TYPE || VECTOR_ALIGNMENT_P (mode))
 	   && TARGET_ALTIVEC && TARGET_ALTIVEC_ABI
 	   && ALTIVEC_OR_VSX_VECTOR_MODE (mode))
     regno = ALTIVEC_ARG_RETURN;
@@ -22922,6 +23092,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "isel",			OPTION_MASK_ISEL,		false, true  },
   { "mfcrf",			OPTION_MASK_MFCRF,		false, true  },
   { "mfpgpr",			0,				false, true  },
+  { "mma",			OPTION_MASK_MMA,		false, true  },
   { "modulo",			OPTION_MASK_MODULO,		false, true  },
   { "mulhw",			OPTION_MASK_MULHW,		false, true  },
   { "multiple",			OPTION_MASK_MULTIPLE,		false, true  },
@@ -22992,6 +23163,8 @@ static struct rs6000_opt_mask const rs6000_builtin_mask_names[] =
   { "powerpc64",	 RS6000_BTM_POWERPC64,  false, false },
   { "float128",		 RS6000_BTM_FLOAT128,   false, false },
   { "float128-hw",	 RS6000_BTM_FLOAT128_HW,false, false },
+  { "mma",		 RS6000_BTM_MMA,	false, false },
+  { "future",		 RS6000_BTM_FUTURE,	false, false },
 };
 
 /* Option variables that we want to support inside attribute((target)) and
@@ -24947,7 +25120,7 @@ address_to_insn_form (rtx addr,
 	non_prefixed_format = NON_PREFIXED_DS;
 
       else if (TARGET_VSX && size >= 16
-	       && (VECTOR_MODE_P (mode) || FLOAT128_VECTOR_P (mode)))
+	       && (VECTOR_MODE_P (mode) || VECTOR_ALIGNMENT_P (mode)))
 	non_prefixed_format = NON_PREFIXED_DQ;
 
       else
@@ -25076,7 +25249,7 @@ reg_to_non_prefixed (rtx reg, machine_mode mode)
 
       else if (TARGET_VSX && size >= 16
 	       && (VECTOR_MODE_P (mode)
-		   || FLOAT128_VECTOR_P (mode)
+		   || VECTOR_ALIGNMENT_P (mode)
 		   || mode == TImode || mode == CTImode))
 	return (TARGET_P9_VECTOR) ? NON_PREFIXED_DQ : NON_PREFIXED_X;
 
@@ -25100,7 +25273,7 @@ reg_to_non_prefixed (rtx reg, machine_mode mode)
 
       else if (TARGET_VSX && size >= 16
 	       && (VECTOR_MODE_P (mode)
-		   || FLOAT128_VECTOR_P (mode)
+		   || VECTOR_ALIGNMENT_P (mode)
 		   || mode == TImode || mode == CTImode))
 	return NON_PREFIXED_DQ;
 
@@ -26494,6 +26667,45 @@ rs6000_cannot_substitute_mem_equiv_p (rtx mem)
   return false;
 }
 
+/* Implement TARGET_INVALID_CONVERSION.  */
+
+static const char *
+rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
+{
+  if (element_mode (fromtype) != element_mode (totype))
+    {
+      /* Do not allow conversions to/from PXImode and POImode types.  */
+      if (TYPE_MODE (fromtype) == PXImode)
+	return N_("invalid conversion from type %<__vector_quad%>");
+      if (TYPE_MODE (totype) == PXImode)
+	return N_("invalid conversion to type %<__vector_quad%>");
+      if (TYPE_MODE (fromtype) == POImode)
+	return N_("invalid conversion from type %<__vector_pair%>");
+      if (TYPE_MODE (totype) == POImode)
+	return N_("invalid conversion to type %<__vector_pair%>");
+    }
+  else if (POINTER_TYPE_P (fromtype) && POINTER_TYPE_P (totype))
+    {
+      /* Do not allow conversions to/from PXImode and POImode pointer
+	 types, except to/from void pointers.  */
+      if (TYPE_MODE (TREE_TYPE (fromtype)) == PXImode
+	  && TYPE_MODE (TREE_TYPE (totype)) != VOIDmode)
+	return N_("invalid conversion from type %<* __vector_quad%>");
+      if (TYPE_MODE (TREE_TYPE (totype)) == PXImode
+	  && TYPE_MODE (TREE_TYPE (fromtype)) != VOIDmode)
+	return N_("invalid conversion to type %<* __vector_quad%>");
+      if (TYPE_MODE (TREE_TYPE (fromtype)) == POImode
+	  && TYPE_MODE (TREE_TYPE (totype)) != VOIDmode)
+	return N_("invalid conversion from type %<* __vector_pair%>");
+      if (TYPE_MODE (TREE_TYPE (totype)) == POImode
+	  && TYPE_MODE (TREE_TYPE (fromtype)) != VOIDmode)
+	return N_("invalid conversion to type %<* __vector_pair%>");
+    }
+
+  /* Conversion allowed.  */
+  return NULL;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 1209a33173e..9c103bf8f7d 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -522,6 +522,7 @@ extern int rs6000_vector_align[];
 #define MASK_HTM			OPTION_MASK_HTM
 #define MASK_ISEL			OPTION_MASK_ISEL
 #define MASK_MFCRF			OPTION_MASK_MFCRF
+#define MASK_MMA			OPTION_MASK_MMA
 #define MASK_MULHW			OPTION_MASK_MULHW
 #define MASK_MULTIPLE			OPTION_MASK_MULTIPLE
 #define MASK_NO_UPDATE			OPTION_MASK_NO_UPDATE
@@ -776,7 +777,7 @@ extern unsigned rs6000_pointer_size;
 #define FUNCTION_BOUNDARY 32
 
 /* No data type wants to be aligned rounder than this.  */
-#define BIGGEST_ALIGNMENT 128
+#define BIGGEST_ALIGNMENT ((TARGET_MMA) ? 512 : 128)
 
 /* Alignment of field after `int : 0' in a structure.  */
 #define EMPTY_FIELD_BOUNDARY 32
@@ -1035,16 +1036,17 @@ enum data_align { align_abi, align_opt, align_both };
 	 ((MODE) == V4SFmode		\
 	  || (MODE) == V2DFmode)	\
 
-/* Note KFmode and possibly TFmode (i.e. IEEE 128-bit floating point) are not
-   really a vector, but we want to treat it as a vector for moves, and
-   such.  */
+/* Modes that are not vectors, but require vector alignment.  Treat these like
+   vectors in terms of loads and stores.  */
+#define VECTOR_ALIGNMENT_P(MODE)					\
+  (FLOAT128_VECTOR_P (MODE) || (MODE) == POImode || (MODE) == PXImode)
 
 #define ALTIVEC_VECTOR_MODE(MODE)					\
   ((MODE) == V16QImode							\
    || (MODE) == V8HImode						\
    || (MODE) == V4SFmode						\
    || (MODE) == V4SImode						\
-   || FLOAT128_VECTOR_P (MODE))
+   || VECTOR_ALIGNMENT_P (MODE))
 
 #define ALTIVEC_OR_VSX_VECTOR_MODE(MODE)				\
   (ALTIVEC_VECTOR_MODE (MODE) || VSX_VECTOR_MODE (MODE)			\
@@ -2309,6 +2311,7 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_POWERPC64	MASK_POWERPC64	/* 64-bit registers.  */
 #define RS6000_BTM_FLOAT128	MASK_FLOAT128_KEYWORD /* IEEE 128-bit float.  */
 #define RS6000_BTM_FLOAT128_HW	MASK_FLOAT128_HW /* IEEE 128-bit float h/w.  */
+#define RS6000_BTM_MMA		MASK_MMA	/* ISA 3.1 MMA.  */
 #define RS6000_BTM_FUTURE	MASK_FUTURE
 
 
@@ -2331,7 +2334,9 @@ extern int frame_pointer_needed;
 				 | RS6000_BTM_LDBL128			\
 				 | RS6000_BTM_POWERPC64			\
 				 | RS6000_BTM_FLOAT128			\
-				 | RS6000_BTM_FLOAT128_HW)
+				 | RS6000_BTM_FLOAT128_HW		\
+				 | RS6000_BTM_MMA			\
+				 | RS6000_BTM_FUTURE)
 
 /* Define builtin enum index.  */
 
@@ -2443,6 +2448,8 @@ enum rs6000_builtin_type_index
   RS6000_BTI_ieee128_float,	 /* ieee 128-bit floating point */
   RS6000_BTI_ibm128_float,	 /* IBM 128-bit floating point */
   RS6000_BTI_const_str,		 /* pointer to const char * */
+  RS6000_BTI_vector_pair,	 /* unsigned 256-bit types (vector pair).  */
+  RS6000_BTI_vector_quad,	 /* unsigned 512-bit types (vector quad).  */
   RS6000_BTI_MAX
 };
 
@@ -2495,6 +2502,8 @@ enum rs6000_builtin_type_index
 #define ieee128_float_type_node		 (rs6000_builtin_types[RS6000_BTI_ieee128_float])
 #define ibm128_float_type_node		 (rs6000_builtin_types[RS6000_BTI_ibm128_float])
 #define const_str_type_node		 (rs6000_builtin_types[RS6000_BTI_const_str])
+#define vector_pair_type_node		 (rs6000_builtin_types[RS6000_BTI_vector_pair])
+#define vector_quad_type_node		 (rs6000_builtin_types[RS6000_BTI_vector_quad])
 
 extern GTY(()) tree rs6000_builtin_types[RS6000_BTI_MAX];
 extern GTY(()) tree rs6000_builtin_decls[RS6000_BUILTIN_COUNT];
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 0aa5265d199..6b462a3ecdb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -322,7 +322,7 @@ (define_attr "cpu"
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf,fut"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf,fut,mma"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -366,6 +366,10 @@ (define_attr "enabled" ""
      (and (eq_attr "isa" "fut")
 	  (match_test "TARGET_FUTURE"))
      (const_int 1)
+
+     (and (eq_attr "isa" "mma")
+	  (match_test "TARGET_MMA"))
+     (const_int 1)
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
@@ -772,7 +776,8 @@ (define_mode_attr BOOL_REGS_UNARY	[(TI	"r,0,0,wa,v")
 ;; Reload iterator for creating the function to allocate a base register to
 ;; supplement addressing modes.
 (define_mode_iterator RELOAD [V16QI V8HI V4SI V2DI V4SF V2DF V1TI
-			      SF SD SI DF DD DI TI PTI KF IF TF])
+			      SF SD SI DF DD DI TI PTI KF IF TF
+			      POI PXI])
 
 ;; Iterate over smin, smax
 (define_code_iterator fp_minmax	[smin smax])
@@ -14866,6 +14871,7 @@ (define_insn "*cmpeqb_internal"
 (include "vector.md")
 (include "vsx.md")
 (include "altivec.md")
+(include "mma.md")
 (include "dfp.md")
 (include "crypto.md")
 (include "htm.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index f95b8279270..92951483e4e 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -578,3 +578,7 @@ Generate (do not generate) prefixed memory instructions.
 mpcrel
 Target Report Mask(PCREL) Var(rs6000_isa_flags)
 Generate (do not generate) pc-relative memory addressing.
+
+mmma
+Target Report Mask(MMA) Var(rs6000_isa_flags)
+Generate (do not generate) MMA instructions.
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 170a69591dd..81d550ce236 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -83,6 +83,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
 	$(srcdir)/config/rs6000/vector.md \
 	$(srcdir)/config/rs6000/vsx.md \
 	$(srcdir)/config/rs6000/altivec.md \
+	$(srcdir)/config/rs6000/mma.md \
 	$(srcdir)/config/rs6000/crypto.md \
 	$(srcdir)/config/rs6000/htm.md \
 	$(srcdir)/config/rs6000/dfp.md
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 06a04e3d7dd..1452aabe693 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1201,7 +1201,7 @@ See RS/6000 and PowerPC Options.
 -mgnu-attribute  -mno-gnu-attribute @gol
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg} @gol
 -mstack-protector-guard-offset=@var{offset} -mprefixed -mno-prefixed @gol
--mpcrel -mno-pcrel}
+-mpcrel -mno-pcrel -mmma -mno-mmma}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -25940,7 +25940,8 @@ following options:
 -mpowerpc-gpopt  -mpowerpc-gfxopt @gol
 -mmulhw  -mdlmzb  -mmfpgpr  -mvsx @gol
 -mcrypto  -mhtm  -mpower8-fusion  -mpower8-vector @gol
--mquad-memory  -mquad-memory-atomic  -mfloat128  -mfloat128-hardware}
+-mquad-memory  -mquad-memory-atomic  -mfloat128 @gol
+-mfloat128-hardware -mprefixed -mpcrel -mmma}
 
 The particular options set for any particular CPU varies between
 compiler versions, depending on what setting seems to produce optimal
@@ -26936,6 +26937,13 @@ addressing (@option{-mprefixed}) options are enabled.
 @opindex mno-prefixed
 Generate (do not generate) addressing modes using prefixed load and
 store instructions when the option @option{-mcpu=future} is used.
+
+@item -mmma
+@itemx -mno-mmma
+@opindex mmma
+@opindex mno-mmma
+Generate (do not generate) the MMA instructions when the option
+@option{-mcpu=future} is used.
 @end table
 
 @node RX Options

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/3, v2] rs6000: Add MMA built-in function definitions
  2020-06-18 20:42 [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
  2020-06-18 20:44 ` [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins Peter Bergner
@ 2020-06-18 20:45 ` Peter Bergner
  2020-06-19 16:45   ` Segher Boessenkool
  2020-06-18 20:46 ` [PATCH 3/3, v2] rs6000: Add testsuite test cases for MMA built-ins Peter Bergner
  2020-06-24 19:28 ` [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
  3 siblings, 1 reply; 19+ messages in thread
From: Peter Bergner @ 2020-06-18 20:45 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

Changes since v1:
  - Updated ChangeLog entry per Segher's suggestion.
  - Updated doc/extend.texi with correct built-in names for
    __builtin_vsx_xvcvspbf16 and __builtin_vsx_xvcvbf16sp.

This patch adds the actual MMA built-ins.  The MMA accumulators are INOUT
operands for most MMA instructions, but they are also very expensive to
move around.  For this reason, we have implemented a built-in API
where the accumulators are passed by reference (via pointers), so
the user won't use one accumulator as input and another as output,
which would entail a lot of copies.  However, using pointers gives us
poor code generation when we expand the built-ins at normal expand time.
We therefore expand the MMA built-ins early into gimple, converting
the pass-by-reference calls to an internal built-in that uses a
pass-by-value calling convention, where we can enforce that the input and
output accumulators are the same.  This gives us much better code generation.
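
For readers who want to see the shape of that lowering outside the
compiler, here is a small portable C model.  It is only a sketch under
stated assumptions: acc_t, vec_t, ger_pp_byref and ger_pp_internal are
hypothetical names standing in for __vector_quad, a 128-bit vector, a
user-visible built-in and its "_internal" counterpart, and the arithmetic
is a plain 4x4 outer-product accumulate rather than the real instruction
semantics.

```c
/* Hypothetical stand-ins (not GCC types): a 4x4 float accumulator
   (512 bits) and a 4-element float vector (128 bits).  */
typedef struct { float e[4][4]; } acc_t;
typedef struct { float e[4]; } vec_t;

/* Model of the pass-by-value "internal" form: the accumulator is both an
   input and the returned output, so source and destination are tied.  */
static acc_t
ger_pp_internal (acc_t acc, vec_t a, vec_t b)
{
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      acc.e[i][j] += a.e[i] * b.e[j];
  return acc;
}

/* Model of the user-visible pass-by-reference form, written the way the
   early gimple expansion is described: load the accumulator, call the
   internal form, store the result back to the same object.  */
static void
ger_pp_byref (acc_t *acc, vec_t a, vec_t b)
{
  *acc = ger_pp_internal (*acc, a, b);
}
```

A caller only ever names one accumulator object, e.g.
ger_pp_byref (&acc, a, b), so there is no way to ask for a different
destination accumulator and thereby force a copy.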

The associated test cases for these built-ins are in patch3.

This patch plus patch1 passed bootstrap and regtesting with no regressions
on both powerpc64le-linux and powerpc64-linux.  Ok for trunk?

The v1 patch passed bootstrap and regtesting with no regressions on both
powerpc64le-linux and powerpc64-linux.  This updated patch + patch1 is
bootstrapping and regtesting on powerpc64{,le}-linux.  Ok for trunk if
there are no regressions?

Peter

2020-06-18  Peter Bergner  <bergner@linux.ibm.com>

gcc/
	* config/rs6000/predicates.md (mma_input_operand): New predicate.
	* config/rs6000/rs6000-builtin.def (BU_MMA_1, BU_MMA_V2, BU_MMA_3,
	BU_MMA_5, BU_MMA_6, BU_VSX_1): Add support macros for defining MMA
	built-in functions.
	(ASSEMBLE_ACC, ASSEMBLE_PAIR, DISASSEMBLE_ACC, DISASSEMBLE_PAIR,
	PMXVBF16GER2, PMXVBF16GER2NN, PMXVBF16GER2NP, PMXVBF16GER2PN,
	PMXVBF16GER2PP, PMXVF16GER2, PMXVF16GER2NN, PMXVF16GER2NP,
	PMXVF16GER2PN, PMXVF16GER2PP, PMXVF32GER, PMXVF32GERNN,
	PMXVF32GERNP, PMXVF32GERPN, PMXVF32GERPP, PMXVF64GER, PMXVF64GERNN,
	PMXVF64GERNP, PMXVF64GERPN, PMXVF64GERPP, PMXVI16GER2, PMXVI16GER2PP,
	PMXVI16GER2S, PMXVI16GER2SPP, PMXVI4GER8, PMXVI4GER8PP, PMXVI8GER4,
	PMXVI8GER4PP, PMXVI8GER4SPP, XVBF16GER2, XVBF16GER2NN, XVBF16GER2NP,
	XVBF16GER2PN, XVBF16GER2PP, XVCVBF16SP, XVCVSPBF16, XVF16GER2,
	XVF16GER2NN, XVF16GER2NP, XVF16GER2PN, XVF16GER2PP, XVF32GER,
	XVF32GERNN, XVF32GERNP, XVF32GERPN, XVF32GERPP, XVF64GER, XVF64GERNN,
	XVF64GERNP, XVF64GERPN, XVF64GERPP, XVI16GER2, XVI16GER2PP, XVI16GER2S,
	XVI16GER2SPP, XVI4GER8, XVI4GER8PP, XVI8GER4, XVI8GER4PP, XVI8GER4SPP,
	XXMFACC, XXMTACC, XXSETACCZ): Add MMA built-ins.
	* config/rs6000/rs6000.c (rs6000_emit_move): Allow zero constants.
	(print_operand) <case 'A'>: New output modifier.
	(rs6000_split_multireg_move): Add support for inserting accumulator
	priming and depriming instructions.  Add support for splitting an
	assemble accumulator pattern.
	* config/rs6000/rs6000-call.c (mma_init_builtins, mma_expand_builtin,
	rs6000_gimple_fold_mma_builtin): New functions.
	(RS6000_BUILTIN_M): New macro.
	(def_builtin): Handle RS6000_BTC_QUAD and RS6000_BTC_PAIR attributes.
	(bdesc_mma): Add new MMA built-in support.
	(htm_expand_builtin): Use RS6000_BTC_OPND_MASK.
	(rs6000_invalid_builtin): Add handling of RS6000_BTM_FUTURE and
	RS6000_BTM_MMA.
	(rs6000_builtin_valid_without_lhs): Handle RS6000_BTC_VOID attribute.
	(rs6000_gimple_fold_builtin): Call rs6000_builtin_is_supported_p
	and rs6000_gimple_fold_mma_builtin.
	(rs6000_expand_builtin): Call mma_expand_builtin.
	Use RS6000_BTC_OPND_MASK.
	(rs6000_init_builtins): Adjust comment.  Call mma_init_builtins.
	(htm_init_builtins): Use RS6000_BTC_OPND_MASK.
	(builtin_function_type): Handle VSX_BUILTIN_XVCVSPBF16 and
	VSX_BUILTIN_XVCVBF16SP.
	* config/rs6000/rs6000.h (RS6000_BTC_QUINARY, RS6000_BTC_SENARY,
	RS6000_BTC_OPND_MASK, RS6000_BTC_QUAD, RS6000_BTC_PAIR,
	RS6000_BTC_QUADPAIR, RS6000_BTC_GIMPLE): New defines.
	(RS6000_BTC_PREDICATE, RS6000_BTC_ABS, RS6000_BTC_DST,
	RS6000_BTC_TYPE_MASK, RS6000_BTC_ATTR_MASK): Adjust values.
	* config/rs6000/mma.md (MAX_MMA_OPERANDS): New define_constant.
	(UNSPEC_MMA_ASSEMBLE_ACC, UNSPEC_MMA_PMXVBF16GER2,
	UNSPEC_MMA_PMXVBF16GER2NN, UNSPEC_MMA_PMXVBF16GER2NP,
	UNSPEC_MMA_PMXVBF16GER2PN, UNSPEC_MMA_PMXVBF16GER2PP,
	UNSPEC_MMA_PMXVF16GER2, UNSPEC_MMA_PMXVF16GER2NN,
	UNSPEC_MMA_PMXVF16GER2NP, UNSPEC_MMA_PMXVF16GER2PN,
	UNSPEC_MMA_PMXVF16GER2PP, UNSPEC_MMA_PMXVF32GER,
	UNSPEC_MMA_PMXVF32GERNN, UNSPEC_MMA_PMXVF32GERNP,
	UNSPEC_MMA_PMXVF32GERPN, UNSPEC_MMA_PMXVF32GERPP,
	UNSPEC_MMA_PMXVF64GER, UNSPEC_MMA_PMXVF64GERNN,
	UNSPEC_MMA_PMXVF64GERNP, UNSPEC_MMA_PMXVF64GERPN,
	UNSPEC_MMA_PMXVF64GERPP, UNSPEC_MMA_PMXVI16GER2,
	UNSPEC_MMA_PMXVI16GER2PP, UNSPEC_MMA_PMXVI16GER2S,
	UNSPEC_MMA_PMXVI16GER2SPP, UNSPEC_MMA_PMXVI4GER8,
	UNSPEC_MMA_PMXVI4GER8PP, UNSPEC_MMA_PMXVI8GER4,
	UNSPEC_MMA_PMXVI8GER4PP, UNSPEC_MMA_PMXVI8GER4SPP,
	UNSPEC_MMA_XVBF16GER2, UNSPEC_MMA_XVBF16GER2NN,
	UNSPEC_MMA_XVBF16GER2NP, UNSPEC_MMA_XVBF16GER2PN,
	UNSPEC_MMA_XVBF16GER2PP, UNSPEC_MMA_XVF16GER2, UNSPEC_MMA_XVF16GER2NN,
	UNSPEC_MMA_XVF16GER2NP, UNSPEC_MMA_XVF16GER2PN, UNSPEC_MMA_XVF16GER2PP,
	UNSPEC_MMA_XVF32GER, UNSPEC_MMA_XVF32GERNN, UNSPEC_MMA_XVF32GERNP,
	UNSPEC_MMA_XVF32GERPN, UNSPEC_MMA_XVF32GERPP, UNSPEC_MMA_XVF64GER,
	UNSPEC_MMA_XVF64GERNN, UNSPEC_MMA_XVF64GERNP, UNSPEC_MMA_XVF64GERPN,
	UNSPEC_MMA_XVF64GERPP, UNSPEC_MMA_XVI16GER2, UNSPEC_MMA_XVI16GER2PP,
	UNSPEC_MMA_XVI16GER2S, UNSPEC_MMA_XVI16GER2SPP, UNSPEC_MMA_XVI4GER8,
	UNSPEC_MMA_XVI4GER8PP, UNSPEC_MMA_XVI8GER4, UNSPEC_MMA_XVI8GER4PP,
	UNSPEC_MMA_XVI8GER4SPP, UNSPEC_MMA_XXMFACC, UNSPEC_MMA_XXMTACC): New.
	(MMA_ACC, MMA_VV, MMA_AVV, MMA_PV, MMA_APV, MMA_VVI4I4I8,
	MMA_AVVI4I4I8, MMA_VVI4I4I2, MMA_AVVI4I4I2, MMA_VVI4I4,
	MMA_AVVI4I4, MMA_PVI4I2, MMA_APVI4I2, MMA_VVI4I4I4,
	MMA_AVVI4I4I4): New define_int_iterator.
	(acc, vv, avv, pv, apv, vvi4i4i8, avvi4i4i8, vvi4i4i2,
	avvi4i4i2, vvi4i4, avvi4i4, pvi4i2, apvi4i2, vvi4i4i4,
	avvi4i4i4): New define_int_attr.
	(*movpxi): Add zero constant alternative.
	(mma_assemble_pair, mma_assemble_acc): New define_expand.
	(*mma_assemble_acc): New define_insn_and_split.
	(mma_<acc>, mma_xxsetaccz, mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
	mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
	mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
	mma_<vvi4i4i4>, mma_<avvi4i4i4>): New define_insn.
	* config/rs6000/rs6000.md (define_attr "type"): New type mma.
	* config/rs6000/vsx.md (UNSPEC_VSX_XVCVBF16SP): New.
	(UNSPEC_VSX_XVCVSPBF16): Likewise.
	(XVCVBF16): New define_int_iterator.
	(xvcvbf16): New define_int_attr.
	(vsx_<xvcvbf16>): New define_insn.
	* doc/extend.texi: Document the mma built-ins.
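
The RS6000_BUILTIN_M and bdesc_mma entries above rely on the usual
"X-macro" idiom: one list of built-ins is expanded several times with
different macro definitions.  The following is a minimal single-file
sketch of that idiom, not the actual GCC code.  All DEMO_* and demo_*
names are hypothetical, "__builtin_demo_htm" is a made-up name, and the
list is held in a macro here, whereas rs6000-builtin.def is a separate
file that is #included repeatedly.

```c
/* Hypothetical miniature of the built-in list.  Each entry names the
   class macro (H or M) that should pick it up.  */
#define DEMO_BUILTIN_LIST						\
  DEMO_BUILTIN_H (HTM_DEMO,    "__builtin_demo_htm")			\
  DEMO_BUILTIN_M (MMA_XXMFACC, "__builtin_mma_xxmfacc")			\
  DEMO_BUILTIN_M (MMA_XXMTACC, "__builtin_mma_xxmtacc")

/* First expansion: every class contributes an enumerator.  */
enum demo_builtins
{
#define DEMO_BUILTIN_H(ENUM, NAME) DEMO_ ## ENUM,
#define DEMO_BUILTIN_M(ENUM, NAME) DEMO_ ## ENUM,
  DEMO_BUILTIN_LIST
#undef DEMO_BUILTIN_H
#undef DEMO_BUILTIN_M
  DEMO_BUILTIN_COUNT
};

struct demo_desc
{
  const char *name;
  enum demo_builtins code;
};

/* Second expansion: only the M class emits a table row; the H class
   expands to nothing, so the table holds the MMA entries alone.  */
#define DEMO_BUILTIN_H(ENUM, NAME)
#define DEMO_BUILTIN_M(ENUM, NAME) { NAME, DEMO_ ## ENUM },
static const struct demo_desc demo_bdesc_mma[] = { DEMO_BUILTIN_LIST };
#undef DEMO_BUILTIN_H
#undef DEMO_BUILTIN_M

#define DEMO_ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
```

This is why adding a new class macro means touching every #undef/#define
group in rs6000-call.c: each table expansion has to say what the new
class expands to, even when the answer is "nothing".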

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index c3f460face2..4e37ce35c5d 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1119,6 +1119,12 @@ (define_predicate "splat_input_operand"
   return gpc_reg_operand (op, mode);
 })
 
+;; Return 1 if this operand is valid for an MMA assemble accumulator insn.
+(define_special_predicate "mma_input_operand"
+  (match_test "(mode == PXImode
+		&& (GET_MODE (op) == V16QImode)
+		&& (vsx_register_operand (op, GET_MODE (op)) || MEM_P (op)))"))
+
 ;; Return true if operand is an operator used in rotate-and-mask instructions.
 (define_predicate "rotate_mask_operator"
   (match_code "rotate,ashift,lshiftrt"))
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 8b1ddb00045..968c46cc36f 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -32,6 +32,7 @@
    RS6000_BUILTIN_A -- ABS builtins
    RS6000_BUILTIN_D -- DST builtins
    RS6000_BUILTIN_H -- HTM builtins
+   RS6000_BUILTIN_M -- MMA builtins
    RS6000_BUILTIN_P -- Altivec, VSX, ISA 2.07 vector predicate builtins
    RS6000_BUILTIN_X -- special builtins
 
@@ -74,6 +75,10 @@
   #error "RS6000_BUILTIN_H is not defined."
 #endif
 
+#ifndef RS6000_BUILTIN_M
+  #error "RS6000_BUILTIN_M is not defined."
+#endif
+
 #ifndef RS6000_BUILTIN_P
   #error "RS6000_BUILTIN_P is not defined."
 #endif
@@ -329,6 +334,82 @@
 		     | RS6000_BTC_SPECIAL),				\
 		    CODE_FOR_nothing)			/* ICODE */
 
+/* MMA convenience macros.  */
+
+#define BU_MMA_1(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_mma_" NAME,		/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_UNARY					\
+		     | RS6000_BTC_VOID					\
+		     | RS6000_BTC_GIMPLE),				\
+		    CODE_FOR_nothing)			/* ICODE */	\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL,	/* ENUM */	\
+		    "__builtin_mma_" NAME "_internal",	/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_UNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_MMA_V2(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_mma_" NAME,		/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_BINARY				\
+		     | RS6000_BTC_VOID					\
+		     | RS6000_BTC_GIMPLE),				\
+		    CODE_FOR_nothing)			/* ICODE */
+
+#define BU_MMA_3(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_mma_" NAME,		/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_TERNARY				\
+		     | RS6000_BTC_VOID					\
+		     | RS6000_BTC_GIMPLE),				\
+		    CODE_FOR_nothing)			/* ICODE */	\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL,	/* ENUM */	\
+		    "__builtin_mma_" NAME "_internal",	/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_TERNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_MMA_5(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_mma_" NAME,		/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_QUINARY				\
+		     | RS6000_BTC_VOID					\
+		     | RS6000_BTC_GIMPLE),				\
+		    CODE_FOR_nothing)			/* ICODE */	\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL,	/* ENUM */	\
+		    "__builtin_mma_" NAME "_internal",	/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_QUINARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_MMA_6(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_mma_" NAME,		/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_SENARY				\
+		     | RS6000_BTC_VOID					\
+		     | RS6000_BTC_GIMPLE),				\
+		    CODE_FOR_nothing)			/* ICODE */	\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL,	/* ENUM */	\
+		    "__builtin_mma_" NAME "_internal",	/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_SENARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
 /* ISA 2.05 (power6) convenience macros. */
 /* For functions that depend on the CMPB instruction */
 #define BU_P6_2(ENUM, NAME, ATTR, ICODE)				\
@@ -2785,3 +2866,77 @@ BU_SPECIAL_X (RS6000_BUILTIN_CPU_SUPPORTS, "__builtin_cpu_supports",
 /* Darwin CfString builtin.  */
 BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, "__builtin_cfstring", RS6000_BTM_ALWAYS,
 	      RS6000_BTC_MISC)
+
+/* FUTURE MMA builtins.  */
+BU_VSX_1 (XVCVBF16SP,	    "xvcvbf16sp",	MISC, vsx_xvcvbf16sp)
+BU_VSX_1 (XVCVSPBF16,	    "xvcvspbf16",	MISC, vsx_xvcvspbf16)
+
+BU_MMA_1 (XXMFACC,	    "xxmfacc",		QUAD, mma_xxmfacc)
+BU_MMA_1 (XXMTACC,	    "xxmtacc",		QUAD, mma_xxmtacc)
+BU_MMA_1 (XXSETACCZ,	    "xxsetaccz",	MISC, mma_xxsetaccz)
+
+BU_MMA_V2 (DISASSEMBLE_ACC, "disassemble_acc",  QUAD, nothing)
+BU_MMA_V2 (DISASSEMBLE_PAIR,"disassemble_pair", PAIR, nothing)
+
+BU_MMA_3 (ASSEMBLE_PAIR,    "assemble_pair",	MISC, mma_assemble_pair)
+BU_MMA_3 (XVBF16GER2,	    "xvbf16ger2",	MISC, mma_xvbf16ger2)
+BU_MMA_3 (XVF16GER2,	    "xvf16ger2",	MISC, mma_xvf16ger2)
+BU_MMA_3 (XVF32GER,	    "xvf32ger",		MISC, mma_xvf32ger)
+BU_MMA_3 (XVF64GER,	    "xvf64ger",		PAIR, mma_xvf64ger)
+BU_MMA_3 (XVI4GER8,	    "xvi4ger8",		MISC, mma_xvi4ger8)
+BU_MMA_3 (XVI8GER4,	    "xvi8ger4",		MISC, mma_xvi8ger4)
+BU_MMA_3 (XVI16GER2,	    "xvi16ger2",	MISC, mma_xvi16ger2)
+BU_MMA_3 (XVI16GER2S,	    "xvi16ger2s",	MISC, mma_xvi16ger2s)
+BU_MMA_3 (XVBF16GER2NN,	    "xvbf16ger2nn",     QUAD, mma_xvbf16ger2nn)
+BU_MMA_3 (XVBF16GER2NP,	    "xvbf16ger2np",     QUAD, mma_xvbf16ger2np)
+BU_MMA_3 (XVBF16GER2PN,	    "xvbf16ger2pn",     QUAD, mma_xvbf16ger2pn)
+BU_MMA_3 (XVBF16GER2PP,	    "xvbf16ger2pp",     QUAD, mma_xvbf16ger2pp)
+BU_MMA_3 (XVF16GER2NN,	    "xvf16ger2nn",      QUAD, mma_xvf16ger2nn)
+BU_MMA_3 (XVF16GER2NP,	    "xvf16ger2np",      QUAD, mma_xvf16ger2np)
+BU_MMA_3 (XVF16GER2PN,	    "xvf16ger2pn",      QUAD, mma_xvf16ger2pn)
+BU_MMA_3 (XVF16GER2PP,	    "xvf16ger2pp",      QUAD, mma_xvf16ger2pp)
+BU_MMA_3 (XVF32GERNN,	    "xvf32gernn",       QUAD, mma_xvf32gernn)
+BU_MMA_3 (XVF32GERNP,	    "xvf32gernp",       QUAD, mma_xvf32gernp)
+BU_MMA_3 (XVF32GERPN,	    "xvf32gerpn",       QUAD, mma_xvf32gerpn)
+BU_MMA_3 (XVF32GERPP,	    "xvf32gerpp",       QUAD, mma_xvf32gerpp)
+BU_MMA_3 (XVF64GERNN,	    "xvf64gernn",       QUADPAIR, mma_xvf64gernn)
+BU_MMA_3 (XVF64GERNP,	    "xvf64gernp",       QUADPAIR, mma_xvf64gernp)
+BU_MMA_3 (XVF64GERPN,	    "xvf64gerpn",       QUADPAIR, mma_xvf64gerpn)
+BU_MMA_3 (XVF64GERPP,	    "xvf64gerpp",       QUADPAIR, mma_xvf64gerpp)
+BU_MMA_3 (XVI4GER8PP,	    "xvi4ger8pp",	QUAD, mma_xvi4ger8pp)
+BU_MMA_3 (XVI8GER4PP,	    "xvi8ger4pp",       QUAD, mma_xvi8ger4pp)
+BU_MMA_3 (XVI8GER4SPP,	    "xvi8ger4spp",      QUAD, mma_xvi8ger4spp)
+BU_MMA_3 (XVI16GER2PP,	    "xvi16ger2pp",      QUAD, mma_xvi16ger2pp)
+BU_MMA_3 (XVI16GER2SPP,	    "xvi16ger2spp",     QUAD, mma_xvi16ger2spp)
+
+BU_MMA_5 (ASSEMBLE_ACC,     "assemble_acc",	MISC, mma_assemble_acc)
+BU_MMA_5 (PMXVF32GER,	    "pmxvf32ger",       MISC, mma_pmxvf32ger)
+BU_MMA_5 (PMXVF64GER,	    "pmxvf64ger",       PAIR, mma_pmxvf64ger)
+BU_MMA_5 (PMXVF32GERNN,	    "pmxvf32gernn",     QUAD, mma_pmxvf32gernn)
+BU_MMA_5 (PMXVF32GERNP,	    "pmxvf32gernp",     QUAD, mma_pmxvf32gernp)
+BU_MMA_5 (PMXVF32GERPN,	    "pmxvf32gerpn",     QUAD, mma_pmxvf32gerpn)
+BU_MMA_5 (PMXVF32GERPP,	    "pmxvf32gerpp",     QUAD, mma_pmxvf32gerpp)
+BU_MMA_5 (PMXVF64GERNN,	    "pmxvf64gernn",     QUADPAIR, mma_pmxvf64gernn)
+BU_MMA_5 (PMXVF64GERNP,	    "pmxvf64gernp",     QUADPAIR, mma_pmxvf64gernp)
+BU_MMA_5 (PMXVF64GERPN,	    "pmxvf64gerpn",     QUADPAIR, mma_pmxvf64gerpn)
+BU_MMA_5 (PMXVF64GERPP,	    "pmxvf64gerpp",     QUADPAIR, mma_pmxvf64gerpp)
+
+BU_MMA_6 (PMXVBF16GER2,	    "pmxvbf16ger2",     MISC, mma_pmxvbf16ger2)
+BU_MMA_6 (PMXVF16GER2,	    "pmxvf16ger2",      MISC, mma_pmxvf16ger2)
+BU_MMA_6 (PMXVI4GER8,	    "pmxvi4ger8",       MISC, mma_pmxvi4ger8)
+BU_MMA_6 (PMXVI8GER4,	    "pmxvi8ger4",	MISC, mma_pmxvi8ger4)
+BU_MMA_6 (PMXVI16GER2,	    "pmxvi16ger2",      MISC, mma_pmxvi16ger2)
+BU_MMA_6 (PMXVI16GER2S,	    "pmxvi16ger2s",     MISC, mma_pmxvi16ger2s)
+BU_MMA_6 (PMXVBF16GER2NN,   "pmxvbf16ger2nn",   QUAD, mma_pmxvbf16ger2nn)
+BU_MMA_6 (PMXVBF16GER2NP,   "pmxvbf16ger2np",   QUAD, mma_pmxvbf16ger2np)
+BU_MMA_6 (PMXVBF16GER2PN,   "pmxvbf16ger2pn",   QUAD, mma_pmxvbf16ger2pn)
+BU_MMA_6 (PMXVBF16GER2PP,   "pmxvbf16ger2pp",   QUAD, mma_pmxvbf16ger2pp)
+BU_MMA_6 (PMXVF16GER2NN,    "pmxvf16ger2nn",    QUAD, mma_pmxvf16ger2nn)
+BU_MMA_6 (PMXVF16GER2NP,    "pmxvf16ger2np",    QUAD, mma_pmxvf16ger2np)
+BU_MMA_6 (PMXVF16GER2PN,    "pmxvf16ger2pn",    QUAD, mma_pmxvf16ger2pn)
+BU_MMA_6 (PMXVF16GER2PP,    "pmxvf16ger2pp",    QUAD, mma_pmxvf16ger2pp)
+BU_MMA_6 (PMXVI4GER8PP,	    "pmxvi4ger8pp",     QUAD, mma_pmxvi4ger8pp)
+BU_MMA_6 (PMXVI8GER4PP,	    "pmxvi8ger4pp",	QUAD, mma_pmxvi8ger4pp)
+BU_MMA_6 (PMXVI8GER4SPP,    "pmxvi8ger4spp",	QUAD, mma_pmxvi8ger4spp)
+BU_MMA_6 (PMXVI16GER2PP,    "pmxvi16ger2pp",    QUAD, mma_pmxvi16ger2pp)
+BU_MMA_6 (PMXVI16GER2SPP,   "pmxvi16ger2spp",   QUAD, mma_pmxvi16ger2spp)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index eeb20e5200d..d47c3a3aeb0 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -183,6 +183,7 @@ static tree builtin_function_type (machine_mode, machine_mode,
 				   enum rs6000_builtins, const char *name);
 static void rs6000_common_init_builtins (void);
 static void htm_init_builtins (void);
+static void mma_init_builtins (void);
 
 
 /* Hash table to keep track of the argument types for builtin functions.  */
@@ -243,6 +244,7 @@ builtin_hasher::equal (builtin_hash_struct *p1, builtin_hash_struct *p2)
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -270,6 +272,9 @@ builtin_hasher::equal (builtin_hash_struct *p1, builtin_hash_struct *p2)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)  \
   { NAME, ICODE, MASK, ATTR },
 
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)  \
+  { NAME, ICODE, MASK, ATTR },
+
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)  \
   { NAME, ICODE, MASK, ATTR },
 
@@ -296,6 +301,7 @@ static const struct rs6000_builtin_info_type rs6000_builtin_info[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8354,6 +8360,9 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
 	  attr_string = ", fp, const";
 	}
     }
+  else if ((classify & (RS6000_BTC_QUAD | RS6000_BTC_PAIR)) != 0)
+    /* The function uses a register quad and/or pair.  Nothing to do.  */
+    ;
   else if ((classify & RS6000_BTC_ATTR_MASK) != 0)
     gcc_unreachable ();
 
@@ -8372,6 +8381,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8385,6 +8395,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8403,6 +8414,7 @@ static const struct builtin_description bdesc_3arg[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8416,6 +8428,7 @@ static const struct builtin_description bdesc_3arg[] =
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8434,6 +8447,7 @@ static const struct builtin_description bdesc_4arg[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8447,6 +8461,7 @@ static const struct builtin_description bdesc_4arg[] =
   { MASK, ICODE, NAME, ENUM },
 
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8465,6 +8480,7 @@ static const struct builtin_description bdesc_dst[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8478,6 +8494,7 @@ static const struct builtin_description bdesc_dst[] =
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8494,6 +8511,7 @@ static const struct builtin_description bdesc_2arg[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8505,6 +8523,7 @@ static const struct builtin_description bdesc_2arg[] =
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE) \
   { MASK, ICODE, NAME, ENUM },
 
@@ -8527,6 +8546,7 @@ static const struct builtin_description bdesc_altivec_preds[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8540,6 +8560,7 @@ static const struct builtin_description bdesc_altivec_preds[] =
 
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8559,6 +8580,7 @@ static const struct builtin_description bdesc_abs[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8572,6 +8594,7 @@ static const struct builtin_description bdesc_abs[] =
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8590,6 +8613,7 @@ static const struct builtin_description bdesc_1arg[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8603,6 +8627,7 @@ static const struct builtin_description bdesc_1arg[] =
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8620,6 +8645,7 @@ static const struct builtin_description bdesc_0arg[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8633,6 +8659,7 @@ static const struct builtin_description bdesc_0arg[] =
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE) \
   { MASK, ICODE, NAME, ENUM },
 
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8641,6 +8668,7 @@ static const struct builtin_description bdesc_htm[] =
 #include "rs6000-builtin.def"
 };
 
+/* MMA builtins.  */
 #undef RS6000_BUILTIN_0
 #undef RS6000_BUILTIN_1
 #undef RS6000_BUILTIN_2
@@ -8649,7 +8677,40 @@ static const struct builtin_description bdesc_htm[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
+#undef RS6000_BUILTIN_X
+
+#define RS6000_BUILTIN_0(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_1(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_2(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_3(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_4(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE) \
+  { MASK, ICODE, NAME, ENUM },
+
+#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
+
+static const struct builtin_description bdesc_mma[] =
+{
+#include "rs6000-builtin.def"
+};
+
+#undef RS6000_BUILTIN_0
+#undef RS6000_BUILTIN_1
+#undef RS6000_BUILTIN_2
+#undef RS6000_BUILTIN_3
+#undef RS6000_BUILTIN_4
+#undef RS6000_BUILTIN_A
+#undef RS6000_BUILTIN_D
+#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
+#undef RS6000_BUILTIN_P
+#undef RS6000_BUILTIN_X
 
 /* Return true if a builtin function is overloaded.  */
 bool
@@ -9393,6 +9454,133 @@ altivec_expand_stv_builtin (enum insn_code icode, tree exp)
   return NULL_RTX;
 }
 
+/* Expand the MMA built-in in EXP.
+   Store true in *EXPANDEDP if we found a built-in to expand.  */
+
+static rtx
+mma_expand_builtin (tree exp, rtx target, bool *expandedp)
+{
+  unsigned i;
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  enum rs6000_builtins fcode
+    = (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+  const struct builtin_description *d = bdesc_mma;
+
+  /* Find the bdesc_mma entry for this built-in.  */
+  for (i = 0; i < ARRAY_SIZE (bdesc_mma); i++, d++)
+    if (d->code == fcode)
+      break;
+
+  if (i >= ARRAY_SIZE (bdesc_mma))
+    {
+      *expandedp = false;
+      return NULL_RTX;
+    }
+
+  *expandedp = true;
+
+  tree arg;
+  call_expr_arg_iterator iter;
+  enum insn_code icode = d->icode;
+  const struct insn_operand_data *insn_op;
+  rtx op[MAX_MMA_OPERANDS];
+  unsigned nopnds = 0;
+  unsigned attr = rs6000_builtin_info[fcode].attr;
+  bool void_func = (attr & RS6000_BTC_VOID);
+  machine_mode tmode = VOIDmode;
+
+  if (TREE_TYPE (TREE_TYPE (fndecl)) != void_type_node)
+    {
+      tmode = insn_data[icode].operand[0].mode;
+      if (!target
+	  || GET_MODE (target) != tmode
+	  || !(*insn_data[icode].operand[0].predicate) (target, tmode))
+	target = gen_reg_rtx (tmode);
+      op[nopnds++] = target;
+    }
+  else
+    target = const0_rtx;
+
+  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+    {
+      if (arg == error_mark_node)
+	return const0_rtx;
+
+      rtx opnd;
+      insn_op = &insn_data[icode].operand[nopnds];
+      if (TREE_CODE (arg) == ADDR_EXPR
+	  && MEM_P (DECL_RTL (TREE_OPERAND (arg, 0))))
+	opnd = DECL_RTL (TREE_OPERAND (arg, 0));
+      else
+	opnd = expand_normal (arg);
+
+      if (!(*insn_op->predicate) (opnd, insn_op->mode))
+	{
+	  if (!strcmp (insn_op->constraint, "n"))
+	    {
+	      if (!CONST_INT_P (opnd))
+		error ("argument %d must be an unsigned literal", nopnds);
+	      else
+		error ("argument %d is an unsigned literal that is "
+		       "out of range", nopnds);
+	      return const0_rtx;
+	    }
+	  opnd = copy_to_mode_reg (insn_op->mode, opnd);
+	}
+
+      /* Some MMA instructions have INOUT accumulator operands, so force
+	 their target register to be the same as their input register.  */
+      if (!void_func
+	  && nopnds == 1
+	  && !strcmp (insn_op->constraint, "0")
+	  && insn_op->mode == tmode
+	  && REG_P (opnd)
+	  && (*insn_data[icode].operand[0].predicate) (opnd, tmode))
+	target = op[0] = opnd;
+
+      op[nopnds++] = opnd;
+    }
+
+  unsigned attr_args = attr & RS6000_BTC_OPND_MASK;
+  if (attr & RS6000_BTC_QUAD)
+    attr_args++;
+
+  gcc_assert (nopnds == attr_args);
+
+  rtx pat;
+  switch (nopnds)
+    {
+    case 1:
+      pat = GEN_FCN (icode) (op[0]);
+      break;
+    case 2:
+      pat = GEN_FCN (icode) (op[0], op[1]);
+      break;
+    case 3:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+      break;
+    case 4:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
+      break;
+    case 5:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
+      break;
+    case 6:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5]);
+      break;
+    case 7:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5], op[6]);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  if (!pat)
+    return NULL_RTX;
+  emit_insn (pat);
+
+  return target;
+}
+
 /* Return the appropriate SPR number associated with the given builtin.  */
 static inline HOST_WIDE_INT
 htm_spr_num (enum rs6000_builtins code)
@@ -9539,11 +9727,11 @@ htm_expand_builtin (tree exp, rtx target, bool * expandedp)
 	if (flag_checking)
 	  {
 	    int expected_nopnds = 0;
-	    if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_UNARY)
+	    if ((attr & RS6000_BTC_OPND_MASK) == RS6000_BTC_UNARY)
 	      expected_nopnds = 1;
-	    else if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_BINARY)
+	    else if ((attr & RS6000_BTC_OPND_MASK) == RS6000_BTC_BINARY)
 	      expected_nopnds = 2;
-	    else if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_TERNARY)
+	    else if ((attr & RS6000_BTC_OPND_MASK) == RS6000_BTC_TERNARY)
 	      expected_nopnds = 3;
 	    else if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_QUATERNARY)
 	      expected_nopnds = 4;
@@ -10647,6 +10835,10 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
 	   "-m64");
   else if ((fnmask & RS6000_BTM_P9_MISC) == RS6000_BTM_P9_MISC)
     error ("%qs requires the %qs option", name, "-mcpu=power9");
+  else if ((fnmask & RS6000_BTM_FUTURE) != 0)
+    error ("%qs requires the %qs option", name, "-mcpu=future");
+  else if ((fnmask & RS6000_BTM_MMA) != 0)
+    error ("%qs requires the %qs option", name, "-mmma");
   else if ((fnmask & RS6000_BTM_LDBL128) == RS6000_BTM_LDBL128)
     {
       if (!TARGET_HARD_FLOAT)
@@ -10690,6 +10882,10 @@ rs6000_fold_builtin (tree fndecl ATTRIBUTE_UNUSED,
 static bool
 rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
 {
+  /* Check for built-ins explicitly marked as a void function.  */
+  if (rs6000_builtin_info[fn_code].attr & RS6000_BTC_VOID)
+    return true;
+
   switch (fn_code)
     {
     case ALTIVEC_BUILTIN_STVX_V16QI:
@@ -10833,6 +11029,156 @@ fold_mergeeo_helper (gimple_stmt_iterator *gsi, gimple *stmt, int use_odd)
   gsi_replace (gsi, g, true);
 }
 
+/* Expand the MMA built-ins early, so that we can convert the pass-by-reference
+   __vector_quad arguments into pass-by-value arguments, leading to more
+   efficient code generation.  */
+
+bool
+rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  tree fndecl = gimple_call_fndecl (stmt);
+  enum rs6000_builtins fncode
+    = (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned attr = rs6000_builtin_info[fncode].attr;
+
+  if ((attr & RS6000_BTC_GIMPLE) == 0)
+    return false;
+
+  unsigned nopnds = (attr & RS6000_BTC_OPND_MASK);
+  gimple_seq new_seq = NULL;
+  gimple *new_call;
+  tree new_decl;
+
+  if (rs6000_builtin_info[fncode + 1].icode == CODE_FOR_nothing)
+    {
+      /* This is an MMA disassemble built-in function.  */
+      gcc_assert (fncode == MMA_BUILTIN_DISASSEMBLE_ACC
+		  || fncode == MMA_BUILTIN_DISASSEMBLE_PAIR);
+
+      push_gimplify_context (true);
+      tree dst_ptr = gimple_call_arg (stmt, 0);
+      tree src_ptr = gimple_call_arg (stmt, 1);
+      tree src_type = TREE_TYPE (src_ptr);
+      tree src = make_ssa_name (TREE_TYPE (src_type));
+      gimplify_assign (src, build_simple_mem_ref (src_ptr), &new_seq);
+
+      /* If we are not disassembling an accumulator or our destination is
+	 another accumulator, then just copy the entire thing as is.  */
+      if (fncode != MMA_BUILTIN_DISASSEMBLE_ACC
+	  || TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
+	{
+	  tree dst = build_simple_mem_ref (build1 (VIEW_CONVERT_EXPR,
+						   src_type, dst_ptr));
+	  gimplify_assign (dst, src, &new_seq);
+	  pop_gimplify_context (NULL);
+	  gsi_replace_with_seq (gsi, new_seq, true);
+	  return true;
+	}
+
+      /* We're disassembling an accumulator into a different type, so we need
+	 to emit a xxmfacc instruction now, since we cannot do it later.  */
+      new_decl = rs6000_builtin_decls[MMA_BUILTIN_XXMFACC_INTERNAL];
+      new_call = gimple_build_call (new_decl, 1, src);
+      src = make_ssa_name (vector_quad_type_node);
+      gimple_call_set_lhs (new_call, src);
+      gimple_seq_add_stmt (&new_seq, new_call);
+
+      /* Copy the accumulator vector by vector.  */
+      tree dst_type = build_pointer_type_for_mode (unsigned_V16QI_type_node,
+						   ptr_mode, true);
+      tree dst_base = build1 (VIEW_CONVERT_EXPR, dst_type, dst_ptr);
+      tree array_type = build_array_type_nelts (unsigned_V16QI_type_node, 4);
+      tree src_array = build1 (VIEW_CONVERT_EXPR, array_type, src);
+      for (unsigned i = 0; i < 4; i++)
+	{
+	  tree ref = build4 (ARRAY_REF, unsigned_V16QI_type_node, src_array,
+			     build_int_cst (size_type_node, i),
+			     NULL_TREE, NULL_TREE);
+	  tree dst = build2 (MEM_REF, unsigned_V16QI_type_node, dst_base,
+			     build_int_cst (dst_type, i * 16));
+	  gimplify_assign (dst, ref, &new_seq);
+	}
+      pop_gimplify_context (NULL);
+      gsi_replace_with_seq (gsi, new_seq, true);
+      return true;
+    }
+
+  /* Convert this built-in into an internal version that uses pass-by-value
+     arguments.  The internal built-in follows immediately after this one.  */
+  new_decl = rs6000_builtin_decls[fncode + 1];
+  tree lhs, mem, op[MAX_MMA_OPERANDS];
+  tree acc = gimple_call_arg (stmt, 0);
+  if (TREE_CODE (acc) == PARM_DECL)
+    mem = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (acc)), acc);
+  else
+    mem = build_simple_mem_ref (acc);
+  push_gimplify_context (true);
+
+  if ((attr & RS6000_BTC_QUAD) != 0)
+    {
+      /* This built-in has a pass-by-reference accumulator input, so load it
+	 into a temporary accumulator for use as a pass-by-value input.  */
+      op[0] = make_ssa_name (vector_quad_type_node);
+      for (unsigned i = 1; i < nopnds; i++)
+	op[i] = gimple_call_arg (stmt, i);
+      gimplify_assign (op[0], mem, &new_seq);
+    }
+  else
+    {
+      /* This built-in does not use its pass-by-reference accumulator argument
+	 as an input argument, so remove it from the input list.  */
+      nopnds--;
+      for (unsigned i = 0; i < nopnds; i++)
+	op[i] = gimple_call_arg (stmt, i + 1);
+    }
+
+  switch (nopnds)
+    {
+    case 0:
+      new_call = gimple_build_call (new_decl, 0);
+      break;
+    case 1:
+      new_call = gimple_build_call (new_decl, 1, op[0]);
+      break;
+    case 2:
+      new_call = gimple_build_call (new_decl, 2, op[0], op[1]);
+      break;
+    case 3:
+      new_call = gimple_build_call (new_decl, 3, op[0], op[1], op[2]);
+      break;
+    case 4:
+      new_call = gimple_build_call (new_decl, 4, op[0], op[1], op[2], op[3]);
+      break;
+    case 5:
+      new_call = gimple_build_call (new_decl, 5, op[0], op[1], op[2], op[3],
+				    op[4]);
+      break;
+    case 6:
+      new_call = gimple_build_call (new_decl, 6, op[0], op[1], op[2], op[3],
+				    op[4], op[5]);
+      break;
+    case 7:
+      new_call = gimple_build_call (new_decl, 7, op[0], op[1], op[2], op[3],
+				    op[4], op[5], op[6]);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (fncode == MMA_BUILTIN_ASSEMBLE_PAIR)
+    lhs = make_ssa_name (vector_pair_type_node);
+  else
+    lhs = make_ssa_name (vector_quad_type_node);
+  gimple_call_set_lhs (new_call, lhs);
+  gimple_seq_add_stmt (&new_seq, new_call);
+  gimplify_assign (mem, lhs, &new_seq);
+  pop_gimplify_context (NULL);
+  gsi_replace_with_seq (gsi, new_seq, true);
+
+  return true;
+}
+
 /* Fold a machine-dependent built-in in GIMPLE.  (For folding into
    a constant, use rs6000_fold_builtin.)  */
 
@@ -10868,11 +11214,12 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     return false;
 
   /* Don't fold invalid builtins, let rs6000_expand_builtin diagnose it.  */
-  HOST_WIDE_INT mask = rs6000_builtin_info[uns_fncode].mask;
-  bool func_valid_p = (rs6000_builtin_mask & mask) == mask;
-  if (!func_valid_p)
+  if (!rs6000_builtin_is_supported_p (fn_code))
     return false;
 
+  if (rs6000_gimple_fold_mma_builtin (gsi))
+    return true;
+
   switch (fn_code)
     {
     /* Flavors of vec_add.  We deliberately don't expand
@@ -12007,6 +12354,13 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
       break;
     }
 
+  if (TARGET_MMA)
+    {
+      ret = mma_expand_builtin (exp, target, &success);
+
+      if (success)
+	return ret;
+    }
   if (TARGET_ALTIVEC)
     {
       ret = altivec_expand_builtin (exp, target, &success);
@@ -12022,7 +12376,7 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
 	return ret;
     }  
 
-  unsigned attr = rs6000_builtin_info[uns_fcode].attr & RS6000_BTC_TYPE_MASK;
+  unsigned attr = rs6000_builtin_info[uns_fcode].attr & RS6000_BTC_OPND_MASK;
   /* RS6000_BTC_SPECIAL represents no-operand operators.  */
   gcc_assert (attr == RS6000_BTC_UNARY
 	      || attr == RS6000_BTC_BINARY
@@ -12205,7 +12559,7 @@ rs6000_init_builtins (void)
   else
     ieee128_float_type_node = ibm128_float_type_node = long_double_type_node;
 
-  /* Vector paired and vector quad support.  */
+  /* Vector pair and vector quad support.  */
   if (TARGET_MMA)
     {
       tree oi_uns_type = make_unsigned_type (256);
@@ -12287,6 +12641,8 @@ rs6000_init_builtins (void)
      the target attribute.  */
   if (TARGET_EXTRA_BUILTINS)
     altivec_init_builtins ();
+  if (TARGET_MMA)
+    mma_init_builtins ();
   if (TARGET_HTM)
     htm_init_builtins ();
 
@@ -13012,6 +13368,119 @@ altivec_init_builtins (void)
 
 }
 
+static void
+mma_init_builtins (void)
+{
+  const struct builtin_description *d = bdesc_mma;
+
+  for (unsigned i = 0; i < ARRAY_SIZE (bdesc_mma); i++, d++)
+    {
+      tree op[MAX_MMA_OPERANDS], type;
+      HOST_WIDE_INT mask = d->mask;
+      unsigned icode = (unsigned) d->icode;
+      unsigned attr = rs6000_builtin_info[d->code].attr;
+      int attr_args = (attr & RS6000_BTC_OPND_MASK);
+      bool gimple_func = (attr & RS6000_BTC_GIMPLE);
+      unsigned nopnds = 0;
+
+      if ((mask & rs6000_builtin_mask) != mask)
+	{
+	  if (TARGET_DEBUG_BUILTIN)
+	    fprintf (stderr, "mma_builtin, skip binary %s\n", d->name);
+	  continue;
+	}
+
+      if (d->name == 0)
+	{
+	  if (TARGET_DEBUG_BUILTIN)
+	    fprintf (stderr, "mma_builtin, bdesc_mma[%lu] no name\n",
+		     (long unsigned) i);
+	  continue;
+	}
+
+      if (gimple_func)
+	{
+	  gcc_assert (icode == CODE_FOR_nothing);
+	  op[nopnds++] = void_type_node;
+	  /* Some MMA built-ins that are expanded into gimple are converted
+	     into internal MMA built-ins that are expanded into rtl.
+	     The internal built-in follows immediately after this built-in.  */
+	  icode = d[1].icode;
+	}
+      else
+	{
+	  if ((attr & RS6000_BTC_QUAD) == 0)
+	    attr_args--;
+
+	  /* Ensure we have the correct number of operands.  */
+	  gcc_assert (attr_args == insn_data[icode].n_operands - 1);
+	}
+
+      if (icode == CODE_FOR_nothing)
+	{
+	  /* This is a disassemble MMA built-in function.  */
+	  gcc_assert (attr_args == RS6000_BTC_BINARY
+		      && (d->code == MMA_BUILTIN_DISASSEMBLE_ACC
+			  || d->code == MMA_BUILTIN_DISASSEMBLE_PAIR));
+	  op[nopnds++] = build_pointer_type (void_type_node);
+	  if (attr & RS6000_BTC_QUAD)
+	    op[nopnds++] = build_pointer_type (vector_quad_type_node);
+	  else
+	    op[nopnds++] = build_pointer_type (vector_pair_type_node);
+	}
+      else
+	{
+	  /* This is a normal MMA built-in function.  */
+	  unsigned j = (attr & RS6000_BTC_QUAD) ? 1 : 0;
+	  for (; j < insn_data[icode].n_operands; j++)
+	    {
+	      machine_mode mode = insn_data[icode].operand[j].mode;
+	      if (gimple_func && mode == PXImode)
+		op[nopnds++] = build_pointer_type (vector_quad_type_node);
+	      else if (gimple_func && mode == POImode
+		       && d->code == MMA_BUILTIN_ASSEMBLE_PAIR)
+		op[nopnds++] = build_pointer_type (vector_pair_type_node);
+	      else
+		/* MMA uses unsigned types.  */
+		op[nopnds++] = builtin_mode_to_type[mode][1];
+	    }
+	}
+
+      switch (nopnds)
+	{
+	case 1:
+	  type = build_function_type_list (op[0], NULL_TREE);
+	  break;
+	case 2:
+	  type = build_function_type_list (op[0], op[1], NULL_TREE);
+	  break;
+	case 3:
+	  type = build_function_type_list (op[0], op[1], op[2], NULL_TREE);
+	  break;
+	case 4:
+	  type = build_function_type_list (op[0], op[1], op[2], op[3],
+					   NULL_TREE);
+	  break;
+	case 5:
+	  type = build_function_type_list (op[0], op[1], op[2], op[3], op[4],
+					   NULL_TREE);
+	  break;
+	case 6:
+	  type = build_function_type_list (op[0], op[1], op[2], op[3], op[4],
+					   op[5], NULL_TREE);
+	  break;
+	case 7:
+	  type = build_function_type_list (op[0], op[1], op[2], op[3], op[4],
+					   op[5], op[6], NULL_TREE);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+
+      def_builtin (d->name, type, d->code);
+    }
+}
+
 static void
 htm_init_builtins (void)
 {
@@ -13026,7 +13495,7 @@ htm_init_builtins (void)
       HOST_WIDE_INT mask = d->mask;
       unsigned attr = rs6000_builtin_info[d->code].attr;
       bool void_func = (attr & RS6000_BTC_VOID);
-      int attr_args = (attr & RS6000_BTC_TYPE_MASK);
+      int attr_args = (attr & RS6000_BTC_OPND_MASK);
       int nopnds = 0;
       tree gpr_type_node;
       tree rettype;
@@ -13192,6 +13661,8 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P8V_BUILTIN_VGBBD:
     case MISC_BUILTIN_CDTBCD:
     case MISC_BUILTIN_CBCDTD:
+    case VSX_BUILTIN_XVCVSPBF16:
+    case VSX_BUILTIN_XVCVBF16SP:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       break;
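
An aside on the RS6000_BUILTIN_M plumbing in the rs6000-call.c hunks above:
each table re-includes rs6000-builtin.def with every class macro defined
empty except the one that table wants, so bdesc_mma picks up only the M
entries.  The sketch below is not part of the patch.  It shows the same
X-macro idea with a macro list standing in for the .def file, and every
DEMO_/demo_ name in it is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for rs6000-builtin.def: one list, two classes.  */
#define DEMO_BUILTIN_LIST \
  DEMO_BUILTIN_H (DEMO_HTM_TBEGIN, "demo_tbegin", 0x1, 10) \
  DEMO_BUILTIN_M (DEMO_MMA_XXMTACC, "demo_xxmtacc", 0x2, 20) \
  DEMO_BUILTIN_M (DEMO_MMA_XVF32GER, "demo_xvf32ger", 0x2, 21)

/* Pass 1: every class contributes an enumerator.  */
#define DEMO_BUILTIN_H(ENUM, NAME, MASK, ICODE) ENUM,
#define DEMO_BUILTIN_M(ENUM, NAME, MASK, ICODE) ENUM,
enum demo_builtins { DEMO_BUILTIN_LIST DEMO_BUILTIN_COUNT };
#undef DEMO_BUILTIN_H
#undef DEMO_BUILTIN_M

struct demo_description
{
  unsigned mask;
  int icode;
  const char *name;
  enum demo_builtins code;
};

/* Pass 2: only the M class expands to a table entry; H expands to
   nothing, so the table holds the MMA entries alone.  */
#define DEMO_BUILTIN_H(ENUM, NAME, MASK, ICODE)
#define DEMO_BUILTIN_M(ENUM, NAME, MASK, ICODE) \
  { MASK, ICODE, NAME, ENUM },
static const struct demo_description demo_bdesc_mma[] =
{
  DEMO_BUILTIN_LIST
};
#undef DEMO_BUILTIN_H
#undef DEMO_BUILTIN_M

#define DEMO_ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))

/* Linear search by function code, in the style of the lookup loop at
   the top of mma_expand_builtin.  Returns NULL when CODE is not an
   M-class entry.  */
static const struct demo_description *
demo_find_mma (enum demo_builtins code)
{
  for (size_t i = 0; i < DEMO_ARRAY_SIZE (demo_bdesc_mma); i++)
    if (demo_bdesc_mma[i].code == code)
      return &demo_bdesc_mma[i];
  return NULL;
}
```

Looking up the H-class code returns NULL here, which corresponds to the
path where mma_expand_builtin stores false in *expandedp and lets the
caller try the next expander.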
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a0f4991d00a..756a2ae8cb9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -9944,7 +9944,8 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
 
     case E_POImode:
     case E_PXImode:
-      if (CONSTANT_P (operands[1]))
+      if (CONSTANT_P (operands[1])
+	  && INTVAL (operands[1]) != 0)
 	error ("%qs is an opaque type, and you can't set it to other values.",
 	       (mode == POImode) ? "__vector_pair" : "__vector_quad");
       break;
@@ -12856,6 +12857,14 @@ print_operand (FILE *file, rtx x, int code)
       /* %c is output_addr_const if a CONSTANT_ADDRESS_P, otherwise
 	 output_operand.  */
 
+    case 'A':
+      /* Write the MMA accumulator number associated with VSX register X.  */
+      if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
+	output_operand_lossage ("invalid %%A value");
+      else
+	fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4);
+      return;
+
     case 'D':
       /* Like 'J' but get to the GT bit only.  */
       if (!REG_P (x) || !CR_REGNO_P (REGNO (x)))
@@ -15969,6 +15978,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 	  unsigned offset = 0;
 	  unsigned size = GET_MODE_SIZE (reg_mode);
 
+	  /* If we are reading an accumulator register, we have to
+	     deprime it before we can access it.  */
+	  if (TARGET_MMA
+	      && GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
+	    emit_insn (gen_mma_xxmfacc (src, src));
+
 	  for (int i = 0; i < nregs; i++)
 	    {
 	      unsigned subreg = (WORDS_BIG_ENDIAN)
@@ -15997,6 +16012,32 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 	      emit_insn (gen_rtx_SET (dst2, src2));
 	    }
 
+	  /* If we are writing an accumulator register, we have to
+	     prime it after we've written it.  */
+	  if (TARGET_MMA
+	      && GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
+	    emit_insn (gen_mma_xxmtacc (dst, dst));
+
+	  return;
+	}
+
+      if (GET_CODE (src) == UNSPEC)
+	{
+	  gcc_assert (REG_P (dst)
+		      && FP_REGNO_P (REGNO (dst))
+		      && XINT (src, 1) == UNSPEC_MMA_ASSEMBLE_ACC);
+
+	  reg_mode = GET_MODE (XVECEXP (src, 0, 0));
+	  for (int i = 0; i < XVECLEN (src, 0); i++)
+	    {
+	      rtx dst_i = gen_rtx_REG (reg_mode, reg + i);
+	      emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
+	    }
+
+	  /* We are writing an accumulator register, so we have to
+	     prime it after we've written it.  */
+	  emit_insn (gen_mma_xxmtacc (dst, dst));
+
 	  return;
 	}
 
@@ -16005,6 +16046,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
   if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
     {
+      /* If we are reading an accumulator register, we have to
+	 deprime it before we can access it.  */
+      if (TARGET_MMA
+	  && GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
+	emit_insn (gen_mma_xxmfacc (src, src));
+
       /* Move register range backwards, if we might have destructive
 	 overlap.  */
       int i;
@@ -16013,6 +16060,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 						     i * reg_mode_size),
 				simplify_gen_subreg (reg_mode, src, mode,
 						     i * reg_mode_size)));
+
+      /* If we are writing an accumulator register, we have to
+	 prime it after we've written it.  */
+      if (TARGET_MMA
+	  && GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
+	emit_insn (gen_mma_xxmtacc (dst, dst));
     }
   else
     {
@@ -16145,6 +16198,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 	    gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true));
 	}
 
+      /* If we are reading an accumulator register, we have to
+	 deprime it before we can access it.  */
+      if (TARGET_MMA && REG_P (src)
+	  && GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
+	emit_insn (gen_mma_xxmfacc (src, src));
+
       for (i = 0; i < nregs; i++)
 	{
 	  /* Calculate index to next subword.  */
@@ -16162,6 +16221,13 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 				  simplify_gen_subreg (reg_mode, src, mode,
 						       j * reg_mode_size)));
 	}
+
+      /* If we are writing an accumulator register, we have to
+	 prime it after we've written it.  */
+      if (TARGET_MMA && REG_P (dst)
+	  && GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
+	emit_insn (gen_mma_xxmtacc (dst, dst));
+
       if (restore_basereg != NULL_RTX)
 	emit_insn (restore_basereg);
     }
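
An aside on the new 'A' case in print_operand above: it accepts only an
FPR whose register number is a multiple of four and prints that
register's offset from the first FPR divided by four.  The sketch below
is not part of the patch.  The register numbers 32 and 63 are assumptions
made for illustration (the real values come from rs6000.h), and the
demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed register numbering, for illustration only.  */
#define DEMO_FIRST_FPR_REGNO 32
#define DEMO_LAST_FPR_REGNO 63

/* Hypothetical counterpart of FP_REGNO_P.  */
static bool
demo_fp_regno_p (unsigned regno)
{
  return regno >= DEMO_FIRST_FPR_REGNO && regno <= DEMO_LAST_FPR_REGNO;
}

/* Return the accumulator number for REGNO, or -1 where the 'A' case
   would report "invalid %A value".  The alignment test is on REGNO
   itself, as in the patch, so it relies on the first FPR number being
   a multiple of four.  */
static int
demo_accumulator_number (unsigned regno)
{
  if (!demo_fp_regno_p (regno) || regno % 4 != 0)
    return -1;
  return (int) (regno - DEMO_FIRST_FPR_REGNO) / 4;
}
```

Under these assumptions the 32 FPRs map onto accumulator numbers 0
through 7, and any register that is not the first of a group of four is
rejected.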
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 9c103bf8f7d..f3883b51255 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2251,20 +2251,24 @@ extern int frame_pointer_needed;
    flags macros, but we've run out of bits, so we now map the options into new
    settings used here.  */
 
-/* Builtin attributes.  */
-#define RS6000_BTC_SPECIAL	0x00000000	/* Special function.  */
+/* Builtin operand count.  */
 #define RS6000_BTC_UNARY	0x00000001	/* normal unary function.  */
 #define RS6000_BTC_BINARY	0x00000002	/* normal binary function.  */
 #define RS6000_BTC_TERNARY	0x00000003	/* normal ternary function.  */
 #define RS6000_BTC_QUATERNARY	0x00000004	/* normal quaternary
 						   function. */
+#define RS6000_BTC_QUINARY	0x00000005	/* normal quinary function.  */
+#define RS6000_BTC_SENARY	0x00000006	/* normal senary function.  */
+#define RS6000_BTC_OPND_MASK	0x00000007	/* Mask to isolate the operand count.  */
 
-#define RS6000_BTC_PREDICATE	0x00000005	/* predicate function.  */
-#define RS6000_BTC_ABS		0x00000006	/* Altivec/VSX ABS
+/* Builtin attributes.  */
+#define RS6000_BTC_SPECIAL	0x00000000	/* Special function.  */
+#define RS6000_BTC_PREDICATE	0x00000008	/* predicate function.  */
+#define RS6000_BTC_ABS		0x00000010	/* Altivec/VSX ABS
 						   function.  */
-#define RS6000_BTC_DST		0x00000007	/* Altivec DST function.  */
+#define RS6000_BTC_DST		0x00000020	/* Altivec DST function.  */
 
-#define RS6000_BTC_TYPE_MASK	0x0000000f	/* Mask to isolate types */
+#define RS6000_BTC_TYPE_MASK	0x0000003f	/* Mask to isolate types */
 
 #define RS6000_BTC_MISC		0x00000000	/* No special attributes.  */
 #define RS6000_BTC_CONST	0x00000100	/* Neither uses, nor
@@ -2273,13 +2277,18 @@ extern int frame_pointer_needed;
 						   state/mem and does
 						   not modify global state.  */
 #define RS6000_BTC_FP		0x00000400	/* depends on rounding mode.  */
-#define RS6000_BTC_ATTR_MASK	0x00000700	/* Mask of the attributes.  */
+#define RS6000_BTC_QUAD		0x00000800	/* Uses a register quad.  */
+#define RS6000_BTC_PAIR		0x00001000	/* Uses a register pair.  */
+#define RS6000_BTC_QUADPAIR	0x00001800	/* Uses a quad and a pair.  */
+#define RS6000_BTC_ATTR_MASK	0x00001f00	/* Mask of the attributes.  */
 
 /* Miscellaneous information.  */
 #define RS6000_BTC_SPR		0x01000000	/* function references SPRs.  */
 #define RS6000_BTC_VOID		0x02000000	/* function has no return value.  */
 #define RS6000_BTC_CR		0x04000000	/* function references a CR.  */
 #define RS6000_BTC_OVERLOADED	0x08000000	/* function is overloaded.  */
+#define RS6000_BTC_GIMPLE	0x10000000	/* function should be expanded
+						   into gimple.  */
 #define RS6000_BTC_MISC_MASK	0x1f000000	/* Mask of the misc info.  */
 
 /* Convenience macros to document the instruction type.  */
@@ -2348,6 +2357,7 @@ extern int frame_pointer_needed;
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -2359,6 +2369,7 @@ extern int frame_pointer_needed;
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
 
@@ -2377,6 +2388,7 @@ enum rs6000_builtins
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
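
An aside on the attribute encoding in the rs6000.h hunks above, as I read
them: the operand count now occupies the low three bits
(RS6000_BTC_OPND_MASK), so PREDICATE, ABS and DST move from 0x5..0x7 to
bits of their own, which frees 0x5 and 0x6 for QUINARY and SENARY.  The
sketch below is not part of the patch.  The constants are copied from the
hunks; the two helper functions are hypothetical names made up for
illustration.

```c
#include <assert.h>

/* Values copied from the rs6000.h hunks.  */
#define RS6000_BTC_TERNARY	0x00000003
#define RS6000_BTC_QUINARY	0x00000005
#define RS6000_BTC_OPND_MASK	0x00000007
#define RS6000_BTC_PREDICATE	0x00000008
#define RS6000_BTC_TYPE_MASK	0x0000003f
#define RS6000_BTC_QUAD		0x00000800
#define RS6000_BTC_VOID		0x02000000

/* Hypothetical helper: the operand count is whatever sits in the low
   three bits, regardless of the other flags.  */
static unsigned
demo_operand_count (unsigned attr)
{
  return attr & RS6000_BTC_OPND_MASK;
}

/* Hypothetical helper mirroring the arithmetic before the gcc_assert in
   mma_expand_builtin: a built-in flagged RS6000_BTC_QUAD is expected to
   have one more operand than its operand-count field says.  */
static unsigned
demo_expected_mma_operands (unsigned attr)
{
  unsigned n = attr & RS6000_BTC_OPND_MASK;
  if (attr & RS6000_BTC_QUAD)
    n++;
  return n;
}
```

With the old numbering a predicate built-in (0x5) would have been
indistinguishable from a five-operand one; with the new layout the two
fields can be combined and still be read back independently.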
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 66c3cb5f2dc..a1ff5fa852f 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -31,6 +31,240 @@
 ;; therefor, we define the XImode and OImode move patterns, but we
 ;; disable their use with a "false" condition flag.
 
+(define_constants [(MAX_MMA_OPERANDS 7)])
+
+;; Constants for creating unspecs
+
+(define_c_enum "unspec"
+  [UNSPEC_MMA_ASSEMBLE_ACC
+   UNSPEC_MMA_PMXVBF16GER2
+   UNSPEC_MMA_PMXVBF16GER2NN
+   UNSPEC_MMA_PMXVBF16GER2NP
+   UNSPEC_MMA_PMXVBF16GER2PN
+   UNSPEC_MMA_PMXVBF16GER2PP
+   UNSPEC_MMA_PMXVF16GER2
+   UNSPEC_MMA_PMXVF16GER2NN
+   UNSPEC_MMA_PMXVF16GER2NP
+   UNSPEC_MMA_PMXVF16GER2PN
+   UNSPEC_MMA_PMXVF16GER2PP
+   UNSPEC_MMA_PMXVF32GER
+   UNSPEC_MMA_PMXVF32GERNN
+   UNSPEC_MMA_PMXVF32GERNP
+   UNSPEC_MMA_PMXVF32GERPN
+   UNSPEC_MMA_PMXVF32GERPP
+   UNSPEC_MMA_PMXVF64GER
+   UNSPEC_MMA_PMXVF64GERNN
+   UNSPEC_MMA_PMXVF64GERNP
+   UNSPEC_MMA_PMXVF64GERPN
+   UNSPEC_MMA_PMXVF64GERPP
+   UNSPEC_MMA_PMXVI16GER2
+   UNSPEC_MMA_PMXVI16GER2PP
+   UNSPEC_MMA_PMXVI16GER2S
+   UNSPEC_MMA_PMXVI16GER2SPP
+   UNSPEC_MMA_PMXVI4GER8
+   UNSPEC_MMA_PMXVI4GER8PP
+   UNSPEC_MMA_PMXVI8GER4
+   UNSPEC_MMA_PMXVI8GER4PP
+   UNSPEC_MMA_PMXVI8GER4SPP
+   UNSPEC_MMA_XVBF16GER2
+   UNSPEC_MMA_XVBF16GER2NN
+   UNSPEC_MMA_XVBF16GER2NP
+   UNSPEC_MMA_XVBF16GER2PN
+   UNSPEC_MMA_XVBF16GER2PP
+   UNSPEC_MMA_XVF16GER2
+   UNSPEC_MMA_XVF16GER2NN
+   UNSPEC_MMA_XVF16GER2NP
+   UNSPEC_MMA_XVF16GER2PN
+   UNSPEC_MMA_XVF16GER2PP
+   UNSPEC_MMA_XVF32GER
+   UNSPEC_MMA_XVF32GERNN
+   UNSPEC_MMA_XVF32GERNP
+   UNSPEC_MMA_XVF32GERPN
+   UNSPEC_MMA_XVF32GERPP
+   UNSPEC_MMA_XVF64GER
+   UNSPEC_MMA_XVF64GERNN
+   UNSPEC_MMA_XVF64GERNP
+   UNSPEC_MMA_XVF64GERPN
+   UNSPEC_MMA_XVF64GERPP
+   UNSPEC_MMA_XVI16GER2
+   UNSPEC_MMA_XVI16GER2PP
+   UNSPEC_MMA_XVI16GER2S
+   UNSPEC_MMA_XVI16GER2SPP
+   UNSPEC_MMA_XVI4GER8
+   UNSPEC_MMA_XVI4GER8PP
+   UNSPEC_MMA_XVI8GER4
+   UNSPEC_MMA_XVI8GER4PP
+   UNSPEC_MMA_XVI8GER4SPP
+   UNSPEC_MMA_XXMFACC
+   UNSPEC_MMA_XXMTACC
+  ])
+
+;; MMA instructions with 1 accumulator argument
+(define_int_iterator MMA_ACC		[UNSPEC_MMA_XXMFACC
+					 UNSPEC_MMA_XXMTACC])
+
+;; MMA instructions with 2 vector arguments
+(define_int_iterator MMA_VV		[UNSPEC_MMA_XVI4GER8
+					 UNSPEC_MMA_XVI8GER4
+					 UNSPEC_MMA_XVI16GER2
+					 UNSPEC_MMA_XVI16GER2S
+					 UNSPEC_MMA_XVF16GER2
+					 UNSPEC_MMA_XVBF16GER2
+					 UNSPEC_MMA_XVF32GER])
+
+;; MMA instructions with 1 accumulator and 2 vector arguments
+(define_int_iterator MMA_AVV		[UNSPEC_MMA_XVI4GER8PP
+					 UNSPEC_MMA_XVI8GER4PP
+					 UNSPEC_MMA_XVI8GER4SPP
+					 UNSPEC_MMA_XVI16GER2PP
+					 UNSPEC_MMA_XVI16GER2SPP
+					 UNSPEC_MMA_XVF16GER2PP
+					 UNSPEC_MMA_XVF16GER2PN
+					 UNSPEC_MMA_XVF16GER2NP
+					 UNSPEC_MMA_XVF16GER2NN
+					 UNSPEC_MMA_XVBF16GER2PP
+					 UNSPEC_MMA_XVBF16GER2PN
+					 UNSPEC_MMA_XVBF16GER2NP
+					 UNSPEC_MMA_XVBF16GER2NN
+					 UNSPEC_MMA_XVF32GERPP
+					 UNSPEC_MMA_XVF32GERPN
+					 UNSPEC_MMA_XVF32GERNP
+					 UNSPEC_MMA_XVF32GERNN])
+
+;; MMA instructions with 1 vector pair and 1 vector arguments
+(define_int_iterator MMA_PV		[UNSPEC_MMA_XVF64GER])
+
+;; MMA instructions with 1 accumulator, 1 vector pair and 1 vector arguments
+(define_int_iterator MMA_APV		[UNSPEC_MMA_XVF64GERPP
+					 UNSPEC_MMA_XVF64GERPN
+					 UNSPEC_MMA_XVF64GERNP
+					 UNSPEC_MMA_XVF64GERNN])
+
+;; MMA instructions with 2 vector, 2 4-bit and 1 8-bit arguments
+(define_int_iterator MMA_VVI4I4I8	[UNSPEC_MMA_PMXVI4GER8])
+
+;; MMA instructions with 1 accumulator, 2 vector, 2 4-bit and 1 8-bit arguments
+(define_int_iterator MMA_AVVI4I4I8	[UNSPEC_MMA_PMXVI4GER8PP])
+
+;; MMA instructions with 2 vector, 2 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_VVI4I4I2	[UNSPEC_MMA_PMXVI16GER2
+					 UNSPEC_MMA_PMXVI16GER2S
+					 UNSPEC_MMA_PMXVF16GER2
+					 UNSPEC_MMA_PMXVBF16GER2])
+
+;; MMA instructions with 1 accumulator, 2 vector, 2 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_AVVI4I4I2	[UNSPEC_MMA_PMXVI16GER2PP
+					 UNSPEC_MMA_PMXVI16GER2SPP
+					 UNSPEC_MMA_PMXVF16GER2PP
+					 UNSPEC_MMA_PMXVF16GER2PN
+					 UNSPEC_MMA_PMXVF16GER2NP
+					 UNSPEC_MMA_PMXVF16GER2NN
+					 UNSPEC_MMA_PMXVBF16GER2PP
+					 UNSPEC_MMA_PMXVBF16GER2PN
+					 UNSPEC_MMA_PMXVBF16GER2NP
+					 UNSPEC_MMA_PMXVBF16GER2NN])
+
+;; MMA instructions with 2 vector and 2 4-bit arguments
+(define_int_iterator MMA_VVI4I4		[UNSPEC_MMA_PMXVF32GER])
+
+;; MMA instructions with 1 accumulator, 2 vector and 2 4-bit arguments
+(define_int_iterator MMA_AVVI4I4	[UNSPEC_MMA_PMXVF32GERPP
+					 UNSPEC_MMA_PMXVF32GERPN
+					 UNSPEC_MMA_PMXVF32GERNP
+					 UNSPEC_MMA_PMXVF32GERNN])
+
+;; MMA instructions with 1 vector pair, 1 vector, 1 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_PVI4I2		[UNSPEC_MMA_PMXVF64GER])
+
+;; MMA instructions with 1 accumulator, 1 vector pair, 1 vector, 1 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_APVI4I2	[UNSPEC_MMA_PMXVF64GERPP
+					 UNSPEC_MMA_PMXVF64GERPN
+					 UNSPEC_MMA_PMXVF64GERNP
+					 UNSPEC_MMA_PMXVF64GERNN])
+
+;; MMA instructions with 2 vector and 3 4-bit arguments
+(define_int_iterator MMA_VVI4I4I4	[UNSPEC_MMA_PMXVI8GER4])
+
+;; MMA instructions with 1 accumulator, 2 vector and 3 4-bit arguments
+(define_int_iterator MMA_AVVI4I4I4	[UNSPEC_MMA_PMXVI8GER4PP
+					 UNSPEC_MMA_PMXVI8GER4SPP])
+
+(define_int_attr acc		[(UNSPEC_MMA_XXMFACC		"xxmfacc")
+				 (UNSPEC_MMA_XXMTACC		"xxmtacc")])
+
+(define_int_attr vv		[(UNSPEC_MMA_XVI4GER8		"xvi4ger8")
+				 (UNSPEC_MMA_XVI8GER4		"xvi8ger4")
+				 (UNSPEC_MMA_XVI16GER2		"xvi16ger2")
+				 (UNSPEC_MMA_XVI16GER2S		"xvi16ger2s")
+				 (UNSPEC_MMA_XVF16GER2		"xvf16ger2")
+				 (UNSPEC_MMA_XVBF16GER2		"xvbf16ger2")
+				 (UNSPEC_MMA_XVF32GER		"xvf32ger")])
+
+(define_int_attr avv		[(UNSPEC_MMA_XVI4GER8PP		"xvi4ger8pp")
+				 (UNSPEC_MMA_XVI8GER4PP		"xvi8ger4pp")
+				 (UNSPEC_MMA_XVI8GER4SPP	"xvi8ger4spp")
+				 (UNSPEC_MMA_XVI16GER2PP	"xvi16ger2pp")
+				 (UNSPEC_MMA_XVI16GER2SPP	"xvi16ger2spp")
+				 (UNSPEC_MMA_XVF16GER2PP	"xvf16ger2pp")
+				 (UNSPEC_MMA_XVF16GER2PN	"xvf16ger2pn")
+				 (UNSPEC_MMA_XVF16GER2NP	"xvf16ger2np")
+				 (UNSPEC_MMA_XVF16GER2NN	"xvf16ger2nn")
+				 (UNSPEC_MMA_XVBF16GER2PP	"xvbf16ger2pp")
+				 (UNSPEC_MMA_XVBF16GER2PN	"xvbf16ger2pn")
+				 (UNSPEC_MMA_XVBF16GER2NP	"xvbf16ger2np")
+				 (UNSPEC_MMA_XVBF16GER2NN	"xvbf16ger2nn")
+				 (UNSPEC_MMA_XVF32GERPP		"xvf32gerpp")
+				 (UNSPEC_MMA_XVF32GERPN		"xvf32gerpn")
+				 (UNSPEC_MMA_XVF32GERNP		"xvf32gernp")
+				 (UNSPEC_MMA_XVF32GERNN		"xvf32gernn")])
+
+(define_int_attr pv		[(UNSPEC_MMA_XVF64GER		"xvf64ger")])
+
+(define_int_attr apv		[(UNSPEC_MMA_XVF64GERPP		"xvf64gerpp")
+				 (UNSPEC_MMA_XVF64GERPN		"xvf64gerpn")
+				 (UNSPEC_MMA_XVF64GERNP		"xvf64gernp")
+				 (UNSPEC_MMA_XVF64GERNN		"xvf64gernn")])
+
+(define_int_attr vvi4i4i8	[(UNSPEC_MMA_PMXVI4GER8		"pmxvi4ger8")])
+
+(define_int_attr avvi4i4i8	[(UNSPEC_MMA_PMXVI4GER8PP	"pmxvi4ger8pp")])
+
+(define_int_attr vvi4i4i2	[(UNSPEC_MMA_PMXVI16GER2	"pmxvi16ger2")
+				 (UNSPEC_MMA_PMXVI16GER2S	"pmxvi16ger2s")
+				 (UNSPEC_MMA_PMXVF16GER2	"pmxvf16ger2")
+				 (UNSPEC_MMA_PMXVBF16GER2	"pmxvbf16ger2")])
+
+(define_int_attr avvi4i4i2	[(UNSPEC_MMA_PMXVI16GER2PP	"pmxvi16ger2pp")
+				 (UNSPEC_MMA_PMXVI16GER2SPP	"pmxvi16ger2spp")
+				 (UNSPEC_MMA_PMXVF16GER2PP	"pmxvf16ger2pp")
+				 (UNSPEC_MMA_PMXVF16GER2PN	"pmxvf16ger2pn")
+				 (UNSPEC_MMA_PMXVF16GER2NP	"pmxvf16ger2np")
+				 (UNSPEC_MMA_PMXVF16GER2NN	"pmxvf16ger2nn")
+				 (UNSPEC_MMA_PMXVBF16GER2PP	"pmxvbf16ger2pp")
+				 (UNSPEC_MMA_PMXVBF16GER2PN	"pmxvbf16ger2pn")
+				 (UNSPEC_MMA_PMXVBF16GER2NP	"pmxvbf16ger2np")
+				 (UNSPEC_MMA_PMXVBF16GER2NN	"pmxvbf16ger2nn")])
+
+(define_int_attr vvi4i4		[(UNSPEC_MMA_PMXVF32GER		"pmxvf32ger")])
+
+(define_int_attr avvi4i4	[(UNSPEC_MMA_PMXVF32GERPP	"pmxvf32gerpp")
+				 (UNSPEC_MMA_PMXVF32GERPN	"pmxvf32gerpn")
+				 (UNSPEC_MMA_PMXVF32GERNP	"pmxvf32gernp")
+				 (UNSPEC_MMA_PMXVF32GERNN	"pmxvf32gernn")])
+
+(define_int_attr pvi4i2		[(UNSPEC_MMA_PMXVF64GER		"pmxvf64ger")])
+
+(define_int_attr apvi4i2	[(UNSPEC_MMA_PMXVF64GERPP	"pmxvf64gerpp")
+				 (UNSPEC_MMA_PMXVF64GERPN	"pmxvf64gerpn")
+				 (UNSPEC_MMA_PMXVF64GERNP	"pmxvf64gernp")
+				 (UNSPEC_MMA_PMXVF64GERNN	"pmxvf64gernn")])
+
+(define_int_attr vvi4i4i4	[(UNSPEC_MMA_PMXVI8GER4		"pmxvi8ger4")])
+
+(define_int_attr avvi4i4i4	[(UNSPEC_MMA_PMXVI8GER4PP	"pmxvi8ger4pp")
+				 (UNSPEC_MMA_PMXVI8GER4SPP	"pmxvi8ger4spp")])
+
+
 ;; Define a disabled OImode move pattern, so we can use POImode.
 (define_expand "movoi"
   [(set (match_operand:OI 0 "nonimmediate_operand")
@@ -109,10 +343,11 @@ (define_expand "movpxi"
 })
 
 (define_insn_and_split "*movpxi"
-  [(set (match_operand:PXI 0 "nonimmediate_operand" "=d,m,d")
-	(match_operand:PXI 1 "input_operand" "m,d,d"))]
+  [(set (match_operand:PXI 0 "nonimmediate_operand" "=d,m,d,d")
+	(match_operand:PXI 1 "input_operand"    "m,d,d,O"))]
   "TARGET_MMA
-   && (gpc_reg_operand (operands[0], PXImode)
+   && ((gpc_reg_operand (operands[0], PXImode)
+	&& !(CONST_INT_P (operands[1]) && INTVAL (operands[1]) == 0))
        || gpc_reg_operand (operands[1], PXImode))"
   "#"
   "&& reload_completed"
@@ -121,6 +356,249 @@ (define_insn_and_split "*movpxi"
   rs6000_split_multireg_move (operands[0], operands[1]);
   DONE;
 }
-  [(set_attr "type" "vecload,vecstore,veclogical")
-   (set_attr "length" "8,8,16")
-   (set_attr "max_prefixed_insns" "2,2,*")])
+  [(set_attr "type" "vecload,vecstore,veclogical,mma")
+   (set_attr "length" "8,8,16,*")
+   (set_attr "max_prefixed_insns" "2,2,*,*")])
+
+(define_expand "mma_assemble_pair"
+  [(match_operand:POI 0 "vsx_register_operand")
+   (match_operand:V16QI 1 "input_operand")
+   (match_operand:V16QI 2 "input_operand")]
+  "TARGET_MMA"
+{
+  rtx dst;
+
+  /* Let the compiler know the code below fully defines our output value.  */
+  emit_clobber (operands[0]);
+
+  dst = simplify_gen_subreg (V16QImode, operands[0], POImode, 0);
+  emit_move_insn (dst, operands[1]);
+  dst = simplify_gen_subreg (V16QImode, operands[0], POImode, 16);
+  emit_move_insn (dst, operands[2]);
+  DONE;
+})
+
+(define_expand "mma_assemble_acc"
+  [(match_operand:PXI 0 "fpr_reg_operand")
+   (match_operand:V16QI 1 "input_operand")
+   (match_operand:V16QI 2 "input_operand")
+   (match_operand:V16QI 3 "input_operand")
+   (match_operand:V16QI 4 "input_operand")]
+  "TARGET_MMA"
+{
+  rtx src = gen_rtx_UNSPEC (PXImode,
+			    gen_rtvec (4, operands[1], operands[2],
+				       operands[3], operands[4]),
+			    UNSPEC_MMA_ASSEMBLE_ACC);
+  emit_move_insn (operands[0], src);
+  DONE;
+})
+
+(define_insn_and_split "*mma_assemble_acc"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=d")
+	(unspec:PXI [(match_operand:PXI 1 "mma_input_operand" "mwa")
+		     (match_operand:PXI 2 "mma_input_operand" "mwa")
+		     (match_operand:PXI 3 "mma_input_operand" "mwa")
+		     (match_operand:PXI 4 "mma_input_operand" "mwa")]
+		     UNSPEC_MMA_ASSEMBLE_ACC))]
+  "TARGET_MMA
+   && fpr_reg_operand (operands[0], PXImode)"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx src = gen_rtx_UNSPEC (PXImode,
+			    gen_rtvec (4, operands[1], operands[2],
+				       operands[3], operands[4]),
+			    UNSPEC_MMA_ASSEMBLE_ACC);
+  rs6000_split_multireg_move (operands[0], src);
+  DONE;
+})
+
+;; MMA instructions that do not use their accumulators as an input still
+;; must not allow their vector operands to overlap the registers used by
+;; the accumulator.  We enforce this by marking the output as early clobber.
+
+(define_insn "mma_<acc>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")]
+		    MMA_ACC))]
+  "TARGET_MMA"
+  "<acc> %A0"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_xxsetaccz"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=d")
+	(const_int 0))]
+  "TARGET_MMA"
+  "xxsetaccz %A0"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_<vv>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")]
+		     MMA_VV))]
+  "TARGET_MMA"
+  "<vv> %A0,%x1,%x2"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_<avv>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")]
+		     MMA_AVV))]
+  "TARGET_MMA"
+  "<avv> %A0,%x2,%x3"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_<pv>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:POI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")]
+		     MMA_PV))]
+  "TARGET_MMA"
+  "<pv> %A0,%x1,%x2"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_<apv>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:POI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")]
+		     MMA_APV))]
+  "TARGET_MMA"
+  "<apv> %A0,%x2,%x3"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_<vvi4i4i8>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:SI 3 "const_0_to_15_operand" "n")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "u8bit_cint_operand" "n")]
+		     MMA_VVI4I4I8))]
+  "TARGET_MMA"
+  "<vvi4i4i8> %A0,%x1,%x2,%3,%4,%5"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4i8>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_15_operand" "n")
+		     (match_operand:SI 6 "u8bit_cint_operand" "n")]
+		     MMA_AVVI4I4I8))]
+  "TARGET_MMA"
+  "<avvi4i4i8> %A0,%x2,%x3,%4,%5,%6"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<vvi4i4i2>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:SI 3 "const_0_to_15_operand" "n")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_3_operand" "n")]
+		     MMA_VVI4I4I2))]
+  "TARGET_MMA"
+  "<vvi4i4i2> %A0,%x1,%x2,%3,%4,%5"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4i2>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_15_operand" "n")
+		     (match_operand:SI 6 "const_0_to_3_operand" "n")]
+		     MMA_AVVI4I4I2))]
+  "TARGET_MMA"
+  "<avvi4i4i2> %A0,%x2,%x3,%4,%5,%6"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<vvi4i4>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:SI 3 "const_0_to_15_operand" "n")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")]
+		     MMA_VVI4I4))]
+  "TARGET_MMA"
+  "<vvi4i4> %A0,%x1,%x2,%3,%4"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_15_operand" "n")]
+		     MMA_AVVI4I4))]
+  "TARGET_MMA"
+  "<avvi4i4> %A0,%x2,%x3,%4,%5"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<pvi4i2>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:POI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:SI 3 "const_0_to_15_operand" "n")
+		     (match_operand:SI 4 "const_0_to_3_operand" "n")]
+		     MMA_PVI4I2))]
+  "TARGET_MMA"
+  "<pvi4i2> %A0,%x1,%x2,%3,%4"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<apvi4i2>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:POI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_3_operand" "n")]
+		     MMA_APVI4I2))]
+  "TARGET_MMA"
+  "<apvi4i2> %A0,%x2,%x3,%4,%5"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<vvi4i4i4>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:SI 3 "const_0_to_15_operand" "n")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_15_operand" "n")]
+		     MMA_VVI4I4I4))]
+  "TARGET_MMA"
+  "<vvi4i4i4> %A0,%x1,%x2,%3,%4,%5"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4i4>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_15_operand" "n")
+		     (match_operand:SI 6 "const_0_to_15_operand" "n")]
+		     MMA_AVVI4I4I4))]
+  "TARGET_MMA"
+  "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6b462a3ecdb..bbe0b4610fb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -203,7 +203,7 @@ (define_attr "type"
    vecsimple,veccomplex,vecdiv,veccmp,veccmpsimple,vecperm,
    vecfloat,vecfdiv,vecdouble,mffgpr,mftgpr,crypto,
    veclogical,veccmpfx,vecexts,vecmove,
-   htm,htmsimple,dfp"
+   htm,htmsimple,dfp,mma"
   (const_string "integer"))
 
 ;; What data size does this instruction work on?
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 2a28215ac5b..342927abeda 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -296,6 +296,8 @@ (define_c_enum "unspec"
    UNSPEC_VSX_DIVUD
    UNSPEC_VSX_MULSD
    UNSPEC_VSX_SIGN_EXTEND
+   UNSPEC_VSX_XVCVBF16SP
+   UNSPEC_VSX_XVCVSPBF16
    UNSPEC_VSX_XVCVSPSXDS
    UNSPEC_VSX_VSLO
    UNSPEC_VSX_EXTRACT
@@ -346,6 +348,12 @@ (define_c_enum "unspec"
    UNSPEC_XXGENPCV
   ])
 
+(define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
+				 UNSPEC_VSX_XVCVBF16SP])
+
+(define_int_attr xvcvbf16       [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16")
+				 (UNSPEC_VSX_XVCVBF16SP "xvcvbf16sp")])
+
 ;; VSX moves
 
 ;; The patterns for LE permuted loads and stores come before the general
@@ -5676,3 +5684,10 @@ (define_expand "vec_unpack_<su>fix_trunc_lo_v4sf"
   DONE;
 })
 
+(define_insn "vsx_<xvcvbf16>"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+	(unspec:V16QI [(match_operand:V16QI 1 "vsx_register_operand" "wa")]
+		      XVCVBF16))]
+  "TARGET_FUTURE"
+  "<xvcvbf16> %x0,%x1"
+  [(set_attr "type" "vecfloat")])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e656e66a80c..8242c48337e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13858,6 +13858,7 @@ instructions, but allow the compiler to schedule those calls.
 * PowerPC AltiVec/VSX Built-in Functions::
 * PowerPC Hardware Transactional Memory Built-in Functions::
 * PowerPC Atomic Memory Operation Functions::
+* PowerPC Matrix-Multiply Assist Built-in Functions::
 * RX Built-in Functions::
 * S/390 System z Built-in Functions::
 * SH Built-in Functions::
@@ -21359,6 +21360,100 @@ void amo_stdat_smax (int64_t *, int64_t);
 void amo_stdat_smin (int64_t *, int64_t);
 @end smallexample
 
+@node PowerPC Matrix-Multiply Assist Built-in Functions
+@subsection PowerPC Matrix-Multiply Assist Built-in Functions
+ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
+GCC provides support for these instructions through the following built-in
+functions which are enabled with the @code{-mmma} option.  The vec_t type
+below is defined to be a normal vector unsigned char type.  The uint2, uint4
+and uint8 parameters are 2-bit, 4-bit and 8-bit unsigned integer constants
+respectively.  The compiler will verify that they are constants and that
+their values are within range.
+
+The built-in functions supported are:
+
+@smallexample
+void __builtin_mma_xvi4ger8 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2s (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32ger (__vector_quad *, vec_t, vec_t);
+
+void __builtin_mma_xvi4ger8pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4spp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2spp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2pn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2np (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2nn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2pn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2np (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2nn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gerpp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gerpn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gernp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gernn (__vector_quad *, vec_t, vec_t);
+
+void __builtin_mma_pmxvi4ger8 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
+void __builtin_mma_pmxvi4ger8pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
+
+void __builtin_mma_pmxvi8ger4 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+void __builtin_mma_pmxvi8ger4pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+void __builtin_mma_pmxvi8ger4spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+
+void __builtin_mma_pmxvi16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvi16ger2s (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+
+void __builtin_mma_pmxvi16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvi16ger2spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+
+void __builtin_mma_pmxvf32ger (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gerpp (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gerpn (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gernp (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gernn (__vector_quad *, vec_t, vec_t, uint4, uint4);
+
+void __builtin_mma_xvf64ger (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gerpp (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gerpn (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gernp (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gernn (__vector_quad *, __vector_pair, vec_t);
+
+void __builtin_mma_pmxvf64ger (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gerpp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gerpn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gernp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gernn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+
+void __builtin_mma_xxmtacc (__vector_quad *);
+void __builtin_mma_xxmfacc (__vector_quad *);
+void __builtin_mma_xxsetaccz (__vector_quad *);
+
+void __builtin_mma_assemble_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
+void __builtin_mma_disassemble_acc (void *, __vector_quad *);
+
+void __builtin_mma_assemble_pair (__vector_pair *, vec_t, vec_t);
+void __builtin_mma_disassemble_pair (void *, __vector_pair *);
+
+vec_t __builtin_vsx_xvcvspbf16 (vec_t);
+vec_t __builtin_vsx_xvcvbf16sp (vec_t);
+@end smallexample
+
 @node RX Built-in Functions
 @subsection RX Built-in Functions
 GCC supports some of the RX instructions which cannot be expressed in

* [PATCH 3/3, v2] rs6000: Add testsuite test cases for MMA built-ins.
  2020-06-18 20:42 [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
  2020-06-18 20:44 ` [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins Peter Bergner
  2020-06-18 20:45 ` [PATCH 2/3, v2] rs6000: Add MMA built-in function definitions Peter Bergner
@ 2020-06-18 20:46 ` Peter Bergner
  2020-06-19 16:53   ` Segher Boessenkool
  2020-06-24 19:28 ` [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
  3 siblings, 1 reply; 19+ messages in thread
From: Peter Bergner @ 2020-06-18 20:46 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

Changes since v1:
  - No changes from v1.

This patch adds the testsuite test cases for all of the MMA built-ins.

This patch was tested with patch1 + patch2.
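
For readers following the masked (pm*) test cases below: the documentation
in patch 2 describes the trailing immediates as 2-bit, 4-bit and 8-bit
unsigned constants, and the tests pass the largest value of each field
(3, 15 and 255).  The following is only a portable sketch of that range
rule, not code from this series.  The helper names fits_unsigned_imm,
is_uint2, is_uint4 and is_uint8 are hypothetical and exist only for this
illustration; the compiler does the real checking through the operand
predicates in mma.md.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: return true if VALUE fits in an
   unsigned immediate field that is BITS wide.  */
static bool
fits_unsigned_imm (long value, unsigned int bits)
{
  /* Keep the shift below well defined for any 32-bit or wider long.  */
  assert (bits > 0 && bits < 31);
  return value >= 0 && value <= (1L << bits) - 1;
}

/* The three immediate kinds named in the documentation:
   uint2 is 0..3, uint4 is 0..15 and uint8 is 0..255.  */
static bool
is_uint2 (long value)
{
  return fits_unsigned_imm (value, 2);
}

static bool
is_uint4 (long value)
{
  return fits_unsigned_imm (value, 4);
}

static bool
is_uint8 (long value)
{
  return fits_unsigned_imm (value, 8);
}
```

With these helpers, the arguments 15, 15, 255 used for
__builtin_mma_pmxvi4ger8 satisfy is_uint4, is_uint4 and is_uint8, while
a value such as 16 in a uint4 position would be out of range.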

Peter

2020-06-18  Peter Bergner  <bergner@linux.ibm.com>

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-1.c: New test.
	* gcc.target/powerpc/mma-builtin-2.c: New test.
	* gcc.target/powerpc/mma-builtin-3.c: New test.
	* gcc.target/powerpc/mma-builtin-4.c: New test.
	* gcc.target/powerpc/mma-builtin-5.c: New test.
	* gcc.target/powerpc/mma-builtin-6.c: New test.

diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
new file mode 100644
index 00000000000..a971c869095
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
@@ -0,0 +1,313 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+typedef unsigned char  vec_t __attribute__((vector_size(16)));
+
+void
+foo0 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvi4ger8 (&acc, vec0, vec1);
+  __builtin_mma_xvi4ger8pp (&acc, vec0, vec1);
+  dst[0] = acc;
+}
+
+void
+foo1 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvi8ger4 (&acc, vec0, vec1);
+  __builtin_mma_xvi8ger4pp (&acc, vec0, vec1);
+  __builtin_mma_xvi8ger4spp (&acc, vec0, vec1);
+  dst[1] = acc;
+}
+
+void
+foo2 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvi16ger2 (&acc, vec0, vec1);
+  __builtin_mma_xvi16ger2pp (&acc, vec0, vec1);
+  dst[2] = acc;
+}
+
+void
+foo3 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvi16ger2s (&acc, vec0, vec1);
+  __builtin_mma_xvi16ger2spp (&acc, vec0, vec1);
+  dst[3] = acc;
+}
+
+void
+foo4 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvf16ger2 (&acc, vec0, vec1);
+  __builtin_mma_xvf16ger2pp (&acc, vec0, vec1);
+  __builtin_mma_xvf16ger2pn (&acc, vec0, vec1);
+  dst[4] = acc;
+}
+
+void
+foo4b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_xvf16ger2np (&acc, vec0, vec1);
+  __builtin_mma_xvf16ger2nn (&acc, vec0, vec1);
+  dst[4] = acc;
+}
+
+void
+foo5 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvbf16ger2 (&acc, vec0, vec1);
+  __builtin_mma_xvbf16ger2pp (&acc, vec0, vec1);
+  __builtin_mma_xvbf16ger2pn (&acc, vec0, vec1);
+  dst[5] = acc;
+}
+
+void
+foo5b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_xvbf16ger2np (&acc, vec0, vec1);
+  __builtin_mma_xvbf16ger2nn (&acc, vec0, vec1);
+  dst[5] = acc;
+}
+
+void
+foo6 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvf32ger (&acc, vec0, vec1);
+  __builtin_mma_xvf32gerpp (&acc, vec0, vec1);
+  __builtin_mma_xvf32gerpn (&acc, vec0, vec1);
+  dst[6] = acc;
+}
+
+void
+foo6b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_xvf32gernp (&acc, vec0, vec1);
+  __builtin_mma_xvf32gernn (&acc, vec0, vec1);
+  dst[6] = acc;
+}
+
+void
+foo7 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvi4ger8 (&acc, vec0, vec1, 15, 15, 255);
+  __builtin_mma_pmxvi4ger8pp (&acc, vec0, vec1, 15, 15, 255);
+  dst[7] = acc;
+}
+
+void
+foo8 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvi8ger4 (&acc, vec0, vec1, 15, 15, 15);
+  __builtin_mma_pmxvi8ger4pp (&acc, vec0, vec1, 15, 15, 15);
+  __builtin_mma_pmxvi8ger4spp (&acc, vec0, vec1, 15, 15, 15);
+  dst[8] = acc;
+}
+
+void
+foo9 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvi16ger2 (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvi16ger2pp (&acc, vec0, vec1, 15, 15, 3);
+  dst[9] = acc;
+}
+
+void
+foo10 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvi16ger2s (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvi16ger2spp (&acc, vec0, vec1, 15, 15, 3);
+  dst[10] = acc;
+}
+
+void
+foo11 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvf16ger2 (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvf16ger2pp (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvf16ger2pn (&acc, vec0, vec1, 15, 15, 3);
+  dst[11] = acc;
+}
+
+void
+foo11b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_pmxvf16ger2np (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvf16ger2nn (&acc, vec0, vec1, 15, 15, 3);
+  dst[11] = acc;
+}
+
+void
+foo12 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvbf16ger2 (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvbf16ger2pp (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvbf16ger2pn (&acc, vec0, vec1, 15, 15, 3);
+  dst[12] = acc;
+}
+
+void
+foo12b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_pmxvbf16ger2np (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvbf16ger2nn (&acc, vec0, vec1, 15, 15, 3);
+  dst[12] = acc;
+}
+
+void
+foo13 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvf32ger (&acc, vec0, vec1, 15, 15);
+  __builtin_mma_pmxvf32gerpp (&acc, vec0, vec1, 15, 15);
+  __builtin_mma_pmxvf32gerpn (&acc, vec0, vec1, 15, 15);
+  dst[13] = acc;
+}
+
+void
+foo13b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_pmxvf32gernp (&acc, vec0, vec1, 15, 15);
+  __builtin_mma_pmxvf32gernn (&acc, vec0, vec1, 15, 15);
+  dst[13] = acc;
+}
+
+/* { dg-final { scan-assembler-times {\mlxv\M} 40 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 40 } } */
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 20 } } */
+/* { dg-final { scan-assembler-times {\mxxmtacc\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxvbf16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvbf16ger2nn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvbf16ger2np\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvbf16ger2pn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvbf16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf16ger2nn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf16ger2np\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf16ger2pn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf32ger\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf32gernn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf32gernp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf32gerpn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf32gerpp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi16ger2s\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi16ger2spp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi4ger8\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi4ger8pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi8ger4\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi8ger4pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi8ger4spp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvbf16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvbf16ger2nn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvbf16ger2np\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvbf16ger2pn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvbf16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf16ger2nn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf16ger2np\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf16ger2pn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf32ger\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf32gernn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf32gernp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf32gerpn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf32gerpp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi16ger2s\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi16ger2spp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi4ger8\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi4ger8pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi8ger4\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi8ger4pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi8ger4spp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c
new file mode 100644
index 00000000000..cb8b30dd992
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c
@@ -0,0 +1,72 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+typedef unsigned char  vec_t __attribute__((vector_size(16)));
+
+void
+foo0 (__vector_quad *dst, vec_t *vec, __vector_pair *pvecp)
+{
+  __vector_quad acc;
+  __vector_pair vecp0 = *pvecp;
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvf64ger (&acc, vecp0, vec1);
+  __builtin_mma_xvf64gerpp (&acc, vecp0, vec1);
+  __builtin_mma_xvf64gerpn (&acc, vecp0, vec1);
+  dst[0] = acc;
+}
+
+void
+foo1 (__vector_quad *dst, __vector_quad *src, vec_t *vec, __vector_pair *pvecp)
+{
+  __vector_quad acc;
+  __vector_pair vecp0 = *pvecp;
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_xvf64gernp (&acc, vecp0, vec1);
+  __builtin_mma_xvf64gernn (&acc, vecp0, vec1);
+  dst[0] = acc;
+}
+
+void
+foo2 (__vector_quad *dst, vec_t *vec, __vector_pair *pvecp)
+{
+  __vector_quad acc;
+  __vector_pair vecp0 = *pvecp;
+  vec_t vec1 = vec[1];
+  __builtin_mma_pmxvf64ger (&acc, vecp0, vec1, 15, 3);
+  __builtin_mma_pmxvf64gerpp (&acc, vecp0, vec1, 15, 3);
+  __builtin_mma_pmxvf64gerpn (&acc, vecp0, vec1, 15, 3);
+  dst[1] = acc;
+}
+
+void
+foo3 (__vector_quad *dst, __vector_quad *src, vec_t *vec, __vector_pair *pvecp)
+{
+  __vector_quad acc;
+  __vector_pair vecp0 = *pvecp;
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_pmxvf64gernp (&acc, vecp0, vec1, 15, 3);
+  __builtin_mma_pmxvf64gernn (&acc, vecp0, vec1, 15, 3);
+  dst[1] = acc;
+}
+
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 8 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 8 } } */
+/* { dg-final { scan-assembler-times {\mxvf64ger\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf64gerpp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf64gerpn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf64gernp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf64gernn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf64ger\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf64gerpp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf64gerpn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf64gernp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf64gernn\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c
new file mode 100644
index 00000000000..5406707061e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+void
+foo0 (void)
+{
+  __vector_quad acc;
+  asm ("#..." : "=d" (acc));
+  __builtin_mma_xxmtacc (&acc);
+  __builtin_mma_xxmfacc (&acc);
+  asm ("#..." :: "d" (acc));
+}
+
+typedef unsigned char  vec_t __attribute__((vector_size(16)));
+
+void
+foo1 (vec_t *vec)
+{
+  vec[1] = __builtin_vsx_xvcvspbf16 (vec[0]);
+  vec[3] = __builtin_vsx_xvcvbf16sp (vec[2]);
+}
+
+/* { dg-final { scan-assembler-times {\mxxmtacc\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxv\M} 2 } } */
+/* { dg-final { scan-assembler-not {\mlxvp\M} } } */
+/* { dg-final { scan-assembler-not {\mstxvp\M} } } */
+/* { dg-final { scan-assembler-times {\mxvcvspbf16\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvcvbf16sp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
new file mode 100644
index 00000000000..138d1b46bc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+typedef unsigned char vec_t __attribute__((vector_size(16)));
+
+void
+foo (__vector_pair *dst, vec_t *src)
+{
+  __vector_pair pair;
+  __builtin_mma_assemble_pair (&pair, src[0], src[4]);
+  *dst = pair;
+}
+
+void
+bar (vec_t *dst, __vector_pair *src)
+{
+  vec_t res[2];
+  __builtin_mma_disassemble_pair (res, src);
+  dst[0] = res[0];
+  dst[4] = res[1];
+}
+
+/* { dg-final { scan-assembler-times {\mlxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
new file mode 100644
index 00000000000..0ee45b6bdfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+typedef unsigned char vec_t __attribute__((vector_size(16)));
+
+void
+foo (__vector_quad *dst, vec_t *src)
+{
+  __vector_quad acc;
+  __builtin_mma_assemble_acc (&acc, src[0], src[4], src[8], src[12]);
+  *dst = acc;
+}
+
+void
+bar (vec_t *dst, __vector_quad *src)
+{
+  vec_t res[4];
+  __builtin_mma_disassemble_acc (res, src);
+  dst[0] = res[0];
+  dst[4] = res[1];
+  dst[8] = res[2];
+  dst[12] = res[3];
+}
+
+/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c
new file mode 100644
index 00000000000..c0b5eedd3d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+void
+foo (__vector_quad *dst)
+{
+  __vector_quad acc;
+  __builtin_mma_xxsetaccz (&acc);
+  *dst = acc;
+}
+
+/* { dg-final { scan-assembler-not {\mlxv\M} } } */
+/* { dg-final { scan-assembler-not {\mlxvp\M} } } */
+/* { dg-final { scan-assembler-not {\mxxmtacc\M} } } */
+/* { dg-final { scan-assembler-times {\mxxsetaccz\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-18 20:44 ` [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins Peter Bergner
@ 2020-06-18 23:44   ` Segher Boessenkool
  2020-06-19 16:47     ` Peter Bergner
  0 siblings, 1 reply; 19+ messages in thread
From: Segher Boessenkool @ 2020-06-18 23:44 UTC (permalink / raw)
  To: Peter Bergner
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

Hi!

On Thu, Jun 18, 2020 at 03:44:04PM -0500, Peter Bergner wrote:
> This patch adds the new -mmma option as well as the initial MMA support,
> which includes the target specific __vector_pair and __vector_quad types,
> the POImode and PXImode partial integer modes they are mapped to, and their
> associated  move patterns.  Support for the restrictions on the registers
> these modes can be assigned to as also been added.

> 	(rs6000_builtin_mask_calculate): Add support for RS6000_BTM_MMA
> 	and RS6000_BTM_FUTURE.
The latter is already there?

> 	* config/rs6000/rs6000.md (define_attr "isa"): Add mma.

Is this ever useful?  Please leave it out if not.  The "isa" things
are only for when some insn alternatives are available only on some
configurations and not others (not for when the whole pattern is not
valid).

Maybe a later patch uses this?

> 	(define_mode_iterator RELOAD): Add POI and PXI.

Why POI and PXI, but not OI and XI?

> 	Include mma.md.

That looks to be about RELOAD, the way it is placed.  Maybe put it as
the very first thing for this file, in the changelog?

> +;; cause byte swapping issues on litte-endian systems.  We don't need
> +;; the XImode and OImode move patterns for actual code generation,
> +;; therefor, we define the XImode and OImode move patterns, but we
> +;; disable their use with a "false" condition flag.

"therefore".

> +;; Define a disabled OImode move pattern, so we can use POImode.
> +(define_expand "movoi"
> +  [(set (match_operand:OI 0 "nonimmediate_operand")
> +	(match_operand:OI 1 "input_operand"))]
> +  "0"
> +{
> +  gcc_unreachable ();
> +})

So dirty, I love it :-)

> +(define_insn_and_split "*movpoi"
> +  [(set (match_operand:POI 0 "nonimmediate_operand" "=wa,m,wa")
> +	(match_operand:POI 1 "input_operand"	    "m,wa,wa"))]

Don't use tabs other than at the start of the line, please (or *maybe*
in tables).

> +;; Special pattern to prevent DSE from generating an internal error if it
> +;; notices a structure copy that it wants to eliminate.  This generates pretty
> +;; bad code, but at least it doesn't die.
> +(define_insn_and_split "truncpoidi2"

Could you say *why*/*how* it prevents the ICE here?

> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
> +	(truncate:DI (match_operand:POI 1 "gpc_reg_operand" "wa")))]
> +  "TARGET_MMA"
> +  "#"
> +  "&& reload_completed"
> +  [(set (match_dup 0)
> +	(vec_select:DI (match_dup 2)
> +		       (parallel [(match_dup 3)])))]
> +{
> +  unsigned r = reg_or_subregno (operands[1]) + !BYTES_BIG_ENDIAN;

Don't + booleans please (use ?: instead).

> +  operands[2] = gen_rtx_REG (V2DImode, r);
> +  operands[3] = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
> +})

So maybe just do an  if (BYTES_BIG_ENDIAN)  even, the arms simplify a
bit then.

> +;; Vector quad support.  PXImode is only defined for floating point registers.

Rephrase this?  A mode is defined without referring to registers at
all ;-)  "PXImode can only live in FPRs", something like that?

> +  /* Vector pair modes need even/odd VSX register pairs.  Only allow vector
> +     registers.  We need to allow OImode to have the same registers as POImode,
> +     even though we do not enable the move pattern for OImode.  */
> +  if (mode == POImode || mode == OImode)
> +    return (TARGET_MMA && VSX_REGNO_P (regno)
> +	    && (regno & 1) == 0);

Put it all on one line?

> +  /* MMA accumulator modes need FPR registers divisible by 4.  We need to allow
> +     XImode to have the same registers as PXImode, even though we do not enable
> +     the move pattern for XImode.  */
> +  if (mode == PXImode || mode == XImode)
> +    return (TARGET_MMA && FP_REGNO_P (regno)
> +	    && (regno & 3) == 0);

Likewise.
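
For reference, the two register restrictions quoted above can be modeled in
standalone C.  This is only a sketch, not the backend code: the register
numbering (FPRs at 32..63, the remaining VSX registers at 64..95) is an
assumption made for the example, and fpr_p, vsx_p, pair_regno_ok and
acc_regno_ok are hypothetical names.

```c
#include <stdbool.h>

/* Assumed hard register numbering, for illustration only: FPRs are
   32..63 and the remaining VSX registers are 64..95.  */
enum { FIRST_FPR = 32, LAST_FPR = 63, LAST_VSX = 95 };

static bool
fpr_p (unsigned regno)
{
  return regno >= FIRST_FPR && regno <= LAST_FPR;
}

static bool
vsx_p (unsigned regno)
{
  return regno >= FIRST_FPR && regno <= LAST_VSX;
}

/* A vector pair must start on an even-numbered VSX register.  */
static bool
pair_regno_ok (bool have_mma, unsigned regno)
{
  return have_mma && vsx_p (regno) && (regno & 1) == 0;
}

/* An accumulator must start on an FPR whose number is divisible by 4.  */
static bool
acc_regno_ok (bool have_mma, unsigned regno)
{
  return have_mma && fpr_p (regno) && (regno & 3) == 0;
}
```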

Why are OImode and XImode handled here?

>  static bool
>  rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>  {
> -  if (mode1 == PTImode)
> -    return mode2 == PTImode;
> -  if (mode2 == PTImode)
> +  if (mode1 == PTImode || mode1 == POImode || mode1 == PXImode)
> +    return mode1 == mode2;
> +  if (mode2 == PTImode || mode2 == POImode || mode2 == PXImode)
>      return false;

You can just do
  if (mode1 == PTImode || mode1 == POImode || mode1 == PXImode
      || mode2 == PTImode || mode2 == POImode || mode2 == PXImode)
    return mode1 == mode2;
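
A quick way to convince yourself the two forms agree is to model just the
early exits and compare them over every pair of modes.  The sketch below is
standalone C under stated assumptions: the enum is a made-up subset of modes,
FALL_THROUGH stands for "continue with the rest of the function" (which is not
modeled), and all names are hypothetical.

```c
#include <stdbool.h>

/* Made-up subset of machine modes, for illustration only.  */
enum mode { SImode, DImode, PTImode, POImode, PXImode, NUM_MODES };

/* Outcome of the early checks.  FALL_THROUGH means the rest of the
   function, which is not modeled here, would decide.  */
enum verdict { NOT_TIEABLE, TIEABLE, FALL_THROUGH };

static bool
special_p (enum mode m)
{
  return m == PTImode || m == POImode || m == PXImode;
}

/* The form in the patch: test mode1 first, then mode2.  */
static enum verdict
two_step (enum mode mode1, enum mode mode2)
{
  if (special_p (mode1))
    return mode1 == mode2 ? TIEABLE : NOT_TIEABLE;
  if (special_p (mode2))
    return NOT_TIEABLE;
  return FALL_THROUGH;
}

/* The suggested form: a single combined test.  */
static enum verdict
one_step (enum mode mode1, enum mode mode2)
{
  if (special_p (mode1) || special_p (mode2))
    return mode1 == mode2 ? TIEABLE : NOT_TIEABLE;
  return FALL_THROUGH;
}

/* Count the mode pairs on which the two forms disagree.  */
static int
count_mismatches (void)
{
  int n = 0;
  for (int a = 0; a < NUM_MODES; a++)
    for (int b = 0; b < NUM_MODES; b++)
      if (two_step ((enum mode) a, (enum mode) b)
	  != one_step ((enum mode) a, (enum mode) b))
	n++;
  return n;
}
```

When only mode2 is one of the special modes, mode1 necessarily differs from
it, so "mode1 == mode2" is already false and the separate "return false" arm
is redundant.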

> @@ -2206,6 +2227,8 @@ rs6000_debug_reg_global (void)
>      SDmode,
>      DDmode,
>      TDmode,
> +    V2SImode,
> +    V2SFmode,

Did the changelog mention these?  If it is a bugfix, could it need a
backport?  Do it a separate patch then?

Well, it is debug info only, so not really interesting, but heh.

> @@ -2220,9 +2243,14 @@ rs6000_debug_reg_global (void)
>      V2DFmode,
>      V8SFmode,
>      V4DFmode,
> +    OImode,
> +    XImode,
> +    POImode,
> +    PXImode,
>      CCmode,
>      CCUNSmode,
>      CCEQmode,
> +    CCFPmode,
>    };

Same for the CCFP one here.

> +  /* Add support for vector pairs and vector quad registers.  */
> +  if (TARGET_MMA)
> +    {
> +      for (m = 0; m < NUM_MACHINE_MODES; ++m)
> +	if (m == POImode || m == PXImode)
> +	  {
> +	    rs6000_vector_unit[m] = VECTOR_NONE;
> +	    rs6000_vector_mem[m] = VECTOR_VSX;
> +	    rs6000_vector_align[m] = (m == POImode) ? 256 : 512;
> +	  }
> +    }

This is just

  /* Add support for vector pairs and vector quad registers.  */
  if (TARGET_MMA)
    {
      rs6000_vector_unit[POImode] = VECTOR_NONE;
      rs6000_vector_mem[POImode] = VECTOR_VSX;
      rs6000_vector_align[POImode] = 256;

      rs6000_vector_unit[PXImode] = VECTOR_NONE;
      rs6000_vector_mem[PXImode] = VECTOR_VSX;
      rs6000_vector_align[PXImode] = 512;
    }

which is just no longer (even though it has that blank line :-) )

> @@ -15793,7 +15898,23 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>    reg = REG_P (dst) ? REGNO (dst) : REGNO (src);
>    mode = GET_MODE (dst);
>    nregs = hard_regno_nregs (reg, mode);
> -  if (FP_REGNO_P (reg))
> +  /* If we have a quad vector register for MMA, and this is a load or store,
> +     see if we can use vector paired load/stores.  */
> +  if (mode == PXImode && TARGET_MMA
> +      && (MEM_P (dst) || MEM_P (src)))
> +    {
> +      reg_mode = POImode;;
> +      nregs /= hard_regno_nregs (reg, reg_mode);
> +    }

(doubled semicolon)

So nregs is always 2?  Maybe it is better to just assert that here then?

> @@ -19249,6 +19416,9 @@ rs6000_mangle_type (const_tree type)
>    if (SCALAR_FLOAT_TYPE_P (type) && FLOAT128_IEEE_P (TYPE_MODE (type)))
>      return ieee128_mangling_gcc_8_1 ? "U10__float128" : "u9__ieee128";
>  
> +  if (type == vector_pair_type_node) return "u13__vector_pair";
> +  if (type == vector_quad_type_node) return "u13__vector_quad";

Line breaks?

>  /* No data type wants to be aligned rounder than this.  */
> -#define BIGGEST_ALIGNMENT 128
> +#define BIGGEST_ALIGNMENT ((TARGET_MMA) ? 512 : 128)

No silly parens around TARGET_MMA please (macros should protect
themselves, sure, but not try to protect other macros).
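
A small standalone illustration of that convention follows.  Everything in it
is a stand-in: target_flags, MASK_MMA and this definition of TARGET_MMA are
hypothetical and only mimic the shape of the real macros.

```c
/* Hypothetical definitions, for illustration only.  */
static unsigned target_flags;
#define MASK_MMA 0x1u

/* A macro that expands to an expression parenthesizes itself ...  */
#define TARGET_MMA ((target_flags & MASK_MMA) != 0)

/* ... so its users need no parentheses around it.  The outer pair
   here protects BIGGEST_ALIGNMENT's own ?: expression.  */
#define BIGGEST_ALIGNMENT (TARGET_MMA ? 512 : 128)

static int
biggest_alignment_with (unsigned flags)
{
  target_flags = flags;
  return BIGGEST_ALIGNMENT;
}

/* The outer parentheses matter when the macro is used inside a larger
   expression, as in this multiplication.  */
static int
doubled_alignment_with (unsigned flags)
{
  target_flags = flags;
  return 2 * BIGGEST_ALIGNMENT;
}
```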


Okay for trunk modulo the above.  Thanks!  This was much less painful
than I feared.  Well, maybe it is all in the other patches, I'll get to
those tomorrow ;-)


Segher

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3, v2] rs6000: Add MMA built-in function definitions
  2020-06-18 20:45 ` [PATCH 2/3, v2] rs6000: Add MMA built-in function definitions Peter Bergner
@ 2020-06-19 16:45   ` Segher Boessenkool
  2020-06-19 17:06     ` Peter Bergner
  0 siblings, 1 reply; 19+ messages in thread
From: Segher Boessenkool @ 2020-06-19 16:45 UTC (permalink / raw)
  To: Peter Bergner
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

Hi!

On Thu, Jun 18, 2020 at 03:45:17PM -0500, Peter Bergner wrote:
> +;; Return 1 if this operand is valid for a MMA assemble accumulator insn.
> +(define_special_predicate "mma_input_operand"
> +  (match_test "(mode == PXImode
> +		&& (GET_MODE (op) == V16QImode)
> +		&& (vsx_register_operand (op, GET_MODE (op)) || MEM_P (op)))"))

Maybe the name could be better, then?  "mma_assemble_input_operand"?

I don't see how mode is PXI but GET_MODE (op) is V16QI.  The actual
register is V16QI, but then it is used as if it was a PXI?  Is there no
better way to do this?  It certainly needs a comment, and a more specific
name, if you keep this like this.

> +BU_MMA_6 (PMXVI16GER2SPP,   "pmxvi16ger2spp",   QUAD, mma_pmxvi16ger2spp)

(I didn't check the large table, I'll just hope you did -- check against
the ISA and the builtins docs, etc. :-) )

> +  else if ((fnmask & RS6000_BTM_FUTURE) != 0)
> +    error ("%qs requires the %qs option", name, "-mcpu=future");

In the future, please send such pieces in a separate patch?

> @@ -9944,7 +9944,8 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
>  
>      case E_POImode:
>      case E_PXImode:
> -      if (CONSTANT_P (operands[1]))
> +      if (CONSTANT_P (operands[1])
> +	  && INTVAL (operands[1]) != 0)
>  	error ("%qs is an opaque type, and you can't set it to other values.",
>  	       (mode == POImode) ? "__vector_pair" : "__vector_quad");

Put that condition on just one line please?  CONSTANT_P might not be
good enough if you want to use INTVAL btw, CONST_INT_P is clearer and/or
more correct.
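
The concern can be shown with a toy model in plain C.  This is not GCC's rtx
representation: struct toy_rtx and every toy_* name are hypothetical, and the
sketch only shows why the integer payload should be read after checking for
the integer tag specifically rather than for "some constant".

```c
#include <stdbool.h>

/* Toy model of an rtx: a tag plus, for integer constants, a value.  */
enum toy_code { TOY_CONST_INT, TOY_SYMBOL_REF, TOY_REG };

struct toy_rtx
{
  enum toy_code code;
  long value;		/* Meaningful only for TOY_CONST_INT.  */
};

/* Broad test: true for any kind of constant in this model.  */
static bool
toy_constant_p (const struct toy_rtx *x)
{
  return x->code == TOY_CONST_INT || x->code == TOY_SYMBOL_REF;
}

/* Narrow test: true only for integer constants.  */
static bool
toy_const_int_p (const struct toy_rtx *x)
{
  return x->code == TOY_CONST_INT;
}

/* The guard in the suggested shape: the value is read only once the
   operand is known to be an integer constant.  */
static bool
nonzero_int_constant_p (const struct toy_rtx *x)
{
  return toy_const_int_p (x) && x->value != 0;
}
```

With the broad test alone, a symbolic constant would reach the value read even
though it carries no integer payload.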

> +(define_insn_and_split "*mma_assemble_acc"
> +  [(set (match_operand:PXI 0 "fpr_reg_operand" "=d")
> +	(unspec:PXI [(match_operand:PXI 1 "mma_input_operand" "mwa")
> +		     (match_operand:PXI 2 "mma_input_operand" "mwa")
> +		     (match_operand:PXI 3 "mma_input_operand" "mwa")
> +		     (match_operand:PXI 4 "mma_input_operand" "mwa")]
> +		     UNSPEC_MMA_ASSEMBLE_ACC))]

I would expect all those four last match_operand to be :V16QI, so why
does it use this strange mode?

In general, many of the MMA insns play loose and fast with the modes.
This probably works fine, since everything is unspec, but eww :-)

Anyway, okay for trunk.  Thanks!  Thanks to all who worked on this, it
was a painful trip getting to here.


Segher

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-18 23:44   ` Segher Boessenkool
@ 2020-06-19 16:47     ` Peter Bergner
  2020-06-19 18:12       ` Segher Boessenkool
  2020-06-21  5:45       ` Peter Bergner
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Bergner @ 2020-06-19 16:47 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

On 6/18/20 6:44 PM, Segher Boessenkool wrote:
>> 	(rs6000_builtin_mask_calculate): Add support for RS6000_BTM_MMA
>> 	and RS6000_BTM_FUTURE.
> The latter is already there?

Oops, yes.  I'll remove it.



>> 	* config/rs6000/rs6000.md (define_attr "isa"): Add mma.
> 
> Is this ever useful?  Please leave it out if not.  The "isa" things
> are only for when some insn alternatives are available only one some
> configurations and not others (not for when the whole pattern is not
> valid).

I think I added it back when we had a "pair" isa attribute and I
think I thought I needed it then.  I think you are correct that
we don't need it now.  I'll remove it.



>> 	(define_mode_iterator RELOAD): Add POI and PXI.
> 
> Why POI and PXI, but not OI and XI?

We don't have an enabled XI or OI move pattern, so I don't think
we'll ever see those modes at all in rtl.




>> 	Include mma.md.
> 
> That looks to be about RELOAD, the way it is placed.  Maybe put it as
> the very first thing for this file, in the changelog?

You mean rewrite it like the following?
	...
        * config/rs6000/rs6000.md: Include mma.md.
	(define_attr "isa"): Add mma.
        (define_attr "enabled"): Handle mma.
        (define_mode_iterator RELOAD): Add POI and PXI.
	...

...or do you mean move the rs6000.md entry to be the first entry in the ChangeLog?



>> +;; cause byte swapping issues on litte-endian systems.  We don't need
>> +;; the XImode and OImode move patterns for actual code generation,
>> +;; therefor, we define the XImode and OImode move patterns, but we
>> +;; disable their use with a "false" condition flag.
> 
> "therefore".

Fixed.


>> +;; Define a disabled OImode move pattern, so we can use POImode.
>> +(define_expand "movoi"
>> +  [(set (match_operand:OI 0 "nonimmediate_operand")
>> +	(match_operand:OI 1 "input_operand"))]
>> +  "0"
>> +{
>> +  gcc_unreachable ();
>> +})
> 
> So dirty, I love it :-)

Heh, credit to Mike on this one.




>> +(define_insn_and_split "*movpoi"
>> +  [(set (match_operand:POI 0 "nonimmediate_operand" "=wa,m,wa")
>> +	(match_operand:POI 1 "input_operand"	    "m,wa,wa"))]
> 
> Don't use tabs other than at the start of the line, please (or *maybe*
> in tables).

Fixed.  I just replaced it with one space to match the *movpxi pattern.



>> +;; Special pattern to prevent DSE from generating an internal error if it
>> +;; notices a structure copy that it wants to eliminate.  This generates pretty
>> +;; bad code, but at least it doesn't die.
>> +(define_insn_and_split "truncpoidi2"
> 
> Could you say *why*/*how* it prevents the ICE here?

This was added by Mike.  I didn't debug the issue.  Mike, do you have
some verbiage we could add here?




>> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
>> +	(truncate:DI (match_operand:POI 1 "gpc_reg_operand" "wa")))]
>> +  "TARGET_MMA"
>> +  "#"
>> +  "&& reload_completed"
>> +  [(set (match_dup 0)
>> +	(vec_select:DI (match_dup 2)
>> +		       (parallel [(match_dup 3)])))]
>> +{
>> +  unsigned r = reg_or_subregno (operands[1]) + !BYTES_BIG_ENDIAN;
> 
> Don't + booleans please (use ?: instead).
> 
>> +  operands[2] = gen_rtx_REG (V2DImode, r);
>> +  operands[3] = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
>> +})
> 
> So maybe just do an  if (BYTES_BIG_ENDIAN)  even, the arms simplify a
> bit then.

Like so?

  if (BYTES_BIG_ENDIAN)
    {
      operands[2] = gen_rtx_REG (V2DImode, reg_or_subregno (operands[1]));
      operands[3] = const1_rtx;
    }
  else
    {
      operands[2] = gen_rtx_REG (V2DImode, reg_or_subregno (operands[1]) + 1);
      operands[3] = const0_rtx;
    }
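
As a sanity check, here is a standalone C model of that rewrite next to the
original "+ !BYTES_BIG_ENDIAN" form.  It is only a sketch: struct dword_sel
and both function names are hypothetical, and a plain bool stands in for
BYTES_BIG_ENDIAN.

```c
#include <stdbool.h>

/* Hypothetical stand-in for what the splitter computes: which register
   of the pair to use and which 64-bit element to select from it.  */
struct dword_sel
{
  unsigned regno;
  unsigned element;
};

/* Original form: adds a boolean to the register number.  */
static struct dword_sel
select_original (unsigned base_regno, bool big_endian)
{
  struct dword_sel s;
  s.regno = base_regno + !big_endian;
  s.element = big_endian ? 1 : 0;
  return s;
}

/* Rewritten form: one explicit arm per endianness.  */
static struct dword_sel
select_rewritten (unsigned base_regno, bool big_endian)
{
  struct dword_sel s;
  if (big_endian)
    {
      s.regno = base_regno;
      s.element = 1;
    }
  else
    {
      s.regno = base_regno + 1;
      s.element = 0;
    }
  return s;
}
```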





>> +;; Vector quad support.  PXImode is only defined for floating point registers.
> 
> Rephrase this?  A mode is defined without referring to registers at
> all ;-)  "PXImode can only live in FPRs", something like that?

Ok, I changed it to that.  I assume you want the same thing changed for POImode
too, so I modified its comment to "POImode can only live in VSRs.".



>> +  /* Vector pair modes need even/odd VSX register pairs.  Only allow vector
>> +     registers.  We need to allow OImode to have the same registers as POImode,
>> +     even though we do not enable the move pattern for OImode.  */
>> +  if (mode == POImode || mode == OImode)
>> +    return (TARGET_MMA && VSX_REGNO_P (regno)
>> +	    && (regno & 1) == 0);
> 
> Put it all one one line?
> 
>> +  /* MMA accumulator modes need FPR registers divisible by 4.  We need to allow
>> +     XImode to have the same registers as PXImode, even though we do not enable
>> +     the move pattern for XImode.  */
>> +  if (mode == PXImode || mode == XImode)
>> +    return (TARGET_MMA && FP_REGNO_P (regno)
>> +	    && (regno & 3) == 0);
> 
> Likewise.

Done.




> Why are OImode and XImode handled here?
> 
>>  static bool
>>  rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>>  {

Do you mean why *aren't* they handled in rs6000_modes_tieable_p?
Probably because since we don't generate them, we didn't think we
need to handle them.  Do you want me to add them?


>> -  if (mode1 == PTImode)
>> -    return mode2 == PTImode;
>> -  if (mode2 == PTImode)
>> +  if (mode1 == PTImode || mode1 == POImode || mode1 == PXImode)
>> +    return mode1 == mode2;
>> +  if (mode2 == PTImode || mode2 == POImode || mode2 == PXImode)
>>      return false;
> 
> You can just do
>   if (mode1 == PTImode || mode1 == POImode || mode1 == PXImode
>       || mode2 == PTImode || mode2 == POImode || mode2 == PXImode)
>     return mode1 == mode2;

Ok, changed.  Let me know if you want me to also add OImode and XImode
there too.


>> @@ -2206,6 +2227,8 @@ rs6000_debug_reg_global (void)
>>      SDmode,
>>      DDmode,
>>      TDmode,
>> +    V2SImode,
>> +    V2SFmode,
> 
> Did the changelog mention these?  If it is a bugfix, could it need a
> backport?  Do it a separate patch then?
> 
> Well, it is debug info only, so not really interesting, but heh.
> 
>> @@ -2220,9 +2243,14 @@ rs6000_debug_reg_global (void)
>>      V2DFmode,
>>      V8SFmode,
>>      V4DFmode,
>> +    OImode,
>> +    XImode,
>> +    POImode,
>> +    PXImode,
>>      CCmode,
>>      CCUNSmode,
>>      CCEQmode,
>> +    CCFPmode,
>>    };
> 
> Same for the CCFP one here.

Mike added those.  I guess I thought they were needed.  Mike?
If they're not needed for MMA, I'll remove them from this patch
and they can be submitted in a separate patch if they are needed for
something else.



>> +  /* Add support for vector pairs and vector quad registers.  */
>> +  if (TARGET_MMA)
>> +    {
>> +      for (m = 0; m < NUM_MACHINE_MODES; ++m)
>> +	if (m == POImode || m == PXImode)
>> +	  {
>> +	    rs6000_vector_unit[m] = VECTOR_NONE;
>> +	    rs6000_vector_mem[m] = VECTOR_VSX;
>> +	    rs6000_vector_align[m] = (m == POImode) ? 256 : 512;
>> +	  }
>> +    }
> 
> This is just
> 
>   /* Add support for vector pairs and vector quad registers.  */
>   if (TARGET_MMA)
>     {
>       rs6000_vector_unit[POImode] = VECTOR_NONE;
>       rs6000_vector_mem[POImode] = VECTOR_VSX;
>       rs6000_vector_align[POImode] = 256;
> 
>       rs6000_vector_unit[PXImode] = VECTOR_NONE;
>       rs6000_vector_mem[PXImode] = VECTOR_VSX;
>       rs6000_vector_align[PXImode] = 512;
>     }
> 
> which is just not longer (even although is has that whiteline :-) )

Heh, yes duh!  Changed.



>> +      reg_mode = POImode;;
>> +      nregs /= hard_regno_nregs (reg, reg_mode);
>> +    }
> 
> (doubled semicolon)

Fixed.


> So nregs is always 2?  Maybe it is better to just assert that here then?

If in a VSR, yes.  I think maybe we thought early on that if they were somehow
in a GPR then they'd take more regs, but I think maybe we've guaranteed that
can't happen???  I can set it to 2 and add an assert and see if that exposes
anything.
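
For what it's worth, the arithmetic can be sketched in plain C.  The sizes are
assumptions taken from the alignments in the patch (256 bits for a pair, 512
for an accumulator) plus a 16-byte VSX register, and regs_needed and
quad_move_pieces are hypothetical names, not backend functions.

```c
/* Assumed sizes in bytes, for illustration only.  */
enum { VSX_REG_BYTES = 16, PAIR_BYTES = 32, QUAD_BYTES = 64 };

/* Number of registers of REG_BYTES each needed to hold MODE_BYTES.  */
static unsigned
regs_needed (unsigned mode_bytes, unsigned reg_bytes)
{
  return (mode_bytes + reg_bytes - 1) / reg_bytes;
}

/* Number of pair-sized pieces an accumulator load or store splits
   into, mirroring the "nregs /= ..." step quoted above.  */
static unsigned
quad_move_pieces (unsigned reg_bytes)
{
  unsigned nregs = regs_needed (QUAD_BYTES, reg_bytes);
  return nregs / regs_needed (PAIR_BYTES, reg_bytes);
}
```

In this model the quotient is 2 for 16-byte registers (4 / 2), and it would
also be 2 for a hypothetical 8-byte register (8 / 4), since both counts scale
together.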


>> @@ -19249,6 +19416,9 @@ rs6000_mangle_type (const_tree type)
>>    if (SCALAR_FLOAT_TYPE_P (type) && FLOAT128_IEEE_P (TYPE_MODE (type)))
>>      return ieee128_mangling_gcc_8_1 ? "U10__float128" : "u9__ieee128";
>>  
>> +  if (type == vector_pair_type_node) return "u13__vector_pair";
>> +  if (type == vector_quad_type_node) return "u13__vector_quad";
> 
> Line breaks?

Ok, added line breaks.  I was kind of just following some of the code
before it.  Of course other code there does it the other way! :-) 



>>  /* No data type wants to be aligned rounder than this.  */
>> -#define BIGGEST_ALIGNMENT 128
>> +#define BIGGEST_ALIGNMENT ((TARGET_MMA) ? 512 : 128)
> 
> No silly parens around TARGET_MMA please (macros should protect
> themselves, sure, but not try to protect other macros).

Fixed.



> Okay for trunk modulo the above.  Thanks!  This was much less painful
> than I feared.  Well, maybe it is all in the other patches, I'll get to
> those tomorrow ;-)

Thanks!

Mike, can you answer the 2 questions for you above?

Peter




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3, v2] rs6000: Add testsuite test cases for MMA built-ins.
  2020-06-18 20:46 ` [PATCH 3/3, v2] rs6000: Add testsuite test cases for MMA built-ins Peter Bergner
@ 2020-06-19 16:53   ` Segher Boessenkool
  2020-06-21  5:50     ` Peter Bergner
  0 siblings, 1 reply; 19+ messages in thread
From: Segher Boessenkool @ 2020-06-19 16:53 UTC (permalink / raw)
  To: Peter Bergner
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

Hi!

On Thu, Jun 18, 2020 at 03:46:31PM -0500, Peter Bergner wrote:
> +/* { dg-final { scan-assembler-times {\mlxv\M} 40 } } */
> +/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
> +/* { dg-final { scan-assembler-times {\mstxvp\M} 40 } } */
> +/* { dg-final { scan-assembler-times {\mxxmfacc\M} 20 } } */
> +/* { dg-final { scan-assembler-times {\mxxmtacc\M} 6 } } */
> +/* { dg-final { scan-assembler-times {\mxvbf16ger2\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvbf16ger2nn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvbf16ger2np\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvbf16ger2pn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvbf16ger2pp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvf16ger2\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvf16ger2nn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvf16ger2np\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvf16ger2pn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvf16ger2pp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvf32ger\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvf32gernn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvf32gernp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvf32gerpn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvf32gerpp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvi16ger2\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvi16ger2pp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvi16ger2s\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvi16ger2spp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvi4ger8\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvi4ger8pp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvi8ger4\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvi8ger4pp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvi8ger4spp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvbf16ger2\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvbf16ger2nn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvbf16ger2np\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvbf16ger2pn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvbf16ger2pp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvf16ger2\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvf16ger2nn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvf16ger2np\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvf16ger2pn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvf16ger2pp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvf32ger\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvf32gernn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvf32gernp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvf32gerpn\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvf32gerpp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvi16ger2\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvi16ger2pp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvi16ger2s\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvi16ger2spp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvi4ger8\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvi4ger8pp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvi8ger4\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvi8ger4pp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mpmxvi8ger4spp\M} 1 } } */

Nowhere does it say how many of which insns are expected in which of the
twenty-odd functions, so this can become a maintenance nightmare if
anything ever changes.  And it will be *your* nightmare anyway ;-)

Okay for trunk.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3, v2] rs6000: Add MMA built-in function definitions
  2020-06-19 16:45   ` Segher Boessenkool
@ 2020-06-19 17:06     ` Peter Bergner
  2020-06-21  5:49       ` Peter Bergner
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Bergner @ 2020-06-19 17:06 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

On 6/19/20 11:45 AM, Segher Boessenkool wrote:
> On Thu, Jun 18, 2020 at 03:45:17PM -0500, Peter Bergner wrote:
>> +;; Return 1 if this operand is valid for a MMA assemble accumulator insn.
>> +(define_special_predicate "mma_input_operand"
>> +  (match_test "(mode == PXImode
>> +		&& (GET_MODE (op) == V16QImode)
>> +		&& (vsx_register_operand (op, GET_MODE (op)) || MEM_P (op)))"))
> 
> Maybe the name could be better, then?  "mma_assemble_input_operand"?

Yes, that's a better name.  Changed.


>> +  else if ((fnmask & RS6000_BTM_FUTURE) != 0)
>> +    error ("%qs requires the %qs option", name, "-mcpu=future");
> 
> In the future, please send such pieces in a separate patch?

Ok.  I'm actually surprised this wasn't added as part of the
initial -mcpu=future patch that went in a while ago.



>> @@ -9944,7 +9944,8 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
>>  
>>      case E_POImode:
>>      case E_PXImode:
>> -      if (CONSTANT_P (operands[1]))
>> +      if (CONSTANT_P (operands[1])
>> +	  && INTVAL (operands[1]) != 0)
>>  	error ("%qs is an opaque type, and you can't set it to other values.",
>>  	       (mode == POImode) ? "__vector_pair" : "__vector_quad");
> 
> Put that condition on just one line please?  CONSTANT_P might not be
> good enough if you want to use INTVAL btw, CONST_INT_P is clearer and/or
> more correct.

Ok, I'll make those changes.
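
To illustrate the point about CONSTANT_P versus CONST_INT_P, here is a minimal, self-contained sketch.  It does not use GCC's real rtl.h; every `toy_` name is hypothetical and only models the idea that "is a constant" covers more than "is an integer constant", so an integer value should only be read after the narrower check.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for a few RTL codes.  These are NOT the real GCC
   definitions; they exist only for this illustration.  */
enum toy_code { TOY_CONST_INT, TOY_SYMBOL_REF, TOY_REG };

struct toy_rtx
{
  enum toy_code code;
  long value;		/* Only meaningful for TOY_CONST_INT.  */
};

/* Modeled on CONSTANT_P: true for any kind of constant.  */
static bool
toy_constant_p (const struct toy_rtx *x)
{
  return x->code == TOY_CONST_INT || x->code == TOY_SYMBOL_REF;
}

/* Modeled on CONST_INT_P: true only for integer constants.  */
static bool
toy_const_int_p (const struct toy_rtx *x)
{
  return x->code == TOY_CONST_INT;
}

/* Modeled on INTVAL with checking enabled: reading the integer value
   of something that is not an integer constant is a bug.  */
static long
toy_intval (const struct toy_rtx *x)
{
  assert (toy_const_int_p (x));
  return x->value;
}

/* The guarded form: the value is read only after the narrow check.  */
static bool
toy_nonzero_const_int_p (const struct toy_rtx *x)
{
  return toy_const_int_p (x) && toy_intval (x) != 0;
}
```

In this model a symbol reference satisfies toy_constant_p but would trip the assertion in toy_intval, which is why the guard uses the narrower predicate.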



>> +(define_insn_and_split "*mma_assemble_acc"
>> +  [(set (match_operand:PXI 0 "fpr_reg_operand" "=d")
>> +	(unspec:PXI [(match_operand:PXI 1 "mma_input_operand" "mwa")
>> +		     (match_operand:PXI 2 "mma_input_operand" "mwa")
>> +		     (match_operand:PXI 3 "mma_input_operand" "mwa")
>> +		     (match_operand:PXI 4 "mma_input_operand" "mwa")]
>> +		     UNSPEC_MMA_ASSEMBLE_ACC))]
> 
> I would expect all those four last match_operand to be :V16QI, so why
> does it use this strange mode?

Must be a cut/paste error and probably why we saw mode == PXImode
in the mma_input_operand predicate.  I'll change that and the
predicate and retest.  Thanks for pointing that out!


> Anyway, okay for trunk.  Thanks!  Thanks to all who worked on this, it
> was a painful trip getting to here.

Thanks for the review!

Peter




* Re: [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-19 16:47     ` Peter Bergner
@ 2020-06-19 18:12       ` Segher Boessenkool
  2020-06-19 19:33         ` Peter Bergner
  2020-06-21  5:45       ` Peter Bergner
  1 sibling, 1 reply; 19+ messages in thread
From: Segher Boessenkool @ 2020-06-19 18:12 UTC (permalink / raw)
  To: Peter Bergner
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

Hi!

On Fri, Jun 19, 2020 at 11:47:35AM -0500, Peter Bergner wrote:
> >> 	(define_mode_iterator RELOAD): Add POI and PXI.
> > 
> > Why POI and PXI, but not OI and XI?
> 
> We don't have an enabled XI or OI move pattern, so I don't think
> we'll ever see those modes at all in rtl.

Yeah good point.  And the OI/XI move expanders are probably enough to
diagnose that if ever it does go wrong.

> >> +;; Define a disabled OImode move pattern, so we can use POImode.
> >> +(define_expand "movoi"
> >> +  [(set (match_operand:OI 0 "nonimmediate_operand")
> >> +	(match_operand:OI 1 "input_operand"))]
> >> +  "0"
> >> +{
> >> +  gcc_unreachable ();
> >> +})
> > 
> > So dirty, I love it :-)
> 
> Heh, credit to Mike on this one.

Thanks Mike :-)

> > Why are OImode and XImode handled here?
> > 
> >>  static bool
> >>  rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> >>  {
> 
> Do you mean why *aren't* they handled in rs6000_modes_tieable_p?

No, this is a comment about the stuff above my comment, so

> +  /* MMA accumulator modes need FPR registers divisible by 4.  We need to allow
> +     XImode to have the same registers as PXImode, even though we do not enable
> +     the move pattern for XImode.  */
> +  if (mode == PXImode || mode == XImode)
> +    return (TARGET_MMA && FP_REGNO_P (regno)
> +	    && (regno & 3) == 0);

and the one with

> +  if (mode == POImode || mode == OImode)

before it.

> Ok, changed.  Let me know if you want me to also add OImode and XImode
> there too.

No, not handling those anywhere is fine, but let's be consistent then :-)

> > Well, it is debug info only, so not really interesting, but heh.
> > 
> >> @@ -2220,9 +2243,14 @@ rs6000_debug_reg_global (void)
> >>      V2DFmode,
> >>      V8SFmode,
> >>      V4DFmode,
> >> +    OImode,
> >> +    XImode,
> >> +    POImode,
> >> +    PXImode,
> >>      CCmode,
> >>      CCUNSmode,
> >>      CCEQmode,
> >> +    CCFPmode,
> >>    };
> > 
> > Same for the CCFP one here.
> 
> Mike added those.  I guess I thought they were needed.  Mike?
> If they're not needed for MMA, I'll remove them from this patch
> and they can be submitted in a separate patch if they are needed for
> something else.

You can keep them, it's compiler debug only, but the changelog should
mention it (it looks like an accident now, which maybe it was ;-) )


Segher


* Re: [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-19 18:12       ` Segher Boessenkool
@ 2020-06-19 19:33         ` Peter Bergner
  2020-06-19 19:43           ` Peter Bergner
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Bergner @ 2020-06-19 19:33 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

On 6/19/20 1:12 PM, Segher Boessenkool wrote:
> On Fri, Jun 19, 2020 at 11:47:35AM -0500, Peter Bergner wrote:
>>> Why are OImode and XImode handled here?
>>>
>>>>  static bool
>>>>  rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>>>>  {
>>
>> Do you mean why *aren't* they handled in rs6000_modes_tieable_p?
> 
> No, this is a comment about the stuff above my comment, so
> 
>> +  /* MMA accumulator modes need FPR registers divisible by 4.  We need to allow
>> +     XImode to have the same registers as PXImode, even though we do not enable
>> +     the move pattern for XImode.  */
>> +  if (mode == PXImode || mode == XImode)
>> +    return (TARGET_MMA && FP_REGNO_P (regno)
>> +	    && (regno & 3) == 0);
> 
> and the one with
> 
>> +  if (mode == POImode || mode == OImode)
> 
> before it.

Ah, ok.  Yeah, I think that was an oversight and we shouldn't need those.
I'll remove them.



>>> Same for the CCFP one here.
>>
>> Mike added those.  I guess I thought they were needed.  Mike?
>> If they're not needed for MMA, I'll remove them from this patch
>> and they can be submitted in a separate patch if they are needed for
>> something else.
> 
> You can keep them, it's compiler debug only, but the changelog should
> mention it (it looks like an accident now, which maybe it was ;-) )

Ok, I'll add a changelog entry for them then...unless Mike comes back
before my testing is done and says we don't need them at all.

Peter





* Re: [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-19 19:33         ` Peter Bergner
@ 2020-06-19 19:43           ` Peter Bergner
  2020-06-19 22:38             ` Segher Boessenkool
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Bergner @ 2020-06-19 19:43 UTC (permalink / raw)
  To: Michael Meissner
  Cc: Segher Boessenkool, GCC Patches, Bill Schmidt, David Edelsohn,
	Will Schmidt

On 6/19/20 2:33 PM, Peter Bergner wrote:
> On 6/19/20 1:12 PM, Segher Boessenkool wrote:
>> On Fri, Jun 19, 2020 at 11:47:35AM -0500, Peter Bergner wrote:
>>>> Why are OImode and XImode handled here?
>>>>
>>>>>  static bool
>>>>>  rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>>>>>  {
>>>
>>> Do you mean why *aren't* they handled in rs6000_modes_tieable_p?
>>
>> No, this is a comment about the stuff above my comment, so
>>
>>> +  /* MMA accumulator modes need FPR registers divisible by 4.  We need to allow
>>> +     XImode to have the same registers as PXImode, even though we do not enable
>>> +     the move pattern for XImode.  */
>>> +  if (mode == PXImode || mode == XImode)
>>> +    return (TARGET_MMA && FP_REGNO_P (regno) && (regno & 3) == 0);

Heh, now I'm not so sure after reading the comment before the test. :-)
Mike added this code.

Mike, it looks like you explicitly added XImode here, even though we
will never generate XImode uses.  Is there some code somewhere that
requires an integer mode and it's associated partial integer mode to
have the same registers and if they don't, something won't work right?
Meaning, is there some reason I shouldn't remove the XImode use here?

Ditto for OImode and POImode.

Peter




* Re: [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-19 19:43           ` Peter Bergner
@ 2020-06-19 22:38             ` Segher Boessenkool
  0 siblings, 0 replies; 19+ messages in thread
From: Segher Boessenkool @ 2020-06-19 22:38 UTC (permalink / raw)
  To: Peter Bergner
  Cc: Michael Meissner, GCC Patches, Bill Schmidt, David Edelsohn,
	Will Schmidt

On Fri, Jun 19, 2020 at 02:43:36PM -0500, Peter Bergner wrote:
> Heh, now I'm not so sure after reading the comment before the test. :-)
> Mike added this code.
> 
> Mike, it looks like you explicitly added XImode here, even though we
> will never generate XImode uses.  Is there some code somewhere that
> requires an integer mode and its associated partial integer mode to
> have the same registers and if they don't, something won't work right?
> Meaning, is there some reason I shouldn't remove the XImode use here?
> 
> Ditto for OImode and POImode.

If you need to keep it, please add a comment explaining why?


Segher


* Re: [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-19 16:47     ` Peter Bergner
  2020-06-19 18:12       ` Segher Boessenkool
@ 2020-06-21  5:45       ` Peter Bergner
  1 sibling, 0 replies; 19+ messages in thread
From: Peter Bergner @ 2020-06-21  5:45 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

On 6/19/20 11:47 AM, Peter Bergner wrote:
>>> +;; Special pattern to prevent DSE from generating an internal error if it
>>> +;; notices a structure copy that it wants to eliminate.  This generates pretty
>>> +;; bad code, but at least it doesn't die.
>>> +(define_insn_and_split "truncpoidi2"
>>
>> Could you say *why*/*how* it prevents the ICE here?
> 
> This was added by Mike.  I didn't debug the issue.  Mike, do you have
> some verbiage we could add here?

So this pattern was added earlier in our implementation of these built-ins.
For kicks, I removed the pattern to recreate the ICE we saw before so I
could describe the ICE like you wanted.  However, the ICE we were seeing
in convert_mode_scalar() is gone.  In fact, we don't call convert_mode_scalar
anymore for the test case that used to ICE, so our implementation since
then seems to have obviated the need for the pattern at all, so I have
removed it!  




>>> +  /* MMA accumulator modes need FPR registers divisible by 4.  We need to allow
>>> +     XImode to have the same registers as PXImode, even though we do not enable
>>> +     the move pattern for XImode.  */
>>> +  if (mode == PXImode || mode == XImode)
>>> +    return (TARGET_MMA && FP_REGNO_P (regno)
>>> +	    && (regno & 3) == 0);
[snip]
>> Why are OImode and XImode handled here?

So I tried removing OImode and XImode here and things broke badly, so
we do need them.  Without them, we end up generating uses of OImode in
our rtl, even with the OImode move pattern being disabled.  With OImode
added, we get no OImode uses at all (what we want).  Mike said this has
to do with implicit assumptions within GCC going all of the way back
to the beginning of GCC development.  Yeah, kind of vague.  His existing
comment does say we need them here though.
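
For reference, the accumulator register test quoted above can be sketched as standalone C.  This is only a model: the `toy_` names are hypothetical, and the FPR range 32..63 is an assumption made for the sketch rather than something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed FPR register numbering, for this sketch only.  */
#define TOY_FIRST_FPR 32
#define TOY_LAST_FPR  63

/* Toy stand-ins for the machine modes discussed in the thread.  */
enum toy_mode { TOY_XImode, TOY_PXImode, TOY_OTHERmode };

/* Modeled on FP_REGNO_P: is REGNO one of the floating-point registers?  */
static bool
toy_fp_regno_p (int regno)
{
  return regno >= TOY_FIRST_FPR && regno <= TOY_LAST_FPR;
}

/* XImode is treated exactly like PXImode, mirroring the quoted code.  */
static bool
toy_acc_mode_p (enum toy_mode mode)
{
  return mode == TOY_PXImode || mode == TOY_XImode;
}

/* An accumulator may only start at an FPR whose number is divisible
   by 4, and only when MMA is enabled.  */
static bool
toy_acc_regno_ok (int regno, bool target_mma)
{
  assert (regno >= 0);
  return target_mma && toy_fp_regno_p (regno) && (regno & 3) == 0;
}
```

Under the assumed numbering, this accepts registers 32, 36, ..., 60 (eight possible accumulators) and rejects everything else, including all registers when MMA is disabled.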

I made the rest of your suggested changes and pushed the patch.  Thanks!

Peter


* Re: [PATCH 2/3, v2] rs6000: Add MMA built-in function definitions
  2020-06-19 17:06     ` Peter Bergner
@ 2020-06-21  5:49       ` Peter Bergner
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Bergner @ 2020-06-21  5:49 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

On 6/19/20 12:06 PM, Peter Bergner wrote:
> On 6/19/20 11:45 AM, Segher Boessenkool wrote:
>>> +(define_insn_and_split "*mma_assemble_acc"
>>> +  [(set (match_operand:PXI 0 "fpr_reg_operand" "=d")
>>> +	(unspec:PXI [(match_operand:PXI 1 "mma_input_operand" "mwa")
>>> +		     (match_operand:PXI 2 "mma_input_operand" "mwa")
>>> +		     (match_operand:PXI 3 "mma_input_operand" "mwa")
>>> +		     (match_operand:PXI 4 "mma_input_operand" "mwa")]
>>> +		     UNSPEC_MMA_ASSEMBLE_ACC))]
>>
>> I would expect all those four last match_operand to be :V16QI, so why
>> does it use this strange mode?
> 
> Must be a cut/paste error and probably why we saw mode == PXImode
> in the mma_input_operand predicate.  I'll change that and the
> predicate and retest.  Thanks for pointing that out!

Yes, cut/paste error.  I changed those to all V16QI and modified the
mma_assemble_input_operand predicate to match (and made your other
suggested changes) and retesting came back clean, so I pushed the
patch with those changes.

Thanks for your reviews!!!

Peter




* Re: [PATCH 3/3, v2] rs6000: Add testsuite test cases for MMA built-ins.
  2020-06-19 16:53   ` Segher Boessenkool
@ 2020-06-21  5:50     ` Peter Bergner
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Bergner @ 2020-06-21  5:50 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

On 6/19/20 11:53 AM, Segher Boessenkool wrote:
> 
> Okay for trunk.  Thanks!

I committed this along with patch2, so it was pushed upstream with it.

Thanks!

Peter




* Re: [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions.
  2020-06-18 20:42 [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
                   ` (2 preceding siblings ...)
  2020-06-18 20:46 ` [PATCH 3/3, v2] rs6000: Add testsuite test cases for MMA built-ins Peter Bergner
@ 2020-06-24 19:28 ` Peter Bergner
  2020-06-24 19:37   ` Segher Boessenkool
  3 siblings, 1 reply; 19+ messages in thread
From: Peter Bergner @ 2020-06-24 19:28 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
	Will Schmidt

On 6/18/20 3:42 PM, Peter Bergner wrote:
> POWER ISA 3.1 added new Matrix-Multiply Assist (MMA) instructions.
> The following patch set adds support for generating these instructions
> through built-in functions which are enabled with the -mmma option.
> 
> The patch1 and patch1+patch2+patch3 have been bootstrapped and regtested on
> powerpc64le-linux with no regressions.  In addition, patch1+patch2+patch3
> has been bootstrapped and regtested on powerpc64-linux (BE), also without
> regressions.  I'll note that I split the testsuite changes into their own
> patch for review purposes, but I plan on committing patch2 and patch3 together.
> 
> Changes since v1:
>   Patch 1/3:
>     - Modified verbiage in mma.md per Will's suggestion.
>     - Modified rs6000_split_multireg_move to correctly handle BE PXImode
>       and POImode moves.
>   Patch 2/3:
>     - Updated ChangeLog entry per Segher's suggestion.
>     - Updated doc/extend.texi with correct built-in names for
>       __builtin_vsx_xvcvspbf16 and __builtin_vsx_xvcvbf16sp.
>   Patch 3/3:
>     - No changes.

The committed patches don't seem to have caused any bootstrap issues on trunk
and are pretty independent of the rest of the rs6000 backend, so I'd like
permission to back port the two commits to GCC 10.  Patch 2 makes use of
the u8bit_cint_operand predicate which Kelvin added, but that isn't in GCC 10,
so I need to back port that change too.

The back ports of the MMA patches/commits were straightforward and I'm
currently bootstrapping/regtesting the backports on both powerpc64le-linux
and powerpc64-linux.  Is this (including the hunk below) ok for GCC 10
release branch assuming the tests come back clean?

Peter


    rs6000: Backport u8bit_cint_operand predicate.
    
    2020-05-11  Kelvin Nilsen  <wschmidt@linux.ibm.com>
            Backported from master
            * config/rs6000/predicates.md (u8bit_cint_operand): New predicate.

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index bf04e4d431f..529c2beb773 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -234,6 +234,11 @@ (define_predicate "u7bit_cint_operand"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 0, 127)")))
 
+;; Return 1 if op is an unsigned 8-bit constant integer.
+(define_predicate "u8bit_cint_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 255)")))
+
 ;; Return 1 if op is a signed 8-bit constant integer.
 ;; Integer multiplication complete more quickly
 (define_predicate "s8bit_cint_operand"
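
As a rough illustration of what the new predicate accepts, the range test can be modeled in plain C.  This is a sketch, not the machine-description code: the `toy_` names are hypothetical, and toy_in_range is a simplified stand-in for GCC's IN_RANGE macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for IN_RANGE: LOWER <= VALUE <= UPPER.  */
static bool
toy_in_range (long value, long lower, long upper)
{
  assert (lower <= upper);
  return value >= lower && value <= upper;
}

/* Models the existing u7bit_cint_operand range test (0..127).  */
static bool
toy_u7bit_cint_p (long value)
{
  return toy_in_range (value, 0, 127);
}

/* Models the backported u8bit_cint_operand range test (0..255).  */
static bool
toy_u8bit_cint_p (long value)
{
  return toy_in_range (value, 0, 255);
}
```

For example, 200 passes the 8-bit test but not the 7-bit one, while -1 and 256 fail both.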


* Re: [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions.
  2020-06-24 19:28 ` [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
@ 2020-06-24 19:37   ` Segher Boessenkool
  2020-06-25 13:15     ` Peter Bergner
  0 siblings, 1 reply; 19+ messages in thread
From: Segher Boessenkool @ 2020-06-24 19:37 UTC (permalink / raw)
  To: Peter Bergner; +Cc: Bill Schmidt, GCC Patches, David Edelsohn, Michael Meissner

Hi!

On Wed, Jun 24, 2020 at 02:28:00PM -0500, Peter Bergner via Gcc-patches wrote:
> On 6/18/20 3:42 PM, Peter Bergner wrote:
> > POWER ISA 3.1 added new Matrix-Multiply Assist (MMA) instructions.
> > The following patch set adds support for generating these instructions
> > through built-in functions which are enabled with the -mmma option.
> > 
> > The patch1 and patch1+patch2+patch3 have been bootstrapped and regtested on
> > powerpc64le-linux with no regressions.  In addition, patch1+patch2+patch3
> > has been bootstrapped and regtested on powerpc64-linux (BE), also without
> > regressions.  I'll note that I split the testsuite changes into their own
> > patch for review purposes, but I plan on committing patch2 and patch3 together.
> > 
> > Changes since v1:
> >   Patch 1/3:
> >     - Modified verbiage in mma.md per Will's suggestion.
> >     - Modified rs6000_split_multireg_move to correctly handle BE PXImode
> >       and POImode moves.
> >   Patch 2/3:
> >     - Updated ChangeLog entry per Segher's suggestion.
> >     - Updated doc/extend.texi with correct built-in names for
> >       __builtin_vsx_xvcvspbf16 and __builtin_vsx_xvcvbf16sp.
> >   Patch 3/3:
> >     - No changes.
> 
> The committed patches don't seem to have caused any bootstrap issues on trunk
> and are pretty independent of the rest of the rs6000 backend, so I'd like
> permission to back port the two commits to GCC 10.  Patch 2 makes use of
> the u8bit_cint_operand predicate which Kelvin added, but that isn't in GCC 10,
> so I need to back port that change too.
> 
> The back ports of the MMA patches/commits were straightforward and I'm
> currently bootstrapping/regtesting the backports on both powerpc64le-linux
> and powerpc64-linux.  Is this (including the hunk below) ok for GCC 10
> release branch assuming the tests come back clean?

Yes, all are okay for 10 as well (incl. Kelvin's backport).  Thanks!

>     rs6000: Backport u8bit_cint_operand predicate.

(No dot at the end of the subject please.)


Segher


* Re: [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions.
  2020-06-24 19:37   ` Segher Boessenkool
@ 2020-06-25 13:15     ` Peter Bergner
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Bergner @ 2020-06-25 13:15 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Bill Schmidt, GCC Patches, David Edelsohn, Michael Meissner

On 6/24/20 2:37 PM, Segher Boessenkool wrote:
> On Wed, Jun 24, 2020 at 02:28:00PM -0500, Peter Bergner via Gcc-patches wrote:
>> The back ports of the MMA patches/commits were straightforward and I'm
>> currently bootstrapping/regtesting the backports on both powerpc64le-linux
>> and powerpc64-linux.  Is this (including the hunk below) ok for GCC 10
>> release branch assuming the tests come back clean?
> 
> Yes, all are okay for 10 as well (incl. Kelvin's backport).  Thanks!
> 
>>     rs6000: Backport u8bit_cint_operand predicate.
> 
> (No dot at the end of the subject please.)

Ok, testing was clean on BE and LE and I made the subject change you requested.
Back port has been pushed to GCC 10.  Thanks!

Peter




Thread overview: 19+ messages
2020-06-18 20:42 [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
2020-06-18 20:44 ` [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins Peter Bergner
2020-06-18 23:44   ` Segher Boessenkool
2020-06-19 16:47     ` Peter Bergner
2020-06-19 18:12       ` Segher Boessenkool
2020-06-19 19:33         ` Peter Bergner
2020-06-19 19:43           ` Peter Bergner
2020-06-19 22:38             ` Segher Boessenkool
2020-06-21  5:45       ` Peter Bergner
2020-06-18 20:45 ` [PATCH 2/3, v2] rs6000: Add MMA built-in function definitions Peter Bergner
2020-06-19 16:45   ` Segher Boessenkool
2020-06-19 17:06     ` Peter Bergner
2020-06-21  5:49       ` Peter Bergner
2020-06-18 20:46 ` [PATCH 3/3, v2] rs6000: Add testsuite test cases for MMA built-ins Peter Bergner
2020-06-19 16:53   ` Segher Boessenkool
2020-06-21  5:50     ` Peter Bergner
2020-06-24 19:28 ` [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
2020-06-24 19:37   ` Segher Boessenkool
2020-06-25 13:15     ` Peter Bergner
