public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/3] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions.
@ 2020-06-15 19:54 Peter Bergner
  2020-06-15 19:56 ` [PATCH 1/3] rs6000: Add base support and types for defining MMA built-ins Peter Bergner
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Peter Bergner @ 2020-06-15 19:54 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: David Edelsohn, GCC Patches, Bill Schmidt, Michael Meissner

POWER ISA 3.1 added new Matrix-Multiply Assist (MMA) instructions.
The following patch series adds support for generating these instructions
through built-in functions, which are enabled with the -mmma option.
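
As a rough sketch of the user-facing API (the type and built-in names
are the ones these patches define; the rest of the snippet is
illustrative only):

  /* Rank-2 update of a 512-bit accumulator.
     Build with -mcpu=future -mmma.  */
  void
  madd (__vector_quad *acc, vector unsigned char x, vector unsigned char y)
  {
    __builtin_mma_xvf32gerpp (acc, x, y);
  }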

Patch1 adds the base support required for defining the built-ins.
Patch2 adds the built-ins themselves and Patch3 adds testsuite test cases
to exercise the built-ins.  I'll note that I split the testsuite changes
into their own patch solely for review purposes.  I plan on committing
patch2 and patch3 together.

Patch1 alone and patch1+patch2+patch3 together have been bootstrapped and
regtested on powerpc64le-linux with no regressions.  In addition,
patch1+patch2+patch3 has been bootstrapped and regtested on
powerpc64-linux (BE), also without regressions.

Peter



* [PATCH 1/3] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-15 19:54 [PATCH 0/3] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
@ 2020-06-15 19:56 ` Peter Bergner
  2020-06-15 22:43   ` will schmidt
  2020-06-15 19:58 ` [PATCH 2/3] rs6000: Add MMA built-in function definitions Peter Bergner
  2020-06-15 19:59 ` [PATCH 3/3] rs6000: Add testsuite test cases for MMA built-ins Peter Bergner
  2 siblings, 1 reply; 12+ messages in thread
From: Peter Bergner @ 2020-06-15 19:56 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: David Edelsohn, GCC Patches, Bill Schmidt, Michael Meissner

This patch adds the new -mmma option as well as the initial MMA support,
which includes the target-specific __vector_pair and __vector_quad types,
the POImode and PXImode partial integer modes they are mapped to, and their
associated move patterns.  Support for the restrictions on the registers
these modes can be assigned to has also been added.
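
As a minimal illustration (the diagnostics are the ones added by this
patch; the surrounding code is just an example):

  __vector_pair vp;   /* 256 bits, POImode, even/odd VSX register pairs.  */
  __vector_quad vq;   /* 512 bits, PXImode, FPRs divisible by 4.  */

  void
  copy (__vector_pair *src)
  {
    vp = *src;                     /* OK: ordinary moves via lxvp/stxvp.  */
    vq = *(__vector_quad *) src;   /* Rejected by rs6000_invalid_conversion.  */
  }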

This patch passed bootstrap and regtesting with no regressions on
powerpc64le-linux.  Ok for trunk?

Peter

2020-06-15  Peter Bergner  <bergner@linux.ibm.com>
	    Michael Meissner  <meissner@linux.ibm.com>

gcc/
	* config/rs6000/mma.md: New file.
	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
	__MMA__ for mma.
	* config/rs6000/rs6000-call.c (rs6000_init_builtins): Add support
	for __vector_pair and __vector_quad types.
	* config/rs6000/rs6000-cpus.def (OTHER_FUTURE_MASKS): Add
	OPTION_MASK_MMA.
	(POWERPC_MASKS): Likewise.
	* config/rs6000/rs6000-modes.def (OI, XI): New integer modes.
	(POI, PXI): New partial integer modes.
	* config/rs6000/rs6000.c (TARGET_INVALID_CONVERSION): Define.
	(rs6000_hard_regno_nregs_internal): Use VECTOR_ALIGNMENT_P.
	(rs6000_hard_regno_mode_ok_uncached): Likewise.
	Add support for POImode being allowed in VSX registers and PXImode
	being allowed in FP registers.
	(rs6000_modes_tieable_p): Adjust comment.
	Add support for POImode and PXImode.
	(rs6000_debug_reg_global) <print_tieable_modes>: Add OImode, POImode,
	XImode and PXImode.
	(rs6000_setup_reg_addr_masks): Use VECTOR_ALIGNMENT_P.
	Set up appropriate addr_masks for vector pair and vector quad addresses.
	(rs6000_init_hard_regno_mode_ok): Add support for vector pair and
	vector quad registers.  Set up reload handlers for POImode and PXImode.
	(rs6000_builtin_mask_calculate): Add support for RS6000_BTM_MMA
	and RS6000_BTM_FUTURE.
	(rs6000_option_override_internal): Error if -mmma is specified
	without -mcpu=future.
	(rs6000_slow_unaligned_access): Use VECTOR_ALIGNMENT_P.
	(quad_address_p): Change size test to less than 16 bytes.
	(reg_offset_addressing_ok_p): Add support for ISA 3.1 vector pair
	and vector quad instructions.
	(avoiding_indexed_address_p): Likewise.
	(rs6000_emit_move): Disallow POImode and PXImode moves involving
	constants.
	(rs6000_preferred_reload_class): Prefer VSX registers for POImode
	and FP registers for PXImode.
	(rs6000_split_multireg_move): Support splitting POImode and PXImode
	move instructions.  Insert xxmtacc and xxmfacc instructions when
	setting a PXImode register and reading a PXImode register respectively.
	(rs6000_mangle_type): Adjust comment.  Add support for mangling
	__vector_pair and __vector_quad types.
	(rs6000_opt_masks): Add entry for mma.
	(rs6000_builtin_mask_names): Add RS6000_BTM_MMA and RS6000_BTM_FUTURE.
	(rs6000_function_value): Use VECTOR_ALIGNMENT_P.
	(address_to_insn_form): Likewise.
	(reg_to_non_prefixed): Likewise.
	(rs6000_invalid_conversion): New function.
	* config/rs6000/rs6000.h (MASK_MMA): Define.
	(BIGGEST_ALIGNMENT): Set to 512 if MMA support is enabled.
	(VECTOR_ALIGNMENT_P): New helper macro.
	(ALTIVEC_VECTOR_MODE): Use VECTOR_ALIGNMENT_P.
	(RS6000_BTM_MMA): Define.
	(RS6000_BTM_COMMON): Add RS6000_BTM_MMA and RS6000_BTM_FUTURE.
	(rs6000_builtin_type_index): Add RS6000_BTI_vector_pair and
	RS6000_BTI_vector_quad.
	(vector_pair_type_node): Define.
	(vector_quad_type_node): Likewise.
	* config/rs6000/rs6000.md (define_attr "isa"): Add mma.
	(define_attr "enabled"): Handle mma.
	(define_mode_iterator RELOAD): Add POI and PXI.
	Include mma.md.
	* config/rs6000/t-rs6000 (MD_INCLUDES): Add mma.md.
	* config/rs6000/rs6000.opt (-mmma): New.
	* doc/invoke.texi: Document -mmma.

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
new file mode 100644
index 00000000000..b35a15a2be1
--- /dev/null
+++ b/gcc/config/rs6000/mma.md
@@ -0,0 +1,128 @@
+;; Vector Quad, Vector Pair, and MMA patterns.
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;; Contributed by Peter Bergner <bergner@linux.ibm.com> and
+;;		  Michael Meissner <meissner@linux.ibm.com>
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Vector load/store pair operations
+;; We need to define an OImode move pattern, even though we don't enable it,
+;; because the machine-independent parts of the compiler at times use the
+;; large integer modes.
+;;
+;; If we enable movoi, the compiler will try and use it.  Unfortunately, if it
+;; is enabled, it will cause problems on little endian systems with code that
+;; uses the vector_size attribute, due to endian issues.
+(define_expand "movoi"
+  [(set (match_operand:OI 0 "nonimmediate_operand")
+	(match_operand:OI 1 "input_operand"))]
+  "0"
+{
+  gcc_unreachable ();
+})
+
+;; Vector pair support.  POImode is only defined for vector registers.
+(define_expand "movpoi"
+  [(set (match_operand:POI 0 "nonimmediate_operand")
+	(match_operand:POI 1 "input_operand"))]
+  "TARGET_MMA"
+{
+  rs6000_emit_move (operands[0], operands[1], POImode);
+  DONE;
+})
+
+(define_insn_and_split "*movpoi"
+  [(set (match_operand:POI 0 "nonimmediate_operand" "=wa,m,wa")
+	(match_operand:POI 1 "input_operand"	    "m,wa,wa"))]
+  "TARGET_MMA
+   && (gpc_reg_operand (operands[0], POImode)
+       || gpc_reg_operand (operands[1], POImode))"
+  "@
+   lxvp%X1 %x0,%1
+   stxvp%X0 %x1,%0
+   #"
+  "&& reload_completed
+   && (!MEM_P (operands[0]) && !MEM_P (operands[1]))"
+  [(const_int 0)]
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "type" "vecload,vecstore,veclogical")
+   (set_attr "length" "*,*,8")])
+
+;; Special pattern to prevent DSE from generating an internal error if it
+;; notices a structure copy that it wants to eliminate.  This generates pretty
+;; bad code, but at least it doesn't die.
+(define_insn_and_split "truncpoidi2"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+	(truncate:DI (match_operand:POI 1 "gpc_reg_operand" "wa")))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+	(vec_select:DI (match_dup 2)
+		       (parallel [(match_dup 3)])))]
+{
+  unsigned r = reg_or_subregno (operands[1]) + !BYTES_BIG_ENDIAN;
+  operands[2] = gen_rtx_REG (V2DImode, r);
+  operands[3] = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
+})
+
+\f
+;; Vector quad load/store operations
+;; We need to define an XImode move pattern, even though we don't enable it,
+;; because the machine-independent parts of the compiler at times use the
+;; large integer modes.
+;;
+;; If we enable movxi, the compiler will try and use it.  Unfortunately, if it
+;; is enabled, it will cause problems on little endian systems with code that
+;; uses the vector_size attribute, due to endian issues.
+(define_expand "movxi"
+  [(set (match_operand:XI 0 "nonimmediate_operand")
+	(match_operand:XI 1 "input_operand"))]
+  "0"
+{
+  gcc_unreachable ();
+})
+
+;; Vector quad support.  PXImode is only defined for floating point registers.
+(define_expand "movpxi"
+  [(set (match_operand:PXI 0 "nonimmediate_operand")
+	(match_operand:PXI 1 "input_operand"))]
+  "TARGET_MMA"
+{
+  rs6000_emit_move (operands[0], operands[1], PXImode);
+  DONE;
+})
+
+(define_insn_and_split "*movpxi"
+  [(set (match_operand:PXI 0 "nonimmediate_operand" "=d,m,d")
+	(match_operand:PXI 1 "input_operand" "m,d,d"))]
+  "TARGET_MMA
+   && (gpc_reg_operand (operands[0], PXImode)
+       || gpc_reg_operand (operands[1], PXImode))"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "type" "vecload,vecstore,veclogical")
+   (set_attr "length" "8,8,16")
+   (set_attr "max_prefixed_insns" "2,2,*")])
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 07ca33a89b4..47514552449 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -593,6 +593,10 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
      PROCESSOR_CELL) (e.g. -mcpu=cell).  */
   if ((bu_mask & RS6000_BTM_CELL) != 0)
     rs6000_define_or_undefine_macro (define_p, "__PPU__");
+
+  /* Tell the user if we support the MMA instructions.  */
+  if ((flags & OPTION_MASK_MMA) != 0)
+    rs6000_define_or_undefine_macro (define_p, "__MMA__");
 }
 
 void
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 817a14c9c0d..eeb20e5200d 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -12205,6 +12205,24 @@ rs6000_init_builtins (void)
   else
     ieee128_float_type_node = ibm128_float_type_node = long_double_type_node;
 
+  /* Vector paired and vector quad support.  */
+  if (TARGET_MMA)
+    {
+      tree oi_uns_type = make_unsigned_type (256);
+      vector_pair_type_node = build_distinct_type_copy (oi_uns_type);
+      SET_TYPE_MODE (vector_pair_type_node, POImode);
+      layout_type (vector_pair_type_node);
+      lang_hooks.types.register_builtin_type (vector_pair_type_node,
+					      "__vector_pair");
+
+      tree xi_uns_type = make_unsigned_type (512);
+      vector_quad_type_node = build_distinct_type_copy (xi_uns_type);
+      SET_TYPE_MODE (vector_quad_type_node, PXImode);
+      layout_type (vector_quad_type_node);
+      lang_hooks.types.register_builtin_type (vector_quad_type_node,
+					      "__vector_quad");
+    }
+
   /* Initialize the modes for builtin_function_type, mapping a machine mode to
      tree type node.  */
   builtin_mode_to_type[QImode][0] = integer_type_node;
@@ -12236,6 +12254,8 @@ rs6000_init_builtins (void)
   builtin_mode_to_type[V8HImode][1] = unsigned_V8HI_type_node;
   builtin_mode_to_type[V16QImode][0] = V16QI_type_node;
   builtin_mode_to_type[V16QImode][1] = unsigned_V16QI_type_node;
+  builtin_mode_to_type[POImode][1] = vector_pair_type_node;
+  builtin_mode_to_type[PXImode][1] = vector_quad_type_node;
 
   tdecl = add_builtin_type ("__bool char", bool_char_type_node);
   TYPE_NAME (bool_char_type_node) = tdecl;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 83362e05b10..667c7ecefb8 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -76,7 +76,8 @@
 				 | OPTION_MASK_P9_VECTOR)
 
 /* Flags that need to be turned off if -mno-future.  */
-#define OTHER_FUTURE_MASKS	(OPTION_MASK_PCREL			\
+#define OTHER_FUTURE_MASKS	(OPTION_MASK_MMA			\
+				 | OPTION_MASK_PCREL			\
 				 | OPTION_MASK_PREFIXED)
 
 /* Support for a future processor's features.  */
@@ -132,6 +133,7 @@
 				 | OPTION_MASK_HTM			\
 				 | OPTION_MASK_ISEL			\
 				 | OPTION_MASK_MFCRF			\
+				 | OPTION_MASK_MMA			\
 				 | OPTION_MASK_MODULO			\
 				 | OPTION_MASK_MULHW			\
 				 | OPTION_MASK_NO_UPDATE		\
diff --git a/gcc/config/rs6000/rs6000-modes.def b/gcc/config/rs6000/rs6000-modes.def
index 5f43cadff80..ddb218b3fba 100644
--- a/gcc/config/rs6000/rs6000-modes.def
+++ b/gcc/config/rs6000/rs6000-modes.def
@@ -82,3 +82,13 @@ VECTOR_MODE (INT, SI, 2);     /*                 V2SI  */
    for quad memory atomic operations to force getting an even/odd register
    combination.  */
 PARTIAL_INT_MODE (TI, 128, PTI);
+
+/* Define, but don't use, the larger integer modes.  We need an integer mode
+   defined that is the same size as the vector pair and vector quad modes.  */
+
+INT_MODE (OI, 32);
+INT_MODE (XI, 64);
+
+/* Modes used by __vector_pair and __vector_quad.  */
+PARTIAL_INT_MODE (OI, 256, POI);	/* __vector_pair.  */
+PARTIAL_INT_MODE (XI, 512, PXI);	/* __vector_quad.  */
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 58f5d780603..5948f63ba4c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1745,6 +1745,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P
 #define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P \
   rs6000_cannot_substitute_mem_equiv_p
+
+#undef TARGET_INVALID_CONVERSION
+#define TARGET_INVALID_CONVERSION rs6000_invalid_conversion
 \f
 
 /* Processor table.  */
@@ -1798,7 +1801,7 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
      128-bit floating point that can go in vector registers, which has VSX
      memory addressing.  */
   if (FP_REGNO_P (regno))
-    reg_size = (VECTOR_MEM_VSX_P (mode) || FLOAT128_VECTOR_P (mode)
+    reg_size = (VECTOR_MEM_VSX_P (mode) || VECTOR_ALIGNMENT_P (mode)
 		? UNITS_PER_VSX_WORD
 		: UNITS_PER_FP_WORD);
 
@@ -1821,6 +1824,20 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
   if (COMPLEX_MODE_P (mode))
     mode = GET_MODE_INNER (mode);
 
+  /* Vector pair modes need even/odd VSX register pairs.  Only allow vector
+     registers.  We need to allow OImode to have the same registers as POImode,
+     even though we do not enable the move pattern for OImode.  */
+  if (mode == POImode || mode == OImode)
+    return (TARGET_MMA && VSX_REGNO_P (regno)
+	    && (regno & 1) == 0);
+
+  /* MMA accumulator modes need FPR registers divisible by 4.  We need to allow
+     XImode to have the same registers as PXImode, even though we do not enable
+     the move pattern for XImode.  */
+  if (mode == PXImode || mode == XImode)
+    return (TARGET_MMA && FP_REGNO_P (regno)
+	    && (regno & 3) == 0);
+
   /* PTImode can only go in GPRs.  Quad word memory operations require even/odd
      register combinations, and use PTImode where we need to deal with quad
      word memory operations.  Don't allow quad words in the argument or frame
@@ -1836,7 +1853,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
      asked for it.  */
   if (TARGET_VSX && VSX_REGNO_P (regno)
       && (VECTOR_MEM_VSX_P (mode)
-	  || FLOAT128_VECTOR_P (mode)
+	  || VECTOR_ALIGNMENT_P (mode)
 	  || reg_addr[mode].scalar_in_vmx_p
 	  || mode == TImode
 	  || (TARGET_VADDUQM && mode == V1TImode)))
@@ -1846,7 +1863,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
 
       if (ALTIVEC_REGNO_P (regno))
 	{
-	  if (GET_MODE_SIZE (mode) != 16 && !reg_addr[mode].scalar_in_vmx_p)
+	  if (GET_MODE_SIZE (mode) < 16 && !reg_addr[mode].scalar_in_vmx_p)
 	    return 0;
 
 	  return ALTIVEC_REGNO_P (last_regno);
@@ -1862,7 +1879,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
      modes and DImode.  */
   if (FP_REGNO_P (regno))
     {
-      if (FLOAT128_VECTOR_P (mode))
+      if (VECTOR_ALIGNMENT_P (mode))
 	return false;
 
       if (SCALAR_FLOAT_MODE_P (mode)
@@ -1925,15 +1942,19 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
    GPR registers, and TImode can go in any GPR as well as VSX registers (PR
    57744).
 
+   Similarly, don't allow POImode (vector pair, restricted to even VSX
+   registers) or PXImode (vector quad, restricted to FPR registers divisible
+   by 4) to tie with other modes.
+
    Altivec/VSX vector tests were moved ahead of scalar float mode, so that IEEE
    128-bit floating point on VSX systems ties with other vectors.  */
 
 static bool
 rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
-  if (mode1 == PTImode)
-    return mode2 == PTImode;
-  if (mode2 == PTImode)
+  if (mode1 == PTImode || mode1 == POImode || mode1 == PXImode)
+    return mode1 == mode2;
+  if (mode2 == PTImode || mode2 == POImode || mode2 == PXImode)
     return false;
 
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
@@ -2206,6 +2227,8 @@ rs6000_debug_reg_global (void)
     SDmode,
     DDmode,
     TDmode,
+    V2SImode,
+    V2SFmode,
     V16QImode,
     V8HImode,
     V4SImode,
@@ -2220,9 +2243,14 @@ rs6000_debug_reg_global (void)
     V2DFmode,
     V8SFmode,
     V4DFmode,
+    OImode,
+    XImode,
+    POImode,
+    PXImode,
     CCmode,
     CCUNSmode,
     CCEQmode,
+    CCFPmode,
   };
 
   /* Virtual regs we are interested in.  */
@@ -2619,7 +2647,7 @@ rs6000_setup_reg_addr_masks (void)
 		  && (rc == RELOAD_REG_GPR || rc == RELOAD_REG_FPR)
 		  && msize <= 8
 		  && !VECTOR_MODE_P (m2)
-		  && !FLOAT128_VECTOR_P (m2)
+		  && !VECTOR_ALIGNMENT_P (m2)
 		  && !complex_p
 		  && (m != E_DFmode || !TARGET_VSX)
 		  && (m != E_SFmode || !TARGET_P8_VECTOR)
@@ -2675,6 +2703,22 @@ rs6000_setup_reg_addr_masks (void)
 		addr_mask |= RELOAD_REG_QUAD_OFFSET;
 	    }
 
+	  /* Vector pairs can do both indexed and offset loads if the
+	     instructions are enabled, otherwise they can only do offset loads
+	     since it will be broken into two vector moves.  Vector quads can
+	     only do offset loads.  */
+	  else if ((addr_mask != 0) && TARGET_MMA
+		   && (m2 == POImode || m2 == PXImode))
+	    {
+	      addr_mask |= RELOAD_REG_OFFSET;
+	      if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
+		{
+		  addr_mask |= RELOAD_REG_QUAD_OFFSET;
+		  if (m2 == POImode)
+		    addr_mask |= RELOAD_REG_INDEXED;
+		}
+	    }
+
 	  /* VMX registers can do (REG & -16) and ((REG+REG) & -16)
 	     addressing on 128-bit types.  */
 	  if (rc == RELOAD_REG_VMX && msize == 16
@@ -2876,6 +2920,18 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
       rs6000_vector_align[TImode] = align64;
     }
 
+  /* Add support for vector pairs and vector quad registers.  */
+  if (TARGET_MMA)
+    {
+      for (m = 0; m < NUM_MACHINE_MODES; ++m)
+	if (m == POImode || m == PXImode)
+	  {
+	    rs6000_vector_unit[m] = VECTOR_NONE;
+	    rs6000_vector_mem[m] = VECTOR_VSX;
+	    rs6000_vector_align[m] = (m == POImode) ? 256 : 512;
+	  }
+    }
+
   /* Register class constraints for the constraints that depend on compile
      switches. When the VSX code was added, different constraints were added
      based on the type (DFmode, V2DFmode, V4SFmode).  For the vector types, all
@@ -3007,6 +3063,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
 		  reg_addr[TFmode].reload_gpr_vsx = CODE_FOR_reload_gpr_from_vsxtf;
 		  reg_addr[TFmode].reload_vsx_gpr = CODE_FOR_reload_vsx_from_gprtf;
 		}
+
+	      if (TARGET_MMA)
+		{
+		  reg_addr[POImode].reload_store = CODE_FOR_reload_poi_di_store;
+		  reg_addr[POImode].reload_load = CODE_FOR_reload_poi_di_load;
+		  reg_addr[PXImode].reload_store = CODE_FOR_reload_pxi_di_store;
+		  reg_addr[PXImode].reload_load = CODE_FOR_reload_pxi_di_load;
+		}
 	    }
 	}
       else
@@ -3339,7 +3403,8 @@ rs6000_builtin_mask_calculate (void)
 	      && !TARGET_IEEEQUAD)	    ? RS6000_BTM_LDBL128   : 0)
 	  | ((TARGET_FLOAT128_TYPE)	    ? RS6000_BTM_FLOAT128  : 0)
 	  | ((TARGET_FLOAT128_HW)	    ? RS6000_BTM_FLOAT128_HW : 0)
-	  | ((TARGET_FUTURE)                ? RS6000_BTM_FUTURE    : 0));
+	  | ((TARGET_MMA)		    ? RS6000_BTM_MMA	   : 0)
+	  | ((TARGET_FUTURE)		    ? RS6000_BTM_FUTURE    : 0));
 }
 
 /* Implement TARGET_MD_ASM_ADJUST.  All asm statements are considered
@@ -4202,6 +4267,15 @@ rs6000_option_override_internal (bool global_init_p)
       rs6000_isa_flags &= ~OPTION_MASK_PCREL;
     }
 
+  /* Turn off vector pair/mma options on non-future systems.  */
+  if (!TARGET_FUTURE && TARGET_MMA)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_MMA) != 0)
+	error ("%qs requires %qs", "-mmma", "-mcpu=future");
+
+      rs6000_isa_flags &= ~OPTION_MASK_MMA;
+    }
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
@@ -7175,7 +7249,7 @@ rs6000_slow_unaligned_access (machine_mode mode, unsigned int align)
   return (STRICT_ALIGNMENT
 	  || (!TARGET_EFFICIENT_UNALIGNED_VSX
 	      && ((SCALAR_FLOAT_MODE_NOT_VECTOR_P (mode) && align < 32)
-		  || ((VECTOR_MODE_P (mode) || FLOAT128_VECTOR_P (mode))
+		  || ((VECTOR_MODE_P (mode) || VECTOR_ALIGNMENT_P (mode))
 		      && (int) align < VECTOR_ALIGN (mode)))));
 }
 
@@ -7360,7 +7434,7 @@ quad_address_p (rtx addr, machine_mode mode, bool strict)
 {
   rtx op0, op1;
 
-  if (GET_MODE_SIZE (mode) != 16)
+  if (GET_MODE_SIZE (mode) < 16)
     return false;
 
   if (legitimate_indirect_address_p (addr, strict))
@@ -7678,6 +7752,12 @@ reg_offset_addressing_ok_p (machine_mode mode)
 	return mode_supports_dq_form (mode);
       break;
 
+      /* The vector pair/quad types support offset addressing if the
+	 underlying vectors support offset addressing.  */
+    case E_POImode:
+    case E_PXImode:
+      return TARGET_MMA;
+
     case E_SDmode:
       /* If we can do direct load/stores of SDmode, restrict it to reg+reg
 	 addressing for the LFIWZX and STFIWX instructions.  */
@@ -8024,8 +8104,14 @@ legitimate_indexed_address_p (rtx x, int strict)
 bool
 avoiding_indexed_address_p (machine_mode mode)
 {
-  /* Avoid indexed addressing for modes that have non-indexed
-     load/store instruction forms.  */
+  unsigned int msize = GET_MODE_SIZE (mode);
+
+  /* Avoid indexed addressing for modes that have non-indexed load/store
+     instruction forms.  On the future system, vector pairs have an indexed
+     form, but vector quads don't.  */
+  if (msize > 16)
+    return msize != 32;
+
   return (TARGET_AVOID_XFORM && VECTOR_MEM_NONE_P (mode));
 }
 
@@ -9856,6 +9942,13 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
 	operands[1] = force_const_mem (mode, operands[1]);
       break;
 
+    case E_POImode:
+    case E_PXImode:
+      if (CONSTANT_P (operands[1]))
+	error ("%qs is an opaque type, and you can't set it to other values.",
+	       (mode == POImode) ? "__vector_pair" : "__vector_quad");
+      break;
+
     case E_SImode:
     case E_DImode:
       /* Use default pattern for address of ELF small data */
@@ -12117,8 +12210,20 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
       return NO_REGS;
     }
 
-  if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS)
-    return GENERAL_REGS;
+  /* For the vector pair and vector quad modes, prefer their natural register
+     (VSX or FPR) rather than GPR registers.  For other integer types, prefer
+     the GPR registers.  */
+  if (rclass == GEN_OR_FLOAT_REGS)
+    {
+      if (mode == POImode)
+	return VSX_REGS;
+
+      if (mode == PXImode)
+	return FLOAT_REGS;
+
+      if (GET_MODE_CLASS (mode) == MODE_INT)
+	return GENERAL_REGS;
+    }
 
   return rclass;
 }
@@ -15793,7 +15898,23 @@ rs6000_split_multireg_move (rtx dst, rtx src)
   reg = REG_P (dst) ? REGNO (dst) : REGNO (src);
   mode = GET_MODE (dst);
   nregs = hard_regno_nregs (reg, mode);
-  if (FP_REGNO_P (reg))
+  /* If we have a quad vector register for MMA, and this is a load or store,
+     see if we can use vector paired load/stores.  */
+  if (mode == PXImode && TARGET_MMA
+      && (MEM_P (dst) || MEM_P (src)))
+    {
+      reg_mode = POImode;
+      nregs /= hard_regno_nregs (reg, reg_mode);
+    }
+
+  /* If we have a vector pair/quad mode, split it into two/four separate
+     vectors.  */
+  else if (mode == POImode || mode == PXImode)
+    {
+      reg_mode = V1TImode;
+      nregs /= hard_regno_nregs (reg, reg_mode);
+    }
+  else if (FP_REGNO_P (reg))
     reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode :
 	(TARGET_HARD_FLOAT ? DFmode : SFmode);
   else if (ALTIVEC_REGNO_P (reg))
@@ -15837,6 +15958,48 @@ rs6000_split_multireg_move (rtx dst, rtx src)
       return;
     }
 
+  /* For __vector_pair and __vector_quad modes we have to load or store the
+     registers so that things are properly swapped in little endian mode.
+     This means the last register gets the first memory location.  */
+  if (!WORDS_BIG_ENDIAN && (mode == POImode || mode == PXImode))
+    {
+      if (MEM_P (dst))
+	{
+	  unsigned offset = 0;
+	  unsigned size = GET_MODE_SIZE (reg_mode);
+
+	  for (int i = nregs - 1; i >= 0; i--)
+	    {
+	      rtx dst2 = adjust_address (dst, reg_mode, offset);
+	      rtx src2 = simplify_gen_subreg (reg_mode, src, mode, i * size);
+	      offset += size;
+
+	      emit_insn (gen_rtx_SET (dst2, src2));
+	    }
+
+	  return;
+	}
+
+      if (MEM_P (src))
+	{
+	  unsigned offset = 0;
+	  unsigned size = GET_MODE_SIZE (reg_mode);
+
+	  for (int i = nregs - 1; i >= 0; i--)
+	    {
+	      rtx dst2 = simplify_gen_subreg (reg_mode, dst, mode, i * size);
+	      rtx src2 = adjust_address (src, reg_mode, offset);
+	      offset += size;
+
+	      emit_insn (gen_rtx_SET (dst2, src2));
+	    }
+
+	  return;
+	}
+
+      /* Register -> register moves can use common code.  */
+    }
+
   if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
     {
       /* Move register range backwards, if we might have destructive
@@ -19227,7 +19390,8 @@ rs6000_handle_altivec_attribute (tree *node,
 
 /* AltiVec defines five built-in scalar types that serve as vector
    elements; we must teach the compiler how to mangle them.  The 128-bit
-   floating point mangling is target-specific as well.  */
+   floating point mangling is target-specific as well.  MMA defines
+   two built-in types to be used as opaque vector types.  */
 
 static const char *
 rs6000_mangle_type (const_tree type)
@@ -19249,6 +19413,9 @@ rs6000_mangle_type (const_tree type)
   if (SCALAR_FLOAT_TYPE_P (type) && FLOAT128_IEEE_P (TYPE_MODE (type)))
     return ieee128_mangling_gcc_8_1 ? "U10__float128" : "u9__ieee128";
 
+  if (type == vector_pair_type_node) return "u13__vector_pair";
+  if (type == vector_quad_type_node) return "u13__vector_quad";
+
   /* For all other types, use the default mangling.  */
   return NULL;
 }
@@ -22506,7 +22673,7 @@ rs6000_function_value (const_tree valtype,
   /* VSX is a superset of Altivec and adds V2DImode/V2DFmode.  Since the same
      return register is used in both cases, and we won't see V2DImode/V2DFmode
      for pure altivec, combine the two cases.  */
-  else if ((TREE_CODE (valtype) == VECTOR_TYPE || FLOAT128_VECTOR_P (mode))
+  else if ((TREE_CODE (valtype) == VECTOR_TYPE || VECTOR_ALIGNMENT_P (mode))
 	   && TARGET_ALTIVEC && TARGET_ALTIVEC_ABI
 	   && ALTIVEC_OR_VSX_VECTOR_MODE (mode))
     regno = ALTIVEC_ARG_RETURN;
@@ -22922,6 +23089,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "isel",			OPTION_MASK_ISEL,		false, true  },
   { "mfcrf",			OPTION_MASK_MFCRF,		false, true  },
   { "mfpgpr",			0,				false, true  },
+  { "mma",			OPTION_MASK_MMA,		false, true  },
   { "modulo",			OPTION_MASK_MODULO,		false, true  },
   { "mulhw",			OPTION_MASK_MULHW,		false, true  },
   { "multiple",			OPTION_MASK_MULTIPLE,		false, true  },
@@ -22992,6 +23160,8 @@ static struct rs6000_opt_mask const rs6000_builtin_mask_names[] =
   { "powerpc64",	 RS6000_BTM_POWERPC64,  false, false },
   { "float128",		 RS6000_BTM_FLOAT128,   false, false },
   { "float128-hw",	 RS6000_BTM_FLOAT128_HW,false, false },
+  { "mma",		 RS6000_BTM_MMA,	false, false },
+  { "future",		 RS6000_BTM_FUTURE,	false, false },
 };
 
 /* Option variables that we want to support inside attribute((target)) and
@@ -24947,7 +25117,7 @@ address_to_insn_form (rtx addr,
 	non_prefixed_format = NON_PREFIXED_DS;
 
       else if (TARGET_VSX && size >= 16
-	       && (VECTOR_MODE_P (mode) || FLOAT128_VECTOR_P (mode)))
+	       && (VECTOR_MODE_P (mode) || VECTOR_ALIGNMENT_P (mode)))
 	non_prefixed_format = NON_PREFIXED_DQ;
 
       else
@@ -25076,7 +25246,7 @@ reg_to_non_prefixed (rtx reg, machine_mode mode)
 
       else if (TARGET_VSX && size >= 16
 	       && (VECTOR_MODE_P (mode)
-		   || FLOAT128_VECTOR_P (mode)
+		   || VECTOR_ALIGNMENT_P (mode)
 		   || mode == TImode || mode == CTImode))
 	return (TARGET_P9_VECTOR) ? NON_PREFIXED_DQ : NON_PREFIXED_X;
 
@@ -25100,7 +25270,7 @@ reg_to_non_prefixed (rtx reg, machine_mode mode)
 
       else if (TARGET_VSX && size >= 16
 	       && (VECTOR_MODE_P (mode)
-		   || FLOAT128_VECTOR_P (mode)
+		   || VECTOR_ALIGNMENT_P (mode)
 		   || mode == TImode || mode == CTImode))
 	return NON_PREFIXED_DQ;
 
@@ -26494,6 +26664,45 @@ rs6000_cannot_substitute_mem_equiv_p (rtx mem)
   return false;
 }
 
+/* Implement TARGET_INVALID_CONVERSION.  */
+
+static const char *
+rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
+{
+  if (element_mode (fromtype) != element_mode (totype))
+    {
+      /* Do not allow conversions to/from PXImode and POImode types.  */
+      if (TYPE_MODE (fromtype) == PXImode)
+	return N_("invalid conversion from type %<__vector_quad%>");
+      if (TYPE_MODE (totype) == PXImode)
+	return N_("invalid conversion to type %<__vector_quad%>");
+      if (TYPE_MODE (fromtype) == POImode)
+	return N_("invalid conversion from type %<__vector_pair%>");
+      if (TYPE_MODE (totype) == POImode)
+	return N_("invalid conversion to type %<__vector_pair%>");
+    }
+  else if (POINTER_TYPE_P (fromtype) && POINTER_TYPE_P (totype))
+    {
+      /* Do not allow conversions to/from PXImode and POImode pointer
+	 types, except to/from void pointers.  */
+      if (TYPE_MODE (TREE_TYPE (fromtype)) == PXImode
+	  && TYPE_MODE (TREE_TYPE (totype)) != VOIDmode)
+	return N_("invalid conversion from type %<* __vector_quad%>");
+      if (TYPE_MODE (TREE_TYPE (totype)) == PXImode
+	  && TYPE_MODE (TREE_TYPE (fromtype)) != VOIDmode)
+	return N_("invalid conversion to type %<* __vector_quad%>");
+      if (TYPE_MODE (TREE_TYPE (fromtype)) == POImode
+	  && TYPE_MODE (TREE_TYPE (totype)) != VOIDmode)
+	return N_("invalid conversion from type %<* __vector_pair%>");
+      if (TYPE_MODE (TREE_TYPE (totype)) == POImode
+	  && TYPE_MODE (TREE_TYPE (fromtype)) != VOIDmode)
+	return N_("invalid conversion to type %<* __vector_pair%>");
+    }
+
+  /* Conversion allowed.  */
+  return NULL;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 1209a33173e..9c103bf8f7d 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -522,6 +522,7 @@ extern int rs6000_vector_align[];
 #define MASK_HTM			OPTION_MASK_HTM
 #define MASK_ISEL			OPTION_MASK_ISEL
 #define MASK_MFCRF			OPTION_MASK_MFCRF
+#define MASK_MMA			OPTION_MASK_MMA
 #define MASK_MULHW			OPTION_MASK_MULHW
 #define MASK_MULTIPLE			OPTION_MASK_MULTIPLE
 #define MASK_NO_UPDATE			OPTION_MASK_NO_UPDATE
@@ -776,7 +777,7 @@ extern unsigned rs6000_pointer_size;
 #define FUNCTION_BOUNDARY 32
 
 /* No data type wants to be aligned rounder than this.  */
-#define BIGGEST_ALIGNMENT 128
+#define BIGGEST_ALIGNMENT ((TARGET_MMA) ? 512 : 128)
 
 /* Alignment of field after `int : 0' in a structure.  */
 #define EMPTY_FIELD_BOUNDARY 32
@@ -1035,16 +1036,17 @@ enum data_align { align_abi, align_opt, align_both };
 	 ((MODE) == V4SFmode		\
 	  || (MODE) == V2DFmode)	\
 
-/* Note KFmode and possibly TFmode (i.e. IEEE 128-bit floating point) are not
-   really a vector, but we want to treat it as a vector for moves, and
-   such.  */
+/* Modes that are not vectors, but require vector alignment.  Treat these like
+   vectors in terms of loads and stores.  */
+#define VECTOR_ALIGNMENT_P(MODE)					\
+  (FLOAT128_VECTOR_P (MODE) || (MODE) == POImode || (MODE) == PXImode)
 
 #define ALTIVEC_VECTOR_MODE(MODE)					\
   ((MODE) == V16QImode							\
    || (MODE) == V8HImode						\
    || (MODE) == V4SFmode						\
    || (MODE) == V4SImode						\
-   || FLOAT128_VECTOR_P (MODE))
+   || VECTOR_ALIGNMENT_P (MODE))
 
 #define ALTIVEC_OR_VSX_VECTOR_MODE(MODE)				\
   (ALTIVEC_VECTOR_MODE (MODE) || VSX_VECTOR_MODE (MODE)			\
@@ -2309,6 +2311,7 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_POWERPC64	MASK_POWERPC64	/* 64-bit registers.  */
 #define RS6000_BTM_FLOAT128	MASK_FLOAT128_KEYWORD /* IEEE 128-bit float.  */
 #define RS6000_BTM_FLOAT128_HW	MASK_FLOAT128_HW /* IEEE 128-bit float h/w.  */
+#define RS6000_BTM_MMA		MASK_MMA	/* ISA 3.1 MMA.  */
 #define RS6000_BTM_FUTURE	MASK_FUTURE
 
 
@@ -2331,7 +2334,9 @@ extern int frame_pointer_needed;
 				 | RS6000_BTM_LDBL128			\
 				 | RS6000_BTM_POWERPC64			\
 				 | RS6000_BTM_FLOAT128			\
-				 | RS6000_BTM_FLOAT128_HW)
+				 | RS6000_BTM_FLOAT128_HW		\
+				 | RS6000_BTM_MMA			\
+				 | RS6000_BTM_FUTURE)
 
 /* Define builtin enum index.  */
 
@@ -2443,6 +2448,8 @@ enum rs6000_builtin_type_index
   RS6000_BTI_ieee128_float,	 /* ieee 128-bit floating point */
   RS6000_BTI_ibm128_float,	 /* IBM 128-bit floating point */
   RS6000_BTI_const_str,		 /* pointer to const char * */
+  RS6000_BTI_vector_pair,	 /* unsigned 256-bit types (vector pair).  */
+  RS6000_BTI_vector_quad,	 /* unsigned 512-bit types (vector quad).  */
   RS6000_BTI_MAX
 };
 
@@ -2495,6 +2502,8 @@ enum rs6000_builtin_type_index
 #define ieee128_float_type_node		 (rs6000_builtin_types[RS6000_BTI_ieee128_float])
 #define ibm128_float_type_node		 (rs6000_builtin_types[RS6000_BTI_ibm128_float])
 #define const_str_type_node		 (rs6000_builtin_types[RS6000_BTI_const_str])
+#define vector_pair_type_node		 (rs6000_builtin_types[RS6000_BTI_vector_pair])
+#define vector_quad_type_node		 (rs6000_builtin_types[RS6000_BTI_vector_quad])
 
 extern GTY(()) tree rs6000_builtin_types[RS6000_BTI_MAX];
 extern GTY(()) tree rs6000_builtin_decls[RS6000_BUILTIN_COUNT];
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 0aa5265d199..6b462a3ecdb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -322,7 +322,7 @@ (define_attr "cpu"
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf,fut"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf,fut,mma"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -366,6 +366,10 @@ (define_attr "enabled" ""
      (and (eq_attr "isa" "fut")
 	  (match_test "TARGET_FUTURE"))
      (const_int 1)
+
+     (and (eq_attr "isa" "mma")
+	  (match_test "TARGET_MMA"))
+     (const_int 1)
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
@@ -772,7 +776,8 @@ (define_mode_attr BOOL_REGS_UNARY	[(TI	"r,0,0,wa,v")
 ;; Reload iterator for creating the function to allocate a base register to
 ;; supplement addressing modes.
 (define_mode_iterator RELOAD [V16QI V8HI V4SI V2DI V4SF V2DF V1TI
-			      SF SD SI DF DD DI TI PTI KF IF TF])
+			      SF SD SI DF DD DI TI PTI KF IF TF
+			      POI PXI])
 
 ;; Iterate over smin, smax
 (define_code_iterator fp_minmax	[smin smax])
@@ -14866,6 +14871,7 @@ (define_insn "*cmpeqb_internal"
 (include "vector.md")
 (include "vsx.md")
 (include "altivec.md")
+(include "mma.md")
 (include "dfp.md")
 (include "crypto.md")
 (include "htm.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index f95b8279270..92951483e4e 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -578,3 +578,7 @@ Generate (do not generate) prefixed memory instructions.
 mpcrel
 Target Report Mask(PCREL) Var(rs6000_isa_flags)
 Generate (do not generate) pc-relative memory addressing.
+
+mmma
+Target Report Mask(MMA) Var(rs6000_isa_flags)
+Generate (do not generate) MMA instructions.
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 170a69591dd..81d550ce236 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -83,6 +83,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
 	$(srcdir)/config/rs6000/vector.md \
 	$(srcdir)/config/rs6000/vsx.md \
 	$(srcdir)/config/rs6000/altivec.md \
+	$(srcdir)/config/rs6000/mma.md \
 	$(srcdir)/config/rs6000/crypto.md \
 	$(srcdir)/config/rs6000/htm.md \
 	$(srcdir)/config/rs6000/dfp.md
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 06a04e3d7dd..1452aabe693 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1201,7 +1201,7 @@ See RS/6000 and PowerPC Options.
 -mgnu-attribute  -mno-gnu-attribute @gol
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg} @gol
 -mstack-protector-guard-offset=@var{offset} -mprefixed -mno-prefixed @gol
--mpcrel -mno-pcrel}
+-mpcrel -mno-pcrel -mmma -mno-mmma}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -25940,7 +25940,8 @@ following options:
 -mpowerpc-gpopt  -mpowerpc-gfxopt @gol
 -mmulhw  -mdlmzb  -mmfpgpr  -mvsx @gol
 -mcrypto  -mhtm  -mpower8-fusion  -mpower8-vector @gol
--mquad-memory  -mquad-memory-atomic  -mfloat128  -mfloat128-hardware}
+-mquad-memory  -mquad-memory-atomic  -mfloat128 @gol
+-mfloat128-hardware -mprefixed -mpcrel -mmma}
 
 The particular options set for any particular CPU varies between
 compiler versions, depending on what setting seems to produce optimal
@@ -26936,6 +26937,13 @@ addressing (@option{-mprefixed}) options are enabled.
 @opindex mno-prefixed
 Generate (do not generate) addressing modes using prefixed load and
 store instructions when the option @option{-mcpu=future} is used.
+
+@item -mmma
+@itemx -mno-mma
+@opindex mmma
+@opindex mno-mma
+Generate (do not generate) the MMA instructions when the option
+@option{-mcpu=future} is used.
 @end table
 
 @node RX Options


* [PATCH 2/3] rs6000: Add MMA built-in function definitions
  2020-06-15 19:54 [PATCH 0/3] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
  2020-06-15 19:56 ` [PATCH 1/3] rs6000: Add base support and types for defining MMA built-ins Peter Bergner
@ 2020-06-15 19:58 ` Peter Bergner
  2020-06-15 22:43   ` will schmidt
  2020-06-15 19:59 ` [PATCH 3/3] rs6000: Add testsuite test cases for MMA built-ins Peter Bergner
  2 siblings, 1 reply; 12+ messages in thread
From: Peter Bergner @ 2020-06-15 19:58 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: David Edelsohn, GCC Patches, Bill Schmidt, Michael Meissner

This patch adds the actual MMA built-ins.  The MMA accumulators are INOUT
operands for most MMA instructions, but they are also very expensive to
move around.  For this reason, we have implemented a built-in API
where the accumulators are passed by reference/pointer, so that users
won't use one accumulator as input and another as output, which would
entail a lot of copies.  However, using pointers gives us poor code
generation when we expand the built-ins at normal expand time.  We
therefore expand the MMA built-ins early into gimple, converting the
pass-by-reference calls to an internal built-in that uses a pass-by-value
calling convention, where we can enforce that the input and output
accumulators are the same.  This gives us much better code generation.

The associated test cases for these built-ins are in patch3.

This patch plus patch1 passed bootstrap and regtesting with no regressions
on both powerpc64le-linux and powerpc64-linux.  Ok for trunk?

Peter

2020-06-15  Peter Bergner  <bergner@linux.ibm.com>

gcc/
	* config/rs6000/predicates.md (mma_input_operand): New predicate.
	* config/rs6000/rs6000-builtin.def (BU_MMA_1, BU_MMA_V2, BU_MMA_3,
	BU_MMA_5, BU_MMA_6, BU_VSX_1): Add support macros for defining MMA
	built-in functions.
	(ASSEMBLE_ACC, ASSEMBLE_PAIR, DISASSEMBLE_ACC, DISASSEMBLE_PAIR,
	PMXVBF16GER2, PMXVBF16GER2NN, PMXVBF16GER2NP, PMXVBF16GER2PN,
	PMXVBF16GER2PP, PMXVF16GER2, PMXVF16GER2NN, PMXVF16GER2NP,
	PMXVF16GER2PN, PMXVF16GER2PP, PMXVF32GER, PMXVF32GERNN,
	PMXVF32GERNP, PMXVF32GERPN, PMXVF32GERPP, PMXVF64GER, PMXVF64GERNN,
	PMXVF64GERNP, PMXVF64GERPN, PMXVF64GERPP, PMXVI16GER2, PMXVI16GER2PP,
	PMXVI16GER2S, PMXVI16GER2SPP, PMXVI4GER8, PMXVI4GER8PP, PMXVI8GER4,
	PMXVI8GER4PP, PMXVI8GER4SPP, XVBF16GER2, XVBF16GER2NN, XVBF16GER2NP,
	XVBF16GER2PN, XVBF16GER2PP, XVCVBF16SP, XVCVSPBF16, XVF16GER2,
	XVF16GER2NN, XVF16GER2NP, XVF16GER2PN, XVF16GER2PP, XVF32GER,
	XVF32GERNN, XVF32GERNP, XVF32GERPN, XVF32GERPP, XVF64GER, XVF64GERNN,
	XVF64GERNP, XVF64GERPN, XVF64GERPP, XVI16GER2, XVI16GER2PP, XVI16GER2S,
	XVI16GER2SPP, XVI4GER8, XVI4GER8PP, XVI8GER4, XVI8GER4PP, XVI8GER4SPP,
	XXMFACC, XXMTACC, XXSETACCZ): Add MMA built-ins.
	* config/rs6000/rs6000.c (rs6000_emit_move): Allow zero constants.
	(print_operand) <case 'A'>: New output modifier.
	(rs6000_split_multireg_move): Add support for inserting accumulator
	priming and depriming instructions.  Add support for splitting an
	assemble accumulator pattern.
	* config/rs6000/rs6000-call.c (mma_init_builtins, mma_expand_builtin,
	rs6000_gimple_fold_mma_builtin): New functions.
	(RS6000_BUILTIN_M): New macro.
	(def_builtin): Handle RS6000_BTC_QUAD and RS6000_BTC_PAIR attributes.
	(bdesc_mma): Add new MMA built-in support.
	(htm_expand_builtin): Use RS6000_BTC_OPND_MASK.
	(rs6000_invalid_builtin): Add handling of RS6000_BTM_FUTURE and
	RS6000_BTM_MMA.
	(rs6000_builtin_valid_without_lhs): Handle RS6000_BTC_VOID attribute.
	(rs6000_gimple_fold_builtin): Call rs6000_builtin_is_supported_p
	and rs6000_gimple_fold_mma_builtin.
	(rs6000_expand_builtin): Call mma_expand_builtin.
	Use RS6000_BTC_OPND_MASK.
	(rs6000_init_builtins): Adjust comment.  Call mma_init_builtins.
	(htm_init_builtins): Use RS6000_BTC_OPND_MASK.
	(builtin_function_type): Handle VSX_BUILTIN_XVCVSPBF16 and
	VSX_BUILTIN_XVCVBF16SP.
	* config/rs6000/rs6000.h (RS6000_BTC_QUINARY, RS6000_BTC_SENARY,
	RS6000_BTC_OPND_MASK, RS6000_BTC_QUAD, RS6000_BTC_PAIR,
	RS6000_BTC_QUADPAIR, RS6000_BTC_GIMPLE): New defines.
	(RS6000_BTC_PREDICATE, RS6000_BTC_ABS, RS6000_BTC_DST,
	RS6000_BTC_TYPE_MASK, RS6000_BTC_ATTR_MASK): Adjust values.
	* config/rs6000/mma.md (MAX_MMA_OPERANDS): New define_constant.
	(UNSPEC_MMA_ASSEMBLE_ACC, UNSPEC_MMA_PMXVBF16GER2,
	UNSPEC_MMA_PMXVBF16GER2NN, UNSPEC_MMA_PMXVBF16GER2NP,
	UNSPEC_MMA_PMXVBF16GER2PN, UNSPEC_MMA_PMXVBF16GER2PP,
	UNSPEC_MMA_PMXVF16GER2, UNSPEC_MMA_PMXVF16GER2NN,
	UNSPEC_MMA_PMXVF16GER2NP, UNSPEC_MMA_PMXVF16GER2PN,
	UNSPEC_MMA_PMXVF16GER2PP, UNSPEC_MMA_PMXVF32GER,
	UNSPEC_MMA_PMXVF32GERNN, UNSPEC_MMA_PMXVF32GERNP,
	UNSPEC_MMA_PMXVF32GERPN, UNSPEC_MMA_PMXVF32GERPP,
	UNSPEC_MMA_PMXVF64GER, UNSPEC_MMA_PMXVF64GERNN,
	UNSPEC_MMA_PMXVF64GERNP, UNSPEC_MMA_PMXVF64GERPN,
	UNSPEC_MMA_PMXVF64GERPP, UNSPEC_MMA_PMXVI16GER2,
	UNSPEC_MMA_PMXVI16GER2PP, UNSPEC_MMA_PMXVI16GER2S,
	UNSPEC_MMA_PMXVI16GER2SPP, UNSPEC_MMA_PMXVI4GER8,
	UNSPEC_MMA_PMXVI4GER8PP, UNSPEC_MMA_PMXVI8GER4,
	UNSPEC_MMA_PMXVI8GER4PP, UNSPEC_MMA_PMXVI8GER4SPP,
	UNSPEC_MMA_XVBF16GER2, UNSPEC_MMA_XVBF16GER2NN,
	UNSPEC_MMA_XVBF16GER2NP, UNSPEC_MMA_XVBF16GER2PN,
	UNSPEC_MMA_XVBF16GER2PP, UNSPEC_MMA_XVF16GER2, UNSPEC_MMA_XVF16GER2NN,
	UNSPEC_MMA_XVF16GER2NP, UNSPEC_MMA_XVF16GER2PN, UNSPEC_MMA_XVF16GER2PP,
	UNSPEC_MMA_XVF32GER, UNSPEC_MMA_XVF32GERNN, UNSPEC_MMA_XVF32GERNP,
	UNSPEC_MMA_XVF32GERPN, UNSPEC_MMA_XVF32GERPP, UNSPEC_MMA_XVF64GER,
	UNSPEC_MMA_XVF64GERNN, UNSPEC_MMA_XVF64GERNP, UNSPEC_MMA_XVF64GERPN,
	UNSPEC_MMA_XVF64GERPP, UNSPEC_MMA_XVI16GER2, UNSPEC_MMA_XVI16GER2PP,
	UNSPEC_MMA_XVI16GER2S, UNSPEC_MMA_XVI16GER2SPP, UNSPEC_MMA_XVI4GER8,
	UNSPEC_MMA_XVI4GER8PP, UNSPEC_MMA_XVI8GER4, UNSPEC_MMA_XVI8GER4PP,
	UNSPEC_MMA_XVI8GER4SPP, UNSPEC_MMA_XXMFACC, UNSPEC_MMA_XXMTACC): New.
	(MMA_ACC, MMA_VV, MMA_AVV, MMA_PV, MMA_APV, MMA_VVI4I4I8,
	MMA_AVVI4I4I8, MMA_VVI4I4I2, MMA_AVVI4I4I2, MMA_VVI4I4,
	MMA_AVVI4I4, MMA_PVI4I2, MMA_APVI4I2, MMA_VVI4I4I4,
	MMA_AVVI4I4I4): New define_int_iterator.
	(acc, vv, avv, pv, apv, vvi4i4i8, avvi4i4i8, vvi4i4i2,
	avvi4i4i2, vvi4i4, avvi4i4, pvi4i2, apvi4i2, vvi4i4i4,
	avvi4i4i4): New define_int_attr.
	(*movpxi): Add zero constant alternative.
	(mma_assemble_pair, mma_assemble_acc): New define_expand.
	(*mma_assemble_acc): New define_insn_and_split.
	(mma_<acc>, mma_xxsetaccz, mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
	mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
	mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
	mma_<vvi4i4i4>, mma_<avvi4i4i4>): New define_insn.
	* config/rs6000/rs6000.md ('type' attribute): Add mma type.
	* config/rs6000/vsx.md (UNSPEC_VSX_XVCVBF16SP): New.
	(UNSPEC_VSX_XVCVSPBF16): Likewise.
	(XVCVBF16): New define_int_iterator.
	(xvcvbf16): New define_int_attr.
	(vsx_<xvcvbf16>): New define_insn.
	* doc/extend.texi: Document the mma built-ins.

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index c3f460face2..4e37ce35c5d 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1119,6 +1119,12 @@ (define_predicate "splat_input_operand"
   return gpc_reg_operand (op, mode);
 })
 
+;; Return 1 if this operand is valid for an MMA assemble accumulator insn.
+(define_special_predicate "mma_input_operand"
+  (match_test "(mode == PXImode
+		&& (GET_MODE (op) == V16QImode)
+		&& (vsx_register_operand (op, GET_MODE (op)) || MEM_P (op)))"))
+
 ;; Return true if operand is an operator used in rotate-and-mask instructions.
 (define_predicate "rotate_mask_operator"
   (match_code "rotate,ashift,lshiftrt"))
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 8b1ddb00045..968c46cc36f 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -32,6 +32,7 @@
    RS6000_BUILTIN_A -- ABS builtins
    RS6000_BUILTIN_D -- DST builtins
    RS6000_BUILTIN_H -- HTM builtins
+   RS6000_BUILTIN_M -- MMA builtins
    RS6000_BUILTIN_P -- Altivec, VSX, ISA 2.07 vector predicate builtins
    RS6000_BUILTIN_X -- special builtins
 
@@ -74,6 +75,10 @@
   #error "RS6000_BUILTIN_H is not defined."
 #endif
 
+#ifndef RS6000_BUILTIN_M
+  #error "RS6000_BUILTIN_M is not defined."
+#endif
+
 #ifndef RS6000_BUILTIN_P
   #error "RS6000_BUILTIN_P is not defined."
 #endif
@@ -329,6 +334,82 @@
 		     | RS6000_BTC_SPECIAL),				\
 		    CODE_FOR_nothing)			/* ICODE */
 
+/* MMA convenience macros.  */
+
+#define BU_MMA_1(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_mma_" NAME,		/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_UNARY					\
+		     | RS6000_BTC_VOID					\
+		     | RS6000_BTC_GIMPLE),				\
+		    CODE_FOR_nothing)			/* ICODE */	\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL,	/* ENUM */	\
+		    "__builtin_mma_" NAME "_internal",	/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_UNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_MMA_V2(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_mma_" NAME,		/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_BINARY				\
+		     | RS6000_BTC_VOID					\
+		     | RS6000_BTC_GIMPLE),				\
+		    CODE_FOR_nothing)			/* ICODE */
+
+#define BU_MMA_3(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_mma_" NAME,		/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_TERNARY				\
+		     | RS6000_BTC_VOID					\
+		     | RS6000_BTC_GIMPLE),				\
+		    CODE_FOR_nothing)			/* ICODE */	\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL,	/* ENUM */	\
+		    "__builtin_mma_" NAME "_internal",	/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_TERNARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_MMA_5(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_mma_" NAME,		/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_QUINARY				\
+		     | RS6000_BTC_VOID					\
+		     | RS6000_BTC_GIMPLE),				\
+		    CODE_FOR_nothing)			/* ICODE */	\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL,	/* ENUM */	\
+		    "__builtin_mma_" NAME "_internal",	/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_QUINARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
+#define BU_MMA_6(ENUM, NAME, ATTR, ICODE)				\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM,		/* ENUM */	\
+		    "__builtin_mma_" NAME,		/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_SENARY				\
+		     | RS6000_BTC_VOID					\
+		     | RS6000_BTC_GIMPLE),				\
+		    CODE_FOR_nothing)			/* ICODE */	\
+  RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL,	/* ENUM */	\
+		    "__builtin_mma_" NAME "_internal",	/* NAME */	\
+		    RS6000_BTM_MMA,			/* MASK */	\
+		    (RS6000_BTC_ ## ATTR		/* ATTR */	\
+		     | RS6000_BTC_SENARY),				\
+		    CODE_FOR_ ## ICODE)			/* ICODE */
+
 /* ISA 2.05 (power6) convenience macros. */
 /* For functions that depend on the CMPB instruction */
 #define BU_P6_2(ENUM, NAME, ATTR, ICODE)				\
@@ -2785,3 +2866,77 @@ BU_SPECIAL_X (RS6000_BUILTIN_CPU_SUPPORTS, "__builtin_cpu_supports",
 /* Darwin CfString builtin.  */
 BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, "__builtin_cfstring", RS6000_BTM_ALWAYS,
 	      RS6000_BTC_MISC)
+
+/* FUTURE MMA builtins.  */
+BU_VSX_1 (XVCVBF16SP,	    "xvcvbf16sp",	MISC, vsx_xvcvbf16sp)
+BU_VSX_1 (XVCVSPBF16,	    "xvcvspbf16",	MISC, vsx_xvcvspbf16)
+
+BU_MMA_1 (XXMFACC,	    "xxmfacc",		QUAD, mma_xxmfacc)
+BU_MMA_1 (XXMTACC,	    "xxmtacc",		QUAD, mma_xxmtacc)
+BU_MMA_1 (XXSETACCZ,	    "xxsetaccz",	MISC, mma_xxsetaccz)
+
+BU_MMA_V2 (DISASSEMBLE_ACC, "disassemble_acc",  QUAD, nothing)
+BU_MMA_V2 (DISASSEMBLE_PAIR,"disassemble_pair", PAIR, nothing)
+
+BU_MMA_3 (ASSEMBLE_PAIR,    "assemble_pair",	MISC, mma_assemble_pair)
+BU_MMA_3 (XVBF16GER2,	    "xvbf16ger2",	MISC, mma_xvbf16ger2)
+BU_MMA_3 (XVF16GER2,	    "xvf16ger2",	MISC, mma_xvf16ger2)
+BU_MMA_3 (XVF32GER,	    "xvf32ger",		MISC, mma_xvf32ger)
+BU_MMA_3 (XVF64GER,	    "xvf64ger",		PAIR, mma_xvf64ger)
+BU_MMA_3 (XVI4GER8,	    "xvi4ger8",		MISC, mma_xvi4ger8)
+BU_MMA_3 (XVI8GER4,	    "xvi8ger4",		MISC, mma_xvi8ger4)
+BU_MMA_3 (XVI16GER2,	    "xvi16ger2",	MISC, mma_xvi16ger2)
+BU_MMA_3 (XVI16GER2S,	    "xvi16ger2s",	MISC, mma_xvi16ger2s)
+BU_MMA_3 (XVBF16GER2NN,	    "xvbf16ger2nn",     QUAD, mma_xvbf16ger2nn)
+BU_MMA_3 (XVBF16GER2NP,	    "xvbf16ger2np",     QUAD, mma_xvbf16ger2np)
+BU_MMA_3 (XVBF16GER2PN,	    "xvbf16ger2pn",     QUAD, mma_xvbf16ger2pn)
+BU_MMA_3 (XVBF16GER2PP,	    "xvbf16ger2pp",     QUAD, mma_xvbf16ger2pp)
+BU_MMA_3 (XVF16GER2NN,	    "xvf16ger2nn",      QUAD, mma_xvf16ger2nn)
+BU_MMA_3 (XVF16GER2NP,	    "xvf16ger2np",      QUAD, mma_xvf16ger2np)
+BU_MMA_3 (XVF16GER2PN,	    "xvf16ger2pn",      QUAD, mma_xvf16ger2pn)
+BU_MMA_3 (XVF16GER2PP,	    "xvf16ger2pp",      QUAD, mma_xvf16ger2pp)
+BU_MMA_3 (XVF32GERNN,	    "xvf32gernn",       QUAD, mma_xvf32gernn)
+BU_MMA_3 (XVF32GERNP,	    "xvf32gernp",       QUAD, mma_xvf32gernp)
+BU_MMA_3 (XVF32GERPN,	    "xvf32gerpn",       QUAD, mma_xvf32gerpn)
+BU_MMA_3 (XVF32GERPP,	    "xvf32gerpp",       QUAD, mma_xvf32gerpp)
+BU_MMA_3 (XVF64GERNN,	    "xvf64gernn",       QUADPAIR, mma_xvf64gernn)
+BU_MMA_3 (XVF64GERNP,	    "xvf64gernp",       QUADPAIR, mma_xvf64gernp)
+BU_MMA_3 (XVF64GERPN,	    "xvf64gerpn",       QUADPAIR, mma_xvf64gerpn)
+BU_MMA_3 (XVF64GERPP,	    "xvf64gerpp",       QUADPAIR, mma_xvf64gerpp)
+BU_MMA_3 (XVI4GER8PP,	    "xvi4ger8pp",	QUAD, mma_xvi4ger8pp)
+BU_MMA_3 (XVI8GER4PP,	    "xvi8ger4pp",       QUAD, mma_xvi8ger4pp)
+BU_MMA_3 (XVI8GER4SPP,	    "xvi8ger4spp",      QUAD, mma_xvi8ger4spp)
+BU_MMA_3 (XVI16GER2PP,	    "xvi16ger2pp",      QUAD, mma_xvi16ger2pp)
+BU_MMA_3 (XVI16GER2SPP,	    "xvi16ger2spp",     QUAD, mma_xvi16ger2spp)
+
+BU_MMA_5 (ASSEMBLE_ACC,     "assemble_acc",	MISC, mma_assemble_acc)
+BU_MMA_5 (PMXVF32GER,	    "pmxvf32ger",       MISC, mma_pmxvf32ger)
+BU_MMA_5 (PMXVF64GER,	    "pmxvf64ger",       PAIR, mma_pmxvf64ger)
+BU_MMA_5 (PMXVF32GERNN,	    "pmxvf32gernn",     QUAD, mma_pmxvf32gernn)
+BU_MMA_5 (PMXVF32GERNP,	    "pmxvf32gernp",     QUAD, mma_pmxvf32gernp)
+BU_MMA_5 (PMXVF32GERPN,	    "pmxvf32gerpn",     QUAD, mma_pmxvf32gerpn)
+BU_MMA_5 (PMXVF32GERPP,	    "pmxvf32gerpp",     QUAD, mma_pmxvf32gerpp)
+BU_MMA_5 (PMXVF64GERNN,	    "pmxvf64gernn",     QUADPAIR, mma_pmxvf64gernn)
+BU_MMA_5 (PMXVF64GERNP,	    "pmxvf64gernp",     QUADPAIR, mma_pmxvf64gernp)
+BU_MMA_5 (PMXVF64GERPN,	    "pmxvf64gerpn",     QUADPAIR, mma_pmxvf64gerpn)
+BU_MMA_5 (PMXVF64GERPP,	    "pmxvf64gerpp",     QUADPAIR, mma_pmxvf64gerpp)
+
+BU_MMA_6 (PMXVBF16GER2,	    "pmxvbf16ger2",     MISC, mma_pmxvbf16ger2)
+BU_MMA_6 (PMXVF16GER2,	    "pmxvf16ger2",      MISC, mma_pmxvf16ger2)
+BU_MMA_6 (PMXVI4GER8,	    "pmxvi4ger8",       MISC, mma_pmxvi4ger8)
+BU_MMA_6 (PMXVI8GER4,	    "pmxvi8ger4",	MISC, mma_pmxvi8ger4)
+BU_MMA_6 (PMXVI16GER2,	    "pmxvi16ger2",      MISC, mma_pmxvi16ger2)
+BU_MMA_6 (PMXVI16GER2S,	    "pmxvi16ger2s",     MISC, mma_pmxvi16ger2s)
+BU_MMA_6 (PMXVBF16GER2NN,   "pmxvbf16ger2nn",   QUAD, mma_pmxvbf16ger2nn)
+BU_MMA_6 (PMXVBF16GER2NP,   "pmxvbf16ger2np",   QUAD, mma_pmxvbf16ger2np)
+BU_MMA_6 (PMXVBF16GER2PN,   "pmxvbf16ger2pn",   QUAD, mma_pmxvbf16ger2pn)
+BU_MMA_6 (PMXVBF16GER2PP,   "pmxvbf16ger2pp",   QUAD, mma_pmxvbf16ger2pp)
+BU_MMA_6 (PMXVF16GER2NN,    "pmxvf16ger2nn",    QUAD, mma_pmxvf16ger2nn)
+BU_MMA_6 (PMXVF16GER2NP,    "pmxvf16ger2np",    QUAD, mma_pmxvf16ger2np)
+BU_MMA_6 (PMXVF16GER2PN,    "pmxvf16ger2pn",    QUAD, mma_pmxvf16ger2pn)
+BU_MMA_6 (PMXVF16GER2PP,    "pmxvf16ger2pp",    QUAD, mma_pmxvf16ger2pp)
+BU_MMA_6 (PMXVI4GER8PP,	    "pmxvi4ger8pp",     QUAD, mma_pmxvi4ger8pp)
+BU_MMA_6 (PMXVI8GER4PP,	    "pmxvi8ger4pp",	QUAD, mma_pmxvi8ger4pp)
+BU_MMA_6 (PMXVI8GER4SPP,    "pmxvi8ger4spp",	QUAD, mma_pmxvi8ger4spp)
+BU_MMA_6 (PMXVI16GER2PP,    "pmxvi16ger2pp",    QUAD, mma_pmxvi16ger2pp)
+BU_MMA_6 (PMXVI16GER2SPP,   "pmxvi16ger2spp",   QUAD, mma_pmxvi16ger2spp)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index eeb20e5200d..d47c3a3aeb0 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -183,6 +183,7 @@ static tree builtin_function_type (machine_mode, machine_mode,
 				   enum rs6000_builtins, const char *name);
 static void rs6000_common_init_builtins (void);
 static void htm_init_builtins (void);
+static void mma_init_builtins (void);
 
 
 /* Hash table to keep track of the argument types for builtin functions.  */
@@ -243,6 +244,7 @@ builtin_hasher::equal (builtin_hash_struct *p1, builtin_hash_struct *p2)
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -270,6 +272,9 @@ builtin_hasher::equal (builtin_hash_struct *p1, builtin_hash_struct *p2)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)  \
   { NAME, ICODE, MASK, ATTR },
 
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)  \
+  { NAME, ICODE, MASK, ATTR },
+
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)  \
   { NAME, ICODE, MASK, ATTR },
 
@@ -296,6 +301,7 @@ static const struct rs6000_builtin_info_type rs6000_builtin_info[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8354,6 +8360,9 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
 	  attr_string = ", fp, const";
 	}
     }
+  else if ((classify & (RS6000_BTC_QUAD | RS6000_BTC_PAIR)) != 0)
+    /* The function uses a register quad and/or pair.  Nothing to do.  */
+    ;
   else if ((classify & RS6000_BTC_ATTR_MASK) != 0)
     gcc_unreachable ();
 
@@ -8372,6 +8381,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8385,6 +8395,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8403,6 +8414,7 @@ static const struct builtin_description bdesc_3arg[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8416,6 +8428,7 @@ static const struct builtin_description bdesc_3arg[] =
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8434,6 +8447,7 @@ static const struct builtin_description bdesc_4arg[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8447,6 +8461,7 @@ static const struct builtin_description bdesc_4arg[] =
   { MASK, ICODE, NAME, ENUM },
 
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8465,6 +8480,7 @@ static const struct builtin_description bdesc_dst[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8478,6 +8494,7 @@ static const struct builtin_description bdesc_dst[] =
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8494,6 +8511,7 @@ static const struct builtin_description bdesc_2arg[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8505,6 +8523,7 @@ static const struct builtin_description bdesc_2arg[] =
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE) \
   { MASK, ICODE, NAME, ENUM },
 
@@ -8527,6 +8546,7 @@ static const struct builtin_description bdesc_altivec_preds[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8540,6 +8560,7 @@ static const struct builtin_description bdesc_altivec_preds[] =
 
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8559,6 +8580,7 @@ static const struct builtin_description bdesc_abs[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8572,6 +8594,7 @@ static const struct builtin_description bdesc_abs[] =
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8590,6 +8613,7 @@ static const struct builtin_description bdesc_1arg[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8603,6 +8627,7 @@ static const struct builtin_description bdesc_1arg[] =
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8620,6 +8645,7 @@ static const struct builtin_description bdesc_0arg[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -8633,6 +8659,7 @@ static const struct builtin_description bdesc_0arg[] =
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE) \
   { MASK, ICODE, NAME, ENUM },
 
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
 
@@ -8641,6 +8668,7 @@ static const struct builtin_description bdesc_htm[] =
 #include "rs6000-builtin.def"
 };
 
+/* MMA builtins.  */
 #undef RS6000_BUILTIN_0
 #undef RS6000_BUILTIN_1
 #undef RS6000_BUILTIN_2
@@ -8649,7 +8677,40 @@ static const struct builtin_description bdesc_htm[] =
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
+#undef RS6000_BUILTIN_X
+
+#define RS6000_BUILTIN_0(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_1(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_2(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_3(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_4(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE) \
+  { MASK, ICODE, NAME, ENUM },
+
+#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
+
+static const struct builtin_description bdesc_mma[] =
+{
+#include "rs6000-builtin.def"
+};
+
+#undef RS6000_BUILTIN_0
+#undef RS6000_BUILTIN_1
+#undef RS6000_BUILTIN_2
+#undef RS6000_BUILTIN_3
+#undef RS6000_BUILTIN_4
+#undef RS6000_BUILTIN_A
+#undef RS6000_BUILTIN_D
+#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
+#undef RS6000_BUILTIN_P
+#undef RS6000_BUILTIN_X
 
 /* Return true if a builtin function is overloaded.  */
 bool
@@ -9393,6 +9454,133 @@ altivec_expand_stv_builtin (enum insn_code icode, tree exp)
   return NULL_RTX;
 }
 
+/* Expand the MMA built-in in EXP.
+   Store true in *EXPANDEDP if we found a built-in to expand.  */
+
+static rtx
+mma_expand_builtin (tree exp, rtx target, bool *expandedp)
+{
+  unsigned i;
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  enum rs6000_builtins fcode
+    = (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+  const struct builtin_description *d = bdesc_mma;
+
+  /* Expand the MMA built-in.  */
+  for (i = 0; i < ARRAY_SIZE (bdesc_mma); i++, d++)
+    if (d->code == fcode)
+      break;
+
+  if (i >= ARRAY_SIZE (bdesc_mma))
+    {
+      *expandedp = false;
+      return NULL_RTX;
+    }
+
+  *expandedp = true;
+
+  tree arg;
+  call_expr_arg_iterator iter;
+  enum insn_code icode = d->icode;
+  const struct insn_operand_data *insn_op;
+  rtx op[MAX_MMA_OPERANDS];
+  unsigned nopnds = 0;
+  unsigned attr = rs6000_builtin_info[fcode].attr;
+  bool void_func = (attr & RS6000_BTC_VOID);
+  machine_mode tmode = VOIDmode;
+
+  if (TREE_TYPE (TREE_TYPE (fndecl)) != void_type_node)
+    {
+      tmode = insn_data[icode].operand[0].mode;
+      if (!target
+	  || GET_MODE (target) != tmode
+	  || !(*insn_data[icode].operand[0].predicate) (target, tmode))
+	target = gen_reg_rtx (tmode);
+      op[nopnds++] = target;
+    }
+  else
+    target = const0_rtx;
+
+  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+    {
+      if (arg == error_mark_node)
+	return const0_rtx;
+
+      rtx opnd;
+      insn_op = &insn_data[icode].operand[nopnds];
+      if (TREE_CODE (arg) == ADDR_EXPR
+	  && MEM_P (DECL_RTL (TREE_OPERAND (arg, 0))))
+	opnd = DECL_RTL (TREE_OPERAND (arg, 0));
+      else
+	opnd = expand_normal (arg);
+
+      if (!(*insn_op->predicate) (opnd, insn_op->mode))
+	{
+	  if (!strcmp (insn_op->constraint, "n"))
+	    {
+	      if (!CONST_INT_P (opnd))
+		error ("argument %d must be an unsigned literal", nopnds);
+	      else
+		error ("argument %d is an unsigned literal that is "
+		       "out of range", nopnds);
+	      return const0_rtx;
+	    }
+	  opnd = copy_to_mode_reg (insn_op->mode, opnd);
+	}
+
+      /* Some MMA instructions have INOUT accumulator operands, so force
+	 their target register to be the same as their input register.  */
+      if (!void_func
+	  && nopnds == 1
+	  && !strcmp (insn_op->constraint, "0")
+	  && insn_op->mode == tmode
+	  && REG_P (opnd)
+	  && (*insn_data[icode].operand[0].predicate) (opnd, tmode))
+	target = op[0] = opnd;
+
+      op[nopnds++] = opnd;
+    }
+
+  unsigned attr_args = attr & RS6000_BTC_OPND_MASK;
+  if (attr & RS6000_BTC_QUAD)
+    attr_args++;
+
+  gcc_assert (nopnds == attr_args);
+
+  rtx pat;
+  switch (nopnds)
+    {
+    case 1:
+      pat = GEN_FCN (icode) (op[0]);
+      break;
+    case 2:
+      pat = GEN_FCN (icode) (op[0], op[1]);
+      break;
+    case 3:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+      break;
+    case 4:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
+      break;
+    case 5:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
+      break;
+    case 6:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5]);
+      break;
+    case 7:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5], op[6]);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  if (!pat)
+    return NULL_RTX;
+  emit_insn (pat);
+
+  return target;
+}
+
 /* Return the appropriate SPR number associated with the given builtin.  */
 static inline HOST_WIDE_INT
 htm_spr_num (enum rs6000_builtins code)
@@ -9539,11 +9727,11 @@ htm_expand_builtin (tree exp, rtx target, bool * expandedp)
 	if (flag_checking)
 	  {
 	    int expected_nopnds = 0;
-	    if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_UNARY)
+	    if ((attr & RS6000_BTC_OPND_MASK) == RS6000_BTC_UNARY)
 	      expected_nopnds = 1;
-	    else if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_BINARY)
+	    else if ((attr & RS6000_BTC_OPND_MASK) == RS6000_BTC_BINARY)
 	      expected_nopnds = 2;
-	    else if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_TERNARY)
+	    else if ((attr & RS6000_BTC_OPND_MASK) == RS6000_BTC_TERNARY)
 	      expected_nopnds = 3;
 	    else if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_QUATERNARY)
 	      expected_nopnds = 4;
@@ -10647,6 +10835,10 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
 	   "-m64");
   else if ((fnmask & RS6000_BTM_P9_MISC) == RS6000_BTM_P9_MISC)
     error ("%qs requires the %qs option", name, "-mcpu=power9");
+  else if ((fnmask & RS6000_BTM_FUTURE) != 0)
+    error ("%qs requires the %qs option", name, "-mcpu=future");
+  else if ((fnmask & RS6000_BTM_MMA) != 0)
+    error ("%qs requires the %qs option", name, "-mmma");
   else if ((fnmask & RS6000_BTM_LDBL128) == RS6000_BTM_LDBL128)
     {
       if (!TARGET_HARD_FLOAT)
@@ -10690,6 +10882,10 @@ rs6000_fold_builtin (tree fndecl ATTRIBUTE_UNUSED,
 static bool
 rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
 {
+  /* Check for built-ins explicitly marked as a void function.  */
+  if (rs6000_builtin_info[fn_code].attr & RS6000_BTC_VOID)
+    return true;
+
   switch (fn_code)
     {
     case ALTIVEC_BUILTIN_STVX_V16QI:
@@ -10833,6 +11029,156 @@ fold_mergeeo_helper (gimple_stmt_iterator *gsi, gimple *stmt, int use_odd)
   gsi_replace (gsi, g, true);
 }
 
+/* Expand the MMA built-ins early, so that we can convert the pass-by-reference
+   __vector_quad arguments into pass-by-value arguments, leading to more
+   efficient code generation.  */
+
+bool
+rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  tree fndecl = gimple_call_fndecl (stmt);
+  enum rs6000_builtins fncode
+    = (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned attr = rs6000_builtin_info[fncode].attr;
+
+  if ((attr & RS6000_BTC_GIMPLE) == 0)
+    return false;
+
+  unsigned nopnds = (attr & RS6000_BTC_OPND_MASK);
+  gimple_seq new_seq = NULL;
+  gimple *new_call;
+  tree new_decl;
+
+  if (rs6000_builtin_info[fncode + 1].icode == CODE_FOR_nothing)
+    {
+      /* This is an MMA disassemble built-in function.  */
+      gcc_assert (fncode == MMA_BUILTIN_DISASSEMBLE_ACC
+		  || fncode == MMA_BUILTIN_DISASSEMBLE_PAIR);
+
+      push_gimplify_context (true);
+      tree dst_ptr = gimple_call_arg (stmt, 0);
+      tree src_ptr = gimple_call_arg (stmt, 1);
+      tree src_type = TREE_TYPE (src_ptr);
+      tree src = make_ssa_name (TREE_TYPE (src_type));
+      gimplify_assign (src, build_simple_mem_ref (src_ptr), &new_seq);
+
+      /* If we are not disassembling an accumulator or our destination is
+	 another accumulator, then just copy the entire thing as is.  */
+      if (fncode != MMA_BUILTIN_DISASSEMBLE_ACC
+	  || TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
+	{
+	  tree dst = build_simple_mem_ref (build1 (VIEW_CONVERT_EXPR,
+						   src_type, dst_ptr));
+	  gimplify_assign (dst, src, &new_seq);
+	  pop_gimplify_context (NULL);
+	  gsi_replace_with_seq (gsi, new_seq, true);
+	  return true;
+	}
+
+      /* We're disassembling an accumulator into a different type, so we need
+	 to emit an xxmfacc instruction now, since we cannot do it later.  */
+      new_decl = rs6000_builtin_decls[MMA_BUILTIN_XXMFACC_INTERNAL];
+      new_call = gimple_build_call (new_decl, 1, src);
+      src = make_ssa_name (vector_quad_type_node);
+      gimple_call_set_lhs (new_call, src);
+      gimple_seq_add_stmt (&new_seq, new_call);
+
+      /* Copy the accumulator vector by vector.  */
+      tree dst_type = build_pointer_type_for_mode (unsigned_V16QI_type_node,
+						   ptr_mode, true);
+      tree dst_base = build1 (VIEW_CONVERT_EXPR, dst_type, dst_ptr);
+      tree array_type = build_array_type_nelts (unsigned_V16QI_type_node, 4);
+      tree src_array = build1 (VIEW_CONVERT_EXPR, array_type, src);
+      for (unsigned i = 0; i < 4; i++)
+	{
+	  tree ref = build4 (ARRAY_REF, unsigned_V16QI_type_node, src_array,
+			     build_int_cst (size_type_node, i),
+			     NULL_TREE, NULL_TREE);
+	  tree dst = build2 (MEM_REF, unsigned_V16QI_type_node, dst_base,
+			     build_int_cst (dst_type, i * 16));
+	  gimplify_assign (dst, ref, &new_seq);
+	}
+      pop_gimplify_context (NULL);
+      gsi_replace_with_seq (gsi, new_seq, true);
+      return true;
+    }
+
+  /* Convert this built-in into an internal version that uses pass-by-value
+     arguments.  The internal built-in follows immediately after this one.  */
+  new_decl = rs6000_builtin_decls[fncode + 1];
+  tree lhs, mem, op[MAX_MMA_OPERANDS];
+  tree acc = gimple_call_arg (stmt, 0);
+  if (TREE_CODE (acc) == PARM_DECL)
+    mem = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (acc)), acc);
+  else
+    mem = build_simple_mem_ref (acc);
+  push_gimplify_context (true);
+
+  if ((attr & RS6000_BTC_QUAD) != 0)
+    {
+      /* This built-in has a pass-by-reference accumulator input, so load it
+	 into a temporary accumulator for use as a pass-by-value input.  */
+      op[0] = make_ssa_name (vector_quad_type_node);
+      for (unsigned i = 1; i < nopnds; i++)
+	op[i] = gimple_call_arg (stmt, i);
+      gimplify_assign (op[0], mem, &new_seq);
+    }
+  else
+    {
+      /* This built-in does not use its pass-by-reference accumulator argument
+	 as an input argument, so remove it from the input list.  */
+      nopnds--;
+      for (unsigned i = 0; i < nopnds; i++)
+	op[i] = gimple_call_arg (stmt, i + 1);
+    }
+
+  switch (nopnds)
+    {
+    case 0:
+      new_call = gimple_build_call (new_decl, 0);
+      break;
+    case 1:
+      new_call = gimple_build_call (new_decl, 1, op[0]);
+      break;
+    case 2:
+      new_call = gimple_build_call (new_decl, 2, op[0], op[1]);
+      break;
+    case 3:
+      new_call = gimple_build_call (new_decl, 3, op[0], op[1], op[2]);
+      break;
+    case 4:
+      new_call = gimple_build_call (new_decl, 4, op[0], op[1], op[2], op[3]);
+      break;
+    case 5:
+      new_call = gimple_build_call (new_decl, 5, op[0], op[1], op[2], op[3],
+				    op[4]);
+      break;
+    case 6:
+      new_call = gimple_build_call (new_decl, 6, op[0], op[1], op[2], op[3],
+				    op[4], op[5]);
+      break;
+    case 7:
+      new_call = gimple_build_call (new_decl, 7, op[0], op[1], op[2], op[3],
+				    op[4], op[5], op[6]);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (fncode == MMA_BUILTIN_ASSEMBLE_PAIR)
+    lhs = make_ssa_name (vector_pair_type_node);
+  else
+    lhs = make_ssa_name (vector_quad_type_node);
+  gimple_call_set_lhs (new_call, lhs);
+  gimple_seq_add_stmt (&new_seq, new_call);
+  gimplify_assign (mem, lhs, &new_seq);
+  pop_gimplify_context (NULL);
+  gsi_replace_with_seq (gsi, new_seq, true);
+
+  return true;
+}
+
 /* Fold a machine-dependent built-in in GIMPLE.  (For folding into
    a constant, use rs6000_fold_builtin.)  */
 
@@ -10868,11 +11214,12 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
     return false;
 
   /* Don't fold invalid builtins, let rs6000_expand_builtin diagnose it.  */
-  HOST_WIDE_INT mask = rs6000_builtin_info[uns_fncode].mask;
-  bool func_valid_p = (rs6000_builtin_mask & mask) == mask;
-  if (!func_valid_p)
+  if (!rs6000_builtin_is_supported_p (fn_code))
     return false;
 
+  if (rs6000_gimple_fold_mma_builtin (gsi))
+    return true;
+
   switch (fn_code)
     {
     /* Flavors of vec_add.  We deliberately don't expand
@@ -12007,6 +12354,13 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
       break;
     }
 
+  if (TARGET_MMA)
+    {
+      ret = mma_expand_builtin (exp, target, &success);
+
+      if (success)
+	return ret;
+    }
   if (TARGET_ALTIVEC)
     {
       ret = altivec_expand_builtin (exp, target, &success);
@@ -12022,7 +12376,7 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
 	return ret;
     }  
 
-  unsigned attr = rs6000_builtin_info[uns_fcode].attr & RS6000_BTC_TYPE_MASK;
+  unsigned attr = rs6000_builtin_info[uns_fcode].attr & RS6000_BTC_OPND_MASK;
   /* RS6000_BTC_SPECIAL represents no-operand operators.  */
   gcc_assert (attr == RS6000_BTC_UNARY
 	      || attr == RS6000_BTC_BINARY
@@ -12205,7 +12559,7 @@ rs6000_init_builtins (void)
   else
     ieee128_float_type_node = ibm128_float_type_node = long_double_type_node;
 
-  /* Vector paired and vector quad support.  */
+  /* Vector pair and vector quad support.  */
   if (TARGET_MMA)
     {
       tree oi_uns_type = make_unsigned_type (256);
@@ -12287,6 +12641,8 @@ rs6000_init_builtins (void)
      the target attribute.  */
   if (TARGET_EXTRA_BUILTINS)
     altivec_init_builtins ();
+  if (TARGET_MMA)
+    mma_init_builtins ();
   if (TARGET_HTM)
     htm_init_builtins ();
 
@@ -13012,6 +13368,119 @@ altivec_init_builtins (void)
 
 }
 
+static void
+mma_init_builtins (void)
+{
+  const struct builtin_description *d = bdesc_mma;
+
+  for (unsigned i = 0; i < ARRAY_SIZE (bdesc_mma); i++, d++)
+    {
+      tree op[MAX_MMA_OPERANDS], type;
+      HOST_WIDE_INT mask = d->mask;
+      unsigned icode = (unsigned) d->icode;
+      unsigned attr = rs6000_builtin_info[d->code].attr;
+      int attr_args = (attr & RS6000_BTC_OPND_MASK);
+      bool gimple_func = (attr & RS6000_BTC_GIMPLE);
+      unsigned nopnds = 0;
+
+      if ((mask & rs6000_builtin_mask) != mask)
+	{
+	  if (TARGET_DEBUG_BUILTIN)
+	    fprintf (stderr, "mma_builtin, skip mma %s\n", d->name);
+	  continue;
+	}
+
+      if (d->name == 0)
+	{
+	  if (TARGET_DEBUG_BUILTIN)
+	    fprintf (stderr, "mma_builtin, bdesc_mma[%ld] no name\n",
+		     (long unsigned) i);
+	  continue;
+	}
+
+      if (gimple_func)
+	{
+	  gcc_assert (icode == CODE_FOR_nothing);
+	  op[nopnds++] = void_type_node;
+	  /* Some MMA built-ins that are expanded into gimple are converted
+	     into internal MMA built-ins that are expanded into rtl.
+	     The internal built-in follows immediately after this built-in.  */
+	  icode = d[1].icode;
+	}
+      else
+	{
+	  if ((attr & RS6000_BTC_QUAD) == 0)
+	    attr_args--;
+
+	  /* Ensure we have the correct number and type of operands.  */
+	  gcc_assert (attr_args == insn_data[icode].n_operands - 1);
+	}
+
+      if (icode == CODE_FOR_nothing)
+	{
+	  /* This is a disassemble MMA built-in function.  */
+	  gcc_assert (attr_args == RS6000_BTC_BINARY
+		      && (d->code == MMA_BUILTIN_DISASSEMBLE_ACC
+			  || d->code == MMA_BUILTIN_DISASSEMBLE_PAIR));
+	  op[nopnds++] = build_pointer_type (void_type_node);
+	  if (attr & RS6000_BTC_QUAD)
+	    op[nopnds++] = build_pointer_type (vector_quad_type_node);
+	  else
+	    op[nopnds++] = build_pointer_type (vector_pair_type_node);
+	}
+      else
+	{
+	  /* This is a normal MMA built-in function.  */
+	  unsigned j = (attr & RS6000_BTC_QUAD) ? 1 : 0;
+	  for (; j < insn_data[icode].n_operands; j++)
+	    {
+	      machine_mode mode = insn_data[icode].operand[j].mode;
+	      if (gimple_func && mode == PXImode)
+		op[nopnds++] = build_pointer_type (vector_quad_type_node);
+	      else if (gimple_func && mode == POImode
+		       && d->code == MMA_BUILTIN_ASSEMBLE_PAIR)
+		op[nopnds++] = build_pointer_type (vector_pair_type_node);
+	      else
+		/* MMA uses unsigned types.  */
+		op[nopnds++] = builtin_mode_to_type[mode][1];
+	    }
+	}
+
+      switch (nopnds)
+	{
+	case 1:
+	  type = build_function_type_list (op[0], NULL_TREE);
+	  break;
+	case 2:
+	  type = build_function_type_list (op[0], op[1], NULL_TREE);
+	  break;
+	case 3:
+	  type = build_function_type_list (op[0], op[1], op[2], NULL_TREE);
+	  break;
+	case 4:
+	  type = build_function_type_list (op[0], op[1], op[2], op[3],
+					   NULL_TREE);
+	  break;
+	case 5:
+	  type = build_function_type_list (op[0], op[1], op[2], op[3], op[4],
+					   NULL_TREE);
+	  break;
+	case 6:
+	  type = build_function_type_list (op[0], op[1], op[2], op[3], op[4],
+					   op[5], NULL_TREE);
+	  break;
+	case 7:
+	  type = build_function_type_list (op[0], op[1], op[2], op[3], op[4],
+					   op[5], op[6], NULL_TREE);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+
+      def_builtin (d->name, type, d->code);
+    }
+}
+
 static void
 htm_init_builtins (void)
 {
@@ -13026,7 +13495,7 @@ htm_init_builtins (void)
       HOST_WIDE_INT mask = d->mask;
       unsigned attr = rs6000_builtin_info[d->code].attr;
       bool void_func = (attr & RS6000_BTC_VOID);
-      int attr_args = (attr & RS6000_BTC_TYPE_MASK);
+      int attr_args = (attr & RS6000_BTC_OPND_MASK);
       int nopnds = 0;
       tree gpr_type_node;
       tree rettype;
@@ -13192,6 +13661,8 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case P8V_BUILTIN_VGBBD:
     case MISC_BUILTIN_CDTBCD:
     case MISC_BUILTIN_CBCDTD:
+    case VSX_BUILTIN_XVCVSPBF16:
+    case VSX_BUILTIN_XVCVBF16SP:
       h.uns_p[0] = 1;
       h.uns_p[1] = 1;
       break;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5948f63ba4c..62f05eeb975 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -9944,7 +9944,8 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
 
     case E_POImode:
     case E_PXImode:
-      if (CONSTANT_P (operands[1]))
+      if (CONSTANT_P (operands[1])
+	  && INTVAL (operands[1]) != 0)
 	error ("%qs is an opaque type, and you can't set it to other values.",
 	       (mode == POImode) ? "__vector_pair" : "__vector_quad");
       break;
@@ -12856,6 +12857,14 @@ print_operand (FILE *file, rtx x, int code)
       /* %c is output_addr_const if a CONSTANT_ADDRESS_P, otherwise
 	 output_operand.  */
 
+    case 'A':
+      /* Write the MMA accumulator number associated with VSX register X.  */
+      if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
+	output_operand_lossage ("invalid %%A value");
+      else
+	fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4);
+      return;
+
     case 'D':
       /* Like 'J' but get to the GT bit only.  */
       if (!REG_P (x) || !CR_REGNO_P (REGNO (x)))
@@ -15968,6 +15977,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 	  unsigned offset = 0;
 	  unsigned size = GET_MODE_SIZE (reg_mode);
 
+	  /* If we are reading an accumulator register, we have to
+	     deprime it before we can access it.  */
+	  if (TARGET_MMA
+	      && GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
+	    emit_insn (gen_mma_xxmfacc (src, src));
+
 	  for (int i = nregs - 1; i >= 0; i--)
 	    {
 	      rtx dst2 = adjust_address (dst, reg_mode, offset);
@@ -15994,6 +16009,32 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 	      emit_insn (gen_rtx_SET (dst2, src2));
 	    }
 
+	  /* If we are writing an accumulator register, we have to
+	     prime it after we've written it.  */
+	  if (TARGET_MMA
+	      && GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
+	    emit_insn (gen_mma_xxmtacc (dst, dst));
+
+	  return;
+	}
+
+      if (GET_CODE (src) == UNSPEC)
+	{
+	  gcc_assert (REG_P (dst)
+		      && FP_REGNO_P (REGNO (dst))
+		      && XINT (src, 1) == UNSPEC_MMA_ASSEMBLE_ACC);
+
+	  reg_mode = GET_MODE (XVECEXP (src, 0, 0));
+	  for (int i = 0; i < XVECLEN (src, 0); i++)
+	    {
+	      rtx dst_i = gen_rtx_REG (reg_mode, reg + i);
+	      emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
+	    }
+
+	  /* We are writing an accumulator register, so we have to
+	     prime it after we've written it.  */
+	  emit_insn (gen_mma_xxmtacc (dst, dst));
+
 	  return;
 	}
 
@@ -16002,6 +16043,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
   if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
     {
+      /* If we are reading an accumulator register, we have to
+	 deprime it before we can access it.  */
+      if (TARGET_MMA
+	  && GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
+	emit_insn (gen_mma_xxmfacc (src, src));
+
       /* Move register range backwards, if we might have destructive
 	 overlap.  */
       int i;
@@ -16010,6 +16057,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 						     i * reg_mode_size),
 				simplify_gen_subreg (reg_mode, src, mode,
 						     i * reg_mode_size)));
+
+      /* If we are writing an accumulator register, we have to
+	 prime it after we've written it.  */
+      if (TARGET_MMA
+	  && GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
+	emit_insn (gen_mma_xxmtacc (dst, dst));
     }
   else
     {
@@ -16142,6 +16195,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 	    gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true));
 	}
 
+      /* If we are reading an accumulator register, we have to
+	 deprime it before we can access it.  */
+      if (TARGET_MMA && REG_P (src)
+	  && GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
+	emit_insn (gen_mma_xxmfacc (src, src));
+
       for (i = 0; i < nregs; i++)
 	{
 	  /* Calculate index to next subword.  */
@@ -16159,6 +16218,13 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 				  simplify_gen_subreg (reg_mode, src, mode,
 						       j * reg_mode_size)));
 	}
+
+      /* If we are writing an accumulator register, we have to
+	 prime it after we've written it.  */
+      if (TARGET_MMA && REG_P (dst)
+	  && GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
+	emit_insn (gen_mma_xxmtacc (dst, dst));
+
       if (restore_basereg != NULL_RTX)
 	emit_insn (restore_basereg);
     }
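
A simple way to exercise the prime/deprime handling above is a copy of a
__vector_quad through memory; a minimal sketch (the comments describe the
intended behavior, not verified compiler output):

void
copy_acc (__vector_quad *dst, __vector_quad *src)
{
  /* Reading the source accumulator emits xxmfacc (deprime) before the
     underlying vector moves; writing the destination emits xxmtacc
     (prime) afterwards, when the registers involved are accumulators.  */
  *dst = *src;
}
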
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 9c103bf8f7d..f3883b51255 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2251,20 +2251,24 @@ extern int frame_pointer_needed;
    flags macros, but we've run out of bits, so we now map the options into new
    settings used here.  */
 
-/* Builtin attributes.  */
-#define RS6000_BTC_SPECIAL	0x00000000	/* Special function.  */
+/* Builtin operand count.  */
 #define RS6000_BTC_UNARY	0x00000001	/* normal unary function.  */
 #define RS6000_BTC_BINARY	0x00000002	/* normal binary function.  */
 #define RS6000_BTC_TERNARY	0x00000003	/* normal ternary function.  */
 #define RS6000_BTC_QUATERNARY	0x00000004	/* normal quaternary
 						   function. */
+#define RS6000_BTC_QUINARY	0x00000005	/* normal quinary function.  */
+#define RS6000_BTC_SENARY	0x00000006	/* normal senary function.  */
+#define RS6000_BTC_OPND_MASK	0x00000007	/* Mask to isolate operands. */
 
-#define RS6000_BTC_PREDICATE	0x00000005	/* predicate function.  */
-#define RS6000_BTC_ABS		0x00000006	/* Altivec/VSX ABS
+/* Builtin attributes.  */
+#define RS6000_BTC_SPECIAL	0x00000000	/* Special function.  */
+#define RS6000_BTC_PREDICATE	0x00000008	/* predicate function.  */
+#define RS6000_BTC_ABS		0x00000010	/* Altivec/VSX ABS
 						   function.  */
-#define RS6000_BTC_DST		0x00000007	/* Altivec DST function.  */
+#define RS6000_BTC_DST		0x00000020	/* Altivec DST function.  */
 
-#define RS6000_BTC_TYPE_MASK	0x0000000f	/* Mask to isolate types */
+#define RS6000_BTC_TYPE_MASK	0x0000003f	/* Mask to isolate types */
 
 #define RS6000_BTC_MISC		0x00000000	/* No special attributes.  */
 #define RS6000_BTC_CONST	0x00000100	/* Neither uses, nor
@@ -2273,13 +2277,18 @@ extern int frame_pointer_needed;
 						   state/mem and does
 						   not modify global state.  */
 #define RS6000_BTC_FP		0x00000400	/* depends on rounding mode.  */
-#define RS6000_BTC_ATTR_MASK	0x00000700	/* Mask of the attributes.  */
+#define RS6000_BTC_QUAD		0x00000800	/* Uses a register quad.  */
+#define RS6000_BTC_PAIR		0x00001000	/* Uses a register pair.  */
+#define RS6000_BTC_QUADPAIR	0x00001800	/* Uses a quad and a pair.  */
+#define RS6000_BTC_ATTR_MASK	0x00001f00	/* Mask of the attributes.  */
 
 /* Miscellaneous information.  */
 #define RS6000_BTC_SPR		0x01000000	/* function references SPRs.  */
 #define RS6000_BTC_VOID		0x02000000	/* function has no return value.  */
 #define RS6000_BTC_CR		0x04000000	/* function references a CR.  */
 #define RS6000_BTC_OVERLOADED	0x08000000	/* function is overloaded.  */
+#define RS6000_BTC_GIMPLE	0x10000000	/* function should be expanded
+						   into gimple.  */
 #define RS6000_BTC_MISC_MASK	0x1f000000	/* Mask of the misc info.  */
 
 /* Convenience macros to document the instruction type.  */
@@ -2348,6 +2357,7 @@ extern int frame_pointer_needed;
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
 
@@ -2359,6 +2369,7 @@ extern int frame_pointer_needed;
 #define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
 #define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
 #define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
 #define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
 #define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
 
@@ -2377,6 +2388,7 @@ enum rs6000_builtins
 #undef RS6000_BUILTIN_A
 #undef RS6000_BUILTIN_D
 #undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
 #undef RS6000_BUILTIN_P
 #undef RS6000_BUILTIN_X
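
To make the new attribute encoding concrete, a small decoding sketch (the
attribute word here is hypothetical, chosen only to show the masks above
at work):

/* Hypothetical word: five encoded operands, reads a register quad,
   returns nothing.  */
unsigned attr = RS6000_BTC_QUINARY | RS6000_BTC_QUAD | RS6000_BTC_VOID;

unsigned nopnds  = attr & RS6000_BTC_OPND_MASK;    /* 5  */
int uses_quad    = (attr & RS6000_BTC_QUAD) != 0;  /* 1  */
int returns_void = (attr & RS6000_BTC_VOID) != 0;  /* 1  */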
 
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index b35a15a2be1..48182cec229 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -19,6 +19,241 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
+
+(define_constants [(MAX_MMA_OPERANDS 7)])
+
+;; Constants for creating unspecs
+
+(define_c_enum "unspec"
+  [UNSPEC_MMA_ASSEMBLE_ACC
+   UNSPEC_MMA_PMXVBF16GER2
+   UNSPEC_MMA_PMXVBF16GER2NN
+   UNSPEC_MMA_PMXVBF16GER2NP
+   UNSPEC_MMA_PMXVBF16GER2PN
+   UNSPEC_MMA_PMXVBF16GER2PP
+   UNSPEC_MMA_PMXVF16GER2
+   UNSPEC_MMA_PMXVF16GER2NN
+   UNSPEC_MMA_PMXVF16GER2NP
+   UNSPEC_MMA_PMXVF16GER2PN
+   UNSPEC_MMA_PMXVF16GER2PP
+   UNSPEC_MMA_PMXVF32GER
+   UNSPEC_MMA_PMXVF32GERNN
+   UNSPEC_MMA_PMXVF32GERNP
+   UNSPEC_MMA_PMXVF32GERPN
+   UNSPEC_MMA_PMXVF32GERPP
+   UNSPEC_MMA_PMXVF64GER
+   UNSPEC_MMA_PMXVF64GERNN
+   UNSPEC_MMA_PMXVF64GERNP
+   UNSPEC_MMA_PMXVF64GERPN
+   UNSPEC_MMA_PMXVF64GERPP
+   UNSPEC_MMA_PMXVI16GER2
+   UNSPEC_MMA_PMXVI16GER2PP
+   UNSPEC_MMA_PMXVI16GER2S
+   UNSPEC_MMA_PMXVI16GER2SPP
+   UNSPEC_MMA_PMXVI4GER8
+   UNSPEC_MMA_PMXVI4GER8PP
+   UNSPEC_MMA_PMXVI8GER4
+   UNSPEC_MMA_PMXVI8GER4PP
+   UNSPEC_MMA_PMXVI8GER4SPP
+   UNSPEC_MMA_XVBF16GER2
+   UNSPEC_MMA_XVBF16GER2NN
+   UNSPEC_MMA_XVBF16GER2NP
+   UNSPEC_MMA_XVBF16GER2PN
+   UNSPEC_MMA_XVBF16GER2PP
+   UNSPEC_MMA_XVF16GER2
+   UNSPEC_MMA_XVF16GER2NN
+   UNSPEC_MMA_XVF16GER2NP
+   UNSPEC_MMA_XVF16GER2PN
+   UNSPEC_MMA_XVF16GER2PP
+   UNSPEC_MMA_XVF32GER
+   UNSPEC_MMA_XVF32GERNN
+   UNSPEC_MMA_XVF32GERNP
+   UNSPEC_MMA_XVF32GERPN
+   UNSPEC_MMA_XVF32GERPP
+   UNSPEC_MMA_XVF64GER
+   UNSPEC_MMA_XVF64GERNN
+   UNSPEC_MMA_XVF64GERNP
+   UNSPEC_MMA_XVF64GERPN
+   UNSPEC_MMA_XVF64GERPP
+   UNSPEC_MMA_XVI16GER2
+   UNSPEC_MMA_XVI16GER2PP
+   UNSPEC_MMA_XVI16GER2S
+   UNSPEC_MMA_XVI16GER2SPP
+   UNSPEC_MMA_XVI4GER8
+   UNSPEC_MMA_XVI4GER8PP
+   UNSPEC_MMA_XVI8GER4
+   UNSPEC_MMA_XVI8GER4PP
+   UNSPEC_MMA_XVI8GER4SPP
+   UNSPEC_MMA_XXMFACC
+   UNSPEC_MMA_XXMTACC
+  ])
+
+;; MMA instructions with 1 accumulator argument
+(define_int_iterator MMA_ACC		[UNSPEC_MMA_XXMFACC
+					 UNSPEC_MMA_XXMTACC])
+
+;; MMA instructions with 2 vector arguments
+(define_int_iterator MMA_VV		[UNSPEC_MMA_XVI4GER8
+					 UNSPEC_MMA_XVI8GER4
+					 UNSPEC_MMA_XVI16GER2
+					 UNSPEC_MMA_XVI16GER2S
+					 UNSPEC_MMA_XVF16GER2
+					 UNSPEC_MMA_XVBF16GER2
+					 UNSPEC_MMA_XVF32GER])
+
+;; MMA instructions with 1 accumulator and 2 vector arguments
+(define_int_iterator MMA_AVV		[UNSPEC_MMA_XVI4GER8PP
+					 UNSPEC_MMA_XVI8GER4PP
+					 UNSPEC_MMA_XVI8GER4SPP
+					 UNSPEC_MMA_XVI16GER2PP
+					 UNSPEC_MMA_XVI16GER2SPP
+					 UNSPEC_MMA_XVF16GER2PP
+					 UNSPEC_MMA_XVF16GER2PN
+					 UNSPEC_MMA_XVF16GER2NP
+					 UNSPEC_MMA_XVF16GER2NN
+					 UNSPEC_MMA_XVBF16GER2PP
+					 UNSPEC_MMA_XVBF16GER2PN
+					 UNSPEC_MMA_XVBF16GER2NP
+					 UNSPEC_MMA_XVBF16GER2NN
+					 UNSPEC_MMA_XVF32GERPP
+					 UNSPEC_MMA_XVF32GERPN
+					 UNSPEC_MMA_XVF32GERNP
+					 UNSPEC_MMA_XVF32GERNN])
+
+;; MMA instructions with 1 vector pair and 1 vector argument
+(define_int_iterator MMA_PV		[UNSPEC_MMA_XVF64GER])
+
+;; MMA instructions with 1 accumulator, 1 vector pair and 1 vector argument
+(define_int_iterator MMA_APV		[UNSPEC_MMA_XVF64GERPP
+					 UNSPEC_MMA_XVF64GERPN
+					 UNSPEC_MMA_XVF64GERNP
+					 UNSPEC_MMA_XVF64GERNN])
+
+;; MMA instructions with 2 vector, 2 4-bit and 1 8-bit arguments
+(define_int_iterator MMA_VVI4I4I8	[UNSPEC_MMA_PMXVI4GER8])
+
+;; MMA instructions with 1 accumulator, 2 vector, 2 4-bit and 1 8-bit arguments
+(define_int_iterator MMA_AVVI4I4I8	[UNSPEC_MMA_PMXVI4GER8PP])
+
+;; MMA instructions with 2 vector, 2 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_VVI4I4I2	[UNSPEC_MMA_PMXVI16GER2
+					 UNSPEC_MMA_PMXVI16GER2S
+					 UNSPEC_MMA_PMXVF16GER2
+					 UNSPEC_MMA_PMXVBF16GER2])
+
+;; MMA instructions with 1 accumulator, 2 vector, 2 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_AVVI4I4I2	[UNSPEC_MMA_PMXVI16GER2PP
+					 UNSPEC_MMA_PMXVI16GER2SPP
+					 UNSPEC_MMA_PMXVF16GER2PP
+					 UNSPEC_MMA_PMXVF16GER2PN
+					 UNSPEC_MMA_PMXVF16GER2NP
+					 UNSPEC_MMA_PMXVF16GER2NN
+					 UNSPEC_MMA_PMXVBF16GER2PP
+					 UNSPEC_MMA_PMXVBF16GER2PN
+					 UNSPEC_MMA_PMXVBF16GER2NP
+					 UNSPEC_MMA_PMXVBF16GER2NN])
+
+;; MMA instructions with 2 vector and 2 4-bit arguments
+(define_int_iterator MMA_VVI4I4		[UNSPEC_MMA_PMXVF32GER])
+
+;; MMA instructions with 1 accumulator, 2 vector and 2 4-bit arguments
+(define_int_iterator MMA_AVVI4I4	[UNSPEC_MMA_PMXVF32GERPP
+					 UNSPEC_MMA_PMXVF32GERPN
+					 UNSPEC_MMA_PMXVF32GERNP
+					 UNSPEC_MMA_PMXVF32GERNN])
+
+;; MMA instructions with 2 vector, 1 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_PVI4I2		[UNSPEC_MMA_PMXVF64GER])
+
+;; MMA instructions with 1 accumulator, 2 vector, 1 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_APVI4I2	[UNSPEC_MMA_PMXVF64GERPP
+					 UNSPEC_MMA_PMXVF64GERPN
+					 UNSPEC_MMA_PMXVF64GERNP
+					 UNSPEC_MMA_PMXVF64GERNN])
+
+;; MMA instructions with 2 vector and 3 4-bit arguments
+(define_int_iterator MMA_VVI4I4I4	[UNSPEC_MMA_PMXVI8GER4])
+
+;; MMA instructions with 1 accumulator, 2 vector and 3 4-bit arguments
+(define_int_iterator MMA_AVVI4I4I4	[UNSPEC_MMA_PMXVI8GER4PP
+					 UNSPEC_MMA_PMXVI8GER4SPP])
+
+(define_int_attr acc		[(UNSPEC_MMA_XXMFACC		"xxmfacc")
+				 (UNSPEC_MMA_XXMTACC		"xxmtacc")])
+
+(define_int_attr vv		[(UNSPEC_MMA_XVI4GER8		"xvi4ger8")
+				 (UNSPEC_MMA_XVI8GER4		"xvi8ger4")
+				 (UNSPEC_MMA_XVI16GER2		"xvi16ger2")
+				 (UNSPEC_MMA_XVI16GER2S		"xvi16ger2s")
+				 (UNSPEC_MMA_XVF16GER2		"xvf16ger2")
+				 (UNSPEC_MMA_XVBF16GER2		"xvbf16ger2")
+				 (UNSPEC_MMA_XVF32GER		"xvf32ger")])
+
+(define_int_attr avv		[(UNSPEC_MMA_XVI4GER8PP		"xvi4ger8pp")
+				 (UNSPEC_MMA_XVI8GER4PP		"xvi8ger4pp")
+				 (UNSPEC_MMA_XVI8GER4SPP	"xvi8ger4spp")
+				 (UNSPEC_MMA_XVI16GER2PP	"xvi16ger2pp")
+				 (UNSPEC_MMA_XVI16GER2SPP	"xvi16ger2spp")
+				 (UNSPEC_MMA_XVF16GER2PP	"xvf16ger2pp")
+				 (UNSPEC_MMA_XVF16GER2PN	"xvf16ger2pn")
+				 (UNSPEC_MMA_XVF16GER2NP	"xvf16ger2np")
+				 (UNSPEC_MMA_XVF16GER2NN	"xvf16ger2nn")
+				 (UNSPEC_MMA_XVBF16GER2PP	"xvbf16ger2pp")
+				 (UNSPEC_MMA_XVBF16GER2PN	"xvbf16ger2pn")
+				 (UNSPEC_MMA_XVBF16GER2NP	"xvbf16ger2np")
+				 (UNSPEC_MMA_XVBF16GER2NN	"xvbf16ger2nn")
+				 (UNSPEC_MMA_XVF32GERPP		"xvf32gerpp")
+				 (UNSPEC_MMA_XVF32GERPN		"xvf32gerpn")
+				 (UNSPEC_MMA_XVF32GERNP		"xvf32gernp")
+				 (UNSPEC_MMA_XVF32GERNN		"xvf32gernn")])
+
+(define_int_attr pv		[(UNSPEC_MMA_XVF64GER		"xvf64ger")])
+
+(define_int_attr apv		[(UNSPEC_MMA_XVF64GERPP		"xvf64gerpp")
+				 (UNSPEC_MMA_XVF64GERPN		"xvf64gerpn")
+				 (UNSPEC_MMA_XVF64GERNP		"xvf64gernp")
+				 (UNSPEC_MMA_XVF64GERNN		"xvf64gernn")])
+
+(define_int_attr vvi4i4i8	[(UNSPEC_MMA_PMXVI4GER8		"pmxvi4ger8")])
+
+(define_int_attr avvi4i4i8	[(UNSPEC_MMA_PMXVI4GER8PP	"pmxvi4ger8pp")])
+
+(define_int_attr vvi4i4i2	[(UNSPEC_MMA_PMXVI16GER2	"pmxvi16ger2")
+				 (UNSPEC_MMA_PMXVI16GER2S	"pmxvi16ger2s")
+				 (UNSPEC_MMA_PMXVF16GER2	"pmxvf16ger2")
+				 (UNSPEC_MMA_PMXVBF16GER2	"pmxvbf16ger2")])
+
+(define_int_attr avvi4i4i2	[(UNSPEC_MMA_PMXVI16GER2PP	"pmxvi16ger2pp")
+				 (UNSPEC_MMA_PMXVI16GER2SPP	"pmxvi16ger2spp")
+				 (UNSPEC_MMA_PMXVF16GER2PP	"pmxvf16ger2pp")
+				 (UNSPEC_MMA_PMXVF16GER2PN	"pmxvf16ger2pn")
+				 (UNSPEC_MMA_PMXVF16GER2NP	"pmxvf16ger2np")
+				 (UNSPEC_MMA_PMXVF16GER2NN	"pmxvf16ger2nn")
+				 (UNSPEC_MMA_PMXVBF16GER2PP	"pmxvbf16ger2pp")
+				 (UNSPEC_MMA_PMXVBF16GER2PN	"pmxvbf16ger2pn")
+				 (UNSPEC_MMA_PMXVBF16GER2NP	"pmxvbf16ger2np")
+				 (UNSPEC_MMA_PMXVBF16GER2NN	"pmxvbf16ger2nn")])
+
+(define_int_attr vvi4i4		[(UNSPEC_MMA_PMXVF32GER		"pmxvf32ger")])
+
+(define_int_attr avvi4i4	[(UNSPEC_MMA_PMXVF32GERPP	"pmxvf32gerpp")
+				 (UNSPEC_MMA_PMXVF32GERPN	"pmxvf32gerpn")
+				 (UNSPEC_MMA_PMXVF32GERNP	"pmxvf32gernp")
+				 (UNSPEC_MMA_PMXVF32GERNN	"pmxvf32gernn")])
+
+(define_int_attr pvi4i2		[(UNSPEC_MMA_PMXVF64GER		"pmxvf64ger")])
+
+(define_int_attr apvi4i2	[(UNSPEC_MMA_PMXVF64GERPP	"pmxvf64gerpp")
+				 (UNSPEC_MMA_PMXVF64GERPN	"pmxvf64gerpn")
+				 (UNSPEC_MMA_PMXVF64GERNP	"pmxvf64gernp")
+				 (UNSPEC_MMA_PMXVF64GERNN	"pmxvf64gernn")])
+
+(define_int_attr vvi4i4i4	[(UNSPEC_MMA_PMXVI8GER4		"pmxvi8ger4")])
+
+(define_int_attr avvi4i4i4	[(UNSPEC_MMA_PMXVI8GER4PP	"pmxvi8ger4pp")
+				 (UNSPEC_MMA_PMXVI8GER4SPP	"pmxvi8ger4spp")])
+
+
 ;; Vector load/store pair operations
 ;; We need to define an OImode move pattern, even though we don't enable it,
 ;; because the machine independent parts of the compiler at times uses the
@@ -111,10 +346,11 @@ (define_expand "movpxi"
 })
 
 (define_insn_and_split "*movpxi"
-  [(set (match_operand:PXI 0 "nonimmediate_operand" "=d,m,d")
-	(match_operand:PXI 1 "input_operand" "m,d,d"))]
+  [(set (match_operand:PXI 0 "nonimmediate_operand" "=d,m,d,d")
+	(match_operand:PXI 1 "input_operand"    "m,d,d,O"))]
   "TARGET_MMA
-   && (gpc_reg_operand (operands[0], PXImode)
+   && ((gpc_reg_operand (operands[0], PXImode)
+	&& !(CONST_INT_P (operands[1]) && INTVAL (operands[1]) == 0))
        || gpc_reg_operand (operands[1], PXImode))"
   "#"
   "&& reload_completed"
@@ -123,6 +359,249 @@ (define_insn_and_split "*movpxi"
   rs6000_split_multireg_move (operands[0], operands[1]);
   DONE;
 }
-  [(set_attr "type" "vecload,vecstore,veclogical")
-   (set_attr "length" "8,8,16")
-   (set_attr "max_prefixed_insns" "2,2,*")])
+  [(set_attr "type" "vecload,vecstore,veclogical,mma")
+   (set_attr "length" "8,8,16,*")
+   (set_attr "max_prefixed_insns" "2,2,*,*")])
+
+(define_expand "mma_assemble_pair"
+  [(match_operand:POI 0 "vsx_register_operand")
+   (match_operand:V16QI 1 "input_operand")
+   (match_operand:V16QI 2 "input_operand")]
+  "TARGET_MMA"
+{
+  rtx dst;
+
+  /* Let the compiler know the code below fully defines our output value.  */
+  emit_clobber (operands[0]);
+
+  dst = simplify_gen_subreg (V16QImode, operands[0], POImode, 0);
+  emit_move_insn (dst, operands[1]);
+  dst = simplify_gen_subreg (V16QImode, operands[0], POImode, 16);
+  emit_move_insn (dst, operands[2]);
+  DONE;
+})
+
+(define_expand "mma_assemble_acc"
+  [(match_operand:PXI 0 "fpr_reg_operand")
+   (match_operand:V16QI 1 "input_operand")
+   (match_operand:V16QI 2 "input_operand")
+   (match_operand:V16QI 3 "input_operand")
+   (match_operand:V16QI 4 "input_operand")]
+  "TARGET_MMA"
+{
+  rtx src = gen_rtx_UNSPEC (PXImode,
+			    gen_rtvec (4, operands[1], operands[2],
+				       operands[3], operands[4]),
+			    UNSPEC_MMA_ASSEMBLE_ACC);
+  emit_move_insn (operands[0], src);
+  DONE;
+})
+
+(define_insn_and_split "*mma_assemble_acc"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=d")
+	(unspec:PXI [(match_operand:PXI 1 "mma_input_operand" "mwa")
+		     (match_operand:PXI 2 "mma_input_operand" "mwa")
+		     (match_operand:PXI 3 "mma_input_operand" "mwa")
+		     (match_operand:PXI 4 "mma_input_operand" "mwa")]
+		     UNSPEC_MMA_ASSEMBLE_ACC))]
+  "TARGET_MMA
+   && fpr_reg_operand (operands[0], PXImode)"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx src = gen_rtx_UNSPEC (PXImode,
+			    gen_rtvec (4, operands[1], operands[2],
+				       operands[3], operands[4]),
+			    UNSPEC_MMA_ASSEMBLE_ACC);
+  rs6000_split_multireg_move (operands[0], src);
+  DONE;
+})
+
+;; MMA instructions that do not use their accumulators as an input still
+;; must not allow their vector operands to overlap the registers used by
+;; the accumulator.  We enforce this by marking the output as early clobber.
+
+(define_insn "mma_<acc>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")]
+		    MMA_ACC))]
+  "TARGET_MMA"
+  "<acc> %A0"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_xxsetaccz"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=d")
+	(const_int 0))]
+  "TARGET_MMA"
+  "xxsetaccz %A0"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_<vv>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")]
+		     MMA_VV))]
+  "TARGET_MMA"
+  "<vv> %A0,%x1,%x2"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_<avv>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")]
+		     MMA_AVV))]
+  "TARGET_MMA"
+  "<avv> %A0,%x2,%x3"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_<pv>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:POI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")]
+		     MMA_PV))]
+  "TARGET_MMA"
+  "<pv> %A0,%x1,%x2"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_<apv>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:POI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")]
+		     MMA_APV))]
+  "TARGET_MMA"
+  "<apv> %A0,%x2,%x3"
+  [(set_attr "type" "mma")])
+
+(define_insn "mma_<vvi4i4i8>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:SI 3 "const_0_to_15_operand" "n")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "u8bit_cint_operand" "n")]
+		     MMA_VVI4I4I8))]
+  "TARGET_MMA"
+  "<vvi4i4i8> %A0,%x1,%x2,%3,%4,%5"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4i8>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_15_operand" "n")
+		     (match_operand:SI 6 "u8bit_cint_operand" "n")]
+		     MMA_AVVI4I4I8))]
+  "TARGET_MMA"
+  "<avvi4i4i8> %A0,%x2,%x3,%4,%5,%6"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<vvi4i4i2>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:SI 3 "const_0_to_15_operand" "n")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_3_operand" "n")]
+		     MMA_VVI4I4I2))]
+  "TARGET_MMA"
+  "<vvi4i4i2> %A0,%x1,%x2,%3,%4,%5"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4i2>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_15_operand" "n")
+		     (match_operand:SI 6 "const_0_to_3_operand" "n")]
+		     MMA_AVVI4I4I2))]
+  "TARGET_MMA"
+  "<avvi4i4i2> %A0,%x2,%x3,%4,%5,%6"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<vvi4i4>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:SI 3 "const_0_to_15_operand" "n")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")]
+		     MMA_VVI4I4))]
+  "TARGET_MMA"
+  "<vvi4i4> %A0,%x1,%x2,%3,%4"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_15_operand" "n")]
+		     MMA_AVVI4I4))]
+  "TARGET_MMA"
+  "<avvi4i4> %A0,%x2,%x3,%4,%5"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<pvi4i2>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:POI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:SI 3 "const_0_to_15_operand" "n")
+		     (match_operand:SI 4 "const_0_to_3_operand" "n")]
+		     MMA_PVI4I2))]
+  "TARGET_MMA"
+  "<pvi4i2> %A0,%x1,%x2,%3,%4"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<apvi4i2>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:POI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_3_operand" "n")]
+		     MMA_APVI4I2))]
+  "TARGET_MMA"
+  "<apvi4i2> %A0,%x2,%x3,%4,%5"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<vvi4i4i4>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:SI 3 "const_0_to_15_operand" "n")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_15_operand" "n")]
+		     MMA_VVI4I4I4))]
+  "TARGET_MMA"
+  "<vvi4i4i4> %A0,%x1,%x2,%3,%4,%5"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4i4>"
+  [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+	(unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+		     (match_operand:V16QI 2 "vsx_register_operand" "wa")
+		     (match_operand:V16QI 3 "vsx_register_operand" "wa")
+		     (match_operand:SI 4 "const_0_to_15_operand" "n")
+		     (match_operand:SI 5 "const_0_to_15_operand" "n")
+		     (match_operand:SI 6 "const_0_to_15_operand" "n")]
+		     MMA_AVVI4I4I4))]
+  "TARGET_MMA"
+  "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6"
+  [(set_attr "type" "mma")
+   (set_attr "length" "8")])
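
For orientation, the mma_assemble_acc expander earlier in this file backs
the user-level assemble built-in; a C-level sketch of the
assemble/disassemble round trip, using the prototypes documented later in
this patch:

typedef vector unsigned char vec_t;

void
round_trip (vec_t *out, vec_t v0, vec_t v1, vec_t v2, vec_t v3)
{
  __vector_quad acc;
  /* Pack four vectors into the four rows of the accumulator.  */
  __builtin_mma_assemble_acc (&acc, v0, v1, v2, v3);
  /* Write the four rows back out to out[0..3].  */
  __builtin_mma_disassemble_acc (out, &acc);
}
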
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6b462a3ecdb..bbe0b4610fb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -203,7 +203,7 @@ (define_attr "type"
    vecsimple,veccomplex,vecdiv,veccmp,veccmpsimple,vecperm,
    vecfloat,vecfdiv,vecdouble,mffgpr,mftgpr,crypto,
    veclogical,veccmpfx,vecexts,vecmove,
-   htm,htmsimple,dfp"
+   htm,htmsimple,dfp,mma"
   (const_string "integer"))
 
 ;; What data size does this instruction work on?
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 2a28215ac5b..342927abeda 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -296,6 +296,8 @@ (define_c_enum "unspec"
    UNSPEC_VSX_DIVUD
    UNSPEC_VSX_MULSD
    UNSPEC_VSX_SIGN_EXTEND
+   UNSPEC_VSX_XVCVBF16SP
+   UNSPEC_VSX_XVCVSPBF16
    UNSPEC_VSX_XVCVSPSXDS
    UNSPEC_VSX_VSLO
    UNSPEC_VSX_EXTRACT
@@ -346,6 +348,12 @@ (define_c_enum "unspec"
    UNSPEC_XXGENPCV
   ])
 
+(define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
+				 UNSPEC_VSX_XVCVBF16SP])
+
+(define_int_attr xvcvbf16       [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16")
+				 (UNSPEC_VSX_XVCVBF16SP "xvcvbf16sp")])
+
 ;; VSX moves
 
 ;; The patterns for LE permuted loads and stores come before the general
@@ -5676,3 +5684,10 @@ (define_expand "vec_unpack_<su>fix_trunc_lo_v4sf"
   DONE;
 })
 
+(define_insn "vsx_<xvcvbf16>"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+	(unspec:V16QI [(match_operand:V16QI 1 "vsx_register_operand" "wa")]
+		      XVCVBF16))]
+  "TARGET_FUTURE"
+  "<xvcvbf16> %x0,%x1"
+  [(set_attr "type" "vecfloat")])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e656e66a80c..ca51c9bbadd 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13858,6 +13858,7 @@ instructions, but allow the compiler to schedule those calls.
 * PowerPC AltiVec/VSX Built-in Functions::
 * PowerPC Hardware Transactional Memory Built-in Functions::
 * PowerPC Atomic Memory Operation Functions::
+* PowerPC Matrix-Multiply Assist Built-in Functions::
 * RX Built-in Functions::
 * S/390 System z Built-in Functions::
 * SH Built-in Functions::
@@ -21359,6 +21360,100 @@ void amo_stdat_smax (int64_t *, int64_t);
 void amo_stdat_smin (int64_t *, int64_t);
 @end smallexample
 
+@node PowerPC Matrix-Multiply Assist Built-in Functions
+@subsection PowerPC Matrix-Multiply Assist Built-in Functions
+ISA 3.1 of the PowerPC architecture added new Matrix-Multiply Assist (MMA)
+instructions.  GCC provides support for these instructions through the
+following built-in functions, which are enabled with the @code{-mmma} option.
+The @code{vec_t} type below is defined to be a normal
+@code{vector unsigned char} type.  The @code{uint2}, @code{uint4} and
+@code{uint8} parameters are 2-bit, 4-bit and 8-bit unsigned integer constants
+respectively.  The compiler verifies that they are constants and that their
+values are within range.
+
+The built-in functions supported are:
+
+@smallexample
+void __builtin_mma_xvi4ger8 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2s (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32ger (__vector_quad *, vec_t, vec_t);
+
+void __builtin_mma_xvi4ger8pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4spp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2spp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2pn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2np (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2nn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2pn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2np (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2nn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gerpp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gerpn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gernp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gernn (__vector_quad *, vec_t, vec_t);
+
+void __builtin_mma_pmxvi4ger8 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
+void __builtin_mma_pmxvi4ger8pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
+
+void __builtin_mma_pmxvi8ger4 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+void __builtin_mma_pmxvi8ger4pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+void __builtin_mma_pmxvi8ger4spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+
+void __builtin_mma_pmxvi16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvi16ger2s (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+
+void __builtin_mma_pmxvi16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvi16ger2spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+
+void __builtin_mma_pmxvf32ger (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gerpp (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gerpn (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gernp (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gernn (__vector_quad *, vec_t, vec_t, uint4, uint4);
+
+void __builtin_mma_xvf64ger (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gerpp (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gerpn (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gernp (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gernn (__vector_quad *, __vector_pair, vec_t);
+
+void __builtin_mma_pmxvf64ger (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gerpp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gerpn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gernp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gernn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+
+void __builtin_mma_xxmtacc (__vector_quad *);
+void __builtin_mma_xxmfacc (__vector_quad *);
+void __builtin_mma_xxsetaccz (__vector_quad *);
+
+void __builtin_mma_assemble_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
+void __builtin_mma_disassemble_acc (void *, __vector_quad *);
+
+void __builtin_mma_assemble_pair (__vector_pair *, vec_t, vec_t);
+void __builtin_mma_disassemble_pair (void *, __vector_pair *);
+
+vec_t __builtin_xvcvspbf16 (vec_t);
+vec_t __builtin_xvcvbf16sp (vec_t);
+@end smallexample
+
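+For example (an illustrative sketch; the function below is not itself
+part of the API), a 32-bit floating-point outer-product accumulation
+could be written as:
+
+@smallexample
+typedef unsigned char vec_t __attribute__((vector_size(16)));
+
+void
+f32_accumulate (__vector_quad *dst, vec_t a, vec_t b)
+{
+  __vector_quad acc;
+  __builtin_mma_xvf32ger (&acc, a, b);              /* acc = outer product.  */
+  __builtin_mma_xvf32gerpp (&acc, a, b);            /* acc += outer product.  */
+  __builtin_mma_pmxvf32gerpp (&acc, a, b, 15, 15);  /* Masked update.  */
+  *dst = acc;
+}
+@end smallexample
+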
 @node RX Built-in Functions
 @subsection RX Built-in Functions
 GCC supports some of the RX instructions which cannot be expressed in

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/3] rs6000: Add testsuite test cases for MMA built-ins.
  2020-06-15 19:54 [PATCH 0/3] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
  2020-06-15 19:56 ` [PATCH 1/3] rs6000: Add base support and types for defining MMA built-ins Peter Bergner
  2020-06-15 19:58 ` [PATCH 2/3] rs6000: Add MMA built-in function definitions Peter Bergner
@ 2020-06-15 19:59 ` Peter Bergner
  2020-06-15 22:43   ` will schmidt
  2 siblings, 1 reply; 12+ messages in thread
From: Peter Bergner @ 2020-06-15 19:59 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: David Edelsohn, GCC Patches, Bill Schmidt, Michael Meissner

This patch adds the testsuite test cases for all of the MMA built-ins.

This patch plus patch1 and patch2 passed bootstrap and regtesting with no
regressions on both powerpc64le-linux and powerpc64-linux.  Ok for trunk?

Peter

2020-06-15  Peter Bergner  <bergner@linux.ibm.com>

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-1.c: New test.
	* gcc.target/powerpc/mma-builtin-2.c: New test.
	* gcc.target/powerpc/mma-builtin-3.c: New test.
	* gcc.target/powerpc/mma-builtin-4.c: New test.
	* gcc.target/powerpc/mma-builtin-5.c: New test.
	* gcc.target/powerpc/mma-builtin-6.c: New test.

diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
new file mode 100644
index 00000000000..a971c869095
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
@@ -0,0 +1,313 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+typedef unsigned char  vec_t __attribute__((vector_size(16)));
+
+void
+foo0 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvi4ger8 (&acc, vec0, vec1);
+  __builtin_mma_xvi4ger8pp (&acc, vec0, vec1);
+  dst[0] = acc;
+}
+
+void
+foo1 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvi8ger4 (&acc, vec0, vec1);
+  __builtin_mma_xvi8ger4pp (&acc, vec0, vec1);
+  __builtin_mma_xvi8ger4spp (&acc, vec0, vec1);
+  dst[1] = acc;
+}
+
+void
+foo2 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvi16ger2 (&acc, vec0, vec1);
+  __builtin_mma_xvi16ger2pp (&acc, vec0, vec1);
+  dst[2] = acc;
+}
+
+void
+foo3 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvi16ger2s (&acc, vec0, vec1);
+  __builtin_mma_xvi16ger2spp (&acc, vec0, vec1);
+  dst[3] = acc;
+}
+
+void
+foo4 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvf16ger2 (&acc, vec0, vec1);
+  __builtin_mma_xvf16ger2pp (&acc, vec0, vec1);
+  __builtin_mma_xvf16ger2pn (&acc, vec0, vec1);
+  dst[4] = acc;
+}
+
+void
+foo4b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_xvf16ger2np (&acc, vec0, vec1);
+  __builtin_mma_xvf16ger2nn (&acc, vec0, vec1);
+  dst[4] = acc;
+}
+
+void
+foo5 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvbf16ger2 (&acc, vec0, vec1);
+  __builtin_mma_xvbf16ger2pp (&acc, vec0, vec1);
+  __builtin_mma_xvbf16ger2pn (&acc, vec0, vec1);
+  dst[5] = acc;
+}
+
+void
+foo5b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_xvbf16ger2np (&acc, vec0, vec1);
+  __builtin_mma_xvbf16ger2nn (&acc, vec0, vec1);
+  dst[5] = acc;
+}
+
+void
+foo6 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvf32ger (&acc, vec0, vec1);
+  __builtin_mma_xvf32gerpp (&acc, vec0, vec1);
+  __builtin_mma_xvf32gerpn (&acc, vec0, vec1);
+  dst[6] = acc;
+}
+
+void
+foo6b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_xvf32gernp (&acc, vec0, vec1);
+  __builtin_mma_xvf32gernn (&acc, vec0, vec1);
+  dst[6] = acc;
+}
+
+void
+foo7 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvi4ger8 (&acc, vec0, vec1, 15, 15, 255);
+  __builtin_mma_pmxvi4ger8pp (&acc, vec0, vec1, 15, 15, 255);
+  dst[7] = acc;
+}
+
+void
+foo8 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvi8ger4 (&acc, vec0, vec1, 15, 15, 15);
+  __builtin_mma_pmxvi8ger4pp (&acc, vec0, vec1, 15, 15, 15);
+  __builtin_mma_pmxvi8ger4spp (&acc, vec0, vec1, 15, 15, 15);
+  dst[8] = acc;
+}
+
+void
+foo9 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvi16ger2 (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvi16ger2pp (&acc, vec0, vec1, 15, 15, 3);
+  dst[9] = acc;
+}
+
+void
+foo10 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvi16ger2s (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvi16ger2spp (&acc, vec0, vec1, 15, 15, 3);
+  dst[10] = acc;
+}
+
+void
+foo11 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvf16ger2 (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvf16ger2pp (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvf16ger2pn (&acc, vec0, vec1, 15, 15, 3);
+  dst[11] = acc;
+}
+
+void
+foo11b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_pmxvf16ger2np (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvf16ger2nn (&acc, vec0, vec1, 15, 15, 3);
+  dst[11] = acc;
+}
+
+void
+foo12 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvbf16ger2 (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvbf16ger2pp (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvbf16ger2pn (&acc, vec0, vec1, 15, 15, 3);
+  dst[12] = acc;
+}
+
+void
+foo12b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_pmxvbf16ger2np (&acc, vec0, vec1, 15, 15, 3);
+  __builtin_mma_pmxvbf16ger2nn (&acc, vec0, vec1, 15, 15, 3);
+  dst[12] = acc;
+}
+
+void
+foo13 (__vector_quad *dst, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_pmxvf32ger (&acc, vec0, vec1, 15, 15);
+  __builtin_mma_pmxvf32gerpp (&acc, vec0, vec1, 15, 15);
+  __builtin_mma_pmxvf32gerpn (&acc, vec0, vec1, 15, 15);
+  dst[13] = acc;
+}
+
+void
+foo13b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
+{
+  __vector_quad acc;
+  vec_t vec0 = vec[0];
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_pmxvf32gernp (&acc, vec0, vec1, 15, 15);
+  __builtin_mma_pmxvf32gernn (&acc, vec0, vec1, 15, 15);
+  dst[13] = acc;
+}
+
+/* { dg-final { scan-assembler-times {\mlxv\M} 40 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 40 } } */
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 20 } } */
+/* { dg-final { scan-assembler-times {\mxxmtacc\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxvbf16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvbf16ger2nn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvbf16ger2np\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvbf16ger2pn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvbf16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf16ger2nn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf16ger2np\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf16ger2pn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf32ger\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf32gernn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf32gernp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf32gerpn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf32gerpp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi16ger2s\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi16ger2spp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi4ger8\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi4ger8pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi8ger4\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi8ger4pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvi8ger4spp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvbf16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvbf16ger2nn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvbf16ger2np\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvbf16ger2pn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvbf16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf16ger2nn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf16ger2np\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf16ger2pn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf32ger\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf32gernn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf32gernp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf32gerpn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf32gerpp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi16ger2\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi16ger2pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi16ger2s\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi16ger2spp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi4ger8\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi4ger8pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi8ger4\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi8ger4pp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvi8ger4spp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c
new file mode 100644
index 00000000000..cb8b30dd992
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c
@@ -0,0 +1,72 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+typedef unsigned char  vec_t __attribute__((vector_size(16)));
+
+void
+foo0 (__vector_quad *dst, vec_t *vec, __vector_pair *pvecp)
+{
+  __vector_quad acc;
+  __vector_pair vecp0 = *pvecp;
+  vec_t vec1 = vec[1];
+
+  __builtin_mma_xvf64ger (&acc, vecp0, vec1);
+  __builtin_mma_xvf64gerpp (&acc, vecp0, vec1);
+  __builtin_mma_xvf64gerpn (&acc, vecp0, vec1);
+  dst[0] = acc;
+}
+
+void
+foo1 (__vector_quad *dst, __vector_quad *src, vec_t *vec, __vector_pair *pvecp)
+{
+  __vector_quad acc;
+  __vector_pair vecp0 = *pvecp;
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_xvf64gernp (&acc, vecp0, vec1);
+  __builtin_mma_xvf64gernn (&acc, vecp0, vec1);
+  dst[0] = acc;
+}
+
+void
+foo2 (__vector_quad *dst, vec_t *vec, __vector_pair *pvecp)
+{
+  __vector_quad acc;
+  __vector_pair vecp0 = *pvecp;
+  vec_t vec1 = vec[1];
+  __builtin_mma_pmxvf64ger (&acc, vecp0, vec1, 15, 3);
+  __builtin_mma_pmxvf64gerpp (&acc, vecp0, vec1, 15, 3);
+  __builtin_mma_pmxvf64gerpn (&acc, vecp0, vec1, 15, 3);
+  dst[1] = acc;
+}
+
+void
+foo3 (__vector_quad *dst, __vector_quad *src, vec_t *vec, __vector_pair *pvecp)
+{
+  __vector_quad acc;
+  __vector_pair vecp0 = *pvecp;
+  vec_t vec1 = vec[1];
+
+  acc = src[0];
+  __builtin_mma_pmxvf64gernp (&acc, vecp0, vec1, 15, 3);
+  __builtin_mma_pmxvf64gernn (&acc, vecp0, vec1, 15, 3);
+  dst[1] = acc;
+}
+
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 8 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 8 } } */
+/* { dg-final { scan-assembler-times {\mxvf64ger\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf64gerpp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf64gerpn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf64gernp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvf64gernn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf64ger\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf64gerpp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf64gerpn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf64gernp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mpmxvf64gernn\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c
new file mode 100644
index 00000000000..5406707061e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+void
+foo0 (void)
+{
+  __vector_quad acc;
+  asm ("#..." : "=d" (acc));
+  __builtin_mma_xxmtacc (&acc);
+  __builtin_mma_xxmfacc (&acc);
+  asm ("#..." :: "d" (acc));
+}
+
+typedef unsigned char  vec_t __attribute__((vector_size(16)));
+
+void
+foo1 (vec_t *vec)
+{
+  vec[1] = __builtin_vsx_xvcvspbf16 (vec[0]);
+  vec[3] = __builtin_vsx_xvcvbf16sp (vec[2]);
+}
+
+/* { dg-final { scan-assembler-times {\mxxmtacc\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxv\M} 2 } } */
+/* { dg-final { scan-assembler-not {\mlxvp\M} } } */
+/* { dg-final { scan-assembler-not {\mstxvp\M} } } */
+/* { dg-final { scan-assembler-times {\mxvcvspbf16\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvcvbf16sp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
new file mode 100644
index 00000000000..138d1b46bc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+typedef unsigned char vec_t __attribute__((vector_size(16)));
+
+void
+foo (__vector_pair *dst, vec_t *src)
+{
+  __vector_pair pair;
+  __builtin_mma_assemble_pair (&pair, src[0], src[4]);
+  *dst = pair;
+}
+
+void
+bar (vec_t *dst, __vector_pair *src)
+{
+  vec_t res[2];
+  __builtin_mma_disassemble_pair (res, src);
+  dst[0] = res[0];
+  dst[4] = res[1];
+}
+
+/* { dg-final { scan-assembler-times {\mlxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
new file mode 100644
index 00000000000..0ee45b6bdfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+typedef unsigned char vec_t __attribute__((vector_size(16)));
+
+void
+foo (__vector_quad *dst, vec_t *src)
+{
+  __vector_quad acc;
+  __builtin_mma_assemble_acc (&acc, src[0], src[4], src[8], src[12]);
+  *dst = acc;
+}
+
+void
+bar (vec_t *dst, __vector_quad *src)
+{
+  vec_t res[4];
+  __builtin_mma_disassemble_acc (res, src);
+  dst[0] = res[0];
+  dst[4] = res[1];
+  dst[8] = res[2];
+  dst[12] = res[3];
+}
+
+/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c
new file mode 100644
index 00000000000..c0b5eedd3d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
+
+void
+foo (__vector_quad *dst)
+{
+  __vector_quad acc;
+  __builtin_mma_xxsetaccz (&acc);
+  *dst = acc;
+}
+
+/* { dg-final { scan-assembler-not {\mlxv\M} } } */
+/* { dg-final { scan-assembler-not {\mlxvp\M} } } */
+/* { dg-final { scan-assembler-not {\mxxmtacc\M} } } */
+/* { dg-final { scan-assembler-times {\mxxsetaccz\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-15 19:56 ` [PATCH 1/3] rs6000: Add base support and types for defining MMA built-ins Peter Bergner
@ 2020-06-15 22:43   ` will schmidt
  2020-06-16 18:59     ` Peter Bergner
  0 siblings, 1 reply; 12+ messages in thread
From: will schmidt @ 2020-06-15 22:43 UTC (permalink / raw)
  To: Peter Bergner, Segher Boessenkool
  Cc: Bill Schmidt, GCC Patches, David Edelsohn, Michael Meissner

On Mon, 2020-06-15 at 14:56 -0500, Peter Bergner via Gcc-patches wrote:
> This patch adds the new -mmma option as well as the initial MMA support,
> which includes the target specific __vector_pair and __vector_quad types,
> the POImode and PXImode partial integer modes they are mapped to, and
> their associated move patterns.  Support for the restrictions on the
> registers these modes can be assigned to has also been added.
> 
> This patch passed bootstrap and regtesting with no regressions on
> powerpc64le-linux.  Ok for trunk?
> 
> Peter
> 
> 2020-06-15  Peter Bergner  <bergner@linux.ibm.com>
> 	    Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 	* config/rs6000/mma.md: New file.
> 	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
> 	__MMA__ for mma.
> 	* config/rs6000/rs6000-call.c (rs6000_init_builtins): Add support
> 	for __vector_pair and __vector_quad types.
> 	* config/rs6000/rs6000-cpus.def (OTHER_FUTURE_MASKS): Add
> 	OPTION_MASK_MMA.
> 	(POWERPC_MASKS): Likewise.

Don't see POWERPC_MASKS in the patch here.


> 	* config/rs6000/rs6000-modes.def (OI, XI): New integer modes.
> 	(POI, PXI): New partial integer modes.
> 	* config/rs6000/rs6000.c (TARGET_INVALID_CONVERSION): Define.
> 	(rs6000_hard_regno_nregs_internal): Use VECTOR_ALIGNMENT_P.
> 	(rs6000_hard_regno_mode_ok_uncached): Likewise.
> 	Add support for POImode being allowed in VSX registers and
> 	PXImode being allowed in FP registers.
> 	(rs6000_modes_tieable_p): Adjust comment.
> 	Add support for POImode and PXImode.
> 	(rs6000_debug_reg_global) <print_tieable_modes>: Add OImode,
> 	POImode, XImode and PXImode.
> 	(rs6000_setup_reg_addr_masks): Use VECTOR_ALIGNMENT_P.
> 	Set up appropriate addr_masks for vector pair and vector quad
> 	addresses.
> 	(rs6000_init_hard_regno_mode_ok): Add support for vector pair
> 	and vector quad registers.  Setup reload handlers for POImode
> 	and PXImode.
> 	(rs6000_builtin_mask_calculate): Add support for RS6000_BTM_MMA
> 	and RS6000_BTM_FUTURE.
> 	(rs6000_option_override_internal): Error if -mmma is specified
> 	without -mcpu=future.
> 	(rs6000_slow_unaligned_access): Use VECTOR_ALIGNMENT_P.
> 	(quad_address_p): Change size test to less than 16 bytes.
> 	(reg_offset_addressing_ok_p): Add support for ISA 3.1 vector
> 	pair and vector quad instructions.
> 	(avoiding_indexed_address_p): Likewise.
> 	(rs6000_emit_move): Disallow POImode and PXImode moves
> 	involving constants.
> 	(rs6000_preferred_reload_class): Prefer VSX registers for
> 	POImode and FP registers for PXImode.
> 	(rs6000_split_multireg_move): Support splitting POImode and
> 	PXImode move instructions.  Insert xxmtacc and xxmfacc
> 	instructions when setting a PXImode register and reading a
> 	PXImode register respectively.
> 	(rs6000_mangle_type): Adjust comment.  Add support for mangling
> 	__vector_pair and __vector_quad types.
> 	(rs6000_opt_masks): Add entry for mma.
> 	(rs6000_builtin_mask_names): Add RS6000_BTM_MMA and
> 	RS6000_BTM_FUTURE.
> 	(rs6000_function_value): Use VECTOR_ALIGNMENT_P.
> 	(address_to_insn_form): Likewise.
> 	(reg_to_non_prefixed): Likewise.
> 	(rs6000_invalid_conversion): New function.
> 	* config/rs6000/rs6000.h (MASK_MMA): Define.
> 	(BIGGEST_ALIGNMENT): Set to 512 if MMA support is enabled.
> 	(VECTOR_ALIGNMENT_P): New helper macro.
> 	(ALTIVEC_VECTOR_MODE): Use VECTOR_ALIGNMENT_P.
> 	(RS6000_BTM_MMA): Define.
> 	(RS6000_BTM_COMMON): Add RS6000_BTM_MMA and RS6000_BTM_FUTURE.
> 	(rs6000_builtin_type_index): Add RS6000_BTI_vector_pair and
> 	RS6000_BTI_vector_quad.
> 	(vector_pair_type_node): Define.
> 	(vector_quad_type_node): Likewise.
> 	* config/rs6000/rs6000.md (define_attr "isa"): Add mma.
> 	(define_attr "enabled"): Handle mma.
> 	(define_mode_iterator RELOAD): Add POI and PXI.
> 	Include mma.md.
> 	* config/rs6000/t-rs6000 (MD_INCLUDES): Add mma.md.
> 	* config/rs6000/rs6000.opt (-mmma): New.
> 	* doc/invoke.texi: Document -mmma.

The rest of the ChangeLog looks to match the contents.  OK.


> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> new file mode 100644
> index 00000000000..b35a15a2be1
> --- /dev/null
> +++ b/gcc/config/rs6000/mma.md
> @@ -0,0 +1,128 @@
> +;; Vector Quad, Vector Pair, and MMA patterns.
> +;; Copyright (C) 2020 Free Software Foundation, Inc.
> +;; Contributed by Peter Bergner <bergner@linux.ibm.com> and
> +;;		  Michael Meissner <meissner@linux.ibm.com>
> +
> +;; This file is part of GCC.
> +
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published
> +;; by the Free Software Foundation; either version 3, or (at your
> +;; option) any later version.
> +
> +;; GCC is distributed in the hope that it will be useful, but WITHOUT
> +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +;; License for more details.
> +
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +;; Vector load/store pair operations

This is probably clear later on, but at first blush a blurb here
clarifying MMA and what the modes are may be useful.

(The subsection paragraph from extend.texi may be a good fit.)


> +;; We need to define an OImode move pattern, even though we don't
> +;; enable it, because the machine independent parts of the compiler
> +;; at times uses the large integer modes.
> +;;
> +;; If we enable movoi, the compiler will try and use it.  Unfortunately,
> +;; if it is enabled, it will cause problems on little endian systems
> +;; with code that uses the vector_size attribute, due to endian issues.

So, maybe rearrange as two lines?

Define a (disabled) OImode move pattern so the machine independent
parts of the compiler can use the large integer modes.
FIXME: If the OImode move pattern is enabled, LE systems will have
problems with the vector_size attribute.


> +(define_expand "movoi"
> +  [(set (match_operand:OI 0 "nonimmediate_operand")
> +	(match_operand:OI 1 "input_operand"))]
> +  "0"
> +{
> +  gcc_unreachable ();
> +})

Is it the "0" or the _unreachable() that 'disables' this? 


I've read through the rest of this; nothing else jumped out at me.

thanks
-Will




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] rs6000: Add MMA built-in function definitions
  2020-06-15 19:58 ` [PATCH 2/3] rs6000: Add MMA built-in function definitions Peter Bergner
@ 2020-06-15 22:43   ` will schmidt
  2020-06-16 19:02     ` Peter Bergner
  0 siblings, 1 reply; 12+ messages in thread
From: will schmidt @ 2020-06-15 22:43 UTC (permalink / raw)
  To: Peter Bergner, Segher Boessenkool
  Cc: Bill Schmidt, GCC Patches, David Edelsohn, Michael Meissner

On Mon, 2020-06-15 at 14:58 -0500, Peter Bergner via Gcc-patches wrote:
> This patch adds the actual MMA built-ins.  The MMA accumulators are
> INOUT operands for most MMA instructions, but they are also very
> expensive to move around.  For this reason, we have implemented a
> built-in API where the accumulators are passed using
> pass-by-reference/pointers, so the user won't use one accumulator as
> input and another as output, which would entail a lot of copies.
> However, using pointers gives us poor code generation when we expand
> the built-ins at normal expand time.  We therefore expand the MMA
> built-ins early into gimple, converting the pass-by-reference calls
> to an internal built-in that uses a pass-by-value calling convention,
> where we can enforce that the input and output accumulators are the
> same.  This gives us much better code generation.
> 
> The associated test cases for these built-ins are in patch3.
> 
> This patch plus patch1 passed bootstrap and regtesting with no
> regressions on both powerpc64le-linux and powerpc64-linux.  Ok for
> trunk?
> 
> Peter
> 
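
For reference, the user-facing shape this describes looks roughly like
the sketch below (illustrative only, not code from the patch):

  typedef unsigned char vec_t __attribute__((vector_size(16)));

  void
  update (__vector_quad *acc, vec_t a, vec_t b)
  {
    /* The accumulator is passed by reference and is both the input
       and the output, so no accumulator copies are needed.  */
    __builtin_mma_xvf32gerpp (acc, a, b);
  }
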
> 2020-06-15  Peter Bergner  <bergner@linux.ibm.com>
> 
> gcc/
> 	* config/rs6000/predicates.md (mma_input_operand): New predicate.
> 	* config/rs6000/rs6000-builtin.def (BU_MMA_1, BU_MMA_V2, BU_MMA_3,
> 	BU_MMA_5, BU_MMA_6, BU_VSX_1): Add support macros for defining MMA
> 	built-in functions.
> 	(ASSEMBLE_ACC, ASSEMBLE_PAIR, DISASSEMBLE_ACC, DISASSEMBLE_PAIR,
> 	PMXVBF16GER2, PMXVBF16GER2NN, PMXVBF16GER2NP, PMXVBF16GER2PN,
> 	PMXVBF16GER2PP, PMXVF16GER2, PMXVF16GER2NN, PMXVF16GER2NP,
> 	PMXVF16GER2PN, PMXVF16GER2PP, PMXVF32GER, PMXVF32GERNN,
> 	PMXVF32GERNP, PMXVF32GERPN, PMXVF32GERPP, PMXVF64GER, PMXVF64GERNN,
> 	PMXVF64GERNP, PMXVF64GERPN, PMXVF64GERPP, PMXVI16GER2, PMXVI16GER2PP,
> 	PMXVI16GER2S, PMXVI16GER2SPP, PMXVI4GER8, PMXVI4GER8PP, PMXVI8GER4,
> 	PMXVI8GER4PP, PMXVI8GER4SPP, XVBF16GER2, XVBF16GER2NN, XVBF16GER2NP,
> 	XVBF16GER2PN, XVBF16GER2PP, XVCVBF16SP, XVCVSPBF16, XVF16GER2,
> 	XVF16GER2NN, XVF16GER2NP, XVF16GER2PN, XVF16GER2PP, XVF32GER,
> 	XVF32GERNN, XVF32GERNP, XVF32GERPN, XVF32GERPP, XVF64GER, XVF64GERNN,
> 	XVF64GERNP, XVF64GERPN, XVF64GERPP, XVI16GER2, XVI16GER2PP, XVI16GER2S,
> 	XVI16GER2SPP, XVI4GER8, XVI4GER8PP, XVI8GER4, XVI8GER4PP, XVI8GER4SPP,
> 	XXMFACC, XXMTACC, XXSETACCZ): Add MMA built-ins.

Checked noses; all have been found below.

> 	* config/rs6000/rs6000.c (rs6000_emit_move): Allow zero constants.
> 	(print_operand) <case 'A'>: New output modifier.
> 	(rs6000_split_multireg_move): Add support for inserting accumulator
> 	priming and depriming instructions.  Add support for splitting an
> 	assemble accumulator pattern.
> 	* config/rs6000/rs6000-call.c (mma_init_builtins, mma_expand_builtin,
> 	rs6000_gimple_fold_mma_builtin): New functions.
> 	(RS6000_BUILTIN_M): New macro.
> 	(def_builtin): Handle RS6000_BTC_QUAD and RS6000_BTC_PAIR attributes.
> 	(bdesc_mma): Add new MMA built-in support.
> 	(htm_expand_builtin): Use RS6000_BTC_OPND_MASK.
> 	(rs6000_invalid_builtin): Add handling of RS6000_BTM_FUTURE and
> 	RS6000_BTM_MMA.
> 	(rs6000_builtin_valid_without_lhs): Handle RS6000_BTC_VOID attribute.
> 	(rs6000_gimple_fold_builtin): Call rs6000_builtin_is_supported_p
> 	and rs6000_gimple_fold_mma_builtin.
> 	(rs6000_expand_builtin): Call mma_expand_builtin.
> 	Use RS6000_BTC_OPND_MASK.
> 	(rs6000_init_builtins): Adjust comment.  Call mma_init_builtins.
> 	(htm_init_builtins): Use RS6000_BTC_OPND_MASK.
> 	(builtin_function_type): Handle VSX_BUILTIN_XVCVSPBF16 and
> 	VSX_BUILTIN_XVCVBF16SP.
> 	* config/rs6000/rs6000.h (RS6000_BTC_QUINARY, RS6000_BTC_SENARY,
> 	RS6000_BTC_OPND_MASK, RS6000_BTC_QUAD, RS6000_BTC_PAIR,
> 	RS6000_BTC_QUADPAIR, RS6000_BTC_GIMPLE): New defines.
> 	(RS6000_BTC_PREDICATE, RS6000_BTC_ABS, RS6000_BTC_DST,
> 	RS6000_BTC_TYPE_MASK, RS6000_BTC_ATTR_MASK): Adjust values.
> 	* config/rs6000/mma.md (MAX_MMA_OPERANDS): New define_constant.
> 	(UNSPEC_MMA_ASSEMBLE_ACC, UNSPEC_MMA_PMXVBF16GER2,
> 	UNSPEC_MMA_PMXVBF16GER2NN, UNSPEC_MMA_PMXVBF16GER2NP,
> 	UNSPEC_MMA_PMXVBF16GER2PN, UNSPEC_MMA_PMXVBF16GER2PP,
> 	UNSPEC_MMA_PMXVF16GER2, UNSPEC_MMA_PMXVF16GER2NN,
> 	UNSPEC_MMA_PMXVF16GER2NP, UNSPEC_MMA_PMXVF16GER2PN,
> 	UNSPEC_MMA_PMXVF16GER2PP, UNSPEC_MMA_PMXVF32GER,
> 	UNSPEC_MMA_PMXVF32GERNN, UNSPEC_MMA_PMXVF32GERNP,
> 	UNSPEC_MMA_PMXVF32GERPN, UNSPEC_MMA_PMXVF32GERPP,
> 	UNSPEC_MMA_PMXVF64GER, UNSPEC_MMA_PMXVF64GERNN,
> 	UNSPEC_MMA_PMXVF64GERNP, UNSPEC_MMA_PMXVF64GERPN,
> 	UNSPEC_MMA_PMXVF64GERPP, UNSPEC_MMA_PMXVI16GER2,
> 	UNSPEC_MMA_PMXVI16GER2PP, UNSPEC_MMA_PMXVI16GER2S,
> 	UNSPEC_MMA_PMXVI16GER2SPP, UNSPEC_MMA_PMXVI4GER8,
> 	UNSPEC_MMA_PMXVI4GER8PP, UNSPEC_MMA_PMXVI8GER4,
> 	UNSPEC_MMA_PMXVI8GER4PP, UNSPEC_MMA_PMXVI8GER4SPP,
> 	UNSPEC_MMA_XVBF16GER2, UNSPEC_MMA_XVBF16GER2NN,
> 	UNSPEC_MMA_XVBF16GER2NP, UNSPEC_MMA_XVBF16GER2PN,
> 	UNSPEC_MMA_XVBF16GER2PP, UNSPEC_MMA_XVF16GER2, UNSPEC_MMA_XVF16GER2NN,
> 	UNSPEC_MMA_XVF16GER2NP, UNSPEC_MMA_XVF16GER2PN, UNSPEC_MMA_XVF16GER2PP,
> 	UNSPEC_MMA_XVF32GER, UNSPEC_MMA_XVF32GERNN, UNSPEC_MMA_XVF32GERNP,
> 	UNSPEC_MMA_XVF32GERPN, UNSPEC_MMA_XVF32GERPP, UNSPEC_MMA_XVF64GER,
> 	UNSPEC_MMA_XVF64GERNN, UNSPEC_MMA_XVF64GERNP, UNSPEC_MMA_XVF64GERPN,
> 	UNSPEC_MMA_XVF64GERPP, UNSPEC_MMA_XVI16GER2, UNSPEC_MMA_XVI16GER2PP,
> 	UNSPEC_MMA_XVI16GER2S, UNSPEC_MMA_XVI16GER2SPP, UNSPEC_MMA_XVI4GER8,
> 	UNSPEC_MMA_XVI4GER8PP, UNSPEC_MMA_XVI8GER4, UNSPEC_MMA_XVI8GER4PP,
> 	UNSPEC_MMA_XVI8GER4SPP, UNSPEC_MMA_XXMFACC, UNSPEC_MMA_XXMTACC): New.

ok

> 	(MMA_ACC, MMA_VV, MMA_AVV, MMA_PV, MMA_APV, MMA_VVI4I4I8,
> 	MMA_AVVI4I4I8, MMA_VVI4I4I2, MMA_AVVI4I4I2, MMA_VVI4I4,
> 	MMA_AVVI4I4, MMA_PVI4I2, MMA_APVI4I2, MMA_VVI4I4I4,
> 	MMA_AVVI4I4I4): New define_int_iterator.
> 	(acc, vv, avv, pv, apv, vvi4i4i8, avvi4i4i8, vvi4i4i2,
> 	avvi4i4i2, vvi4i4, avvi4i4, pvi4i2, apvi4i2, vvi4i4i4,
> 	avvi4i4i4): New define_int_attr.
> 	(*movpxi): Add zero constant alternative.
> 	(mma_assemble_pair, mma_assemble_acc): New define_expand.
> 	(*mma_assemble_acc): New define_insn_and_split.
> 	(mma_<acc>, mma_xxsetaccz, mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
> 	mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
> 	mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
> 	mma_<vvi4i4i4>, mma_<avvi4i4i4>): New define_insn.
> 	* config/rs6000/rs6000.md ('type' attribute): Add mma type.

(mma) : New 'type' attribute.




> 	* config/rs6000/vsx.md (UNSPEC_VSX_XVCVBF16SP): New.
> 	(UNSPEC_VSX_XVCVSPBF16): Likewise.
> 	(XVCVBF16): New define_int_iterator.
> 	(xvcvbf16): New define_int_attr.
> 	(vsx_<xvcvbf16>): New define_insn.
> 	* doc/extend.texi: Document the mma built-ins.
> 



I've read through the rest of this patch.  Nothing else jumps out
at me.

Thanks,
-Will

<snip>




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] rs6000: Add testsuite test cases for MMA built-ins.
  2020-06-15 19:59 ` [PATCH 3/3] rs6000: Add testsuite test cases for MMA built-ins Peter Bergner
@ 2020-06-15 22:43   ` will schmidt
  0 siblings, 0 replies; 12+ messages in thread
From: will schmidt @ 2020-06-15 22:43 UTC (permalink / raw)
  To: Peter Bergner, Segher Boessenkool
  Cc: Bill Schmidt, GCC Patches, David Edelsohn, Michael Meissner

On Mon, 2020-06-15 at 14:59 -0500, Peter Bergner via Gcc-patches wrote:
> This patch adds the testsuite test cases for all of the MMA built-ins.
> 
> This patch plus patch1 and patch2 passed bootstrap and regtesting with
> no regressions on both powerpc64le-linux and powerpc64-linux.  Ok for
> trunk?
> 
> Peter
> 
> 2020-06-15  Peter Bergner  <bergner@linux.ibm.com>
> 
> gcc/testsuite/
> 	* gcc.target/powerpc/mma-builtin-1.c: New test.
> 	* gcc.target/powerpc/mma-builtin-2.c: New test.
> 	* gcc.target/powerpc/mma-builtin-3.c: New test.
> 	* gcc.target/powerpc/mma-builtin-4.c: New test.
> 	* gcc.target/powerpc/mma-builtin-5.c: New test.
> 	* gcc.target/powerpc/mma-builtin-6.c: New test.
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> new file mode 100644
> index 00000000000..a971c869095
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> @@ -0,0 +1,313 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_future_ok } */
> +/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
> +


Compared the tests with their scan-assembler stanzas; the tests look
reasonable.

Per the previous patches, the -mma option comes with the -mcpu=future
option automatically, so that's good.
I think it would be good to have an additional test or two to verify
that the -mma and -mno-mma options behave as desired, but that could
be a later add-on.
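Something along these lines, perhaps (a rough sketch only; the option
combination and the empty dg-error pattern, which just expects some
error on that line, are guesses):

  /* { dg-do compile } */
  /* { dg-options "-Wno-psabi -mdejagnu-cpu=future -mno-mma -O2" } */

  typedef unsigned char vec_t __attribute__((vector_size(16)));

  void
  foo (__vector_quad *dst, vec_t a, vec_t b) /* { dg-error "" } */
  {
    __vector_quad acc;
    __builtin_mma_xvf32ger (&acc, a, b);
    *dst = acc;
  }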

Otherwise LGTM.

Thanks
-Will




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-15 22:43   ` will schmidt
@ 2020-06-16 18:59     ` Peter Bergner
  2020-06-16 21:32       ` will schmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Bergner @ 2020-06-16 18:59 UTC (permalink / raw)
  To: will schmidt
  Cc: Segher Boessenkool, Bill Schmidt, GCC Patches, David Edelsohn,
	Michael Meissner

On 6/15/20 5:43 PM, will schmidt wrote:
> On Mon, 2020-06-15 at 14:56 -0500, Peter Bergner via Gcc-patches wrote:
>> 	* config/rs6000/rs6000-cpus.def (OTHER_FUTURE_MASKS): Add
>> 	OPTION_MASK_MMA.
>> 	(POWERPC_MASKS): Likewise.
> 
> Don't see POWERPC_MASKS in the patch here.

It's this hunk:

 /* Support for a future processor's features.  */
@@ -132,6 +133,7 @@
 				 | OPTION_MASK_HTM			\
 				 | OPTION_MASK_ISEL			\
 				 | OPTION_MASK_MFCRF			\
+				 | OPTION_MASK_MMA			\
 				 | OPTION_MASK_MODULO			\
 				 | OPTION_MASK_MULHW			\
 				 | OPTION_MASK_NO_UPDATE		\




>> +;; Vector load/store pair operations
> 
> > This is probably clear later on, but at first blush a blurb here
> > clarifying MMA and what the modes are may be useful.
> > 
> > (The subsection paragraph from extend.texi may be a good fit.)
[snip]
>> +;; We need to define an OImode move pattern, even though we don't
>> +;; enable it, because the machine independent parts of the compiler
>> +;; at times uses the large integer modes.
>> +;;
>> +;; If we enable movoi, the compiler will try and use it.  Unfortunately,
>> +;; if it is enabled, it will cause problems on little endian systems
>> +;; with code that uses the vector_size attribute, due to endian issues.
> 
> So, maybe rearrange as two lines?
> 
> Define a (disabled) OImode move pattern so the machine independent
> parts of the compiler can use the large integer modes.
> FIXME: If the OImode move pattern is enabled, LE systems will have
> problems with the vector_size attribute.

Ok, I'll take a stab at rewording this.


>> +(define_expand "movoi"
>> +  [(set (match_operand:OI 0 "nonimmediate_operand")
>> +	(match_operand:OI 1 "input_operand"))]
>> +  "0"
>> +{
>> +  gcc_unreachable ();
>> +})
> 
> Is it the "0" or the _unreachable() that 'disables' this? 

It's the "0" condition flag that disables it.  The gcc_unreachable() call
is just used to verify we never do.
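
For example, if the condition were something non-zero ("TARGET_MMA"
below is just a hypothetical stand-in), the pattern would be enabled:

  (define_expand "movoi"
    [(set (match_operand:OI 0 "nonimmediate_operand")
          (match_operand:OI 1 "input_operand"))]
    "TARGET_MMA"
  {
    /* Expansion code would go here instead of gcc_unreachable ().  */
  })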


Peter


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] rs6000: Add MMA built-in function definitions
  2020-06-15 22:43   ` will schmidt
@ 2020-06-16 19:02     ` Peter Bergner
  2020-06-16 20:30       ` Segher Boessenkool
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Bergner @ 2020-06-16 19:02 UTC (permalink / raw)
  To: will schmidt
  Cc: Segher Boessenkool, Bill Schmidt, GCC Patches, David Edelsohn,
	Michael Meissner

On 6/15/20 5:43 PM, will schmidt wrote:
> checked noses, all have been found below. 

Thanks for verifying!


>> 	* config/rs6000/rs6000.md ('type' attribute): Add mma type.
> 
> (mma) : New 'type' attribute.

I just copied what someone else did, but agree this is more readable.
Will change.



> I've read through the rest of this patch.  Nothing else jumps out at me.

Thanks

Peter

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] rs6000: Add MMA built-in function definitions
  2020-06-16 19:02     ` Peter Bergner
@ 2020-06-16 20:30       ` Segher Boessenkool
  2020-06-16 21:16         ` Peter Bergner
  0 siblings, 1 reply; 12+ messages in thread
From: Segher Boessenkool @ 2020-06-16 20:30 UTC (permalink / raw)
  To: Peter Bergner
  Cc: will schmidt, Bill Schmidt, GCC Patches, David Edelsohn,
	Michael Meissner

Hi!

On Tue, Jun 16, 2020 at 02:02:36PM -0500, Peter Bergner wrote:
> On 6/15/20 5:43 PM, will schmidt wrote:
> >> 	* config/rs6000/rs6000.md ('type' attribute): Add mma type.
> > 
> > (mma) : New 'type' attribute.
> 
> I just copied what someone else did, but agree this is more readable.
> Will change.

We have had before

        * config/rs6000/rs6000.md (define_attr "type"): New type popcnt.

and

        * config/rs6000/rs6000.md ('type' attribute): Add
        veclogical,veccmpfx,vecexts,vecmove insn types.

(Both are fine; double quotes are a teeny bit better.)  The thing that
is changed is not named "mma"; it is named "type".

"New value "mma"." or similar, maybe.  The important thing is that what
is in () is the thing that is modified, and that you mention the exact
name of the value added (spelled in full).

(Not that it matters much in this case anyway: no one will search the
changelog for "type" or even "\<type\>" or "\<type\>.*):".  Although,
hrm, I did search for that in "git log config/rs6000", but most commits
do not have changelog entries in the commit message yet.)


Segher

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] rs6000: Add MMA built-in function definitions
  2020-06-16 20:30       ` Segher Boessenkool
@ 2020-06-16 21:16         ` Peter Bergner
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Bergner @ 2020-06-16 21:16 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: will schmidt, Bill Schmidt, GCC Patches, David Edelsohn,
	Michael Meissner

On 6/16/20 3:30 PM, Segher Boessenkool wrote:
> We have had before
> 
>         * config/rs6000/rs6000.md (define_attr "type"): New type popcnt.
> 
> and
> 
>         * config/rs6000/rs6000.md ('type' attribute): Add
>         veclogical,veccmpfx,vecexts,vecmove insn types.
> 
> (Both are fine, double quotes is a teeny bit better).  The thing that
> is changed is not named "mma", it is named "type".
> 
> "New value "mma"." or similar, maybe.  The important thing is that what
> is in () is the thing that is modified, and that you mention the exact
> name of the value added (spelled in full).

You are right, "mma" was not the thing that was modified.
Yeah, I like '(define_attr "type"): New type mma.' better too.

Peter

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] rs6000: Add base support and types for defining MMA built-ins.
  2020-06-16 18:59     ` Peter Bergner
@ 2020-06-16 21:32       ` will schmidt
  0 siblings, 0 replies; 12+ messages in thread
From: will schmidt @ 2020-06-16 21:32 UTC (permalink / raw)
  To: Peter Bergner
  Cc: Segher Boessenkool, Bill Schmidt, GCC Patches, David Edelsohn,
	Michael Meissner

On Tue, 2020-06-16 at 13:59 -0500, Peter Bergner wrote:
> On 6/15/20 5:43 PM, will schmidt wrote:
> > On Mon, 2020-06-15 at 14:56 -0500, Peter Bergner via Gcc-patches wrote:
> > > 	* config/rs6000/rs6000-cpus.def (OTHER_FUTURE_MASKS): Add
> > > 	OPTION_MASK_MMA.
> > > 	(POWERPC_MASKS): Likewise.
> > 
> > Don't see POWERPC_MASKS in the patch here.
> 
> It's this hunk:
> 
>  /* Support for a future processor's features.  */
> @@ -132,6 +133,7 @@
>  				 | OPTION_MASK_HTM			\
>  				 | OPTION_MASK_ISEL			\
>  				 | OPTION_MASK_MFCRF			\
> +				 | OPTION_MASK_MMA			\
>  				 | OPTION_MASK_MODULO			\
>  				 | OPTION_MASK_MULHW			\
>  				 | OPTION_MASK_NO_UPDATE		\
> 

I see it now, my bad.  (#define POWERPC_MASKS was outside of the diff
context).  :-)


> 
> 
> 
> > > +;; Vector load/store pair operations
> > 
> > This is probably clear later on, but at first blush a blurb here
> > clarifying MMA and what the modes are may be useful.
> > 
> > (The subsection paragraph from extend.texi may be a good fit.)
> 
> [snip]
> > > +;; We need to define an OImode move pattern, even though we don't
> > > +;; enable it, because the machine independent parts of the compiler
> > > +;; at times uses the large integer modes.
> > > +;;
> > > +;; If we enable movoi, the compiler will try and use it.  Unfortunately,
> > > +;; if it is enabled, it will cause problems on little endian systems
> > > +;; with code that uses the vector_size attribute, due to endian issues.
> > 
> > So, maybe rearrange as two lines?
> > 
> > Define a (disabled) OImode move pattern so the machine independent
> > parts of the compiler can use the large integer modes.
> > FIXME: If the OImode move pattern is enabled, LE systems will have
> > problems with the vector_size attribute.
> 
> Ok, I'll take a stab at rewording this.
> 
> 
> > > +(define_expand "movoi"
> > > +  [(set (match_operand:OI 0 "nonimmediate_operand")
> > > +	(match_operand:OI 1 "input_operand"))]
> > > +  "0"
> > > +{
> > > +  gcc_unreachable ();
> > > +})
> > 
> > Is it the "0" or the _unreachable() that 'disables' this? 
> 
> It's the "0" condition flag that disables it.  The gcc_unreachable() call
> is just used to verify we never do.


Ok,
Thanks,
-Will



> 
> 
> Peter
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-06-16 21:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-15 19:54 [PATCH 0/3] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
2020-06-15 19:56 ` [PATCH 1/3] rs6000: Add base support and types for defining MMA built-ins Peter Bergner
2020-06-15 22:43   ` will schmidt
2020-06-16 18:59     ` Peter Bergner
2020-06-16 21:32       ` will schmidt
2020-06-15 19:58 ` [PATCH 2/3] rs6000: Add MMA built-in function definitions Peter Bergner
2020-06-15 22:43   ` will schmidt
2020-06-16 19:02     ` Peter Bergner
2020-06-16 20:30       ` Segher Boessenkool
2020-06-16 21:16         ` Peter Bergner
2020-06-15 19:59 ` [PATCH 3/3] rs6000: Add testsuite test cases for MMA built-ins Peter Bergner
2020-06-15 22:43   ` will schmidt
