* [PATCH 1/3, v2] rs6000: Add base support and types for defining MMA built-ins.
2020-06-18 20:42 [PATCH 0/3, v2] rs6000: Add support for Matrix-Multiply Assist (MMA) built-in functions Peter Bergner
@ 2020-06-18 20:44 ` Peter Bergner
2020-06-18 23:44 ` Segher Boessenkool
2020-06-18 20:45 ` [PATCH 2/3, v2] rs6000: Add MMA built-in function definitions Peter Bergner
` (2 subsequent siblings)
3 siblings, 1 reply; 19+ messages in thread
From: Peter Bergner @ 2020-06-18 20:44 UTC (permalink / raw)
To: Segher Boessenkool
Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
Will Schmidt
Changes since v1:
- Modified verbiage in mma.md per Will's suggestion.
- Modified rs6000_split_multireg_move to correctly handle BE PXImode
and POImode moves.
This patch adds the new -mmma option as well as the initial MMA support,
which includes the target specific __vector_pair and __vector_quad types,
the POImode and PXImode partial integer modes they are mapped to, and their
associated move patterns. Support for the restrictions on the registers
these modes can be assigned to has also been added.
The v1 patch passed bootstrap and regtesting with no regressions on
powerpc64le-linux. This updated patch is bootstrapping and regtesting
on powerpc64le-linux. Ok for trunk if there are no regressions?
Peter
2020-06-18 Peter Bergner <bergner@linux.ibm.com>
Michael Meissner <meissner@linux.ibm.com>
gcc/
* config/rs6000/mma.md: New file.
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
__MMA__ for mma.
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Add support
for __vector_pair and __vector_quad types.
* config/rs6000/rs6000-cpus.def (OTHER_FUTURE_MASKS): Add
OPTION_MASK_MMA.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000-modes.def (OI, XI): New integer modes.
(POI, PXI): New partial integer modes.
* config/rs6000/rs6000.c (TARGET_INVALID_CONVERSION): Define.
(rs6000_hard_regno_nregs_internal): Use VECTOR_ALIGNMENT_P.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
Add support for POImode being allowed in VSX registers and PXImode
being allowed in FP registers.
(rs6000_modes_tieable_p): Adjust comment.
Add support for POImode and PXImode.
(rs6000_debug_reg_global) <print_tieable_modes>: Add OImode, POImode,
XImode and PXImode.
(rs6000_setup_reg_addr_masks): Use VECTOR_ALIGNMENT_P.
Set up appropriate addr_masks for vector pair and vector quad addresses.
(rs6000_init_hard_regno_mode_ok): Add support for vector pair and
vector quad registers. Set up reload handlers for POImode and PXImode.
(rs6000_builtin_mask_calculate): Add support for RS6000_BTM_MMA
and RS6000_BTM_FUTURE.
(rs6000_option_override_internal): Error if -mmma is specified
without -mcpu=future.
(rs6000_slow_unaligned_access): Use VECTOR_ALIGNMENT_P.
(quad_address_p): Change size test to less than 16 bytes.
(reg_offset_addressing_ok_p): Add support for ISA 3.1 vector pair
and vector quad instructions.
(avoiding_indexed_address_p): Likewise.
(rs6000_emit_move): Disallow POImode and PXImode moves involving
constants.
(rs6000_preferred_reload_class): Prefer VSX registers for POImode
and FP registers for PXImode.
(rs6000_split_multireg_move): Support splitting POImode and PXImode
move instructions. Insert xxmtacc and xxmfacc instructions when
setting a PXImode register and reading a PXImode register respectively.
(rs6000_mangle_type): Adjust comment. Add support for mangling
__vector_pair and __vector_quad types.
(rs6000_opt_masks): Add entry for mma.
(rs6000_builtin_mask_names): Add RS6000_BTM_MMA and RS6000_BTM_FUTURE.
(rs6000_function_value): Use VECTOR_ALIGNMENT_P.
(address_to_insn_form): Likewise.
(reg_to_non_prefixed): Likewise.
(rs6000_invalid_conversion): New function.
* config/rs6000/rs6000.h (MASK_MMA): Define.
(BIGGEST_ALIGNMENT): Set to 512 if MMA support is enabled.
(VECTOR_ALIGNMENT_P): New helper macro.
(ALTIVEC_VECTOR_MODE): Use VECTOR_ALIGNMENT_P.
(RS6000_BTM_MMA): Define.
(RS6000_BTM_COMMON): Add RS6000_BTM_MMA and RS6000_BTM_FUTURE.
(rs6000_builtin_type_index): Add RS6000_BTI_vector_pair and
RS6000_BTI_vector_quad.
(vector_pair_type_node): Define.
(vector_quad_type_node): Likewise.
* config/rs6000/rs6000.md (define_attr "isa"): Add mma.
(define_attr "enabled"): Handle mma.
(define_mode_iterator RELOAD): Add POI and PXI.
Include mma.md.
* config/rs6000/t-rs6000 (MD_INCLUDES): Add mma.md.
* config/rs6000/rs6000.opt (-mmma): New.
* doc/invoke.texi: Document -mmma.
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
new file mode 100644
index 00000000000..66c3cb5f2dc
--- /dev/null
+++ b/gcc/config/rs6000/mma.md
@@ -0,0 +1,126 @@
+;; Matrix-Multiply Assist (MMA) patterns.
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;; Contributed by Peter Bergner <bergner@linux.ibm.com> and
+;; Michael Meissner <meissner@linux.ibm.com>
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; The MMA patterns use the multi-register PXImode and POImode partial
+;; integer modes to implement the target specific __vector_quad and
+;; __vector_pair types that the MMA built-in functions reference.
+;; To use these modes, we must define XImode and OImode move patterns
+;; so the independent parts of the compiler can use our large partial
+;; integer modes. However, if we enable the XImode and OImode move
+;; patterns, then the compiler will attempt to use them and this can
+;; cause byte swapping issues on little-endian systems. We don't need
+;; the XImode and OImode move patterns for actual code generation;
+;; therefore, we define the XImode and OImode move patterns, but we
+;; disable their use with a "false" condition flag.
+
+;; Define a disabled OImode move pattern, so we can use POImode.
+(define_expand "movoi"
+ [(set (match_operand:OI 0 "nonimmediate_operand")
+ (match_operand:OI 1 "input_operand"))]
+ "0"
+{
+ gcc_unreachable ();
+})
+
+;; Vector pair support. POImode is only defined for vector registers.
+(define_expand "movpoi"
+ [(set (match_operand:POI 0 "nonimmediate_operand")
+ (match_operand:POI 1 "input_operand"))]
+ "TARGET_MMA"
+{
+ rs6000_emit_move (operands[0], operands[1], POImode);
+ DONE;
+})
+
+(define_insn_and_split "*movpoi"
+ [(set (match_operand:POI 0 "nonimmediate_operand" "=wa,m,wa")
+ (match_operand:POI 1 "input_operand" "m,wa,wa"))]
+ "TARGET_MMA
+ && (gpc_reg_operand (operands[0], POImode)
+ || gpc_reg_operand (operands[1], POImode))"
+ "@
+ lxvp%X1 %x0,%1
+ stxvp%X0 %x1,%0
+ #"
+ "&& reload_completed
+ && (!MEM_P (operands[0]) && !MEM_P (operands[1]))"
+ [(const_int 0)]
+{
+ rs6000_split_multireg_move (operands[0], operands[1]);
+ DONE;
+}
+ [(set_attr "type" "vecload,vecstore,veclogical")
+ (set_attr "length" "*,*,8")])
+
+;; Special pattern to prevent DSE from generating an internal error if it
+;; notices a structure copy that it wants to eliminate. This generates pretty
+;; bad code, but at least it doesn't die.
+(define_insn_and_split "truncpoidi2"
+ [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+ (truncate:DI (match_operand:POI 1 "gpc_reg_operand" "wa")))]
+ "TARGET_MMA"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (vec_select:DI (match_dup 2)
+ (parallel [(match_dup 3)])))]
+{
+ unsigned r = reg_or_subregno (operands[1]) + !BYTES_BIG_ENDIAN;
+ operands[2] = gen_rtx_REG (V2DImode, r);
+ operands[3] = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
+})
+
+\f
+;; Define a disabled XImode move pattern, so we can use PXImode.
+(define_expand "movxi"
+ [(set (match_operand:XI 0 "nonimmediate_operand")
+ (match_operand:XI 1 "input_operand"))]
+ "0"
+{
+ gcc_unreachable ();
+})
+
+;; Vector quad support. PXImode is only defined for floating point registers.
+(define_expand "movpxi"
+ [(set (match_operand:PXI 0 "nonimmediate_operand")
+ (match_operand:PXI 1 "input_operand"))]
+ "TARGET_MMA"
+{
+ rs6000_emit_move (operands[0], operands[1], PXImode);
+ DONE;
+})
+
+(define_insn_and_split "*movpxi"
+ [(set (match_operand:PXI 0 "nonimmediate_operand" "=d,m,d")
+ (match_operand:PXI 1 "input_operand" "m,d,d"))]
+ "TARGET_MMA
+ && (gpc_reg_operand (operands[0], PXImode)
+ || gpc_reg_operand (operands[1], PXImode))"
+ "#"
+ "&& reload_completed"
+ [(const_int 0)]
+{
+ rs6000_split_multireg_move (operands[0], operands[1]);
+ DONE;
+}
+ [(set_attr "type" "vecload,vecstore,veclogical")
+ (set_attr "length" "8,8,16")
+ (set_attr "max_prefixed_insns" "2,2,*")])
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 07ca33a89b4..47514552449 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -593,6 +593,10 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
PROCESSOR_CELL) (e.g. -mcpu=cell). */
if ((bu_mask & RS6000_BTM_CELL) != 0)
rs6000_define_or_undefine_macro (define_p, "__PPU__");
+
+ /* Tell the user if we support the MMA instructions. */
+ if ((flags & OPTION_MASK_MMA) != 0)
+ rs6000_define_or_undefine_macro (define_p, "__MMA__");
}
void
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 817a14c9c0d..eeb20e5200d 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -12205,6 +12205,24 @@ rs6000_init_builtins (void)
else
ieee128_float_type_node = ibm128_float_type_node = long_double_type_node;
+ /* Vector paired and vector quad support. */
+ if (TARGET_MMA)
+ {
+ tree oi_uns_type = make_unsigned_type (256);
+ vector_pair_type_node = build_distinct_type_copy (oi_uns_type);
+ SET_TYPE_MODE (vector_pair_type_node, POImode);
+ layout_type (vector_pair_type_node);
+ lang_hooks.types.register_builtin_type (vector_pair_type_node,
+ "__vector_pair");
+
+ tree xi_uns_type = make_unsigned_type (512);
+ vector_quad_type_node = build_distinct_type_copy (xi_uns_type);
+ SET_TYPE_MODE (vector_quad_type_node, PXImode);
+ layout_type (vector_quad_type_node);
+ lang_hooks.types.register_builtin_type (vector_quad_type_node,
+ "__vector_quad");
+ }
+
/* Initialize the modes for builtin_function_type, mapping a machine mode to
tree type node. */
builtin_mode_to_type[QImode][0] = integer_type_node;
@@ -12236,6 +12254,8 @@ rs6000_init_builtins (void)
builtin_mode_to_type[V8HImode][1] = unsigned_V8HI_type_node;
builtin_mode_to_type[V16QImode][0] = V16QI_type_node;
builtin_mode_to_type[V16QImode][1] = unsigned_V16QI_type_node;
+ builtin_mode_to_type[POImode][1] = vector_pair_type_node;
+ builtin_mode_to_type[PXImode][1] = vector_quad_type_node;
tdecl = add_builtin_type ("__bool char", bool_char_type_node);
TYPE_NAME (bool_char_type_node) = tdecl;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 83362e05b10..667c7ecefb8 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -76,7 +76,8 @@
| OPTION_MASK_P9_VECTOR)
/* Flags that need to be turned off if -mno-future. */
-#define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL \
+#define OTHER_FUTURE_MASKS (OPTION_MASK_MMA \
+ | OPTION_MASK_PCREL \
| OPTION_MASK_PREFIXED)
/* Support for a future processor's features. */
@@ -132,6 +133,7 @@
| OPTION_MASK_HTM \
| OPTION_MASK_ISEL \
| OPTION_MASK_MFCRF \
+ | OPTION_MASK_MMA \
| OPTION_MASK_MODULO \
| OPTION_MASK_MULHW \
| OPTION_MASK_NO_UPDATE \
diff --git a/gcc/config/rs6000/rs6000-modes.def b/gcc/config/rs6000/rs6000-modes.def
index 5f43cadff80..ddb218b3fba 100644
--- a/gcc/config/rs6000/rs6000-modes.def
+++ b/gcc/config/rs6000/rs6000-modes.def
@@ -82,3 +82,13 @@ VECTOR_MODE (INT, SI, 2); /* V2SI */
for quad memory atomic operations to force getting an even/odd register
combination. */
PARTIAL_INT_MODE (TI, 128, PTI);
+
+/* Define, but don't use, the larger integer modes. We need an integer mode
+ defined that is the same size as the vector pair and vector quad modes. */
+
+INT_MODE (OI, 32);
+INT_MODE (XI, 64);
+
+/* Modes used by __vector_pair and __vector_quad. */
+PARTIAL_INT_MODE (OI, 256, POI); /* __vector_pair. */
+PARTIAL_INT_MODE (XI, 512, PXI); /* __vector_quad. */
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 58f5d780603..a0f4991d00a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1745,6 +1745,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
#undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P
#define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P \
rs6000_cannot_substitute_mem_equiv_p
+
+#undef TARGET_INVALID_CONVERSION
+#define TARGET_INVALID_CONVERSION rs6000_invalid_conversion
\f
/* Processor table. */
@@ -1798,7 +1801,7 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
128-bit floating point that can go in vector registers, which has VSX
memory addressing. */
if (FP_REGNO_P (regno))
- reg_size = (VECTOR_MEM_VSX_P (mode) || FLOAT128_VECTOR_P (mode)
+ reg_size = (VECTOR_MEM_VSX_P (mode) || VECTOR_ALIGNMENT_P (mode)
? UNITS_PER_VSX_WORD
: UNITS_PER_FP_WORD);
@@ -1821,6 +1824,20 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
if (COMPLEX_MODE_P (mode))
mode = GET_MODE_INNER (mode);
+ /* Vector pair modes need even/odd VSX register pairs. Only allow vector
+ registers. We need to allow OImode to have the same registers as POImode,
+ even though we do not enable the move pattern for OImode. */
+ if (mode == POImode || mode == OImode)
+ return (TARGET_MMA && VSX_REGNO_P (regno)
+ && (regno & 1) == 0);
+
+ /* MMA accumulator modes need FPR registers divisible by 4. We need to allow
+ XImode to have the same registers as PXImode, even though we do not enable
+ the move pattern for XImode. */
+ if (mode == PXImode || mode == XImode)
+ return (TARGET_MMA && FP_REGNO_P (regno)
+ && (regno & 3) == 0);
+
/* PTImode can only go in GPRs. Quad word memory operations require even/odd
register combinations, and use PTImode where we need to deal with quad
word memory operations. Don't allow quad words in the argument or frame
@@ -1836,7 +1853,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
asked for it. */
if (TARGET_VSX && VSX_REGNO_P (regno)
&& (VECTOR_MEM_VSX_P (mode)
- || FLOAT128_VECTOR_P (mode)
+ || VECTOR_ALIGNMENT_P (mode)
|| reg_addr[mode].scalar_in_vmx_p
|| mode == TImode
|| (TARGET_VADDUQM && mode == V1TImode)))
@@ -1846,7 +1863,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
if (ALTIVEC_REGNO_P (regno))
{
- if (GET_MODE_SIZE (mode) != 16 && !reg_addr[mode].scalar_in_vmx_p)
+ if (GET_MODE_SIZE (mode) < 16 && !reg_addr[mode].scalar_in_vmx_p)
return 0;
return ALTIVEC_REGNO_P (last_regno);
@@ -1862,7 +1879,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
modes and DImode. */
if (FP_REGNO_P (regno))
{
- if (FLOAT128_VECTOR_P (mode))
+ if (VECTOR_ALIGNMENT_P (mode))
return false;
if (SCALAR_FLOAT_MODE_P (mode)
@@ -1925,15 +1942,19 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
GPR registers, and TImode can go in any GPR as well as VSX registers (PR
57744).
+ Similarly, don't allow POImode (vector pair, restricted to even VSX
+ registers) or PXImode (vector quad, restricted to FPR registers divisible
+ by 4) to tie with other modes.
+
Altivec/VSX vector tests were moved ahead of scalar float mode, so that IEEE
128-bit floating point on VSX systems ties with other vectors. */
static bool
rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
{
- if (mode1 == PTImode)
- return mode2 == PTImode;
- if (mode2 == PTImode)
+ if (mode1 == PTImode || mode1 == POImode || mode1 == PXImode)
+ return mode1 == mode2;
+ if (mode2 == PTImode || mode2 == POImode || mode2 == PXImode)
return false;
if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
@@ -2206,6 +2227,8 @@ rs6000_debug_reg_global (void)
SDmode,
DDmode,
TDmode,
+ V2SImode,
+ V2SFmode,
V16QImode,
V8HImode,
V4SImode,
@@ -2220,9 +2243,14 @@ rs6000_debug_reg_global (void)
V2DFmode,
V8SFmode,
V4DFmode,
+ OImode,
+ XImode,
+ POImode,
+ PXImode,
CCmode,
CCUNSmode,
CCEQmode,
+ CCFPmode,
};
/* Virtual regs we are interested in. */
@@ -2619,7 +2647,7 @@ rs6000_setup_reg_addr_masks (void)
&& (rc == RELOAD_REG_GPR || rc == RELOAD_REG_FPR)
&& msize <= 8
&& !VECTOR_MODE_P (m2)
- && !FLOAT128_VECTOR_P (m2)
+ && !VECTOR_ALIGNMENT_P (m2)
&& !complex_p
&& (m != E_DFmode || !TARGET_VSX)
&& (m != E_SFmode || !TARGET_P8_VECTOR)
@@ -2675,6 +2703,22 @@ rs6000_setup_reg_addr_masks (void)
addr_mask |= RELOAD_REG_QUAD_OFFSET;
}
+ /* Vector pairs can do both indexed and offset loads if the
+ instructions are enabled, otherwise they can only do offset loads
+ since it will be broken into two vector moves. Vector quads can
+ only do offset loads. */
+ else if ((addr_mask != 0) && TARGET_MMA
+ && (m2 == POImode || m2 == PXImode))
+ {
+ addr_mask |= RELOAD_REG_OFFSET;
+ if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
+ {
+ addr_mask |= RELOAD_REG_QUAD_OFFSET;
+ if (m2 == POImode)
+ addr_mask |= RELOAD_REG_INDEXED;
+ }
+ }
+
/* VMX registers can do (REG & -16) and ((REG+REG) & -16)
addressing on 128-bit types. */
if (rc == RELOAD_REG_VMX && msize == 16
@@ -2876,6 +2920,18 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
rs6000_vector_align[TImode] = align64;
}
+ /* Add support for vector pairs and vector quad registers. */
+ if (TARGET_MMA)
+ {
+ for (m = 0; m < NUM_MACHINE_MODES; ++m)
+ if (m == POImode || m == PXImode)
+ {
+ rs6000_vector_unit[m] = VECTOR_NONE;
+ rs6000_vector_mem[m] = VECTOR_VSX;
+ rs6000_vector_align[m] = (m == POImode) ? 256 : 512;
+ }
+ }
+
/* Register class constraints for the constraints that depend on compile
switches. When the VSX code was added, different constraints were added
based on the type (DFmode, V2DFmode, V4SFmode). For the vector types, all
@@ -3007,6 +3063,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
reg_addr[TFmode].reload_gpr_vsx = CODE_FOR_reload_gpr_from_vsxtf;
reg_addr[TFmode].reload_vsx_gpr = CODE_FOR_reload_vsx_from_gprtf;
}
+
+ if (TARGET_MMA)
+ {
+ reg_addr[POImode].reload_store = CODE_FOR_reload_poi_di_store;
+ reg_addr[POImode].reload_load = CODE_FOR_reload_poi_di_load;
+ reg_addr[PXImode].reload_store = CODE_FOR_reload_pxi_di_store;
+ reg_addr[PXImode].reload_load = CODE_FOR_reload_pxi_di_load;
+ }
}
}
else
@@ -3339,7 +3403,8 @@ rs6000_builtin_mask_calculate (void)
&& !TARGET_IEEEQUAD) ? RS6000_BTM_LDBL128 : 0)
| ((TARGET_FLOAT128_TYPE) ? RS6000_BTM_FLOAT128 : 0)
| ((TARGET_FLOAT128_HW) ? RS6000_BTM_FLOAT128_HW : 0)
- | ((TARGET_FUTURE) ? RS6000_BTM_FUTURE : 0));
+ | ((TARGET_MMA) ? RS6000_BTM_MMA : 0)
+ | ((TARGET_FUTURE) ? RS6000_BTM_FUTURE : 0));
}
/* Implement TARGET_MD_ASM_ADJUST. All asm statements are considered
@@ -4202,6 +4267,15 @@ rs6000_option_override_internal (bool global_init_p)
rs6000_isa_flags &= ~OPTION_MASK_PCREL;
}
+ /* Turn off vector pair/mma options on non-future systems. */
+ if (!TARGET_FUTURE && TARGET_MMA)
+ {
+ if ((rs6000_isa_flags_explicit & OPTION_MASK_MMA) != 0)
+ error ("%qs requires %qs", "-mmma", "-mcpu=future");
+
+ rs6000_isa_flags &= ~OPTION_MASK_MMA;
+ }
+
if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
@@ -7175,7 +7249,7 @@ rs6000_slow_unaligned_access (machine_mode mode, unsigned int align)
return (STRICT_ALIGNMENT
|| (!TARGET_EFFICIENT_UNALIGNED_VSX
&& ((SCALAR_FLOAT_MODE_NOT_VECTOR_P (mode) && align < 32)
- || ((VECTOR_MODE_P (mode) || FLOAT128_VECTOR_P (mode))
+ || ((VECTOR_MODE_P (mode) || VECTOR_ALIGNMENT_P (mode))
&& (int) align < VECTOR_ALIGN (mode)))));
}
@@ -7360,7 +7434,7 @@ quad_address_p (rtx addr, machine_mode mode, bool strict)
{
rtx op0, op1;
- if (GET_MODE_SIZE (mode) != 16)
+ if (GET_MODE_SIZE (mode) < 16)
return false;
if (legitimate_indirect_address_p (addr, strict))
@@ -7678,6 +7752,12 @@ reg_offset_addressing_ok_p (machine_mode mode)
return mode_supports_dq_form (mode);
break;
+ /* The vector pair/quad types support offset addressing if the
+ underlying vectors support offset addressing. */
+ case E_POImode:
+ case E_PXImode:
+ return TARGET_MMA;
+
case E_SDmode:
/* If we can do direct load/stores of SDmode, restrict it to reg+reg
addressing for the LFIWZX and STFIWX instructions. */
@@ -8024,8 +8104,14 @@ legitimate_indexed_address_p (rtx x, int strict)
bool
avoiding_indexed_address_p (machine_mode mode)
{
- /* Avoid indexed addressing for modes that have non-indexed
- load/store instruction forms. */
+ unsigned int msize = GET_MODE_SIZE (mode);
+
+ /* Avoid indexed addressing for modes that have non-indexed load/store
+ instruction forms. On the future system, vector pairs have an indexed
+ form, but vector quads don't. */
+ if (msize > 16)
+ return msize != 32;
+
return (TARGET_AVOID_XFORM && VECTOR_MEM_NONE_P (mode));
}
@@ -9856,6 +9942,13 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
operands[1] = force_const_mem (mode, operands[1]);
break;
+ case E_POImode:
+ case E_PXImode:
+ if (CONSTANT_P (operands[1]))
+ error ("%qs is an opaque type, and you cannot set it to other values",
+ (mode == POImode) ? "__vector_pair" : "__vector_quad");
+ break;
+
case E_SImode:
case E_DImode:
/* Use default pattern for address of ELF small data */
@@ -12117,8 +12210,20 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
return NO_REGS;
}
- if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS)
- return GENERAL_REGS;
+ /* For the vector pair and vector quad modes, prefer their natural register
+ (VSX or FPR) rather than GPR registers. For other integer types, prefer
+ the GPR registers. */
+ if (rclass == GEN_OR_FLOAT_REGS)
+ {
+ if (mode == POImode)
+ return VSX_REGS;
+
+ if (mode == PXImode)
+ return FLOAT_REGS;
+
+ if (GET_MODE_CLASS (mode) == MODE_INT)
+ return GENERAL_REGS;
+ }
return rclass;
}
@@ -15793,7 +15898,23 @@ rs6000_split_multireg_move (rtx dst, rtx src)
reg = REG_P (dst) ? REGNO (dst) : REGNO (src);
mode = GET_MODE (dst);
nregs = hard_regno_nregs (reg, mode);
- if (FP_REGNO_P (reg))
+ /* If we have a quad vector register for MMA, and this is a load or store,
+ see if we can use vector paired load/stores. */
+ if (mode == PXImode && TARGET_MMA
+ && (MEM_P (dst) || MEM_P (src)))
+ {
+ reg_mode = POImode;
+ nregs /= hard_regno_nregs (reg, reg_mode);
+ }
+
+ /* If we have a vector pair/quad mode, split it into two/four separate
+ vectors. */
+ else if (mode == POImode || mode == PXImode)
+ {
+ reg_mode = V1TImode;
+ nregs /= hard_regno_nregs (reg, reg_mode);
+ }
+ else if (FP_REGNO_P (reg))
reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode :
(TARGET_HARD_FLOAT ? DFmode : SFmode);
else if (ALTIVEC_REGNO_P (reg))
@@ -15837,6 +15958,51 @@ rs6000_split_multireg_move (rtx dst, rtx src)
return;
}
+ /* The __vector_pair and __vector_quad modes are multi-register modes,
+ so if we have to load or store the registers, we have to be careful to
+ properly swap them if we're in little-endian mode below. This means
+ the last register gets the first memory location. */
+ if (mode == POImode || mode == PXImode)
+ {
+ if (MEM_P (dst))
+ {
+ unsigned offset = 0;
+ unsigned size = GET_MODE_SIZE (reg_mode);
+
+ for (int i = 0; i < nregs; i++)
+ {
+ unsigned subreg = (WORDS_BIG_ENDIAN)
+ ? i * size : (nregs - 1 - i) * size;
+ rtx dst2 = adjust_address (dst, reg_mode, offset);
+ rtx src2 = simplify_gen_subreg (reg_mode, src, mode, subreg);
+ offset += size;
+ emit_insn (gen_rtx_SET (dst2, src2));
+ }
+
+ return;
+ }
+
+ if (MEM_P (src))
+ {
+ unsigned offset = 0;
+ unsigned size = GET_MODE_SIZE (reg_mode);
+
+ for (int i = 0; i < nregs; i++)
+ {
+ unsigned subreg = (WORDS_BIG_ENDIAN)
+ ? i * size : (nregs - 1 - i) * size;
+ rtx dst2 = simplify_gen_subreg (reg_mode, dst, mode, subreg);
+ rtx src2 = adjust_address (src, reg_mode, offset);
+ offset += size;
+ emit_insn (gen_rtx_SET (dst2, src2));
+ }
+
+ return;
+ }
+
+ /* Register -> register moves can use common code. */
+ }
+
if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
{
/* Move register range backwards, if we might have destructive
@@ -19227,7 +19393,8 @@ rs6000_handle_altivec_attribute (tree *node,
/* AltiVec defines five built-in scalar types that serve as vector
elements; we must teach the compiler how to mangle them. The 128-bit
- floating point mangling is target-specific as well. */
+ floating point mangling is target-specific as well. MMA defines
+ two built-in types to be used as opaque vector types. */
static const char *
rs6000_mangle_type (const_tree type)
@@ -19249,6 +19416,9 @@ rs6000_mangle_type (const_tree type)
if (SCALAR_FLOAT_TYPE_P (type) && FLOAT128_IEEE_P (TYPE_MODE (type)))
return ieee128_mangling_gcc_8_1 ? "U10__float128" : "u9__ieee128";
+ if (type == vector_pair_type_node) return "u13__vector_pair";
+ if (type == vector_quad_type_node) return "u13__vector_quad";
+
/* For all other types, use the default mangling. */
return NULL;
}
@@ -22506,7 +22676,7 @@ rs6000_function_value (const_tree valtype,
/* VSX is a superset of Altivec and adds V2DImode/V2DFmode. Since the same
return register is used in both cases, and we won't see V2DImode/V2DFmode
for pure altivec, combine the two cases. */
- else if ((TREE_CODE (valtype) == VECTOR_TYPE || FLOAT128_VECTOR_P (mode))
+ else if ((TREE_CODE (valtype) == VECTOR_TYPE || VECTOR_ALIGNMENT_P (mode))
&& TARGET_ALTIVEC && TARGET_ALTIVEC_ABI
&& ALTIVEC_OR_VSX_VECTOR_MODE (mode))
regno = ALTIVEC_ARG_RETURN;
@@ -22922,6 +23092,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
{ "isel", OPTION_MASK_ISEL, false, true },
{ "mfcrf", OPTION_MASK_MFCRF, false, true },
{ "mfpgpr", 0, false, true },
+ { "mma", OPTION_MASK_MMA, false, true },
{ "modulo", OPTION_MASK_MODULO, false, true },
{ "mulhw", OPTION_MASK_MULHW, false, true },
{ "multiple", OPTION_MASK_MULTIPLE, false, true },
@@ -22992,6 +23163,8 @@ static struct rs6000_opt_mask const rs6000_builtin_mask_names[] =
{ "powerpc64", RS6000_BTM_POWERPC64, false, false },
{ "float128", RS6000_BTM_FLOAT128, false, false },
{ "float128-hw", RS6000_BTM_FLOAT128_HW,false, false },
+ { "mma", RS6000_BTM_MMA, false, false },
+ { "future", RS6000_BTM_FUTURE, false, false },
};
/* Option variables that we want to support inside attribute((target)) and
@@ -24947,7 +25120,7 @@ address_to_insn_form (rtx addr,
non_prefixed_format = NON_PREFIXED_DS;
else if (TARGET_VSX && size >= 16
- && (VECTOR_MODE_P (mode) || FLOAT128_VECTOR_P (mode)))
+ && (VECTOR_MODE_P (mode) || VECTOR_ALIGNMENT_P (mode)))
non_prefixed_format = NON_PREFIXED_DQ;
else
@@ -25076,7 +25249,7 @@ reg_to_non_prefixed (rtx reg, machine_mode mode)
else if (TARGET_VSX && size >= 16
&& (VECTOR_MODE_P (mode)
- || FLOAT128_VECTOR_P (mode)
+ || VECTOR_ALIGNMENT_P (mode)
|| mode == TImode || mode == CTImode))
return (TARGET_P9_VECTOR) ? NON_PREFIXED_DQ : NON_PREFIXED_X;
@@ -25100,7 +25273,7 @@ reg_to_non_prefixed (rtx reg, machine_mode mode)
else if (TARGET_VSX && size >= 16
&& (VECTOR_MODE_P (mode)
- || FLOAT128_VECTOR_P (mode)
+ || VECTOR_ALIGNMENT_P (mode)
|| mode == TImode || mode == CTImode))
return NON_PREFIXED_DQ;
@@ -26494,6 +26667,45 @@ rs6000_cannot_substitute_mem_equiv_p (rtx mem)
return false;
}
+/* Implement TARGET_INVALID_CONVERSION. */
+
+static const char *
+rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
+{
+ if (element_mode (fromtype) != element_mode (totype))
+ {
+ /* Do not allow conversions to/from PXImode and POImode types. */
+ if (TYPE_MODE (fromtype) == PXImode)
+ return N_("invalid conversion from type %<__vector_quad%>");
+ if (TYPE_MODE (totype) == PXImode)
+ return N_("invalid conversion to type %<__vector_quad%>");
+ if (TYPE_MODE (fromtype) == POImode)
+ return N_("invalid conversion from type %<__vector_pair%>");
+ if (TYPE_MODE (totype) == POImode)
+ return N_("invalid conversion to type %<__vector_pair%>");
+ }
+ else if (POINTER_TYPE_P (fromtype) && POINTER_TYPE_P (totype))
+ {
+ /* Do not allow conversions to/from PXImode and POImode pointer
+ types, except to/from void pointers. */
+ if (TYPE_MODE (TREE_TYPE (fromtype)) == PXImode
+ && TYPE_MODE (TREE_TYPE (totype)) != VOIDmode)
+ return N_("invalid conversion from type %<* __vector_quad%>");
+ if (TYPE_MODE (TREE_TYPE (totype)) == PXImode
+ && TYPE_MODE (TREE_TYPE (fromtype)) != VOIDmode)
+ return N_("invalid conversion to type %<* __vector_quad%>");
+ if (TYPE_MODE (TREE_TYPE (fromtype)) == POImode
+ && TYPE_MODE (TREE_TYPE (totype)) != VOIDmode)
+ return N_("invalid conversion from type %<* __vector_pair%>");
+ if (TYPE_MODE (TREE_TYPE (totype)) == POImode
+ && TYPE_MODE (TREE_TYPE (fromtype)) != VOIDmode)
+ return N_("invalid conversion to type %<* __vector_pair%>");
+ }
+
+ /* Conversion allowed. */
+ return NULL;
+}
+
struct gcc_target targetm = TARGET_INITIALIZER;
#include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 1209a33173e..9c103bf8f7d 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -522,6 +522,7 @@ extern int rs6000_vector_align[];
#define MASK_HTM OPTION_MASK_HTM
#define MASK_ISEL OPTION_MASK_ISEL
#define MASK_MFCRF OPTION_MASK_MFCRF
+#define MASK_MMA OPTION_MASK_MMA
#define MASK_MULHW OPTION_MASK_MULHW
#define MASK_MULTIPLE OPTION_MASK_MULTIPLE
#define MASK_NO_UPDATE OPTION_MASK_NO_UPDATE
@@ -776,7 +777,7 @@ extern unsigned rs6000_pointer_size;
#define FUNCTION_BOUNDARY 32
/* No data type wants to be aligned rounder than this. */
-#define BIGGEST_ALIGNMENT 128
+#define BIGGEST_ALIGNMENT ((TARGET_MMA) ? 512 : 128)
/* Alignment of field after `int : 0' in a structure. */
#define EMPTY_FIELD_BOUNDARY 32
@@ -1035,16 +1036,17 @@ enum data_align { align_abi, align_opt, align_both };
((MODE) == V4SFmode \
|| (MODE) == V2DFmode) \
-/* Note KFmode and possibly TFmode (i.e. IEEE 128-bit floating point) are not
- really a vector, but we want to treat it as a vector for moves, and
- such. */
+/* Modes that are not vectors, but require vector alignment. Treat these like
+ vectors in terms of loads and stores. */
+#define VECTOR_ALIGNMENT_P(MODE) \
+ (FLOAT128_VECTOR_P (MODE) || (MODE) == POImode || (MODE) == PXImode)
#define ALTIVEC_VECTOR_MODE(MODE) \
((MODE) == V16QImode \
|| (MODE) == V8HImode \
|| (MODE) == V4SFmode \
|| (MODE) == V4SImode \
- || FLOAT128_VECTOR_P (MODE))
+ || VECTOR_ALIGNMENT_P (MODE))
#define ALTIVEC_OR_VSX_VECTOR_MODE(MODE) \
(ALTIVEC_VECTOR_MODE (MODE) || VSX_VECTOR_MODE (MODE) \
@@ -2309,6 +2311,7 @@ extern int frame_pointer_needed;
#define RS6000_BTM_POWERPC64 MASK_POWERPC64 /* 64-bit registers. */
#define RS6000_BTM_FLOAT128 MASK_FLOAT128_KEYWORD /* IEEE 128-bit float. */
#define RS6000_BTM_FLOAT128_HW MASK_FLOAT128_HW /* IEEE 128-bit float h/w. */
+#define RS6000_BTM_MMA MASK_MMA /* ISA 3.1 MMA. */
#define RS6000_BTM_FUTURE MASK_FUTURE
@@ -2331,7 +2334,9 @@ extern int frame_pointer_needed;
| RS6000_BTM_LDBL128 \
| RS6000_BTM_POWERPC64 \
| RS6000_BTM_FLOAT128 \
- | RS6000_BTM_FLOAT128_HW)
+ | RS6000_BTM_FLOAT128_HW \
+ | RS6000_BTM_MMA \
+ | RS6000_BTM_FUTURE)
/* Define builtin enum index. */
@@ -2443,6 +2448,8 @@ enum rs6000_builtin_type_index
RS6000_BTI_ieee128_float, /* ieee 128-bit floating point */
RS6000_BTI_ibm128_float, /* IBM 128-bit floating point */
RS6000_BTI_const_str, /* pointer to const char * */
+ RS6000_BTI_vector_pair, /* unsigned 256-bit types (vector pair). */
+ RS6000_BTI_vector_quad, /* unsigned 512-bit types (vector quad). */
RS6000_BTI_MAX
};
@@ -2495,6 +2502,8 @@ enum rs6000_builtin_type_index
#define ieee128_float_type_node (rs6000_builtin_types[RS6000_BTI_ieee128_float])
#define ibm128_float_type_node (rs6000_builtin_types[RS6000_BTI_ibm128_float])
#define const_str_type_node (rs6000_builtin_types[RS6000_BTI_const_str])
+#define vector_pair_type_node (rs6000_builtin_types[RS6000_BTI_vector_pair])
+#define vector_quad_type_node (rs6000_builtin_types[RS6000_BTI_vector_quad])
extern GTY(()) tree rs6000_builtin_types[RS6000_BTI_MAX];
extern GTY(()) tree rs6000_builtin_decls[RS6000_BUILTIN_COUNT];
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 0aa5265d199..6b462a3ecdb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -322,7 +322,7 @@ (define_attr "cpu"
(const (symbol_ref "(enum attr_cpu) rs6000_tune")))
;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf,fut"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf,fut,mma"
(const_string "any"))
;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -366,6 +366,10 @@ (define_attr "enabled" ""
(and (eq_attr "isa" "fut")
(match_test "TARGET_FUTURE"))
(const_int 1)
+
+ (and (eq_attr "isa" "mma")
+ (match_test "TARGET_MMA"))
+ (const_int 1)
] (const_int 0)))
;; If this instruction is microcoded on the CELL processor
@@ -772,7 +776,8 @@ (define_mode_attr BOOL_REGS_UNARY [(TI "r,0,0,wa,v")
;; Reload iterator for creating the function to allocate a base register to
;; supplement addressing modes.
(define_mode_iterator RELOAD [V16QI V8HI V4SI V2DI V4SF V2DF V1TI
- SF SD SI DF DD DI TI PTI KF IF TF])
+ SF SD SI DF DD DI TI PTI KF IF TF
+ POI PXI])
;; Iterate over smin, smax
(define_code_iterator fp_minmax [smin smax])
@@ -14866,6 +14871,7 @@ (define_insn "*cmpeqb_internal"
(include "vector.md")
(include "vsx.md")
(include "altivec.md")
+(include "mma.md")
(include "dfp.md")
(include "crypto.md")
(include "htm.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index f95b8279270..92951483e4e 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -578,3 +578,7 @@ Generate (do not generate) prefixed memory instructions.
mpcrel
Target Report Mask(PCREL) Var(rs6000_isa_flags)
Generate (do not generate) pc-relative memory addressing.
+
+mmma
+Target Report Mask(MMA) Var(rs6000_isa_flags)
+Generate (do not generate) MMA instructions.
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 170a69591dd..81d550ce236 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -83,6 +83,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
$(srcdir)/config/rs6000/vector.md \
$(srcdir)/config/rs6000/vsx.md \
$(srcdir)/config/rs6000/altivec.md \
+ $(srcdir)/config/rs6000/mma.md \
$(srcdir)/config/rs6000/crypto.md \
$(srcdir)/config/rs6000/htm.md \
$(srcdir)/config/rs6000/dfp.md
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 06a04e3d7dd..1452aabe693 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1201,7 +1201,7 @@ See RS/6000 and PowerPC Options.
-mgnu-attribute -mno-gnu-attribute @gol
-mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg} @gol
-mstack-protector-guard-offset=@var{offset} -mprefixed -mno-prefixed @gol
--mpcrel -mno-pcrel}
+-mpcrel -mno-pcrel -mmma -mno-mmma}
@emph{RX Options}
@gccoptlist{-m64bit-doubles -m32bit-doubles -fpu -nofpu@gol
@@ -25940,7 +25940,8 @@ following options:
-mpowerpc-gpopt -mpowerpc-gfxopt @gol
-mmulhw -mdlmzb -mmfpgpr -mvsx @gol
-mcrypto -mhtm -mpower8-fusion -mpower8-vector @gol
--mquad-memory -mquad-memory-atomic -mfloat128 -mfloat128-hardware}
+-mquad-memory -mquad-memory-atomic -mfloat128 @gol
+-mfloat128-hardware -mprefixed -mpcrel -mmma}
The particular options set for any particular CPU varies between
compiler versions, depending on what setting seems to produce optimal
@@ -26936,6 +26937,13 @@ addressing (@option{-mprefixed}) options are enabled.
@opindex mno-prefixed
Generate (do not generate) addressing modes using prefixed load and
store instructions when the option @option{-mcpu=future} is used.
+
+@item -mmma
+@itemx -mno-mma
+@opindex mmma
+@opindex mno-mma
+Generate (do not generate) the MMA instructions when the option
+@option{-mcpu=future} is used.
@end table
@node RX Options
* [PATCH 2/3, v2] rs6000: Add MMA built-in function definitions
From: Peter Bergner @ 2020-06-18 20:45 UTC (permalink / raw)
To: Segher Boessenkool
Cc: GCC Patches, Bill Schmidt, Michael Meissner, David Edelsohn,
Will Schmidt
Changes since v1:
- Updated ChangeLog entry per Segher's suggestion.
- Updated doc/extend.texi with correct built-in names for
__builtin_vsx_xvcvspbf16 and __builtin_vsx_xvcvbf16sp.
This patch adds the actual MMA built-ins. The MMA accumulators are INOUT
operands for most MMA instructions, but they are also very expensive to
move around. For this reason, we have implemented a built-in API
where the accumulators are passed by reference (via pointers), so
the user won't use one accumulator as input and another as output,
which would entail a lot of copies. However, using pointers gives us
poor code generation when we expand the built-ins at normal expand time.
We therefore expand the MMA built-ins early into gimple, converting
the pass-by-reference calls into internal built-ins that use a pass-by-value
calling convention, where we can enforce that the input and output
accumulators are the same. This gives us much better code generation.
The associated test cases for these built-ins are in patch3.
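As an illustration of the pass-by-reference API described above, a small usage
sketch (the function and variable names are invented for this example; the code
only compiles for a target with MMA support, e.g. with @option{-mcpu=future}):

```c
/* Sketch of the user-facing MMA built-in API: the accumulator is
   passed by reference, and the early gimple folding converts these
   calls into the pass-by-value "_internal" built-ins.  Hypothetical
   example; requires an MMA-enabled compiler and target.  */
#include <altivec.h>

void
f32_outer_product (vector float a, vector float b, float out[4][4])
{
  __vector_quad acc;
  vector float result[4];

  __builtin_mma_xxsetaccz (&acc);	/* Zero the accumulator.  */
  __builtin_mma_xvf32gerpp (&acc, (vector unsigned char) a,
			    (vector unsigned char) b);
  __builtin_mma_disassemble_acc (result, &acc);

  for (int i = 0; i < 4; i++)
    *(vector float *) out[i] = result[i];
}
```

Because the accumulator is both the input and the output of
`__builtin_mma_xvf32gerpp`, keeping it in one place avoids the expensive
accumulator copies mentioned above.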
The v1 patch passed bootstrap and regtesting with no regressions on both
powerpc64le-linux and powerpc64-linux. This updated patch plus patch1 is
bootstrapping and regtesting on powerpc64{,le}-linux. Ok for trunk if
there are no regressions?
Peter
2020-06-18 Peter Bergner <bergner@linux.ibm.com>
gcc/
* config/rs6000/predicates.md (mma_input_operand): New predicate.
* config/rs6000/rs6000-builtin.def (BU_MMA_1, BU_MMA_V2, BU_MMA_3,
BU_MMA_5, BU_MMA_6, BU_VSX_1): Add support macros for defining MMA
built-in functions.
(ASSEMBLE_ACC, ASSEMBLE_PAIR, DISASSEMBLE_ACC, DISASSEMBLE_PAIR,
PMXVBF16GER2, PMXVBF16GER2NN, PMXVBF16GER2NP, PMXVBF16GER2PN,
PMXVBF16GER2PP, PMXVF16GER2, PMXVF16GER2NN, PMXVF16GER2NP,
PMXVF16GER2PN, PMXVF16GER2PP, PMXVF32GER, PMXVF32GERNN,
PMXVF32GERNP, PMXVF32GERPN, PMXVF32GERPP, PMXVF64GER, PMXVF64GERNN,
PMXVF64GERNP, PMXVF64GERPN, PMXVF64GERPP, PMXVI16GER2, PMXVI16GER2PP,
PMXVI16GER2S, PMXVI16GER2SPP, PMXVI4GER8, PMXVI4GER8PP, PMXVI8GER4,
PMXVI8GER4PP, PMXVI8GER4SPP, XVBF16GER2, XVBF16GER2NN, XVBF16GER2NP,
XVBF16GER2PN, XVBF16GER2PP, XVCVBF16SP, XVCVSPBF16, XVF16GER2,
XVF16GER2NN, XVF16GER2NP, XVF16GER2PN, XVF16GER2PP, XVF32GER,
XVF32GERNN, XVF32GERNP, XVF32GERPN, XVF32GERPP, XVF64GER, XVF64GERNN,
XVF64GERNP, XVF64GERPN, XVF64GERPP, XVI16GER2, XVI16GER2PP, XVI16GER2S,
XVI16GER2SPP, XVI4GER8, XVI4GER8PP, XVI8GER4, XVI8GER4PP, XVI8GER4SPP,
XXMFACC, XXMTACC, XXSETACCZ): Add MMA built-ins.
* config/rs6000/rs6000.c (rs6000_emit_move): Allow zero constants.
(print_operand) <case 'A'>: New output modifier.
(rs6000_split_multireg_move): Add support for inserting accumulator
priming and depriming instructions. Add support for splitting an
assemble accumulator pattern.
* config/rs6000/rs6000-call.c (mma_init_builtins, mma_expand_builtin,
rs6000_gimple_fold_mma_builtin): New functions.
(RS6000_BUILTIN_M): New macro.
(def_builtin): Handle RS6000_BTC_QUAD and RS6000_BTC_PAIR attributes.
(bdesc_mma): Add new MMA built-in support.
(htm_expand_builtin): Use RS6000_BTC_OPND_MASK.
(rs6000_invalid_builtin): Add handling of RS6000_BTM_FUTURE and
RS6000_BTM_MMA.
(rs6000_builtin_valid_without_lhs): Handle RS6000_BTC_VOID attribute.
(rs6000_gimple_fold_builtin): Call rs6000_builtin_is_supported_p
and rs6000_gimple_fold_mma_builtin.
(rs6000_expand_builtin): Call mma_expand_builtin.
Use RS6000_BTC_OPND_MASK.
(rs6000_init_builtins): Adjust comment. Call mma_init_builtins.
(htm_init_builtins): Use RS6000_BTC_OPND_MASK.
(builtin_function_type): Handle VSX_BUILTIN_XVCVSPBF16 and
VSX_BUILTIN_XVCVBF16SP.
* config/rs6000/rs6000.h (RS6000_BTC_QUINARY, RS6000_BTC_SENARY,
RS6000_BTC_OPND_MASK, RS6000_BTC_QUAD, RS6000_BTC_PAIR,
RS6000_BTC_QUADPAIR, RS6000_BTC_GIMPLE): New defines.
(RS6000_BTC_PREDICATE, RS6000_BTC_ABS, RS6000_BTC_DST,
RS6000_BTC_TYPE_MASK, RS6000_BTC_ATTR_MASK): Adjust values.
* config/rs6000/mma.md (MAX_MMA_OPERANDS): New define_constant.
(UNSPEC_MMA_ASSEMBLE_ACC, UNSPEC_MMA_PMXVBF16GER2,
UNSPEC_MMA_PMXVBF16GER2NN, UNSPEC_MMA_PMXVBF16GER2NP,
UNSPEC_MMA_PMXVBF16GER2PN, UNSPEC_MMA_PMXVBF16GER2PP,
UNSPEC_MMA_PMXVF16GER2, UNSPEC_MMA_PMXVF16GER2NN,
UNSPEC_MMA_PMXVF16GER2NP, UNSPEC_MMA_PMXVF16GER2PN,
UNSPEC_MMA_PMXVF16GER2PP, UNSPEC_MMA_PMXVF32GER,
UNSPEC_MMA_PMXVF32GERNN, UNSPEC_MMA_PMXVF32GERNP,
UNSPEC_MMA_PMXVF32GERPN, UNSPEC_MMA_PMXVF32GERPP,
UNSPEC_MMA_PMXVF64GER, UNSPEC_MMA_PMXVF64GERNN,
UNSPEC_MMA_PMXVF64GERNP, UNSPEC_MMA_PMXVF64GERPN,
UNSPEC_MMA_PMXVF64GERPP, UNSPEC_MMA_PMXVI16GER2,
UNSPEC_MMA_PMXVI16GER2PP, UNSPEC_MMA_PMXVI16GER2S,
UNSPEC_MMA_PMXVI16GER2SPP, UNSPEC_MMA_PMXVI4GER8,
UNSPEC_MMA_PMXVI4GER8PP, UNSPEC_MMA_PMXVI8GER4,
UNSPEC_MMA_PMXVI8GER4PP, UNSPEC_MMA_PMXVI8GER4SPP,
UNSPEC_MMA_XVBF16GER2, UNSPEC_MMA_XVBF16GER2NN,
UNSPEC_MMA_XVBF16GER2NP, UNSPEC_MMA_XVBF16GER2PN,
UNSPEC_MMA_XVBF16GER2PP, UNSPEC_MMA_XVF16GER2, UNSPEC_MMA_XVF16GER2NN,
UNSPEC_MMA_XVF16GER2NP, UNSPEC_MMA_XVF16GER2PN, UNSPEC_MMA_XVF16GER2PP,
UNSPEC_MMA_XVF32GER, UNSPEC_MMA_XVF32GERNN, UNSPEC_MMA_XVF32GERNP,
UNSPEC_MMA_XVF32GERPN, UNSPEC_MMA_XVF32GERPP, UNSPEC_MMA_XVF64GER,
UNSPEC_MMA_XVF64GERNN, UNSPEC_MMA_XVF64GERNP, UNSPEC_MMA_XVF64GERPN,
UNSPEC_MMA_XVF64GERPP, UNSPEC_MMA_XVI16GER2, UNSPEC_MMA_XVI16GER2PP,
UNSPEC_MMA_XVI16GER2S, UNSPEC_MMA_XVI16GER2SPP, UNSPEC_MMA_XVI4GER8,
UNSPEC_MMA_XVI4GER8PP, UNSPEC_MMA_XVI8GER4, UNSPEC_MMA_XVI8GER4PP,
UNSPEC_MMA_XVI8GER4SPP, UNSPEC_MMA_XXMFACC, UNSPEC_MMA_XXMTACC): New.
(MMA_ACC, MMA_VV, MMA_AVV, MMA_PV, MMA_APV, MMA_VVI4I4I8,
MMA_AVVI4I4I8, MMA_VVI4I4I2, MMA_AVVI4I4I2, MMA_VVI4I4,
MMA_AVVI4I4, MMA_PVI4I2, MMA_APVI4I2, MMA_VVI4I4I4,
MMA_AVVI4I4I4): New define_int_iterator.
(acc, vv, avv, pv, apv, vvi4i4i8, avvi4i4i8, vvi4i4i2,
avvi4i4i2, vvi4i4, avvi4i4, pvi4i2, apvi4i2, vvi4i4i4,
avvi4i4i4): New define_int_attr.
(*movpxi): Add zero constant alternative.
(mma_assemble_pair, mma_assemble_acc): New define_expand.
(*mma_assemble_acc): New define_insn_and_split.
(mma_<acc>, mma_xxsetaccz, mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
mma_<vvi4i4i4>, mma_<avvi4i4i4>): New define_insn.
* config/rs6000/rs6000.md (define_attr "type"): New type mma.
* config/rs6000/vsx.md (UNSPEC_VSX_XVCVBF16SP): New.
(UNSPEC_VSX_XVCVSPBF16): Likewise.
(XVCVBF16): New define_int_iterator.
(xvcvbf16): New define_int_attr.
(vsx_<xvcvbf16>): New define_insn.
* doc/extend.texi: Document the mma built-ins.
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index c3f460face2..4e37ce35c5d 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1119,6 +1119,12 @@ (define_predicate "splat_input_operand"
return gpc_reg_operand (op, mode);
})
+;; Return 1 if this operand is valid for an MMA assemble accumulator insn.
+(define_special_predicate "mma_input_operand"
+ (match_test "(mode == PXImode
+ && (GET_MODE (op) == V16QImode)
+ && (vsx_register_operand (op, GET_MODE (op)) || MEM_P (op)))"))
+
;; Return true if operand is an operator used in rotate-and-mask instructions.
(define_predicate "rotate_mask_operator"
(match_code "rotate,ashift,lshiftrt"))
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 8b1ddb00045..968c46cc36f 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -32,6 +32,7 @@
RS6000_BUILTIN_A -- ABS builtins
RS6000_BUILTIN_D -- DST builtins
RS6000_BUILTIN_H -- HTM builtins
+ RS6000_BUILTIN_M -- MMA builtins
RS6000_BUILTIN_P -- Altivec, VSX, ISA 2.07 vector predicate builtins
RS6000_BUILTIN_X -- special builtins
@@ -74,6 +75,10 @@
#error "RS6000_BUILTIN_H is not defined."
#endif
+#ifndef RS6000_BUILTIN_M
+ #error "RS6000_BUILTIN_M is not defined."
+#endif
+
#ifndef RS6000_BUILTIN_P
#error "RS6000_BUILTIN_P is not defined."
#endif
@@ -329,6 +334,82 @@
| RS6000_BTC_SPECIAL), \
CODE_FOR_nothing) /* ICODE */
+/* MMA convenience macros. */
+
+#define BU_MMA_1(ENUM, NAME, ATTR, ICODE) \
+ RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
+ "__builtin_mma_" NAME, /* NAME */ \
+ RS6000_BTM_MMA, /* MASK */ \
+ (RS6000_BTC_ ## ATTR /* ATTR */ \
+ | RS6000_BTC_UNARY \
+ | RS6000_BTC_VOID \
+ | RS6000_BTC_GIMPLE), \
+ CODE_FOR_nothing) /* ICODE */ \
+ RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
+ "__builtin_mma_" NAME "_internal", /* NAME */ \
+ RS6000_BTM_MMA, /* MASK */ \
+ (RS6000_BTC_ ## ATTR /* ATTR */ \
+ | RS6000_BTC_UNARY), \
+ CODE_FOR_ ## ICODE) /* ICODE */
+
+#define BU_MMA_V2(ENUM, NAME, ATTR, ICODE) \
+ RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
+ "__builtin_mma_" NAME, /* NAME */ \
+ RS6000_BTM_MMA, /* MASK */ \
+ (RS6000_BTC_ ## ATTR /* ATTR */ \
+ | RS6000_BTC_BINARY \
+ | RS6000_BTC_VOID \
+ | RS6000_BTC_GIMPLE), \
+ CODE_FOR_nothing) /* ICODE */
+
+#define BU_MMA_3(ENUM, NAME, ATTR, ICODE) \
+ RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
+ "__builtin_mma_" NAME, /* NAME */ \
+ RS6000_BTM_MMA, /* MASK */ \
+ (RS6000_BTC_ ## ATTR /* ATTR */ \
+ | RS6000_BTC_TERNARY \
+ | RS6000_BTC_VOID \
+ | RS6000_BTC_GIMPLE), \
+ CODE_FOR_nothing) /* ICODE */ \
+ RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
+ "__builtin_mma_" NAME "_internal", /* NAME */ \
+ RS6000_BTM_MMA, /* MASK */ \
+ (RS6000_BTC_ ## ATTR /* ATTR */ \
+ | RS6000_BTC_TERNARY), \
+ CODE_FOR_ ## ICODE) /* ICODE */
+
+#define BU_MMA_5(ENUM, NAME, ATTR, ICODE) \
+ RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
+ "__builtin_mma_" NAME, /* NAME */ \
+ RS6000_BTM_MMA, /* MASK */ \
+ (RS6000_BTC_ ## ATTR /* ATTR */ \
+ | RS6000_BTC_QUINARY \
+ | RS6000_BTC_VOID \
+ | RS6000_BTC_GIMPLE), \
+ CODE_FOR_nothing) /* ICODE */ \
+ RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
+ "__builtin_mma_" NAME "_internal", /* NAME */ \
+ RS6000_BTM_MMA, /* MASK */ \
+ (RS6000_BTC_ ## ATTR /* ATTR */ \
+ | RS6000_BTC_QUINARY), \
+ CODE_FOR_ ## ICODE) /* ICODE */
+
+#define BU_MMA_6(ENUM, NAME, ATTR, ICODE) \
+ RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
+ "__builtin_mma_" NAME, /* NAME */ \
+ RS6000_BTM_MMA, /* MASK */ \
+ (RS6000_BTC_ ## ATTR /* ATTR */ \
+ | RS6000_BTC_SENARY \
+ | RS6000_BTC_VOID \
+ | RS6000_BTC_GIMPLE), \
+ CODE_FOR_nothing) /* ICODE */ \
+ RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
+ "__builtin_mma_" NAME "_internal", /* NAME */ \
+ RS6000_BTM_MMA, /* MASK */ \
+ (RS6000_BTC_ ## ATTR /* ATTR */ \
+ | RS6000_BTC_SENARY), \
+ CODE_FOR_ ## ICODE) /* ICODE */
+
/* ISA 2.05 (power6) convenience macros. */
/* For functions that depend on the CMPB instruction */
#define BU_P6_2(ENUM, NAME, ATTR, ICODE) \
@@ -2785,3 +2866,77 @@ BU_SPECIAL_X (RS6000_BUILTIN_CPU_SUPPORTS, "__builtin_cpu_supports",
/* Darwin CfString builtin. */
BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, "__builtin_cfstring", RS6000_BTM_ALWAYS,
RS6000_BTC_MISC)
+
+/* FUTURE MMA builtins. */
+BU_VSX_1 (XVCVBF16SP, "xvcvbf16sp", MISC, vsx_xvcvbf16sp)
+BU_VSX_1 (XVCVSPBF16, "xvcvspbf16", MISC, vsx_xvcvspbf16)
+
+BU_MMA_1 (XXMFACC, "xxmfacc", QUAD, mma_xxmfacc)
+BU_MMA_1 (XXMTACC, "xxmtacc", QUAD, mma_xxmtacc)
+BU_MMA_1 (XXSETACCZ, "xxsetaccz", MISC, mma_xxsetaccz)
+
+BU_MMA_V2 (DISASSEMBLE_ACC, "disassemble_acc", QUAD, nothing)
+BU_MMA_V2 (DISASSEMBLE_PAIR,"disassemble_pair", PAIR, nothing)
+
+BU_MMA_3 (ASSEMBLE_PAIR, "assemble_pair", MISC, mma_assemble_pair)
+BU_MMA_3 (XVBF16GER2, "xvbf16ger2", MISC, mma_xvbf16ger2)
+BU_MMA_3 (XVF16GER2, "xvf16ger2", MISC, mma_xvf16ger2)
+BU_MMA_3 (XVF32GER, "xvf32ger", MISC, mma_xvf32ger)
+BU_MMA_3 (XVF64GER, "xvf64ger", PAIR, mma_xvf64ger)
+BU_MMA_3 (XVI4GER8, "xvi4ger8", MISC, mma_xvi4ger8)
+BU_MMA_3 (XVI8GER4, "xvi8ger4", MISC, mma_xvi8ger4)
+BU_MMA_3 (XVI16GER2, "xvi16ger2", MISC, mma_xvi16ger2)
+BU_MMA_3 (XVI16GER2S, "xvi16ger2s", MISC, mma_xvi16ger2s)
+BU_MMA_3 (XVBF16GER2NN, "xvbf16ger2nn", QUAD, mma_xvbf16ger2nn)
+BU_MMA_3 (XVBF16GER2NP, "xvbf16ger2np", QUAD, mma_xvbf16ger2np)
+BU_MMA_3 (XVBF16GER2PN, "xvbf16ger2pn", QUAD, mma_xvbf16ger2pn)
+BU_MMA_3 (XVBF16GER2PP, "xvbf16ger2pp", QUAD, mma_xvbf16ger2pp)
+BU_MMA_3 (XVF16GER2NN, "xvf16ger2nn", QUAD, mma_xvf16ger2nn)
+BU_MMA_3 (XVF16GER2NP, "xvf16ger2np", QUAD, mma_xvf16ger2np)
+BU_MMA_3 (XVF16GER2PN, "xvf16ger2pn", QUAD, mma_xvf16ger2pn)
+BU_MMA_3 (XVF16GER2PP, "xvf16ger2pp", QUAD, mma_xvf16ger2pp)
+BU_MMA_3 (XVF32GERNN, "xvf32gernn", QUAD, mma_xvf32gernn)
+BU_MMA_3 (XVF32GERNP, "xvf32gernp", QUAD, mma_xvf32gernp)
+BU_MMA_3 (XVF32GERPN, "xvf32gerpn", QUAD, mma_xvf32gerpn)
+BU_MMA_3 (XVF32GERPP, "xvf32gerpp", QUAD, mma_xvf32gerpp)
+BU_MMA_3 (XVF64GERNN, "xvf64gernn", QUADPAIR, mma_xvf64gernn)
+BU_MMA_3 (XVF64GERNP, "xvf64gernp", QUADPAIR, mma_xvf64gernp)
+BU_MMA_3 (XVF64GERPN, "xvf64gerpn", QUADPAIR, mma_xvf64gerpn)
+BU_MMA_3 (XVF64GERPP, "xvf64gerpp", QUADPAIR, mma_xvf64gerpp)
+BU_MMA_3 (XVI4GER8PP, "xvi4ger8pp", QUAD, mma_xvi4ger8pp)
+BU_MMA_3 (XVI8GER4PP, "xvi8ger4pp", QUAD, mma_xvi8ger4pp)
+BU_MMA_3 (XVI8GER4SPP, "xvi8ger4spp", QUAD, mma_xvi8ger4spp)
+BU_MMA_3 (XVI16GER2PP, "xvi16ger2pp", QUAD, mma_xvi16ger2pp)
+BU_MMA_3 (XVI16GER2SPP, "xvi16ger2spp", QUAD, mma_xvi16ger2spp)
+
+BU_MMA_5 (ASSEMBLE_ACC, "assemble_acc", MISC, mma_assemble_acc)
+BU_MMA_5 (PMXVF32GER, "pmxvf32ger", MISC, mma_pmxvf32ger)
+BU_MMA_5 (PMXVF64GER, "pmxvf64ger", PAIR, mma_pmxvf64ger)
+BU_MMA_5 (PMXVF32GERNN, "pmxvf32gernn", QUAD, mma_pmxvf32gernn)
+BU_MMA_5 (PMXVF32GERNP, "pmxvf32gernp", QUAD, mma_pmxvf32gernp)
+BU_MMA_5 (PMXVF32GERPN, "pmxvf32gerpn", QUAD, mma_pmxvf32gerpn)
+BU_MMA_5 (PMXVF32GERPP, "pmxvf32gerpp", QUAD, mma_pmxvf32gerpp)
+BU_MMA_5 (PMXVF64GERNN, "pmxvf64gernn", QUADPAIR, mma_pmxvf64gernn)
+BU_MMA_5 (PMXVF64GERNP, "pmxvf64gernp", QUADPAIR, mma_pmxvf64gernp)
+BU_MMA_5 (PMXVF64GERPN, "pmxvf64gerpn", QUADPAIR, mma_pmxvf64gerpn)
+BU_MMA_5 (PMXVF64GERPP, "pmxvf64gerpp", QUADPAIR, mma_pmxvf64gerpp)
+
+BU_MMA_6 (PMXVBF16GER2, "pmxvbf16ger2", MISC, mma_pmxvbf16ger2)
+BU_MMA_6 (PMXVF16GER2, "pmxvf16ger2", MISC, mma_pmxvf16ger2)
+BU_MMA_6 (PMXVI4GER8, "pmxvi4ger8", MISC, mma_pmxvi4ger8)
+BU_MMA_6 (PMXVI8GER4, "pmxvi8ger4", MISC, mma_pmxvi8ger4)
+BU_MMA_6 (PMXVI16GER2, "pmxvi16ger2", MISC, mma_pmxvi16ger2)
+BU_MMA_6 (PMXVI16GER2S, "pmxvi16ger2s", MISC, mma_pmxvi16ger2s)
+BU_MMA_6 (PMXVBF16GER2NN, "pmxvbf16ger2nn", QUAD, mma_pmxvbf16ger2nn)
+BU_MMA_6 (PMXVBF16GER2NP, "pmxvbf16ger2np", QUAD, mma_pmxvbf16ger2np)
+BU_MMA_6 (PMXVBF16GER2PN, "pmxvbf16ger2pn", QUAD, mma_pmxvbf16ger2pn)
+BU_MMA_6 (PMXVBF16GER2PP, "pmxvbf16ger2pp", QUAD, mma_pmxvbf16ger2pp)
+BU_MMA_6 (PMXVF16GER2NN, "pmxvf16ger2nn", QUAD, mma_pmxvf16ger2nn)
+BU_MMA_6 (PMXVF16GER2NP, "pmxvf16ger2np", QUAD, mma_pmxvf16ger2np)
+BU_MMA_6 (PMXVF16GER2PN, "pmxvf16ger2pn", QUAD, mma_pmxvf16ger2pn)
+BU_MMA_6 (PMXVF16GER2PP, "pmxvf16ger2pp", QUAD, mma_pmxvf16ger2pp)
+BU_MMA_6 (PMXVI4GER8PP, "pmxvi4ger8pp", QUAD, mma_pmxvi4ger8pp)
+BU_MMA_6 (PMXVI8GER4PP, "pmxvi8ger4pp", QUAD, mma_pmxvi8ger4pp)
+BU_MMA_6 (PMXVI8GER4SPP, "pmxvi8ger4spp", QUAD, mma_pmxvi8ger4spp)
+BU_MMA_6 (PMXVI16GER2PP, "pmxvi16ger2pp", QUAD, mma_pmxvi16ger2pp)
+BU_MMA_6 (PMXVI16GER2SPP, "pmxvi16ger2spp", QUAD, mma_pmxvi16ger2spp)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index eeb20e5200d..d47c3a3aeb0 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -183,6 +183,7 @@ static tree builtin_function_type (machine_mode, machine_mode,
enum rs6000_builtins, const char *name);
static void rs6000_common_init_builtins (void);
static void htm_init_builtins (void);
+static void mma_init_builtins (void);
/* Hash table to keep track of the argument types for builtin functions. */
@@ -243,6 +244,7 @@ builtin_hasher::equal (builtin_hash_struct *p1, builtin_hash_struct *p2)
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -270,6 +272,9 @@ builtin_hasher::equal (builtin_hash_struct *p1, builtin_hash_struct *p2)
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE) \
{ NAME, ICODE, MASK, ATTR },
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE) \
+ { NAME, ICODE, MASK, ATTR },
+
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE) \
{ NAME, ICODE, MASK, ATTR },
@@ -296,6 +301,7 @@ static const struct rs6000_builtin_info_type rs6000_builtin_info[] =
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -8354,6 +8360,9 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
attr_string = ", fp, const";
}
}
+ else if ((classify & (RS6000_BTC_QUAD | RS6000_BTC_PAIR)) != 0)
+ /* The function uses a register quad and/or pair. Nothing to do. */
+ ;
else if ((classify & RS6000_BTC_ATTR_MASK) != 0)
gcc_unreachable ();
@@ -8372,6 +8381,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -8385,6 +8395,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
@@ -8403,6 +8414,7 @@ static const struct builtin_description bdesc_3arg[] =
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -8416,6 +8428,7 @@ static const struct builtin_description bdesc_3arg[] =
#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
@@ -8434,6 +8447,7 @@ static const struct builtin_description bdesc_4arg[] =
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -8447,6 +8461,7 @@ static const struct builtin_description bdesc_4arg[] =
{ MASK, ICODE, NAME, ENUM },
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
@@ -8465,6 +8480,7 @@ static const struct builtin_description bdesc_dst[] =
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -8478,6 +8494,7 @@ static const struct builtin_description bdesc_dst[] =
#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
@@ -8494,6 +8511,7 @@ static const struct builtin_description bdesc_2arg[] =
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -8505,6 +8523,7 @@ static const struct builtin_description bdesc_2arg[] =
#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE) \
{ MASK, ICODE, NAME, ENUM },
@@ -8527,6 +8546,7 @@ static const struct builtin_description bdesc_altivec_preds[] =
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -8540,6 +8560,7 @@ static const struct builtin_description bdesc_altivec_preds[] =
#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
@@ -8559,6 +8580,7 @@ static const struct builtin_description bdesc_abs[] =
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -8572,6 +8594,7 @@ static const struct builtin_description bdesc_abs[] =
#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
@@ -8590,6 +8613,7 @@ static const struct builtin_description bdesc_1arg[] =
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -8603,6 +8627,7 @@ static const struct builtin_description bdesc_1arg[] =
#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
@@ -8620,6 +8645,7 @@ static const struct builtin_description bdesc_0arg[] =
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -8633,6 +8659,7 @@ static const struct builtin_description bdesc_0arg[] =
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE) \
{ MASK, ICODE, NAME, ENUM },
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
@@ -8641,6 +8668,7 @@ static const struct builtin_description bdesc_htm[] =
#include "rs6000-builtin.def"
};
+/* MMA builtins. */
#undef RS6000_BUILTIN_0
#undef RS6000_BUILTIN_1
#undef RS6000_BUILTIN_2
@@ -8649,7 +8677,40 @@ static const struct builtin_description bdesc_htm[] =
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
+#undef RS6000_BUILTIN_X
+
+#define RS6000_BUILTIN_0(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_1(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_2(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_3(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_4(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE) \
+ { MASK, ICODE, NAME, ENUM },
+
+#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE)
+#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE)
+
+static const struct builtin_description bdesc_mma[] =
+{
+#include "rs6000-builtin.def"
+};
+
+#undef RS6000_BUILTIN_0
+#undef RS6000_BUILTIN_1
+#undef RS6000_BUILTIN_2
+#undef RS6000_BUILTIN_3
+#undef RS6000_BUILTIN_4
+#undef RS6000_BUILTIN_A
+#undef RS6000_BUILTIN_D
+#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
+#undef RS6000_BUILTIN_P
+#undef RS6000_BUILTIN_X
/* Return true if a builtin function is overloaded. */
bool
@@ -9393,6 +9454,133 @@ altivec_expand_stv_builtin (enum insn_code icode, tree exp)
return NULL_RTX;
}
+/* Expand the MMA built-in in EXP.
+ Store true in *EXPANDEDP if we found a built-in to expand. */
+
+static rtx
+mma_expand_builtin (tree exp, rtx target, bool *expandedp)
+{
+ unsigned i;
+ tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+ enum rs6000_builtins fcode
+ = (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+ const struct builtin_description *d = bdesc_mma;
+
+ /* Expand the MMA built-in. */
+ for (i = 0; i < ARRAY_SIZE (bdesc_mma); i++, d++)
+ if (d->code == fcode)
+ break;
+
+ if (i >= ARRAY_SIZE (bdesc_mma))
+ {
+ *expandedp = false;
+ return NULL_RTX;
+ }
+
+ *expandedp = true;
+
+ tree arg;
+ call_expr_arg_iterator iter;
+ enum insn_code icode = d->icode;
+ const struct insn_operand_data *insn_op;
+ rtx op[MAX_MMA_OPERANDS];
+ unsigned nopnds = 0;
+ unsigned attr = rs6000_builtin_info[fcode].attr;
+ bool void_func = (attr & RS6000_BTC_VOID);
+ machine_mode tmode = VOIDmode;
+
+ if (TREE_TYPE (TREE_TYPE (fndecl)) != void_type_node)
+ {
+ tmode = insn_data[icode].operand[0].mode;
+ if (!target
+ || GET_MODE (target) != tmode
+ || !(*insn_data[icode].operand[0].predicate) (target, tmode))
+ target = gen_reg_rtx (tmode);
+ op[nopnds++] = target;
+ }
+ else
+ target = const0_rtx;
+
+ FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+ {
+ if (arg == error_mark_node)
+ return const0_rtx;
+
+ rtx opnd;
+ insn_op = &insn_data[icode].operand[nopnds];
+ if (TREE_CODE (arg) == ADDR_EXPR
+ && MEM_P (DECL_RTL (TREE_OPERAND (arg, 0))))
+ opnd = DECL_RTL (TREE_OPERAND (arg, 0));
+ else
+ opnd = expand_normal (arg);
+
+ if (!(*insn_op->predicate) (opnd, insn_op->mode))
+ {
+ if (!strcmp (insn_op->constraint, "n"))
+ {
+ if (!CONST_INT_P (opnd))
+ error ("argument %d must be an unsigned literal", nopnds);
+ else
+ error ("argument %d is an unsigned literal that is "
+ "out of range", nopnds);
+ return const0_rtx;
+ }
+ opnd = copy_to_mode_reg (insn_op->mode, opnd);
+ }
+
+ /* Some MMA instructions have INOUT accumulator operands, so force
+ their target register to be the same as their input register. */
+ if (!void_func
+ && nopnds == 1
+ && !strcmp (insn_op->constraint, "0")
+ && insn_op->mode == tmode
+ && REG_P (opnd)
+ && (*insn_data[icode].operand[0].predicate) (opnd, tmode))
+ target = op[0] = opnd;
+
+ op[nopnds++] = opnd;
+ }
+
+ unsigned attr_args = attr & RS6000_BTC_OPND_MASK;
+ if (attr & RS6000_BTC_QUAD)
+ attr_args++;
+
+ gcc_assert (nopnds == attr_args);
+
+ rtx pat;
+ switch (nopnds)
+ {
+ case 1:
+ pat = GEN_FCN (icode) (op[0]);
+ break;
+ case 2:
+ pat = GEN_FCN (icode) (op[0], op[1]);
+ break;
+ case 3:
+ pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+ break;
+ case 4:
+ pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
+ break;
+ case 5:
+ pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
+ break;
+ case 6:
+ pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5]);
+ break;
+ case 7:
+ pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5], op[6]);
+ break;
+ default:
+ gcc_unreachable ();
+ }
+ if (!pat)
+ return NULL_RTX;
+ emit_insn (pat);
+
+ return target;
+}
+
/* Return the appropriate SPR number associated with the given builtin. */
static inline HOST_WIDE_INT
htm_spr_num (enum rs6000_builtins code)
@@ -9539,11 +9727,11 @@ htm_expand_builtin (tree exp, rtx target, bool * expandedp)
if (flag_checking)
{
int expected_nopnds = 0;
- if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_UNARY)
+ if ((attr & RS6000_BTC_OPND_MASK) == RS6000_BTC_UNARY)
expected_nopnds = 1;
- else if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_BINARY)
+ else if ((attr & RS6000_BTC_OPND_MASK) == RS6000_BTC_BINARY)
expected_nopnds = 2;
- else if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_TERNARY)
+ else if ((attr & RS6000_BTC_OPND_MASK) == RS6000_BTC_TERNARY)
expected_nopnds = 3;
else if ((attr & RS6000_BTC_TYPE_MASK) == RS6000_BTC_QUATERNARY)
expected_nopnds = 4;
@@ -10647,6 +10835,10 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
"-m64");
else if ((fnmask & RS6000_BTM_P9_MISC) == RS6000_BTM_P9_MISC)
error ("%qs requires the %qs option", name, "-mcpu=power9");
+ else if ((fnmask & RS6000_BTM_FUTURE) != 0)
+ error ("%qs requires the %qs option", name, "-mcpu=future");
+ else if ((fnmask & RS6000_BTM_MMA) != 0)
+ error ("%qs requires the %qs option", name, "-mmma");
else if ((fnmask & RS6000_BTM_LDBL128) == RS6000_BTM_LDBL128)
{
if (!TARGET_HARD_FLOAT)
@@ -10690,6 +10882,10 @@ rs6000_fold_builtin (tree fndecl ATTRIBUTE_UNUSED,
static bool
rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
{
+ /* Check for built-ins explicitly marked as a void function. */
+ if (rs6000_builtin_info[fn_code].attr & RS6000_BTC_VOID)
+ return true;
+
switch (fn_code)
{
case ALTIVEC_BUILTIN_STVX_V16QI:
@@ -10833,6 +11029,156 @@ fold_mergeeo_helper (gimple_stmt_iterator *gsi, gimple *stmt, int use_odd)
gsi_replace (gsi, g, true);
}
+/* Expand the MMA built-ins early, so that we can convert the pass-by-reference
+ __vector_quad arguments into pass-by-value arguments, leading to more
+ efficient code generation. */
+
+bool
+rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
+{
+ gimple *stmt = gsi_stmt (*gsi);
+ tree fndecl = gimple_call_fndecl (stmt);
+ enum rs6000_builtins fncode
+ = (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+ unsigned attr = rs6000_builtin_info[fncode].attr;
+
+ if ((attr & RS6000_BTC_GIMPLE) == 0)
+ return false;
+
+ unsigned nopnds = (attr & RS6000_BTC_OPND_MASK);
+ gimple_seq new_seq = NULL;
+ gimple *new_call;
+ tree new_decl;
+
+ if (rs6000_builtin_info[fncode + 1].icode == CODE_FOR_nothing)
+ {
+ /* This is an MMA disassemble built-in function. */
+ gcc_assert (fncode == MMA_BUILTIN_DISASSEMBLE_ACC
+ || fncode == MMA_BUILTIN_DISASSEMBLE_PAIR);
+
+ push_gimplify_context (true);
+ tree dst_ptr = gimple_call_arg (stmt, 0);
+ tree src_ptr = gimple_call_arg (stmt, 1);
+ tree src_type = TREE_TYPE (src_ptr);
+ tree src = make_ssa_name (TREE_TYPE (src_type));
+ gimplify_assign (src, build_simple_mem_ref (src_ptr), &new_seq);
+
+ /* If we are not disassembling an accumulator or our destination is
+ another accumulator, then just copy the entire thing as is. */
+ if (fncode != MMA_BUILTIN_DISASSEMBLE_ACC
+ || TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
+ {
+ tree dst = build_simple_mem_ref (build1 (VIEW_CONVERT_EXPR,
+ src_type, dst_ptr));
+ gimplify_assign (dst, src, &new_seq);
+ pop_gimplify_context (NULL);
+ gsi_replace_with_seq (gsi, new_seq, true);
+ return true;
+ }
+
+ /* We're disassembling an accumulator into a different type, so we need
+ to emit a xxmfacc instruction now, since we cannot do it later. */
+ new_decl = rs6000_builtin_decls[MMA_BUILTIN_XXMFACC_INTERNAL];
+ new_call = gimple_build_call (new_decl, 1, src);
+ src = make_ssa_name (vector_quad_type_node);
+ gimple_call_set_lhs (new_call, src);
+ gimple_seq_add_stmt (&new_seq, new_call);
+
+ /* Copy the accumulator vector by vector. */
+ tree dst_type = build_pointer_type_for_mode (unsigned_V16QI_type_node,
+ ptr_mode, true);
+ tree dst_base = build1 (VIEW_CONVERT_EXPR, dst_type, dst_ptr);
+ tree array_type = build_array_type_nelts (unsigned_V16QI_type_node, 4);
+ tree src_array = build1 (VIEW_CONVERT_EXPR, array_type, src);
+ for (unsigned i = 0; i < 4; i++)
+ {
+ tree ref = build4 (ARRAY_REF, unsigned_V16QI_type_node, src_array,
+ build_int_cst (size_type_node, i),
+ NULL_TREE, NULL_TREE);
+ tree dst = build2 (MEM_REF, unsigned_V16QI_type_node, dst_base,
+ build_int_cst (dst_type, i * 16));
+ gimplify_assign (dst, ref, &new_seq);
+ }
+ pop_gimplify_context (NULL);
+ gsi_replace_with_seq (gsi, new_seq, true);
+ return true;
+ }
+
+ /* Convert this built-in into an internal version that uses pass-by-value
+ arguments. The internal built-in follows immediately after this one. */
+ new_decl = rs6000_builtin_decls[fncode + 1];
+ tree lhs, mem, op[MAX_MMA_OPERANDS];
+ tree acc = gimple_call_arg (stmt, 0);
+ if (TREE_CODE (acc) == PARM_DECL)
+ mem = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (acc)), acc);
+ else
+ mem = build_simple_mem_ref (acc);
+ push_gimplify_context (true);
+
+ if ((attr & RS6000_BTC_QUAD) != 0)
+ {
+ /* This built-in has a pass-by-reference accumulator input, so load it
+ into a temporary accumulator for use as a pass-by-value input. */
+ op[0] = make_ssa_name (vector_quad_type_node);
+ for (unsigned i = 1; i < nopnds; i++)
+ op[i] = gimple_call_arg (stmt, i);
+ gimplify_assign (op[0], mem, &new_seq);
+ }
+ else
+ {
+ /* This built-in does not use its pass-by-reference accumulator argument
+ as an input argument, so remove it from the input list. */
+ nopnds--;
+ for (unsigned i = 0; i < nopnds; i++)
+ op[i] = gimple_call_arg (stmt, i + 1);
+ }
+
+ switch (nopnds)
+ {
+ case 0:
+ new_call = gimple_build_call (new_decl, 0);
+ break;
+ case 1:
+ new_call = gimple_build_call (new_decl, 1, op[0]);
+ break;
+ case 2:
+ new_call = gimple_build_call (new_decl, 2, op[0], op[1]);
+ break;
+ case 3:
+ new_call = gimple_build_call (new_decl, 3, op[0], op[1], op[2]);
+ break;
+ case 4:
+ new_call = gimple_build_call (new_decl, 4, op[0], op[1], op[2], op[3]);
+ break;
+ case 5:
+ new_call = gimple_build_call (new_decl, 5, op[0], op[1], op[2], op[3],
+ op[4]);
+ break;
+ case 6:
+ new_call = gimple_build_call (new_decl, 6, op[0], op[1], op[2], op[3],
+ op[4], op[5]);
+ break;
+ case 7:
+ new_call = gimple_build_call (new_decl, 7, op[0], op[1], op[2], op[3],
+ op[4], op[5], op[6]);
+ break;
+ default:
+ gcc_unreachable ();
+ }
+
+ if (fncode == MMA_BUILTIN_ASSEMBLE_PAIR)
+ lhs = make_ssa_name (vector_pair_type_node);
+ else
+ lhs = make_ssa_name (vector_quad_type_node);
+ gimple_call_set_lhs (new_call, lhs);
+ gimple_seq_add_stmt (&new_seq, new_call);
+ gimplify_assign (mem, lhs, &new_seq);
+ pop_gimplify_context (NULL);
+ gsi_replace_with_seq (gsi, new_seq, true);
+
+ return true;
+}
+
/* Fold a machine-dependent built-in in GIMPLE. (For folding into
a constant, use rs6000_fold_builtin.) */
@@ -10868,11 +11214,12 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
return false;
/* Don't fold invalid builtins, let rs6000_expand_builtin diagnose it. */
- HOST_WIDE_INT mask = rs6000_builtin_info[uns_fncode].mask;
- bool func_valid_p = (rs6000_builtin_mask & mask) == mask;
- if (!func_valid_p)
+ if (!rs6000_builtin_is_supported_p (fn_code))
return false;
+ if (rs6000_gimple_fold_mma_builtin (gsi))
+ return true;
+
switch (fn_code)
{
/* Flavors of vec_add. We deliberately don't expand
@@ -12007,6 +12354,13 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
break;
}
+ if (TARGET_MMA)
+ {
+ ret = mma_expand_builtin (exp, target, &success);
+
+ if (success)
+ return ret;
+ }
if (TARGET_ALTIVEC)
{
ret = altivec_expand_builtin (exp, target, &success);
@@ -12022,7 +12376,7 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
return ret;
}
- unsigned attr = rs6000_builtin_info[uns_fcode].attr & RS6000_BTC_TYPE_MASK;
+ unsigned attr = rs6000_builtin_info[uns_fcode].attr & RS6000_BTC_OPND_MASK;
/* RS6000_BTC_SPECIAL represents no-operand operators. */
gcc_assert (attr == RS6000_BTC_UNARY
|| attr == RS6000_BTC_BINARY
@@ -12205,7 +12559,7 @@ rs6000_init_builtins (void)
else
ieee128_float_type_node = ibm128_float_type_node = long_double_type_node;
- /* Vector paired and vector quad support. */
+ /* Vector pair and vector quad support. */
if (TARGET_MMA)
{
tree oi_uns_type = make_unsigned_type (256);
@@ -12287,6 +12641,8 @@ rs6000_init_builtins (void)
the target attribute. */
if (TARGET_EXTRA_BUILTINS)
altivec_init_builtins ();
+ if (TARGET_MMA)
+ mma_init_builtins ();
if (TARGET_HTM)
htm_init_builtins ();
@@ -13012,6 +13368,119 @@ altivec_init_builtins (void)
}
+static void
+mma_init_builtins (void)
+{
+ const struct builtin_description *d = bdesc_mma;
+
+ for (unsigned i = 0; i < ARRAY_SIZE (bdesc_mma); i++, d++)
+ {
+ tree op[MAX_MMA_OPERANDS], type;
+ HOST_WIDE_INT mask = d->mask;
+ unsigned icode = (unsigned) d->icode;
+ unsigned attr = rs6000_builtin_info[d->code].attr;
+ int attr_args = (attr & RS6000_BTC_OPND_MASK);
+ bool gimple_func = (attr & RS6000_BTC_GIMPLE);
+ unsigned nopnds = 0;
+
+ if ((mask & rs6000_builtin_mask) != mask)
+ {
+ if (TARGET_DEBUG_BUILTIN)
+ fprintf (stderr, "mma_builtin, skip mma %s\n", d->name);
+ continue;
+ }
+
+ if (d->name == 0)
+ {
+ if (TARGET_DEBUG_BUILTIN)
+ fprintf (stderr, "mma_builtin, bdesc_mma[%lu] no name\n",
+ (long unsigned) i);
+ continue;
+ }
+
+ if (gimple_func)
+ {
+ gcc_assert (icode == CODE_FOR_nothing);
+ op[nopnds++] = void_type_node;
+ /* Some MMA built-ins that are expanded into gimple are converted
+ into internal MMA built-ins that are expanded into rtl.
+ The internal built-in follows immediately after this built-in. */
+ icode = d[1].icode;
+ }
+ else
+ {
+ if ((attr & RS6000_BTC_QUAD) == 0)
+ attr_args--;
+
+ /* Ensure we have the correct number and type of operands. */
+ gcc_assert (attr_args == insn_data[icode].n_operands - 1);
+ }
+
+ if (icode == CODE_FOR_nothing)
+ {
+ /* This is a disassemble MMA built-in function. */
+ gcc_assert (attr_args == RS6000_BTC_BINARY
+ && (d->code == MMA_BUILTIN_DISASSEMBLE_ACC
+ || d->code == MMA_BUILTIN_DISASSEMBLE_PAIR));
+ op[nopnds++] = build_pointer_type (void_type_node);
+ if (attr & RS6000_BTC_QUAD)
+ op[nopnds++] = build_pointer_type (vector_quad_type_node);
+ else
+ op[nopnds++] = build_pointer_type (vector_pair_type_node);
+ }
+ else
+ {
+ /* This is a normal MMA built-in function. */
+ unsigned j = (attr & RS6000_BTC_QUAD) ? 1 : 0;
+ for (; j < insn_data[icode].n_operands; j++)
+ {
+ machine_mode mode = insn_data[icode].operand[j].mode;
+ if (gimple_func && mode == PXImode)
+ op[nopnds++] = build_pointer_type (vector_quad_type_node);
+ else if (gimple_func && mode == POImode
+ && d->code == MMA_BUILTIN_ASSEMBLE_PAIR)
+ op[nopnds++] = build_pointer_type (vector_pair_type_node);
+ else
+ /* MMA uses unsigned types. */
+ op[nopnds++] = builtin_mode_to_type[mode][1];
+ }
+ }
+
+ switch (nopnds)
+ {
+ case 1:
+ type = build_function_type_list (op[0], NULL_TREE);
+ break;
+ case 2:
+ type = build_function_type_list (op[0], op[1], NULL_TREE);
+ break;
+ case 3:
+ type = build_function_type_list (op[0], op[1], op[2], NULL_TREE);
+ break;
+ case 4:
+ type = build_function_type_list (op[0], op[1], op[2], op[3],
+ NULL_TREE);
+ break;
+ case 5:
+ type = build_function_type_list (op[0], op[1], op[2], op[3], op[4],
+ NULL_TREE);
+ break;
+ case 6:
+ type = build_function_type_list (op[0], op[1], op[2], op[3], op[4],
+ op[5], NULL_TREE);
+ break;
+ case 7:
+ type = build_function_type_list (op[0], op[1], op[2], op[3], op[4],
+ op[5], op[6], NULL_TREE);
+ break;
+ default:
+ gcc_unreachable ();
+ }
+
+ def_builtin (d->name, type, d->code);
+ }
+}
+
static void
htm_init_builtins (void)
{
@@ -13026,7 +13495,7 @@ htm_init_builtins (void)
HOST_WIDE_INT mask = d->mask;
unsigned attr = rs6000_builtin_info[d->code].attr;
bool void_func = (attr & RS6000_BTC_VOID);
- int attr_args = (attr & RS6000_BTC_TYPE_MASK);
+ int attr_args = (attr & RS6000_BTC_OPND_MASK);
int nopnds = 0;
tree gpr_type_node;
tree rettype;
@@ -13192,6 +13661,8 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
case P8V_BUILTIN_VGBBD:
case MISC_BUILTIN_CDTBCD:
case MISC_BUILTIN_CBCDTD:
+ case VSX_BUILTIN_XVCVSPBF16:
+ case VSX_BUILTIN_XVCVBF16SP:
h.uns_p[0] = 1;
h.uns_p[1] = 1;
break;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a0f4991d00a..756a2ae8cb9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -9944,7 +9944,8 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
case E_POImode:
case E_PXImode:
- if (CONSTANT_P (operands[1]))
+ if (CONSTANT_P (operands[1])
+ && !(CONST_INT_P (operands[1]) && INTVAL (operands[1]) == 0))
error ("%qs is an opaque type, and you cannot set it to other values",
(mode == POImode) ? "__vector_pair" : "__vector_quad");
break;
@@ -12856,6 +12857,14 @@ print_operand (FILE *file, rtx x, int code)
/* %c is output_addr_const if a CONSTANT_ADDRESS_P, otherwise
output_operand. */
+ case 'A':
+ /* Write the MMA accumulator number associated with VSX register X. */
+ if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
+ output_operand_lossage ("invalid %%A value");
+ else
+ fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4);
+ return;
+
case 'D':
/* Like 'J' but get to the GT bit only. */
if (!REG_P (x) || !CR_REGNO_P (REGNO (x)))
@@ -15969,6 +15978,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
unsigned offset = 0;
unsigned size = GET_MODE_SIZE (reg_mode);
+ /* If we are reading an accumulator register, we have to
+ deprime it before we can access it. */
+ if (TARGET_MMA
+ && GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
+ emit_insn (gen_mma_xxmfacc (src, src));
+
for (int i = 0; i < nregs; i++)
{
unsigned subreg = (WORDS_BIG_ENDIAN)
@@ -15997,6 +16012,32 @@ rs6000_split_multireg_move (rtx dst, rtx src)
emit_insn (gen_rtx_SET (dst2, src2));
}
+ /* If we are writing an accumulator register, we have to
+ prime it after we've written it. */
+ if (TARGET_MMA
+ && GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
+ emit_insn (gen_mma_xxmtacc (dst, dst));
+
+ return;
+ }
+
+ if (GET_CODE (src) == UNSPEC)
+ {
+ gcc_assert (REG_P (dst)
+ && FP_REGNO_P (REGNO (dst))
+ && XINT (src, 1) == UNSPEC_MMA_ASSEMBLE_ACC);
+
+ reg_mode = GET_MODE (XVECEXP (src, 0, 0));
+ for (int i = 0; i < XVECLEN (src, 0); i++)
+ {
+ rtx dst_i = gen_rtx_REG (reg_mode, reg + i);
+ emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
+ }
+
+ /* We are writing an accumulator register, so we have to
+ prime it after we've written it. */
+ emit_insn (gen_mma_xxmtacc (dst, dst));
+
return;
}
@@ -16005,6 +16046,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
{
+ /* If we are reading an accumulator register, we have to
+ deprime it before we can access it. */
+ if (TARGET_MMA
+ && GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
+ emit_insn (gen_mma_xxmfacc (src, src));
+
/* Move register range backwards, if we might have destructive
overlap. */
int i;
@@ -16013,6 +16060,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
i * reg_mode_size),
simplify_gen_subreg (reg_mode, src, mode,
i * reg_mode_size)));
+
+ /* If we are writing an accumulator register, we have to
+ prime it after we've written it. */
+ if (TARGET_MMA
+ && GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
+ emit_insn (gen_mma_xxmtacc (dst, dst));
}
else
{
@@ -16145,6 +16198,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true));
}
+ /* If we are reading an accumulator register, we have to
+ deprime it before we can access it. */
+ if (TARGET_MMA && REG_P (src)
+ && GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
+ emit_insn (gen_mma_xxmfacc (src, src));
+
for (i = 0; i < nregs; i++)
{
/* Calculate index to next subword. */
@@ -16162,6 +16221,13 @@ rs6000_split_multireg_move (rtx dst, rtx src)
simplify_gen_subreg (reg_mode, src, mode,
j * reg_mode_size)));
}
+
+ /* If we are writing an accumulator register, we have to
+ prime it after we've written it. */
+ if (TARGET_MMA && REG_P (dst)
+ && GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
+ emit_insn (gen_mma_xxmtacc (dst, dst));
+
if (restore_basereg != NULL_RTX)
emit_insn (restore_basereg);
}
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 9c103bf8f7d..f3883b51255 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2251,20 +2251,24 @@ extern int frame_pointer_needed;
flags macros, but we've run out of bits, so we now map the options into new
settings used here. */
-/* Builtin attributes. */
-#define RS6000_BTC_SPECIAL 0x00000000 /* Special function. */
+/* Builtin operand count. */
#define RS6000_BTC_UNARY 0x00000001 /* normal unary function. */
#define RS6000_BTC_BINARY 0x00000002 /* normal binary function. */
#define RS6000_BTC_TERNARY 0x00000003 /* normal ternary function. */
#define RS6000_BTC_QUATERNARY 0x00000004 /* normal quaternary
function. */
+#define RS6000_BTC_QUINARY 0x00000005 /* normal quinary function. */
+#define RS6000_BTC_SENARY 0x00000006 /* normal senary function. */
+#define RS6000_BTC_OPND_MASK 0x00000007 /* Mask to isolate operands. */
-#define RS6000_BTC_PREDICATE 0x00000005 /* predicate function. */
-#define RS6000_BTC_ABS 0x00000006 /* Altivec/VSX ABS
+/* Builtin attributes. */
+#define RS6000_BTC_SPECIAL 0x00000000 /* Special function. */
+#define RS6000_BTC_PREDICATE 0x00000008 /* predicate function. */
+#define RS6000_BTC_ABS 0x00000010 /* Altivec/VSX ABS
function. */
-#define RS6000_BTC_DST 0x00000007 /* Altivec DST function. */
+#define RS6000_BTC_DST 0x00000020 /* Altivec DST function. */
-#define RS6000_BTC_TYPE_MASK 0x0000000f /* Mask to isolate types */
+#define RS6000_BTC_TYPE_MASK 0x0000003f /* Mask to isolate types */
#define RS6000_BTC_MISC 0x00000000 /* No special attributes. */
#define RS6000_BTC_CONST 0x00000100 /* Neither uses, nor
@@ -2273,13 +2277,18 @@ extern int frame_pointer_needed;
state/mem and does
not modify global state. */
#define RS6000_BTC_FP 0x00000400 /* depends on rounding mode. */
-#define RS6000_BTC_ATTR_MASK 0x00000700 /* Mask of the attributes. */
+#define RS6000_BTC_QUAD 0x00000800 /* Uses a register quad. */
+#define RS6000_BTC_PAIR 0x00001000 /* Uses a register pair. */
+#define RS6000_BTC_QUADPAIR 0x00001800 /* Uses a quad and a pair. */
+#define RS6000_BTC_ATTR_MASK 0x00001f00 /* Mask of the attributes. */
/* Miscellaneous information. */
#define RS6000_BTC_SPR 0x01000000 /* function references SPRs. */
#define RS6000_BTC_VOID 0x02000000 /* function has no return value. */
#define RS6000_BTC_CR 0x04000000 /* function references a CR. */
#define RS6000_BTC_OVERLOADED 0x08000000 /* function is overloaded. */
+#define RS6000_BTC_GIMPLE 0x10000000 /* function should be expanded
+ into gimple. */
#define RS6000_BTC_MISC_MASK 0x1f000000 /* Mask of the misc info. */
/* Convenience macros to document the instruction type. */
@@ -2348,6 +2357,7 @@ extern int frame_pointer_needed;
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -2359,6 +2369,7 @@ extern int frame_pointer_needed;
#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
+#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
@@ -2377,6 +2388,7 @@ enum rs6000_builtins
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
+#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 66c3cb5f2dc..a1ff5fa852f 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -31,6 +31,240 @@
;; therefore, we define the XImode and OImode move patterns, but we
;; disable their use with a "false" condition flag.
+(define_constants [(MAX_MMA_OPERANDS 7)])
+
+;; Constants for creating unspecs
+
+(define_c_enum "unspec"
+ [UNSPEC_MMA_ASSEMBLE_ACC
+ UNSPEC_MMA_PMXVBF16GER2
+ UNSPEC_MMA_PMXVBF16GER2NN
+ UNSPEC_MMA_PMXVBF16GER2NP
+ UNSPEC_MMA_PMXVBF16GER2PN
+ UNSPEC_MMA_PMXVBF16GER2PP
+ UNSPEC_MMA_PMXVF16GER2
+ UNSPEC_MMA_PMXVF16GER2NN
+ UNSPEC_MMA_PMXVF16GER2NP
+ UNSPEC_MMA_PMXVF16GER2PN
+ UNSPEC_MMA_PMXVF16GER2PP
+ UNSPEC_MMA_PMXVF32GER
+ UNSPEC_MMA_PMXVF32GERNN
+ UNSPEC_MMA_PMXVF32GERNP
+ UNSPEC_MMA_PMXVF32GERPN
+ UNSPEC_MMA_PMXVF32GERPP
+ UNSPEC_MMA_PMXVF64GER
+ UNSPEC_MMA_PMXVF64GERNN
+ UNSPEC_MMA_PMXVF64GERNP
+ UNSPEC_MMA_PMXVF64GERPN
+ UNSPEC_MMA_PMXVF64GERPP
+ UNSPEC_MMA_PMXVI16GER2
+ UNSPEC_MMA_PMXVI16GER2PP
+ UNSPEC_MMA_PMXVI16GER2S
+ UNSPEC_MMA_PMXVI16GER2SPP
+ UNSPEC_MMA_PMXVI4GER8
+ UNSPEC_MMA_PMXVI4GER8PP
+ UNSPEC_MMA_PMXVI8GER4
+ UNSPEC_MMA_PMXVI8GER4PP
+ UNSPEC_MMA_PMXVI8GER4SPP
+ UNSPEC_MMA_XVBF16GER2
+ UNSPEC_MMA_XVBF16GER2NN
+ UNSPEC_MMA_XVBF16GER2NP
+ UNSPEC_MMA_XVBF16GER2PN
+ UNSPEC_MMA_XVBF16GER2PP
+ UNSPEC_MMA_XVF16GER2
+ UNSPEC_MMA_XVF16GER2NN
+ UNSPEC_MMA_XVF16GER2NP
+ UNSPEC_MMA_XVF16GER2PN
+ UNSPEC_MMA_XVF16GER2PP
+ UNSPEC_MMA_XVF32GER
+ UNSPEC_MMA_XVF32GERNN
+ UNSPEC_MMA_XVF32GERNP
+ UNSPEC_MMA_XVF32GERPN
+ UNSPEC_MMA_XVF32GERPP
+ UNSPEC_MMA_XVF64GER
+ UNSPEC_MMA_XVF64GERNN
+ UNSPEC_MMA_XVF64GERNP
+ UNSPEC_MMA_XVF64GERPN
+ UNSPEC_MMA_XVF64GERPP
+ UNSPEC_MMA_XVI16GER2
+ UNSPEC_MMA_XVI16GER2PP
+ UNSPEC_MMA_XVI16GER2S
+ UNSPEC_MMA_XVI16GER2SPP
+ UNSPEC_MMA_XVI4GER8
+ UNSPEC_MMA_XVI4GER8PP
+ UNSPEC_MMA_XVI8GER4
+ UNSPEC_MMA_XVI8GER4PP
+ UNSPEC_MMA_XVI8GER4SPP
+ UNSPEC_MMA_XXMFACC
+ UNSPEC_MMA_XXMTACC
+ ])
+
+;; MMA instructions with 1 accumulator argument
+(define_int_iterator MMA_ACC [UNSPEC_MMA_XXMFACC
+ UNSPEC_MMA_XXMTACC])
+
+;; MMA instructions with 2 vector arguments
+(define_int_iterator MMA_VV [UNSPEC_MMA_XVI4GER8
+ UNSPEC_MMA_XVI8GER4
+ UNSPEC_MMA_XVI16GER2
+ UNSPEC_MMA_XVI16GER2S
+ UNSPEC_MMA_XVF16GER2
+ UNSPEC_MMA_XVBF16GER2
+ UNSPEC_MMA_XVF32GER])
+
+;; MMA instructions with 1 accumulator and 2 vector arguments
+(define_int_iterator MMA_AVV [UNSPEC_MMA_XVI4GER8PP
+ UNSPEC_MMA_XVI8GER4PP
+ UNSPEC_MMA_XVI8GER4SPP
+ UNSPEC_MMA_XVI16GER2PP
+ UNSPEC_MMA_XVI16GER2SPP
+ UNSPEC_MMA_XVF16GER2PP
+ UNSPEC_MMA_XVF16GER2PN
+ UNSPEC_MMA_XVF16GER2NP
+ UNSPEC_MMA_XVF16GER2NN
+ UNSPEC_MMA_XVBF16GER2PP
+ UNSPEC_MMA_XVBF16GER2PN
+ UNSPEC_MMA_XVBF16GER2NP
+ UNSPEC_MMA_XVBF16GER2NN
+ UNSPEC_MMA_XVF32GERPP
+ UNSPEC_MMA_XVF32GERPN
+ UNSPEC_MMA_XVF32GERNP
+ UNSPEC_MMA_XVF32GERNN])
+
+;; MMA instructions with 1 vector pair and 1 vector arguments
+(define_int_iterator MMA_PV [UNSPEC_MMA_XVF64GER])
+
+;; MMA instructions with 1 accumulator, 1 vector pair and 1 vector arguments
+(define_int_iterator MMA_APV [UNSPEC_MMA_XVF64GERPP
+ UNSPEC_MMA_XVF64GERPN
+ UNSPEC_MMA_XVF64GERNP
+ UNSPEC_MMA_XVF64GERNN])
+
+;; MMA instructions with 2 vector, 2 4-bit and 1 8-bit arguments
+(define_int_iterator MMA_VVI4I4I8 [UNSPEC_MMA_PMXVI4GER8])
+
+;; MMA instructions with 1 accumulator, 2 vector, 2 4-bit and 1 8-bit arguments
+(define_int_iterator MMA_AVVI4I4I8 [UNSPEC_MMA_PMXVI4GER8PP])
+
+;; MMA instructions with 2 vector, 2 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_VVI4I4I2 [UNSPEC_MMA_PMXVI16GER2
+ UNSPEC_MMA_PMXVI16GER2S
+ UNSPEC_MMA_PMXVF16GER2
+ UNSPEC_MMA_PMXVBF16GER2])
+
+;; MMA instructions with 1 accumulator, 2 vector, 2 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_AVVI4I4I2 [UNSPEC_MMA_PMXVI16GER2PP
+ UNSPEC_MMA_PMXVI16GER2SPP
+ UNSPEC_MMA_PMXVF16GER2PP
+ UNSPEC_MMA_PMXVF16GER2PN
+ UNSPEC_MMA_PMXVF16GER2NP
+ UNSPEC_MMA_PMXVF16GER2NN
+ UNSPEC_MMA_PMXVBF16GER2PP
+ UNSPEC_MMA_PMXVBF16GER2PN
+ UNSPEC_MMA_PMXVBF16GER2NP
+ UNSPEC_MMA_PMXVBF16GER2NN])
+
+;; MMA instructions with 2 vector and 2 4-bit arguments
+(define_int_iterator MMA_VVI4I4 [UNSPEC_MMA_PMXVF32GER])
+
+;; MMA instructions with 1 accumulator, 2 vector and 2 4-bit arguments
+(define_int_iterator MMA_AVVI4I4 [UNSPEC_MMA_PMXVF32GERPP
+ UNSPEC_MMA_PMXVF32GERPN
+ UNSPEC_MMA_PMXVF32GERNP
+ UNSPEC_MMA_PMXVF32GERNN])
+
+;; MMA instructions with 2 vector, 1 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_PVI4I2 [UNSPEC_MMA_PMXVF64GER])
+
+;; MMA instructions with 1 accumulator, 2 vector, 1 4-bit and 1 2-bit arguments
+(define_int_iterator MMA_APVI4I2 [UNSPEC_MMA_PMXVF64GERPP
+ UNSPEC_MMA_PMXVF64GERPN
+ UNSPEC_MMA_PMXVF64GERNP
+ UNSPEC_MMA_PMXVF64GERNN])
+
+;; MMA instructions with 2 vector and 3 4-bit arguments
+(define_int_iterator MMA_VVI4I4I4 [UNSPEC_MMA_PMXVI8GER4])
+
+;; MMA instructions with 1 accumulator, 2 vector and 3 4-bit arguments
+(define_int_iterator MMA_AVVI4I4I4 [UNSPEC_MMA_PMXVI8GER4PP
+ UNSPEC_MMA_PMXVI8GER4SPP])
+
+(define_int_attr acc [(UNSPEC_MMA_XXMFACC "xxmfacc")
+ (UNSPEC_MMA_XXMTACC "xxmtacc")])
+
+(define_int_attr vv [(UNSPEC_MMA_XVI4GER8 "xvi4ger8")
+ (UNSPEC_MMA_XVI8GER4 "xvi8ger4")
+ (UNSPEC_MMA_XVI16GER2 "xvi16ger2")
+ (UNSPEC_MMA_XVI16GER2S "xvi16ger2s")
+ (UNSPEC_MMA_XVF16GER2 "xvf16ger2")
+ (UNSPEC_MMA_XVBF16GER2 "xvbf16ger2")
+ (UNSPEC_MMA_XVF32GER "xvf32ger")])
+
+(define_int_attr avv [(UNSPEC_MMA_XVI4GER8PP "xvi4ger8pp")
+ (UNSPEC_MMA_XVI8GER4PP "xvi8ger4pp")
+ (UNSPEC_MMA_XVI8GER4SPP "xvi8ger4spp")
+ (UNSPEC_MMA_XVI16GER2PP "xvi16ger2pp")
+ (UNSPEC_MMA_XVI16GER2SPP "xvi16ger2spp")
+ (UNSPEC_MMA_XVF16GER2PP "xvf16ger2pp")
+ (UNSPEC_MMA_XVF16GER2PN "xvf16ger2pn")
+ (UNSPEC_MMA_XVF16GER2NP "xvf16ger2np")
+ (UNSPEC_MMA_XVF16GER2NN "xvf16ger2nn")
+ (UNSPEC_MMA_XVBF16GER2PP "xvbf16ger2pp")
+ (UNSPEC_MMA_XVBF16GER2PN "xvbf16ger2pn")
+ (UNSPEC_MMA_XVBF16GER2NP "xvbf16ger2np")
+ (UNSPEC_MMA_XVBF16GER2NN "xvbf16ger2nn")
+ (UNSPEC_MMA_XVF32GERPP "xvf32gerpp")
+ (UNSPEC_MMA_XVF32GERPN "xvf32gerpn")
+ (UNSPEC_MMA_XVF32GERNP "xvf32gernp")
+ (UNSPEC_MMA_XVF32GERNN "xvf32gernn")])
+
+(define_int_attr pv [(UNSPEC_MMA_XVF64GER "xvf64ger")])
+
+(define_int_attr apv [(UNSPEC_MMA_XVF64GERPP "xvf64gerpp")
+ (UNSPEC_MMA_XVF64GERPN "xvf64gerpn")
+ (UNSPEC_MMA_XVF64GERNP "xvf64gernp")
+ (UNSPEC_MMA_XVF64GERNN "xvf64gernn")])
+
+(define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")])
+
+(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "pmxvi4ger8pp")])
+
+(define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2 "pmxvi16ger2")
+ (UNSPEC_MMA_PMXVI16GER2S "pmxvi16ger2s")
+ (UNSPEC_MMA_PMXVF16GER2 "pmxvf16ger2")
+ (UNSPEC_MMA_PMXVBF16GER2 "pmxvbf16ger2")])
+
+(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "pmxvi16ger2pp")
+ (UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp")
+ (UNSPEC_MMA_PMXVF16GER2PP "pmxvf16ger2pp")
+ (UNSPEC_MMA_PMXVF16GER2PN "pmxvf16ger2pn")
+ (UNSPEC_MMA_PMXVF16GER2NP "pmxvf16ger2np")
+ (UNSPEC_MMA_PMXVF16GER2NN "pmxvf16ger2nn")
+ (UNSPEC_MMA_PMXVBF16GER2PP "pmxvbf16ger2pp")
+ (UNSPEC_MMA_PMXVBF16GER2PN "pmxvbf16ger2pn")
+ (UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np")
+ (UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")])
+
+(define_int_attr vvi4i4 [(UNSPEC_MMA_PMXVF32GER "pmxvf32ger")])
+
+(define_int_attr avvi4i4 [(UNSPEC_MMA_PMXVF32GERPP "pmxvf32gerpp")
+ (UNSPEC_MMA_PMXVF32GERPN "pmxvf32gerpn")
+ (UNSPEC_MMA_PMXVF32GERNP "pmxvf32gernp")
+ (UNSPEC_MMA_PMXVF32GERNN "pmxvf32gernn")])
+
+(define_int_attr pvi4i2 [(UNSPEC_MMA_PMXVF64GER "pmxvf64ger")])
+
+(define_int_attr apvi4i2 [(UNSPEC_MMA_PMXVF64GERPP "pmxvf64gerpp")
+ (UNSPEC_MMA_PMXVF64GERPN "pmxvf64gerpn")
+ (UNSPEC_MMA_PMXVF64GERNP "pmxvf64gernp")
+ (UNSPEC_MMA_PMXVF64GERNN "pmxvf64gernn")])
+
+(define_int_attr vvi4i4i4 [(UNSPEC_MMA_PMXVI8GER4 "pmxvi8ger4")])
+
+(define_int_attr avvi4i4i4 [(UNSPEC_MMA_PMXVI8GER4PP "pmxvi8ger4pp")
+ (UNSPEC_MMA_PMXVI8GER4SPP "pmxvi8ger4spp")])
+
+
;; Define a disabled OImode move pattern, so we can use POImode.
(define_expand "movoi"
[(set (match_operand:OI 0 "nonimmediate_operand")
@@ -109,10 +343,11 @@ (define_expand "movpxi"
})
(define_insn_and_split "*movpxi"
- [(set (match_operand:PXI 0 "nonimmediate_operand" "=d,m,d")
- (match_operand:PXI 1 "input_operand" "m,d,d"))]
+ [(set (match_operand:PXI 0 "nonimmediate_operand" "=d,m,d,d")
+ (match_operand:PXI 1 "input_operand" "m,d,d,O"))]
"TARGET_MMA
- && (gpc_reg_operand (operands[0], PXImode)
+ && ((gpc_reg_operand (operands[0], PXImode)
+ && !(CONST_INT_P (operands[1]) && INTVAL (operands[1]) == 0))
|| gpc_reg_operand (operands[1], PXImode))"
"#"
"&& reload_completed"
@@ -121,6 +356,249 @@ (define_insn_and_split "*movpxi"
rs6000_split_multireg_move (operands[0], operands[1]);
DONE;
}
- [(set_attr "type" "vecload,vecstore,veclogical")
- (set_attr "length" "8,8,16")
- (set_attr "max_prefixed_insns" "2,2,*")])
+ [(set_attr "type" "vecload,vecstore,veclogical,mma")
+ (set_attr "length" "8,8,16,*")
+ (set_attr "max_prefixed_insns" "2,2,*,*")])
+
+(define_expand "mma_assemble_pair"
+ [(match_operand:POI 0 "vsx_register_operand")
+ (match_operand:V16QI 1 "input_operand")
+ (match_operand:V16QI 2 "input_operand")]
+ "TARGET_MMA"
+{
+ rtx dst;
+
+ /* Let the compiler know the code below fully defines our output value. */
+ emit_clobber (operands[0]);
+
+ dst = simplify_gen_subreg (V16QImode, operands[0], POImode, 0);
+ emit_move_insn (dst, operands[1]);
+ dst = simplify_gen_subreg (V16QImode, operands[0], POImode, 16);
+ emit_move_insn (dst, operands[2]);
+ DONE;
+})
+
+(define_expand "mma_assemble_acc"
+ [(match_operand:PXI 0 "fpr_reg_operand")
+ (match_operand:V16QI 1 "input_operand")
+ (match_operand:V16QI 2 "input_operand")
+ (match_operand:V16QI 3 "input_operand")
+ (match_operand:V16QI 4 "input_operand")]
+ "TARGET_MMA"
+{
+ rtx src = gen_rtx_UNSPEC (PXImode,
+ gen_rtvec (4, operands[1], operands[2],
+ operands[3], operands[4]),
+ UNSPEC_MMA_ASSEMBLE_ACC);
+ emit_move_insn (operands[0], src);
+ DONE;
+})
+
+(define_insn_and_split "*mma_assemble_acc"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=d")
+ (unspec:PXI [(match_operand:PXI 1 "mma_input_operand" "mwa")
+ (match_operand:PXI 2 "mma_input_operand" "mwa")
+ (match_operand:PXI 3 "mma_input_operand" "mwa")
+ (match_operand:PXI 4 "mma_input_operand" "mwa")]
+ UNSPEC_MMA_ASSEMBLE_ACC))]
+ "TARGET_MMA
+ && fpr_reg_operand (operands[0], PXImode)"
+ "#"
+ "&& reload_completed"
+ [(const_int 0)]
+{
+ rtx src = gen_rtx_UNSPEC (PXImode,
+ gen_rtvec (4, operands[1], operands[2],
+ operands[3], operands[4]),
+ UNSPEC_MMA_ASSEMBLE_ACC);
+ rs6000_split_multireg_move (operands[0], src);
+ DONE;
+})
+
+;; MMA instructions that do not use their accumulators as an input still
+;; must not allow their vector operands to overlap the registers used by
+;; the accumulator.  We enforce this by marking the output as early clobber.
+
+(define_insn "mma_<acc>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")]
+ MMA_ACC))]
+ "TARGET_MMA"
+ "<acc> %A0"
+ [(set_attr "type" "mma")])
+
+(define_insn "mma_xxsetaccz"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=d")
+ (const_int 0))]
+ "TARGET_MMA"
+ "xxsetaccz %A0"
+ [(set_attr "type" "mma")])
+
+(define_insn "mma_<vv>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")]
+ MMA_VV))]
+ "TARGET_MMA"
+ "<vv> %A0,%x1,%x2"
+ [(set_attr "type" "mma")])
+
+(define_insn "mma_<avv>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")
+ (match_operand:V16QI 3 "vsx_register_operand" "wa")]
+ MMA_AVV))]
+ "TARGET_MMA"
+ "<avv> %A0,%x2,%x3"
+ [(set_attr "type" "mma")])
+
+(define_insn "mma_<pv>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:POI 1 "vsx_register_operand" "wa")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")]
+ MMA_PV))]
+ "TARGET_MMA"
+ "<pv> %A0,%x1,%x2"
+ [(set_attr "type" "mma")])
+
+(define_insn "mma_<apv>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+ (match_operand:POI 2 "vsx_register_operand" "wa")
+ (match_operand:V16QI 3 "vsx_register_operand" "wa")]
+ MMA_APV))]
+ "TARGET_MMA"
+ "<apv> %A0,%x2,%x3"
+ [(set_attr "type" "mma")])
+
+(define_insn "mma_<vvi4i4i8>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")
+ (match_operand:SI 3 "const_0_to_15_operand" "n")
+ (match_operand:SI 4 "const_0_to_15_operand" "n")
+ (match_operand:SI 5 "u8bit_cint_operand" "n")]
+ MMA_VVI4I4I8))]
+ "TARGET_MMA"
+ "<vvi4i4i8> %A0,%x1,%x2,%3,%4,%5"
+ [(set_attr "type" "mma")
+ (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4i8>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")
+ (match_operand:V16QI 3 "vsx_register_operand" "wa")
+ (match_operand:SI 4 "const_0_to_15_operand" "n")
+ (match_operand:SI 5 "const_0_to_15_operand" "n")
+ (match_operand:SI 6 "u8bit_cint_operand" "n")]
+ MMA_AVVI4I4I8))]
+ "TARGET_MMA"
+ "<avvi4i4i8> %A0,%x2,%x3,%4,%5,%6"
+ [(set_attr "type" "mma")
+ (set_attr "length" "8")])
+
+(define_insn "mma_<vvi4i4i2>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")
+ (match_operand:SI 3 "const_0_to_15_operand" "n")
+ (match_operand:SI 4 "const_0_to_15_operand" "n")
+ (match_operand:SI 5 "const_0_to_3_operand" "n")]
+ MMA_VVI4I4I2))]
+ "TARGET_MMA"
+ "<vvi4i4i2> %A0,%x1,%x2,%3,%4,%5"
+ [(set_attr "type" "mma")
+ (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4i2>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")
+ (match_operand:V16QI 3 "vsx_register_operand" "wa")
+ (match_operand:SI 4 "const_0_to_15_operand" "n")
+ (match_operand:SI 5 "const_0_to_15_operand" "n")
+ (match_operand:SI 6 "const_0_to_3_operand" "n")]
+ MMA_AVVI4I4I2))]
+ "TARGET_MMA"
+ "<avvi4i4i2> %A0,%x2,%x3,%4,%5,%6"
+ [(set_attr "type" "mma")
+ (set_attr "length" "8")])
+
+(define_insn "mma_<vvi4i4>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")
+ (match_operand:SI 3 "const_0_to_15_operand" "n")
+ (match_operand:SI 4 "const_0_to_15_operand" "n")]
+ MMA_VVI4I4))]
+ "TARGET_MMA"
+ "<vvi4i4> %A0,%x1,%x2,%3,%4"
+ [(set_attr "type" "mma")
+ (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")
+ (match_operand:V16QI 3 "vsx_register_operand" "wa")
+ (match_operand:SI 4 "const_0_to_15_operand" "n")
+ (match_operand:SI 5 "const_0_to_15_operand" "n")]
+ MMA_AVVI4I4))]
+ "TARGET_MMA"
+ "<avvi4i4> %A0,%x2,%x3,%4,%5"
+ [(set_attr "type" "mma")
+ (set_attr "length" "8")])
+
+(define_insn "mma_<pvi4i2>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:POI 1 "vsx_register_operand" "wa")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")
+ (match_operand:SI 3 "const_0_to_15_operand" "n")
+ (match_operand:SI 4 "const_0_to_3_operand" "n")]
+ MMA_PVI4I2))]
+ "TARGET_MMA"
+ "<pvi4i2> %A0,%x1,%x2,%3,%4"
+ [(set_attr "type" "mma")
+ (set_attr "length" "8")])
+
+(define_insn "mma_<apvi4i2>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+ (match_operand:POI 2 "vsx_register_operand" "wa")
+ (match_operand:V16QI 3 "vsx_register_operand" "wa")
+ (match_operand:SI 4 "const_0_to_15_operand" "n")
+ (match_operand:SI 5 "const_0_to_3_operand" "n")]
+ MMA_APVI4I2))]
+ "TARGET_MMA"
+ "<apvi4i2> %A0,%x2,%x3,%4,%5"
+ [(set_attr "type" "mma")
+ (set_attr "length" "8")])
+
+(define_insn "mma_<vvi4i4i4>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:V16QI 1 "vsx_register_operand" "wa")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")
+ (match_operand:SI 3 "const_0_to_15_operand" "n")
+ (match_operand:SI 4 "const_0_to_15_operand" "n")
+ (match_operand:SI 5 "const_0_to_15_operand" "n")]
+ MMA_VVI4I4I4))]
+ "TARGET_MMA"
+ "<vvi4i4i4> %A0,%x1,%x2,%3,%4,%5"
+ [(set_attr "type" "mma")
+ (set_attr "length" "8")])
+
+(define_insn "mma_<avvi4i4i4>"
+ [(set (match_operand:PXI 0 "fpr_reg_operand" "=&d")
+ (unspec:PXI [(match_operand:PXI 1 "fpr_reg_operand" "0")
+ (match_operand:V16QI 2 "vsx_register_operand" "wa")
+ (match_operand:V16QI 3 "vsx_register_operand" "wa")
+ (match_operand:SI 4 "const_0_to_15_operand" "n")
+ (match_operand:SI 5 "const_0_to_15_operand" "n")
+ (match_operand:SI 6 "const_0_to_15_operand" "n")]
+ MMA_AVVI4I4I4))]
+ "TARGET_MMA"
+ "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6"
+ [(set_attr "type" "mma")
+ (set_attr "length" "8")])
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6b462a3ecdb..bbe0b4610fb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -203,7 +203,7 @@ (define_attr "type"
vecsimple,veccomplex,vecdiv,veccmp,veccmpsimple,vecperm,
vecfloat,vecfdiv,vecdouble,mffgpr,mftgpr,crypto,
veclogical,veccmpfx,vecexts,vecmove,
- htm,htmsimple,dfp"
+ htm,htmsimple,dfp,mma"
(const_string "integer"))
;; What data size does this instruction work on?
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 2a28215ac5b..342927abeda 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -296,6 +296,8 @@ (define_c_enum "unspec"
UNSPEC_VSX_DIVUD
UNSPEC_VSX_MULSD
UNSPEC_VSX_SIGN_EXTEND
+ UNSPEC_VSX_XVCVBF16SP
+ UNSPEC_VSX_XVCVSPBF16
UNSPEC_VSX_XVCVSPSXDS
UNSPEC_VSX_VSLO
UNSPEC_VSX_EXTRACT
@@ -346,6 +348,12 @@ (define_c_enum "unspec"
UNSPEC_XXGENPCV
])
+(define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16
+ UNSPEC_VSX_XVCVBF16SP])
+
+(define_int_attr xvcvbf16 [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16")
+ (UNSPEC_VSX_XVCVBF16SP "xvcvbf16sp")])
+
;; VSX moves
;; The patterns for LE permuted loads and stores come before the general
@@ -5676,3 +5684,10 @@ (define_expand "vec_unpack_<su>fix_trunc_lo_v4sf"
DONE;
})
+(define_insn "vsx_<xvcvbf16>"
+ [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+ (unspec:V16QI [(match_operand:V16QI 1 "vsx_register_operand" "wa")]
+ XVCVBF16))]
+ "TARGET_FUTURE"
+ "<xvcvbf16> %x0,%x1"
+ [(set_attr "type" "vecfloat")])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e656e66a80c..8242c48337e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13858,6 +13858,7 @@ instructions, but allow the compiler to schedule those calls.
* PowerPC AltiVec/VSX Built-in Functions::
* PowerPC Hardware Transactional Memory Built-in Functions::
* PowerPC Atomic Memory Operation Functions::
+* PowerPC Matrix-Multiply Assist Built-in Functions::
* RX Built-in Functions::
* S/390 System z Built-in Functions::
* SH Built-in Functions::
@@ -21359,6 +21360,100 @@ void amo_stdat_smax (int64_t *, int64_t);
void amo_stdat_smin (int64_t *, int64_t);
@end smallexample
+@node PowerPC Matrix-Multiply Assist Built-in Functions
+@subsection PowerPC Matrix-Multiply Assist Built-in Functions
+ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
+GCC provides support for these instructions through the following built-in
+functions, which are enabled with the @code{-mmma} option.  The @code{vec_t}
+type below is defined to be a normal vector unsigned char type.  The
+@code{uint2}, @code{uint4} and @code{uint8} parameters are 2-bit, 4-bit
+and 8-bit unsigned integer constants, respectively.  The compiler verifies
+that they are constants and that their values are within range.
+
+The built-in functions supported are:
+
+@smallexample
+void __builtin_mma_xvi4ger8 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2s (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2 (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32ger (__vector_quad *, vec_t, vec_t);
+
+void __builtin_mma_xvi4ger8pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi8ger4spp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvi16ger2spp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2pn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2np (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf16ger2nn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2pp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2pn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2np (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvbf16ger2nn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gerpp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gerpn (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gernp (__vector_quad *, vec_t, vec_t);
+void __builtin_mma_xvf32gernn (__vector_quad *, vec_t, vec_t);
+
+void __builtin_mma_pmxvi4ger8 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
+void __builtin_mma_pmxvi4ger8pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
+
+void __builtin_mma_pmxvi8ger4 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+void __builtin_mma_pmxvi8ger4pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+void __builtin_mma_pmxvi8ger4spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
+
+void __builtin_mma_pmxvi16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvi16ger2s (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+
+void __builtin_mma_pmxvi16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvi16ger2spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+void __builtin_mma_pmxvbf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
+
+void __builtin_mma_pmxvf32ger (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gerpp (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gerpn (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gernp (__vector_quad *, vec_t, vec_t, uint4, uint4);
+void __builtin_mma_pmxvf32gernn (__vector_quad *, vec_t, vec_t, uint4, uint4);
+
+void __builtin_mma_xvf64ger (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gerpp (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gerpn (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gernp (__vector_quad *, __vector_pair, vec_t);
+void __builtin_mma_xvf64gernn (__vector_quad *, __vector_pair, vec_t);
+
+void __builtin_mma_pmxvf64ger (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gerpp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gerpn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gernp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+void __builtin_mma_pmxvf64gernn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
+
+void __builtin_mma_xxmtacc (__vector_quad *);
+void __builtin_mma_xxmfacc (__vector_quad *);
+void __builtin_mma_xxsetaccz (__vector_quad *);
+
+void __builtin_mma_assemble_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
+void __builtin_mma_disassemble_acc (void *, __vector_quad *);
+
+void __builtin_mma_assemble_pair (__vector_pair *, vec_t, vec_t);
+void __builtin_mma_disassemble_pair (void *, __vector_pair *);
+
+vec_t __builtin_vsx_xvcvspbf16 (vec_t);
+vec_t __builtin_vsx_xvcvbf16sp (vec_t);
+@end smallexample
+
@node RX Built-in Functions
@subsection RX Built-in Functions
GCC supports some of the RX instructions which cannot be expressed in
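
As a usage illustration for reviewers (not part of the patch), here is a minimal
sketch of how the documented built-ins would chain together for one 4x4 float
GER tile.  It assumes a toolchain with this patch series and the @code{-mmma}
option; the function @code{f32_tile_ger}, the array names, and the final stores
are illustrative only, while the @code{__builtin_mma_*} calls are the ones
documented in the Texinfo hunk above.

```c
/* Illustrative sketch, not part of the patch.  Requires -mmma on a
   target supporting ISA 3.1 MMA.  */
#include <altivec.h>

typedef vector unsigned char vec_t;

/* Accumulate one 4x4 float tile: cc += outer product of a and b.  */
void
f32_tile_ger (float cc[4][4], vec_t a, vec_t b)
{
  __vector_quad acc;
  vec_t result[4];

  __builtin_mma_xxsetaccz (&acc);         /* Zero the accumulator.  */
  __builtin_mma_xvf32gerpp (&acc, a, b);  /* acc += a (outer) b.  */
  __builtin_mma_disassemble_acc (result, &acc);

  for (int i = 0; i < 4; i++)
    *(vec_t *) cc[i] = result[i];
}
```

Note that the early-clobber constraints in mma.md above are what prevent the
@code{a} and @code{b} vector inputs from being allocated into the VSRs backing
@code{acc}.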