From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 1005) id A575E385840E; Tue, 23 Jan 2024 07:36:47 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A575E385840E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1705995407; bh=MzwvDqH8kSddujJojqGEiWh2yN4OkOu4Um5vuQblYO0=; h=From:To:Subject:Date:From; b=Bbc5Wz1lvPP784mM0uGMujbNPn3xI2AoMj4tViQLKWthyJ+DDwAGmz9BZVgbgGiId nTsAozqncCzhEb12bd2SFR44z58176/sA6JsdTdz0Y+1aRd5jPBKhYe9IE0Jqv00xP 3X1Q6ghRdZmsnICy0jYA/ww3GkzH+w2dSfFEX9yg= Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: Michael Meissner To: gcc-cvs@gcc.gnu.org Subject: [gcc(refs/users/meissner/heads/work154-vcombo)] Add vector pair built-in functions. X-Act-Checkin: gcc X-Git-Author: Michael Meissner X-Git-Refname: refs/users/meissner/heads/work154-vcombo X-Git-Oldrev: 2ffec8b255662b47175aabab782f63a73ef0755a X-Git-Newrev: ea987aafefb63acd3b6ee5a9c850d55a09be6f1d Message-Id: <20240123073647.A575E385840E@sourceware.org> Date: Tue, 23 Jan 2024 07:36:47 +0000 (GMT) List-Id: https://gcc.gnu.org/g:ea987aafefb63acd3b6ee5a9c850d55a09be6f1d commit ea987aafefb63acd3b6ee5a9c850d55a09be6f1d Author: Michael Meissner Date: Tue Jan 23 02:34:39 2024 -0500 Add vector pair built-in functions. 2024-01-23 Michael Meissner gcc/ * config/rs6000/vector-pair.md (vpair_add_neg_3): New combiner insn to convert vector plus/neg into a minus operation. (vpair_fma__merge): Optimize multiply, add/subtract, and negation into fma operations if the user specifies to create fmas. (vpair_fma__merge): Likewise. (vpair_fma__merge2): Likewise. (vpair_nfma__merge): Likewise. (vpair_nfms__merge): Likewise. (vpair_nfms__merge2): Likewise. gcc/testsuite/ * gcc.target/powerpc/vector-pair-7.c: New test. * gcc.target/powerpc/vector-pair-8.c: Likewise. * gcc.target/powerpc/vector-pair-9.c: Likewise. * gcc.target/powerpc/vector-pair-10.c: Likewise. 
* gcc.target/powerpc/vector-pair-11.c: Likewise. * gcc.target/powerpc/vector-pair-12.c: Likewise. 2024-01-23 Michael Meissner gcc/ * config/rs6000/rs6000-builtins.def (__builtin_vpair_zero): New built-in function. (__builtin_vpair_f32_splat): Likewise. (__builtin_vpair_f64_splat): Likewise. * config/rs6000/vector-pair.md (UNSPEC_VPAIR_ZERO): New unspec. (UNSPEC_VPAIR_SPLAT): Likewise. (VPAIR_SPLAT_VMODE): New mode iterator. (VPAIR_SPLAT_ELEMENT_TO_VMODE): New mode attribute. (vpair_splat_name): Likewise. (vpair_zero): New insn. (vpair_splat_): New define_expand. (vpair_splat__internal): New insns. gcc/testsuite/ * gcc.target/powerpc/vector-pair-5.c: New test. * gcc.target/powerpc/vector-pair-6.c: Likewise. 2024-01-23 Michael Meissner gcc/ * config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_fma): New built-in. (__builtin_vpair_f32_fms): Likewise. (__builtin_vpair_f32_nfma): Likewise. (__builtin_vpair_f32_nfms): Likewise. (__builtin_vpair_f64_fma): Likewise. (__builtin_vpair_f64_fms): Likewise. (__builtin_vpair_f64_nfma): Likewise. (__builtin_vpair_f64_nfms): Likewise. * config/rs6000/rs6000-protos.h (enum vpair_split_fma): New enumeration. (vpair_split_fma): New declaration. * config/rs6000/rs6000.cc (vpair_split_fma): New function to split vector pair FMA operations. * config/rs6000/vector-pair.md (UNSPEC_VPAIR_FMA): New unspec. (vpair_stdname): Add UNSPEC_VPAIR_FMA. (VPAIR_OP): Likewise. (vpair_fma_4): New insns. (vpair_fms_4): Likewise. (vpair_nfma_4): Likewise. (vpair_nfms_4): Likewise. * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document new vector pair fma built-in functions. gcc/testsuite/ * gcc.target/powerpc/vector-pair-3.c: New test. * gcc.target/powerpc/vector-pair-4.c: Likewise. 2024-01-23 Michael Meissner gcc/ * config/rs6000/rs6000-builtins.def (__builtin_vpair_*): Add new built-in functions for vector pair support. * config/rs6000/rs6000-protos.h (enum vpair_split_unary): New enumeration. (vpair_split_unary): New declaration. (vpair_split_binary): Likewise. 
* config/rs6000/rs6000.cc (vpair_split_unary): New function to split vector pair operations. (vpair_split_binary): Likewise. * config/rs6000/rs6000.md (toplevel): Include vector-pair.md. * config/rs6000/t-rs6000 (MD_INCLUDES): Add vector-pair.md. * config/rs6000/vector-pair.md: New file. * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Add documentation for the new vector pair built-in functions. gcc/testsuite/ * gcc.target/powerpc/vector-pair-1.c: New test. * gcc.target/powerpc/vector-pair-2.c: Likewise. Diff: --- gcc/config/rs6000/rs6000-builtins.def | 90 ++++ gcc/config/rs6000/rs6000-protos.h | 25 + gcc/config/rs6000/rs6000.cc | 138 +++++ gcc/config/rs6000/rs6000.md | 1 + gcc/config/rs6000/t-rs6000 | 1 + gcc/config/rs6000/vector-pair.md | 580 ++++++++++++++++++++++ gcc/doc/extend.texi | 85 ++++ gcc/testsuite/gcc.target/powerpc/vector-pair-1.c | 87 ++++ gcc/testsuite/gcc.target/powerpc/vector-pair-10.c | 61 +++ gcc/testsuite/gcc.target/powerpc/vector-pair-11.c | 65 +++ gcc/testsuite/gcc.target/powerpc/vector-pair-12.c | 65 +++ gcc/testsuite/gcc.target/powerpc/vector-pair-2.c | 86 ++++ gcc/testsuite/gcc.target/powerpc/vector-pair-3.c | 57 +++ gcc/testsuite/gcc.target/powerpc/vector-pair-4.c | 57 +++ gcc/testsuite/gcc.target/powerpc/vector-pair-5.c | 56 +++ gcc/testsuite/gcc.target/powerpc/vector-pair-6.c | 56 +++ gcc/testsuite/gcc.target/powerpc/vector-pair-7.c | 18 + gcc/testsuite/gcc.target/powerpc/vector-pair-8.c | 18 + gcc/testsuite/gcc.target/powerpc/vector-pair-9.c | 61 +++ 19 files changed, 1607 insertions(+) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 3bc7fed6956..b757a8630ff 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -4131,3 +4131,93 @@ void __builtin_vsx_stxvp (v256, unsigned long, const v256 *); STXVP nothing {mma,pair} + +;; Vector pair built-in functions. 
+ v256 __builtin_vpair_zero (); + VPAIR_ZERO vpair_zero {mma} + +;; Vector pair built-in functions with float elements + v256 __builtin_vpair_f32_abs (v256); + VPAIR_F32_ABS vpair_abs_v8sf2 {mma} + + v256 __builtin_vpair_f32_add (v256, v256); + VPAIR_F32_ADD vpair_add_v8sf3 {mma} + + v256 __builtin_vpair_f32_div (v256, v256); + VPAIR_F32_DIV vpair_div_v8sf3 {mma} + + v256 __builtin_vpair_f32_fma (v256, v256, v256); + VPAIR_F32_FMA vpair_fma_v8sf4 {mma} + + v256 __builtin_vpair_f32_fms (v256, v256, v256); + VPAIR_F32_FMS vpair_fms_v8sf4 {mma} + + v256 __builtin_vpair_f32_max (v256, v256); + VPAIR_F32_MAX vpair_smax_v8sf3 {mma} + + v256 __builtin_vpair_f32_min (v256, v256); + VPAIR_F32_MIN vpair_smin_v8sf3 {mma} + + v256 __builtin_vpair_f32_mul (v256, v256); + VPAIR_F32_MUL vpair_mul_v8sf3 {mma} + + v256 __builtin_vpair_f32_nabs (v256); + VPAIR_F32_NABS vpair_nabs_v8sf2 {mma} + + v256 __builtin_vpair_f32_neg (v256); + VPAIR_F32_NEG vpair_neg_v8sf2 {mma} + + v256 __builtin_vpair_f32_nfma (v256, v256, v256); + VPAIR_F32_NFMA vpair_nfma_v8sf4 {mma} + + v256 __builtin_vpair_f32_nfms (v256, v256, v256); + VPAIR_F32_NFMS vpair_nfms_v8sf4 {mma} + + v256 __builtin_vpair_f32_splat (float); + VPAIR_F32_SPLAT vpair_splat_v8sf {mma} + + v256 __builtin_vpair_f32_sub (v256, v256); + VPAIR_F32_SUB vpair_sub_v8sf3 {mma} + +;; Vector pair built-in functions with double elements + v256 __builtin_vpair_f64_abs (v256); + VPAIR_F64_ABS vpair_abs_v4df2 {mma} + + v256 __builtin_vpair_f64_add (v256, v256); + VPAIR_F64_ADD vpair_add_v4df3 {mma} + + v256 __builtin_vpair_f64_div (v256, v256); + VPAIR_F64_DIV vpair_div_v4df3 {mma} + + v256 __builtin_vpair_f64_fma (v256, v256, v256); + VPAIR_F64_FMA vpair_fma_v4df4 {mma} + + v256 __builtin_vpair_f64_fms (v256, v256, v256); + VPAIR_F64_FMS vpair_fms_v4df4 {mma} + + v256 __builtin_vpair_f64_max (v256, v256); + VPAIR_F64_MAX vpair_smax_v4df3 {mma} + + v256 __builtin_vpair_f64_min (v256, v256); + VPAIR_F64_MIN vpair_smin_v4df3 {mma} + + v256 
__builtin_vpair_f64_mul (v256, v256); + VPAIR_F64_MUL vpair_mul_v4df3 {mma} + + v256 __builtin_vpair_f64_nabs (v256); + VPAIR_F64_NABS vpair_nabs_v4df2 {mma} + + v256 __builtin_vpair_f64_neg (v256); + VPAIR_F64_NEG vpair_neg_v4df2 {mma} + + v256 __builtin_vpair_f64_nfma (v256, v256, v256); + VPAIR_F64_NFMA vpair_nfma_v4df4 {mma} + + v256 __builtin_vpair_f64_nfms (v256, v256, v256); + VPAIR_F64_NFMS vpair_nfms_v4df4 {mma} + + v256 __builtin_vpair_f64_splat (double); + VPAIR_F64_SPLAT vpair_splat_v4df {mma} + + v256 __builtin_vpair_f64_sub (v256, v256); + VPAIR_F64_SUB vpair_sub_v4df3 {mma} diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 09a57a806fa..aed4081c87b 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -162,6 +162,31 @@ extern bool rs6000_pcrel_p (void); extern bool rs6000_fndecl_pcrel_p (const_tree); extern void rs6000_output_addr_vec_elt (FILE *, int); +/* If we are splitting a vector pair unary operator into two separate vector + operations, we need to generate a NEG if this is NABS. */ + +enum vpair_split_unary { + VPAIR_SPLIT_NORMAL, /* No extra processing is needed. */ + VPAIR_SPLIT_NEGATE /* Wrap operation with a NEG. */ +}; + +extern void vpair_split_unary (rtx [], machine_mode, enum rtx_code, + enum vpair_split_unary); +extern void vpair_split_binary (rtx [], machine_mode, enum rtx_code); + +/* When we are splitting a vector pair FMA operation into two vector operations, we + may need to modify the code generated. This enumeration encodes the + different choices. */ + +enum vpair_split_fma { + VPAIR_SPLIT_FMA, /* Fused multiply-add. */ + VPAIR_SPLIT_FMS, /* Fused multiply-subtract. */ + VPAIR_SPLIT_NFMA, /* Fused negate multiply-add. */ + VPAIR_SPLIT_NFMS /* Fused negate multiply-subtract. */ +}; + +extern void vpair_split_fma (rtx [], machine_mode, enum vpair_split_fma); + /* Different PowerPC instruction formats that are used by GCC. 
There are various other instruction formats used by the PowerPC hardware, but these formats are not currently used by GCC. */ diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index f3aa1c15f68..055cc55ffc9 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -29391,7 +29391,145 @@ rs6000_opaque_type_invalid_use_p (gimple *stmt) return false; } + +/* Split vector pair unary operations. */ + +void +vpair_split_unary (rtx operands[], /* Dest, input. */ + machine_mode vmode, /* Vector mode. */ + enum rtx_code code, /* Operator code. */ + enum vpair_split_unary action) /* Action to take. */ +{ + rtx op0 = operands[0]; + machine_mode mode0 = GET_MODE (op0); + gcc_assert (GET_MODE_SIZE (mode0) == 32); + rtx op0_a = simplify_gen_subreg (vmode, op0, mode0, 0); + rtx op0_b = simplify_gen_subreg (vmode, op0, mode0, 16); + + rtx op1 = operands[1]; + machine_mode mode1 = GET_MODE (op1); + gcc_assert (GET_MODE_SIZE (mode1) == 32); + rtx op1_a = simplify_gen_subreg (vmode, op1, mode1, 0); + rtx op1_b = simplify_gen_subreg (vmode, op1, mode1, 16); + + rtx operation_a = gen_rtx_fmt_e (code, vmode, op1_a); + rtx operation_b = gen_rtx_fmt_e (code, vmode, op1_b); + + if (action == VPAIR_SPLIT_NEGATE) + { + operation_a = gen_rtx_NEG (vmode, operation_a); + operation_b = gen_rtx_NEG (vmode, operation_b); + } + + emit_insn (gen_rtx_SET (op0_a, operation_a)); + emit_insn (gen_rtx_SET (op0_b, operation_b)); + return; +} + +/* Split vector pair binary operations. */ + +void +vpair_split_binary (rtx operands[], /* Dest, 2 inputs. */ + machine_mode vmode, /* Vector mode. */ + enum rtx_code code) /* Operator code. 
*/ +{ + rtx op0 = operands[0]; + machine_mode mode0 = GET_MODE (op0); + gcc_assert (GET_MODE_SIZE (mode0) == 32); + rtx op0_a = simplify_gen_subreg (vmode, op0, mode0, 0); + rtx op0_b = simplify_gen_subreg (vmode, op0, mode0, 16); + + rtx op1 = operands[1]; + machine_mode mode1 = GET_MODE (op1); + gcc_assert (GET_MODE_SIZE (mode1) == 32); + rtx op1_a = simplify_gen_subreg (vmode, op1, mode1, 0); + rtx op1_b = simplify_gen_subreg (vmode, op1, mode1, 16); + + rtx op2 = operands[2]; + machine_mode mode2 = GET_MODE (op2); + gcc_assert (GET_MODE_SIZE (mode2) == 32); + rtx op2_a = simplify_gen_subreg (vmode, op2, mode2, 0); + rtx op2_b = simplify_gen_subreg (vmode, op2, mode2, 16); + + rtx operation_a = gen_rtx_fmt_ee (code, vmode, op1_a, op2_a); + rtx operation_b = gen_rtx_fmt_ee (code, vmode, op1_b, op2_b); + + emit_insn (gen_rtx_SET (op0_a, operation_a)); + emit_insn (gen_rtx_SET (op0_b, operation_b)); + return; +} + +/* Split vector pair fma operations. */ + +void +vpair_split_fma (rtx operands[], /* Dest, 3 inputs. */ + machine_mode vmode, /* Vector mode. */ + enum vpair_split_fma action) /* Action to take. 
*/ +{ + rtx op0 = operands[0]; + machine_mode mode0 = GET_MODE (op0); + gcc_assert (GET_MODE_SIZE (mode0) == 32); + rtx op0_a = simplify_gen_subreg (vmode, op0, mode0, 0); + rtx op0_b = simplify_gen_subreg (vmode, op0, mode0, 16); + + rtx op1 = operands[1]; + machine_mode mode1 = GET_MODE (op1); + gcc_assert (GET_MODE_SIZE (mode1) == 32); + rtx op1_a = simplify_gen_subreg (vmode, op1, mode1, 0); + rtx op1_b = simplify_gen_subreg (vmode, op1, mode1, 16); + + rtx op2 = operands[2]; + machine_mode mode2 = GET_MODE (op2); + gcc_assert (GET_MODE_SIZE (mode2) == 32); + rtx op2_a = simplify_gen_subreg (vmode, op2, mode2, 0); + rtx op2_b = simplify_gen_subreg (vmode, op2, mode2, 16); + + rtx op3 = operands[3]; + machine_mode mode3 = GET_MODE (op3); + gcc_assert (GET_MODE_SIZE (mode3) == 32); + rtx op3_a = simplify_gen_subreg (vmode, op3, mode3, 0); + rtx op3_b = simplify_gen_subreg (vmode, op3, mode3, 16); + + switch (action) + { + case VPAIR_SPLIT_FMA: + case VPAIR_SPLIT_NFMA: + break; + + case VPAIR_SPLIT_FMS: + case VPAIR_SPLIT_NFMS: + op3_a = gen_rtx_NEG (vmode, op3_a); + op3_b = gen_rtx_NEG (vmode, op3_b); + break; + + default: + gcc_unreachable (); + } + + rtx operation_a = gen_rtx_fmt_eee (FMA, vmode, op1_a, op2_a, op3_a); + rtx operation_b = gen_rtx_fmt_eee (FMA, vmode, op1_b, op2_b, op3_b); + + switch (action) + { + case VPAIR_SPLIT_FMA: + case VPAIR_SPLIT_FMS: + break; + + case VPAIR_SPLIT_NFMA: + case VPAIR_SPLIT_NFMS: + operation_a = gen_rtx_NEG (vmode, operation_a); + operation_b = gen_rtx_NEG (vmode, operation_b); + break; + + default: + gcc_unreachable (); + } + emit_insn (gen_rtx_SET (op0_a, operation_a)); + emit_insn (gen_rtx_SET (op0_b, operation_b)); + return; +} + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-rs6000.h" diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 4acb4031ae0..129e1ce74e2 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -15834,6 +15834,7 @@ (include "vsx.md") 
(include "altivec.md") (include "mma.md") +(include "vector-pair.md") (include "dfp.md") (include "crypto.md") (include "htm.md") diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000 index b3ce09d523b..64655ef38b8 100644 --- a/gcc/config/rs6000/t-rs6000 +++ b/gcc/config/rs6000/t-rs6000 @@ -128,6 +128,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \ $(srcdir)/config/rs6000/vsx.md \ $(srcdir)/config/rs6000/altivec.md \ $(srcdir)/config/rs6000/mma.md \ + $(srcdir)/config/rs6000/vector-pair.md \ $(srcdir)/config/rs6000/crypto.md \ $(srcdir)/config/rs6000/htm.md \ $(srcdir)/config/rs6000/dfp.md \ diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md new file mode 100644 index 00000000000..7a81acbdc05 --- /dev/null +++ b/gcc/config/rs6000/vector-pair.md @@ -0,0 +1,580 @@ +;; Vector pair arithmetic support. +;; Copyright (C) 2020-2023 Free Software Foundation, Inc. +;; Contributed by Peter Bergner and +;; Michael Meissner +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify it +;; under the terms of the GNU General Public License as published +;; by the Free Software Foundation; either version 3, or (at your +;; option) any later version. +;; +;; GCC is distributed in the hope that it will be useful, but WITHOUT +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +;; or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public +;; License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; . +;; +;; This file adds support for doing vector operations on pairs of vector +;; registers. Most of the instructions use vector pair instructions to load +;; and possibly store registers, but split the operation after register +;; allocation into 2 separate operations. 
The second scheduler pass can +;; interleave other instructions between these pairs of instructions if +;; possible. + +;; We use UNSPEC to identify the representation for the operation rather than +;; SUBREG, because SUBREG tends to generate extra moves. + +(define_c_enum "unspec" + [UNSPEC_VPAIR_ABS + UNSPEC_VPAIR_DIV + UNSPEC_VPAIR_FMA + UNSPEC_VPAIR_MINUS + UNSPEC_VPAIR_MULT + UNSPEC_VPAIR_NEG + UNSPEC_VPAIR_PLUS + UNSPEC_VPAIR_SMAX + UNSPEC_VPAIR_SMIN + UNSPEC_VPAIR_ZERO + UNSPEC_VPAIR_SPLAT]) + +;; Vector pair element ID that defines the scalar element within the vector pair. +(define_c_enum "vpair_element" + [VPAIR_ELEMENT_FLOAT + VPAIR_ELEMENT_DOUBLE]) + +(define_int_iterator VPAIR_FP_ELEMENT [VPAIR_ELEMENT_FLOAT + VPAIR_ELEMENT_DOUBLE]) + +;; Map vector pair element ID to the vector mode after the vector pair has been +;; split. +(define_int_attr VPAIR_VMODE [(VPAIR_ELEMENT_FLOAT "V4SF") + (VPAIR_ELEMENT_DOUBLE "V2DF")]) + +;; Map vector pair element ID to the name used on the define_insn (in lower +;; case). +(define_int_attr vpair_modename [(VPAIR_ELEMENT_FLOAT "v8sf") + (VPAIR_ELEMENT_DOUBLE "v4df")]) + +;; Unary/binary arithmetic iterator on vector pairs. +(define_int_iterator VPAIR_FP_UNARY [UNSPEC_VPAIR_ABS + UNSPEC_VPAIR_NEG]) + +(define_int_iterator VPAIR_FP_BINARY [UNSPEC_VPAIR_DIV + UNSPEC_VPAIR_MINUS + UNSPEC_VPAIR_MULT + UNSPEC_VPAIR_PLUS + UNSPEC_VPAIR_SMAX + UNSPEC_VPAIR_SMIN]) + +;; Map the vpair operator unspec number to the standard name. +(define_int_attr vpair_stdname [(UNSPEC_VPAIR_ABS "abs") + (UNSPEC_VPAIR_DIV "div") + (UNSPEC_VPAIR_FMA "fma") + (UNSPEC_VPAIR_MINUS "sub") + (UNSPEC_VPAIR_MULT "mul") + (UNSPEC_VPAIR_NEG "neg") + (UNSPEC_VPAIR_PLUS "add") + (UNSPEC_VPAIR_SMAX "smax") + (UNSPEC_VPAIR_SMIN "smin")]) + +;; Map the vpair operator unspec number to the RTL operator. 
+(define_int_attr VPAIR_OP [(UNSPEC_VPAIR_ABS "ABS") + (UNSPEC_VPAIR_DIV "DIV") + (UNSPEC_VPAIR_FMA "FMA") + (UNSPEC_VPAIR_MINUS "MINUS") + (UNSPEC_VPAIR_MULT "MULT") + (UNSPEC_VPAIR_NEG "NEG") + (UNSPEC_VPAIR_PLUS "PLUS") + (UNSPEC_VPAIR_SMAX "SMAX") + (UNSPEC_VPAIR_SMIN "SMIN")]) + +;; Map the scalar element ID into the appropriate insn type. +(define_int_attr vpair_type [(VPAIR_ELEMENT_FLOAT "vecfloat") + (VPAIR_ELEMENT_DOUBLE "vecdouble")]) + +;; Map the scalar element ID into the appropriate insn type for divide. +(define_int_attr vpair_divtype [(VPAIR_ELEMENT_FLOAT "vecfdiv") + (VPAIR_ELEMENT_DOUBLE "vecdiv")]) + +;; Mode iterator for the vector modes that we provide splat operations for. +(define_mode_iterator VPAIR_SPLAT_VMODE [V4SF V2DF]) + +;; Map element mode to 128-bit vector mode for splat operations +(define_mode_attr VPAIR_SPLAT_ELEMENT_TO_VMODE [(SF "V4SF") + (DF "V2DF")]) + +;; Map either element mode or vector mode into the name for the splat insn. +(define_mode_attr vpair_splat_name [(SF "v8sf") + (DF "v4df") + (V4SF "v8sf") + (V2DF "v4df")]) + +;; Initialize a vector pair to 0 +(define_insn_and_split "vpair_zero" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO [(const_int 0)] UNSPEC_VPAIR_ZERO))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(set (match_dup 1) (match_dup 3)) + (set (match_dup 2) (match_dup 3))] +{ + rtx op0 = operands[0]; + + operands[1] = simplify_gen_subreg (V2DFmode, op0, OOmode, 0); + operands[2] = simplify_gen_subreg (V2DFmode, op0, OOmode, 16); + operands[3] = CONST0_RTX (V2DFmode); +} + [(set_attr "length" "8") + (set_attr "type" "vecperm")]) + +;; Create a vector pair with a value splat'ed (duplicated) to all of the +;; elements. 
+(define_expand "vpair_splat_" + [(use (match_operand:OO 0 "vsx_register_operand")) + (use (match_operand:SFDF 1 "input_operand"))] + "TARGET_MMA" +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + machine_mode element_mode = mode; + + if (op1 == CONST0_RTX (element_mode)) + { + emit_insn (gen_vpair_zero (op0)); + DONE; + } + + machine_mode vector_mode = mode; + rtx vec = gen_reg_rtx (vector_mode); + unsigned num_elements = GET_MODE_NUNITS (vector_mode); + rtvec elements = rtvec_alloc (num_elements); + for (size_t i = 0; i < num_elements; i++) + RTVEC_ELT (elements, i) = copy_rtx (op1); + + rs6000_expand_vector_init (vec, gen_rtx_PARALLEL (vector_mode, elements)); + emit_insn (gen_vpair_splat__internal (op0, vec)); + DONE; +}) + +;; Inner splat support. Operand1 is the vector splat created above. Allow +;; operand 1 to overlap with the output registers to eliminate one move +;; instruction. +(define_insn_and_split "vpair_splat__internal" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(match_operand:VPAIR_SPLAT_VMODE 1 "vsx_register_operand" "0,wa")] + UNSPEC_VPAIR_SPLAT))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx op0 = operands[0]; + rtx op0_a = simplify_gen_subreg (mode, op0, OOmode, 0); + rtx op0_b = simplify_gen_subreg (mode, op0, OOmode, 16); + rtx op1 = operands[1]; + unsigned op1_regno = reg_or_subregno (op1); + + /* Check if the input is one of the output registers. */ + if (op1_regno == reg_or_subregno (op0_a)) + emit_move_insn (op0_b, op1); + + else if (op1_regno == reg_or_subregno (op0_b)) + emit_move_insn (op0_a, op1); + + else + { + emit_move_insn (op0_a, op1); + emit_move_insn (op0_b, op1); + } + + DONE; +} + [(set_attr "length" "*,8") + (set_attr "type" "vecmove")]) + +;; Vector pair unary operations. The last argument in the UNSPEC is a +;; CONST_INT which identifies what the scalar element is. 
+(define_insn_and_split "vpair__2" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "wa") + (const_int VPAIR_FP_ELEMENT)] + VPAIR_FP_UNARY))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + vpair_split_unary (operands, mode, , + VPAIR_SPLIT_NORMAL); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +;; Optimize vector pair (neg (abs)). +(define_insn_and_split "vpair_nabs_2" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO + [(unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_ABS) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + vpair_split_unary (operands, mode, ABS, VPAIR_SPLIT_NEGATE); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +;; Vector pair binary operations. The last argument in the UNSPEC is a +;; CONST_INT which identifies what the scalar element is. +(define_insn_and_split "vpair__3" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "wa") + (match_operand:OO 2 "vsx_register_operand" "wa") + (const_int VPAIR_FP_ELEMENT)] + VPAIR_FP_BINARY))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + vpair_split_binary (operands, mode, ); + DONE; +} + [(set_attr "length" "8") + (set (attr "type") (if_then_else (match_test " == DIV") + (const_string "") + (const_string "")))]) + +;; Optimize vector pair add of a negative value into a subtract. 
+(define_insn_and_split "*vpair_add_neg_3" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "wa") + (unspec:OO + [(match_operand:OO 2 "vsx_register_operand" "wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG) + (const_int VPAIR_FP_ELEMENT)] + VPAIR_FP_BINARY))] + "TARGET_MMA" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:OO + [(match_dup 1) + (match_dup 2) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_MINUS))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +;; Vector pair fused-multiply (FMA) operations. The last argument in the +;; UNSPEC is a CONST_INT which identifies what the scalar element is. +(define_insn_and_split "vpair_fma_4" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_FMA))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + vpair_split_fma (operands, mode, VPAIR_SPLIT_FMA); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +;; Vector pair fused multiply-subtract +(define_insn_and_split "vpair_fms_4" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (unspec:OO + [(match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_FMA))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + vpair_split_fma (operands, mode, VPAIR_SPLIT_FMS); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +;; Vector pair negate fused multiply-add +(define_insn_and_split "vpair_nfma_4" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(unspec:OO + 
[(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_FMA) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + vpair_split_fma (operands, mode, VPAIR_SPLIT_NFMA); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +;; Vector pair negate fused multiply-subtract +(define_insn_and_split "vpair_nfms_4" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (unspec:OO + [(match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_FMA) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + vpair_split_fma (operands, mode, VPAIR_SPLIT_NFMS); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +;; Optimize vector pair multiply and vector pair add into vector pair fma, +;; provided the compiler would do this optimization for scalars and vectors. +;; Unlike most of the define_insn_and_splits, this can be done before register +;; allocation. 
+(define_insn_and_split "*vpair_fma__merge" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_MULT) + (match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_PLUS))] + "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:OO + [(match_dup 1) + (match_dup 2) + (match_dup 3) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_FMA))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +;; Merge multiply and subtract. +(define_insn_and_split "*vpair_fma__merge" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_MULT) + (match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_MINUS))] + "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:OO + [(match_dup 1) + (match_dup 2) + (unspec:OO + [(match_dup 3) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_FMA))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +(define_insn_and_split "*vpair_fma__merge2" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_MULT) + (unspec:OO + [(match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_PLUS))] + "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + 
[(set (match_dup 0) + (unspec:OO + [(match_dup 1) + (match_dup 2) + (unspec:OO + [(match_dup 3) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_FMA))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +;; Merge negate, multiply, and add. +(define_insn_and_split "*vpair_nfma__merge" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(unspec:OO + [(unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_MULT) + (match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_PLUS) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG))] + "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:OO + [(unspec:OO + [(match_dup 1) + (match_dup 2) + (match_dup 3) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_FMA) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +;; Merge negate, multiply, and subtract. 
+(define_insn_and_split "*vpair_nfms__merge" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(unspec:OO + [(unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_MULT) + (match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_MINUS) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG))] + "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:OO + [(unspec:OO + [(match_dup 1) + (match_dup 2) + (unspec:OO + [(match_dup 3) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_FMA) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +(define_insn_and_split "*vpair_nfms__merge2" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(unspec:OO + [(unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_MULT) + (unspec:OO + [(match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_PLUS) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG))] + "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:OO + [(unspec:OO + [(match_dup 1) + (match_dup 2) + (unspec:OO + [(match_dup 3) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_FMA) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "")]) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 0bc586d120e..d455d0c5624 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -15827,6 +15827,7 @@ instructions, 
but allow the compiler to schedule those calls. * NDS32 Built-in Functions:: * Nvidia PTX Built-in Functions:: * Basic PowerPC Built-in Functions:: +* PowerPC Vector Pair Built-in Functions:: * PowerPC AltiVec/VSX Built-in Functions:: * PowerPC Hardware Transactional Memory Built-in Functions:: * PowerPC Atomic Memory Operation Functions:: @@ -23857,6 +23858,90 @@ int vec_any_le (vector unsigned __int128, vector unsigned __int128); @end smallexample +@node PowerPC Vector Pair Built-in Functions +@subsection PowerPC Vector Pair Built-in Functions + +GCC provides functions to speed up processing by using the type +@code{__vector_pair} to hold two 128-bit vectors on processors that +support ISA 3.1 (power10). The @code{__vector_pair} type and the +vector pair built-in functions require the MMA instruction set +(@option{-mmma}) to be enabled, which is on by default for +@option{-mcpu=power10}. + +By default, a @code{__vector_pair} value is loaded with a single load +vector pair instruction. Each built-in function is then performed as +two separate vector instructions, one on each of the two 128-bit +vectors held in the vector pair. The @code{__vector_pair} result is +usually stored with a single vector pair store +instruction. + +The @code{nabs} built-in is the @code{neg} of the @code{abs} +built-in. + +The @code{fms} built-in is an @code{fma} with the @code{neg} of its +third operand. + +The @code{nfma} built-in is the @code{neg} of the @code{fma} +built-in. + +The @code{nfms} built-in is the @code{neg} of the @code{fms} +built-in.
+ +The following built-in function is independent of the type of the +underlying vector: + +@smallexample +__vector_pair __builtin_vpair_zero (); +@end smallexample + +The following built-in functions operate on pairs of +@code{vector float} values: + +@smallexample +__vector_pair __builtin_vpair_f32_abs (__vector_pair); +__vector_pair __builtin_vpair_f32_add (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_f32_div (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_f32_fma (__vector_pair, __vector_pair, + __vector_pair); +__vector_pair __builtin_vpair_f32_fms (__vector_pair, __vector_pair, + __vector_pair); +__vector_pair __builtin_vpair_f32_max (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_f32_min (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_f32_mul (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_f32_nabs (__vector_pair); +__vector_pair __builtin_vpair_f32_neg (__vector_pair); +__vector_pair __builtin_vpair_f32_nfma (__vector_pair, __vector_pair, + __vector_pair); +__vector_pair __builtin_vpair_f32_nfms (__vector_pair, __vector_pair, + __vector_pair); +__vector_pair __builtin_vpair_f32_splat (float); +__vector_pair __builtin_vpair_f32_sub (__vector_pair, __vector_pair); +@end smallexample + +The following built-in functions operate on pairs of +@code{vector double} values: + +@smallexample +__vector_pair __builtin_vpair_f64_abs (__vector_pair); +__vector_pair __builtin_vpair_f64_add (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_f64_div (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_f64_fma (__vector_pair, __vector_pair, + __vector_pair); +__vector_pair __builtin_vpair_f64_fms (__vector_pair, __vector_pair, + __vector_pair); +__vector_pair __builtin_vpair_f64_max (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_f64_min (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_f64_mul (__vector_pair, __vector_pair); +__vector_pair
__builtin_vpair_f64_nabs (__vector_pair); +__vector_pair __builtin_vpair_f64_neg (__vector_pair); +__vector_pair __builtin_vpair_f64_nfma (__vector_pair, __vector_pair, + __vector_pair); +__vector_pair __builtin_vpair_f64_nfms (__vector_pair, __vector_pair, + __vector_pair); +__vector_pair __builtin_vpair_f64_splat (double); +__vector_pair __builtin_vpair_f64_sub (__vector_pair, __vector_pair); +@end smallexample + @node PowerPC Hardware Transactional Memory Built-in Functions @subsection PowerPC Hardware Transactional Memory Built-in Functions GCC provides two interfaces for accessing the Hardware Transactional diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c new file mode 100644 index 00000000000..a6dbc457639 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c @@ -0,0 +1,87 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test whether the vector builtin code generates the expected instructions for + vector pairs with 4 double elements. */ + +void +test_add (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvadddp, 1 stxvp. */ + *dest = __builtin_vpair_f64_add (*x, *y); +} + +void +test_sub (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvsubdp, 1 stxvp. */ + *dest = __builtin_vpair_f64_sub (*x, *y); +} + +void +test_multiply (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvmuldp, 1 stxvp. */ + *dest = __builtin_vpair_f64_mul (*x, *y); +} + +void +test_min (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvmindp, 1 stxvp. */ + *dest = __builtin_vpair_f64_min (*x, *y); +} + +void +test_max (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvmaxdp, 1 stxvp. 
*/ + *dest = __builtin_vpair_f64_max (*x, *y); +} + +void +test_negate (__vector_pair *dest, + __vector_pair *x) +{ + /* 1 lxvp, 2 xvnegdp, 1 stxvp. */ + *dest = __builtin_vpair_f64_neg (*x); +} + +void +test_abs (__vector_pair *dest, + __vector_pair *x) +{ + /* 1 lxvp, 2 xvabsdp, 1 stxvp. */ + *dest = __builtin_vpair_f64_abs (*x); +} + +void +test_negative_abs (__vector_pair *dest, + __vector_pair *x) +{ + /* 1 lxvp, 2 xvnabsdp, 1 stxvp. */ + __vector_pair ab = __builtin_vpair_f64_abs (*x); + *dest = __builtin_vpair_f64_neg (ab); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 13 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 8 } } */ +/* { dg-final { scan-assembler-times {\mxvabsdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvadddp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmindp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmuldp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnabsdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnegdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvsubdp\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c new file mode 100644 index 00000000000..d2ee4dd0dd9 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c @@ -0,0 +1,61 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -Ofast -ffp-contract=fast" } */ + +/* Test whether the vector builtin code merges multiply, add/subtract, and + negate into fma operations. */ + +void +test_fma (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmadd{a,m}sp, stxvp.
*/ + __vector_pair mul = __builtin_vpair_f32_mul (*q, *r); + *p = __builtin_vpair_f32_add (mul, *s); +} + +void +test_fms (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmsub{a,m}sp, stxvp. */ + __vector_pair mul = __builtin_vpair_f32_mul (*q, *r); + __vector_pair neg = __builtin_vpair_f32_neg (*s); + *p = __builtin_vpair_f32_add (mul, neg); +} + +void +test_nfma (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvnmadd{a,m}sp, stxvp. */ + __vector_pair mul = __builtin_vpair_f32_mul (*q, *r); + __vector_pair muladd = __builtin_vpair_f32_add (mul, *s); + *p = __builtin_vpair_f32_neg (muladd); +} + +void +test_nfms (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvnmsub{a,m}sp, stxvp. */ + __vector_pair mul = __builtin_vpair_f32_mul (*q, *r); + __vector_pair neg = __builtin_vpair_f32_neg (*s); + __vector_pair muladd = __builtin_vpair_f32_add (mul, neg); + *p = __builtin_vpair_f32_neg (muladd); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvmadd.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmsub.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmadd.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmsub.sp\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c new file mode 100644 index 00000000000..e635b599aed --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c @@ -0,0 +1,65 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffp-contract=off" } */ + +/* Test whether the vector builtin code does not merge multiply, add/subtract, + and negate into fma operations if -ffp-contract is off.
*/ + +void +test_fma (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmuldp, 2 xvadddp, stxvp. */ + __vector_pair mul = __builtin_vpair_f64_mul (*q, *r); + *p = __builtin_vpair_f64_add (mul, *s); +} + +void +test_fms (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmuldp, 2 xvsubdp, stxvp. */ + __vector_pair mul = __builtin_vpair_f64_mul (*q, *r); + __vector_pair neg = __builtin_vpair_f64_neg (*s); + *p = __builtin_vpair_f64_add (mul, neg); +} + +void +test_nfma (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmuldp, 2 xvadddp, 2 xvnegdp, stxvp. */ + __vector_pair mul = __builtin_vpair_f64_mul (*q, *r); + __vector_pair muladd = __builtin_vpair_f64_add (mul, *s); + *p = __builtin_vpair_f64_neg (muladd); +} + +void +test_nfms (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmuldp, 2 xvsubdp, 2 xvnegdp, stxvp. 
*/ + __vector_pair mul = __builtin_vpair_f64_mul (*q, *r); + __vector_pair neg = __builtin_vpair_f64_neg (*s); + __vector_pair muladd = __builtin_vpair_f64_add (mul, neg); + *p = __builtin_vpair_f64_neg (muladd); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvadddp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvmuldp\M} 8 } } */ +/* { dg-final { scan-assembler-times {\mxvnegdp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvsubdp\M} 4 } } */ +/* { dg-final { scan-assembler-not {\mxvmadd.dp\M} } } */ +/* { dg-final { scan-assembler-not {\mxvmsub.dp\M} } } */ +/* { dg-final { scan-assembler-not {\mxvnmadd.dp\M} } } */ +/* { dg-final { scan-assembler-not {\mxvnmsub.dp\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c new file mode 100644 index 00000000000..4997279473e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c @@ -0,0 +1,65 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffp-contract=off" } */ + +/* Test whether the vector builtin code does not merge multiply, add/subtract, + and negate into fma operations if -ffp-contract is off. */ + +void +test_fma (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmulsp, 2 xvaddsp, stxvp. */ + __vector_pair mul = __builtin_vpair_f32_mul (*q, *r); + *p = __builtin_vpair_f32_add (mul, *s); +} + +void +test_fms (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmulsp, 2 xvsubsp, stxvp.
*/ + __vector_pair mul = __builtin_vpair_f32_mul (*q, *r); + __vector_pair neg = __builtin_vpair_f32_neg (*s); + *p = __builtin_vpair_f32_add (mul, neg); +} + +void +test_nfma (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmulsp, 2 xvaddsp, 2 xvnegsp, stxvp. */ + __vector_pair mul = __builtin_vpair_f32_mul (*q, *r); + __vector_pair muladd = __builtin_vpair_f32_add (mul, *s); + *p = __builtin_vpair_f32_neg (muladd); +} + +void +test_nfms (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmulsp, 2 xvsubsp, 2 xvnegsp, stxvp. */ + __vector_pair mul = __builtin_vpair_f32_mul (*q, *r); + __vector_pair neg = __builtin_vpair_f32_neg (*s); + __vector_pair muladd = __builtin_vpair_f32_add (mul, neg); + *p = __builtin_vpair_f32_neg (muladd); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvaddsp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvmulsp\M} 8 } } */ +/* { dg-final { scan-assembler-times {\mxvnegsp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvsubsp\M} 4 } } */ +/* { dg-final { scan-assembler-not {\mxvmadd.sp\M} } } */ +/* { dg-final { scan-assembler-not {\mxvmsub.sp\M} } } */ +/* { dg-final { scan-assembler-not {\mxvnmadd.sp\M} } } */ +/* { dg-final { scan-assembler-not {\mxvnmsub.sp\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c new file mode 100644 index 00000000000..2f663c5780c --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c @@ -0,0 +1,86 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test whether the vector builtin code generates the expected instructions for + vector pairs with 8 float elements. 
*/ + +void +test_add (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvaddsp, 1 stxvp. */ + *dest = __builtin_vpair_f32_add (*x, *y); +} + +void +test_sub (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvsubsp, 1 stxvp. */ + *dest = __builtin_vpair_f32_sub (*x, *y); +} + +void +test_multiply (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvmulsp, 1 stxvp. */ + *dest = __builtin_vpair_f32_mul (*x, *y); +} + +void +test_max (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvmaxsp, 1 stxvp. */ + *dest = __builtin_vpair_f32_max (*x, *y); +} + +void +test_min (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvminsp, 1 stxvp. */ + *dest = __builtin_vpair_f32_min (*x, *y); +} + +void +test_negate (__vector_pair *dest, + __vector_pair *x) +{ + /* 1 lxvp, 2 xvnegsp, 1 stxvp. */ + *dest = __builtin_vpair_f32_neg (*x); +} + +void +test_abs (__vector_pair *dest, + __vector_pair *x) +{ + /* 1 lxvp, 2 xvabssp, 1 stxvp. */ + *dest = __builtin_vpair_f32_abs (*x); +} + +void +test_negative_abs (__vector_pair *dest, + __vector_pair *x) +{ + /* 1 lxvp, 2 xvnabssp, 1 stxvp.
*/ + __vector_pair ab = __builtin_vpair_f32_abs (*x); + *dest = __builtin_vpair_f32_neg (ab); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 13 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 8 } } */ +/* { dg-final { scan-assembler-times {\mxvabssp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvaddsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvminsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmulsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnabssp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnegsp\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c new file mode 100644 index 00000000000..43b91461759 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c @@ -0,0 +1,57 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test whether the vector builtin code generates the expected FMA instructions + for vector pairs with 4 double elements. */ + +void +test_fma (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvmadd{a,m}dp, 1 stxvp. */ + *dest = __builtin_vpair_f64_fma (*x, *y, *z); +} + +void +test_fms (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvmsub{a,m}dp, 1 stxvp. */ + __vector_pair n = __builtin_vpair_f64_neg (*z); + *dest = __builtin_vpair_f64_fma (*x, *y, n); +} + +void +test_nfma (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvnmadd{a,m}dp, 1 stxvp. */ + __vector_pair w = __builtin_vpair_f64_fma (*x, *y, *z); + *dest = __builtin_vpair_f64_neg (w); +} + +void +test_nfms (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvnmsub{a,m}dp, 1 stxvp.
*/ + __vector_pair n = __builtin_vpair_f64_neg (*z); + __vector_pair w = __builtin_vpair_f64_fma (*x, *y, n); + *dest = __builtin_vpair_f64_neg (w); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvmadd.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmadd.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmsub.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmsub.dp\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c new file mode 100644 index 00000000000..d5c55d3883c --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c @@ -0,0 +1,57 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test whether the vector builtin code generates the expected FMA instructions + for vector pairs with 8 float elements. */ + +void +test_fma (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvmadd{a,m}sp, 1 stxvp. */ + *dest = __builtin_vpair_f32_fma (*x, *y, *z); +} + +void +test_fms (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvmsub{a,m}sp, 1 stxvp. */ + __vector_pair n = __builtin_vpair_f32_neg (*z); + *dest = __builtin_vpair_f32_fma (*x, *y, n); +} + +void +test_nfma (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvnmadd{a,m}sp, 1 stxvp. */ + __vector_pair w = __builtin_vpair_f32_fma (*x, *y, *z); + *dest = __builtin_vpair_f32_neg (w); +} + +void +test_nfms (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvnmsub{a,m}sp, 1 stxvp.
*/ + __vector_pair n = __builtin_vpair_f32_neg (*z); + __vector_pair w = __builtin_vpair_f32_fma (*x, *y, n); + *dest = __builtin_vpair_f32_neg (w); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvmadd.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmadd.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmsub.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmsub.sp\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c new file mode 100644 index 00000000000..9b645e626e1 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c @@ -0,0 +1,56 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test whether the vector builtin code generates the expected instructions for + the vector pair zero and splat functions for vector pairs containing + doubles. */ + +void +test_zero (__vector_pair *p) +{ + /* 2 xxspltib/xxlxor. */ + *p = __builtin_vpair_zero (); +} + +void +test_splat_zero (__vector_pair *p) +{ + /* 2 xxspltib/xxlxor. */ + *p = __builtin_vpair_f64_splat (0.0); +} + +void +test_splat_one (__vector_pair *p) +{ + /* xxspltidp, xxlor. */ + *p = __builtin_vpair_f64_splat (1.0); +} + +void +test_splat_pi (__vector_pair *p) +{ + /* plxv, xxlor (note, we cannot use xxspltidp). */ + *p = __builtin_vpair_f64_splat (3.1415926535); +} + +void +test_splat_arg (__vector_pair *p, double x) +{ + /* xxpermdi, xxlor. */ + *p = __builtin_vpair_f64_splat (x); +} + +void +test_splat_mem (__vector_pair *p, double *q) +{ + /* lxvdsx, xxlor.
*/ + *p = __builtin_vpair_f64_splat (*q); +} + +/* { dg-final { scan-assembler-times {\mlxvdsx\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mp?lxvx?\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 6 } } */ +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxxspltib\M|\mxxlxor\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c new file mode 100644 index 00000000000..5ec53d4bfc3 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c @@ -0,0 +1,56 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test whether the vector builtin code generates the expected instructions for + the vector pair zero and splat functions for vector pairs containing + floats. */ + +void +test_zero (__vector_pair *p) +{ + /* 2 xxspltib/xxlxor. */ + *p = __builtin_vpair_zero (); +} + +void +test_splat_zero (__vector_pair *p) +{ + /* 2 xxspltib/xxlxor. */ + *p = __builtin_vpair_f32_splat (0.0f); +} + +void +test_splat_one (__vector_pair *p) +{ + /* xxspltiw, xxlor. */ + *p = __builtin_vpair_f32_splat (1.0f); +} + +void +test_splat_pi (__vector_pair *p) +{ + /* xxspltiw, xxlor. */ + *p = __builtin_vpair_f32_splat (3.1415926535f); +} + +void +test_splat_arg (__vector_pair *p, float x) +{ + /* xscvdpspn, xxspltw, xxlor. */ + *p = __builtin_vpair_f32_splat (x); +} + +void +test_splat_mem (__vector_pair *p, float *q) +{ + /* lxvwsx, xxlor.
*/ + *p = __builtin_vpair_f32_splat (*q); +} + +/* { dg-final { scan-assembler-times {\mlxvwsx\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 6 } } */ +/* { dg-final { scan-assembler-times {\mxscvdpspn\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxxspltib\M|\mxxlxor\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxxspltw\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c new file mode 100644 index 00000000000..51a400cb4b3 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test whether the vector builtin code merges plus and neg into a minus + operation. */ + +void +test_minus (__vector_pair *p, __vector_pair *q, __vector_pair *r) +{ + *p = __builtin_vpair_f64_add (*q, __builtin_vpair_f64_neg (*r)); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvsubdp\M} 2 } } */ +/* { dg-final { scan-assembler-not {\mxvadddp\M} } } */ +/* { dg-final { scan-assembler-not {\mxvnegdp\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c new file mode 100644 index 00000000000..67957e3bdea --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test whether the vector builtin code merges plus and neg into a minus + operation. 
*/ + +void +test_minus (__vector_pair *p, __vector_pair *q, __vector_pair *r) +{ + *p = __builtin_vpair_f32_add (*q, __builtin_vpair_f32_neg (*r)); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvsubsp\M} 2 } } */ +/* { dg-final { scan-assembler-not {\mxvaddsp\M} } } */ +/* { dg-final { scan-assembler-not {\mxvnegsp\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c new file mode 100644 index 00000000000..eacf8dae9d8 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c @@ -0,0 +1,61 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -Ofast -ffp-contract=fast" } */ + +/* Test whether the vector builtin code merges multiply, add/subtract, and + negate into fma operations. */ + +void +test_fma (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmadd{a,m}dp, stxvp. */ + __vector_pair mul = __builtin_vpair_f64_mul (*q, *r); + *p = __builtin_vpair_f64_add (mul, *s); +} + +void +test_fms (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvmsub{a,m}dp, stxvp. */ + __vector_pair mul = __builtin_vpair_f64_mul (*q, *r); + __vector_pair neg = __builtin_vpair_f64_neg (*s); + *p = __builtin_vpair_f64_add (mul, neg); +} + +void +test_nfma (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvnmadd{a,m}dp, stxvp. */ + __vector_pair mul = __builtin_vpair_f64_mul (*q, *r); + __vector_pair muladd = __builtin_vpair_f64_add (mul, *s); + *p = __builtin_vpair_f64_neg (muladd); +} + +void +test_nfms (__vector_pair *p, + __vector_pair *q, + __vector_pair *r, + __vector_pair *s) +{ + /* lxvp, 2 xvnmsub{a,m}dp, stxvp. 
*/ + __vector_pair mul = __builtin_vpair_f64_mul (*q, *r); + __vector_pair neg = __builtin_vpair_f64_neg (*s); + __vector_pair muladd = __builtin_vpair_f64_add (mul, neg); + *p = __builtin_vpair_f64_neg (muladd); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvmadd.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmsub.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmadd.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmsub.dp\M} 2 } } */
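The extend.texi hunk in this patch defines fms, nfma, and nfms in terms of fma and neg, and the FMA tests build them the same way from __builtin_vpair_f64_fma and __builtin_vpair_f64_neg. As a rough illustration only, the per-element arithmetic can be modeled in portable C. This is a scalar sketch, not the GCC implementation: the model_vpair_f64 type and every model_* function are hypothetical names invented here, and the model computes a * b + c with two roundings where a real fused multiply-add rounds once, so it only agrees with hardware when the product is exact.

```c
#include <assert.h>

/* Scalar reference model only; this is NOT the GCC implementation.
   All model_* names are hypothetical and exist only for illustration.  */

/* Two 128-bit vectors, each holding two 64-bit doubles.  */
#define MODEL_F64_ELEMENTS 4

typedef struct
{
  double e[MODEL_F64_ELEMENTS];
} model_vpair_f64;

/* Models __builtin_vpair_f64_splat: copy X into every element.  */
model_vpair_f64
model_splat (double x)
{
  model_vpair_f64 r;
  for (int i = 0; i < MODEL_F64_ELEMENTS; i++)
    r.e[i] = x;
  return r;
}

/* Models __builtin_vpair_f64_neg: negate every element.  */
model_vpair_f64
model_neg (model_vpair_f64 a)
{
  model_vpair_f64 r;
  for (int i = 0; i < MODEL_F64_ELEMENTS; i++)
    r.e[i] = -a.e[i];
  return r;
}

/* Models __builtin_vpair_f64_fma as (a * b) + c per element.  Assumption:
   two roundings instead of the single rounding of a fused operation.  */
model_vpair_f64
model_fma (model_vpair_f64 a, model_vpair_f64 b, model_vpair_f64 c)
{
  model_vpair_f64 r;
  for (int i = 0; i < MODEL_F64_ELEMENTS; i++)
    r.e[i] = a.e[i] * b.e[i] + c.e[i];
  return r;
}

/* fms: fma with the third operand negated, i.e. (a * b) - c.  */
model_vpair_f64
model_fms (model_vpair_f64 a, model_vpair_f64 b, model_vpair_f64 c)
{
  return model_fma (a, b, model_neg (c));
}

/* nfma: negation of fma, i.e. -((a * b) + c).  */
model_vpair_f64
model_nfma (model_vpair_f64 a, model_vpair_f64 b, model_vpair_f64 c)
{
  return model_neg (model_fma (a, b, c));
}

/* nfms: negation of fms, i.e. -((a * b) - c).  */
model_vpair_f64
model_nfms (model_vpair_f64 a, model_vpair_f64 b, model_vpair_f64 c)
{
  return model_neg (model_fms (a, b, c));
}

/* Return 1 if every element of A equals X, else 0.  */
int
model_all_equal (model_vpair_f64 a, double x)
{
  for (int i = 0; i < MODEL_F64_ELEMENTS; i++)
    if (a.e[i] != x)
      return 0;
  return 1;
}

/* Self-check with exactly representable values: 2 * 3 = 6, c = 1.  */
void
model_selfcheck (void)
{
  model_vpair_f64 a = model_splat (2.0);
  model_vpair_f64 b = model_splat (3.0);
  model_vpair_f64 c = model_splat (1.0);

  assert (model_all_equal (model_fma (a, b, c), 7.0));
  assert (model_all_equal (model_fms (a, b, c), 5.0));
  assert (model_all_equal (model_nfma (a, b, c), -7.0));
  assert (model_all_equal (model_nfms (a, b, c), -5.0));
}
```

Calling model_selfcheck () from a small driver exercises all four identities. The struct-by-value interface is a design choice of this sketch, intended to mirror how the built-ins take and return __vector_pair values rather than pointers.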