From: hongtao Liu
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r12-2718] Support cond_{fma, fms, fnma, fnms} for vector float/double under AVX512.
Date: Wed, 4 Aug 2021 04:58:20 +0000 (GMT)
Message-Id: <20210804045820.58B9F3858437@sourceware.org>
X-Act-Checkin: gcc
X-Git-Author: liuhongt
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 22e40cc7feb8abda85762e4f07719836d5c57f1a
X-Git-Newrev: 2fc2e3917f9c8fd94f5d101477971d16c483ef88

https://gcc.gnu.org/g:2fc2e3917f9c8fd94f5d101477971d16c483ef88

commit r12-2718-g2fc2e3917f9c8fd94f5d101477971d16c483ef88
Author: liuhongt
Date:   Wed Aug 4 11:41:37 2021 +0800

    Support cond_{fma,fms,fnma,fnms} for vector float/double under AVX512.

    gcc/ChangeLog:

            * config/i386/sse.md (cond_fma<mode>): New expander.
            (cond_fms<mode>): Ditto.
            (cond_fnma<mode>): Ditto.
            (cond_fnms<mode>): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/cond_op_fma_double-1.c: New test.
            * gcc.target/i386/cond_op_fma_double-2.c: New test.
            * gcc.target/i386/cond_op_fma_float-1.c: New test.
            * gcc.target/i386/cond_op_fma_float-2.c: New test.

Diff:
---
 gcc/config/i386/sse.md                             |  96 ++++++++++
 .../gcc.target/i386/cond_op_fma_double-1.c         |  87 +++++++++
 .../gcc.target/i386/cond_op_fma_double-2.c         | 206 +++++++++++++++++++++
 .../gcc.target/i386/cond_op_fma_float-1.c          |  20 ++
 .../gcc.target/i386/cond_op_fma_float-2.c          |   4 +
 5 files changed, 413 insertions(+)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 52b2b4214d7..f5968e04669 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4438,6 +4438,29 @@
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_expand "cond_fma<mode>"
+  [(set (match_operand:VF_AVX512VL 0 "register_operand")
+        (vec_merge:VF_AVX512VL
+          (fma:VF_AVX512VL
+            (match_operand:VF_AVX512VL 2 "vector_operand")
+            (match_operand:VF_AVX512VL 3 "vector_operand")
+            (match_operand:VF_AVX512VL 4 "vector_operand"))
+          (match_operand:VF_AVX512VL 5 "nonimm_or_0_operand")
+          (match_operand:<avx512fmaskmode> 1 "register_operand")))]
+  "TARGET_AVX512F"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_fma<mode>4 (tmp,
+                             operands[2],
+                             operands[3],
+                             operands[4]));
+  emit_move_insn (operands[0], gen_rtx_VEC_MERGE (<MODE>mode,
+                                                  tmp,
+                                                  operands[5],
+                                                  operands[1]));
+  DONE;
+})
+
 (define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
         (vec_merge:VF_AVX512VL
@@ -4515,6 +4538,30 @@
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_expand "cond_fms<mode>"
+  [(set (match_operand:VF_AVX512VL 0 "register_operand")
+        (vec_merge:VF_AVX512VL
+          (fma:VF_AVX512VL
+            (match_operand:VF_AVX512VL 2 "vector_operand")
+            (match_operand:VF_AVX512VL 3 "vector_operand")
+            (neg:VF_AVX512VL
+              (match_operand:VF_AVX512VL 4 "vector_operand")))
+          (match_operand:VF_AVX512VL 5 "nonimm_or_0_operand")
+          (match_operand:<avx512fmaskmode> 1 "register_operand")))]
+  "TARGET_AVX512F"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_fms<mode>4 (tmp,
+                             operands[2],
+                             operands[3],
+                             operands[4]));
+  emit_move_insn (operands[0], gen_rtx_VEC_MERGE (<MODE>mode,
+                                                  tmp,
+                                                  operands[5],
+                                                  operands[1]));
+  DONE;
+})
+
 (define_insn "<avx512>_fmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
         (vec_merge:VF_AVX512VL
@@ -4594,6 +4641,30 @@
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_expand "cond_fnma<mode>"
+  [(set (match_operand:VF_AVX512VL 0 "register_operand")
+        (vec_merge:VF_AVX512VL
+          (fma:VF_AVX512VL
+            (neg:VF_AVX512VL
+              (match_operand:VF_AVX512VL 2 "vector_operand"))
+            (match_operand:VF_AVX512VL 3 "vector_operand")
+            (match_operand:VF_AVX512VL 4 "vector_operand"))
+          (match_operand:VF_AVX512VL 5 "nonimm_or_0_operand")
+          (match_operand:<avx512fmaskmode> 1 "register_operand")))]
+  "TARGET_AVX512F"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_fnma<mode>4 (tmp,
+                              operands[2],
+                              operands[3],
+                              operands[4]));
+  emit_move_insn (operands[0], gen_rtx_VEC_MERGE (<MODE>mode,
+                                                  tmp,
+                                                  operands[5],
+                                                  operands[1]));
+  DONE;
+})
+
 (define_insn "<avx512>_fnmadd_<mode>_mask<round_name>"
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
         (vec_merge:VF_AVX512VL
@@ -4675,6 +4746,31 @@
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+(define_expand "cond_fnms<mode>"
+  [(set (match_operand:VF_AVX512VL 0 "register_operand")
+        (vec_merge:VF_AVX512VL
+          (fma:VF_AVX512VL
+            (neg:VF_AVX512VL
+              (match_operand:VF_AVX512VL 2 "vector_operand"))
+            (match_operand:VF_AVX512VL 3 "vector_operand")
+            (neg:VF_AVX512VL
+              (match_operand:VF_AVX512VL 4 "vector_operand")))
+          (match_operand:VF_AVX512VL 5 "nonimm_or_0_operand")
+          (match_operand:<avx512fmaskmode> 1 "register_operand")))]
+  "TARGET_AVX512F"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_fnms<mode>4 (tmp,
+                              operands[2],
+                              operands[3],
+                              operands[4]));
+  emit_move_insn (operands[0], gen_rtx_VEC_MERGE (<MODE>mode,
+                                                  tmp,
+                                                  operands[5],
+                                                  operands[1]));
+  DONE;
+})
+
 (define_insn "<avx512>_fnmsub_<mode>_mask<round_name>"
   [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v")
         (vec_merge:VF_AVX512VL
diff --git a/gcc/testsuite/gcc.target/i386/cond_op_fma_double-1.c b/gcc/testsuite/gcc.target/i386/cond_op_fma_double-1.c
new file mode 100644
index 00000000000..4e14b75743c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cond_op_fma_double-1.c
@@ -0,0 +1,87 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake-avx512 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times ".COND_FMA" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times ".COND_FNMA" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times ".COND_FMS" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times ".COND_FNMS" 3 "optimized" } } */
+/* { dg-final { scan-assembler-times "vfmadd132pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd231pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd231pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub231pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd132pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132pd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#ifndef NUM
+#define NUM 800
+#endif
+#ifndef TYPE
+#define TYPE double
+#endif
+#ifndef __BUILTIN_FMA
+#define __BUILTIN_FMA __builtin_fma
+#endif
+
+TYPE a[NUM], b[NUM], c[NUM], d[NUM], e[NUM], j[NUM];
+#define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
+#define MAX(X,Y) ((X) < (Y) ? (Y) : (X))
+
+#define FMA3(OPNAME, OP1, OP2)                               \
+  void                                                        \
+  __attribute__ ((noipa,optimize ("O3")))                     \
+  foo3_##OPNAME ()                                            \
+  {                                                           \
+    for (int i = 0; i != NUM; i++)                            \
+      {                                                       \
+        TYPE tmp = MAX(d[i], e[i]);                           \
+        if (b[i] < c[i])                                      \
+          a[i] = __BUILTIN_FMA (OP1 d[i], e[i], OP2 tmp);     \
+        else                                                  \
+          a[i] = tmp;                                         \
+      }                                                       \
+  }
+
+#define FMAZ(OPNAME, OP1, OP2)                               \
+  void                                                        \
+  __attribute__ ((noipa,optimize ("O3")))                     \
+  fooz_##OPNAME ()                                            \
+  {                                                           \
+    for (int i = 0; i != NUM; i++)                            \
+      if (b[i] < c[i])                                        \
+        a[i] = __BUILTIN_FMA (OP1 d[i], e[i], OP2 a[i]);      \
+      else                                                    \
+        a[i] = .0;                                            \
+  }
+
+#define FMA1(OPNAME, OP1, OP2)                               \
+  void                                                        \
+  __attribute__ ((noipa,optimize ("O3")))                     \
+  foo1_##OPNAME ()                                            \
+  {                                                           \
+    for (int i = 0; i != NUM; i++)                            \
+      if (b[i] < c[i])                                        \
+        a[i] = __BUILTIN_FMA (OP1 d[i], e[i], OP2 a[i]);      \
+      else                                                    \
+        a[i] = d[i];                                          \
+  }
+
+
+FMAZ (fma,, +);
+FMAZ (fms,, -);
+FMAZ (fnma, -, +);
+FMAZ (fnms, -, -);
+
+FMA1 (fma,, +);
+FMA1 (fms,, -);
+FMA1 (fnma, -, +);
+FMA1 (fnms, -, -);
+
+FMA3 (fma,, +);
+FMA3 (fms,, -);
+FMA3 (fnma, -, +);
+FMA3 (fnms, -, -);
diff --git a/gcc/testsuite/gcc.target/i386/cond_op_fma_double-2.c b/gcc/testsuite/gcc.target/i386/cond_op_fma_double-2.c
new file mode 100644
index 00000000000..d8180de7491
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cond_op_fma_double-2.c
@@ -0,0 +1,206 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl -mprefer-vector-width=256" } */
+#define AVX512VL
+#ifndef CHECK
+#define CHECK "avx512f-helper.h"
+#endif
+
+#include CHECK
+
+#include "cond_op_fma_double-1.c"
+#define FMA3_O2(OPNAME, OP1, OP2)                            \
+  void                                                        \
+  __attribute__ ((noipa,optimize ("O2")))                     \
+  foo3_o2_##OPNAME ()                                         \
+  {                                                           \
+    for (int i = 0; i != NUM; i++)                            \
+      {                                                       \
+        TYPE tmp = MAX(d[i], e[i]);                           \
+        if (b[i] < c[i])                                      \
+          j[i] = __BUILTIN_FMA (OP1 d[i], e[i], OP2 tmp);     \
+        else                                                  \
+          j[i] = tmp;                                         \
+      }                                                       \
+  }
+
+#define FMAZ_O2(OPNAME, OP1, OP2)                            \
+  void                                                        \
+  __attribute__ ((noipa,optimize ("O2")))                     \
+  fooz_o2_##OPNAME ()                                         \
+  {                                                           \
+    for (int i = 0; i != NUM; i++)                            \
+      if (b[i] < c[i])                                        \
+        j[i] = __BUILTIN_FMA (OP1 d[i], e[i], OP2 a[i]);      \
+      else                                                    \
+        j[i] = .0;                                            \
+  }
+
+#define FMA1_O2(OPNAME, OP1, OP2)                            \
+  void                                                        \
+  __attribute__ ((noipa,optimize ("O2")))                     \
+  foo1_o2_##OPNAME ()                                         \
+  {                                                           \
+    for (int i = 0; i != NUM; i++)                            \
+      if (b[i] < c[i])                                        \
+        j[i] = __BUILTIN_FMA (OP1 d[i], e[i], OP2 a[i]);      \
+      else                                                    \
+        j[i] = d[i];                                          \
+  }
+
+FMAZ_O2 (fma,, +);
+FMAZ_O2 (fms,, -);
+FMAZ_O2 (fnma, -, +);
+FMAZ_O2 (fnms, -, -);
+
+FMA1_O2 (fma,, +);
+FMA1_O2 (fms,, -);
+FMA1_O2 (fnma, -, +);
+FMA1_O2 (fnms, -, -);
+
+FMA3_O2 (fma,, +);
+FMA3_O2 (fms,, -);
+FMA3_O2 (fnma, -, +);
+FMA3_O2 (fnms, -, -);
+
+static void
+test_256 (void)
+{
+  int sign = -1;
+  for (int i = 0; i != NUM; i++)
+    {
+      a[i] = 0;
+      d[i] = i * 2;
+      e[i] = i * i * 3 - i * 9 + 153;
+      b[i] = i * 83;
+      c[i] = b[i] + sign;
+      sign *= -1;
+      j[i] = 1;
+    }
+  foo1_o2_fma ();
+  /* foo1_fma need to be after foo1_o2_fma since
+     it changes a[i] which is used by foo1_o2_fma.  */
+  foo1_fma ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      b[i] = 1;
+    }
+
+  foo1_o2_fms ();
+  foo1_fms ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      j[i] = 1;
+    }
+
+  foo1_o2_fnma ();
+  foo1_fnma ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      j[i] = 1;
+    }
+
+  foo1_o2_fnms ();
+  foo1_fnms ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      j[i] = 1;
+    }
+
+  fooz_o2_fma ();
+  fooz_fma ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      b[i] = 1;
+    }
+
+  fooz_o2_fms ();
+  fooz_fms ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      j[i] = 1;
+    }
+
+  fooz_o2_fnma ();
+  fooz_fnma ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      j[i] = 1;
+    }
+
+  fooz_o2_fnms ();
+  fooz_fnms ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      j[i] = 1;
+    }
+
+  foo3_o2_fma ();
+  foo3_fma ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      b[i] = 1;
+    }
+
+  foo3_o2_fms ();
+  foo3_fms ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      j[i] = 1;
+    }
+
+  foo3_o2_fnma ();
+  foo3_fnma ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      j[i] = 1;
+    }
+
+  foo3_o2_fnms ();
+  foo3_fnms ();
+  for (int i = 0; i != NUM; i++)
+    {
+      if (a[i] != j[i])
+        abort ();
+      a[i] = 0;
+      j[i] = 1;
+    }
+}
+
+static void
+test_128 ()
+{
+
+}
diff --git a/gcc/testsuite/gcc.target/i386/cond_op_fma_float-1.c b/gcc/testsuite/gcc.target/i386/cond_op_fma_float-1.c
new file mode 100644
index 00000000000..a5752e71b15
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cond_op_fma_float-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake-avx512 -DTYPE=float -fdump-tree-optimized -D__BUILTIN_FMA=__builtin_fmaf" } */
+/* { dg-final { scan-tree-dump-times ".COND_FMA" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times ".COND_FNMA" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times ".COND_FMS" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times ".COND_FNMS" 3 "optimized" } } */
+/* { dg-final { scan-assembler-times "vfmadd132ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd231ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd231ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub231ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub231ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmadd132ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsub132ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include "cond_op_fma_double-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/cond_op_fma_float-2.c b/gcc/testsuite/gcc.target/i386/cond_op_fma_float-2.c
new file mode 100644
index 00000000000..0097735dddb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cond_op_fma_float-2.c
@@ -0,0 +1,4 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl -mprefer-vector-width=256 -DTYPE=float -D__BUILTIN_FMA=__builtin_fmaf" } */
+
+#include "cond_op_fma_double-2.c"