From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 126786 invoked by alias); 20 Oct 2018 06:47:45 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 115492 invoked by uid 89); 20 Oct 2018 06:46:33 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=avx512fintrin.h, fmaintrin.h, UD:fmaintrin.h, UD:avx512fintrin.h X-HELO: mail-oi1-f194.google.com Received: from mail-oi1-f194.google.com (HELO mail-oi1-f194.google.com) (209.85.167.194) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 20 Oct 2018 06:46:29 +0000 Received: by mail-oi1-f194.google.com with SMTP id u74-v6so28486915oia.11 for ; Fri, 19 Oct 2018 23:46:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id; bh=pxvEAafDKBTjutlZuvIUFl7VwvpEIF+U4htcgJBxS9A=; b=iUl0jIVC6vhP5ac+P81mwsGba7ekod/KO3Fa4U21IFJJSSuAKFfDE2WbgSWsPNv9Sg 5khuUb18BiTP98Iy6UYUpR8dcB5fufbXpT7F93rKNF8W3rJZOR9VnRNhoYd2aKpX1nDd 5GjhJhkPJ1gRcDokbVGJylqJ5uarFCECJPLJdPz7zMDVJ/ggGTmKrFnTXifddh3zOA8r 02cdhIByDCxqYlTSIwG8MxEOJOn8wk75LAqCooZ90D3lbiM8QzZ4pbEyC/+OtxuVZa8V LhaSCujZHSZW/tjpVWOgEDANGmd5hAJE4/AxSMokpxg/gy5Os265NjEvZsznrYlgqhT5 zvMQ== Return-Path: Received: from gnu-efi-2.localdomain ([2607:fb90:200:a0a:3475:7d15:bbf9:a02a]) by smtp.gmail.com with ESMTPSA id s5sm8833655otc.40.2018.10.19.23.46.18 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 19 Oct 2018 23:46:25 -0700 (PDT) From: "H.J. Lu" To: gcc-patches@gcc.gnu.org Cc: Uros Bizjak Subject: [PATCH 1/4] i386: Enable AVX512 memory broadcast for FMSUB Date: Sat, 20 Oct 2018 08:35:00 -0000 Message-Id: <20181020064608.9674-1-hjl.tools@gmail.com> X-IsSubscribed: yes X-SW-Source: 2018-10/txt/msg01235.txt.bz2 Many AVX512 vector operations can broadcast from a scalar memory source. This patch enables memory broadcast for FMSUB operations. In order to support AVX512 memory broadcast for FMSUB, FMSUB builtin functions are also added, instead of passing the negated value to FMA builtin functions. gcc/ PR target/72782 * config/i386/avx512fintrin.h (_mm512_fmsub_round_pd): Use __builtin_ia32_vfmsubpd512_mask. (_mm512_mask_fmsub_round_pd): Likewise. (_mm512_fmsub_pd): Likewise. (_mm512_mask_fmsub_pd): Likewise. (_mm512_maskz_fmsub_round_pd): Use __builtin_ia32_vfmsubpd512_maskz. (_mm512_maskz_fmsub_pd): Likewise. (_mm512_fmsub_round_ps): Use __builtin_ia32_vfmsubps512_mask. (_mm512_mask_fmsub_round_ps): Likewise. (_mm512_fmsub_ps): Likewise. (_mm512_mask_fmsub_ps): Likewise. (_mm512_maskz_fmsub_round_ps): Use __builtin_ia32_vfmsubps512_maskz. (_mm512_maskz_fmsub_ps): Likewise. * config/i386/avx512vlintrin.h (_mm256_mask_fmsub_pd): Use __builtin_ia32_vfmsubpd256_mask. (_mm256_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd256_maskz. (_mm_mask_fmsub_pd): Use __builtin_ia32_vfmaddpd128_mask (_mm_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd128_maskz. (_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask. (_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask. (_mm256_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps256_maskz. (_mm_mask_fmsub_ps): Use __builtin_ia32_vfmsubps128_mask. (_mm_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps128_maskz. * config/i386/fmaintrin.h (_mm_fmsub_pd): Use __builtin_ia32_vfmsubpd. (_mm256_fmsub_pd): Use __builtin_ia32_vfmsubpd256. (_mm_fmsub_ps): Use __builtin_ia32_vfmsubps. (_mm256_fmsub_ps): Use __builtin_ia32_vfmsubps256. (_mm_fmsub_sd): Use __builtin_ia32_vfmsubsd3. (_mm_fmsub_ss): Use __builtin_ia32_vfmsubss3. * config/i386/i386-builtin.def: Add __builtin_ia32_vfmsubpd256_mask, __builtin_ia32_vfmsubpd256_maskz, __builtin_ia32_vfmsubpd128_mask, __builtin_ia32_vfmsubpd128_maskz, __builtin_ia32_vfmsubps256_mask, __builtin_ia32_vfmsubps256_maskz, __builtin_ia32_vfmsubps128_mask, __builtin_ia32_vfmsubps128_maskz, __builtin_ia32_vfmsubpd512_mask, __builtin_ia32_vfmsubpd512_maskz, __builtin_ia32_vfmsubps512_mask, __builtin_ia32_vfmsubps512_maskz, __builtin_ia32_vfmsubss3, __builtin_ia32_vfmsubsd3, __builtin_ia32_vfmsubps, __builtin_ia32_vfmsubpd, __builtin_ia32_vfmsubps256 and. __builtin_ia32_vfmsubpd256. * config/i386/sse.md (fma4i_fmsub_): New. (_fmsub__maskz): Likewise. (*fma_fmsub__bcst_1): Likewise. (*fma_fmsub__bcst_2): Likewise. (*fma_fmsub__bcst_3): Likewise. (fmai_vmfmsub_): Likewise. gcc/testsuite/ PR target/72782 * gcc.target/i386/avx512f-fmsub-df-zmm-1.c: New test. * gcc.target/i386/avx512f-fmsub-sf-zmm-1.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-2.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-3.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-4.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-5.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-6.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-8.c: Likewise. * gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c: Likewise. --- gcc/config/i386/avx512fintrin.h | 60 +++++++-------- gcc/config/i386/avx512vlintrin.h | 32 ++++---- gcc/config/i386/fmaintrin.h | 24 +++--- gcc/config/i386/i386-builtin.def | 18 +++++ gcc/config/i386/sse.md | 77 +++++++++++++++++++ .../gcc.target/i386/avx512f-fmsub-df-zmm-1.c | 12 +++ .../gcc.target/i386/avx512f-fmsub-sf-zmm-1.c | 12 +++ .../gcc.target/i386/avx512f-fmsub-sf-zmm-2.c | 12 +++ .../gcc.target/i386/avx512f-fmsub-sf-zmm-3.c | 12 +++ .../gcc.target/i386/avx512f-fmsub-sf-zmm-4.c | 12 +++ .../gcc.target/i386/avx512f-fmsub-sf-zmm-5.c | 12 +++ .../gcc.target/i386/avx512f-fmsub-sf-zmm-6.c | 12 +++ .../gcc.target/i386/avx512f-fmsub-sf-zmm-7.c | 12 +++ .../gcc.target/i386/avx512f-fmsub-sf-zmm-8.c | 12 +++ .../gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c | 12 +++ .../gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c | 12 +++ 16 files changed, 285 insertions(+), 58 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h index 8473cd0d26c..c0c8fa1efd0 100644 --- a/gcc/config/i386/avx512fintrin.h +++ b/gcc/config/i386/avx512fintrin.h @@ -3355,9 +3355,9 @@ extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R) { - return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A, (__v8df) __B, - -(__v8df) __C, + (__v8df) __C, (__mmask8) -1, __R); } @@ -3366,9 +3366,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_mask_fmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C, const int __R) { - return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A, (__v8df) __B, - -(__v8df) __C, + (__v8df) __C, (__mmask8) __U, __R); } @@ -3388,9 +3388,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_maskz_fmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C, const int __R) { - return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A, + return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A, (__v8df) __B, - -(__v8df) __C, + (__v8df) __C, (__mmask8) __U, __R); } @@ -3398,9 +3398,9 @@ extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R) { - return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A, (__v16sf) __B, - -(__v16sf) __C, + (__v16sf) __C, (__mmask16) -1, __R); } @@ -3409,9 +3409,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_mask_fmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C, const int __R) { - return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A, (__v16sf) __B, - -(__v16sf) __C, + (__v16sf) __C, (__mmask16) __U, __R); } @@ -3431,9 +3431,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_maskz_fmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C, const int __R) { - return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A, + return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A, (__v16sf) __B, - -(__v16sf) __C, + (__v16sf) __C, (__mmask16) __U, __R); } @@ -3806,28 +3806,28 @@ _mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B, (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, C, U, R) #define _mm512_fmsub_round_pd(A, B, C, R) \ - (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, -(C), -1, R) + (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, -1, R) #define _mm512_mask_fmsub_round_pd(A, U, B, C, R) \ - (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, -(C), U, R) + (__m512d)__builtin_ia32_vfmsubpd512_mask(A, B, C, U, R) #define _mm512_mask3_fmsub_round_pd(A, B, C, U, R) \ (__m512d)__builtin_ia32_vfmsubpd512_mask3(A, B, C, U, R) #define _mm512_maskz_fmsub_round_pd(U, A, B, C, R) \ - (__m512d)__builtin_ia32_vfmaddpd512_maskz(A, B, -(C), U, R) + (__m512d)__builtin_ia32_vfmsubpd512_maskz(A, B, C, U, R) #define _mm512_fmsub_round_ps(A, B, C, R) \ - (__m512)__builtin_ia32_vfmaddps512_mask(A, B, -(C), -1, R) + (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, -1, R) #define _mm512_mask_fmsub_round_ps(A, U, B, C, R) \ - (__m512)__builtin_ia32_vfmaddps512_mask(A, B, -(C), U, R) + (__m512)__builtin_ia32_vfmsubps512_mask(A, B, C, U, R) #define _mm512_mask3_fmsub_round_ps(A, B, C, U, R) \ (__m512)__builtin_ia32_vfmsubps512_mask3(A, B, C, U, R) #define _mm512_maskz_fmsub_round_ps(U, A, B, C, R) \ - (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, -(C), U, R) + (__m512)__builtin_ia32_vfmsubps512_maskz(A, B, C, U, R) #define _mm512_fmaddsub_round_pd(A, B, C, R) \ (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, -1, R) @@ -12416,9 +12416,9 @@ extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_fmsub_pd (__m512d __A, __m512d __B, __m512d __C) { - return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A, (__v8df) __B, - -(__v8df) __C, + (__v8df) __C, (__mmask8) -1, _MM_FROUND_CUR_DIRECTION); } @@ -12427,9 +12427,9 @@ extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_mask_fmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C) { - return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + return (__m512d) __builtin_ia32_vfmsubpd512_mask ((__v8df) __A, (__v8df) __B, - -(__v8df) __C, + (__v8df) __C, (__mmask8) __U, _MM_FROUND_CUR_DIRECTION); } @@ -12449,9 +12449,9 @@ extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_maskz_fmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C) { - return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A, + return (__m512d) __builtin_ia32_vfmsubpd512_maskz ((__v8df) __A, (__v8df) __B, - -(__v8df) __C, + (__v8df) __C, (__mmask8) __U, _MM_FROUND_CUR_DIRECTION); } @@ -12460,9 +12460,9 @@ extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_fmsub_ps (__m512 __A, __m512 __B, __m512 __C) { - return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A, (__v16sf) __B, - -(__v16sf) __C, + (__v16sf) __C, (__mmask16) -1, _MM_FROUND_CUR_DIRECTION); } @@ -12471,9 +12471,9 @@ extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_mask_fmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C) { - return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + return (__m512) __builtin_ia32_vfmsubps512_mask ((__v16sf) __A, (__v16sf) __B, - -(__v16sf) __C, + (__v16sf) __C, (__mmask16) __U, _MM_FROUND_CUR_DIRECTION); } @@ -12493,9 +12493,9 @@ extern __inline __m512 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_maskz_fmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C) { - return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A, + return (__m512) __builtin_ia32_vfmsubps512_maskz ((__v16sf) __A, (__v16sf) __B, - -(__v16sf) __C, + (__v16sf) __C, (__mmask16) __U, _MM_FROUND_CUR_DIRECTION); } diff --git a/gcc/config/i386/avx512vlintrin.h b/gcc/config/i386/avx512vlintrin.h index 68b5537845b..1e2e6da29bd 100644 --- a/gcc/config/i386/avx512vlintrin.h +++ b/gcc/config/i386/avx512vlintrin.h @@ -4117,9 +4117,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_mask_fmsub_pd (__m256d __A, __mmask8 __U, __m256d __B, __m256d __C) { - return (__m256d) __builtin_ia32_vfmaddpd256_mask ((__v4df) __A, + return (__m256d) __builtin_ia32_vfmsubpd256_mask ((__v4df) __A, (__v4df) __B, - -(__v4df) __C, + (__v4df) __C, (__mmask8) __U); } @@ -4139,9 +4139,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_maskz_fmsub_pd (__mmask8 __U, __m256d __A, __m256d __B, __m256d __C) { - return (__m256d) __builtin_ia32_vfmaddpd256_maskz ((__v4df) __A, + return (__m256d) __builtin_ia32_vfmsubpd256_maskz ((__v4df) __A, (__v4df) __B, - -(__v4df) __C, + (__v4df) __C, (__mmask8) __U); } @@ -4149,9 +4149,9 @@ extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_mask_fmsub_pd (__m128d __A, __mmask8 __U, __m128d __B, __m128d __C) { - return (__m128d) __builtin_ia32_vfmaddpd128_mask ((__v2df) __A, + return (__m128d) __builtin_ia32_vfmsubpd128_mask ((__v2df) __A, (__v2df) __B, - -(__v2df) __C, + (__v2df) __C, (__mmask8) __U); } @@ -4171,9 +4171,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_maskz_fmsub_pd (__mmask8 __U, __m128d __A, __m128d __B, __m128d __C) { - return (__m128d) __builtin_ia32_vfmaddpd128_maskz ((__v2df) __A, + return (__m128d) __builtin_ia32_vfmsubpd128_maskz ((__v2df) __A, (__v2df) __B, - -(__v2df) __C, + (__v2df) __C, (__mmask8) __U); } @@ -4181,9 +4181,9 @@ extern __inline __m256 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_mask_fmsub_ps (__m256 __A, __mmask8 __U, __m256 __B, __m256 __C) { - return (__m256) __builtin_ia32_vfmaddps256_mask ((__v8sf) __A, + return (__m256) __builtin_ia32_vfmsubps256_mask ((__v8sf) __A, (__v8sf) __B, - -(__v8sf) __C, + (__v8sf) __C, (__mmask8) __U); } @@ -4203,9 +4203,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_maskz_fmsub_ps (__mmask8 __U, __m256 __A, __m256 __B, __m256 __C) { - return (__m256) __builtin_ia32_vfmaddps256_maskz ((__v8sf) __A, + return (__m256) __builtin_ia32_vfmsubps256_maskz ((__v8sf) __A, (__v8sf) __B, - -(__v8sf) __C, + (__v8sf) __C, (__mmask8) __U); } @@ -4213,9 +4213,9 @@ extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_mask_fmsub_ps (__m128 __A, __mmask8 __U, __m128 __B, __m128 __C) { - return (__m128) __builtin_ia32_vfmaddps128_mask ((__v4sf) __A, + return (__m128) __builtin_ia32_vfmsubps128_mask ((__v4sf) __A, (__v4sf) __B, - -(__v4sf) __C, + (__v4sf) __C, (__mmask8) __U); } @@ -4233,9 +4233,9 @@ extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_maskz_fmsub_ps (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C) { - return (__m128) __builtin_ia32_vfmaddps128_maskz ((__v4sf) __A, + return (__m128) __builtin_ia32_vfmsubps128_maskz ((__v4sf) __A, (__v4sf) __B, - -(__v4sf) __C, + (__v4sf) __C, (__mmask8) __U); } diff --git a/gcc/config/i386/fmaintrin.h b/gcc/config/i386/fmaintrin.h index 660d3453590..2eddd896579 100644 --- a/gcc/config/i386/fmaintrin.h +++ b/gcc/config/i386/fmaintrin.h @@ -86,48 +86,48 @@ extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_fmsub_pd (__m128d __A, __m128d __B, __m128d __C) { - return (__m128d)__builtin_ia32_vfmaddpd ((__v2df)__A, (__v2df)__B, - -(__v2df)__C); + return (__m128d)__builtin_ia32_vfmsubpd ((__v2df)__A, (__v2df)__B, + (__v2df)__C); } extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm256_fmsub_pd (__m256d __A, __m256d __B, __m256d __C) { - return (__m256d)__builtin_ia32_vfmaddpd256 ((__v4df)__A, (__v4df)__B, - -(__v4df)__C); + return (__m256d)__builtin_ia32_vfmsubpd256 ((__v4df)__A, (__v4df)__B, + (__v4df)__C); } extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_fmsub_ps (__m128 __A, __m128 __B, __m128 __C) { - return (__m128)__builtin_ia32_vfmaddps ((__v4sf)__A, (__v4sf)__B, - -(__v4sf)__C); + return (__m128)__builtin_ia32_vfmsubps ((__v4sf)__A, (__v4sf)__B, + (__v4sf)__C); } extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm256_fmsub_ps (__m256 __A, __m256 __B, __m256 __C) { - return (__m256)__builtin_ia32_vfmaddps256 ((__v8sf)__A, (__v8sf)__B, - -(__v8sf)__C); + return (__m256)__builtin_ia32_vfmsubps256 ((__v8sf)__A, (__v8sf)__B, + (__v8sf)__C); } extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_fmsub_sd (__m128d __A, __m128d __B, __m128d __C) { - return (__m128d)__builtin_ia32_vfmaddsd3 ((__v2df)__A, (__v2df)__B, - -(__v2df)__C); + return (__m128d)__builtin_ia32_vfmsubsd3 ((__v2df)__A, (__v2df)__B, + (__v2df)__C); } extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_fmsub_ss (__m128 __A, __m128 __B, __m128 __C) { - return (__m128)__builtin_ia32_vfmaddss3 ((__v4sf)__A, (__v4sf)__B, - -(__v4sf)__C); + return (__m128)__builtin_ia32_vfmsubss3 ((__v4sf)__A, (__v4sf)__B, + (__v4sf)__C); } extern __inline __m128d diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index dc4c70c7ea3..f5b5e56a01c 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -1903,10 +1903,18 @@ BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v8sf_maskz, "__builtin_ BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v4sf_mask, "__builtin_ia32_vfmaddps128_mask", IX86_BUILTIN_VFMADDPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v4sf_mask3, "__builtin_ia32_vfmaddps128_mask3", IX86_BUILTIN_VFMADDPS128_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmadd_v4sf_maskz, "__builtin_ia32_vfmaddps128_maskz", IX86_BUILTIN_VFMADDPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4df_mask, "__builtin_ia32_vfmsubpd256_mask", IX86_BUILTIN_VFMSUBPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4df_mask3, "__builtin_ia32_vfmsubpd256_mask3", IX86_BUILTIN_VFMSUBPD256_MASK3, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4df_maskz, "__builtin_ia32_vfmsubpd256_maskz", IX86_BUILTIN_VFMSUBPD256_MASKZ, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v2df_mask, "__builtin_ia32_vfmsubpd128_mask", IX86_BUILTIN_VFMSUBPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v2df_mask3, "__builtin_ia32_vfmsubpd128_mask3", IX86_BUILTIN_VFMSUBPD128_MASK3, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v2df_maskz, "__builtin_ia32_vfmsubpd128_maskz", IX86_BUILTIN_VFMSUBPD128_MASKZ, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v8sf_mask, "__builtin_ia32_vfmsubps256_mask", IX86_BUILTIN_VFMSUBPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v8sf_mask3, "__builtin_ia32_vfmsubps256_mask3", IX86_BUILTIN_VFMSUBPS256_MASK3, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v8sf_maskz, "__builtin_ia32_vfmsubps256_maskz", IX86_BUILTIN_VFMSUBPS256_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_mask, "__builtin_ia32_vfmsubps128_mask", IX86_BUILTIN_VFMSUBPS128_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_mask3, "__builtin_ia32_vfmsubps128_mask3", IX86_BUILTIN_VFMSUBPS128_MASK3, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI) +BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fmsub_v4sf_maskz, "__builtin_ia32_vfmsubps128_maskz", IX86_BUILTIN_VFMSUBPS128_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v4df_mask, "__builtin_ia32_vfnmaddpd256_mask", IX86_BUILTIN_VFNMADDPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v2df_mask, "__builtin_ia32_vfnmaddpd128_mask", IX86_BUILTIN_VFNMADDPD128_MASK, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, CODE_FOR_avx512vl_fnmadd_v8sf_mask, "__builtin_ia32_vfnmaddps256_mask", IX86_BUILTIN_VFNMADDPS256_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SF_UQI) @@ -2768,8 +2776,12 @@ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v16sf_mask3_round, "__ BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmaddsub_v16sf_maskz_round, "__builtin_ia32_vfmaddsubps512_maskz", IX86_BUILTIN_VFMADDSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsubadd_v8df_mask3_round, "__builtin_ia32_vfmsubaddpd512_mask3", IX86_BUILTIN_VFMSUBADDPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsubadd_v16sf_mask3_round, "__builtin_ia32_vfmsubaddps512_mask3", IX86_BUILTIN_VFMSUBADDPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v8df_mask_round, "__builtin_ia32_vfmsubpd512_mask", IX86_BUILTIN_VFMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v8df_mask3_round, "__builtin_ia32_vfmsubpd512_mask3", IX86_BUILTIN_VFMSUBPD512_MASK3, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v8df_maskz_round, "__builtin_ia32_vfmsubpd512_maskz", IX86_BUILTIN_VFMSUBPD512_MASKZ, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_mask_round, "__builtin_ia32_vfmsubps512_mask", IX86_BUILTIN_VFMSUBPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_mask3_round, "__builtin_ia32_vfmsubps512_mask3", IX86_BUILTIN_VFMSUBPS512_MASK3, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fmsub_v16sf_maskz_round, "__builtin_ia32_vfmsubps512_maskz", IX86_BUILTIN_VFMSUBPS512_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v8df_mask_round, "__builtin_ia32_vfnmaddpd512_mask", IX86_BUILTIN_VFNMADDPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmadd_v16sf_mask_round, "__builtin_ia32_vfnmaddps512_mask", IX86_BUILTIN_VFNMADDPS512_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) BDESC (OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fnmsub_v8df_mask_round, "__builtin_ia32_vfnmsubpd512_mask", IX86_BUILTIN_VFNMSUBPD512_MASK, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) @@ -2855,11 +2867,17 @@ BDESC_FIRST (multi_arg, MULTI_ARG, BDESC (OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_vmfmadd_v2df, "__builtin_ia32_vfmaddsd", IX86_BUILTIN_VFMADDSD, UNKNOWN, (int)MULTI_ARG_3_DF) BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmadd_v4sf, "__builtin_ia32_vfmaddss3", IX86_BUILTIN_VFMADDSS3, UNKNOWN, (int)MULTI_ARG_3_SF) BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmadd_v2df, "__builtin_ia32_vfmaddsd3", IX86_BUILTIN_VFMADDSD3, UNKNOWN, (int)MULTI_ARG_3_DF) +BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmsub_v4sf, "__builtin_ia32_vfmsubss3", IX86_BUILTIN_VFMSUBSS3, UNKNOWN, (int)MULTI_ARG_3_SF) +BDESC (OPTION_MASK_ISA_FMA, CODE_FOR_fmai_vmfmsub_v2df, "__builtin_ia32_vfmsubsd3", IX86_BUILTIN_VFMSUBSD3, UNKNOWN, (int)MULTI_ARG_3_DF) BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v4sf, "__builtin_ia32_vfmaddps", IX86_BUILTIN_VFMADDPS, UNKNOWN, (int)MULTI_ARG_3_SF) BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v2df, "__builtin_ia32_vfmaddpd", IX86_BUILTIN_VFMADDPD, UNKNOWN, (int)MULTI_ARG_3_DF) BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v8sf, "__builtin_ia32_vfmaddps256", IX86_BUILTIN_VFMADDPS256, UNKNOWN, (int)MULTI_ARG_3_SF2) BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmadd_v4df, "__builtin_ia32_vfmaddpd256", IX86_BUILTIN_VFMADDPD256, UNKNOWN, (int)MULTI_ARG_3_DF2) +BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v4sf, "__builtin_ia32_vfmsubps", IX86_BUILTIN_VFMSUBPS, UNKNOWN, (int)MULTI_ARG_3_SF) +BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v2df, "__builtin_ia32_vfmsubpd", IX86_BUILTIN_VFMSUBPD, UNKNOWN, (int)MULTI_ARG_3_DF) +BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v8sf, "__builtin_ia32_vfmsubps256", IX86_BUILTIN_VFMSUBPS256, UNKNOWN, (int)MULTI_ARG_3_SF2) +BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fma4i_fmsub_v4df, "__builtin_ia32_vfmsubpd256", IX86_BUILTIN_VFMSUBPD256, UNKNOWN, (int)MULTI_ARG_3_DF2) BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v4sf, "__builtin_ia32_vfmaddsubps", IX86_BUILTIN_VFMADDSUBPS, UNKNOWN, (int)MULTI_ARG_3_SF) BDESC (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4, CODE_FOR_fmaddsub_v2df, "__builtin_ia32_vfmaddsubpd", IX86_BUILTIN_VFMADDSUBPD, UNKNOWN, (int)MULTI_ARG_3_DF) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 635a6902d33..0e9d3541ccc 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -3760,6 +3760,14 @@ (match_operand:FMAMODE_AVX512 2 "nonimmediate_operand") (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand")))]) +(define_expand "fma4i_fmsub_" + [(set (match_operand:FMAMODE_AVX512 0 "register_operand") + (fma:FMAMODE_AVX512 + (match_operand:FMAMODE_AVX512 1 "nonimmediate_operand") + (match_operand:FMAMODE_AVX512 2 "nonimmediate_operand") + (neg:FMAMODE_AVX512 + (match_operand:FMAMODE_AVX512 3 "nonimmediate_operand"))))]) + (define_expand "_fmadd__maskz" [(match_operand:VF_AVX512VL 0 "register_operand") (match_operand:VF_AVX512VL 1 "") @@ -3898,6 +3906,20 @@ (set_attr "type" "ssemuladd") (set_attr "mode" "")]) +(define_expand "_fmsub__maskz" + [(match_operand:VF_AVX512VL 0 "register_operand") + (match_operand:VF_AVX512VL 1 "") + (match_operand:VF_AVX512VL 2 "") + (match_operand:VF_AVX512VL 3 "") + (match_operand: 4 "register_operand")] + "TARGET_AVX512F && " +{ + emit_insn (gen_fma_fmsub__maskz_1 ( + operands[0], operands[1], operands[2], operands[3], + CONST0_RTX (mode), operands[4])); + DONE; +}) + (define_insn "fma_fmsub_" [(set (match_operand:VF_SF_AVX512VL 0 "register_operand" "=v,v,v") (fma:VF_SF_AVX512VL @@ -3913,6 +3935,49 @@ [(set_attr "type" "ssemuladd") (set_attr "mode" "")]) +(define_insn "*fma_fmsub__bcst_1" + [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v") + (fma:VF_AVX512 + (match_operand:VF_AVX512 1 "register_operand" "0,v") + (match_operand:VF_AVX512 2 "register_operand" "v,0") + (neg:VF_AVX512 + (vec_duplicate:VF_AVX512 + (match_operand: 3 "memory_operand" "m,m")))))] + "TARGET_AVX512F && " + "vfmsub213\t{%3, %2, %0|%0, %2, %3}" + [(set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "*fma_fmsub__bcst_2" + [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v") + (fma:VF_AVX512 + (vec_duplicate:VF_AVX512 + (match_operand: 1 "memory_operand" "m,m")) + (match_operand:VF_AVX512 2 "register_operand" "0,v") + (neg:VF_AVX512 + (match_operand:VF_AVX512 3 "register_operand" "v,0"))))] + "TARGET_AVX512F && " + "@ + vfmsub132\t{%1, %3, %0|%0, %3, %1} + vfmsub231\t{%1, %2, %0|%0, %2, %1}" + [(set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "*fma_fmsub__bcst_3" + [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v") + (fma:VF_AVX512 + (match_operand:VF_AVX512 1 "register_operand" "0,v") + (vec_duplicate:VF_AVX512 + (match_operand: 2 "memory_operand" "m,m")) + (neg:VF_AVX512 + (match_operand:VF_AVX512 3 "nonimmediate_operand" "v,0"))))] + "TARGET_AVX512F && " + "@ + vfmsub132\t{%2, %3, %0|%0, %3, %2} + vfmsub231\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + (define_insn "_fmsub__mask" [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v,v") (vec_merge:VF_AVX512VL @@ -4261,6 +4326,18 @@ (const_int 1)))] "TARGET_FMA") +(define_expand "fmai_vmfmsub_" + [(set (match_operand:VF_128 0 "register_operand") + (vec_merge:VF_128 + (fma:VF_128 + (match_operand:VF_128 1 "") + (match_operand:VF_128 2 "") + (neg:VF_128 + (match_operand:VF_128 3 ""))) + (match_dup 1) + (const_int 1)))] + "TARGET_FMA") + (define_insn "*fmai_fmadd_" [(set (match_operand:VF_128 0 "register_operand" "=v,v") (vec_merge:VF_128 diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c new file mode 100644 index 00000000000..840888a2d81 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...pd\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ +/* { dg-final { scan-assembler-not "vbroadcastsd\[^\n\]*%zmm\[0-9\]+" } } */ + +#define type __m512d +#define vec 512 +#define op fmsub +#define suffix pd +#define SCALAR double + +#include "avx512-fma-1.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c new file mode 100644 index 00000000000..0cb675b7628 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */ + +#define type __m512 +#define vec 512 +#define op fmsub +#define suffix ps +#define SCALAR float + +#include "avx512-fma-1.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c new file mode 100644 index 00000000000..10212d471b1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */ + +#define type __m512 +#define vec 512 +#define op fmsub +#define suffix ps +#define SCALAR float + +#include "avx512-fma-2.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c new file mode 100644 index 00000000000..feb34077085 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */ + +#define type __m512 +#define vec 512 +#define op fmsub +#define suffix ps +#define SCALAR float + +#include "avx512-fma-3.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c new file mode 100644 index 00000000000..4305fffe628 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */ + +#define type __m512 +#define vec 512 +#define op fmsub +#define suffix ps +#define SCALAR float + +#include "avx512-fma-4.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c new file mode 100644 index 00000000000..d57251f83be --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */ + +#define type __m512 +#define vec 512 +#define op fmsub +#define suffix ps +#define SCALAR float + +#include "avx512-fma-5.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c new file mode 100644 index 00000000000..b26a9ee7eeb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */ +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%zmm\[0-9\]+" } } */ + +#define type __m512 +#define vec 512 +#define op fmsub +#define suffix ps +#define SCALAR float + +#include "avx512-fma-6.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c new file mode 100644 index 00000000000..cc705af8ea5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */ + +#define type __m512 +#define vec 512 +#define op fmsub +#define suffix ps +#define SCALAR float + +#include "avx512-fma-7.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c new file mode 100644 index 00000000000..2b929fa11e8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+%zmm\[0-9\]+, %zmm\[0-9\]+, %zmm0" 1 } } */ + +#define type __m512 +#define vec 512 +#define op fmsub +#define suffix ps +#define SCALAR float + +#include "avx512-fma-8.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c new file mode 100644 index 00000000000..70efbcc98f9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mfma -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %xmm\[0-9\]+, %xmm0" 1 } } */ +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%xmm\[0-9\]+" } } */ + +#define type __m128 +#define vec +#define op fmsub +#define suffix ps +#define SCALAR float + +#include "avx512-fma-1.h" diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c new file mode 100644 index 00000000000..a7c1b370b74 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mfma -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-times "vfmsub...ps\[ \\t\]+\\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %ymm\[0-9\]+, %ymm0" 1 } } */ +/* { dg-final { scan-assembler-not "vbroadcastss\[^\n\]*%ymm\[0-9\]+" } } */ + +#define type __m256 +#define vec 256 +#define op fmsub +#define suffix ps +#define SCALAR float + +#include "avx512-fma-1.h" -- 2.17.2