From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ua1-x932.google.com (mail-ua1-x932.google.com [IPv6:2607:f8b0:4864:20::932]) by sourceware.org (Postfix) with ESMTPS id 5B2083857C6F for ; Fri, 27 Aug 2021 05:08:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5B2083857C6F Received: by mail-ua1-x932.google.com with SMTP id g2so2799418uad.4 for ; Thu, 26 Aug 2021 22:08:09 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=EGcPbWZF5s78fHELdWeqGJ0UKfsmFJ3I0i6zSKAjC5Y=; b=p7lcyR+OHw+sGsUyUsGE36+oF4x9OwJgZ4RY9UbvBZMt2fBE5L1Mknr3r85kiS1zTd Y14TB5ZPOaIyhR30sIpfuSynsrRxKUC4NlSzgpV6QQ/TqUcpp2ERcdaEH7Dtzrug9WAe w/mL7HfWtr6ZU8ahbaX9CBQfe9kujidxIEACDzTTEId35VQEuHK4dHQ9UUatg97qux/6 hqfgDM8WJWZGlVbw62goGNK4xom6ZHO6oD1FCf1QOZ8UDDrpNfJlwEz3MHjGNTF/1/Rt Kiw23Y/w5MSWC/aRbiZAvotRtSHqIV5rjt8YnXI+LQIpLLaha+VLvOLBr+hDny2EgvMe PxjQ== X-Gm-Message-State: AOAM530J3/PXn6FBK23YvWbQJHSuu5a7uq069iDWsZ6wMiGj4b9H0njB qzrBDPOS/R98Ge9Lhue80rXUr7beW6FBgNsPy2K+IuvPBs4= X-Google-Smtp-Source: ABdhPJyo5XtKpoM27XcvnXanzDFDIY+SePrlsxpQgdma1KbpnoYbNGUSAY2b4aa4+Wi16w1/OuYfaFzfnRJmqFRsFYw= X-Received: by 2002:a9f:3189:: with SMTP id v9mr5242125uad.32.1630040888773; Thu, 26 Aug 2021 22:08:08 -0700 (PDT) MIME-Version: 1.0 References: <20210825081029.3623-1-lingling.kong@intel.com> In-Reply-To: From: Hongtao Liu Date: Fri, 27 Aug 2021 13:14:05 +0800 Message-ID: Subject: Re: [PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472] To: "Kong, Lingling" Cc: "Liu, Hongtao" , "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Aug 2021 05:08:19 -0000 On Fri, Aug 27, 2021 at 10:03 AM Kong, Lingling via Gcc-patches wrote: > > Hi, > > For avx512f_scattersi, mask operand only affect set src, we need to refine the pattern to let gcc know mask register also affect the dest. > So we put mask operand into UNSPEC_VSIBADDR. > > Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}. > Ok for master? Ok. > > gcc/ChangeLog: > > PR target/101472 > * config/i386/sse.md: (scattersi): Add mask operand to > UNSPEC_VSIBADDR. > (scattersi): Likewise. > (*avx512f_scattersi): Merge mask operand to set_dest. > (*avx512f_scatterdi): Likewise > > gcc/testsuite/ChangeLog: > > PR target/101472 > * gcc.target/i386/avx512f-pr101472.c: New test. > * gcc.target/i386/avx512vl-pr101472.c: New test. > --- > gcc/config/i386/sse.md | 20 +++-- > .../gcc.target/i386/avx512f-pr101472.c | 49 ++++++++++++ > .../gcc.target/i386/avx512vl-pr101472.c | 79 +++++++++++++++++++ > 3 files changed, 140 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr101472.c > create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr101472.c > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 03fc2df1fb0..a3055dbd316 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -24205,8 +24205,9 @@ > "TARGET_AVX512F" > { > operands[5] > - = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2], > - operands[4]), UNSPEC_VSIBADDR); > + = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2], > + operands[4], operands[1]), > + UNSPEC_VSIBADDR); > }) > > (define_insn "*avx512f_scattersi" > @@ -24214,10 +24215,11 @@ > [(unspec:P > [(match_operand:P 0 "vsib_address_operand" "Tv") > (match_operand: 2 "register_operand" "v") > - (match_operand:SI 4 "const1248_operand" "n")] > + (match_operand:SI 4 "const1248_operand" "n") > + (match_operand: 6 "register_operand" "1")] > UNSPEC_VSIBADDR)]) > (unspec:VI48F > - [(match_operand: 6 "register_operand" "1") > + [(match_dup 6) > (match_operand:VI48F 3 "register_operand" "v")] > UNSPEC_SCATTER)) > (clobber (match_scratch: 1 "=&Yk"))] @@ -24243,8 +24245,9 @@ > "TARGET_AVX512F" > { > operands[5] > - = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2], > - operands[4]), UNSPEC_VSIBADDR); > + = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2], > + operands[4], operands[1]), > + UNSPEC_VSIBADDR); > }) > > (define_insn "*avx512f_scatterdi" > @@ -24252,10 +24255,11 @@ > [(unspec:P > [(match_operand:P 0 "vsib_address_operand" "Tv") > (match_operand: 2 "register_operand" "v") > - (match_operand:SI 4 "const1248_operand" "n")] > + (match_operand:SI 4 "const1248_operand" "n") > + (match_operand:QI 6 "register_operand" "1")] > UNSPEC_VSIBADDR)]) > (unspec:VI48F > - [(match_operand:QI 6 "register_operand" "1") > + [(match_dup 6) > (match_operand: 3 "register_operand" "v")] > UNSPEC_SCATTER)) > (clobber (match_scratch:QI 1 "=&Yk"))] diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c b/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c > new file mode 100644 > index 00000000000..89c6603c2ff > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c > @@ -0,0 +1,49 @@ > +/* PR target/101472 */ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -O2" } */ > +/* { dg-final { scan-assembler-times "vpscatterqd\[ > +\\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpscatterdd\[ > +\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpscatterqq\[ > +\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpscatterdq\[ > +\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterqps\[ > +\\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterdps\[ > +\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterqpd\[ > +\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterdpd\[ > +\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > + > +#include > + > +void two_scatters_epi32(void* addr, __mmask8 k1, __mmask8 k2, __m512i vindex, > + __m256i a, __m512i b) > +{ > + _mm512_mask_i64scatter_epi32(addr, k1, vindex, a, 1); > + _mm512_mask_i64scatter_epi32(addr, k2, vindex, a, 1); > + _mm512_mask_i32scatter_epi32(addr, k1, vindex, b, 1); > + _mm512_mask_i32scatter_epi32(addr, k2, vindex, b, 1); } > + > +void two_scatters_epi64(void* addr, __mmask8 k1, __mmask8 k2, __m512i vindex, > + __m256i idx, __m512i a) > +{ > + _mm512_mask_i64scatter_epi64(addr, k1, vindex, a, 1); > + _mm512_mask_i64scatter_epi64(addr, k2, vindex, a, 1); > + _mm512_mask_i32scatter_epi64(addr, k1, idx, a, 1); > + _mm512_mask_i32scatter_epi64(addr, k2, idx, a, 1); } > + > +void two_scatters_ps(void* addr, __mmask8 k1, __mmask8 k2, __m512i vindex, > + __m256 a, __m512 b) > +{ > + _mm512_mask_i64scatter_ps(addr, k1, vindex, a, 1); > + _mm512_mask_i64scatter_ps(addr, k2, vindex, a, 1); > + _mm512_mask_i32scatter_ps(addr, k1, vindex, b, 1); > + _mm512_mask_i32scatter_ps(addr, k2, vindex, b, 1); } > + > +void two_scatters_pd(void* addr, __mmask8 k1, __mmask8 k2, __m512i vindex, > + __m256i idx, __m512d a) > +{ > + _mm512_mask_i64scatter_pd(addr, k1, vindex, a, 1); > + _mm512_mask_i64scatter_pd(addr, k2, vindex, a, 1); > + _mm512_mask_i32scatter_pd(addr, k1, idx, a, 1); > + _mm512_mask_i32scatter_pd(addr, k2, idx, a, 1); } > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-pr101472.c b/gcc/testsuite/gcc.target/i386/avx512vl-pr101472.c > new file mode 100644 > index 00000000000..6df59a2eb7f > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-pr101472.c > @@ -0,0 +1,79 @@ > +/* PR target/101472 */ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512vl -O2" } */ > +/* { dg-final { scan-assembler-times "vpscatterqd\[ > +\\t\]+\[^\{\n\]*xmm\[0-9\]\[^\n\]*xmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpscatterqd\[ > +\\t\]+\[^\{\n\]*xmm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpscatterdd\[ > +\\t\]+\[^\{\n\]*xmm\[0-9\]\[^\n\]*xmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpscatterdd\[ > +\\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpscatterqq\[ > +\\t\]+\[^\{\n\]*xmm\[0-9\]\[^\n\]*xmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpscatterqq\[ > +\\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpscatterdq\[ > +\\t\]+\[^\{\n\]*xmm\[0-9\]\[^\n\]*xmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpscatterdq\[ > +\\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*xmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterqps\[ > +\\t\]+\[^\{\n\]*xmm\[0-9\]\[^\n\]*xmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterqps\[ > +\\t\]+\[^\{\n\]*xmm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterdps\[ > +\\t\]+\[^\{\n\]*xmm\[0-9\]\[^\n\]*xmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterdps\[ > +\\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterqpd\[ > +\\t\]+\[^\{\n\]*xmm\[0-9\]\[^\n\]*xmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterqpd\[ > +\\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterdpd\[ > +\\t\]+\[^\{\n\]*xmm\[0-9\]\[^\n\]*xmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vscatterdpd\[ > +\\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*xmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ > +\\t\]+#)" 2 } } */ > + > + > +#include > + > +void two_scatters_epi32(void* addr, __mmask8 k1, __mmask8 k2, __m128i vindex1, > + __m256i vindex2, __m128i src_epi32, > + __m256i src_i32_epi32) > +{ > + _mm_mask_i64scatter_epi32(addr, k1, vindex1, src_epi32, 1); > + _mm_mask_i64scatter_epi32(addr, k2, vindex1, src_epi32, 1); > + _mm256_mask_i64scatter_epi32(addr, k1, vindex2, src_epi32, 1); > + _mm256_mask_i64scatter_epi32(addr, k2, vindex2, src_epi32, 1); > + > + _mm_mask_i32scatter_epi32(addr, k1, vindex1, src_epi32, 1); > + _mm_mask_i32scatter_epi32(addr, k2, vindex1, src_epi32, 1); > + _mm256_mask_i32scatter_epi32(addr, k1, vindex2, src_i32_epi32, 1); > + _mm256_mask_i32scatter_epi32(addr, k2, vindex2, src_i32_epi32, 1); > +} > + > +void two_scatters_epi64(void* addr, __mmask8 k1, __mmask8 k2, __m128i vindex1, > + __m256i vindex2, __m128i src_epi64_mm, > + __m256i src_epi64) > +{ > + _mm_mask_i64scatter_epi64(addr, k1, vindex1, src_epi64_mm, 1); > + _mm_mask_i64scatter_epi64(addr, k2, vindex1, src_epi64_mm, 1); > + _mm256_mask_i64scatter_epi64(addr, k1, vindex2, src_epi64, 1); > + _mm256_mask_i64scatter_epi64(addr, k2, vindex2, src_epi64, 1); > + > + _mm_mask_i32scatter_epi64(addr, k1, vindex1, src_epi64_mm, 8); > + _mm_mask_i32scatter_epi64(addr, k2, vindex1, src_epi64_mm, 8); > + _mm256_mask_i32scatter_epi64(addr, k1, vindex1, src_epi64, 1); > + _mm256_mask_i32scatter_epi64(addr, k2, vindex1, src_epi64, 1); } > +void two_scatters_ps(void* addr, __mmask8 k1, __mmask8 k2, __m128i vindex1, > + __m256i vindex2, __m128 src_ps, __m256 src_i32_ps) { > + _mm_mask_i64scatter_ps(addr, k1, vindex1, src_ps, 1); > + _mm_mask_i64scatter_ps(addr, k2, vindex1, src_ps, 1); > + _mm256_mask_i64scatter_ps(addr, k1, vindex2, src_ps, 1); > + _mm256_mask_i64scatter_ps(addr, k2, vindex2, src_ps, 1); > + > + _mm_mask_i32scatter_ps(addr, k1, vindex1, src_ps, 8); > + _mm_mask_i32scatter_ps(addr, k2, vindex1, src_ps, 8); > + _mm256_mask_i32scatter_ps(addr, k1, vindex2, src_i32_ps, 1); > + _mm256_mask_i32scatter_ps(addr, k2, vindex2, src_i32_ps, 1); } > + > +void two_scatters_pd(void* addr, __mmask8 k1, __mmask8 k2, __m128i vindex1, > + __m256i vindex2, __m128d src_pd_mm, __m256d src_pd) { > + _mm_mask_i64scatter_pd(addr, k1, vindex1, src_pd_mm, 1); > + _mm_mask_i64scatter_pd(addr, k2, vindex1, src_pd_mm, 1); > + _mm256_mask_i64scatter_pd(addr, k1, vindex2, src_pd, 1); > + _mm256_mask_i64scatter_pd(addr, k2, vindex2, src_pd, 1); > + > + _mm_mask_i32scatter_pd(addr, k1, vindex1, src_pd_mm, 8); > + _mm_mask_i32scatter_pd(addr, k2, vindex1, src_pd_mm, 8); > + _mm256_mask_i32scatter_pd(addr, k1, vindex1, src_pd, 1); > + _mm256_mask_i32scatter_pd(addr, k2, vindex1, src_pd, 1); } > -- > 2.18.1 > -- BR, Hongtao