From: Christophe Lyon
Date: Tue, 4 May 2021 19:03:44 +0200
Subject: Re: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16 support to VCMP
To: "Andre Vieira (lists)"
Cc: "gcc-patches@gcc.gnu.org"

On Tue, 4 May 2021 at 15:43, Christophe Lyon wrote:
>
> On Tue, 4 May 2021 at 13:48, Andre Vieira (lists)
> wrote:
> >
> > It would be good to also add tests for NEON as you also enable auto-vec
> > for it. I checked and I do think the necessary 'neon_vc' patterns exist
> > for 'VH', so we should be OK there.
> >
>
> Actually since I posted the patch series, I've noticed a regression in
> armv8_2-fp16-arith-1.c, because we now vectorize all the float16x[48]_t loops,
> but we lose the fact that some FP comparisons can throw exceptions.
>
> I'll have to revisit this patch.

Actually it looks like my patch does the right thing: we now vectorize
appropriately, given that the testcase is compiled with -ffast-math.
I need to update the testcase, though.

>
> Thanks,
>
> Christophe
>
> > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > This patch adds __fp16 support to the previous patch that added vcmp
> > > support with MVE. For this we update existing expanders to use VDQWH
> > > iterator, and add a new expander vcond. In the
> > > process we need to create suitable iterators, and update v_cmp_result
> > > as needed.
> > >
> > > 2021-04-26  Christophe Lyon
> > >
> > > 	gcc/
> > > 	* config/arm/iterators.md (V16): New iterator.
> > > 	(VH_cvtto): New iterator.
> > > 	(v_cmp_result): Added V4HF and V8HF support.
> > > 	* config/arm/vec-common.md (vec_cmp): Use VDQWH.
> > > 	(vcond): Likewise.
> > > 	(vcond_mask_): Likewise.
> > > 	(vcond): New expander.
> > >
> > > 	gcc/testsuite/
> > > 	* gcc.target/arm/simd/mve-compare-3.c: New test with GCC vectors.
> > > 	* gcc.target/arm/simd/mve-vcmp-f16.c: New test for
> > > 	auto-vectorization.
> > > ---
> > >  gcc/config/arm/iterators.md                       |  6 ++++
> > >  gcc/config/arm/vec-common.md                      | 40 ++++++++++++++++-------
> > >  gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c | 38 +++++++++++++++++++++
> > >  gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c  | 30 +++++++++++++++++
> > >  4 files changed, 102 insertions(+), 12 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
> > >
> > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > > index a128465..3042baf 100644
> > > --- a/gcc/config/arm/iterators.md
> > > +++ b/gcc/config/arm/iterators.md
> > > @@ -231,6 +231,9 @@ (define_mode_iterator VU [V16QI V8HI V4SI])
> > >  ;; Vector modes for 16-bit floating-point support.
> > >  (define_mode_iterator VH [V8HF V4HF])
> > >
> > > +;; Modes with 16-bit elements only.
> > > +(define_mode_iterator V16 [V4HI V4HF V8HI V8HF])
> > > +
> > >  ;; 16-bit floating-point vector modes suitable for moving (includes BFmode).
> > >  (define_mode_iterator VHFBF [V8HF V4HF V4BF V8BF])
> > >
> > > @@ -571,6 +574,8 @@ (define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
> > >  ;; (Opposite) mode to convert to/from for vector-half mode conversions.
> > >  (define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
> > >                              (V8HI "V8HF") (V8HF "V8HI")])
> > > +(define_mode_attr VH_cvtto [(V4HI "v4hf") (V4HF "v4hi")
> > > +                            (V8HI "v8hf") (V8HF "v8hi")])
> > >
> > >  ;; Define element mode for each vector mode.
> > >  (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
> > > @@ -720,6 +725,7 @@ (define_mode_attr V_cmp_result [(V8QI "V8QI") (V16QI "V16QI")
> > >  (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
> > >                                  (V4HI "v4hi") (V8HI "v8hi")
> > >                                  (V2SI "v2si") (V4SI "v4si")
> > > +                                (V4HF "v4hi") (V8HF "v8hi")
> > >                                  (DI "di") (V2DI "v2di")
> > >                                  (V2SF "v2si") (V4SF "v4si")])
> > >
> > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> > > index 034b48b..3fd341c 100644
> > > --- a/gcc/config/arm/vec-common.md
> > > +++ b/gcc/config/arm/vec-common.md
> > > @@ -366,8 +366,8 @@ (define_expand "vlshr3"
> > >  (define_expand "vec_cmp"
> > >    [(set (match_operand: 0 "s_register_operand")
> > >          (match_operator: 1 "comparison_operator"
> > > -          [(match_operand:VDQW 2 "s_register_operand")
> > > -           (match_operand:VDQW 3 "reg_or_zero_operand")]))]
> > > +          [(match_operand:VDQWH 2 "s_register_operand")
> > > +           (match_operand:VDQWH 3 "reg_or_zero_operand")]))]
> > >    "ARM_HAVE__ARITH
> > >     && !TARGET_REALLY_IWMMXT
> > >     && (! || flag_unsafe_math_optimizations)"
> > > @@ -399,13 +399,13 @@ (define_expand "vec_cmpu"
> > >  ;; element-wise.
> > >
> > >  (define_expand "vcond"
> > > -  [(set (match_operand:VDQW 0 "s_register_operand")
> > > -        (if_then_else:VDQW
> > > +  [(set (match_operand:VDQWH 0 "s_register_operand")
> > > +        (if_then_else:VDQWH
> > >            (match_operator 3 "comparison_operator"
> > > -            [(match_operand:VDQW 4 "s_register_operand")
> > > -             (match_operand:VDQW 5 "reg_or_zero_operand")])
> > > -          (match_operand:VDQW 1 "s_register_operand")
> > > -          (match_operand:VDQW 2 "s_register_operand")))]
> > > +            [(match_operand:VDQWH 4 "s_register_operand")
> > > +             (match_operand:VDQWH 5 "reg_or_zero_operand")])
> > > +          (match_operand:VDQWH 1 "s_register_operand")
> > > +          (match_operand:VDQWH 2 "s_register_operand")))]
> > >    "ARM_HAVE__ARITH
> > >     && !TARGET_REALLY_IWMMXT
> > >     && (! || flag_unsafe_math_optimizations)"
> > > @@ -430,6 +430,22 @@ (define_expand "vcond"
> > >    DONE;
> > >  })
> > >
> > > +(define_expand "vcond"
> > > +  [(set (match_operand: 0 "s_register_operand")
> > > +        (if_then_else:
> > > +          (match_operator 3 "comparison_operator"
> > > +            [(match_operand:V16 4 "s_register_operand")
> > > +             (match_operand:V16 5 "reg_or_zero_operand")])
> > > +          (match_operand: 1 "s_register_operand")
> > > +          (match_operand: 2 "s_register_operand")))]
> > > +  "ARM_HAVE__ARITH
> > > +   && !TARGET_REALLY_IWMMXT
> > > +   && (! || flag_unsafe_math_optimizations)"
> > > +{
> > > +  arm_expand_vcond (operands, mode);
> > > +  DONE;
> > > +})
> > > +
> > >  (define_expand "vcondu"
> > >    [(set (match_operand:VDQW 0 "s_register_operand")
> > >          (if_then_else:VDQW
> > > @@ -446,11 +462,11 @@ (define_expand "vcondu"
> > >  })
> > >
> > >  (define_expand "vcond_mask_"
> > > -  [(set (match_operand:VDQW 0 "s_register_operand")
> > > -        (if_then_else:VDQW
> > > +  [(set (match_operand:VDQWH 0 "s_register_operand")
> > > +        (if_then_else:VDQWH
> > >            (match_operand: 3 "s_register_operand")
> > > -          (match_operand:VDQW 1 "s_register_operand")
> > > -          (match_operand:VDQW 2 "s_register_operand")))]
> > > +          (match_operand:VDQWH 1 "s_register_operand")
> > > +          (match_operand:VDQWH 2 "s_register_operand")))]
> > >    "ARM_HAVE__ARITH
> > >     && !TARGET_REALLY_IWMMXT"
> > >  {
> > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
> > > new file mode 100644
> > > index 0000000..76f81e8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
> > > @@ -0,0 +1,38 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
> > > +
> > > +/* float 16 tests.  */
> > > +
> > > +#ifndef ELEM_TYPE
> > > +#define ELEM_TYPE __fp16
> > > +#endif
> > > +#ifndef INT_ELEM_TYPE
> > > +#define INT_ELEM_TYPE __INT16_TYPE__
> > > +#endif
> > > +
> > > +#define COMPARE(NAME, OP)                  \
> > > +  int_vec                                  \
> > > +  cmp_##NAME##_reg (vec a, vec b)          \
> > > +  {                                        \
> > > +    return a OP b;                         \
> > > +  }
> > > +
> > > +typedef INT_ELEM_TYPE int_vec __attribute__((vector_size(16)));
> > > +typedef ELEM_TYPE vec __attribute__((vector_size(16)));
> > > +
> > > +COMPARE (eq, ==)
> > > +COMPARE (ne, !=)
> > > +COMPARE (lt, <)
> > > +COMPARE (le, <=)
> > > +COMPARE (gt, >)
> > > +COMPARE (ge, >=)
> > > +
> > > +/* eq, ne, lt, le, gt, ge.
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\teq, q[0-9]+, q[0-9]+\n} 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tne, q[0-9]+, q[0-9]+\n} 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tle, q[0-9]+, q[0-9]+\n} 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tge, q[0-9]+, q[0-9]+\n} 1 } } */
> > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
> > > new file mode 100644
> > > index 0000000..dbae2d1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
> > > @@ -0,0 +1,30 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
> > > +
> > > +#include
> > > +
> > > +#define NB 8
> > > +
> > > +#define FUNC(OP, NAME)                                                          \
> > > +  void test_ ## NAME ##_f (__fp16 * __restrict__ dest, __fp16 *a, __fp16 *b) { \
> > > +    int i;                                                                      \
> > > +    for (i=0; i<NB; i++) {                                                      \
> > > +      dest[i] = a[i] OP b[i];                                                   \
> > > +    }                                                                           \
> > > +  }
> > > +
> > > +FUNC(==, vcmpeq)
> > > +FUNC(!=, vcmpne)
> > > +FUNC(<, vcmplt)
> > > +FUNC(<=, vcmple)
> > > +FUNC(>, vcmpgt)
> > > +FUNC(>=, vcmpge)
> > > +
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\teq, q[0-9]+, q[0-9]+\n} 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tne, q[0-9]+, q[0-9]+\n} 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tle, q[0-9]+, q[0-9]+\n} 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tge, q[0-9]+, q[0-9]+\n} 1 } } */
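
For anyone who wants to try the "GCC vectors" comparison form used in mve-compare-3.c without an MVE toolchain, here is a rough host-side sketch. It is only an illustration and not part of the patch: it assumes float and int lanes in place of __fp16 and __INT16_TYPE__ so that it builds with plain GCC or Clang, it keeps only three of the six comparisons, and count_true_lanes and check_lane_masks are hypothetical helpers added for the example. It shows that a comparison on generic vectors yields an integer vector whose lanes are -1 where the comparison holds and 0 elsewhere.

```c
/* Host-side sketch only: float/int lanes stand in for the
   __fp16/__INT16_TYPE__ lanes of the real testcase (assumption).  */
#include <assert.h>

typedef int int_vec __attribute__((vector_size(16)));   /* 4 x int */
typedef float vec __attribute__((vector_size(16)));     /* 4 x float */

/* Same shape as the COMPARE macro in mve-compare-3.c.  */
#define COMPARE(NAME, OP)                  \
  int_vec                                  \
  cmp_##NAME##_reg (vec a, vec b)          \
  {                                        \
    return a OP b;                         \
  }

COMPARE (eq, ==)
COMPARE (lt, <)
COMPARE (ge, >=)

/* Hypothetical helper: count the lanes for which the comparison held.  */
int
count_true_lanes (int_vec m)
{
  int n = 0;
  for (int i = 0; i < 4; i++)
    n += (m[i] == -1);
  return n;
}

/* Hypothetical self-check: true lanes are -1, false lanes are 0.  */
void
check_lane_masks (void)
{
  vec a = { 1.0f, 2.0f, 3.0f, 4.0f };
  vec b = { 1.0f, 0.0f, 5.0f, 4.0f };
  int_vec lt = cmp_lt_reg (a, b);

  assert (lt[0] == 0 && lt[1] == 0 && lt[2] == -1 && lt[3] == 0);
  assert (count_true_lanes (cmp_eq_reg (a, b)) == 2);
}
```

Calling check_lane_masks () from a small main is enough to exercise it. Note that this says nothing about which instructions get selected; the vcmp.f16 scans in the testcase above are what check that on MVE.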