From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
To: Christophe Lyon <christophe.lyon@linaro.org>,
Andre Simoes Dias Vieira <Andre.SimoesDiasVieira@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16 support to VCMP
Date: Mon, 17 May 2021 10:49:03 +0000
Message-ID: <PAXPR08MB6926D0D0E49FE9AD43FFB5D7932D9@PAXPR08MB6926.eurprd08.prod.outlook.com>
In-Reply-To: <CAKdteOYGBnLr0tPu0KzhwKQzH-us-d9v+mgdidU5SvPuOuTDqw@mail.gmail.com>
> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 05 May 2021 15:09
> To: Andre Simoes Dias Vieira <Andre.SimoesDiasVieira@arm.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16 support to VCMP
>
> On Tue, 4 May 2021 at 19:03, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> >
> > On Tue, 4 May 2021 at 15:43, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> > >
> > > On Tue, 4 May 2021 at 13:48, Andre Vieira (lists) <andre.simoesdiasvieira@arm.com> wrote:
> > > >
> > > > It would be good to also add tests for NEON as you also enable auto-vec
> > > > for it. I checked and I do think the necessary 'neon_vc' patterns exist
> > > > for 'VH', so we should be OK there.
> > > >
> > >
> > > Actually since I posted the patch series, I've noticed a regression in
> > > armv8_2-fp16-arith-1.c, because we now vectorize all the float16x[48]_t loops,
> > > but we lose the fact that some FP comparisons can throw exceptions.
> > >
> > > I'll have to revisit this patch.
> >
> > Actually it looks like my patch does the right thing: we now vectorize
> > appropriately, given that the testcase is compiled with -ffast-math.
> > I need to update the testcase, though.
> >
>
> Here is a new version, with armv8_2-fp16-arith-1.c updated to take
> into account the new vectorization.
Ok.
Thanks,
Kyrill
>
> Christophe
>
>
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> > > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > > This patch adds __fp16 support to the previous patch that added vcmp
> > > > > support with MVE. For this we update existing expanders to use the
> > > > > VDQWH iterator, and add a new expander vcond<VH_cvtto><mode>. In the
> > > > > process we need to create suitable iterators, and update v_cmp_result
> > > > > as needed.
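The new vcond<VH_cvtto><mode> expander compares lanes of one 16-bit mode (operands 4 and 5, iterated over V16) and selects lanes of the other 16-bit mode (operands 0 to 2, in <VH_CVTTO>). For mode == V8HF this means the comparison is done on V8HF and a V8HI result is produced. Below is a minimal host-side C model of the per-lane semantics. It assumes the standard vcond operand layout (op0 = op3 (op4, op5) ? op1 : op2) and uses a "<" comparison. The function name vcond_lt_model is hypothetical. Plain float stands in for __fp16, which is not available on every host.

```c
#include <stdint.h>

/* Eight 16-bit lanes, as in V8HF / V8HI.  */
#define LANES 8

/* Hypothetical per-lane model of vcond<VH_cvtto><mode> for mode == V8HF
   with a "<" comparison: operands 4 and 5 are compared as floating-point
   lanes (float stands in for __fp16 here), and operands 1 and 2 are
   16-bit integer lanes (V8HI) selected into operand 0.  */
static void
vcond_lt_model (int16_t op0[LANES], const int16_t op1[LANES],
                const int16_t op2[LANES], const float op4[LANES],
                const float op5[LANES])
{
  for (int i = 0; i < LANES; i++)
    op0[i] = (op4[i] < op5[i]) ? op1[i] : op2[i];
}
```

The loop only documents what each lane computes. It says nothing about the instruction sequence arm_expand_vcond actually emits for MVE or Neon.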
> > > > >
> > > > > 2021-04-26 Christophe Lyon <christophe.lyon@linaro.org>
> > > > >
> > > > > gcc/
> > > > > * config/arm/iterators.md (V16): New iterator.
> > > > > (VH_cvtto): New iterator.
> > > > > (v_cmp_result): Added V4HF and V8HF support.
> > > > > * config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>): Use VDQWH.
> > > > > (vcond<mode><mode>): Likewise.
> > > > > (vcond_mask_<mode><v_cmp_result>): Likewise.
> > > > > (vcond<VH_cvtto><mode>): New expander.
> > > > >
> > > > > gcc/testsuite/
> > > > > * gcc.target/arm/simd/mve-compare-3.c: New test with GCC vectors.
> > > > > * gcc.target/arm/simd/mve-vcmp-f16.c: New test for
> > > > > auto-vectorization.
> > > > > ---
> > > > > gcc/config/arm/iterators.md                       |  6 ++++
> > > > > gcc/config/arm/vec-common.md                      | 40 ++++++++++++++++-------
> > > > > gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c | 38 +++++++++++++++++++++
> > > > > gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c  | 30 +++++++++++++++++
> > > > > 4 files changed, 102 insertions(+), 12 deletions(-)
> > > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
> > > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
> > > > >
> > > > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > > > > index a128465..3042baf 100644
> > > > > --- a/gcc/config/arm/iterators.md
> > > > > +++ b/gcc/config/arm/iterators.md
> > > > > @@ -231,6 +231,9 @@ (define_mode_iterator VU [V16QI V8HI V4SI])
> > > > > ;; Vector modes for 16-bit floating-point support.
> > > > > (define_mode_iterator VH [V8HF V4HF])
> > > > >
> > > > > +;; Modes with 16-bit elements only.
> > > > > +(define_mode_iterator V16 [V4HI V4HF V8HI V8HF])
> > > > > +
> > > > > ;; 16-bit floating-point vector modes suitable for moving (includes BFmode).
> > > > > (define_mode_iterator VHFBF [V8HF V4HF V4BF V8BF])
> > > > >
> > > > > @@ -571,6 +574,8 @@ (define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
> > > > > ;; (Opposite) mode to convert to/from for vector-half mode conversions.
> > > > > (define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
> > > > > (V8HI "V8HF") (V8HF "V8HI")])
> > > > > +(define_mode_attr VH_cvtto [(V4HI "v4hf") (V4HF "v4hi")
> > > > > + (V8HI "v8hf") (V8HF "v8hi")])
> > > > >
> > > > > ;; Define element mode for each vector mode.
> > > > > (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
> > > > > @@ -720,6 +725,7 @@ (define_mode_attr V_cmp_result [(V8QI "V8QI") (V16QI "V16QI")
> > > > > (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
> > > > > (V4HI "v4hi") (V8HI "v8hi")
> > > > > (V2SI "v2si") (V4SI "v4si")
> > > > > + (V4HF "v4hi") (V8HF "v8hi")
> > > > > (DI "di") (V2DI "v2di")
> > > > > (V2SF "v2si") (V4SF "v4si")])
> > > > >
> > > > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> > > > > index 034b48b..3fd341c 100644
> > > > > --- a/gcc/config/arm/vec-common.md
> > > > > +++ b/gcc/config/arm/vec-common.md
> > > > > @@ -366,8 +366,8 @@ (define_expand "vlshr<mode>3"
> > > > > (define_expand "vec_cmp<mode><v_cmp_result>"
> > > > > [(set (match_operand:<V_cmp_result> 0 "s_register_operand")
> > > > > (match_operator:<V_cmp_result> 1 "comparison_operator"
> > > > > - [(match_operand:VDQW 2 "s_register_operand")
> > > > > - (match_operand:VDQW 3 "reg_or_zero_operand")]))]
> > > > > + [(match_operand:VDQWH 2 "s_register_operand")
> > > > > + (match_operand:VDQWH 3 "reg_or_zero_operand")]))]
> > > > > "ARM_HAVE_<MODE>_ARITH
> > > > > && !TARGET_REALLY_IWMMXT
> > > > > && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> > > > > @@ -399,13 +399,13 @@ (define_expand "vec_cmpu<mode><mode>"
> > > > > ;; element-wise.
> > > > >
> > > > > (define_expand "vcond<mode><mode>"
> > > > > - [(set (match_operand:VDQW 0 "s_register_operand")
> > > > > - (if_then_else:VDQW
> > > > > + [(set (match_operand:VDQWH 0 "s_register_operand")
> > > > > + (if_then_else:VDQWH
> > > > > (match_operator 3 "comparison_operator"
> > > > > - [(match_operand:VDQW 4 "s_register_operand")
> > > > > - (match_operand:VDQW 5 "reg_or_zero_operand")])
> > > > > - (match_operand:VDQW 1 "s_register_operand")
> > > > > - (match_operand:VDQW 2 "s_register_operand")))]
> > > > > + [(match_operand:VDQWH 4 "s_register_operand")
> > > > > + (match_operand:VDQWH 5 "reg_or_zero_operand")])
> > > > > + (match_operand:VDQWH 1 "s_register_operand")
> > > > > + (match_operand:VDQWH 2 "s_register_operand")))]
> > > > > "ARM_HAVE_<MODE>_ARITH
> > > > > && !TARGET_REALLY_IWMMXT
> > > > > && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> > > > > @@ -430,6 +430,22 @@ (define_expand "vcond<V_cvtto><mode>"
> > > > > DONE;
> > > > > })
> > > > >
> > > > > +(define_expand "vcond<VH_cvtto><mode>"
> > > > > + [(set (match_operand:<VH_CVTTO> 0 "s_register_operand")
> > > > > + (if_then_else:<VH_CVTTO>
> > > > > + (match_operator 3 "comparison_operator"
> > > > > + [(match_operand:V16 4 "s_register_operand")
> > > > > + (match_operand:V16 5 "reg_or_zero_operand")])
> > > > > + (match_operand:<VH_CVTTO> 1 "s_register_operand")
> > > > > + (match_operand:<VH_CVTTO> 2 "s_register_operand")))]
> > > > > + "ARM_HAVE_<MODE>_ARITH
> > > > > + && !TARGET_REALLY_IWMMXT
> > > > > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> > > > > +{
> > > > > + arm_expand_vcond (operands, <V_cmp_result>mode);
> > > > > + DONE;
> > > > > +})
> > > > > +
> > > > > (define_expand "vcondu<mode><v_cmp_result>"
> > > > > [(set (match_operand:VDQW 0 "s_register_operand")
> > > > > (if_then_else:VDQW
> > > > > @@ -446,11 +462,11 @@ (define_expand "vcondu<mode><v_cmp_result>"
> > > > > })
> > > > >
> > > > > (define_expand "vcond_mask_<mode><v_cmp_result>"
> > > > > - [(set (match_operand:VDQW 0 "s_register_operand")
> > > > > - (if_then_else:VDQW
> > > > > + [(set (match_operand:VDQWH 0 "s_register_operand")
> > > > > + (if_then_else:VDQWH
> > > > > (match_operand:<V_cmp_result> 3 "s_register_operand")
> > > > > - (match_operand:VDQW 1 "s_register_operand")
> > > > > - (match_operand:VDQW 2 "s_register_operand")))]
> > > > > + (match_operand:VDQWH 1 "s_register_operand")
> > > > > + (match_operand:VDQWH 2 "s_register_operand")))]
> > > > > "ARM_HAVE_<MODE>_ARITH
> > > > > && !TARGET_REALLY_IWMMXT"
> > > > > {
> > > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
> > > > > new file mode 100644
> > > > > index 0000000..76f81e8
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c
> > > > > @@ -0,0 +1,38 @@
> > > > > +/* { dg-do assemble } */
> > > > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
> > > > > +
> > > > > +/* float 16 tests. */
> > > > > +
> > > > > +#ifndef ELEM_TYPE
> > > > > +#define ELEM_TYPE __fp16
> > > > > +#endif
> > > > > +#ifndef INT_ELEM_TYPE
> > > > > +#define INT_ELEM_TYPE __INT16_TYPE__
> > > > > +#endif
> > > > > +
> > > > > +#define COMPARE(NAME, OP) \
> > > > > + int_vec \
> > > > > + cmp_##NAME##_reg (vec a, vec b) \
> > > > > + { \
> > > > > + return a OP b; \
> > > > > + }
> > > > > +
> > > > > +typedef INT_ELEM_TYPE int_vec __attribute__((vector_size(16)));
> > > > > +typedef ELEM_TYPE vec __attribute__((vector_size(16)));
> > > > > +
> > > > > +COMPARE (eq, ==)
> > > > > +COMPARE (ne, !=)
> > > > > +COMPARE (lt, <)
> > > > > +COMPARE (le, <=)
> > > > > +COMPARE (gt, >)
> > > > > +COMPARE (ge, >=)
> > > > > +
> > > > > +/* eq, ne, lt, le, gt, ge.  */
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\teq, q[0-9]+, q[0-9]+\n} 1 } } */
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tne, q[0-9]+, q[0-9]+\n} 1 } } */
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tle, q[0-9]+, q[0-9]+\n} 1 } } */
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tge, q[0-9]+, q[0-9]+\n} 1 } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
> > > > > new file mode 100644
> > > > > index 0000000..dbae2d1
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c
> > > > > @@ -0,0 +1,30 @@
> > > > > +/* { dg-do assemble } */
> > > > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
> > > > > +
> > > > > +#include <stdint.h>
> > > > > +
> > > > > +#define NB 8
> > > > > +
> > > > > +#define FUNC(OP, NAME) \
> > > > > + void test_ ## NAME ##_f (__fp16 * __restrict__ dest, __fp16 *a, __fp16 *b) { \
> > > > > + int i; \
> > > > > + for (i=0; i<NB; i++) { \
> > > > > + dest[i] = a[i] OP b[i]; \
> > > > > + } \
> > > > > + }
> > > > > +
> > > > > +FUNC(==, vcmpeq)
> > > > > +FUNC(!=, vcmpne)
> > > > > +FUNC(<, vcmplt)
> > > > > +FUNC(<=, vcmple)
> > > > > +FUNC(>, vcmpgt)
> > > > > +FUNC(>=, vcmpge)
> > > > > +
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\teq, q[0-9]+, q[0-9]+\n} 1 } } */
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tne, q[0-9]+, q[0-9]+\n} 1 } } */
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tle, q[0-9]+, q[0-9]+\n} 1 } } */
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
> > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f16\tge, q[0-9]+, q[0-9]+\n} 1 } } */
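mve-compare-3.c relies on GCC's generic vector extension. Comparing two vectors with a relational operator yields a signed integer vector of the same element width, holding -1 in lanes where the comparison is true and 0 elsewhere. The sketch below shows the same shape on an ordinary host compiler. It assumes GCC or Clang vector_size support and swaps __fp16 for 32-bit float lanes (and __INT16_TYPE__ for __INT32_TYPE__), because __fp16 is Arm-specific. The typedef names vecf and veci are illustrative only. cmp_lt_reg mirrors what COMPARE (lt, <) expands to in the test.

```c
/* Host-side analogue of mve-compare-3.c: 32-bit float lanes instead of
   __fp16, so a 16-byte vector holds four lanes rather than eight.  */
typedef float vecf __attribute__ ((vector_size (16)));
typedef __INT32_TYPE__ veci __attribute__ ((vector_size (16)));

/* Equivalent of COMPARE (lt, <): each result lane is -1 when a < b
   holds for that lane and 0 otherwise.  */
static veci
cmp_lt_reg (vecf a, vecf b)
{
  return a < b;
}
```

On an MVE target with -funsafe-math-optimizations, the __fp16 version of this function is what the test expects to compile to a single vcmp.f16 on q registers.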