From: Christophe Lyon <clyon@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r12-1334] arm: Auto-vectorization for MVE and Neon: vhadd/vrhadd
Date: Wed, 9 Jun 2021 16:00:20 +0000 (GMT)
Message-ID: <20210609160020.3C554384780F@sourceware.org>

https://gcc.gnu.org/g:880198da50e1beac9b7cf8ff1bff570359c5f2a0

commit r12-1334-g880198da50e1beac9b7cf8ff1bff570359c5f2a0
Author: Christophe Lyon <christophe.lyon@linaro.org>
Date:   Wed Jun 9 16:00:01 2021 +0000

    arm: Auto-vectorization for MVE and Neon: vhadd/vrhadd

    This patch adds support for auto-vectorization of average value
    computation using vhadd or vrhadd, for both MVE and Neon.

    The patch adds the needed [u]avg<mode>3_[floor|ceil] patterns to
    vec-common.md; I'm not sure how to factorize them without
    introducing an unspec iterator.

    It also adds tests for 'floor' and for 'ceil', each for MVE and Neon.

    2021-06-09  Christophe Lyon  <christophe.lyon@linaro.org>

            gcc/
            * config/arm/mve.md (mve_vhaddq_<supf><mode>): Prefix with '@'.
            (mve_vrhaddq_<supf><mode>): Likewise.
            * config/arm/neon.md (neon_v<r>hadd<sup><mode>): Likewise.
            * config/arm/vec-common.md (avg<mode>3_floor, uavg<mode>3_floor)
            (avg<mode>3_ceil, uavg<mode>3_ceil): New patterns.

            gcc/testsuite/
            * gcc.target/arm/simd/mve-vhadd-1.c: New test.
            * gcc.target/arm/simd/mve-vhadd-2.c: New test.
            * gcc.target/arm/simd/neon-vhadd-1.c: New test.
            * gcc.target/arm/simd/neon-vhadd-2.c: New test.
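For readers unfamiliar with the two rounding modes, here is a minimal scalar sketch of what a single lane of the 'floor' and 'ceil' averages is expected to compute. The helper names are hypothetical and do not appear in the patch; the signed variant assumes that `>>` on a negative value is an arithmetic shift, which is what GCC implements but which the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only.  */

/* 'floor' average, the per-lane behaviour associated with vhadd.u8:
   add in a wider type, then drop the low bit.  */
static inline uint8_t avg_floor_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((uint16_t) a + (uint16_t) b) >> 1);
}

/* 'ceil' average, the per-lane behaviour associated with vrhadd.u8:
   same as above, with a rounding increment before the shift.  */
static inline uint8_t avg_ceil_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((uint16_t) a + (uint16_t) b + 1u) >> 1);
}

/* Signed 'floor' average.  Assumes an arithmetic right shift, so the
   result rounds toward negative infinity rather than toward zero.  */
static inline int8_t avg_floor_s8 (int8_t a, int8_t b)
{
  return (int8_t) (((int16_t) a + (int16_t) b) >> 1);
}
```

With these definitions, averaging 1 and 2 gives 1 in the 'floor' form and 2 in the 'ceil' form, and averaging 255 and 255 gives 255 in both because the sum is formed in a wider type.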
Diff:
---
 gcc/config/arm/mve.md                            |  4 +-
 gcc/config/arm/neon.md                           |  2 +-
 gcc/config/arm/vec-common.md                     | 60 ++++++++++++++++++++++++
 gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c  | 31 ++++++++++++
 gcc/testsuite/gcc.target/arm/simd/mve-vhadd-2.c  | 31 ++++++++++++
 gcc/testsuite/gcc.target/arm/simd/neon-vhadd-1.c | 34 ++++++++++++++
 gcc/testsuite/gcc.target/arm/simd/neon-vhadd-2.c | 33 +++++++++++++
 7 files changed, 192 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 0bfa6a91d55..04aa612331a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1030,7 +1030,7 @@
 ;;
 ;; [vhaddq_s, vhaddq_u])
 ;;
-(define_insn "mve_vhaddq_<supf><mode>"
+(define_insn "@mve_vhaddq_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
         (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
@@ -1652,7 +1652,7 @@
 ;;
 ;; [vrhaddq_s, vrhaddq_u])
 ;;
-(define_insn "mve_vrhaddq_<supf><mode>"
+(define_insn "@mve_vrhaddq_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
         (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 077c62ffd20..18571d819eb 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1488,7 +1488,7 @@
 ; vhadd and vrhadd.
-(define_insn "neon_v<r>hadd<sup><mode>"
+(define_insn "@neon_v<r>hadd<sup><mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
         (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
                        (match_operand:VDQIW 2 "s_register_operand" "w")]
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 80b273229f5..2779c1a8aaa 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -565,3 +565,63 @@
   DONE;
 })
+
+(define_expand "avg<mode>3_floor"
+  [(match_operand:MVE_2 0 "s_register_operand")
+   (match_operand:MVE_2 1 "s_register_operand")
+   (match_operand:MVE_2 2 "s_register_operand")]
+  "ARM_HAVE_<MODE>_ARITH"
+{
+  if (TARGET_HAVE_MVE)
+    emit_insn (gen_mve_vhaddq (VHADDQ_S, <MODE>mode,
+                               operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_neon_vhadd (UNSPEC_VHADD_S, UNSPEC_VHADD_S, <MODE>mode,
+                               operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "uavg<mode>3_floor"
+  [(match_operand:MVE_2 0 "s_register_operand")
+   (match_operand:MVE_2 1 "s_register_operand")
+   (match_operand:MVE_2 2 "s_register_operand")]
+  "ARM_HAVE_<MODE>_ARITH"
+{
+  if (TARGET_HAVE_MVE)
+    emit_insn (gen_mve_vhaddq (VHADDQ_U, <MODE>mode,
+                               operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_neon_vhadd (UNSPEC_VHADD_U, UNSPEC_VHADD_U, <MODE>mode,
+                               operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "avg<mode>3_ceil"
+  [(match_operand:MVE_2 0 "s_register_operand")
+   (match_operand:MVE_2 1 "s_register_operand")
+   (match_operand:MVE_2 2 "s_register_operand")]
+  "ARM_HAVE_<MODE>_ARITH"
+{
+  if (TARGET_HAVE_MVE)
+    emit_insn (gen_mve_vrhaddq (VRHADDQ_S, <MODE>mode,
+                                operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_neon_vhadd (UNSPEC_VRHADD_S, UNSPEC_VRHADD_S, <MODE>mode,
+                               operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "uavg<mode>3_ceil"
+  [(match_operand:MVE_2 0 "s_register_operand")
+   (match_operand:MVE_2 1 "s_register_operand")
+   (match_operand:MVE_2 2 "s_register_operand")]
+  "ARM_HAVE_<MODE>_ARITH"
+{
+  if (TARGET_HAVE_MVE)
+    emit_insn (gen_mve_vrhaddq (VRHADDQ_U, <MODE>mode,
+                                operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_neon_vhadd (UNSPEC_VRHADD_U, UNSPEC_VRHADD_U, <MODE>mode,
+                               operands[0], operands[1], operands[2]));
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c
new file mode 100644
index 00000000000..19d5f5aa44f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+/* We force a cast to int64_t to enable the vectorizer when dealing with 32-bit
+   inputs.  */
+#define FUNC(SIGN, TYPE, BITS, OP, NAME) \
+  void test_ ## NAME ##_ ## SIGN ## BITS (TYPE##BITS##_t * __restrict__ dest, \
+                                          TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
+    int i; \
+    for (i=0; i < (128 / BITS); i++) { \
+      dest[i] = ((int64_t)a[i] OP b[i]) >> 1; \
+    } \
+}
+
+FUNC(s, int, 32, +, vhadd)
+FUNC(u, uint, 32, +, vhadd)
+FUNC(s, int, 16, +, vhadd)
+FUNC(u, uint, 16, +, vhadd)
+FUNC(s, int, 8, +, vhadd)
+FUNC(u, uint, 8, +, vhadd)
+
+/* { dg-final { scan-assembler-times {vhadd\.s32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vhadd\.u32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vhadd\.s16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vhadd\.u16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vhadd\.s8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vhadd\.u8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-2.c b/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-2.c
new file mode 100644
index 00000000000..30029fc86b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+/* We force a cast to int64_t to enable the vectorizer when dealing with 32-bit
+   inputs.  */
+#define FUNC(SIGN, TYPE, BITS, OP, NAME) \
+  void test_ ## NAME ##_ ## SIGN ## BITS (TYPE##BITS##_t * __restrict__ dest, \
+                                          TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
+    int i; \
+    for (i=0; i < (128 / BITS); i++) { \
+      dest[i] = ((int64_t)a[i] OP b[i] + 1) >> 1; \
+    } \
+}
+
+FUNC(s, int, 32, +, vrhadd)
+FUNC(u, uint, 32, +, vrhadd)
+FUNC(s, int, 16, +, vrhadd)
+FUNC(u, uint, 16, +, vrhadd)
+FUNC(s, int, 8, +, vrhadd)
+FUNC(u, uint, 8, +, vrhadd)
+
+/* { dg-final { scan-assembler-times {vrhadd\.s32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vrhadd\.u32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vrhadd\.s16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vrhadd\.u16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vrhadd\.s8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vrhadd\.u8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-vhadd-1.c b/gcc/testsuite/gcc.target/arm/simd/neon-vhadd-1.c
new file mode 100644
index 00000000000..ce577849f03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-vhadd-1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+/* Since we have implemented the avg* optabs for 128-bit vectors only, use
+   enough iterations to check that vectorization works as expected.  */
+
+/* We force a cast to int64_t to enable the vectorizer when dealing with 32-bit
+   inputs.  */
+#define FUNC(SIGN, TYPE, BITS, OP, NAME) \
+  void test_ ## NAME ##_ ## SIGN ## BITS (TYPE##BITS##_t * __restrict__ dest, \
+                                          TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
+    int i; \
+    for (i=0; i < (128 / BITS); i++) { \
+      dest[i] = ((int64_t)a[i] OP b[i]) >> 1; \
+    } \
+}
+
+FUNC(s, int, 32, +, vhadd)
+FUNC(u, uint, 32, +, vhadd)
+FUNC(s, int, 16, +, vhadd)
+FUNC(u, uint, 16, +, vhadd)
+FUNC(s, int, 8, +, vhadd)
+FUNC(u, uint, 8, +, vhadd)
+
+/* { dg-final { scan-assembler-times {vhadd\.s32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vhadd\.u32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vhadd\.s16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vhadd\.u16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vhadd\.s8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vhadd\.u8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-vhadd-2.c b/gcc/testsuite/gcc.target/arm/simd/neon-vhadd-2.c
new file mode 100644
index 00000000000..f2692542f9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-vhadd-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+/* Since we default to -mvectorize-with-neon-quad, use enough iterations so that
+   we can vectorize using 128-bit vectors.  */
+/* We force a cast to int64_t to enable the vectorizer when dealing with 32-bit
+   inputs.  */
+#define FUNC(SIGN, TYPE, BITS, OP, NAME) \
+  void test_ ## NAME ##_ ## SIGN ## BITS (TYPE##BITS##_t * __restrict__ dest, \
+                                          TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
+    int i; \
+    for (i=0; i < (128 / BITS); i++) { \
+      dest[i] = ((int64_t)a[i] OP b[i] + 1) >> 1; \
+    } \
+}
+
+FUNC(s, int, 32, +, vrhadd)
+FUNC(u, uint, 32, +, vrhadd)
+FUNC(s, int, 16, +, vrhadd)
+FUNC(u, uint, 16, +, vrhadd)
+FUNC(s, int, 8, +, vrhadd)
+FUNC(u, uint, 8, +, vrhadd)
+
+/* { dg-final { scan-assembler-times {vrhadd\.s32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vrhadd\.u32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vrhadd\.s16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vrhadd\.u16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vrhadd\.s8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vrhadd\.u8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
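The tests cast to int64_t for the 32-bit cases. A plausible reading, offered here as an assumption rather than something the patch states, is that 8-bit and 16-bit operands are already promoted to int by the usual C rules, whereas 32-bit operands need an explicit cast for the addition to happen in a wider type. The scalar sketch below, with hypothetical helper names, shows why the width of the intermediate sum matters: the narrow form loses the carry, while the widened form agrees with an overflow-free identity.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only.  */

/* Same shape as the test loop body: widen, add, shift.  */
static inline uint32_t havg_u32_wide (uint32_t a, uint32_t b)
{
  return (uint32_t) (((int64_t) a + b) >> 1);
}

/* No widening: the 32-bit addition wraps modulo 2^32, so the carry
   out of bit 31 is lost before the shift.  */
static inline uint32_t havg_u32_narrow (uint32_t a, uint32_t b)
{
  return (a + b) >> 1;
}

/* Overflow-free formulation of the same 'floor' average, staying
   entirely in 32 bits.  */
static inline uint32_t havg_u32_identity (uint32_t a, uint32_t b)
{
  return (a >> 1) + (b >> 1) + (a & b & 1u);
}
```

For a = 0xFFFFFFFF and b = 1, the widened and identity forms both yield 0x80000000, while the narrow form yields 0.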