* [PATCH][GCC10][1/6] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
@ 2020-12-03 17:07 Andrea Corallo
2020-12-10 9:36 ` Andrea Corallo
2020-12-11 11:11 ` Kyrylo Tkachov
0 siblings, 2 replies; 4+ messages in thread
From: Andrea Corallo @ 2020-12-03 17:07 UTC
To: gcc-patches; +Cc: Kyrylo Tkachov, Richard Earnshaw, nd
[-- Attachment #1: Type: text/plain, Size: 419 bytes --]
Hi all,
This is the first patch of a series that backports a number of bfloat16 intrinsics from
trunk to gcc-10.
These patches include the test fixes that we have applied
to master.
Please refer to:
ACLE <https://developer.arm.com/docs/101028/latest>
ISA <https://developer.arm.com/docs/ddi0596/latest>
The series has been bootstrapped on arm-linux-gnueabihf and regtested.
Okay for gcc-10?
Thanks
Andrea
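As a rough illustration of what the two new intrinsics do (per the ACLE, a lane load reads one element from memory into a chosen lane and leaves the other lanes untouched), here is a portable C model. It is only a sketch: `model_bf16_t`, `model_bf16x4_t` and `model_vld1_lane_bf16` are hypothetical names, bfloat16 values are represented by their raw 16-bit patterns, and the real intrinsic rejects an out-of-range lane at compile time instead of asserting at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the arm_neon.h types: a bfloat16 value is
   modelled by its raw 16-bit pattern and a 64-bit vector by four lanes.  */
typedef uint16_t model_bf16_t;
typedef struct { model_bf16_t lane[4]; } model_bf16x4_t;

/* Model of vld1_lane_bf16 (ptr, vec, idx): take VEC by value, overwrite
   lane IDX with the single element at PTR, and return the result.  The
   caller must pass 0 <= idx < 4; the assert only guards this model.  */
static model_bf16x4_t
model_vld1_lane_bf16 (const model_bf16_t *ptr, model_bf16x4_t vec, int idx)
{
  assert (idx >= 0 && idx < 4);
  vec.lane[idx] = *ptr;
  return vec;
}
```

With `vec` holding the patterns {1, 2, 3, 4} and `*ptr` equal to 0x3f80 (the bfloat16 encoding of 1.0), a call with `idx` 1 yields {1, 0x3f80, 3, 4}. The q-register form behaves the same way over eight lanes.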
[-- Attachment #2: 0001-arm-Add-vld1_lane_bf16-vldq_lane_bf16-intrinsics.patch --]
[-- Type: text/plain, Size: 5515 bytes --]
From 7b2080b71405918769811174082646219d23163c Mon Sep 17 00:00:00 2001
From: Andrea Corallo <andrea.corallo@arm.com>
Date: Wed, 21 Oct 2020 11:16:01 +0200
Subject: [PATCH 1/6] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics

gcc/ChangeLog

2020-10-21  Andrea Corallo  <andrea.corallo@arm.com>

	* config/arm/arm_neon_builtins.def: Add to LOAD1LANE v4bf, v8bf.
	* config/arm/arm_neon.h (vld1_lane_bf16, vld1q_lane_bf16): Add
	intrinsics.

gcc/testsuite/ChangeLog

2020-10-21  Andrea Corallo  <andrea.corallo@arm.com>

	* gcc.target/arm/simd/vld1_lane_bf16_1.c: New testcase.
	* gcc.target/arm/simd/vld1_lane_bf16_indices_1.c: Likewise.
	* gcc.target/arm/simd/vld1q_lane_bf16_indices_1.c: Likewise.
---
gcc/config/arm/arm_neon.h | 14 ++++++++++++
gcc/config/arm/arm_neon_builtins.def | 4 ++--
.../gcc.target/arm/simd/vld1_lane_bf16_1.c | 22 +++++++++++++++++++
.../arm/simd/vld1_lane_bf16_indices_1.c | 19 ++++++++++++++++
.../arm/simd/vld1q_lane_bf16_indices_1.c | 19 ++++++++++++++++
5 files changed, 76 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_lane_bf16_1.c
create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_lane_bf16_indices_1.c
create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_lane_bf16_indices_1.c
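
The arm_neon_builtins.def hunk in this patch switches VAR10 to VAR12 because two modes (v4bf, v8bf) are appended to the LOAD1LANE list, so the macro's arity has to match the new count. A minimal sketch of that style of counted X-macro follows, assuming each VARn simply chains to the previous one. The DEMO_* names and the three-entry table are hypothetical and far smaller than the real definitions in the Arm backend.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of the VARn pattern: DEMO_VAR1 emits one table
   entry (the builtin name built from the base name N and the mode A), and
   each DEMO_VARn expands to DEMO_VAR(n-1) plus one more DEMO_VAR1.  The
   T argument (the builtin kind) is carried along but unused here.  */
#define DEMO_VAR1(T, N, A)        "__builtin_neon_" #N #A,
#define DEMO_VAR2(T, N, A, B)     DEMO_VAR1 (T, N, A) DEMO_VAR1 (T, N, B)
#define DEMO_VAR3(T, N, A, B, C)  DEMO_VAR2 (T, N, A, B) DEMO_VAR1 (T, N, C)

const char *const demo_builtins[] = {
  DEMO_VAR3 (LOAD1LANE, vld1_lane, v2di, v4bf, v8bf)
};

#define DEMO_BUILTIN_COUNT (sizeof demo_builtins / sizeof demo_builtins[0])

/* Return 1 if NAME is one of the generated entries, 0 otherwise.  */
static int
demo_has_builtin (const char *name)
{
  for (size_t i = 0; i < DEMO_BUILTIN_COUNT; i++)
    if (strcmp (demo_builtins[i], name) == 0)
      return 1;
  return 0;
}
```

In this sketch, adding a mode without bumping the macro from DEMO_VAR2 to DEMO_VAR3 would be a preprocessor error (wrong number of arguments), which is why the count in the macro name changes together with the mode list.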
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index aa21730dea0..fcd8020425e 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -19665,6 +19665,20 @@ vld4q_dup_bf16 (const bfloat16_t * __ptr)
return __rv.__i;
}
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_lane_bf16 (const bfloat16_t * __a, bfloat16x4_t __b, const int __c)
+{
+  return __builtin_neon_vld1_lanev4bf (__a, __b, __c);
+}
+
+__extension__ extern __inline bfloat16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_lane_bf16 (const bfloat16_t * __a, bfloat16x8_t __b, const int __c)
+{
+  return __builtin_neon_vld1_lanev8bf (__a, __b, __c);
+}
+
#pragma GCC pop_options
#ifdef __cplusplus
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 34c1945c0a1..d0617a4695d 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -312,8 +312,8 @@ VAR1 (TERNOP, vtbx3, v8qi)
VAR1 (TERNOP, vtbx4, v8qi)
VAR12 (LOAD1, vld1,
v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf, v2di)
-VAR10 (LOAD1LANE, vld1_lane,
- v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR12 (LOAD1LANE, vld1_lane,
+ v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di, v4bf, v8bf)
VAR10 (LOAD1, vld1_dup,
v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
VAR12 (STORE1, vst1,
diff --git a/gcc/testsuite/gcc.target/arm/simd/vld1_lane_bf16_1.c b/gcc/testsuite/gcc.target/arm/simd/vld1_lane_bf16_1.c
new file mode 100644
index 00000000000..94fb38f32b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vld1_lane_bf16_1.c
@@ -0,0 +1,22 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-require-effective-target arm_hard_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-O3 --save-temps -mfloat-abi=hard" } */
+
+#include "arm_neon.h"
+
+bfloat16x4_t
+test_vld1_lane_bf16 (bfloat16_t *a, bfloat16x4_t b)
+{
+  return vld1_lane_bf16 (a, b, 1);
+}
+
+bfloat16x8_t
+test_vld1q_lane_bf16 (bfloat16_t *a, bfloat16x8_t b)
+{
+  return vld1q_lane_bf16 (a, b, 2);
+}
+
+/* { dg-final { scan-assembler "vld1.16\t{d0\\\[1\\\]}, \\\[r0\\\]" } } */
+/* { dg-final { scan-assembler "vld1.16\t{d0\\\[2\\\]}, \\\[r0\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vld1_lane_bf16_indices_1.c b/gcc/testsuite/gcc.target/arm/simd/vld1_lane_bf16_indices_1.c
new file mode 100644
index 00000000000..d9af512cf92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vld1_lane_bf16_indices_1.c
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-require-effective-target arm_hard_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-mfloat-abi=hard" } */
+
+#include "arm_neon.h"
+
+bfloat16x4_t
+test_vld1_lane_bf16 (bfloat16_t *a, bfloat16x4_t b)
+{
+  bfloat16x4_t res;
+  res = vld1_lane_bf16 (a, b, -1);
+  res = vld1_lane_bf16 (a, b, 4);
+  return res;
+}
+
+/* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 0 } */
+/* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vld1q_lane_bf16_indices_1.c b/gcc/testsuite/gcc.target/arm/simd/vld1q_lane_bf16_indices_1.c
new file mode 100644
index 00000000000..a73184c0f78
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vld1q_lane_bf16_indices_1.c
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-require-effective-target arm_hard_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-mfloat-abi=hard" } */
+
+#include "arm_neon.h"
+
+bfloat16x8_t
+test_vld1q_lane_bf16 (bfloat16_t *a, bfloat16x8_t b)
+{
+  bfloat16x8_t res;
+  res = vld1q_lane_bf16 (a, b, -1);
+  res = vld1q_lane_bf16 (a, b, 8);
+  return res;
+}
+
+/* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */
+/* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */
--
2.20.1
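
The two *_indices_1.c tests in the patch expect diagnostics of the form "lane N out of range 0 - MAX", with MAX being 3 for the 4-lane type and 7 for the 8-lane type. The following is a small run-time sketch of that range rule, assuming the valid lanes are exactly 0 through nlanes - 1. `demo_check_lane` is a hypothetical helper and not GCC's implementation; in the compiler the check happens at compile time on a constant argument.

```c
#include <stdio.h>

/* Hypothetical helper: return 1 if LANE is valid for a vector of NLANES
   elements.  Otherwise write a message in the style of the dg-error
   patterns from the tests into BUF (at most BUFLEN bytes) and return 0.  */
static int
demo_check_lane (int lane, int nlanes, char *buf, size_t buflen)
{
  if (lane >= 0 && lane < nlanes)
    return 1;
  snprintf (buf, buflen, "lane %d out of range 0 - %d", lane, nlanes - 1);
  return 0;
}
```

Calling it with lane 4 and nlanes 4 produces "lane 4 out of range 0 - 3", and lane -1 with nlanes 8 produces "lane -1 out of range 0 - 7", matching the strings the tests look for.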
* Re: [PATCH][GCC10][1/6] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
2020-12-03 17:07 [PATCH][GCC10][1/6] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics Andrea Corallo
@ 2020-12-10 9:36 ` Andrea Corallo
2020-12-11 11:11 ` Kyrylo Tkachov
1 sibling, 0 replies; 4+ messages in thread
From: Andrea Corallo @ 2020-12-10 9:36 UTC
To: Andrea Corallo via Gcc-patches; +Cc: nd, Richard Earnshaw, Kyrylo Tkachov
Andrea Corallo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi all,
>
> This is the first patch of a series that backports a number of bfloat16 intrinsics from
> trunk to gcc-10.
>
> These patches include the test fixes that we have applied
> to master.
>
> Please refer to:
> ACLE <https://developer.arm.com/docs/101028/latest>
> ISA <https://developer.arm.com/docs/ddi0596/latest>
>
> The series has been bootstrapped on arm-linux-gnueabihf and regtested.
>
> Okay for gcc-10?
>
> Thanks
>
> Andrea
Pinging this and the rest of the series.
Thanks
Andrea
* RE: [PATCH][GCC10][1/6] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
2020-12-03 17:07 [PATCH][GCC10][1/6] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics Andrea Corallo
2020-12-10 9:36 ` Andrea Corallo
@ 2020-12-11 11:11 ` Kyrylo Tkachov
2020-12-11 14:10 ` Andrea Corallo
1 sibling, 1 reply; 4+ messages in thread
From: Kyrylo Tkachov @ 2020-12-11 11:11 UTC
To: Andrea Corallo, gcc-patches; +Cc: Richard Earnshaw, nd
> -----Original Message-----
> From: Andrea Corallo <Andrea.Corallo@arm.com>
> Sent: 03 December 2020 17:08
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; nd <nd@arm.com>
> Subject: [PATCH][GCC10][1/6] arm: Add vld1_lane_bf16 + vldq_lane_bf16
> intrinsics
>
> Hi all,
>
> This is the first patch of a series that backports a number of bfloat16 intrinsics from
> trunk to gcc-10.
>
> These patches include the test fixes that we have applied
> to master.
>
> Please refer to:
> ACLE <https://developer.arm.com/docs/101028/latest>
> ISA <https://developer.arm.com/docs/ddi0596/latest>
>
> The series has been bootstrapped on arm-linux-gnueabihf and regtested.
>
> Okay for gcc-10?
Ok.
Thanks,
Kyrill
>
> Thanks
>
> Andrea
* Re: [PATCH][GCC10][1/6] arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
2020-12-11 11:11 ` Kyrylo Tkachov
@ 2020-12-11 14:10 ` Andrea Corallo
0 siblings, 0 replies; 4+ messages in thread
From: Andrea Corallo @ 2020-12-11 14:10 UTC
To: Kyrylo Tkachov; +Cc: gcc-patches, Richard Earnshaw, nd
Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> writes:
[...]
> Ok.
> Thanks,
> Kyrill
Hi Kyrill,
I've installed the series into releases/gcc-10 as follows:
00be3a70dd8 arm: Add vstN_lane_bf16 + vstNq_lane_bf16 intrisics
69191da4f4f arm: Add vldN_lane_bf16 + vldNq_lane_bf16 intrisics
f09b8cc616a arm: Add vst1_bf16 + vst1q_bf16 intrinsics
caee9e676a5 arm: Add vld1_bf16 + vld1q_bf16 intrinsics
e875b07405f arm: Add vst1_lane_bf16 + vstq_lane_bf16 intrinsics
00b3e8408ab arm: Add vld1_lane_bf16 + vldq_lane_bf16 intrinsics
Thanks!
Andrea