Hi, This is the latest version of the patch. I am forcing -mfloat-abi=hard because the code generated is slightly differently depending on the float-abi used. Thanks, Delia On 3/4/20 5:20 PM, Kyrill Tkachov wrote: > Hi Delia, > > On 3/4/20 2:05 PM, Delia Burduv wrote: >> Hi, >> >> The previous version of this patch shared part of its code with the >> store intrinsics patch >> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed >> any duplicated code. This patch now depends on the previously mentioned >> store intrinsics patch. >> >> Here is the latest version and the updated ChangeLog. >> >> gcc/ChangeLog: >> >> 2019-03-04  Delia Burduv  >> >>         * config/arm/arm_neon.h (bfloat16_t): New typedef. >>          (vld2_bf16): New. >>         (vld2q_bf16): New. >>         (vld3_bf16): New. >>         (vld3q_bf16): New. >>         (vld4_bf16): New. >>         (vld4q_bf16): New. >>         (vld2_dup_bf16): New. >>         (vld2q_dup_bf16): New. >>          (vld3_dup_bf16): New. >>         (vld3q_dup_bf16): New. >>         (vld4_dup_bf16): New. >>         (vld4q_dup_bf16): New. >>          * config/arm/arm_neon_builtins.def >>          (vld2): Changed to VAR13 and added v4bf, v8bf >>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf >>          (vld3): Changed to VAR13 and added v4bf, v8bf >>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf >>          (vld4): Changed to VAR13 and added v4bf, v8bf >>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf >>          * config/arm/iterators.md (VDXBF): New iterator. >>          (VQ2BF): New iterator. >>          *config/arm/neon.md (vld2): Used new iterators. >>          (vld2_dup): Used new iterators. >>          (vld2_dupv8bf): New. >>          (vst3): Used new iterators. >>          (vst3qa): Used new iterators. >>          (vst3qb): Used new iterators. >>          (vld3_dup): Used new iterators. >>          (vld3_dupv8bf): New. >>          (vst4): Used new iterators. >>          (vst4qa): Used new iterators. >>          (vst4qb): Used new iterators. >>          (vld4_dup): Used new iterators. >>          (vld4_dupv8bf): New. >> >> >> gcc/testsuite/ChangeLog: >> >> 2019-03-04  Delia Burduv  >> >>         * gcc.target/arm/simd/bf16_vldn_1.c: New test. >> >> Thanks, >> Delia >> >> On 2/19/20 5:25 PM, Delia Burduv wrote: >> > >> > Hi, >> > >> > Here is the latest version of the patch. It just has some minor >> > formatting changes that were brought up by Richard Sandiford in the >> > AArch64 patches >> > >> > Thanks, >> > Delia >> > >> > On 1/22/20 5:31 PM, Delia Burduv wrote: >> >> Ping. >> >> >> >> I will change the tests to use the exact input and output registers as >> >> Richard Sandiford suggested for the AArch64 patches. >> >> >> >> On 12/20/19 6:48 PM, Delia Burduv wrote: >> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics >> >>> vld{q}_bf16 as part of the BFloat16 extension. >> >>> >> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) >> >> >>> >> >>> The intrinsics are declared in arm_neon.h . >> >>> A new test is added to check assembler output. >> >>> >> >>> This patch depends on the Arm back-end patche. >> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) >> >>> >> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't >> >>> have commit rights, so if this is ok can someone please commit it for >> >>> me? >> >>> >> >>> gcc/ChangeLog: >> >>> >> >>> 2019-11-14  Delia Burduv >> >>> >> >>>      * config/arm/arm_neon.h (bfloat16_t): New typedef. >> >>>          (bfloat16x4x2_t): New typedef. >> >>>          (bfloat16x8x2_t): New typedef. >> >>>          (bfloat16x4x3_t): New typedef. >> >>>          (bfloat16x8x3_t): New typedef. >> >>>          (bfloat16x4x4_t): New typedef. >> >>>          (bfloat16x8x4_t): New typedef. >> >>>          (vld2_bf16): New. >> >>>      (vld2q_bf16): New. >> >>>      (vld3_bf16): New. >> >>>      (vld3q_bf16): New. >> >>>      (vld4_bf16): New. >> >>>      (vld4q_bf16): New. >> >>>      (vld2_dup_bf16): New. >> >>>      (vld2q_dup_bf16): New. >> >>>       (vld3_dup_bf16): New. >> >>>      (vld3q_dup_bf16): New. >> >>>      (vld4_dup_bf16): New. >> >>>      (vld4q_dup_bf16): New. >> >>>          * config/arm/arm-builtins.c (E_V2BFmode): New mode. >> >>>          (VAR13): New. >> >>>          (arm_simd_types[Bfloat16x2_t]):New type. >> >>>          * config/arm/arm-modes.def (V2BF): New mode. >> >>>          * config/arm/arm-simd-builtin-types.def >> >>>          (Bfloat16x2_t): New entry. >> >>>          * config/arm/arm_neon_builtins.def >> >>>          (vld2): Changed to VAR13 and added v4bf, v8bf >> >>>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf >> >>>          (vld3): Changed to VAR13 and added v4bf, v8bf >> >>>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf >> >>>          (vld4): Changed to VAR13 and added v4bf, v8bf >> >>>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf >> >>>          * config/arm/iterators.md (VDXBF): New iterator. >> >>>          (VQ2BF): New iterator. >> >>>          (V_elem): Added V4BF, V8BF. >> >>>          (V_sz_elem): Added V4BF, V8BF. >> >>>          (V_mode_nunits): Added V4BF, V8BF. >> >>>          (q): Added V4BF, V8BF. >> >>>          *config/arm/neon.md (vld2): Used new iterators. >> >>>          (vld2_dup): Used new iterators. >> >>>          (vld2_dupv8bf): New. >> >>>          (vst3): Used new iterators. >> >>>          (vst3qa): Used new iterators. >> >>>          (vst3qb): Used new iterators. >> >>>          (vld3_dup): Used new iterators. >> >>>          (vld3_dupv8bf): New. >> >>>          (vst4): Used new iterators. >> >>>          (vst4qa): Used new iterators. >> >>>          (vst4qb): Used new iterators. >> >>>          (vld4_dup): Used new iterators. >> >>>          (vld4_dupv8bf): New. >> >>> >> >>> >> >>> gcc/testsuite/ChangeLog: >> >>> >> >>> 2019-11-14  Delia Burduv >> >>> >> >>>      * gcc.target/arm/simd/bf16_vldn_1.c: New test. > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c > b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..7ff8b600827e5c2e313ce40d14382aa641b4bb31 > > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c > @@ -0,0 +1,152 @@ > +/* { dg-do assemble } */ > +/* { dg-options "-save-temps" }  */ > +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ > +/* { dg-add-options arm_v8_2a_bf16_neon } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > > > I think this should include an optimisation option like -O2 because... > >  + > +#include "arm_neon.h" > + > + > +/* > +**test_vld2_bf16: > +**    ... > +**    vld2.16    {d16-d17}, \[r3\] > > ... this is unstable codegen depending on the -O0 register allocator > moving the ptr argument to r3 from its initial r0. > This should really be r0 and the load instruction should load the low D > regs. > So let's add an -O2 to the dg-options and scan for the result of that. > > > Otherwise this is ok. > Thanks! > Kyrill > > >  +**    ... > +*/ > +bfloat16x4x2_t > +test_vld2_bf16 (bfloat16_t * ptr) > +{ > +  vld2_bf16 (ptr); > +} > + >