Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
To: Delia Burduv <delia.burduv@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: "nickc@redhat.com" <nickc@redhat.com>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>,
	Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
Subject: Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
Date: Fri, 06 Mar 2020 10:45:00 -0000	[thread overview]
Message-ID: <42e0b20e-313a-5dba-e81c-d7cd3bb552c4@foss.arm.com> (raw)
In-Reply-To: <03e394d8-9d16-ce0f-e478-e708b35bc3e1@arm.com>

Hi Delia,

On 3/5/20 4:38 PM, Delia Burduv wrote:
> Hi,
>
> This is the latest version of the patch. I am forcing -mfloat-abi=hard 
> because the code generated is slightly differently depending on the 
> float-abi used.


Thanks, I've pushed it with an updated ChangeLog.

2020-03-06  Delia Burduv  <delia.burduv@arm.com>

     * config/arm/arm_neon.h (vld2_bf16): New.
     (vld2q_bf16): New.
     (vld3_bf16): New.
     (vld3q_bf16): New.
     (vld4_bf16): New.
     (vld4q_bf16): New.
     (vld2_dup_bf16): New.
     (vld2q_dup_bf16): New.
     (vld3_dup_bf16): New.
     (vld3q_dup_bf16): New.
     (vld4_dup_bf16): New.
     (vld4q_dup_bf16): New.
     * config/arm/arm_neon_builtins.def
     (vld2): Changed to VAR13 and added v4bf, v8bf
     (vld2_dup): Changed to VAR8 and added v4bf, v8bf
     (vld3): Changed to VAR13 and added v4bf, v8bf
     (vld3_dup): Changed to VAR8 and added v4bf, v8bf
     (vld4): Changed to VAR13 and added v4bf, v8bf
     (vld4_dup): Changed to VAR8 and added v4bf, v8bf
     * config/arm/iterators.md (VDXBF2): New iterator.
     *config/arm/neon.md (neon_vld2): Use new iterators.
     (neon_vld2_dup<mode): Use new iterators.
     (neon_vld3<mode>): Likewise.
     (neon_vld3qa<mode>): Likewise.
     (neon_vld3qb<mode>): Likewise.
     (neon_vld3_dup<mode>): Likewise.
     (neon_vld4<mode>): Likewise.
     (neon_vld4qa<mode>): Likewise.
     (neon_vld4qb<mode>): Likewise.
     (neon_vld4_dup<mode>): Likewise.
     (neon_vld2_dupv8bf): New.
     (neon_vld3_dupv8bf): Likewise.
     (neon_vld4_dupv8bf): Likewise.

Kyrill


>
> Thanks,
> Delia
>
> On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
>> Hi Delia,
>>
>> On 3/4/20 2:05 PM, Delia Burduv wrote:
>>> Hi,
>>>
>>> The previous version of this patch shared part of its code with the
>>> store intrinsics patch
>>> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
>>> any duplicated code. This patch now depends on the previously mentioned
>>> store intrinsics patch.
>>>
>>> Here is the latest version and the updated ChangeLog.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-03-04  Delia Burduv  <delia.burduv@arm.com>
>>>
>>>         * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>          (vld2_bf16): New.
>>>         (vld2q_bf16): New.
>>>         (vld3_bf16): New.
>>>         (vld3q_bf16): New.
>>>         (vld4_bf16): New.
>>>         (vld4q_bf16): New.
>>>         (vld2_dup_bf16): New.
>>>         (vld2q_dup_bf16): New.
>>>          (vld3_dup_bf16): New.
>>>         (vld3q_dup_bf16): New.
>>>         (vld4_dup_bf16): New.
>>>         (vld4q_dup_bf16): New.
>>>          * config/arm/arm_neon_builtins.def
>>>          (vld2): Changed to VAR13 and added v4bf, v8bf
>>>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>>          (vld3): Changed to VAR13 and added v4bf, v8bf
>>>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>>          (vld4): Changed to VAR13 and added v4bf, v8bf
>>>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>>          * config/arm/iterators.md (VDXBF): New iterator.
>>>          (VQ2BF): New iterator.
>>>          *config/arm/neon.md (vld2): Used new iterators.
>>>          (vld2_dup<mode>): Used new iterators.
>>>          (vld2_dupv8bf): New.
>>>          (vst3): Used new iterators.
>>>          (vst3qa): Used new iterators.
>>>          (vst3qb): Used new iterators.
>>>          (vld3_dup<mode>): Used new iterators.
>>>          (vld3_dupv8bf): New.
>>>          (vst4): Used new iterators.
>>>          (vst4qa): Used new iterators.
>>>          (vst4qb): Used new iterators.
>>>          (vld4_dup<mode>): Used new iterators.
>>>          (vld4_dupv8bf): New.
>>>
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-03-04  Delia Burduv  <delia.burduv@arm.com>
>>>
>>>         * gcc.target/arm/simd/bf16_vldn_1.c: New test.
>>>
>>> Thanks,
>>> Delia
>>>
>>> On 2/19/20 5:25 PM, Delia Burduv wrote:
>>> >
>>> > Hi,
>>> >
>>> > Here is the latest version of the patch. It just has some minor
>>> > formatting changes that were brought up by Richard Sandiford in the
>>> > AArch64 patches
>>> >
>>> > Thanks,
>>> > Delia
>>> >
>>> > On 1/22/20 5:31 PM, Delia Burduv wrote:
>>> >> Ping.
>>> >>
>>> >> I will change the tests to use the exact input and output 
>>> registers as
>>> >> Richard Sandiford suggested for the AArch64 patches.
>>> >>
>>> >> On 12/20/19 6:48 PM, Delia Burduv wrote:
>>> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>>> >>> vld<n>{q}_bf16 as part of the BFloat16 extension.
>>> >>> 
>>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 
>>>
>>> >>>
>>> >>> The intrinsics are declared in arm_neon.h .
>>> >>> A new test is added to check assembler output.
>>> >>>
>>> >>> This patch depends on the Arm back-end patche.
>>> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>> >>>
>>> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>> >>> have commit rights, so if this is ok can someone please commit 
>>> it for
>>> >>> me?
>>> >>>
>>> >>> gcc/ChangeLog:
>>> >>>
>>> >>> 2019-11-14  Delia Burduv <delia.burduv@arm.com>
>>> >>>
>>> >>>      * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>> >>>          (bfloat16x4x2_t): New typedef.
>>> >>>          (bfloat16x8x2_t): New typedef.
>>> >>>          (bfloat16x4x3_t): New typedef.
>>> >>>          (bfloat16x8x3_t): New typedef.
>>> >>>          (bfloat16x4x4_t): New typedef.
>>> >>>          (bfloat16x8x4_t): New typedef.
>>> >>>          (vld2_bf16): New.
>>> >>>      (vld2q_bf16): New.
>>> >>>      (vld3_bf16): New.
>>> >>>      (vld3q_bf16): New.
>>> >>>      (vld4_bf16): New.
>>> >>>      (vld4q_bf16): New.
>>> >>>      (vld2_dup_bf16): New.
>>> >>>      (vld2q_dup_bf16): New.
>>> >>>       (vld3_dup_bf16): New.
>>> >>>      (vld3q_dup_bf16): New.
>>> >>>      (vld4_dup_bf16): New.
>>> >>>      (vld4q_dup_bf16): New.
>>> >>>          * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>> >>>          (VAR13): New.
>>> >>>          (arm_simd_types[Bfloat16x2_t]):New type.
>>> >>>          * config/arm/arm-modes.def (V2BF): New mode.
>>> >>>          * config/arm/arm-simd-builtin-types.def
>>> >>>          (Bfloat16x2_t): New entry.
>>> >>>          * config/arm/arm_neon_builtins.def
>>> >>>          (vld2): Changed to VAR13 and added v4bf, v8bf
>>> >>>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>> >>>          (vld3): Changed to VAR13 and added v4bf, v8bf
>>> >>>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>> >>>          (vld4): Changed to VAR13 and added v4bf, v8bf
>>> >>>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>> >>>          * config/arm/iterators.md (VDXBF): New iterator.
>>> >>>          (VQ2BF): New iterator.
>>> >>>          (V_elem): Added V4BF, V8BF.
>>> >>>          (V_sz_elem): Added V4BF, V8BF.
>>> >>>          (V_mode_nunits): Added V4BF, V8BF.
>>> >>>          (q): Added V4BF, V8BF.
>>> >>>          *config/arm/neon.md (vld2): Used new iterators.
>>> >>>          (vld2_dup<mode>): Used new iterators.
>>> >>>          (vld2_dupv8bf): New.
>>> >>>          (vst3): Used new iterators.
>>> >>>          (vst3qa): Used new iterators.
>>> >>>          (vst3qb): Used new iterators.
>>> >>>          (vld3_dup<mode>): Used new iterators.
>>> >>>          (vld3_dupv8bf): New.
>>> >>>          (vst4): Used new iterators.
>>> >>>          (vst4qa): Used new iterators.
>>> >>>          (vst4qb): Used new iterators.
>>> >>>          (vld4_dup<mode>): Used new iterators.
>>> >>>          (vld4_dupv8bf): New.
>>> >>>
>>> >>>
>>> >>> gcc/testsuite/ChangeLog:
>>> >>>
>>> >>> 2019-11-14  Delia Burduv <delia.burduv@arm.com>
>>> >>>
>>> >>>      * gcc.target/arm/simd/bf16_vldn_1.c: New test.
>>
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c 
>> b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
>> new file mode 100644
>> index 
>> 0000000000000000000000000000000000000000..7ff8b600827e5c2e313ce40d14382aa641b4bb31 
>>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
>> @@ -0,0 +1,152 @@
>> +/* { dg-do assemble } */
>> +/* { dg-options "-save-temps" }  */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>>
>>
>> I think this should include an optimisation option like -O2 because...
>>
>>   +
>> +#include "arm_neon.h"
>> +
>> +
>> +/*
>> +**test_vld2_bf16:
>> +**    ...
>> +**    vld2.16    {d16-d17}, \[r3\]
>>
>> ... this is unstable codegen depending on the -O0 register allocator 
>> moving the ptr argument to r3 from its initial r0.
>> This should really be r0 and the load instruction should load the low 
>> D regs.
>> So let's add an -O2 to the dg-options and scan for the result of that.
>>
>>
>> Otherwise this is ok.
>> Thanks!
>> Kyrill
>>
>>
>>   +**    ...
>> +*/
>> +bfloat16x4x2_t
>> +test_vld2_bf16 (bfloat16_t * ptr)
>> +{
>> +  vld2_bf16 (ptr);
>> +}
>> +
>>

next prev parent reply	other threads:[~2020-03-06 10:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-20 19:04 Delia Burduv
2020-01-22 18:20 ` Delia Burduv
2020-01-28 17:18   ` Delia Burduv
2020-02-19 17:26   ` Delia Burduv
2020-03-04 14:05     ` Delia Burduv
2020-03-04 17:21       ` Kyrill Tkachov
2020-03-05 16:39         ` Delia Burduv
2020-03-06 10:45           ` Kyrill Tkachov [this message]
2020-03-09 10:18             ` Christophe Lyon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=42e0b20e-313a-5dba-e81c-d7cd3bb552c4@foss.arm.com \
    --to=kyrylo.tkachov@foss.arm.com \
    --cc=Ramana.Radhakrishnan@arm.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=delia.burduv@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nickc@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).