From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-520643-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 109630 invoked by alias); 4 Mar 2020 17:21:04 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 109586 invoked by uid 89); 4 Mar 2020 17:21:04 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-20.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_NUMSUBJECT,KAM_SHORT autolearn=ham version=3.3.1 spammy=H*f:sk:11518e9, H*MI:sk:11518e9, H*i:sk:11518e9
X-HELO: foss.arm.com
Received: from foss.arm.com (HELO foss.arm.com) (217.140.110.172) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 04 Mar 2020 17:21:01 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6163E31B;	Wed,  4 Mar 2020 09:21:00 -0800 (PST)
Received: from [10.2.80.62] (e120808-lin.cambridge.arm.com [10.2.80.62])	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id AE38F3F6CF;	Wed,  4 Mar 2020 09:20:59 -0800 (PST)
Subject: Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
To: Delia Burduv <Delia.Burduv@arm.com>, "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: "nickc@redhat.com" <nickc@redhat.com>, Richard Earnshaw <Richard.Earnshaw@arm.com>, Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
References: <03fd9393-a25d-c1fb-535b-c4f39ea7decb@arm.com> <64238216-3612-f947-e2b0-407cb5110d9a@arm.com> <47885cba-033e-5222-eece-cd86f1adf11f@arm.com> <11518e9b-0b13-c1f5-1dea-eac88cdc464d@arm.com>
From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Message-ID: <4b4edee7-e9e8-d12b-8f88-6c6be52e02a6@foss.arm.com>
Date: Wed, 04 Mar 2020 17:21:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.1
MIME-Version: 1.0
In-Reply-To: <11518e9b-0b13-c1f5-1dea-eac88cdc464d@arm.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-SW-Source: 2020-03/txt/msg00232.txt

Hi Delia,

On 3/4/20 2:05 PM, Delia Burduv wrote:
> Hi,
>
> The previous version of this patch shared part of its code with the
> store intrinsics patch
> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
> any duplicated code. This patch now depends on the previously mentioned
> store intrinsics patch.
>
> Here is the latest version and the updated ChangeLog.
>
> gcc/ChangeLog:
>
> 2019-03-04  Delia Burduv  <delia.burduv@arm.com>
>
>         * config/arm/arm_neon.h (bfloat16_t): New typedef.
>          (vld2_bf16): New.
>         (vld2q_bf16): New.
>         (vld3_bf16): New.
>         (vld3q_bf16): New.
>         (vld4_bf16): New.
>         (vld4q_bf16): New.
>         (vld2_dup_bf16): New.
>         (vld2q_dup_bf16): New.
>          (vld3_dup_bf16): New.
>         (vld3q_dup_bf16): New.
>         (vld4_dup_bf16): New.
>         (vld4q_dup_bf16): New.
>          * config/arm/arm_neon_builtins.def
>          (vld2): Changed to VAR13 and added v4bf, v8bf
>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>          (vld3): Changed to VAR13 and added v4bf, v8bf
>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>          (vld4): Changed to VAR13 and added v4bf, v8bf
>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>          * config/arm/iterators.md (VDXBF): New iterator.
>          (VQ2BF): New iterator.
>          *config/arm/neon.md (vld2): Used new iterators.
>          (vld2_dup<mode>): Used new iterators.
>          (vld2_dupv8bf): New.
>          (vst3): Used new iterators.
>          (vst3qa): Used new iterators.
>          (vst3qb): Used new iterators.
>          (vld3_dup<mode>): Used new iterators.
>          (vld3_dupv8bf): New.
>          (vst4): Used new iterators.
>          (vst4qa): Used new iterators.
>          (vst4qb): Used new iterators.
>          (vld4_dup<mode>): Used new iterators.
>          (vld4_dupv8bf): New.
>
>
> gcc/testsuite/ChangeLog:
>
> 2019-03-04  Delia Burduv  <delia.burduv@arm.com>
>
>         * gcc.target/arm/simd/bf16_vldn_1.c: New test.
>
> Thanks,
> Delia
>
> On 2/19/20 5:25 PM, Delia Burduv wrote:
> >
> > Hi,
> >
> > Here is the latest version of the patch. It just has some minor
> > formatting changes that were brought up by Richard Sandiford in the
> > AArch64 patches
> >
> > Thanks,
> > Delia
> >
> > On 1/22/20 5:31 PM, Delia Burduv wrote:
> >> Ping.
> >>
> >> I will change the tests to use the exact input and output registers as
> >> Richard Sandiford suggested for the AArch64 patches.
> >>
> >> On 12/20/19 6:48 PM, Delia Burduv wrote:
> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
> >>> vld<n>{q}_bf16 as part of the BFloat16 extension.
> >>> 
> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 
>
> >>>
> >>> The intrinsics are declared in arm_neon.h .
> >>> A new test is added to check assembler output.
> >>>
> >>> This patch depends on the Arm back-end patche.
> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
> >>>
> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
> >>> have commit rights, so if this is ok can someone please commit it for
> >>> me?
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2019-11-14  Delia Burduv <delia.burduv@arm.com>
> >>>
> >>>      * config/arm/arm_neon.h (bfloat16_t): New typedef.
> >>>          (bfloat16x4x2_t): New typedef.
> >>>          (bfloat16x8x2_t): New typedef.
> >>>          (bfloat16x4x3_t): New typedef.
> >>>          (bfloat16x8x3_t): New typedef.
> >>>          (bfloat16x4x4_t): New typedef.
> >>>          (bfloat16x8x4_t): New typedef.
> >>>          (vld2_bf16): New.
> >>>      (vld2q_bf16): New.
> >>>      (vld3_bf16): New.
> >>>      (vld3q_bf16): New.
> >>>      (vld4_bf16): New.
> >>>      (vld4q_bf16): New.
> >>>      (vld2_dup_bf16): New.
> >>>      (vld2q_dup_bf16): New.
> >>>       (vld3_dup_bf16): New.
> >>>      (vld3q_dup_bf16): New.
> >>>      (vld4_dup_bf16): New.
> >>>      (vld4q_dup_bf16): New.
> >>>          * config/arm/arm-builtins.c (E_V2BFmode): New mode.
> >>>          (VAR13): New.
> >>>          (arm_simd_types[Bfloat16x2_t]):New type.
> >>>          * config/arm/arm-modes.def (V2BF): New mode.
> >>>          * config/arm/arm-simd-builtin-types.def
> >>>          (Bfloat16x2_t): New entry.
> >>>          * config/arm/arm_neon_builtins.def
> >>>          (vld2): Changed to VAR13 and added v4bf, v8bf
> >>>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf
> >>>          (vld3): Changed to VAR13 and added v4bf, v8bf
> >>>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf
> >>>          (vld4): Changed to VAR13 and added v4bf, v8bf
> >>>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf
> >>>          * config/arm/iterators.md (VDXBF): New iterator.
> >>>          (VQ2BF): New iterator.
> >>>          (V_elem): Added V4BF, V8BF.
> >>>          (V_sz_elem): Added V4BF, V8BF.
> >>>          (V_mode_nunits): Added V4BF, V8BF.
> >>>          (q): Added V4BF, V8BF.
> >>>          *config/arm/neon.md (vld2): Used new iterators.
> >>>          (vld2_dup<mode>): Used new iterators.
> >>>          (vld2_dupv8bf): New.
> >>>          (vst3): Used new iterators.
> >>>          (vst3qa): Used new iterators.
> >>>          (vst3qb): Used new iterators.
> >>>          (vld3_dup<mode>): Used new iterators.
> >>>          (vld3_dupv8bf): New.
> >>>          (vst4): Used new iterators.
> >>>          (vst4qa): Used new iterators.
> >>>          (vst4qb): Used new iterators.
> >>>          (vld4_dup<mode>): Used new iterators.
> >>>          (vld4_dupv8bf): New.
> >>>
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> 2019-11-14  Delia Burduv <delia.burduv@arm.com>
> >>>
> >>>      * gcc.target/arm/simd/bf16_vldn_1.c: New test.
diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..7ff8b600827e5c2e313ce40d14382aa641b4bb31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
@@ -0,0 +1,152 @@
+/* { dg-do assemble } */
+/* { dg-options "-save-temps" }  */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-final { check-function-bodies "**" "" } } */


I think this should include an optimisation option like -O2 because...

  +
+#include "arm_neon.h"
+
+
+/*
+**test_vld2_bf16:
+**	...
+**	vld2.16	{d16-d17}, \[r3\]

... this is unstable codegen depending on the -O0 register allocator moving the ptr argument to r3 from its initial r0.
This should really be r0 and the load instruction should load the low D regs.
So let's add an -O2 to the dg-options and scan for the result of that.


Otherwise this is ok.
Thanks!
Kyrill


  +**	...
+*/
+bfloat16x4x2_t
+test_vld2_bf16 (bfloat16_t * ptr)
+{
+  vld2_bf16 (ptr);
+}
+