Subject: Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
To: Delia Burduv, "gcc-patches@gcc.gnu.org"
Cc: "nickc@redhat.com", Richard Earnshaw, Ramana Radhakrishnan
From: Kyrill Tkachov
Date: Fri, 06 Mar 2020 10:45:00 -0000
Message-ID: <42e0b20e-313a-5dba-e81c-d7cd3bb552c4@foss.arm.com>
In-Reply-To: <03e394d8-9d16-ce0f-e478-e708b35bc3e1@arm.com>
References: <03fd9393-a25d-c1fb-535b-c4f39ea7decb@arm.com>
 <64238216-3612-f947-e2b0-407cb5110d9a@arm.com>
 <47885cba-033e-5222-eece-cd86f1adf11f@arm.com>
 <11518e9b-0b13-c1f5-1dea-eac88cdc464d@arm.com>
 <4b4edee7-e9e8-d12b-8f88-6c6be52e02a6@foss.arm.com>
 <03e394d8-9d16-ce0f-e478-e708b35bc3e1@arm.com>
X-SW-Source: 2020-03/txt/msg00359.txt

Hi Delia,

On 3/5/20 4:38 PM, Delia Burduv wrote:
> Hi,
>
> This is the latest version of the patch. I am forcing -mfloat-abi=hard
> because the code generated is slightly different depending on the
> float-abi used.

Thanks, I've pushed it with an updated ChangeLog.

2020-03-06  Delia Burduv

    * config/arm/arm_neon.h (vld2_bf16): New.
    (vld2q_bf16): New.
    (vld3_bf16): New.
    (vld3q_bf16): New.
    (vld4_bf16): New.
    (vld4q_bf16): New.
    (vld2_dup_bf16): New.
    (vld2q_dup_bf16): New.
    (vld3_dup_bf16): New.
    (vld3q_dup_bf16): New.
    (vld4_dup_bf16): New.
    (vld4q_dup_bf16): New.
    * config/arm/arm_neon_builtins.def
    (vld2): Changed to VAR13 and added v4bf, v8bf
    (vld2_dup): Changed to VAR8 and added v4bf, v8bf
    (vld3): Changed to VAR13 and added v4bf, v8bf
    (vld3_dup): Changed to VAR8 and added v4bf, v8bf
    (vld4): Changed to VAR13 and added v4bf, v8bf
    (vld4_dup): Changed to VAR8 and added v4bf, v8bf
    * config/arm/iterators.md (VDXBF2): New iterator.
    * config/arm/neon.md (neon_vld2): Use new iterators.
    (neon_vld2_dup): Likewise.
    (neon_vld3qa): Likewise.
    (neon_vld3qb): Likewise.
    (neon_vld3_dup): Likewise.
    (neon_vld4): Likewise.
    (neon_vld4qa): Likewise.
    (neon_vld4qb): Likewise.
    (neon_vld4_dup): Likewise.
    (neon_vld2_dupv8bf): New.
    (neon_vld3_dupv8bf): Likewise.
    (neon_vld4_dupv8bf): Likewise.

Kyrill

>
> Thanks,
> Delia
>
> On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
>> Hi Delia,
>>
>> On 3/4/20 2:05 PM, Delia Burduv wrote:
>>> Hi,
>>>
>>> The previous version of this patch shared part of its code with the
>>> store intrinsics patch
>>> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
>>> any duplicated code. This patch now depends on the previously mentioned
>>> store intrinsics patch.
>>>
>>> Here is the latest version and the updated ChangeLog.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-03-04  Delia Burduv
>>>
>>>         * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>         (vld2_bf16): New.
>>>         (vld2q_bf16): New.
>>>         (vld3_bf16): New.
>>>         (vld3q_bf16): New.
>>>         (vld4_bf16): New.
>>>         (vld4q_bf16): New.
>>>         (vld2_dup_bf16): New.
>>>         (vld2q_dup_bf16): New.
>>>         (vld3_dup_bf16): New.
>>>         (vld3q_dup_bf16): New.
>>>         (vld4_dup_bf16): New.
>>>         (vld4q_dup_bf16): New.
>>>         * config/arm/arm_neon_builtins.def
>>>         (vld2): Changed to VAR13 and added v4bf, v8bf
>>>         (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>>         (vld3): Changed to VAR13 and added v4bf, v8bf
>>>         (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>>         (vld4): Changed to VAR13 and added v4bf, v8bf
>>>         (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>>         * config/arm/iterators.md (VDXBF): New iterator.
>>>         (VQ2BF): New iterator.
>>>         * config/arm/neon.md (vld2): Used new iterators.
>>>         (vld2_dup): Used new iterators.
>>>         (vld2_dupv8bf): New.
>>>         (vst3): Used new iterators.
>>>         (vst3qa): Used new iterators.
>>>         (vst3qb): Used new iterators.
>>>         (vld3_dup): Used new iterators.
>>>         (vld3_dupv8bf): New.
>>>         (vst4): Used new iterators.
>>>         (vst4qa): Used new iterators.
>>>         (vst4qb): Used new iterators.
>>>         (vld4_dup): Used new iterators.
>>>         (vld4_dupv8bf): New.
>>>
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-03-04  Delia Burduv
>>>
>>>         * gcc.target/arm/simd/bf16_vldn_1.c: New test.
>>>
>>> Thanks,
>>> Delia
>>>
>>> On 2/19/20 5:25 PM, Delia Burduv wrote:
>>> >
>>> > Hi,
>>> >
>>> > Here is the latest version of the patch.
>>> > It just has some minor
>>> > formatting changes that were brought up by Richard Sandiford in the
>>> > AArch64 patches.
>>> >
>>> > Thanks,
>>> > Delia
>>> >
>>> > On 1/22/20 5:31 PM, Delia Burduv wrote:
>>> >> Ping.
>>> >>
>>> >> I will change the tests to use the exact input and output registers as
>>> >> Richard Sandiford suggested for the AArch64 patches.
>>> >>
>>> >> On 12/20/19 6:48 PM, Delia Burduv wrote:
>>> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>>> >>> vld{q}_bf16 as part of the BFloat16 extension.
>>> >>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>> >>>
>>> >>> The intrinsics are declared in arm_neon.h.
>>> >>> A new test is added to check assembler output.
>>> >>>
>>> >>> This patch depends on the Arm back-end patch.
>>> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>> >>>
>>> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>> >>> have commit rights, so if this is ok can someone please commit it for
>>> >>> me?
>>> >>>
>>> >>> gcc/ChangeLog:
>>> >>>
>>> >>> 2019-11-14  Delia Burduv
>>> >>>
>>> >>>         * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>> >>>         (bfloat16x4x2_t): New typedef.
>>> >>>         (bfloat16x8x2_t): New typedef.
>>> >>>         (bfloat16x4x3_t): New typedef.
>>> >>>         (bfloat16x8x3_t): New typedef.
>>> >>>         (bfloat16x4x4_t): New typedef.
>>> >>>         (bfloat16x8x4_t): New typedef.
>>> >>>         (vld2_bf16): New.
>>> >>>         (vld2q_bf16): New.
>>> >>>         (vld3_bf16): New.
>>> >>>         (vld3q_bf16): New.
>>> >>>         (vld4_bf16): New.
>>> >>>         (vld4q_bf16): New.
>>> >>>         (vld2_dup_bf16): New.
>>> >>>         (vld2q_dup_bf16): New.
>>> >>>         (vld3_dup_bf16): New.
>>> >>>         (vld3q_dup_bf16): New.
>>> >>>         (vld4_dup_bf16): New.
>>> >>>         (vld4q_dup_bf16): New.
>>> >>>         * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>> >>>         (VAR13): New.
>>> >>>         (arm_simd_types[Bfloat16x2_t]): New type.
>>> >>>         * config/arm/arm-modes.def (V2BF): New mode.
>>> >>>         * config/arm/arm-simd-builtin-types.def
>>> >>>         (Bfloat16x2_t): New entry.
>>> >>>         * config/arm/arm_neon_builtins.def
>>> >>>         (vld2): Changed to VAR13 and added v4bf, v8bf
>>> >>>         (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>> >>>         (vld3): Changed to VAR13 and added v4bf, v8bf
>>> >>>         (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>> >>>         (vld4): Changed to VAR13 and added v4bf, v8bf
>>> >>>         (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>> >>>         * config/arm/iterators.md (VDXBF): New iterator.
>>> >>>         (VQ2BF): New iterator.
>>> >>>         (V_elem): Added V4BF, V8BF.
>>> >>>         (V_sz_elem): Added V4BF, V8BF.
>>> >>>         (V_mode_nunits): Added V4BF, V8BF.
>>> >>>         (q): Added V4BF, V8BF.
>>> >>>         * config/arm/neon.md (vld2): Used new iterators.
>>> >>>         (vld2_dup): Used new iterators.
>>> >>>         (vld2_dupv8bf): New.
>>> >>>         (vst3): Used new iterators.
>>> >>>         (vst3qa): Used new iterators.
>>> >>>         (vst3qb): Used new iterators.
>>> >>>         (vld3_dup): Used new iterators.
>>> >>>         (vld3_dupv8bf): New.
>>> >>>         (vst4): Used new iterators.
>>> >>>         (vst4qa): Used new iterators.
>>> >>>         (vst4qb): Used new iterators.
>>> >>>         (vld4_dup): Used new iterators.
>>> >>>         (vld4_dupv8bf): New.
>>> >>>
>>> >>>
>>> >>> gcc/testsuite/ChangeLog:
>>> >>>
>>> >>> 2019-11-14  Delia Burduv
>>> >>>
>>> >>>         * gcc.target/arm/simd/bf16_vldn_1.c: New test.
>>
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..7ff8b600827e5c2e313ce40d14382aa641b4bb31
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
>> @@ -0,0 +1,152 @@
>> +/* { dg-do assemble } */
>> +/* { dg-options "-save-temps" }  */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>>
>>
>> I think this should include an optimisation option like -O2 because...
>>
>> +
>> +#include "arm_neon.h"
>> +
>> +
>> +/*
>> +**test_vld2_bf16:
>> +**    ...
>> +**    vld2.16    {d16-d17}, \[r3\]
>>
>> ... this is unstable codegen depending on the -O0 register allocator
>> moving the ptr argument to r3 from its initial r0.
>> This should really be r0 and the load instruction should load the low
>> D regs.
>> So let's add an -O2 to the dg-options and scan for the result of that.
>>
>>
>> Otherwise this is ok.
>> Thanks!
>> Kyrill
>>
>>
>> +**    ...
>> +*/
>> +bfloat16x4x2_t
>> +test_vld2_bf16 (bfloat16_t * ptr)
>> +{
>> +  vld2_bf16 (ptr);
>> +}
>> +
>>
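
For readers of the archive who want a picture of what the vld2 and vld2_dup style loads discussed in this thread return, here is a minimal sketch in portable C. It is not code from the patch and does not use arm_neon.h. The names u16x4x2_model, model_vld2_u16 and model_vld2_dup_u16 are hypothetical and exist only for illustration, and bfloat16 elements are represented as raw uint16_t bit patterns. It assumes the usual Neon behaviour: a vld2 load de-interleaves consecutive 2-element structures into two vectors, while a vld2_dup load reads one structure and replicates each element across all lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a two-vector, four-lane result such as
   bfloat16x4x2_t.  Each element is a raw 16-bit pattern.  */
typedef struct
{
  uint16_t val[2][4];
} u16x4x2_model;

/* Model of a vld2-style load: read 8 consecutive 16-bit elements and
   de-interleave them, even-indexed elements into val[0] and
   odd-indexed elements into val[1].  */
u16x4x2_model
model_vld2_u16 (const uint16_t *ptr)
{
  u16x4x2_model r;
  assert (ptr != NULL);
  for (size_t lane = 0; lane < 4; lane++)
    {
      r.val[0][lane] = ptr[2 * lane];
      r.val[1][lane] = ptr[2 * lane + 1];
    }
  return r;
}

/* Model of a vld2_dup-style load: read a single 2-element structure
   and replicate each element across every lane of its vector.  */
u16x4x2_model
model_vld2_dup_u16 (const uint16_t *ptr)
{
  u16x4x2_model r;
  assert (ptr != NULL);
  for (size_t lane = 0; lane < 4; lane++)
    {
      r.val[0][lane] = ptr[0];
      r.val[1][lane] = ptr[1];
    }
  return r;
}
```

Under these assumptions, calling model_vld2_u16 on the buffer {10, 20, 11, 21, 12, 22, 13, 23} gives val[0] = {10, 11, 12, 13} and val[1] = {20, 21, 22, 23}, while model_vld2_dup_u16 on the same buffer fills val[0] with 10 and val[1] with 20. The vld3 and vld4 variants would follow the same pattern with three and four vectors.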