From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-x343.google.com (mail-ot1-x343.google.com [IPv6:2607:f8b0:4864:20::343]) by server2.sourceware.org (Postfix) with ESMTPS id 268323944400 for ; Mon, 9 Mar 2020 10:18:36 +0000 (GMT) Received: by mail-ot1-x343.google.com with SMTP id j14so8997182otq.3 for ; Mon, 09 Mar 2020 03:18:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=IPLm+01gpxpqRby5dCol/ShR/lUkL1PA1hjzfQLc8EE=; b=ccneKiJlApSSZsR235+dw+gDjc25CyDSid/XTeHWDz71yPCjecqnhQuG7rmwZP+3XP mnhTlOr0kOkFg3ySWkMKryP02xc6TZibrUdWOX2hvCje/Elw+j4/LrfOiEtpMHWwi+bd 3FBK6NStvy7eC+3nNIKoIrOrMZANdnZ5dc4cO+TapXYFonRI4VeEGU/O+VN4pTV3fo1j 6/KxO3Giid6V5/SJN+GYvbeVVCanLHCJyGOlVkNHGgV/l5W73MEI2SMFuEbEsBelwV3x v6vSfAvoJMUg9E5YbM9pK8bqw1T90pvUGNCuUtbD3GM2jqXeagJI6rI9j4cbQOrzDHTP Kaqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=IPLm+01gpxpqRby5dCol/ShR/lUkL1PA1hjzfQLc8EE=; b=ryvB+qtoGsUryXj71YDkqI3n4V+bwALv2wN+vTSK0/ECVYFtCzDMOuJh0V8IlmeUWU qFp+w2bP+u201AWoDV48kKjNL7EyWEx3mtRbtbZ+O8xVWN1rThxjkHIrkUf6NwfOk6TE E+vmzi2SNLK1QIxZlPIwNUDiOUOgjfJ57VR1GDO4wrMpJfwkA5ShCAkvVuqLzK8lsmkO NSTY7/mByLHgJGbDVPSnj8LeVeTnMJnqhtQ6+6gXsRoqdEG4xq9j9pf7doVqAF8OuQRi vufHajFcdRJ2rd1i4KJshsTsoDTLLp5zO4UYXY/0q6WwTRalbPZON9ZMP0BjKrQWBnwO s/hQ== X-Gm-Message-State: ANhLgQ2+HLFFsGzQ9XZhjVff/tAvWZDJK1gUDhoHWwA8i3b/w7gICU9g jLN2dUWOFuaa5q80GLiJzyn5Vi9CQ9wLZAttNX841A== X-Google-Smtp-Source: ADFU+vufwgvNntkqtopTUIOkjsIfGguXQlni/7ZbYD6FQLcxjuUm61QmgoYOYNtnBOg5R/Pk1RqMZgb4mVbjXMf3zfQ= X-Received: by 2002:a9d:518e:: with SMTP id y14mr12553773otg.273.1583749115163; Mon, 09 Mar 2020 03:18:35 -0700 (PDT) MIME-Version: 1.0 References: <03fd9393-a25d-c1fb-535b-c4f39ea7decb@arm.com> <64238216-3612-f947-e2b0-407cb5110d9a@arm.com> <47885cba-033e-5222-eece-cd86f1adf11f@arm.com> <11518e9b-0b13-c1f5-1dea-eac88cdc464d@arm.com> <4b4edee7-e9e8-d12b-8f88-6c6be52e02a6@foss.arm.com> <03e394d8-9d16-ce0f-e478-e708b35bc3e1@arm.com> <42e0b20e-313a-5dba-e81c-d7cd3bb552c4@foss.arm.com> In-Reply-To: <42e0b20e-313a-5dba-e81c-d7cd3bb552c4@foss.arm.com> From: Christophe Lyon Date: Mon, 9 Mar 2020 11:18:24 +0100 Message-ID: Subject: Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32 To: Kyrill Tkachov Cc: Delia Burduv , "gcc-patches@gcc.gnu.org" , "nickc@redhat.com" , Richard Earnshaw , Ramana Radhakrishnan Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-27.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Mar 2020 10:18:37 -0000 On Fri, 6 Mar 2020 at 11:46, Kyrill Tkachov wrote: > > Hi Delia, > > On 3/5/20 4:38 PM, Delia Burduv wrote: > > Hi, > > > > This is the latest version of the patch. I am forcing -mfloat-abi=hard > > because the code generated is slightly differently depending on the > > float-abi used. > > > Thanks, I've pushed it with an updated ChangeLog. > > 2020-03-06 Delia Burduv > > * config/arm/arm_neon.h (vld2_bf16): New. > (vld2q_bf16): New. > (vld3_bf16): New. > (vld3q_bf16): New. > (vld4_bf16): New. > (vld4q_bf16): New. > (vld2_dup_bf16): New. > (vld2q_dup_bf16): New. > (vld3_dup_bf16): New. > (vld3q_dup_bf16): New. > (vld4_dup_bf16): New. > (vld4q_dup_bf16): New. > * config/arm/arm_neon_builtins.def > (vld2): Changed to VAR13 and added v4bf, v8bf > (vld2_dup): Changed to VAR8 and added v4bf, v8bf > (vld3): Changed to VAR13 and added v4bf, v8bf > (vld3_dup): Changed to VAR8 and added v4bf, v8bf > (vld4): Changed to VAR13 and added v4bf, v8bf > (vld4_dup): Changed to VAR8 and added v4bf, v8bf > * config/arm/iterators.md (VDXBF2): New iterator. > *config/arm/neon.md (neon_vld2): Use new iterators. > (neon_vld2_dup (neon_vld3): Likewise. > (neon_vld3qa): Likewise. > (neon_vld3qb): Likewise. > (neon_vld3_dup): Likewise. > (neon_vld4): Likewise. > (neon_vld4qa): Likewise. > (neon_vld4qb): Likewise. > (neon_vld4_dup): Likewise. > (neon_vld2_dupv8bf): New. > (neon_vld3_dupv8bf): Likewise. > (neon_vld4_dupv8bf): Likewise. > > Kyrill Hi! There's a problem with the arm_neon.h update. on arm-none-linux-gnueabihf, there is a regression on g++.dg/other/pr54300.C and g++.dg/other/pr55073.C, because: FAIL: g++.dg/other/pr54300.C -std=gnu++98 (test for excess errors) Excess errors: /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19565:39: error: cannot convert 'const short int*' to 'const __bf16*' /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19574:39: error: cannot convert 'const short int*' to 'const __bf16*' [....] The same problem makes a lot (~365) of tests become unsupported on arm-none-linux-gnueabi: g++.dg/abi/mangle-arm-crypto.C g++.dg/abi/mangle-neon.C Can you fix it? Thanks Christophe > > > > > > Thanks, > > Delia > > > > On 3/4/20 5:20 PM, Kyrill Tkachov wrote: > >> Hi Delia, > >> > >> On 3/4/20 2:05 PM, Delia Burduv wrote: > >>> Hi, > >>> > >>> The previous version of this patch shared part of its code with the > >>> store intrinsics patch > >>> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed > >>> any duplicated code. This patch now depends on the previously mentioned > >>> store intrinsics patch. > >>> > >>> Here is the latest version and the updated ChangeLog. > >>> > >>> gcc/ChangeLog: > >>> > >>> 2019-03-04 Delia Burduv > >>> > >>> * config/arm/arm_neon.h (bfloat16_t): New typedef. > >>> (vld2_bf16): New. > >>> (vld2q_bf16): New. > >>> (vld3_bf16): New. > >>> (vld3q_bf16): New. > >>> (vld4_bf16): New. > >>> (vld4q_bf16): New. > >>> (vld2_dup_bf16): New. > >>> (vld2q_dup_bf16): New. > >>> (vld3_dup_bf16): New. > >>> (vld3q_dup_bf16): New. > >>> (vld4_dup_bf16): New. > >>> (vld4q_dup_bf16): New. > >>> * config/arm/arm_neon_builtins.def > >>> (vld2): Changed to VAR13 and added v4bf, v8bf > >>> (vld2_dup): Changed to VAR8 and added v4bf, v8bf > >>> (vld3): Changed to VAR13 and added v4bf, v8bf > >>> (vld3_dup): Changed to VAR8 and added v4bf, v8bf > >>> (vld4): Changed to VAR13 and added v4bf, v8bf > >>> (vld4_dup): Changed to VAR8 and added v4bf, v8bf > >>> * config/arm/iterators.md (VDXBF): New iterator. > >>> (VQ2BF): New iterator. > >>> *config/arm/neon.md (vld2): Used new iterators. > >>> (vld2_dup): Used new iterators. > >>> (vld2_dupv8bf): New. > >>> (vst3): Used new iterators. > >>> (vst3qa): Used new iterators. > >>> (vst3qb): Used new iterators. > >>> (vld3_dup): Used new iterators. > >>> (vld3_dupv8bf): New. > >>> (vst4): Used new iterators. > >>> (vst4qa): Used new iterators. > >>> (vst4qb): Used new iterators. > >>> (vld4_dup): Used new iterators. > >>> (vld4_dupv8bf): New. > >>> > >>> > >>> gcc/testsuite/ChangeLog: > >>> > >>> 2019-03-04 Delia Burduv > >>> > >>> * gcc.target/arm/simd/bf16_vldn_1.c: New test. > >>> > >>> Thanks, > >>> Delia > >>> > >>> On 2/19/20 5:25 PM, Delia Burduv wrote: > >>> > > >>> > Hi, > >>> > > >>> > Here is the latest version of the patch. It just has some minor > >>> > formatting changes that were brought up by Richard Sandiford in the > >>> > AArch64 patches > >>> > > >>> > Thanks, > >>> > Delia > >>> > > >>> > On 1/22/20 5:31 PM, Delia Burduv wrote: > >>> >> Ping. > >>> >> > >>> >> I will change the tests to use the exact input and output > >>> registers as > >>> >> Richard Sandiford suggested for the AArch64 patches. > >>> >> > >>> >> On 12/20/19 6:48 PM, Delia Burduv wrote: > >>> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics > >>> >>> vld{q}_bf16 as part of the BFloat16 extension. > >>> >>> > >>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) > >>> > >>> >>> > >>> >>> The intrinsics are declared in arm_neon.h . > >>> >>> A new test is added to check assembler output. > >>> >>> > >>> >>> This patch depends on the Arm back-end patche. > >>> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) > >>> >>> > >>> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't > >>> >>> have commit rights, so if this is ok can someone please commit > >>> it for > >>> >>> me? > >>> >>> > >>> >>> gcc/ChangeLog: > >>> >>> > >>> >>> 2019-11-14 Delia Burduv > >>> >>> > >>> >>> * config/arm/arm_neon.h (bfloat16_t): New typedef. > >>> >>> (bfloat16x4x2_t): New typedef. > >>> >>> (bfloat16x8x2_t): New typedef. > >>> >>> (bfloat16x4x3_t): New typedef. > >>> >>> (bfloat16x8x3_t): New typedef. > >>> >>> (bfloat16x4x4_t): New typedef. > >>> >>> (bfloat16x8x4_t): New typedef. > >>> >>> (vld2_bf16): New. > >>> >>> (vld2q_bf16): New. > >>> >>> (vld3_bf16): New. > >>> >>> (vld3q_bf16): New. > >>> >>> (vld4_bf16): New. > >>> >>> (vld4q_bf16): New. > >>> >>> (vld2_dup_bf16): New. > >>> >>> (vld2q_dup_bf16): New. > >>> >>> (vld3_dup_bf16): New. > >>> >>> (vld3q_dup_bf16): New. > >>> >>> (vld4_dup_bf16): New. > >>> >>> (vld4q_dup_bf16): New. > >>> >>> * config/arm/arm-builtins.c (E_V2BFmode): New mode. > >>> >>> (VAR13): New. > >>> >>> (arm_simd_types[Bfloat16x2_t]):New type. > >>> >>> * config/arm/arm-modes.def (V2BF): New mode. > >>> >>> * config/arm/arm-simd-builtin-types.def > >>> >>> (Bfloat16x2_t): New entry. > >>> >>> * config/arm/arm_neon_builtins.def > >>> >>> (vld2): Changed to VAR13 and added v4bf, v8bf > >>> >>> (vld2_dup): Changed to VAR8 and added v4bf, v8bf > >>> >>> (vld3): Changed to VAR13 and added v4bf, v8bf > >>> >>> (vld3_dup): Changed to VAR8 and added v4bf, v8bf > >>> >>> (vld4): Changed to VAR13 and added v4bf, v8bf > >>> >>> (vld4_dup): Changed to VAR8 and added v4bf, v8bf > >>> >>> * config/arm/iterators.md (VDXBF): New iterator. > >>> >>> (VQ2BF): New iterator. > >>> >>> (V_elem): Added V4BF, V8BF. > >>> >>> (V_sz_elem): Added V4BF, V8BF. > >>> >>> (V_mode_nunits): Added V4BF, V8BF. > >>> >>> (q): Added V4BF, V8BF. > >>> >>> *config/arm/neon.md (vld2): Used new iterators. > >>> >>> (vld2_dup): Used new iterators. > >>> >>> (vld2_dupv8bf): New. > >>> >>> (vst3): Used new iterators. > >>> >>> (vst3qa): Used new iterators. > >>> >>> (vst3qb): Used new iterators. > >>> >>> (vld3_dup): Used new iterators. > >>> >>> (vld3_dupv8bf): New. > >>> >>> (vst4): Used new iterators. > >>> >>> (vst4qa): Used new iterators. > >>> >>> (vst4qb): Used new iterators. > >>> >>> (vld4_dup): Used new iterators. > >>> >>> (vld4_dupv8bf): New. > >>> >>> > >>> >>> > >>> >>> gcc/testsuite/ChangeLog: > >>> >>> > >>> >>> 2019-11-14 Delia Burduv > >>> >>> > >>> >>> * gcc.target/arm/simd/bf16_vldn_1.c: New test. > >> > >> > >> diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c > >> b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c > >> new file mode 100644 > >> index > >> 0000000000000000000000000000000000000000..7ff8b600827e5c2e313ce40d14382aa641b4bb31 > >> > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c > >> @@ -0,0 +1,152 @@ > >> +/* { dg-do assemble } */ > >> +/* { dg-options "-save-temps" } */ > >> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ > >> +/* { dg-add-options arm_v8_2a_bf16_neon } */ > >> +/* { dg-final { check-function-bodies "**" "" } } */ > >> > >> > >> I think this should include an optimisation option like -O2 because... > >> > >> + > >> +#include "arm_neon.h" > >> + > >> + > >> +/* > >> +**test_vld2_bf16: > >> +** ... > >> +** vld2.16 {d16-d17}, \[r3\] > >> > >> ... this is unstable codegen depending on the -O0 register allocator > >> moving the ptr argument to r3 from its initial r0. > >> This should really be r0 and the load instruction should load the low > >> D regs. > >> So let's add an -O2 to the dg-options and scan for the result of that. > >> > >> > >> Otherwise this is ok. > >> Thanks! > >> Kyrill > >> > >> > >> +** ... > >> +*/ > >> +bfloat16x4x2_t > >> +test_vld2_bf16 (bfloat16_t * ptr) > >> +{ > >> + vld2_bf16 (ptr); > >> +} > >> + > >>