From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <christophe.lyon@linaro.org>
Received: from mail-ot1-x343.google.com (mail-ot1-x343.google.com
 [IPv6:2607:f8b0:4864:20::343])
 by server2.sourceware.org (Postfix) with ESMTPS id 268323944400
 for <gcc-patches@gcc.gnu.org>; Mon,  9 Mar 2020 10:18:36 +0000 (GMT)
Received: by mail-ot1-x343.google.com with SMTP id j14so8997182otq.3
 for <gcc-patches@gcc.gnu.org>; Mon, 09 Mar 2020 03:18:36 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=IPLm+01gpxpqRby5dCol/ShR/lUkL1PA1hjzfQLc8EE=;
 b=ccneKiJlApSSZsR235+dw+gDjc25CyDSid/XTeHWDz71yPCjecqnhQuG7rmwZP+3XP
 mnhTlOr0kOkFg3ySWkMKryP02xc6TZibrUdWOX2hvCje/Elw+j4/LrfOiEtpMHWwi+bd
 3FBK6NStvy7eC+3nNIKoIrOrMZANdnZ5dc4cO+TapXYFonRI4VeEGU/O+VN4pTV3fo1j
 6/KxO3Giid6V5/SJN+GYvbeVVCanLHCJyGOlVkNHGgV/l5W73MEI2SMFuEbEsBelwV3x
 v6vSfAvoJMUg9E5YbM9pK8bqw1T90pvUGNCuUtbD3GM2jqXeagJI6rI9j4cbQOrzDHTP
 Kaqw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=IPLm+01gpxpqRby5dCol/ShR/lUkL1PA1hjzfQLc8EE=;
 b=ryvB+qtoGsUryXj71YDkqI3n4V+bwALv2wN+vTSK0/ECVYFtCzDMOuJh0V8IlmeUWU
 qFp+w2bP+u201AWoDV48kKjNL7EyWEx3mtRbtbZ+O8xVWN1rThxjkHIrkUf6NwfOk6TE
 E+vmzi2SNLK1QIxZlPIwNUDiOUOgjfJ57VR1GDO4wrMpJfwkA5ShCAkvVuqLzK8lsmkO
 NSTY7/mByLHgJGbDVPSnj8LeVeTnMJnqhtQ6+6gXsRoqdEG4xq9j9pf7doVqAF8OuQRi
 vufHajFcdRJ2rd1i4KJshsTsoDTLLp5zO4UYXY/0q6WwTRalbPZON9ZMP0BjKrQWBnwO
 s/hQ==
X-Gm-Message-State: ANhLgQ2+HLFFsGzQ9XZhjVff/tAvWZDJK1gUDhoHWwA8i3b/w7gICU9g
 jLN2dUWOFuaa5q80GLiJzyn5Vi9CQ9wLZAttNX841A==
X-Google-Smtp-Source: ADFU+vufwgvNntkqtopTUIOkjsIfGguXQlni/7ZbYD6FQLcxjuUm61QmgoYOYNtnBOg5R/Pk1RqMZgb4mVbjXMf3zfQ=
X-Received: by 2002:a9d:518e:: with SMTP id y14mr12553773otg.273.1583749115163; 
 Mon, 09 Mar 2020 03:18:35 -0700 (PDT)
MIME-Version: 1.0
References: <03fd9393-a25d-c1fb-535b-c4f39ea7decb@arm.com>
 <64238216-3612-f947-e2b0-407cb5110d9a@arm.com>
 <47885cba-033e-5222-eece-cd86f1adf11f@arm.com>
 <11518e9b-0b13-c1f5-1dea-eac88cdc464d@arm.com>
 <4b4edee7-e9e8-d12b-8f88-6c6be52e02a6@foss.arm.com>
 <03e394d8-9d16-ce0f-e478-e708b35bc3e1@arm.com>
 <42e0b20e-313a-5dba-e81c-d7cd3bb552c4@foss.arm.com>
In-Reply-To: <42e0b20e-313a-5dba-e81c-d7cd3bb552c4@foss.arm.com>
From: Christophe Lyon <christophe.lyon@linaro.org>
Date: Mon, 9 Mar 2020 11:18:24 +0100
Message-ID: <CAKdteObKeLE=jgMMTAMho7sAh75qmQCpQtDuomP8fOzB-fdsEA@mail.gmail.com>
Subject: Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Cc: Delia Burduv <delia.burduv@arm.com>, 
 "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
 "nickc@redhat.com" <nickc@redhat.com>, 
 Richard Earnshaw <Richard.Earnshaw@arm.com>, 
 Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-27.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, GIT_PATCH_1,
 GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,
 SPF_PASS autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <http://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <http://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <http://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 09 Mar 2020 10:18:37 -0000

On Fri, 6 Mar 2020 at 11:46, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>
> Hi Delia,
>
> On 3/5/20 4:38 PM, Delia Burduv wrote:
> > Hi,
> >
> > This is the latest version of the patch. I am forcing -mfloat-abi=hard
> > because the code generated is slightly differently depending on the
> > float-abi used.
>
>
> Thanks, I've pushed it with an updated ChangeLog.
>
> 2020-03-06  Delia Burduv  <delia.burduv@arm.com>
>
>      * config/arm/arm_neon.h (vld2_bf16): New.
>      (vld2q_bf16): New.
>      (vld3_bf16): New.
>      (vld3q_bf16): New.
>      (vld4_bf16): New.
>      (vld4q_bf16): New.
>      (vld2_dup_bf16): New.
>      (vld2q_dup_bf16): New.
>      (vld3_dup_bf16): New.
>      (vld3q_dup_bf16): New.
>      (vld4_dup_bf16): New.
>      (vld4q_dup_bf16): New.
>      * config/arm/arm_neon_builtins.def
>      (vld2): Changed to VAR13 and added v4bf, v8bf
>      (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>      (vld3): Changed to VAR13 and added v4bf, v8bf
>      (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>      (vld4): Changed to VAR13 and added v4bf, v8bf
>      (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>      * config/arm/iterators.md (VDXBF2): New iterator.
>      *config/arm/neon.md (neon_vld2): Use new iterators.
>      (neon_vld2_dup<mode): Use new iterators.
>      (neon_vld3<mode>): Likewise.
>      (neon_vld3qa<mode>): Likewise.
>      (neon_vld3qb<mode>): Likewise.
>      (neon_vld3_dup<mode>): Likewise.
>      (neon_vld4<mode>): Likewise.
>      (neon_vld4qa<mode>): Likewise.
>      (neon_vld4qb<mode>): Likewise.
>      (neon_vld4_dup<mode>): Likewise.
>      (neon_vld2_dupv8bf): New.
>      (neon_vld3_dupv8bf): Likewise.
>      (neon_vld4_dupv8bf): Likewise.
>
> Kyrill

Hi!

There's a problem with the arm_neon.h update.
on arm-none-linux-gnueabihf, there is a regression on
g++.dg/other/pr54300.C and g++.dg/other/pr55073.C, because:
FAIL: g++.dg/other/pr54300.C  -std=gnu++98 (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19565:39:
error: cannot convert 'const short int*' to 'const __bf16*'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19574:39:
error: cannot convert 'const short int*' to 'const __bf16*'
[....]

The same problem makes a lot (~365) of tests become unsupported on
arm-none-linux-gnueabi:
g++.dg/abi/mangle-arm-crypto.C
g++.dg/abi/mangle-neon.C

Can you fix it?

Thanks

Christophe

>
>
> >
> > Thanks,
> > Delia
> >
> > On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
> >> Hi Delia,
> >>
> >> On 3/4/20 2:05 PM, Delia Burduv wrote:
> >>> Hi,
> >>>
> >>> The previous version of this patch shared part of its code with the
> >>> store intrinsics patch
> >>> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
> >>> any duplicated code. This patch now depends on the previously mentioned
> >>> store intrinsics patch.
> >>>
> >>> Here is the latest version and the updated ChangeLog.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2019-03-04  Delia Burduv  <delia.burduv@arm.com>
> >>>
> >>>         * config/arm/arm_neon.h (bfloat16_t): New typedef.
> >>>          (vld2_bf16): New.
> >>>         (vld2q_bf16): New.
> >>>         (vld3_bf16): New.
> >>>         (vld3q_bf16): New.
> >>>         (vld4_bf16): New.
> >>>         (vld4q_bf16): New.
> >>>         (vld2_dup_bf16): New.
> >>>         (vld2q_dup_bf16): New.
> >>>          (vld3_dup_bf16): New.
> >>>         (vld3q_dup_bf16): New.
> >>>         (vld4_dup_bf16): New.
> >>>         (vld4q_dup_bf16): New.
> >>>          * config/arm/arm_neon_builtins.def
> >>>          (vld2): Changed to VAR13 and added v4bf, v8bf
> >>>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf
> >>>          (vld3): Changed to VAR13 and added v4bf, v8bf
> >>>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf
> >>>          (vld4): Changed to VAR13 and added v4bf, v8bf
> >>>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf
> >>>          * config/arm/iterators.md (VDXBF): New iterator.
> >>>          (VQ2BF): New iterator.
> >>>          *config/arm/neon.md (vld2): Used new iterators.
> >>>          (vld2_dup<mode>): Used new iterators.
> >>>          (vld2_dupv8bf): New.
> >>>          (vst3): Used new iterators.
> >>>          (vst3qa): Used new iterators.
> >>>          (vst3qb): Used new iterators.
> >>>          (vld3_dup<mode>): Used new iterators.
> >>>          (vld3_dupv8bf): New.
> >>>          (vst4): Used new iterators.
> >>>          (vst4qa): Used new iterators.
> >>>          (vst4qb): Used new iterators.
> >>>          (vld4_dup<mode>): Used new iterators.
> >>>          (vld4_dupv8bf): New.
> >>>
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> 2019-03-04  Delia Burduv  <delia.burduv@arm.com>
> >>>
> >>>         * gcc.target/arm/simd/bf16_vldn_1.c: New test.
> >>>
> >>> Thanks,
> >>> Delia
> >>>
> >>> On 2/19/20 5:25 PM, Delia Burduv wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > Here is the latest version of the patch. It just has some minor
> >>> > formatting changes that were brought up by Richard Sandiford in the
> >>> > AArch64 patches
> >>> >
> >>> > Thanks,
> >>> > Delia
> >>> >
> >>> > On 1/22/20 5:31 PM, Delia Burduv wrote:
> >>> >> Ping.
> >>> >>
> >>> >> I will change the tests to use the exact input and output
> >>> registers as
> >>> >> Richard Sandiford suggested for the AArch64 patches.
> >>> >>
> >>> >> On 12/20/19 6:48 PM, Delia Burduv wrote:
> >>> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
> >>> >>> vld<n>{q}_bf16 as part of the BFloat16 extension.
> >>> >>>
> >>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
> >>>
> >>> >>>
> >>> >>> The intrinsics are declared in arm_neon.h .
> >>> >>> A new test is added to check assembler output.
> >>> >>>
> >>> >>> This patch depends on the Arm back-end patche.
> >>> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
> >>> >>>
> >>> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
> >>> >>> have commit rights, so if this is ok can someone please commit
> >>> it for
> >>> >>> me?
> >>> >>>
> >>> >>> gcc/ChangeLog:
> >>> >>>
> >>> >>> 2019-11-14  Delia Burduv <delia.burduv@arm.com>
> >>> >>>
> >>> >>>      * config/arm/arm_neon.h (bfloat16_t): New typedef.
> >>> >>>          (bfloat16x4x2_t): New typedef.
> >>> >>>          (bfloat16x8x2_t): New typedef.
> >>> >>>          (bfloat16x4x3_t): New typedef.
> >>> >>>          (bfloat16x8x3_t): New typedef.
> >>> >>>          (bfloat16x4x4_t): New typedef.
> >>> >>>          (bfloat16x8x4_t): New typedef.
> >>> >>>          (vld2_bf16): New.
> >>> >>>      (vld2q_bf16): New.
> >>> >>>      (vld3_bf16): New.
> >>> >>>      (vld3q_bf16): New.
> >>> >>>      (vld4_bf16): New.
> >>> >>>      (vld4q_bf16): New.
> >>> >>>      (vld2_dup_bf16): New.
> >>> >>>      (vld2q_dup_bf16): New.
> >>> >>>       (vld3_dup_bf16): New.
> >>> >>>      (vld3q_dup_bf16): New.
> >>> >>>      (vld4_dup_bf16): New.
> >>> >>>      (vld4q_dup_bf16): New.
> >>> >>>          * config/arm/arm-builtins.c (E_V2BFmode): New mode.
> >>> >>>          (VAR13): New.
> >>> >>>          (arm_simd_types[Bfloat16x2_t]):New type.
> >>> >>>          * config/arm/arm-modes.def (V2BF): New mode.
> >>> >>>          * config/arm/arm-simd-builtin-types.def
> >>> >>>          (Bfloat16x2_t): New entry.
> >>> >>>          * config/arm/arm_neon_builtins.def
> >>> >>>          (vld2): Changed to VAR13 and added v4bf, v8bf
> >>> >>>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf
> >>> >>>          (vld3): Changed to VAR13 and added v4bf, v8bf
> >>> >>>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf
> >>> >>>          (vld4): Changed to VAR13 and added v4bf, v8bf
> >>> >>>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf
> >>> >>>          * config/arm/iterators.md (VDXBF): New iterator.
> >>> >>>          (VQ2BF): New iterator.
> >>> >>>          (V_elem): Added V4BF, V8BF.
> >>> >>>          (V_sz_elem): Added V4BF, V8BF.
> >>> >>>          (V_mode_nunits): Added V4BF, V8BF.
> >>> >>>          (q): Added V4BF, V8BF.
> >>> >>>          *config/arm/neon.md (vld2): Used new iterators.
> >>> >>>          (vld2_dup<mode>): Used new iterators.
> >>> >>>          (vld2_dupv8bf): New.
> >>> >>>          (vst3): Used new iterators.
> >>> >>>          (vst3qa): Used new iterators.
> >>> >>>          (vst3qb): Used new iterators.
> >>> >>>          (vld3_dup<mode>): Used new iterators.
> >>> >>>          (vld3_dupv8bf): New.
> >>> >>>          (vst4): Used new iterators.
> >>> >>>          (vst4qa): Used new iterators.
> >>> >>>          (vst4qb): Used new iterators.
> >>> >>>          (vld4_dup<mode>): Used new iterators.
> >>> >>>          (vld4_dupv8bf): New.
> >>> >>>
> >>> >>>
> >>> >>> gcc/testsuite/ChangeLog:
> >>> >>>
> >>> >>> 2019-11-14  Delia Burduv <delia.burduv@arm.com>
> >>> >>>
> >>> >>>      * gcc.target/arm/simd/bf16_vldn_1.c: New test.
> >>
> >>
> >> diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
> >> b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
> >> new file mode 100644
> >> index
> >> 0000000000000000000000000000000000000000..7ff8b600827e5c2e313ce40d14382aa641b4bb31
> >>
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
> >> @@ -0,0 +1,152 @@
> >> +/* { dg-do assemble } */
> >> +/* { dg-options "-save-temps" }  */
> >> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> >> +/* { dg-add-options arm_v8_2a_bf16_neon } */
> >> +/* { dg-final { check-function-bodies "**" "" } } */
> >>
> >>
> >> I think this should include an optimisation option like -O2 because...
> >>
> >>   +
> >> +#include "arm_neon.h"
> >> +
> >> +
> >> +/*
> >> +**test_vld2_bf16:
> >> +**    ...
> >> +**    vld2.16    {d16-d17}, \[r3\]
> >>
> >> ... this is unstable codegen depending on the -O0 register allocator
> >> moving the ptr argument to r3 from its initial r0.
> >> This should really be r0 and the load instruction should load the low
> >> D regs.
> >> So let's add an -O2 to the dg-options and scan for the result of that.
> >>
> >>
> >> Otherwise this is ok.
> >> Thanks!
> >> Kyrill
> >>
> >>
> >>   +**    ...
> >> +*/
> >> +bfloat16x4x2_t
> >> +test_vld2_bf16 (bfloat16_t * ptr)
> >> +{
> >> +  vld2_bf16 (ptr);
> >> +}
> >> +
> >>