From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 17746 invoked by alias); 5 Mar 2020 16:39:14 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 17684 invoked by uid 89); 5 Mar 2020 16:39:09 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-23.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LOTSOFHASH,KAM_NUMSUBJECT,KAM_SHORT,MSGID_FROM_MTA_HEADER,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS,SPF_PASS,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 spammy=60607 X-HELO: EUR05-VI1-obe.outbound.protection.outlook.com Received: from mail-vi1eur05on2050.outbound.protection.outlook.com (HELO EUR05-VI1-obe.outbound.protection.outlook.com) (40.107.21.50) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 05 Mar 2020 16:39:05 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=OBoEiTPQaImfuErp/GjNGbF2AuHEXAvjufp2Unl5TIA=; b=QQ4iuZUPyaLnTjMqKyt4O4lHHLEI/3CpkrGSEqqM7Rp6xxNhFDgQFm43QEdYWJasnxGijqBWniPK2Qwd+GGPS5LlA7afsautGvm58NRKWz+r12uUtSIndKnFP/4JVpLRdTLHsVW2r6xq24WyAxZ+I7eR2p8Iry3Gmk6m/zX0EVc= Received: from AM0PR05CA0090.eurprd05.prod.outlook.com (2603:10a6:208:136::30) by HE1PR0802MB2588.eurprd08.prod.outlook.com (2603:10a6:3:e2::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2772.19; Thu, 5 Mar 2020 16:39:01 +0000 Received: from AM5EUR03FT043.eop-EUR03.prod.protection.outlook.com (2603:10a6:208:136:cafe::b6) by AM0PR05CA0090.outlook.office365.com (2603:10a6:208:136::30) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2772.19 via Frontend Transport; Thu, 5 Mar 2020 16:39:00 +0000 Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; gcc.gnu.org; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;gcc.gnu.org; dmarc=bestguesspass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM5EUR03FT043.mail.protection.outlook.com (10.152.17.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2793.11 via Frontend Transport; Thu, 5 Mar 2020 16:39:00 +0000 Received: ("Tessian outbound 846b976b3941:v42"); Thu, 05 Mar 2020 16:39:00 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: ad7c1f334cdc1b7e X-CR-MTA-TID: 64aa7808 Received: from 3cdbf6905595.3 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 8CD8E7B4-0F12-45FA-A0D8-3689E41704D9.1; Thu, 05 Mar 2020 16:38:54 +0000 Received: from EUR04-DB3-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 3cdbf6905595.3 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Thu, 05 Mar 2020 16:38:54 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=d83ZGp/fJYJauB7OFQiMzZUvyYLI/UftK+MdKy8rM8yV13ynPrDCDJYwskJx5u1OkK6+btMEOxZNJFuwEybbP/uqqu+3LJOk6E4hf435skvWXdpSPruZlKL06nd13HNN1fXFfYTDYTveJ0sNrQZ+8+YTAg4qOtAcTil5a+/EzlxJ473v+f9kecEQ/NEXslW17zcSLQuKnOhZ4MNBjbOPAmOwZtc0D/tjzT99aPqRxl0yR4ofPMp0U3Yjn5ZHHOWZN1UWrcaAmD9gRFFhZjHy16fC8N/pUtTfSC0n12wYjdef+2A9/5cZko9PCYIlbPb3z1uXWM09TcF948FsQZqS4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=OBoEiTPQaImfuErp/GjNGbF2AuHEXAvjufp2Unl5TIA=; b=d8FhFVBSEvjJaaC7J0a7kFQy0tCprGqliixxNivr7sEf0ffXcqnibFK3TIu92BIgnN5aVIDTEeDyICuTvLH4by9kWPbLxLZ+zwSKC27HVZvDFLFx+YIVPwzElUMJ68gOyEH2AU9EDkDktpupKkLD+YGnxKuINf9GjYApov4EBAGFzJtHpegoVp3Qt+HLzRmUakxDnPE2Wx7FTUWtyQrc3mm75gIlqfl8i8Ksq/+ZrZEDAKkILhMuHbaraN2kVx4KY6JO2YzFJ36UIDWa0UTIJpTV9INqT4ZLKdYxKc0AytDhMwW3bIfnV7xr4FcYOQC+dBTTi/JAvBq+RtvKb39vzw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=OBoEiTPQaImfuErp/GjNGbF2AuHEXAvjufp2Unl5TIA=; b=QQ4iuZUPyaLnTjMqKyt4O4lHHLEI/3CpkrGSEqqM7Rp6xxNhFDgQFm43QEdYWJasnxGijqBWniPK2Qwd+GGPS5LlA7afsautGvm58NRKWz+r12uUtSIndKnFP/4JVpLRdTLHsVW2r6xq24WyAxZ+I7eR2p8Iry3Gmk6m/zX0EVc= Authentication-Results-Original: spf=none (sender IP is ) smtp.mailfrom=Delia.Burduv@arm.com; Received: from VI1PR08MB4096.eurprd08.prod.outlook.com (20.178.126.87) by VI1PR08MB2767.eurprd08.prod.outlook.com (10.170.238.31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2772.18; Thu, 5 Mar 2020 16:38:53 +0000 Received: from VI1PR08MB4096.eurprd08.prod.outlook.com ([fe80::35ab:a1fa:adf3:4186]) by VI1PR08MB4096.eurprd08.prod.outlook.com ([fe80::35ab:a1fa:adf3:4186%6]) with mapi id 15.20.2750.027; Thu, 5 Mar 2020 16:38:52 +0000 Subject: Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32 To: Kyrill Tkachov , "gcc-patches@gcc.gnu.org" Cc: "nickc@redhat.com" , Richard Earnshaw , Ramana Radhakrishnan References: <03fd9393-a25d-c1fb-535b-c4f39ea7decb@arm.com> <64238216-3612-f947-e2b0-407cb5110d9a@arm.com> <47885cba-033e-5222-eece-cd86f1adf11f@arm.com> <11518e9b-0b13-c1f5-1dea-eac88cdc464d@arm.com> <4b4edee7-e9e8-d12b-8f88-6c6be52e02a6@foss.arm.com> From: Delia Burduv Message-ID: <03e394d8-9d16-ce0f-e478-e708b35bc3e1@arm.com> Date: Thu, 05 Mar 2020 16:39:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 In-Reply-To: <4b4edee7-e9e8-d12b-8f88-6c6be52e02a6@foss.arm.com> Content-Type: multipart/mixed; boundary="------------55DB96B3365C3C63E0C0F1AE" MIME-Version: 1.0 Received: from [10.2.80.82] (217.140.106.49) by LNXP265CA0058.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:5d::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2772.19 via Frontend Transport; Thu, 5 Mar 2020 16:38:52 +0000 X-MS-Exchange-Transport-Forked: True x-checkrecipientrouted: true X-MS-Oob-TLC-OOBClassifiers: OLM:4714;OLM:4714; X-Forefront-Antispam-Report-Untrusted: SFV:NSPM;SFS:(10009020)(4636009)(376002)(136003)(396003)(346002)(39860400002)(366004)(189003)(199004)(31686004)(81166006)(478600001)(66476007)(8676002)(33964004)(53546011)(81156014)(16576012)(8936002)(6486002)(4001150100001)(5660300002)(66556008)(66616009)(235185007)(54906003)(2906002)(66946007)(110136005)(52116002)(316002)(44832011)(31696002)(26005)(16526019)(4326008)(86362001)(36756003)(2616005)(186003)(956004);DIR:OUT;SFP:1101;SCL:1;SRVR:VI1PR08MB2767;H:VI1PR08MB4096.eurprd08.prod.outlook.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;MX:1;A:1; Received-SPF: None (protection.outlook.com: arm.com does not designate permitted sender hosts) X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: AzG8UHKhYPTfyyzTp7cUowpHoP7kfqRq6eAofo5cKM843peJcrt/Tu9coz+ZF5Obh8JzW0ZbNU2c/tqtN1gIsiiu3Qx+O8L/fydRzTu/pd6VzI9PhXYcu3vwmO3AVIOEkBYyyvJS+Ls+3kn3BuQDvqXA3Qa9Ehf0teoTzxap4XXnql8Uix1bsUUUQNvIIU/CMIUsqTjmzmg2QJFBkTItKuFXp1uI2USMSmGsMho+rEEpfwjoaAk2NlL+6WOfmqOuAs3JO1hU3hkuAX3pkVLVIFdyWr6XKBu+xHkKrs2oTp9y76lUnRR0ChaLGs7N+f79/n2r+LMXUq/GhNZf3DYLsVf68PhUWCEBP4yqskza+CMyTq5jkS7O16L6/+xxPUWKXrWQyJ8xxv5fZKM1m0BlySCkPQN6t8wkS31EiG1GhNpiS1Q/tmkof/V4yLYkF/TBq343E8y4Q6FRChfkH1dFO2fMoU0xzerXBK04fjbYYAZzOsRCZJoB7PdPsL/IJA/61BIi+g14KBvQQrd2LhLyfQ== X-MS-Exchange-AntiSpam-MessageData: ydXxN1PP3wo3mkJFcfi2yGxP0W8iIdsw/c8k6SCxiYndooZHLlZ/Xyph52NB/oBkb4IBebvZbbAu0d0mwFjThf60R27dv3hkzDSTpHCo2VAxPng3M6fBhEZl4YLVhj5YYUjb23YbhjVcCL2kZgajwQ== Original-Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=Delia.Burduv@arm.com; Return-Path: Delia.Burduv@arm.com X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM5EUR03FT043.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: 1e868312-3e7c-4988-4a2c-08d7c123b07a X-SW-Source: 2020-03/txt/msg00311.txt --------------55DB96B3365C3C63E0C0F1AE Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-length: 7391 Hi, This is the latest version of the patch. I am forcing -mfloat-abi=hard because the code generated is slightly differently depending on the float-abi used. Thanks, Delia On 3/4/20 5:20 PM, Kyrill Tkachov wrote: > Hi Delia, > > On 3/4/20 2:05 PM, Delia Burduv wrote: >> Hi, >> >> The previous version of this patch shared part of its code with the >> store intrinsics patch >> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed >> any duplicated code. This patch now depends on the previously mentioned >> store intrinsics patch. >> >> Here is the latest version and the updated ChangeLog. >> >> gcc/ChangeLog: >> >> 2019-03-04  Delia Burduv  >> >>         * config/arm/arm_neon.h (bfloat16_t): New typedef. >>          (vld2_bf16): New. >>         (vld2q_bf16): New. >>         (vld3_bf16): New. >>         (vld3q_bf16): New. >>         (vld4_bf16): New. >>         (vld4q_bf16): New. >>         (vld2_dup_bf16): New. >>         (vld2q_dup_bf16): New. >>          (vld3_dup_bf16): New. >>         (vld3q_dup_bf16): New. >>         (vld4_dup_bf16): New. >>         (vld4q_dup_bf16): New. >>          * config/arm/arm_neon_builtins.def >>          (vld2): Changed to VAR13 and added v4bf, v8bf >>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf >>          (vld3): Changed to VAR13 and added v4bf, v8bf >>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf >>          (vld4): Changed to VAR13 and added v4bf, v8bf >>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf >>          * config/arm/iterators.md (VDXBF): New iterator. >>          (VQ2BF): New iterator. >>          *config/arm/neon.md (vld2): Used new iterators. >>          (vld2_dup): Used new iterators. >>          (vld2_dupv8bf): New. >>          (vst3): Used new iterators. >>          (vst3qa): Used new iterators. >>          (vst3qb): Used new iterators. >>          (vld3_dup): Used new iterators. >>          (vld3_dupv8bf): New. >>          (vst4): Used new iterators. >>          (vst4qa): Used new iterators. >>          (vst4qb): Used new iterators. >>          (vld4_dup): Used new iterators. >>          (vld4_dupv8bf): New. >> >> >> gcc/testsuite/ChangeLog: >> >> 2019-03-04  Delia Burduv  >> >>         * gcc.target/arm/simd/bf16_vldn_1.c: New test. >> >> Thanks, >> Delia >> >> On 2/19/20 5:25 PM, Delia Burduv wrote: >> > >> > Hi, >> > >> > Here is the latest version of the patch. It just has some minor >> > formatting changes that were brought up by Richard Sandiford in the >> > AArch64 patches >> > >> > Thanks, >> > Delia >> > >> > On 1/22/20 5:31 PM, Delia Burduv wrote: >> >> Ping. >> >> >> >> I will change the tests to use the exact input and output registers as >> >> Richard Sandiford suggested for the AArch64 patches. >> >> >> >> On 12/20/19 6:48 PM, Delia Burduv wrote: >> >>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics >> >>> vld{q}_bf16 as part of the BFloat16 extension. >> >>> >> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) >> >> >>> >> >>> The intrinsics are declared in arm_neon.h . >> >>> A new test is added to check assembler output. >> >>> >> >>> This patch depends on the Arm back-end patche. >> >>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) >> >>> >> >>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't >> >>> have commit rights, so if this is ok can someone please commit it for >> >>> me? >> >>> >> >>> gcc/ChangeLog: >> >>> >> >>> 2019-11-14  Delia Burduv >> >>> >> >>>      * config/arm/arm_neon.h (bfloat16_t): New typedef. >> >>>          (bfloat16x4x2_t): New typedef. >> >>>          (bfloat16x8x2_t): New typedef. >> >>>          (bfloat16x4x3_t): New typedef. >> >>>          (bfloat16x8x3_t): New typedef. >> >>>          (bfloat16x4x4_t): New typedef. >> >>>          (bfloat16x8x4_t): New typedef. >> >>>          (vld2_bf16): New. >> >>>      (vld2q_bf16): New. >> >>>      (vld3_bf16): New. >> >>>      (vld3q_bf16): New. >> >>>      (vld4_bf16): New. >> >>>      (vld4q_bf16): New. >> >>>      (vld2_dup_bf16): New. >> >>>      (vld2q_dup_bf16): New. >> >>>       (vld3_dup_bf16): New. >> >>>      (vld3q_dup_bf16): New. >> >>>      (vld4_dup_bf16): New. >> >>>      (vld4q_dup_bf16): New. >> >>>          * config/arm/arm-builtins.c (E_V2BFmode): New mode. >> >>>          (VAR13): New. >> >>>          (arm_simd_types[Bfloat16x2_t]):New type. >> >>>          * config/arm/arm-modes.def (V2BF): New mode. >> >>>          * config/arm/arm-simd-builtin-types.def >> >>>          (Bfloat16x2_t): New entry. >> >>>          * config/arm/arm_neon_builtins.def >> >>>          (vld2): Changed to VAR13 and added v4bf, v8bf >> >>>          (vld2_dup): Changed to VAR8 and added v4bf, v8bf >> >>>          (vld3): Changed to VAR13 and added v4bf, v8bf >> >>>          (vld3_dup): Changed to VAR8 and added v4bf, v8bf >> >>>          (vld4): Changed to VAR13 and added v4bf, v8bf >> >>>          (vld4_dup): Changed to VAR8 and added v4bf, v8bf >> >>>          * config/arm/iterators.md (VDXBF): New iterator. >> >>>          (VQ2BF): New iterator. >> >>>          (V_elem): Added V4BF, V8BF. >> >>>          (V_sz_elem): Added V4BF, V8BF. >> >>>          (V_mode_nunits): Added V4BF, V8BF. >> >>>          (q): Added V4BF, V8BF. >> >>>          *config/arm/neon.md (vld2): Used new iterators. >> >>>          (vld2_dup): Used new iterators. >> >>>          (vld2_dupv8bf): New. >> >>>          (vst3): Used new iterators. >> >>>          (vst3qa): Used new iterators. >> >>>          (vst3qb): Used new iterators. >> >>>          (vld3_dup): Used new iterators. >> >>>          (vld3_dupv8bf): New. >> >>>          (vst4): Used new iterators. >> >>>          (vst4qa): Used new iterators. >> >>>          (vst4qb): Used new iterators. >> >>>          (vld4_dup): Used new iterators. >> >>>          (vld4_dupv8bf): New. >> >>> >> >>> >> >>> gcc/testsuite/ChangeLog: >> >>> >> >>> 2019-11-14  Delia Burduv >> >>> >> >>>      * gcc.target/arm/simd/bf16_vldn_1.c: New test. > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c > b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..7ff8b600827e5c2e313ce40d14382aa641b4bb31 > > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c > @@ -0,0 +1,152 @@ > +/* { dg-do assemble } */ > +/* { dg-options "-save-temps" }  */ > +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ > +/* { dg-add-options arm_v8_2a_bf16_neon } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > > > I think this should include an optimisation option like -O2 because... > >  + > +#include "arm_neon.h" > + > + > +/* > +**test_vld2_bf16: > +**    ... > +**    vld2.16    {d16-d17}, \[r3\] > > ... this is unstable codegen depending on the -O0 register allocator > moving the ptr argument to r3 from its initial r0. > This should really be r0 and the load instruction should load the low D > regs. > So let's add an -O2 to the dg-options and scan for the result of that. > > > Otherwise this is ok. > Thanks! > Kyrill > > >  +**    ... > +*/ > +bfloat16x4x2_t > +test_vld2_bf16 (bfloat16_t * ptr) > +{ > +  vld2_bf16 (ptr); > +} > + > --------------55DB96B3365C3C63E0C0F1AE Content-Type: text/x-patch; name="rb12473.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="rb12473.patch" Content-length: 18332 diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 425a2a49b69d7e3070059dd0a79ae3d306400f4b..2573cca6bb64f5104a1efd1379ef956f56d0fe04 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -19504,6 +19504,114 @@ vst4q_bf16 (bfloat16_t * __ptr, bfloat16x8x4_t __val) return __builtin_neon_vst4v8bf (__ptr, __bu.__o); } +__extension__ extern __inline bfloat16x4x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld2_bf16 (bfloat16_t const * __ptr) +{ + union { bfloat16x4x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2v4bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + +__extension__ extern __inline bfloat16x8x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld2q_bf16 (const bfloat16_t * __ptr) +{ + union { bfloat16x8x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2v8bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + +__extension__ extern __inline bfloat16x4x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld3_bf16 (const bfloat16_t * __ptr) +{ + union { bfloat16x4x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3v4bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + +__extension__ extern __inline bfloat16x8x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld3q_bf16 (const bfloat16_t * __ptr) +{ + union { bfloat16x8x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3v8bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + +__extension__ extern __inline bfloat16x4x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld4_bf16 (const bfloat16_t * __ptr) +{ + union { bfloat16x4x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4v4bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + +__extension__ extern __inline bfloat16x8x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld4q_bf16 (const bfloat16_t * __ptr) +{ + union { bfloat16x8x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4v8bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + +__extension__ extern __inline bfloat16x4x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld2_dup_bf16 (const bfloat16_t * __ptr) +{ + union { bfloat16x4x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupv4bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + +__extension__ extern __inline bfloat16x8x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld2q_dup_bf16 (const bfloat16_t * __ptr) +{ + union { bfloat16x8x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupv8bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + +__extension__ extern __inline bfloat16x4x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld3_dup_bf16 (const bfloat16_t * __ptr) +{ + union { bfloat16x4x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupv4bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + +__extension__ extern __inline bfloat16x8x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld3q_dup_bf16 (const bfloat16_t * __ptr) +{ + union { bfloat16x8x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupv8bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + +__extension__ extern __inline bfloat16x4x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld4_dup_bf16 (const bfloat16_t * __ptr) +{ + union { bfloat16x4x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupv4bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + +__extension__ extern __inline bfloat16x8x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld4q_dup_bf16 (const bfloat16_t * __ptr) +{ + union { bfloat16x8x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupv8bf ((const __builtin_neon_hi *) __ptr); + return __rv.__i; +} + #pragma GCC pop_options #ifdef __cplusplus diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index d85a2d4b1fcf9e851f215dfdd4b305e59ded651c..e3c1652b9e92ff5024225279f26c1ccb197dcd69 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -320,29 +320,29 @@ VAR12 (STORE1, vst1, v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf, v2di) VAR12 (STORE1LANE, vst1_lane, v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf, v2di) -VAR11 (LOAD1, vld2, - v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf) +VAR13 (LOAD1, vld2, + v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf, v4bf, v8bf) VAR9 (LOAD1LANE, vld2_lane, v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf) -VAR6 (LOAD1, vld2_dup, v8qi, v4hi, v4hf, v2si, v2sf, di) +VAR8 (LOAD1, vld2_dup, v8qi, v4hi, v4hf, v2si, v2sf, di, v4bf, v8bf) VAR13 (STORE1, vst2, v8qi, v4hi, v4hf, v4bf, v2si, v2sf, di, v16qi, v8hi, v8hf, v8bf, v4si, v4sf) VAR9 (STORE1LANE, vst2_lane, v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf) -VAR11 (LOAD1, vld3, - v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf) +VAR13 (LOAD1, vld3, + v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf, v4bf, v8bf) VAR9 (LOAD1LANE, vld3_lane, v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf) -VAR6 (LOAD1, vld3_dup, v8qi, v4hi, v4hf, v2si, v2sf, di) +VAR8 (LOAD1, vld3_dup, v8qi, v4hi, v4hf, v2si, v2sf, di, v4bf, v8bf) VAR13 (STORE1, vst3, v8qi, v4hi, v4hf, v4bf, v2si, v2sf, di, v16qi, v8hi, v8hf, v8bf, v4si, v4sf) VAR9 (STORE1LANE, vst3_lane, v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf) -VAR11 (LOAD1, vld4, - v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf) +VAR13 (LOAD1, vld4, + v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf, v4bf, v8bf) VAR9 (LOAD1LANE, vld4_lane, v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf) -VAR6 (LOAD1, vld4_dup, v8qi, v4hi, v4hf, v2si, v2sf, di) +VAR8 (LOAD1, vld4_dup, v8qi, v4hi, v4hf, v2si, v2sf, di, v4bf, v8bf) VAR13 (STORE1, vst4, v8qi, v4hi, v4hf, v4bf, v2si, v2sf, di, v16qi, v8hi, v8hf, v8bf, v4si, v4sf) VAR9 (STORE1LANE, vst4_lane, diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 0c03e747c3643e018f4f62dda5e832dfb1af758f..7401f16ef59b9854bbc85f98cfdcdd7a8a600337 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -87,6 +87,9 @@ ;; Double-width vector modes plus 64-bit elements, including V4BF. (define_mode_iterator VDXBF [V8QI V4HI V4HF (V4BF "TARGET_BF16_SIMD") V2SI V2SF DI]) +;; Double-width vector modes plus 64-bit elements, V4BF and V8BF. +(define_mode_iterator VDXBF2 [V8QI V4HI V4HF V2SI V2SF DI (V4BF "TARGET_BF16_SIMD") (V8BF ("TARGET_BF16_SIMD"))]) + ;; Double-width vector modes plus 64-bit elements, ;; with V4BFmode added, suitable for moves. (define_mode_iterator VDXMOV [V8QI V4HI V4HF V4BF V2SI V2SF DI]) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index fcf59aee32a955b6bb3e7b98a4d880a0e631b4be..5117f78dd2dce442bc738de6082686421fcdca52 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -5428,7 +5428,7 @@ if (BYTES_BIG_ENDIAN) (define_insn "neon_vld2" [(set (match_operand:TI 0 "s_register_operand" "=w") (unspec:TI [(match_operand:TI 1 "neon_struct_operand" "Um") - (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VDXBF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD2))] "TARGET_NEON" { @@ -5453,7 +5453,7 @@ if (BYTES_BIG_ENDIAN) (define_insn "neon_vld2" [(set (match_operand:OI 0 "s_register_operand" "=w") (unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um") - (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VQ2BF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD2))] "TARGET_NEON" "vld2.\t%h0, %A1" @@ -5516,7 +5516,7 @@ if (BYTES_BIG_ENDIAN) (define_insn "neon_vld2_dup" [(set (match_operand:TI 0 "s_register_operand" "=w") (unspec:TI [(match_operand: 1 "neon_struct_operand" "Um") - (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VDXBF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD2_DUP))] "TARGET_NEON" { @@ -5531,6 +5531,27 @@ if (BYTES_BIG_ENDIAN) (const_string "neon_load1_1reg")))] ) +(define_insn "neon_vld2_dupv8bf" + [(set (match_operand:OI 0 "s_register_operand" "=w") + (unspec:OI [(match_operand:V2BF 1 "neon_struct_operand" "Um") + (unspec:V8BF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD2_DUP))] + "TARGET_BF16_SIMD" + { + rtx ops[5]; + int tabbase = REGNO (operands[0]); + + ops[4] = operands[1]; + ops[0] = gen_rtx_REG (V4BFmode, tabbase); + ops[1] = gen_rtx_REG (V4BFmode, tabbase + 2); + ops[2] = gen_rtx_REG (V4BFmode, tabbase + 4); + ops[3] = gen_rtx_REG (V4BFmode, tabbase + 6); + output_asm_insn ("vld2.16\t{%P0, %P1, %P2, %P3}, %A4", ops); + return ""; + } + [(set_attr "type" "neon_load2_all_lanes_q")] +) + (define_expand "vec_store_lanesti" [(set (match_operand:TI 0 "neon_struct_operand") (unspec:TI [(match_operand:TI 1 "s_register_operand") @@ -5637,7 +5658,7 @@ if (BYTES_BIG_ENDIAN) (define_insn "neon_vld3" [(set (match_operand:EI 0 "s_register_operand" "=w") (unspec:EI [(match_operand:EI 1 "neon_struct_operand" "Um") - (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VDXBF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD3))] "TARGET_NEON" { @@ -5665,7 +5686,7 @@ if (BYTES_BIG_ENDIAN) (define_expand "neon_vld3" [(match_operand:CI 0 "s_register_operand") (match_operand:CI 1 "neon_struct_operand") - (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VQ2BF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] "TARGET_NEON" { rtx mem; @@ -5680,7 +5701,7 @@ if (BYTES_BIG_ENDIAN) (define_insn "neon_vld3qa" [(set (match_operand:CI 0 "s_register_operand" "=w") (unspec:CI [(match_operand:EI 1 "neon_struct_operand" "Um") - (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VQ2BF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD3A))] "TARGET_NEON" { @@ -5700,7 +5721,7 @@ if (BYTES_BIG_ENDIAN) [(set (match_operand:CI 0 "s_register_operand" "=w") (unspec:CI [(match_operand:EI 1 "neon_struct_operand" "Um") (match_operand:CI 2 "s_register_operand" "0") - (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VQ2BF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD3B))] "TARGET_NEON" { @@ -5777,7 +5798,7 @@ if (BYTES_BIG_ENDIAN) (define_insn "neon_vld3_dup" [(set (match_operand:EI 0 "s_register_operand" "=w") (unspec:EI [(match_operand: 1 "neon_struct_operand" "Um") - (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VDXBF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD3_DUP))] "TARGET_NEON" { @@ -5800,6 +5821,26 @@ if (BYTES_BIG_ENDIAN) (const_string "neon_load3_all_lanes") (const_string "neon_load1_1reg")))]) +(define_insn "neon_vld3_dupv8bf" + [(set (match_operand:CI 0 "s_register_operand" "=w") + (unspec:CI [(match_operand:V2BF 1 "neon_struct_operand" "Um") + (unspec:V8BF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD2_DUP))] + "TARGET_BF16_SIMD" + { + rtx ops[4]; + int tabbase = REGNO (operands[0]); + + ops[3] = operands[1]; + ops[0] = gen_rtx_REG (V4BFmode, tabbase); + ops[1] = gen_rtx_REG (V4BFmode, tabbase + 2); + ops[2] = gen_rtx_REG (V4BFmode, tabbase + 4); + output_asm_insn ("vld3.16\t{%P0[], %P1[], %P2[]}, %A3", ops); + return ""; + } + [(set_attr "type" "neon_load3_all_lanes_q")] +) + (define_expand "vec_store_lanesei" [(set (match_operand:EI 0 "neon_struct_operand") (unspec:EI [(match_operand:EI 1 "s_register_operand") @@ -5955,7 +5996,7 @@ if (BYTES_BIG_ENDIAN) (define_insn "neon_vld4" [(set (match_operand:OI 0 "s_register_operand" "=w") (unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um") - (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VDXBF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD4))] "TARGET_NEON" { @@ -5983,7 +6024,7 @@ if (BYTES_BIG_ENDIAN) (define_expand "neon_vld4" [(match_operand:XI 0 "s_register_operand") (match_operand:XI 1 "neon_struct_operand") - (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VQ2BF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] "TARGET_NEON" { rtx mem; @@ -5998,7 +6039,7 @@ if (BYTES_BIG_ENDIAN) (define_insn "neon_vld4qa" [(set (match_operand:XI 0 "s_register_operand" "=w") (unspec:XI [(match_operand:OI 1 "neon_struct_operand" "Um") - (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VQ2BF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD4A))] "TARGET_NEON" { @@ -6019,7 +6060,7 @@ if (BYTES_BIG_ENDIAN) [(set (match_operand:XI 0 "s_register_operand" "=w") (unspec:XI [(match_operand:OI 1 "neon_struct_operand" "Um") (match_operand:XI 2 "s_register_operand" "0") - (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VQ2BF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD4B))] "TARGET_NEON" { @@ -6099,7 +6140,7 @@ if (BYTES_BIG_ENDIAN) (define_insn "neon_vld4_dup" [(set (match_operand:OI 0 "s_register_operand" "=w") (unspec:OI [(match_operand: 1 "neon_struct_operand" "Um") - (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + (unspec:VDXBF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD4_DUP))] "TARGET_NEON" { @@ -6125,6 +6166,27 @@ if (BYTES_BIG_ENDIAN) (const_string "neon_load1_1reg")))] ) +(define_insn "neon_vld4_dupv8bf" + [(set (match_operand:XI 0 "s_register_operand" "=w") + (unspec:XI [(match_operand:V2BF 1 "neon_struct_operand" "Um") + (unspec:V8BF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD2_DUP))] + "TARGET_BF16_SIMD" + { + rtx ops[5]; + int tabbase = REGNO (operands[0]); + + ops[4] = operands[1]; + ops[0] = gen_rtx_REG (V4BFmode, tabbase); + ops[1] = gen_rtx_REG (V4BFmode, tabbase + 2); + ops[2] = gen_rtx_REG (V4BFmode, tabbase + 4); + ops[3] = gen_rtx_REG (V4BFmode, tabbase + 6); + output_asm_insn ("vld4.16\t{%P0[], %P1[], %P2[], %P3[]}, %A4", ops); + return ""; + } + [(set_attr "type" "neon_load4_all_lanes_q")] +) + (define_expand "vec_store_lanesoi" [(set (match_operand:OI 0 "neon_struct_operand") (unspec:OI [(match_operand:OI 1 "s_register_operand") diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c new file mode 100644 index 0000000000000000000000000000000000000000..222e7af945383bd93b6b280b516a56e684f1d651 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c @@ -0,0 +1,152 @@ +/* { dg-do assemble } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ +/* { dg-additional-options "-save-temps -O2 -mfloat-abi=hard" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "arm_neon.h" + + +/* +**test_vld2_bf16: +** ... +** vld2.16 {d0-d1}, \[r0\] +** bx lr +*/ +bfloat16x4x2_t +test_vld2_bf16 (bfloat16_t * ptr) +{ + return vld2_bf16 (ptr); +} + +/* +**test_vld2q_bf16: +** ... +** vld2.16 {d0-d3}, \[r0\] +** bx lr +*/ +bfloat16x8x2_t +test_vld2q_bf16 (bfloat16_t * ptr) +{ + return vld2q_bf16 (ptr); +} + +/* +**test_vld2_dup_bf16: +** ... +** vld2.16 {d0\[\], d1\[\]}, \[r0\] +** bx lr +*/ +bfloat16x4x2_t +test_vld2_dup_bf16 (bfloat16_t * ptr) +{ + return vld2_dup_bf16 (ptr); +} + +/* +**test_vld2q_dup_bf16: +** ... +** vld2.16 {d0, d1, d2, d3}, \[r0\] +** bx lr +*/ +bfloat16x8x2_t +test_vld2q_dup_bf16 (bfloat16_t * ptr) +{ + return vld2q_dup_bf16 (ptr); +} + +/* +**test_vld3_bf16: +** ... +** vld3.16 {d0-d2}, \[r0\] +** bx lr +*/ +bfloat16x4x3_t +test_vld3_bf16 (bfloat16_t * ptr) +{ + return vld3_bf16 (ptr); +} + +/* +**test_vld3q_bf16: +** ... +** vld3.16 {d1, d3, d5}, \[r0\] +** bx lr +*/ +bfloat16x8x3_t +test_vld3q_bf16 (bfloat16_t * ptr) +{ + return vld3q_bf16 (ptr); +} + +/* +**test_vld3_dup_bf16: +** ... +** vld3.16 {d0\[\], d1\[\], d2\[\]}, \[r0\] +** bx lr +*/ +bfloat16x4x3_t +test_vld3_dup_bf16 (bfloat16_t * ptr) +{ + return vld3_dup_bf16 (ptr); +} + +/* +**test_vld3q_dup_bf16: +** ... +** vld3.16 {d0\[\], d1\[\], d2\[\]}, \[r0\] +** bx lr +*/ +bfloat16x8x3_t +test_vld3q_dup_bf16 (bfloat16_t * ptr) +{ + return vld3q_dup_bf16 (ptr); +} + +/* +**test_vld4_bf16: +** ... +** vld4.16 {d0-d3}, \[r0\] +** bx lr +*/ +bfloat16x4x4_t +test_vld4_bf16 (bfloat16_t * ptr) +{ + return vld4_bf16 (ptr); +} + +/* +**test_vld4q_bf16: +** ... +** vld4.16 {d1, d3, d5, d7}, \[r0\] +** bx lr +*/ +bfloat16x8x4_t +test_vld4q_bf16 (bfloat16_t * ptr) +{ + return vld4q_bf16 (ptr); +} + +/* +**test_vld4_dup_bf16: +** ... +** vld4.16 {d0\[\], d1\[\], d2\[\], d3\[\]}, \[r0\] +** bx lr +*/ +bfloat16x4x4_t +test_vld4_dup_bf16 (bfloat16_t * ptr) +{ + return vld4_dup_bf16 (ptr); +} + +/* +**test_vld4q_dup_bf16: +** ... +** vld4.16 {d0\[\], d1\[\], d2\[\], d3\[\]}, \[r0\] +** bx lr +*/ +bfloat16x8x4_t +test_vld4q_dup_bf16 (bfloat16_t * ptr) +{ + return vld4q_dup_bf16 (ptr); +} --------------55DB96B3365C3C63E0C0F1AE--