From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by sourceware.org (Postfix) with ESMTPS id 888DD3857C4B for ; Fri, 28 Oct 2022 06:20:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 888DD3857C4B Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1666938020; x=1698474020; h=from:to:subject:date:message-id:references:in-reply-to: content-transfer-encoding:mime-version; bh=b0tQcRDIHZ3W21e8XIy1jL89RJKLMR81HoIWlrkzI+U=; b=BNJLtS6XsR/d+almHjreWhlNLxJWcWlIJntj+OS3QCSHDPcwXiVPus9K Z7b42xLEEaJzgxydRZYM2G611VMB1HpDs7J1wHfdZAfaL18cwXim/EGbj 4fVQU2cxdXxzhaTNUJxZhyQGI/fi1fGxD17W987TN297CfrBpcSCX5res HUnObsnY88uP/7ueG91vIAoYtEPHxFd1CmIBEnx5ZYZmeFUHbhwzyG4Fc v7vFM5YuwTaV3qoMkEUjULGe/D0OniFwUIbTIFb+1Ph/LTC15viP07Znk C+Qy82XUGRKGwPlgMl04/fEzDW63kQGmRHEP6l/PqJyqWuv0uoWLeYYRe g==; X-IronPort-AV: E=McAfee;i="6500,9779,10513"; a="372635157" X-IronPort-AV: E=Sophos;i="5.95,220,1661842800"; d="scan'208";a="372635157" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2022 23:20:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10513"; a="775266856" X-IronPort-AV: E=Sophos;i="5.95,220,1661842800"; d="scan'208";a="775266856" Received: from fmsmsx602.amr.corp.intel.com ([10.18.126.82]) by fmsmga001.fm.intel.com with ESMTP; 27 Oct 2022 23:20:11 -0700 Received: from fmsmsx612.amr.corp.intel.com (10.18.126.92) by fmsmsx602.amr.corp.intel.com (10.18.126.82) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Thu, 27 Oct 2022 23:20:11 -0700 Received: from fmsmsx610.amr.corp.intel.com (10.18.126.90) by fmsmsx612.amr.corp.intel.com (10.18.126.92) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Thu, 27 Oct 2022 23:20:10 -0700 Received: from FMSEDG603.ED.cps.intel.com (10.1.192.133) by fmsmsx610.amr.corp.intel.com (10.18.126.90) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31 via Frontend Transport; Thu, 27 Oct 2022 23:20:10 -0700 Received: from NAM11-BN8-obe.outbound.protection.outlook.com (104.47.58.168) by edgegateway.intel.com (192.55.55.68) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2375.31; Thu, 27 Oct 2022 23:20:10 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=mmY1DEuIOL0CvcqKlWgE0bF8+4YvlnfAqhQcE+fGu7THvUqTQM9tzVvlX7sDAP0jNY9KG3PJ+iqx7YNn9ftc4GjvA8B2ygUtKZeWuN7Y6N4IOcaJ8junEubbJJQxcdBjSH/u0yg4A6wkC+TdQsoXEY6jdjboxr2iUS2Hl7y1Cb2AZGK2ozTaGSenqp61cQTuZnw+b1rcQ96GYFFvfmXLFNmwGO8gOhzU4MrdxDlj8E+NsSCwgE31l8v5whQ+4aLtDlqJ8oyOapi9jgoiBZ0u6I5m4/0J32U+pAMJMJZg/UqMUHq04S1UlaJsRfSuewJ6bB0qR9hUs9zpmSV3G+iXDA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=/+U87goPaSpzbFdCb9wv0nXThDNlVTEB9T1NwwJKHZw=; b=JRV/sWW47ptWZ5/Iy+mHnnMR5UDas2Q1c4qfWkrpkvpvuCOb3iweietK/VEmKy+nuTlrDWkJ9lkC4tjNBVb8ARoI2dBa3QCbsx0fmdrsQYgyCqKJthJNG4W1fyQeO+VN+ZbbrybktTE8Dq9wjneSgWfFckU/p5ntzrdejBAKycX+vucITtpPx2kaA7+yhYxhvNkXx3SznuIgAQY/U2UwFneEhxMqEmUb3ICRRs5/wnz7eRafJdBV1KvfgZHY9c0J/5I0TwisKrKFTgbX/nhObZeRZw8jkmlDdRZqFtx5zrMEDiU7gZAIap/JXEy0JaXZdAETfROq3PHNZgUGYx+zhg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none Received: from DM4PR11MB5487.namprd11.prod.outlook.com (2603:10b6:5:39f::22) by BL1PR11MB5398.namprd11.prod.outlook.com (2603:10b6:208:31c::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5746.28; Fri, 28 Oct 2022 06:20:07 +0000 Received: from DM4PR11MB5487.namprd11.prod.outlook.com ([fe80::dc50:e9a3:2270:4a70]) by DM4PR11MB5487.namprd11.prod.outlook.com ([fe80::dc50:e9a3:2270:4a70%9]) with mapi id 15.20.5746.028; Fri, 28 Oct 2022 06:20:06 +0000 From: "Kong, Lingling" To: "Liu, Hongtao" , "gcc-patches@gcc.gnu.org" Subject: [PATCH] i386: using __bf16 for AVX512BF16 intrinsics Thread-Topic: [PATCH] i386: using __bf16 for AVX512BF16 intrinsics Thread-Index: AQHY6pOuDl6wgtzUekOWNrXCUyMhu64jUqtg Date: Fri, 28 Oct 2022 06:20:06 +0000 Message-ID: References: <20221028060808.1637178-1-lingling.kong@intel.com> In-Reply-To: <20221028060808.1637178-1-lingling.kong@intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 11.6.500.17 dlp-reaction: no-action authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=intel.com; x-ms-publictraffictype: Email x-ms-traffictypediagnostic: DM4PR11MB5487:EE_|BL1PR11MB5398:EE_ x-ms-office365-filtering-correlation-id: 8898a0ca-ff21-45d3-851d-08dab8ac755c x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: GtU+l1k62M838Cr3IYU9Zdkg1/tgcbtPH6YEcZteV91tQfCstxErpzYS9eC9a7WbYBqP3f9+iKs8hYTcsQD8Tn6LR6oga8JDUQJxWckWR2dSvHKcgksPmc/zD0edRMamfbRIvT8hIJafae4iQM5ghzSqPgomqAfxtRWEEIwAZLvPxKYxUhUZSaYmPbljwgo7d0DKjIgxgZ6U1PcSkHNOQl0Do5zsdtn0zt8WTKi34rEFUjOlnrDW57v/he4024W07aL+86JRMuZ0DllpOac40nmHPGcRdP0BGBTcVPJKH3G/BUtKHaRlA/S/3lFbiFeWiSmVACcxheTZYonB4Y5YZnpmsJkl4F8o+K24qh0EKaEabc3XOwJQ5v7FTE/NYvUMGuwbjqJInMeMKGj+d4jhI2/JDMAgCUxOaSHUJ1yUufxM1giWPBwtuRdtXq24p/zN2PqEs8C9RUghUEa5QRyD14Df/AjpZ+Ks4ZT3HYHipwqWriuGSOzHD6qOJysQhCDnaGdUTp4Rtj0g1J4rXkjXLayLTY3M2pK7Uo8DGXifrl/FQzl6N1lofuP9UJulokcVf1MnF3CSataS9/nD4qvpL3NSflVlHXAA6UHT2nt80dr+FEmiG73Ub1HvNkLnKbM1gMgchhI4iN85mroV4OEdKysTgOpYRwceIUDqMRRt8Nxt4EkoogVaX+IorBo0TDJ3ha/CQBVnhooQ0OYtc12KnJRkJy4871NeWei3Nu6ZL0nhl3JhSrS7fXOMZfr6lIS/ceUdilyuhv0B8hqNdLnU570ikJNtkzI75MX0XJiDkKM= x-forefront-antispam-report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:DM4PR11MB5487.namprd11.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(346002)(396003)(376002)(136003)(366004)(39860400002)(451199015)(84970400001)(316002)(66556008)(26005)(66476007)(55016003)(8936002)(66446008)(38070700005)(83380400001)(110136005)(66946007)(30864003)(41300700001)(478600001)(64756008)(8676002)(33656002)(2906002)(52536014)(71200400001)(5660300002)(9686003)(186003)(122000001)(38100700002)(76116006)(6506007)(86362001)(7696005)(82960400001)(559001)(579004);DIR:OUT;SFP:1102; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?us-ascii?Q?fXFThrp2YmuB2kP2Ljq4gSZnqgGpUV8gERwKYer0Lwk41n7awsWOSecZb1BQ?= =?us-ascii?Q?n0UOJRUgiHBr8YnZwcCT8OMEqvEYtXgQmMMFgXfzGm0AcS7NbTYKZKgncJ6h?= =?us-ascii?Q?vXw1cZk/ca+LTQyVuWvYrw90G2I7DdCHW1hrxoV7iKZ9tbeadTCsYCMuHdhA?= =?us-ascii?Q?EkidyTVuOrArdthauIgJvwbcgZF9Tw3lnYPL3ij8XJwnNAsB/jhaSwgaf24p?= =?us-ascii?Q?/eyWxmxsHBrQpyWkrtuZ2Q1IGvaVa5j3ArDG5R6AC8uwWLIQBWgQzZiplZ8P?= =?us-ascii?Q?iTErUEgf930lfIp7RmcGX/6tJ0kgIWHY/fEjnnoLkgljQQbdI/v2F/g1eTD+?= =?us-ascii?Q?mofdkmMHaYIWOA+IYLxrQ4YRBNrvxthIv/uXOHqKmpa9gibvz/mW1l5Kefjc?= =?us-ascii?Q?85FWPg47TAUjfOrGGF6RgWdryDiSkU5JRFdajps74dUPFC+vV0V3+yUFqVPp?= =?us-ascii?Q?a9RsYo8jGQaaXJkXSkiX8tTAZhQF3X0mdKzjwMUZJKB0mm8kRbsG/rJPyMkN?= =?us-ascii?Q?Ai1iqlQHOTU3/t1WgkQpSylr+UKJkwdOBaKwOZDVGpvDopO76bC+FxQIt4mc?= =?us-ascii?Q?/x/NExsr0tdRkZSNQVf8PrDMWIjGVx6fR+McyjZHu4jFogNfUVpeXRCULhdz?= =?us-ascii?Q?NAeAj97NSauDhnWE2kTznHYkuM/FA0wLnKccdwpuijnylhUWgmQjuwVoHMWg?= =?us-ascii?Q?J/yVGuRgi+BHwl2mqKx/exeWzwaTmmlajShEJbaRZcOcCm3VxV2BdcuRnvg5?= =?us-ascii?Q?OFqpjcE+t3E/IcLAjVa9BOpsyde0JkqU/VYPKCwQ5AT73lR5uPpA4OKglTpb?= =?us-ascii?Q?DT3SBox7XQ+3riOIxYy4WvE3L65nhbniN/8yLO535S/zku9Yx4iD82ZUGrvD?= =?us-ascii?Q?gszDL9xtbqKo+7IBwpJnaCg1j5zkAkwS87MW6dw3BGrddqjU0zaakMkxNFXI?= =?us-ascii?Q?jEeSMBhB/6MUsuJO1Vt6Lg3uC5lxzXKtxgmQNkFXrDqvT8+fy1MN0Xa8+MmG?= =?us-ascii?Q?Tbqi8Nx8V/op1TlNxn/mtOKR2sqCKnoVB/+0ix6FoiE0wXLt60Yx/BP3aMZo?= =?us-ascii?Q?WjYKRQ9uIAEKBZoyj/ycMDVml0NG4e+0px6mm2DfR+K4gHz2MUzupTK99xPJ?= =?us-ascii?Q?T3ptd5+V1NKKn30Z+GIdPPpXr4SCqpExw2JK8MXmSC7O3yj1x4ycbgjqP230?= =?us-ascii?Q?hexy+3b2t9QuXaJr1PEzBP3sGKp/ytw+BgxLKUbjxzypCCtXflU/RuC9GZls?= =?us-ascii?Q?i3Yh7Gwp58TekiiI+Y33K/T9LWcuOnCNmEKD0eDpVIKlFBRqT5x7HlK2XuzP?= =?us-ascii?Q?MOfWOc3R2l0JFRu5HY/RTh6SwphIzf7HMRXDGY+OjO2KoJIiLBL7gQHS0W3L?= =?us-ascii?Q?oF4owIwMUHuipE365wzws5YAmCZdhOMlisvzbbuzVzkMxHx5gr/cDdy6COwi?= =?us-ascii?Q?LKmhNR3/xCFVsB2Yk73fABkbBRdrEO9goy/xUSXlq85n+zRAkUMZxEcex9xX?= =?us-ascii?Q?hMq/qKkRt51JGPkGbreQCWXjKQ/jEuOHyZmFUr22tVyLyeFTQzIXlWCeGmHL?= =?us-ascii?Q?3ISMpxuIvXfPKA/v5LSfJi40xFZume9NocGs4G+h?= Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: DM4PR11MB5487.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: 8898a0ca-ff21-45d3-851d-08dab8ac755c X-MS-Exchange-CrossTenant-originalarrivaltime: 28 Oct 2022 06:20:06.8761 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: VfQzDP5SaZqP3jrWqa+xGo8K73BidJ1idXMKiRgS5uIzPrIBBCvJKNk8DqRyw0clpTmum+9HhCIXEXYNyEhy1g== X-MS-Exchange-Transport-CrossTenantHeadersStamped: BL1PR11MB5398 X-OriginatorOrg: intel.com X-Spam-Status: No, score=-13.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi, Previously we use unsigned short to represent bf16. It's not a good express= ion, and at the time the front end didn't support bf16 type. Now we introduced __bf16 to X86 psABI. So we can switch intrinsics to the n= ew type. Ok for trunk ? Thanks, Lingling gcc/ChangeLog: * config/i386/avx512bf16intrin.h (__attribute__): Change short to bf16. (_mm_cvtsbh_ss): Ditto. (_mm512_cvtne2ps_pbh): Ditto. (_mm512_mask_cvtne2ps_pbh): Ditto. (_mm512_maskz_cvtne2ps_pbh): Ditto. * config/i386/avx512bf16vlintrin.h (__attribute__): Ditto. (_mm256_cvtne2ps_pbh): Ditto. (_mm256_mask_cvtne2ps_pbh): Ditto. (_mm256_maskz_cvtne2ps_pbh): Ditto. (_mm_cvtne2ps_pbh): Ditto. (_mm_mask_cvtne2ps_pbh): Ditto. (_mm_maskz_cvtne2ps_pbh): Ditto. (_mm_cvtness_sbh): Ditto. * config/i386/i386-builtin-types.def (V8BF): Add new DEF_VECTOR_TYPE for BFmode. (V16BF): Ditto. (V32BF): Ditto. * config/i386/i386-builtin.def (BDESC): Fixed builtins. * config/i386/i386-expand.cc (ix86_expand_args_builtin): Changed avx512bf16 ix86_builtin_func_type included HI to BF. * config/i386/immintrin.h: Add SSE2 depend for avx512bf16. * config/i386/sse.md (TARGET_AVX512VL): Changed HI vector to BF vector. (avx512f_cvtneps2bf16_v4sf): New define_expand. (*avx512f_cvtneps2bf16_v4sf): New define_insn. (avx512f_cvtneps2bf16_v4sf_maskz):Ditto. (avx512f_cvtneps2bf16_v4sf_mask): Ditto. (avx512f_cvtneps2bf16_v4sf_mask_1): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Add fpmath option. * gcc.target/i386/avx512bf16-vdpbf16ps-2.c: Fixed scan-assembler. * gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c: Add x/y suffix for vcvtneps2bf16. * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c: Ditto. --- gcc/config/i386/avx512bf16intrin.h | 12 +-- gcc/config/i386/avx512bf16vlintrin.h | 29 ++--- gcc/config/i386/i386-builtin-types.def | 51 ++++----- gcc/config/i386/i386-builtin.def | 54 +++++----- gcc/config/i386/i386-expand.cc | 48 ++++----- gcc/config/i386/immintrin.h | 2 + gcc/config/i386/sse.md | 101 ++++++++++++++---- .../gcc.target/i386/avx512bf16-cvtsbh2ss-1.c | 2 +- .../gcc.target/i386/avx512bf16-vdpbf16ps-2.c | 2 +- .../i386/avx512bf16vl-cvtness2sbh-1.c | 2 +- .../i386/avx512bf16vl-vcvtneps2bf16-1.c | 12 +-- 11 files changed, 189 insertions(+), 126 deletions(-) diff --git a/gcc/config/i386/avx512bf16intrin.h b/gcc/config/i386/avx512bf1= 6intrin.h index b6e9ddad157..ea1d0125b3f 100644 --- a/gcc/config/i386/avx512bf16intrin.h +++ b/gcc/config/i386/avx512bf16intrin.h @@ -35,16 +35,16 @@ #endif /* __AVX512BF16__ */ =20 /* Internal data types for implementing the intrinsics. */ -typedef short __v32bh __attribute__ ((__vector_size__ (64))); +typedef __bf16 __v32bf __attribute__ ((__vector_size__ (64))); =20 /* The Intel API is flexible enough that we must allow aliasing with other vector types, and their scalar components. */ -typedef short __m512bh __attribute__ ((__vector_size__ (64), __may_alias__= )); +typedef __bf16 __m512bh __attribute__ ((__vector_size__ (64), __may_alias_= _)); =20 /* Convert One BF16 Data to One Single Float Data. */ extern __inline float __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_cvtsbh_ss (__bfloat16 __A) +_mm_cvtsbh_ss (__bf16 __A) { union{ float a; unsigned int b;} __tmp; __tmp.b =3D ((unsigned int)(__A)) << 16; @@ -57,21 +57,21 @@ extern __inline __m512bh __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm512_cvtne2ps_pbh (__m512 __A, __m512 __B) { - return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi(__A, __B); + return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf(__A, __B); } =20 extern __inline __m512bh __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm512_mask_cvtne2ps_pbh (__m512bh __A, __mmask32 __B, __m512 __C, __m512 = __D) { - return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi_mask(__C, __D, __A, = __B); + return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf_mask(__C, __D, __A, = __B); } =20 extern __inline __m512bh __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm512_maskz_cvtne2ps_pbh (__mmask32 __A, __m512 __B, __m512 __C) { - return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi_maskz(__B, __C, __A)= ; + return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf_maskz(__B, __C, __A)= ; } =20 /* vcvtneps2bf16 */ diff --git a/gcc/config/i386/avx512bf16vlintrin.h b/gcc/config/i386/avx512b= f16vlintrin.h index 969335ff358..56c28f14cf6 100644 --- a/gcc/config/i386/avx512bf16vlintrin.h +++ b/gcc/config/i386/avx512bf16vlintrin.h @@ -35,57 +35,58 @@ #endif /* __AVX512BF16__ */ =20 /* Internal data types for implementing the intrinsics. */ -typedef short __v16bh __attribute__ ((__vector_size__ (32))); -typedef short __v8bh __attribute__ ((__vector_size__ (16))); +typedef __bf16 __v16bf __attribute__ ((__vector_size__ (32))); +typedef __bf16 __v8bf __attribute__ ((__vector_size__ (16))); =20 /* The Intel API is flexible enough that we must allow aliasing with other vector types, and their scalar components. */ -typedef short __m256bh __attribute__ ((__vector_size__ (32), __may_alias__= )); -typedef short __m128bh __attribute__ ((__vector_size__ (16), __may_alias__= )); +typedef __bf16 __m256bh __attribute__ ((__vector_size__ (32), __may_alias_= _)); +typedef __bf16 __m128bh __attribute__ ((__vector_size__ (16), __may_alias_= _)); + +typedef __bf16 __bfloat16; =20 -typedef unsigned short __bfloat16; /* vcvtne2ps2bf16 */ =20 extern __inline __m256bh __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cvtne2ps_pbh (__m256 __A, __m256 __B) { - return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16hi(__A, __B); + return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16bf(__A, __B); } =20 extern __inline __m256bh __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm256_mask_cvtne2ps_pbh (__m256bh __A, __mmask16 __B, __m256 __C, __m256 = __D) { - return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16hi_mask(__C, __D, __A, = __B); + return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16bf_mask(__C, __D, __A, = __B); } =20 extern __inline __m256bh __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm256_maskz_cvtne2ps_pbh (__mmask16 __A, __m256 __B, __m256 __C) { - return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16hi_maskz(__B, __C, __A)= ; + return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16bf_maskz(__B, __C, __A)= ; } =20 extern __inline __m128bh __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtne2ps_pbh (__m128 __A, __m128 __B) { - return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8hi(__A, __B); + return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8bf(__A, __B); } =20 extern __inline __m128bh __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_mask_cvtne2ps_pbh (__m128bh __A, __mmask8 __B, __m128 __C, __m128 __D) { - return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8hi_mask(__C, __D, __A, _= _B); + return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8bf_mask(__C, __D, __A, _= _B); } =20 extern __inline __m128bh __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_maskz_cvtne2ps_pbh (__mmask8 __A, __m128 __B, __m128 __C) { - return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8hi_maskz(__B, __C, __A); + return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8bf_maskz(__B, __C, __A); } =20 /* vcvtneps2bf16 */ @@ -176,13 +177,13 @@ _mm_maskz_dpbf16_ps (__mmask8 __A, __m128 __B, __m128= bh __C, __m128bh __D) return (__m128)__builtin_ia32_dpbf16ps_v4sf_maskz(__B, __C, __D, __A); } =20 -extern __inline __bfloat16 +extern __inline __bf16 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtness_sbh (float __A) { __v4sf __V =3D {__A, 0, 0, 0}; - __v8hi __R =3D __builtin_ia32_cvtneps2bf16_v4sf_mask ((__v4sf)__V, - (__v8hi)_mm_undefined_si128 (), (__mmask8)-1); + __v8bf __R =3D __builtin_ia32_cvtneps2bf16_v4sf_mask ((__v4sf)__V, + (__v8bf)_mm_undefined_si128 (), (__mmask8)-1); return __R[0]; } =20 diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-= builtin-types.def index 63a360b0f8b..aedae2d7750 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -87,6 +87,7 @@ DEF_VECTOR_TYPE (V8QI, QI) DEF_VECTOR_TYPE (V2DF, DOUBLE) DEF_VECTOR_TYPE (V4SF, FLOAT) DEF_VECTOR_TYPE (V8HF, FLOAT16) +DEF_VECTOR_TYPE (V8BF, BFLOAT16) DEF_VECTOR_TYPE (V2DI, DI) DEF_VECTOR_TYPE (V4SI, SI) DEF_VECTOR_TYPE (V8HI, HI) @@ -100,6 +101,7 @@ DEF_VECTOR_TYPE (V16UQI, UQI, V16QI) DEF_VECTOR_TYPE (V4DF, DOUBLE) DEF_VECTOR_TYPE (V8SF, FLOAT) DEF_VECTOR_TYPE (V16HF, FLOAT16) +DEF_VECTOR_TYPE (V16BF, BFLOAT16) DEF_VECTOR_TYPE (V4DI, DI) DEF_VECTOR_TYPE (V8SI, SI) DEF_VECTOR_TYPE (V16HI, HI) @@ -111,6 +113,7 @@ DEF_VECTOR_TYPE (V16UHI, UHI, V16HI) # AVX512F vectors DEF_VECTOR_TYPE (V32SF, FLOAT) DEF_VECTOR_TYPE (V32HF, FLOAT16) +DEF_VECTOR_TYPE (V32BF, BFLOAT16) DEF_VECTOR_TYPE (V16SF, FLOAT) DEF_VECTOR_TYPE (V8DF, DOUBLE) DEF_VECTOR_TYPE (V8DI, DI) @@ -1273,30 +1276,30 @@ DEF_FUNCTION_TYPE (V4SI, V4SI, V4SI, UHI) DEF_FUNCTION_TYPE (V8SI, V8SI, V8SI, UHI) =20 # BF16 builtins -DEF_FUNCTION_TYPE (V32HI, V16SF, V16SF) -DEF_FUNCTION_TYPE (V32HI, V16SF, V16SF, V32HI, USI) -DEF_FUNCTION_TYPE (V32HI, V16SF, V16SF, USI) -DEF_FUNCTION_TYPE (V16HI, V8SF, V8SF) -DEF_FUNCTION_TYPE (V16HI, V8SF, V8SF, V16HI, UHI) -DEF_FUNCTION_TYPE (V16HI, V8SF, V8SF, UHI) -DEF_FUNCTION_TYPE (V8HI, V4SF, V4SF) -DEF_FUNCTION_TYPE (V8HI, V4SF, V4SF, V8HI, UQI) -DEF_FUNCTION_TYPE (V8HI, V4SF, V4SF, UQI) -DEF_FUNCTION_TYPE (V16HI, V16SF) -DEF_FUNCTION_TYPE (V16HI, V16SF, V16HI, UHI) -DEF_FUNCTION_TYPE (V16HI, V16SF, UHI) -DEF_FUNCTION_TYPE (V8HI, V8SF) -DEF_FUNCTION_TYPE (V8HI, V8SF, V8HI, UQI) -DEF_FUNCTION_TYPE (V8HI, V8SF, UQI) -DEF_FUNCTION_TYPE (V8HI, V4SF) -DEF_FUNCTION_TYPE (V8HI, V4SF, V8HI, UQI) -DEF_FUNCTION_TYPE (V8HI, V4SF, UQI) -DEF_FUNCTION_TYPE (V16SF, V16SF, V32HI, V32HI) -DEF_FUNCTION_TYPE (V16SF, V16SF, V32HI, V32HI, UHI) -DEF_FUNCTION_TYPE (V8SF, V8SF, V16HI, V16HI) -DEF_FUNCTION_TYPE (V8SF, V8SF, V16HI, V16HI, UQI) -DEF_FUNCTION_TYPE (V4SF, V4SF, V8HI, V8HI) -DEF_FUNCTION_TYPE (V4SF, V4SF, V8HI, V8HI, UQI) +DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF) +DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF, V32BF, USI) +DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF, USI) +DEF_FUNCTION_TYPE (V16BF, V8SF, V8SF) +DEF_FUNCTION_TYPE (V16BF, V8SF, V8SF, V16BF, UHI) +DEF_FUNCTION_TYPE (V16BF, V8SF, V8SF, UHI) +DEF_FUNCTION_TYPE (V8BF, V4SF, V4SF) +DEF_FUNCTION_TYPE (V8BF, V4SF, V4SF, V8BF, UQI) +DEF_FUNCTION_TYPE (V8BF, V4SF, V4SF, UQI) +DEF_FUNCTION_TYPE (V16BF, V16SF) +DEF_FUNCTION_TYPE (V16BF, V16SF, V16BF, UHI) +DEF_FUNCTION_TYPE (V16BF, V16SF, UHI) +DEF_FUNCTION_TYPE (V8BF, V8SF) +DEF_FUNCTION_TYPE (V8BF, V8SF, V8BF, UQI) +DEF_FUNCTION_TYPE (V8BF, V8SF, UQI) +DEF_FUNCTION_TYPE (V8BF, V4SF) +DEF_FUNCTION_TYPE (V8BF, V4SF, V8BF, UQI) +DEF_FUNCTION_TYPE (V8BF, V4SF, UQI) +DEF_FUNCTION_TYPE (V16SF, V16SF, V32BF, V32BF) +DEF_FUNCTION_TYPE (V16SF, V16SF, V32BF, V32BF, UHI) +DEF_FUNCTION_TYPE (V8SF, V8SF, V16BF, V16BF) +DEF_FUNCTION_TYPE (V8SF, V8SF, V16BF, V16BF, UQI) +DEF_FUNCTION_TYPE (V4SF, V4SF, V8BF, V8BF) +DEF_FUNCTION_TYPE (V4SF, V4SF, V8BF, V8BF, UQI) =20 # KEYLOCKER builtins DEF_FUNCTION_TYPE (UINT, UINT, V2DI, V2DI, PVOID) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builti= n.def index e35306e27d0..5802e2049a8 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2779,33 +2779,33 @@ BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesencla= st_v32qi, "__builtin_ia32_vae BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v64qi, "__builtin_ia= 32_vaesenclast_v64qi", IX86_BUILTIN_VAESENCLAST64, UNKNOWN, (int) V64QI_FTY= PE_V64QI_V64QI) =20 /* BF16 */ -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32h= i, "__builtin_ia32_cvtne2ps2bf16_v32hi", IX86_BUILTIN_CVTNE2PS2HI16_V32HI, = UNKNOWN, (int) V32HI_FTYPE_V16SF_V16SF) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32h= i_mask, "__builtin_ia32_cvtne2ps2bf16_v32hi_mask", IX86_BUILTIN_CVTNE2PS2HI= 16_V32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V16SF_V16SF_V32HI_USI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32h= i_maskz, "__builtin_ia32_cvtne2ps2bf16_v32hi_maskz", IX86_BUILTIN_CVTNE2PS2= HI16_V32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V16SF_V16SF_USI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16h= i, "__builtin_ia32_cvtne2ps2bf16_v16hi", IX86_BUILTIN_CVTNE2PS2HI16_V16HI, = UNKNOWN, (int) V16HI_FTYPE_V8SF_V8SF) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16h= i_mask, "__builtin_ia32_cvtne2ps2bf16_v16hi_mask", IX86_BUILTIN_CVTNE2PS2HI= 16_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V8SF_V8SF_V16HI_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16h= i_maskz, "__builtin_ia32_cvtne2ps2bf16_v16hi_maskz", IX86_BUILTIN_CVTNE2PS2= HI16_V16HI_MASKZ, UNKNOWN, (int) V16HI_FTYPE_V8SF_V8SF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8hi= , "__builtin_ia32_cvtne2ps2bf16_v8hi", IX86_BUILTIN_CVTNE2PS2HI16_V8HI, UNK= NOWN, (int) V8HI_FTYPE_V4SF_V4SF) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8hi= _mask, "__builtin_ia32_cvtne2ps2bf16_v8hi_mask", IX86_BUILTIN_CVTNE2PS2HI16= _V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V4SF_V4SF_V8HI_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8hi= _maskz, "__builtin_ia32_cvtne2ps2bf16_v8hi_maskz", IX86_BUILTIN_CVTNE2PS2HI= 16_V8HI_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V4SF_V4SF_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf= , "__builtin_ia32_cvtneps2bf16_v16sf", IX86_BUILTIN_CVTNEPS2HI16_V16SF, UNK= NOWN, (int) V16HI_FTYPE_V16SF) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf= _mask, "__builtin_ia32_cvtneps2bf16_v16sf_mask", IX86_BUILTIN_CVTNEPS2HI16_= V16SF_MASK, UNKNOWN, (int) V16HI_FTYPE_V16SF_V16HI_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf= _maskz, "__builtin_ia32_cvtneps2bf16_v16sf_maskz", IX86_BUILTIN_CVTNE2PS2HI= 16_V16SF_MASKZ, UNKNOWN, (int) V16HI_FTYPE_V16SF_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf,= "__builtin_ia32_cvtneps2bf16_v8sf", IX86_BUILTIN_CVTNEPS2HI16_V8SF, UNKNOW= N, (int) V8HI_FTYPE_V8SF) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_= mask, "__builtin_ia32_cvtneps2bf16_v8sf_mask", IX86_BUILTIN_CVTNEPS2HI16_V8= SF_MASK, UNKNOWN, (int) V8HI_FTYPE_V8SF_V8HI_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_= maskz, "__builtin_ia32_cvtneps2bf16_v8sf_maskz", IX86_BUILTIN_CVTNE2PS2HI16= _V8SF_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V8SF_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf,= "__builtin_ia32_cvtneps2bf16_v4sf", IX86_BUILTIN_CVTNEPS2HI16_V4SF, UNKNOW= N, (int) V8HI_FTYPE_V4SF) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_= mask, "__builtin_ia32_cvtneps2bf16_v4sf_mask", IX86_BUILTIN_CVTNEPS2HI16_V4= SF_MASK, UNKNOWN, (int) V8HI_FTYPE_V4SF_V8HI_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_= maskz, "__builtin_ia32_cvtneps2bf16_v4sf_maskz", IX86_BUILTIN_CVTNE2PS2HI16= _V4SF_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V4SF_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf, "_= _builtin_ia32_dpbf16ps_v16sf", IX86_BUILTIN_DPHI16PS_V16SF, UNKNOWN, (int) = V16SF_FTYPE_V16SF_V32HI_V32HI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_mas= k, "__builtin_ia32_dpbf16ps_v16sf_mask", IX86_BUILTIN_DPHI16PS_V16SF_MASK, = UNKNOWN, (int) V16SF_FTYPE_V16SF_V32HI_V32HI_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_mas= kz, "__builtin_ia32_dpbf16ps_v16sf_maskz", IX86_BUILTIN_DPHI16PS_V16SF_MASK= Z, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32HI_V32HI_UHI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf, "__= builtin_ia32_dpbf16ps_v8sf", IX86_BUILTIN_DPHI16PS_V8SF, UNKNOWN, (int) V8S= F_FTYPE_V8SF_V16HI_V16HI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_mask= , "__builtin_ia32_dpbf16ps_v8sf_mask", IX86_BUILTIN_DPHI16PS_V8SF_MASK, UNK= NOWN, (int) V8SF_FTYPE_V8SF_V16HI_V16HI_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_mask= z, "__builtin_ia32_dpbf16ps_v8sf_maskz", IX86_BUILTIN_DPHI16PS_V8SF_MASKZ, = UNKNOWN, (int) V8SF_FTYPE_V8SF_V16HI_V16HI_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, "__= builtin_ia32_dpbf16ps_v4sf", IX86_BUILTIN_DPHI16PS_V4SF, UNKNOWN, (int) V4S= F_FTYPE_V4SF_V8HI_V8HI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask= , "__builtin_ia32_dpbf16ps_v4sf_mask", IX86_BUILTIN_DPHI16PS_V4SF_MASK, UNK= NOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI) -BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask= z, "__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPHI16PS_V4SF_MASKZ, = UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32b= f, "__builtin_ia32_cvtne2ps2bf16_v32bf", IX86_BUILTIN_CVTNE2PS2BF16_V32BF, = UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32b= f_mask, "__builtin_ia32_cvtne2ps2bf16_v32bf_mask", IX86_BUILTIN_CVTNE2PS2BF= 16_V32BF_MASK, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_V32BF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32b= f_maskz, "__builtin_ia32_cvtne2ps2bf16_v32bf_maskz", IX86_BUILTIN_CVTNE2PS2= BF16_V32BF_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_USI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16b= f, "__builtin_ia32_cvtne2ps2bf16_v16bf", IX86_BUILTIN_CVTNE2PS2BF16_V16BF, = UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16b= f_mask, "__builtin_ia32_cvtne2ps2bf16_v16bf_mask", IX86_BUILTIN_CVTNE2PS2BF= 16_V16BF_MASK, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16b= f_maskz, "__builtin_ia32_cvtne2ps2bf16_v16bf_maskz", IX86_BUILTIN_CVTNE2PS2= BF16_V16BF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf= , "__builtin_ia32_cvtne2ps2bf16_v8bf", IX86_BUILTIN_CVTNE2PS2BF16_V8BF, UNK= NOWN, (int) V8BF_FTYPE_V4SF_V4SF) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf= _mask, "__builtin_ia32_cvtne2ps2bf16_v8bf_mask", IX86_BUILTIN_CVTNE2PS2BF16= _V8BF_MASK, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf= _maskz, "__builtin_ia32_cvtne2ps2bf16_v8bf_maskz", IX86_BUILTIN_CVTNE2PS2BF= 16_V8BF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf= , "__builtin_ia32_cvtneps2bf16_v16sf", IX86_BUILTIN_CVTNEPS2BF16_V16SF, UNK= NOWN, (int) V16BF_FTYPE_V16SF) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf= _mask, "__builtin_ia32_cvtneps2bf16_v16sf_mask", IX86_BUILTIN_CVTNEPS2BF16_= V16SF_MASK, UNKNOWN, (int) V16BF_FTYPE_V16SF_V16BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf= _maskz, "__builtin_ia32_cvtneps2bf16_v16sf_maskz", IX86_BUILTIN_CVTNE2PS2BF= 16_V16SF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16SF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf,= "__builtin_ia32_cvtneps2bf16_v8sf", IX86_BUILTIN_CVTNEPS2BF16_V8SF, UNKNOW= N, (int) V8BF_FTYPE_V8SF) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_= mask, "__builtin_ia32_cvtneps2bf16_v8sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V8= SF_MASK, UNKNOWN, (int) V8BF_FTYPE_V8SF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_= maskz, "__builtin_ia32_cvtneps2bf16_v8sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16= _V8SF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8SF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf,= "__builtin_ia32_cvtneps2bf16_v4sf", IX86_BUILTIN_CVTNEPS2BF16_V4SF, UNKNOW= N, (int) V8BF_FTYPE_V4SF) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_= mask, "__builtin_ia32_cvtneps2bf16_v4sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V4= SF_MASK, UNKNOWN, (int) V8BF_FTYPE_V4SF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_= maskz, "__builtin_ia32_cvtneps2bf16_v4sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16= _V4SF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V4SF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf, "_= _builtin_ia32_dpbf16ps_v16sf", IX86_BUILTIN_DPBF16PS_V16SF, UNKNOWN, (int) = V16SF_FTYPE_V16SF_V32BF_V32BF) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_mas= k, "__builtin_ia32_dpbf16ps_v16sf_mask", IX86_BUILTIN_DPBF16PS_V16SF_MASK, = UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_mas= kz, "__builtin_ia32_dpbf16ps_v16sf_maskz", IX86_BUILTIN_DPBF16PS_V16SF_MASK= Z, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf, "__= builtin_ia32_dpbf16ps_v8sf", IX86_BUILTIN_DPBF16PS_V8SF, UNKNOWN, (int) V8S= F_FTYPE_V8SF_V16BF_V16BF) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_mask= , "__builtin_ia32_dpbf16ps_v8sf_mask", IX86_BUILTIN_DPBF16PS_V8SF_MASK, UNK= NOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_mask= z, "__builtin_ia32_dpbf16ps_v8sf_maskz", IX86_BUILTIN_DPBF16PS_V8SF_MASKZ, = UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, "__= builtin_ia32_dpbf16ps_v4sf", IX86_BUILTIN_DPBF16PS_V4SF, UNKNOWN, (int) V4S= F_FTYPE_V4SF_V8BF_V8BF) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask= , "__builtin_ia32_dpbf16ps_v4sf_mask", IX86_BUILTIN_DPBF16PS_V4SF_MASK, UNK= NOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF_UQI) +BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask= z, "__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPBF16PS_V4SF_MASKZ, = UNKNOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF_UQI) =20 /* AVX512FP16. */ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_add= v8hf3_mask, "__builtin_ia32_addph128_mask", IX86_BUILTIN_ADDPH128_MASK, UNK= NOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.c= c index 5d9e5a12f7e..8e1ef0b4c4a 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -10462,9 +10462,9 @@ ix86_expand_args_builtin (const struct builtin_desc= ription *d, case V8DF_FTYPE_V2DF: case V8DF_FTYPE_V8DF: case V4DI_FTYPE_V4DI: - case V16HI_FTYPE_V16SF: - case V8HI_FTYPE_V8SF: - case V8HI_FTYPE_V4SF: + case V16BF_FTYPE_V16SF: + case V8BF_FTYPE_V8SF: + case V8BF_FTYPE_V4SF: nargs =3D 1; break; case V4SF_FTYPE_V4SF_VEC_MERGE: @@ -10592,12 +10592,12 @@ ix86_expand_args_builtin (const struct builtin_de= scription *d, case USI_FTYPE_USI_USI: case UDI_FTYPE_UDI_UDI: case V16SI_FTYPE_V8DF_V8DF: - case V32HI_FTYPE_V16SF_V16SF: - case V16HI_FTYPE_V8SF_V8SF: - case V8HI_FTYPE_V4SF_V4SF: - case V16HI_FTYPE_V16SF_UHI: - case V8HI_FTYPE_V8SF_UQI: - case V8HI_FTYPE_V4SF_UQI: + case V32BF_FTYPE_V16SF_V16SF: + case V16BF_FTYPE_V8SF_V8SF: + case V8BF_FTYPE_V4SF_V4SF: + case V16BF_FTYPE_V16SF_UHI: + case V8BF_FTYPE_V8SF_UQI: + case V8BF_FTYPE_V4SF_UQI: nargs =3D 2; break; case V2DI_FTYPE_V2DI_INT_CONVERT: @@ -10803,15 +10803,15 @@ ix86_expand_args_builtin (const struct builtin_de= scription *d, case V16HI_FTYPE_V16HI_V16HI_V16HI: case V8SI_FTYPE_V8SI_V8SI_V8SI: case V8HI_FTYPE_V8HI_V8HI_V8HI: - case V32HI_FTYPE_V16SF_V16SF_USI: - case V16HI_FTYPE_V8SF_V8SF_UHI: - case V8HI_FTYPE_V4SF_V4SF_UQI: - case V16HI_FTYPE_V16SF_V16HI_UHI: - case V8HI_FTYPE_V8SF_V8HI_UQI: - case V8HI_FTYPE_V4SF_V8HI_UQI: - case V16SF_FTYPE_V16SF_V32HI_V32HI: - case V8SF_FTYPE_V8SF_V16HI_V16HI: - case V4SF_FTYPE_V4SF_V8HI_V8HI: + case V32BF_FTYPE_V16SF_V16SF_USI: + case V16BF_FTYPE_V8SF_V8SF_UHI: + case V8BF_FTYPE_V4SF_V4SF_UQI: + case V16BF_FTYPE_V16SF_V16BF_UHI: + case V8BF_FTYPE_V8SF_V8BF_UQI: + case V8BF_FTYPE_V4SF_V8BF_UQI: + case V16SF_FTYPE_V16SF_V32BF_V32BF: + case V8SF_FTYPE_V8SF_V16BF_V16BF: + case V4SF_FTYPE_V4SF_V8BF_V8BF: nargs =3D 3; break; case V32QI_FTYPE_V32QI_V32QI_INT: @@ -10958,9 +10958,9 @@ ix86_expand_args_builtin (const struct builtin_desc= ription *d, case V16HI_FTYPE_V32QI_V32QI_V16HI_UHI: case V8SI_FTYPE_V16HI_V16HI_V8SI_UQI: case V4SI_FTYPE_V8HI_V8HI_V4SI_UQI: - case V32HI_FTYPE_V16SF_V16SF_V32HI_USI: - case V16HI_FTYPE_V8SF_V8SF_V16HI_UHI: - case V8HI_FTYPE_V4SF_V4SF_V8HI_UQI: + case V32BF_FTYPE_V16SF_V16SF_V32BF_USI: + case V16BF_FTYPE_V8SF_V8SF_V16BF_UHI: + case V8BF_FTYPE_V4SF_V4SF_V8BF_UQI: nargs =3D 4; break; case V2DF_FTYPE_V2DF_V2DF_V2DI_INT: @@ -10998,9 +10998,9 @@ ix86_expand_args_builtin (const struct builtin_desc= ription *d, break; case UCHAR_FTYPE_UCHAR_UINT_UINT_PUNSIGNED: case UCHAR_FTYPE_UCHAR_ULONGLONG_ULONGLONG_PULONGLONG: - case V16SF_FTYPE_V16SF_V32HI_V32HI_UHI: - case V8SF_FTYPE_V8SF_V16HI_V16HI_UQI: - case V4SF_FTYPE_V4SF_V8HI_V8HI_UQI: + case V16SF_FTYPE_V16SF_V32BF_V32BF_UHI: + case V8SF_FTYPE_V8SF_V16BF_V16BF_UQI: + case V4SF_FTYPE_V4SF_V8BF_V8BF_UQI: nargs =3D 4; break; case UQI_FTYPE_V8DI_V8DI_INT_UQI: diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index ddea249d09b..c62d50f1951 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -118,9 +118,11 @@ =20 #include =20 +#ifdef __SSE2__ #include =20 #include +#endif =20 #include =20 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index f4b5506703f..fba81a93c1a 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -187,8 +187,6 @@ UNSPEC_VP2INTERSECT =20 ;; For AVX512BF16 support - UNSPEC_VCVTNE2PS2BF16 - UNSPEC_VCVTNEPS2BF16 UNSPEC_VDPBF16PS =20 ;; For AVX512FP16 suppport @@ -28918,41 +28916,101 @@ "vp2intersectd\t{%2, %1, %0|%0, %1, %2}" [(set_attr ("prefix") ("evex"))]) =20 -(define_mode_iterator BF16 [V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_= AVX512VL")]) +(define_mode_iterator VF_AVX512BF16VL + [V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")]) ;; Converting from BF to SF (define_mode_attr bf16_cvt_2sf - [(V32HI "V16SF") (V16HI "V8SF") (V8HI "V4SF")]) + [(V32BF "V16SF") (V16BF "V8SF") (V8BF "V4SF")]) ;; Converting from SF to BF (define_mode_attr sf_cvt_bf16 - [(V4SF "V8HI") (V8SF "V8HI") (V16SF "V16HI")]) + [(V8SF "V8BF") (V16SF "V16BF")]) ;; Mapping from BF to SF (define_mode_attr sf_bf16 - [(V4SF "V8HI") (V8SF "V16HI") (V16SF "V32HI")]) + [(V4SF "V8BF") (V8SF "V16BF") (V16SF "V32BF")]) =20 (define_expand "avx512f_cvtne2ps2bf16__maskz" - [(match_operand:BF16 0 "register_operand") + [(match_operand:VF_AVX512BF16VL 0 "register_operand") (match_operand: 1 "register_operand") - (match_operand: 2 "register_operand") + (match_operand: 2 "nonimmediate_operand") (match_operand: 3 "register_operand")] "TARGET_AVX512BF16" { - emit_insn (gen_avx512f_cvtne2ps2bf16__mask(operands[0], operands[1= ], - operands[2], CONST0_RTX(mode), operands[3])); + emit_insn (gen_avx512f_cvtne2ps2bf16__mask(operands[0], operands[2= ], + operands[1], CONST0_RTX(mode), operands[3])); DONE; }) =20 (define_insn "avx512f_cvtne2ps2bf16_" - [(set (match_operand:BF16 0 "register_operand" "=3Dv") - (unspec:BF16 - [(match_operand: 1 "register_operand" "v") - (match_operand: 2 "register_operand" "v")] - UNSPEC_VCVTNE2PS2BF16))] + [(set (match_operand:VF_AVX512BF16VL 0 "register_operand" "=3Dv") + (vec_concat:VF_AVX512BF16VL + (float_truncate: + (match_operand: 2 "nonimmediate_operand" "vm")) + (float_truncate: + (match_operand: 1 "register_operand" "v"))))] "TARGET_AVX512BF16" "vcvtne2ps2bf16\t{%2, %1, %0|%0, %1, %2}") =20 +(define_expand "avx512f_cvtneps2bf16_v4sf" + [(set (match_operand:V8BF 0 "register_operand") + (vec_concat:V8BF + (float_truncate:V4BF + (match_operand:V4SF 1 "nonimmediate_operand")) + (match_dup 2)))] + "TARGET_AVX512BF16 && TARGET_AVX512VL" + "operands[2] =3D CONST0_RTX (V4BFmode);") + +(define_insn "*avx512f_cvtneps2bf16_v4sf" + [(set (match_operand:V8BF 0 "register_operand" "=3Dv") + (vec_concat:V8BF + (float_truncate:V4BF + (match_operand:V4SF 1 "nonimmediate_operand" "vm")) + (match_operand:V4BF 2 "const0_operand")))] + "TARGET_AVX512BF16 && TARGET_AVX512VL" + "vcvtneps2bf16{x}\t{%1, %0|%0, %1}") + +(define_expand "avx512f_cvtneps2bf16_v4sf_maskz" + [(match_operand:V8BF 0 "register_operand") + (match_operand:V4SF 1 "nonimmediate_operand") + (match_operand:QI 2 "register_operand")] + "TARGET_AVX512BF16 && TARGET_AVX512VL" +{ + emit_insn (gen_avx512f_cvtneps2bf16_v4sf_mask_1(operands[0], operands[1]= , + CONST0_RTX(V8BFmode), operands[2], CONST0_RTX(V4BFmode))); + DONE; +}) + +(define_expand "avx512f_cvtneps2bf16_v4sf_mask" + [(match_operand:V8BF 0 "register_operand") + (match_operand:V4SF 1 "nonimmediate_operand") + (match_operand:V8BF 2 "nonimm_or_0_operand") + (match_operand:QI 3 "register_operand")] + "TARGET_AVX512BF16 && TARGET_AVX512VL" +{ + emit_insn (gen_avx512f_cvtneps2bf16_v4sf_mask_1(operands[0], operands[1]= , + operands[2], operands[3], CONST0_RTX(V4BFmode))); + DONE; +}) + +(define_insn "avx512f_cvtneps2bf16_v4sf_mask_1" + [(set (match_operand:V8BF 0 "register_operand" "=3Dv") + (vec_concat:V8BF + (vec_merge:V4BF + (float_truncate:V4BF + (match_operand:V4SF 1 "nonimmediate_operand" "vm")) + (vec_select:V4BF + (match_operand:V8BF 2 "nonimm_or_0_operand" "0C") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3)])) + (match_operand:QI 3 "register_operand" "Yk")) + (match_operand:V4BF 4 "const0_operand")))] + "TARGET_AVX512BF16 && TARGET_AVX512VL" + "vcvtneps2bf16{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}") + +(define_mode_iterator VF1_AVX512_256 [V16SF (V8SF "TARGET_AVX512VL")]) + (define_expand "avx512f_cvtneps2bf16__maskz" [(match_operand: 0 "register_operand") - (match_operand:VF1_AVX512VL 1 "register_operand") + (match_operand:VF1_AVX512_256 1 "nonimmediate_operand") (match_operand: 2 "register_operand")] "TARGET_AVX512BF16" { @@ -28963,11 +29021,10 @@ =20 (define_insn "avx512f_cvtneps2bf16_" [(set (match_operand: 0 "register_operand" "=3Dv") - (unspec: - [(match_operand:VF1_AVX512VL 1 "register_operand" "v")] - UNSPEC_VCVTNEPS2BF16))] + (float_truncate: + (match_operand:VF1_AVX512_256 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512BF16" - "vcvtneps2bf16\t{%1, %0|%0, %1}") + "vcvtneps2bf16\t{%1, %0|%0, %1}= ") =20 (define_expand "avx512f_dpbf16ps__maskz" [(match_operand:VF1_AVX512VL 0 "register_operand") @@ -28987,7 +29044,7 @@ (unspec:VF1_AVX512VL [(match_operand:VF1_AVX512VL 1 "register_operand" "0") (match_operand: 2 "register_operand" "v") - (match_operand: 3 "register_operand" "v")] + (match_operand: 3 "nonimmediate_operand" "vm")] UNSPEC_VDPBF16PS))] "TARGET_AVX512BF16" "vdpbf16ps\t{%3, %2, %0|%0, %2= , %3}") @@ -28998,7 +29055,7 @@ (unspec:VF1_AVX512VL [(match_operand:VF1_AVX512VL 1 "register_operand" "0") (match_operand: 2 "register_operand" "v") - (match_operand: 3 "register_operand" "v")] + (match_operand: 3 "nonimmediate_operand" "vm")] UNSPEC_VDPBF16PS) (match_dup 1) (match_operand: 4 "register_operand" "Yk"))= )] diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16-cvtsbh2ss-1.c b/gcc/t= estsuite/gcc.target/i386/avx512bf16-cvtsbh2ss-1.c index 831abd37d80..8e929e6f159 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bf16-cvtsbh2ss-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512bf16-cvtsbh2ss-1.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-mavx512bf16 -O2" } */ -/* { dg-additional-options "-fno-PIE" { target ia32 } } */ +/* { dg-additional-options "-fno-PIE -mfpmath=3Dsse" { target ia32 } } */ /* { dg-final { scan-assembler-times "sall\[ \\t\]+\[^\{\n\]*16" 1 } } */ /* { dg-final { scan-assembler-times "movl" 1 } } */ =20 diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16-vdpbf16ps-2.c b/gcc/t= estsuite/gcc.target/i386/avx512bf16-vdpbf16ps-2.c index b64ad7b84dd..02ebdd8cf5b 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bf16-vdpbf16ps-2.c +++ b/gcc/testsuite/gcc.target/i386/avx512bf16-vdpbf16ps-2.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-mavx512bf16 -O2" } */ -/* { dg-final { scan-assembler-times "vdpbf16ps\[ \\t\]+\[^\{\n\]*%zmm\[0-= 9\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|= \[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vdpbf16ps\[ \\t\]+\[^\{\n\]*\[^\{\n\= ]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)"= 1 } } */ =20 #include =20 diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c b/g= cc/testsuite/gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c index 8f21b1bfdae..b71addd6301 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-mavx512bf16 -mavx512vl -O2" } */ -/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm= \[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneps2bf16x\[ \\t\]+\[^\{\n\]*%xm= m\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ =20 #include =20 diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c b= /gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c index 0969ae1b35e..d3a9bdf8c34 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c @@ -1,11 +1,11 @@ /* { dg-do compile } */ /* { dg-options "-mavx512bf16 -mavx512vl -O2" } */ -/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm= \[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm= \[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm= \[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" = 1 } } */ -/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm= \[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm= \[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm= \[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" = 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneps2bf16y\[ \\t\]+\[^\{\n\]*%ym= m\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneps2bf16y\[ \\t\]+\[^\{\n\]*%ym= m\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneps2bf16y\[ \\t\]+\[^\{\n\]*%ym= m\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)"= 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneps2bf16x\[ \\t\]+\[^\{\n\]*%xm= m\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneps2bf16x\[ \\t\]+\[^\{\n\]*%xm= m\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneps2bf16x\[ \\t\]+\[^\{\n\]*%xm= m\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)"= 1 } } */ =20 #include =20 --=20 2.27.0