From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kyrylo Tkachov
To: Christophe Lyon
CC: "gcc-patches@gcc.gnu.org"
Subject: RE: [PATCH 8/9] arm: Auto-vectorization for MVE: vld2/vst2
Date: Mon, 24 May 2021 12:15:26 +0000
References: <1619791790-628-1-git-send-email-christophe.lyon@linaro.org>
 <1619791790-628-8-git-send-email-christophe.lyon@linaro.org>
In-Reply-To: <1619791790-628-8-git-send-email-christophe.lyon@linaro.org>
List-Id: Gcc-patches mailing list

> -----Original Message-----
> From: Gcc-patches On Behalf Of Christophe Lyon via Gcc-patches
> Sent: 30 April 2021 15:10
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 8/9] arm: Auto-vectorization for MVE: vld2/vst2
>
> This patch enables MVE vld2/vst2 instructions for auto-vectorization.
> We move the existing expanders from neon.md and enable them for MVE,
> calling the respective emitter.

Ok.
Thanks,
Kyrill

>
> 2021-03-12  Christophe Lyon
>
>         gcc/
>         * config/arm/neon.md (vec_load_lanesoi)
>         (vec_store_lanesoi): Move ...
>         * config/arm/vec-common.md: here.
>
>         gcc/testsuite/
>         * gcc.target/arm/simd/mve-vld2.c: New test, derived from
>         slp-perm-2.c
> ---
>  gcc/config/arm/neon.md                       | 14 ----
>  gcc/config/arm/vec-common.md                 | 27 ++++++++
>  gcc/testsuite/gcc.target/arm/simd/mve-vld2.c | 96 ++++++++++++++++++++++++++++
>  3 files changed, 123 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vld2.c
>
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 6660846..bc8775c 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -5063,13 +5063,6 @@ (define_insn "neon_vld2"
>                      (const_string "neon_load2_2reg")))]
>  )
>
> -(define_expand "vec_load_lanesoi"
> -  [(set (match_operand:OI 0 "s_register_operand")
> -        (unspec:OI [(match_operand:OI 1 "neon_struct_operand")
> -                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> -                   UNSPEC_VLD2))]
> -  "TARGET_NEON")
> -
>  (define_insn "neon_vld2"
>    [(set (match_operand:OI 0 "s_register_operand" "=w")
>          (unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um")
> @@ -5197,13 +5190,6 @@ (define_insn "neon_vst2"
>                      (const_string "neon_store2_one_lane")))]
>  )
>
> -(define_expand "vec_store_lanesoi"
> -  [(set (match_operand:OI 0 "neon_struct_operand")
> -        (unspec:OI [(match_operand:OI 1 "s_register_operand")
> -                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> -                   UNSPEC_VST2))]
> -  "TARGET_NEON")
> -
>  (define_insn "neon_vst2"
>    [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
>          (unspec:OI [(match_operand:OI 1 "s_register_operand" "w")
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index 3fd341c..7abefea 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -482,6 +482,33 @@ (define_expand "vcond_mask_"
>      }
>    else
>      gcc_unreachable ();
> +  DONE;
> +})
>
> +(define_expand "vec_load_lanesoi"
> +  [(set (match_operand:OI 0 "s_register_operand")
> +        (unspec:OI [(match_operand:OI 1 "neon_struct_operand")
> +                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +                   UNSPEC_VLD2))]
> +  "TARGET_NEON || TARGET_HAVE_MVE"
> +{
> +  if (TARGET_NEON)
> +    emit_insn (gen_neon_vld2 (operands[0], operands[1]));
> +  else
> +    emit_insn (gen_mve_vld2q (operands[0], operands[1]));
> +  DONE;
> +})
> +
> +(define_expand "vec_store_lanesoi"
> +  [(set (match_operand:OI 0 "neon_struct_operand")
> +        (unspec:OI [(match_operand:OI 1 "s_register_operand")
> +                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +                   UNSPEC_VST2))]
> +  "TARGET_NEON || TARGET_HAVE_MVE"
> +{
> +  if (TARGET_NEON)
> +    emit_insn (gen_neon_vst2 (operands[0], operands[1]));
> +  else
> +    emit_insn (gen_mve_vst2q (operands[0], operands[1]));
>    DONE;
>  })
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vld2.c b/gcc/testsuite/gcc.target/arm/simd/mve-vld2.c
> new file mode 100644
> index 0000000..9c7c3f5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vld2.c
> @@ -0,0 +1,96 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include <stdint.h>
> +
> +#define M00 100
> +#define M10 216
> +#define M01 1322
> +#define M11 13
> +
> +#define N 128
> +
> +
> +/* Integer tests.  */
> +#define FUNC(SIGN, TYPE, BITS) \
> +  void foo_##SIGN##BITS##x (TYPE##BITS##_t *__restrict__ pInput, \
> +                            TYPE##BITS##_t *__restrict__ pOutput) \
> +  { \
> +    unsigned int i; \
> +    TYPE##BITS##_t a, b; \
> +    \
> +    for (i = 0; i < N / BITS; i++) \
> +      { \
> +        a = *pInput++; \
> +        b = *pInput++; \
> +        \
> +        *pOutput++ = M00 * a + M01 * b; \
> +        *pOutput++ = M10 * a + M11 * b; \
> +      } \
> +  }
> +
> +FUNC(s, int, 8)
> +FUNC(u, uint, 8)
> +FUNC(s, int, 16)
> +FUNC(u, uint, 16)
> +FUNC(s, int, 32)
> +FUNC(u, uint, 32)
> +
> +/* float test, keep the macro because it's similar to the above, but does not
> +   need the ##BITS##_t.  */
> +#define FUNC_FLOAT(SIGN, TYPE, BITS) \
> +  void foo_##SIGN##BITS##x (TYPE *__restrict__ pInput, \
> +                            TYPE *__restrict__ pOutput) \
> +  { \
> +    unsigned int i; \
> +    TYPE a, b; \
> +    \
> +    for (i = 0; i < N / BITS; i++) \
> +      { \
> +        a = *pInput++; \
> +        b = *pInput++; \
> +        \
> +        *pOutput++ = M00 * a + M01 * b; \
> +        *pOutput++ = M10 * a + M11 * b; \
> +      } \
> +  }
> +
> +FUNC_FLOAT(f, float, 32)
> +
> +/* __fp16 test, needs explicit casts to avoid conversions to floating-point and
> +   failure to vectorize.  */
> +__fp16 M00_fp16 = 100.0f16;
> +__fp16 M10_fp16 = 216.0f16;
> +__fp16 M01_fp16 = 1322.0f16;
> +__fp16 M11_fp16 = 13.0f16;
> +
> +#define FUNC_FLOAT_FP16(SIGN, TYPE, BITS) \
> +  void foo_##SIGN##BITS##x (TYPE *__restrict__ pInput, \
> +                            TYPE *__restrict__ pOutput) \
> +  { \
> +    unsigned int i; \
> +    TYPE a, b; \
> +    \
> +    for (i = 0; i < N / BITS; i++) \
> +      { \
> +        a = *pInput++; \
> +        b = *pInput++; \
> +        \
> +        *pOutput++ = (__fp16)(M00_fp16 * a) + (__fp16)(M01_fp16 * b); \
> +        *pOutput++ = (__fp16)(M10_fp16 * a) + (__fp16)(M11_fp16 * b); \
> +      } \
> +  }
> +
> +FUNC_FLOAT_FP16(f, __fp16, 16)
> +
> +/* vld2X.8 is used for signed and unsigned chars: 2 pairs.  */
> +/* vld2X.16 is used for signed and unsigned shorts and __fp16: 3 pairs.  */
> +/* vld2X.32 is used for signed and unsigned ints and float: 3 pairs.  */
> +/* { dg-final { scan-assembler-times {vld2[01].8\t.q[0-9]+, q[0-9]+., } 4 } } */
> +/* { dg-final { scan-assembler-times {vld2[01].16\t.q[0-9]+, q[0-9]+., } 6 } } */
> +/* { dg-final { scan-assembler-times {vld2[01].32\t.q[0-9]+, q[0-9]+., } 6 } } */
> +/* { dg-final { scan-assembler-times {vst2[01].8\t.q[0-9]+, q[0-9]+., } 4 } } */
> +/* { dg-final { scan-assembler-times {vst2[01].16\t.q[0-9]+, q[0-9]+., } 6 } } */
> +/* { dg-final { scan-assembler-times {vst2[01].32\t.q[0-9]+, q[0-9]+., } 6 } } */
> --
> 2.7.4
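
For anyone reading the test without the ISA manual at hand, the loop shape it relies on can be modelled in plain C. This is only a rough sketch of the de-interleave / compute / re-interleave pattern that the vec_load_lanes and vec_store_lanes expanders let the vectorizer use. The names LANES, load_pairs, store_pairs and transform_pairs are hypothetical and not part of the patch. The coefficients are the M00/M01/M10/M11 values from the test. It does not try to reproduce how the MVE vld20/vld21 instruction pair splits the work between its two halves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 128-bit vector holding four 32-bit elements.  */
enum { LANES = 4 };

/* Scalar model of a two-way de-interleaving load: src holds LANES
   pairs {a0, b0, a1, b1, ...}; the a's and b's end up in separate
   "registers".  */
static void load_pairs(const int32_t *src, int32_t a[LANES], int32_t b[LANES])
{
  for (size_t i = 0; i < LANES; i++) {
    a[i] = src[2 * i];
    b[i] = src[2 * i + 1];
  }
}

/* Scalar model of the matching interleaving store.  */
static void store_pairs(int32_t *dst, const int32_t x[LANES],
                        const int32_t y[LANES])
{
  for (size_t i = 0; i < LANES; i++) {
    dst[2 * i] = x[i];
    dst[2 * i + 1] = y[i];
  }
}

/* Same arithmetic as the loop body in the test, processed LANES
   pairs at a time.  n_pairs must be a multiple of LANES.  */
void transform_pairs(const int32_t *in, int32_t *out, size_t n_pairs)
{
  assert(n_pairs % LANES == 0);
  for (size_t p = 0; p < n_pairs; p += LANES) {
    int32_t a[LANES], b[LANES], x[LANES], y[LANES];
    load_pairs(in + 2 * p, a, b);
    for (size_t i = 0; i < LANES; i++) {
      x[i] = 100 * a[i] + 1322 * b[i];   /* M00 * a + M01 * b */
      y[i] = 216 * a[i] + 13 * b[i];     /* M10 * a + M11 * b */
    }
    store_pairs(out + 2 * p, x, y);
  }
}
```

Calling transform_pairs on a buffer of interleaved (a, b) pairs gives the same results as the scalar loop in the test for the 32-bit integer case; the point is only that each iteration needs the a's and b's in separate vectors, which is what a two-element structure load provides.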