From: Kyrylo Tkachov
To: Christophe Lyon
CC: "gcc-patches@gcc.gnu.org"
Subject: RE: [PATCH 9/9] arm: Auto-vectorization for MVE: vld4/vst4
Date: Mon, 24 May 2021 12:15:58 +0000
In-Reply-To: <1619791790-628-9-git-send-email-christophe.lyon@linaro.org>

> -----Original Message-----
> From: Gcc-patches On Behalf Of Christophe Lyon via Gcc-patches
> Sent: 30 April 2021 15:10
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 9/9] arm: Auto-vectorization for MVE: vld4/vst4
>
> This patch enables MVE vld4/vst4 instructions for auto-vectorization.
> We move the existing expanders from neon.md and enable them for MVE,
> calling the respective emitter.

Ok.
Thanks,
Kyrill

>
> 2021-03-12  Christophe Lyon
>
> gcc/
> 	* config/arm/neon.md (vec_load_lanesxi)
> 	(vec_store_lanesxi): Move ...
> 	* config/arm/vec-common.md: here.
>
> gcc/testsuite/
> 	* gcc.target/arm/simd/mve-vld4.c: New test, derived from
> 	slp-perm-3.c
> ---
>  gcc/config/arm/neon.md                       |  20 ----
>  gcc/config/arm/vec-common.md                 |  26 +++++
>  gcc/testsuite/gcc.target/arm/simd/mve-vld4.c | 140 +++++++++++++++++++++++++++
>  3 files changed, 166 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vld4.c
>
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index bc8775c..fb58baf 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -5617,16 +5617,6 @@ (define_insn "neon_vld4"
>                      (const_string "neon_load4_4reg")))]
>  )
>
> -(define_expand "vec_load_lanesxi"
> -  [(match_operand:XI 0 "s_register_operand")
> -   (match_operand:XI 1 "neon_struct_operand")
> -   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> -  "TARGET_NEON"
> -{
> -  emit_insn (gen_neon_vld4 (operands[0], operands[1]));
> -  DONE;
> -})
> -
>  (define_expand "neon_vld4"
>    [(match_operand:XI 0 "s_register_operand")
>     (match_operand:XI 1 "neon_struct_operand")
> @@ -5818,16 +5808,6 @@ (define_insn "neon_vst4"
>                      (const_string "neon_store4_4reg")))]
>  )
>
> -(define_expand "vec_store_lanesxi"
> -  [(match_operand:XI 0 "neon_struct_operand")
> -   (match_operand:XI 1 "s_register_operand")
> -   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> -  "TARGET_NEON"
> -{
> -  emit_insn (gen_neon_vst4 (operands[0], operands[1]));
> -  DONE;
> -})
> -
>  (define_expand "neon_vst4"
>    [(match_operand:XI 0 "neon_struct_operand")
>     (match_operand:XI 1 "s_register_operand")
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index 7abefea..d46b78d 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -512,3 +512,29 @@ (define_expand "vec_store_lanesoi"
>    emit_insn (gen_mve_vst2q (operands[0], operands[1]));
>    DONE;
>  })
> +
> +(define_expand "vec_load_lanesxi"
> +  [(match_operand:XI 0 "s_register_operand")
> +   (match_operand:XI 1 "neon_struct_operand")
> +   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +  "TARGET_NEON || TARGET_HAVE_MVE"
> +{
> +  if (TARGET_NEON)
> +    emit_insn (gen_neon_vld4 (operands[0], operands[1]));
> +  else
> +    emit_insn (gen_mve_vld4q (operands[0], operands[1]));
> +  DONE;
> +})
> +
> +(define_expand "vec_store_lanesxi"
> +  [(match_operand:XI 0 "neon_struct_operand")
> +   (match_operand:XI 1 "s_register_operand")
> +   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +  "TARGET_NEON || TARGET_HAVE_MVE"
> +{
> +  if (TARGET_NEON)
> +    emit_insn (gen_neon_vst4 (operands[0], operands[1]));
> +  else
> +    emit_insn (gen_mve_vst4q (operands[0], operands[1]));
> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vld4.c b/gcc/testsuite/gcc.target/arm/simd/mve-vld4.c
> new file mode 100644
> index 0000000..ce3e755
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vld4.c
> @@ -0,0 +1,140 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include
> +
> +#define M00 100
> +#define M10 216
> +#define M20 23
> +#define M30 237
> +#define M01 1322
> +#define M11 13
> +#define M21 27271
> +#define M31 2280
> +#define M02 74
> +#define M12 191
> +#define M22 500
> +#define M32 111
> +#define M03 134
> +#define M13 117
> +#define M23 11
> +#define M33 771
> +
> +#define N 128
> +
> +/* Integer tests.  */
> +#define FUNC(SIGN, TYPE, BITS)                                            \
> +  void foo_##SIGN##BITS##x (TYPE##BITS##_t *__restrict__ pInput,          \
> +                            TYPE##BITS##_t *__restrict__ pOutput)         \
> +  {                                                                       \
> +    unsigned int i;                                                       \
> +    TYPE##BITS##_t a, b, c, d;                                            \
> +                                                                          \
> +    for (i = 0; i < N / BITS; i++)                                        \
> +      {                                                                   \
> +        a = *pInput++;                                                    \
> +        b = *pInput++;                                                    \
> +        c = *pInput++;                                                    \
> +        d = *pInput++;                                                    \
> +                                                                          \
> +        *pOutput++ = M00 * a + M01 * b + M02 * c + M03 * d;               \
> +        *pOutput++ = M10 * a + M11 * b + M12 * c + M13 * d;               \
> +        *pOutput++ = M20 * a + M21 * b + M22 * c + M23 * d;               \
> +        *pOutput++ = M30 * a + M31 * b + M32 * c + M33 * d;               \
> +      }                                                                   \
> +  }
> +
> +FUNC(s, int, 8)
> +FUNC(u, uint, 8)
> +FUNC(s, int, 16)
> +FUNC(u, uint, 16)
> +FUNC(s, int, 32)
> +FUNC(u, uint, 32)
> +
> +/* float test, keep the macro because it's similar to the above, but does not
> +   need the ##BITS##_t.  */
> +#define FUNC_FLOAT(SIGN, TYPE, BITS)                                      \
> +  void foo_##SIGN##BITS##x (TYPE *__restrict__ pInput,                    \
> +                            TYPE *__restrict__ pOutput)                   \
> +  {                                                                       \
> +    unsigned int i;                                                       \
> +    TYPE a, b, c, d;                                                      \
> +                                                                          \
> +    for (i = 0; i < N / BITS; i++)                                        \
> +      {                                                                   \
> +        a = *pInput++;                                                    \
> +        b = *pInput++;                                                    \
> +        c = *pInput++;                                                    \
> +        d = *pInput++;                                                    \
> +                                                                          \
> +        *pOutput++ = M00 * a + M01 * b + M02 * c + M03 * d;               \
> +        *pOutput++ = M10 * a + M11 * b + M12 * c + M13 * d;               \
> +        *pOutput++ = M20 * a + M21 * b + M22 * c + M23 * d;               \
> +        *pOutput++ = M30 * a + M31 * b + M32 * c + M33 * d;               \
> +      }                                                                   \
> +  }
> +
> +FUNC_FLOAT(f, float, 32)
> +
> +/* __fp16 test, needs explicit casts to avoid conversions to floating-point and
> +   failure to vectorize.  */
> +__fp16 M00_fp16 = 100.0f16;
> +__fp16 M10_fp16 = 216.0f16;
> +__fp16 M20_fp16 = 23.0f16;
> +__fp16 M30_fp16 = 237.0f16;
> +__fp16 M01_fp16 = 1322.0f16;
> +__fp16 M11_fp16 = 13.0f16;
> +__fp16 M21_fp16 = 27271.0f16;
> +__fp16 M31_fp16 = 2280.0f16;
> +__fp16 M02_fp16 = 74.0f16;
> +__fp16 M12_fp16 = 191.0f16;
> +__fp16 M22_fp16 = 500.0f16;
> +__fp16 M32_fp16 = 111.0f16;
> +__fp16 M03_fp16 = 134.0f16;
> +__fp16 M13_fp16 = 117.0f16;
> +__fp16 M23_fp16 = 11.0f16;
> +__fp16 M33_fp16 = 771.0f16;
> +
> +#define FUNC_FLOAT_FP16(SIGN, TYPE, BITS)                                 \
> +  void foo_##SIGN##BITS##x (TYPE *__restrict__ pInput,                    \
> +                            TYPE *__restrict__ pOutput)                   \
> +  {                                                                       \
> +    unsigned int i;                                                       \
> +    TYPE a, b, c, d;                                                      \
> +                                                                          \
> +    for (i = 0; i < N / BITS; i++)                                        \
> +      {                                                                   \
> +        a = *pInput++;                                                    \
> +        b = *pInput++;                                                    \
> +        c = *pInput++;                                                    \
> +        d = *pInput++;                                                    \
> +                                                                          \
> +        TYPE ab, cd;                                                      \
> +        ab = (__fp16)(M00_fp16 * a) + (__fp16)(M01_fp16 * b);             \
> +        cd = (__fp16)(M02_fp16 * c) + (__fp16)(M03_fp16 * d);             \
> +        *pOutput++ = ab + cd;                                             \
> +        ab = (__fp16)(M10_fp16 * a) + (__fp16)(M11_fp16 * b);             \
> +        cd = (__fp16)(M12_fp16 * c) + (__fp16)(M13_fp16 * d);             \
> +        *pOutput++ = ab + cd;                                             \
> +        ab = (__fp16)(M20_fp16 * a) + (__fp16)(M21_fp16 * b);             \
> +        cd = (__fp16)(M22_fp16 * c) + (__fp16)(M23_fp16 * d);             \
> +        *pOutput++ = ab + cd;                                             \
> +        ab = (__fp16)(M30_fp16 * a) + (__fp16)(M31_fp16 * b);             \
> +        cd = (__fp16)(M32_fp16 * c) + (__fp16)(M33_fp16 * d);             \
> +        *pOutput++ = ab + cd;                                             \
> +      }                                                                   \
> +  }
> +
> +FUNC_FLOAT_FP16(f, __fp16, 16)
> +
> +/* vld4X.8 is used for signed and unsigned chars: 2 * 4.  */
> +/* vld4X.16 is used for signed and unsigned shorts and __fp16: 3 * 4.  */
> +/* vld4X.32 is used for signed and unsigned ints and float: 3 * 4.  */
> +/* { dg-final { scan-assembler-times {vld4[0123].8\t.q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+., } 8 } } */
> +/* { dg-final { scan-assembler-times {vld4[0123].16\t.q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+., } 12 } } */
> +/* { dg-final { scan-assembler-times {vld4[0123].32\t.q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+., } 12 } } */
> +/* { dg-final { scan-assembler-times {vst4[0123].8\t.q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+., } 8 } } */
> +/* { dg-final { scan-assembler-times {vst4[0123].16\t.q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+., } 12 } } */
> +/* { dg-final { scan-assembler-times {vst4[0123].32\t.q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+., } 12 } } */
> --
> 2.7.4
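For readers less familiar with the load-lanes/store-lanes patterns that the moved expanders implement, a rough scalar model may help. This is a sketch, not GCC code: it assumes 128-bit vectors of four 32-bit lanes (the int32/uint32 case of the test), reuses the M00..M33 coefficients from mve-vld4.c, and the function names (load_lanes4, store_lanes4, transform_scalar, transform_vectorized) are hypothetical. A four-way load-lanes (vld4 on Neon, the vld40..vld43 sequence on MVE) de-interleaves groups of four consecutive elements so that vector j holds field j of every group, and store-lanes re-interleaves them. That matches the shape of the test loop, where a, b, c and d come from four consecutive *pInput++ reads and the four results are written back consecutively.

```c
#include <stdint.h>

/* Assumed shape (hypothetical model): 4 vectors (vld4/vst4) of 4 lanes,
   i.e. 128-bit vectors of 32-bit elements.  */
#define NVEC  4
#define NLANE 4

/* Coefficients copied from the M00..M33 macros in mve-vld4.c:
   M[r][c] is the weight of input field c in output field r.  */
static const uint32_t M[NVEC][NVEC] = {
  { 100,  1322,  74, 134 },
  { 216,    13, 191, 117 },
  {  23, 27271, 500,  11 },
  { 237,  2280, 111, 771 },
};

/* Model of a 4-way load-lanes: field j of structure i lands in lane i
   of vector j (de-interleaving).  */
static void
load_lanes4 (const uint32_t *src, uint32_t v[NVEC][NLANE])
{
  for (int i = 0; i < NLANE; i++)
    for (int j = 0; j < NVEC; j++)
      v[j][i] = src[NVEC * i + j];
}

/* Model of a 4-way store-lanes: the inverse, writing lane i of vector j
   back to field j of structure i (re-interleaving).  */
static void
store_lanes4 (uint32_t v[NVEC][NLANE], uint32_t *dst)
{
  for (int i = 0; i < NLANE; i++)
    for (int j = 0; j < NVEC; j++)
      dst[NVEC * i + j] = v[j][i];
}

/* Scalar loop with the same shape as FUNC (u, uint, 32) in the test.
   Unsigned arithmetic wraps, so large products are well defined.  */
static void
transform_scalar (const uint32_t *in, uint32_t *out, unsigned nstruct)
{
  for (unsigned s = 0; s < nstruct; s++)
    {
      uint32_t a = in[0], b = in[1], c = in[2], d = in[3];
      in += NVEC;
      for (int r = 0; r < NVEC; r++)
        *out++ = M[r][0] * a + M[r][1] * b + M[r][2] * c + M[r][3] * d;
    }
}

/* Same computation, NLANE structures at a time, through the lane model.
   Assumes nstruct is a multiple of NLANE.  */
static void
transform_vectorized (const uint32_t *in, uint32_t *out, unsigned nstruct)
{
  for (unsigned s = 0; s < nstruct; s += NLANE)
    {
      uint32_t iv[NVEC][NLANE], ov[NVEC][NLANE];
      /* After this, iv[0] holds every a, iv[1] every b, and so on.  */
      load_lanes4 (in + NVEC * s, iv);
      for (int r = 0; r < NVEC; r++)
        for (int l = 0; l < NLANE; l++)
          ov[r][l] = M[r][0] * iv[0][l] + M[r][1] * iv[1][l]
                     + M[r][2] * iv[2][l] + M[r][3] * iv[3][l];
      store_lanes4 (ov, out + NVEC * s);
    }
}
```

With N = 128 and BITS = 32 the test loop runs 128 / 32 = 4 iterations, i.e. four structures, which fills exactly one group of four lanes in this model (likewise 16 lanes for 8-bit and 8 lanes for 16-bit elements). That is consistent with the test expecting one four-instruction vld4X/vst4X sequence per function, for example 3 * 4 = 12 for the 32-bit cases; how GCC actually schedules the loop is not modeled here.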