From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tamar Christina
To: Richard Sandiford
CC: "gcc-patches@gcc.gnu.org", nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov
Subject: RE: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD
Date: Wed, 6 Dec 2023 16:25:03 +0000
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="_002_VI1PR08MB53252694799DD7DD3D6A12A9FF84AVI1PR08MB5325eurp_"

--_002_VI1PR08MB53252694799DD7DD3D6A12A9FF84AVI1PR08MB5325eurp_
Content-Type: text/plain; charset="us-ascii"

> -----Original Message-----
> From: Richard Sandiford
> Sent: Tuesday, November 28, 2023 5:56 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft;
> Kyrylo Tkachov
> Subject: Re: [PATCH 17/21]AArch64: Add implementation for vector cbranch for
> Advanced SIMD
>
> Richard Sandiford writes:
> > Tamar Christina writes:
> >> Hi All,
> >>
> >> This adds an implementation for conditional branch optab for AArch64.
> >>
> >> For e.g.
> >>
> >> void f1 ()
> >> {
> >>   for (int i = 0; i < N; i++)
> >>     {
> >>       b[i] += a[i];
> >>       if (a[i] > 0)
> >>         break;
> >>     }
> >> }
> >>
> >> For 128-bit vectors we generate:
> >>
> >>         cmgt    v1.4s, v1.4s, #0
> >>         umaxp   v1.4s, v1.4s, v1.4s
> >>         fmov    x3, d1
> >>         cbnz    x3, .L8
> >>
> >> and of 64-bit vector we can omit the compression:
> >>
> >>         cmgt    v1.2s, v1.2s, #0
> >>         fmov    x2, d1
> >>         cbz     x2, .L13
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>
> >> Ok for master?
> >>
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >>         * config/aarch64/aarch64-simd.md (cbranch<mode>4): New.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>         * gcc.target/aarch64/vect-early-break-cbranch.c: New test.
> >>
> >> --- inline copy of patch --
> >> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >> index 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15 100644
> >> --- a/gcc/config/aarch64/aarch64-simd.md
> >> +++ b/gcc/config/aarch64/aarch64-simd.md
> >> @@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
> >>    DONE;
> >>  })
> >>
> >> +;; Patterns comparing two vectors and conditionally jump
> >> +
> >> +(define_expand "cbranch<mode>4"
> >> +  [(set (pc)
> >> +        (if_then_else
> >> +          (match_operator 0 "aarch64_equality_operator"
> >> +            [(match_operand:VDQ_I 1 "register_operand")
> >> +             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
> >> +          (label_ref (match_operand 3 ""))
> >> +          (pc)))]
> >> +  "TARGET_SIMD"
> >> +{
> >> +  auto code = GET_CODE (operands[0]);
> >> +  rtx tmp = operands[1];
> >> +
> >> +  /* If comparing against a non-zero vector we have to do a comparison first
> >> +     so we can have a != 0 comparison with the result.  */
> >> +  if (operands[2] != CONST0_RTX (<MODE>mode))
> >> +    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
> >> +                                        operands[2]));
> >> +
> >> +  /* For 64-bit vectors we need no reductions.  */
> >> +  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
> >> +    {
> >> +      /* Always reduce using a V4SI.  */
> >> +      rtx reduc = gen_lowpart (V4SImode, tmp);
> >> +      rtx res = gen_reg_rtx (V4SImode);
> >> +      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
> >> +      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
> >> +    }
> >> +
> >> +  rtx val = gen_reg_rtx (DImode);
> >> +  emit_move_insn (val, gen_lowpart (DImode, tmp));
> >> +
> >> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
> >> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
> >> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
> >> +  DONE;
> >
> > Are you sure this is correct for the operands[2] != const0_rtx case?
> > It looks like it uses the same comparison code for the vector comparison
> > and the scalar comparison.
> >
> > E.g. if the pattern is passed a comparison:
> >
> >   (eq (reg:V2SI x) (reg:V2SI y))
> >
> > it looks like we'd generate a CMEQ for the x and y, then branch
> > when the DImode bitcast of the CMEQ result equals zero.  This means
> > that we branch when no elements of x and y are equal, rather than
> > when all elements of x and y are equal.
> >
> > E.g. for:
> >
> >    { 1, 2 } == { 1, 2 }
> >
> > CMEQ will produce { -1, -1 }, the scalar comparison will be -1 == 0,
> > and the branch won't be taken.
> >
> > ISTM it would be easier for the operands[2] != const0_rtx case to use
> > EOR instead of a comparison.  That gives a zero result if the input
> > vectors are equal and a nonzero result if the input vectors are
> > different.  We can then branch on the result using CODE and const0_rtx.
> >
> > (Hope I've got that right.)
> >
> > Maybe that also removes the need for patch 18.
>
> Sorry, I forgot to say: we can't use operands[1] as a temporary,
> since it's only an input to the pattern.  The EOR destination would
> need to be a fresh register.
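
A minimal scalar sketch of the point quoted above, assuming a two-lane
64-bit vector.  The type and helper names (v2si, model_cmeq, model_eor,
model_to_di, branch_*) are hypothetical and are not part of GCC; the code
only models the lane arithmetic and is not how the expander is written.
It contrasts branching on the bitcast of a CMEQ result with branching on
the bitcast of an EOR result when the branch condition is eq.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model of a 64-bit Advanced SIMD vector holding two
   32-bit lanes (V2SI).  None of these names exist in GCC.  */
typedef struct { uint32_t lane[2]; } v2si;

/* Lane-wise CMEQ: all-ones where the lanes match, zero otherwise.  */
static v2si model_cmeq (v2si x, v2si y)
{
  v2si r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = (x.lane[i] == y.lane[i]) ? UINT32_MAX : 0;
  return r;
}

/* Lane-wise EOR: zero where the lanes match, nonzero otherwise.  */
static v2si model_eor (v2si x, v2si y)
{
  v2si r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = x.lane[i] ^ y.lane[i];
  return r;
}

/* Model of the DImode bitcast of the vector register.  */
static uint64_t model_to_di (v2si v)
{
  return ((uint64_t) v.lane[1] << 32) | v.lane[0];
}

/* eq branch modelled on the first patch: vector compare with EQ, then
   reuse the same EQ code for the scalar test against zero.  */
static bool branch_eq_via_cmeq (v2si x, v2si y)
{
  return model_to_di (model_cmeq (x, y)) == 0;
}

/* eq branch modelled on the EOR suggestion: zero means "all lanes equal".  */
static bool branch_eq_via_eor (v2si x, v2si y)
{
  return model_to_di (model_eor (x, y)) == 0;
}

/* ne branch modelled on the EOR suggestion: nonzero means "some lane differs".  */
static bool branch_ne_via_eor (v2si x, v2si y)
{
  return model_to_di (model_eor (x, y)) != 0;
}
```

In this model, for x = { 1, 2 } and y = { 1, 2 } the CMEQ-based helper
reports that the branch is not taken, while the EOR-based helper reports
that it is.  For x = { 1, 2 } and y = { 3, 4 } the CMEQ-based helper takes
the branch even though no lanes are equal.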

I've updated the patch but it doesn't help since cbranch doesn't really push
comparisons in.  So we don't seem to ever really get called with anything
non-zero.

That said, I'm not entirely convinced that the == case is correct.  == means
all bits equal rather than any bit set, so it needs to generate cbz instead
of cbnz, and I'm not sure that's guaranteed.

I do have a failing testcase with this but haven't yet tracked down whether
the mid-end did the right thing.  I think there might be a similar issue in a
match.pd simplification.

Thoughts on the == case?

Thanks,
Tamar

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c6f2d5828373f2a5272b9d1227bfe34365f9fd09..7b289b1fbec6b1f15fbf51b6c862bcf9a5588b6b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3911,6 +3911,46 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
   DONE;
 })
 
+;; Patterns comparing two vectors and conditionally jump
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+        (if_then_else
+          (match_operator 0 "aarch64_equality_operator"
+            [(match_operand:VDQ_I 1 "register_operand")
+             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
+          (label_ref (match_operand 3 ""))
+          (pc)))]
+  "TARGET_SIMD"
+{
+  auto code = GET_CODE (operands[0]);
+  rtx tmp = operands[1];
+
+  /* If comparing against a non-zero vector we have to do a comparison first
+     so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (<MODE>mode))
+    {
+      tmp = gen_reg_rtx (<MODE>mode);
+      emit_insn (gen_xor<mode>3 (tmp, operands[1], operands[2]));
+    }
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+    {
+      /* Always reduce using a V4SI.  */
+      rtx reduc = gen_lowpart (V4SImode, tmp);
+      rtx res = gen_reg_rtx (V4SImode);
+      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+    }
+
+  rtx val = gen_reg_rtx (DImode);
+  emit_move_insn (val, gen_lowpart (DImode, tmp));
+  emit_jump_insn (gen_cbranchdi4 (operands[0], val, CONST0_RTX (DImode),
+				  operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}

--_002_VI1PR08MB53252694799DD7DD3D6A12A9FF84AVI1PR08MB5325eurp_--