From: Tamar Christina
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org", nd
Subject: RE: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
Date: Mon, 6 Nov 2023 15:17:47 +0000

> -----Original Message-----
> From: Richard Biener
> Sent: Monday, November 6, 2023 2:25 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd
> Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
>
> On Mon, 6 Nov 2023, Tamar Christina wrote:
>
> > Hi All,
> >
> > This patch adds initial support for early break vectorization in GCC.
> > The support is added for any target that implements a vector cbranch
> > optab; this includes both fully masked and non-masked targets.
> >
> > Depending on the operation, the vectorizer may also require support
> > for boolean mask reductions using inclusive OR. This is however only
> > checked when the comparison would produce multiple statements.
> >
> > Note: I am currently struggling to get patch 7 correct in all cases and
> > could use some feedback there.
> >
> > Concretely, the supported loops are of the form:
> >
> > for (int i = 0; i < N; i++)
> > {
> >
> >   if ()
> >     {
> >       ...
> >       ;
> >     }
> >
> > }
> >
> > where can be:
> > - break
> > - return
> > - goto
> >
> > Any number of statements can be used before the occurs.
> >
> > Since this is an initial version for GCC 14, it has the following
> > limitations and features:
> >
> > - Only fixed-size iterations and buffers are supported. That is to say,
> >   any vectors loaded or stored must be to statically allocated arrays with
> >   known sizes. N must also be known. This limitation is because our
> >   primary target for this optimization is SVE. For VLA SVE we can't easily
> >   do cross-page iteration checks, and the result is also unlikely to be
> >   beneficial. For that reason we punt support for variable buffers until
> >   we have First-Faulting support in GCC.
> > - Any stores in should not be to the same objects as in
> >   . Loads are fine as long as they don't have the possibility to
> >   alias. More concretely, we block RAW dependencies when the intermediate
> >   value can't be separated from the store, or the store itself can't be
> >   moved.
> > - Prologue peeling, alignment peeling and loop versioning are supported.
> > - Fully masked loops, unmasked loops and partially masked loops are
> >   supported.
> > - Any number of loop early exits is supported.
> > - No support for epilogue vectorization. The only epilogue supported is
> >   the scalar final one. The peeling code supports it, but the code motion
> >   code cannot find instructions to make the move in the epilogue.
> > - Early breaks are only supported for inner loop vectorization.
> >
> > I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break
> >
> > With the help of IPA and LTO this still gets hit quite often. During
> > bootstrap it was hit rather frequently. Additionally, TSVC s332, s481 and
> > s482 all pass now, since these are tests for early exit vectorization
> > support.
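The loop shapes listed above can be illustrated with a small sketch, one function per exit kind (break, return, goto). It follows the stated limitations: statically allocated arrays of known size and a compile-time N. The function names and the `found` label are hypothetical, and whether a given compiler actually vectorizes each loop depends on the series being applied and on the target providing a vector cbranch optab; the code only shows the intended shapes and their scalar behavior.

```c
/* Hypothetical examples of the early-exit loop shapes described in the
   cover letter.  Arrays are statically allocated and N is a constant.  */
#define N 803

unsigned vect_a[N];
unsigned vect_b[N];

/* Exit via break.  The store to vect_b precedes the exit check and does
   not touch vect_a, which the condition reads; the store to vect_a comes
   after the check.  Returns the index of the first element greater than
   X, or N if there is none.  */
unsigned
find_break (unsigned x)
{
  unsigned idx = N;
  for (int i = 0; i < N; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] > x)
        {
          idx = i;
          break;
        }
      vect_a[i] = x;
    }
  return idx;
}

/* Exit via return: 1 if X occurs in vect_a, else 0.  */
int
contains_return (unsigned x)
{
  for (int i = 0; i < N; i++)
    if (vect_a[i] == x)
      return 1;
  return 0;
}

/* Exit via goto to a label after the loop; same result as above.  */
int
contains_goto (unsigned x)
{
  for (int i = 0; i < N; i++)
    if (vect_a[i] == x)
      goto found;
  return 0;

found:
  return 1;
}
```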
> >
> > This implementation does not completely handle the early break inside
> > the vector loop itself. Instead it adds checks such that, if we know that
> > we have to exit in the current iteration, we branch to scalar code to
> > actually do the final VF iterations, which handles all the code in .
> >
> > For the scalar loop we know that whatever exit you take, you have to
> > perform at most VF iterations. For vector code we only care about the
> > state of fully performed iterations and reset the scalar code to the
> > (partially) remaining loop.
> >
> > That is to say, the first vector loop executes so long as the early
> > exit isn't needed. Once the exit is taken, the scalar code will
> > perform at most VF extra iterations, the exact number depending on
> > peeling, the iteration start and which exit was taken (natural or
> > early). For this scalar loop, all early exits are treated the same.
> >
> > When we vectorize, we move any statement that is not related to the
> > early break itself and that would be incorrect to execute before the
> > break (i.e. that has side effects) to after the break. If this is not
> > possible, we decline to vectorize.
> >
> > This means that we check at the start of each iteration whether we are
> > going to exit or not. During the analysis phase we check whether we
> > are allowed to do this moving of statements. Also note that we only
> > move the scalar statements, and only do so after peeling, just before
> > we start transforming statements.
> >
> > Codegen:
> >
> > for e.g.
> >
> > #define N 803
> > unsigned vect_a[N];
> > unsigned vect_b[N];
> >
> > unsigned test4(unsigned x)
> > {
> >  unsigned ret = 0;
> >  for (int i = 0; i < N; i++)
> >  {
> >    vect_b[i] = x + i;
> >    if (vect_a[i] > x)
> >      break;
> >    vect_a[i] = x;
> >
> >  }
> >  return ret;
> > }
> >
> > We generate for Adv. SIMD:
> >
> > test4:
> >         adrp    x2, .LC0
> >         adrp    x3, .LANCHOR0
> >         dup     v2.4s, w0
> >         add     x3, x3, :lo12:.LANCHOR0
> >         movi    v4.4s, 0x4
> >         add     x4, x3, 3216
> >         ldr     q1, [x2, #:lo12:.LC0]
> >         mov     x1, 0
> >         mov     w2, 0
> >         .p2align 3,,7
> > .L3:
> >         ldr     q0, [x3, x1]
> >         add     v3.4s, v1.4s, v2.4s
> >         add     v1.4s, v1.4s, v4.4s
> >         cmhi    v0.4s, v0.4s, v2.4s
> >         umaxp   v0.4s, v0.4s, v0.4s
> >         fmov    x5, d0
> >         cbnz    x5, .L6
> >         add     w2, w2, 1
> >         str     q3, [x1, x4]
> >         str     q2, [x3, x1]
> >         add     x1, x1, 16
> >         cmp     w2, 200
> >         bne     .L3
> >         mov     w7, 3
> > .L2:
> >         lsl     w2, w2, 2
> >         add     x5, x3, 3216
> >         add     w6, w2, w0
> >         sxtw    x4, w2
> >         ldr     w1, [x3, x4, lsl 2]
> >         str     w6, [x5, x4, lsl 2]
> >         cmp     w0, w1
> >         bcc     .L4
> >         add     w1, w2, 1
> >         str     w0, [x3, x4, lsl 2]
> >         add     w6, w1, w0
> >         sxtw    x1, w1
> >         ldr     w4, [x3, x1, lsl 2]
> >         str     w6, [x5, x1, lsl 2]
> >         cmp     w0, w4
> >         bcc     .L4
> >         add     w4, w2, 2
> >         str     w0, [x3, x1, lsl 2]
> >         sxtw    x1, w4
> >         add     w6, w1, w0
> >         ldr     w4, [x3, x1, lsl 2]
> >         str     w6, [x5, x1, lsl 2]
> >         cmp     w0, w4
> >         bcc     .L4
> >         str     w0, [x3, x1, lsl 2]
> >         add     w2, w2, 3
> >         cmp     w7, 3
> >         beq     .L4
> >         sxtw    x1, w2
> >         add     w2, w2, w0
> >         ldr     w4, [x3, x1, lsl 2]
> >         str     w2, [x5, x1, lsl 2]
> >         cmp     w0, w4
> >         bcc     .L4
> >         str     w0, [x3, x1, lsl 2]
> > .L4:
> >         mov     w0, 0
> >         ret
> >         .p2align 2,,3
> > .L6:
> >         mov     w7, 4
> >         b       .L2
> >
> > and for SVE:
> >
> > test4:
> >         adrp    x2, .LANCHOR0
> >         add     x2, x2, :lo12:.LANCHOR0
> >         add     x5, x2, 3216
> >         mov     x3, 0
> >         mov     w1, 0
> >         cntw    x4
> >         mov     z1.s, w0
> >         index   z0.s, #0, #1
> >         ptrue   p1.b, all
> >         ptrue   p0.s, all
> >         .p2align 3,,7
> > .L3:
> >         ld1w    z2.s, p1/z, [x2, x3, lsl 2]
> >         add     z3.s, z0.s, z1.s
> >         cmplo   p2.s, p0/z, z1.s, z2.s
> >         b.any   .L2
> >         st1w    z3.s, p1, [x5, x3, lsl 2]
> >         add     w1, w1, 1
> >         st1w    z1.s, p1, [x2, x3, lsl 2]
> >         add     x3, x3, x4
> >         incw    z0.s
> >         cmp     w3, 803
> >         bls     .L3
> > .L5:
> >         mov     w0, 0
> >         ret
> >         .p2align 2,,3
> > .L2:
> >         cntw    x5
> >         mul     w1, w1, w5
> >         cbz     w5, .L5
> >         sxtw    x1, w1
> >         sub     w5, w5, #1
> >         add     x5, x5, x1
> >         add     x6, x2, 3216
> >         b       .L6
> >         .p2align 2,,3
> > .L14:
> >         str     w0, [x2, x1, lsl 2]
> >         cmp     x1, x5
> >         beq     .L5
> >         mov     x1, x4
> > .L6:
> >         ldr     w3, [x2, x1, lsl 2]
> >         add     w4, w0, w1
> >         str     w4, [x6, x1, lsl 2]
> >         add     x4, x1, 1
> >         cmp     w0, w3
> >         bcs     .L14
> >         mov     w0, 0
> >         ret
> >
> > On the workloads this work is based on, we see a 2-3x performance
> > uplift using this patch.
> >
> > Follow-up plan:
> > - Boolean vectorization has several shortcomings. I've filed PR110223
> >   with the bigger ones that cause vectorization to fail with this patch.
> > - SLP support. This is planned for GCC 15, as for the majority of the
> >   cases the SLP build itself fails.
>
> It would be nice to get at least single-lane SLP support working. I think you
> need to treat the gcond as SLP root stmt and basically do discovery on the
> condition as if it were a mask generating condition.

Hmm, OK, will give it a try.

>
> Code generation would then simply schedule the gcond root instances first
> (that would get you the code motion automagically).

Right, so you're saying treat the gconds as the seed, and stores as a sink.
And then schedule only the instances without a gcond around, such that we can
still vectorize in place to get the branches. OK, makes sense.

>
> So, add a new slp_instance_kind, for example slp_inst_kind_early_break, and
> record the gcond as root stmt. Possibly "pattern" recognizing
>
>   gcond <_1 != _2>
>
> as
>
>   _mask = _1 != _2;
>   gcond <_mask != 0>
>
> makes the SLP discovery less fiddly (but in theory you can of course handle
> gconds directly).
>
> Is there any part of the series that can be pushed independently? If so I'll
> try to look at those parts first.
>

Aside from:

[PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks
[PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

the rest lie dormant and don't do anything or disrupt the tree until those two
are in. They all just touch up different parts piecewise.

They do rely on the new field introduced in:

[PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks

but I can split them out.

I'll start respinning #4 and #7 with your latest changes now.

Thanks,
Tamar

> Thanks,
> Richard.
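As a plain-C model of the strategy described in the cover letter (check whether any lane of the current vector iteration would exit, perform the side effects only if none does, and otherwise hand over to a scalar loop that runs at most VF iterations), here is a sketch for test4 with an assumed VF of 4, i.e. four 32-bit lanes as in the Adv. SIMD listing. The per-lane mask and its OR-reduction correspond to the `_mask = _1 != _2; gcond <_mask != 0>` form Richard suggests (with `>` in place of `!=` for this loop); in the Adv. SIMD listing this appears to be the cmhi followed by umaxp/fmov/cbnz. The names test4_scalar, test4_blocked and VF are hypothetical, and the arrays are passed as parameters only so the two versions can be compared. This models the control flow only; it is not the vectorizer's implementation.

```c
#include <stdbool.h>

#define N 803
#define VF 4 /* assumption: 32-bit lanes per vector, as for Adv. SIMD */

/* Scalar reference: the loop of test4, with the arrays passed in.  */
void
test4_scalar (unsigned *a, unsigned *b, unsigned x)
{
  for (int i = 0; i < N; i++)
    {
      b[i] = x + i;
      if (a[i] > x)
        break;
      a[i] = x;
    }
}

/* Blocked model of the early-break strategy (hypothetical helper).  */
void
test4_blocked (unsigned *a, unsigned *b, unsigned x)
{
  int i = 0;

  /* "Vector" loop over whole blocks of VF iterations.  */
  for (; i + VF <= N; i += VF)
    {
      /* _mask = a[i .. i+VF-1] > x, one flag per lane.  */
      bool mask[VF];
      for (int l = 0; l < VF; l++)
        mask[l] = a[i + l] > x;

      /* OR-reduce the mask and branch on it: gcond <_mask != 0>.  */
      bool any = false;
      for (int l = 0; l < VF; l++)
        any = any || mask[l];
      if (any)
        break; /* leave before any of this block's side effects */

      /* No lane exits, so the stores (moved after the check) are safe.  */
      for (int l = 0; l < VF; l++)
        {
          b[i + l] = x + (i + l);
          a[i + l] = x;
        }
    }

  /* Scalar loop: the original body, resuming at the first unfinished
     iteration.  It runs at most VF iterations: either an exit lies in
     the block the vector loop stopped at, or only N % VF remain.  */
  for (; i < N; i++)
    {
      b[i] = x + i;
      if (a[i] > x)
        break;
      a[i] = x;
    }
}
```

Running both functions on identical copies of the arrays, for values of x that exit in the first block, in a later block, in the N % VF remainder, or not at all, should leave identical contents.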