From: Tamar Christina
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org", nd, "jlaw@ventanamicro.com"
Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
Date: Thu, 16 Nov 2023 14:14:32 +0000

> -----Original Message-----
> From: Richard Biener
> Sent: Thursday, November 16, 2023 1:36 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd; jlaw@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
>
> On Thu, 16 Nov 2023, Tamar Christina wrote:
>
> > > > > > > >
> > > > > > > > Perhaps I'm missing something here?
> > > > > > >
> > > > > > > OK, so I refreshed my mind of what
> > > > > > > vect_update_ivs_after_vectorizer does.
> > > > > > >
> > > > > > > I still do not understand the (complexity of the) patch.
> > > > > > > Basically the function computes the new value of the IV
> > > > > > > "from scratch" based on the number of scalar iterations of
> > > > > > > the vector loop, the 'niter' argument.
> > > > > > > I would have expected that for the early exits we
> > > > > > > either pass in a different 'niter' or alternatively a 'niter_adjustment'.
> > > > > >
> > > > > > But for an early exit there's no static value for adjusted
> > > > > > niter, since you don't know which iteration you exited from.
> > > > > > Unlike the normal exit when you know if you get there you've
> > > > > > done all possible iterations.
> > > > > >
> > > > > > So you must compute the scalar iteration count on the exit itself.
> > > > >
> > > > > ? You do not need the actual scalar iteration you exited (you
> > > > > don't compute that either), you need the scalar iteration the
> > > > > vector iteration started with when it exited prematurely and
> > > > > that's readily available?
> > > >
> > > > For a normal exit yes, not for an early exit no?
> > > > niters_vector_mult_vf is only valid for the main exit.
> > > >
> > > > There's the unadjusted scalar count, which is what it's using to
> > > > adjust it to the final count.  Unless I'm missing something?
> > >
> > > Ah, of course - niters_vector_mult_vf is for the countable exit.
> > > For the early exits we can't precompute the scalar iteration value.
> > > But that then means we should compute the appropriate "continuation"
> > > as live value of the vectorized IVs even when they were not
> > > originally used outside of the loop.  I don't see how we can express
> > > this in terms of the scalar IVs in the (not yet) vectorized loop -
> > > similar to the reduction case you are going to end up with the wrong
> > > values here.
> > >
> > > That said, I've for a long time wanted to preserve the original
> > > control IV also for the vector code (leaving any "optimization"
> > > to IVOPTs there), that would enable us to compute the correct
> > > "niters_vector_mult_vf" based on that IV.
> > >
> > > So given we cannot use the scalar IVs you have to handle all
> > > inductions (besides the main exit control IV) in
> > > vectorizable_live_operation I think.
> > >
> >
> > That's what I currently do, that's why there was the
> >       if (STMT_VINFO_LIVE_P (phi_info))
> >         continue;
>
> Yes, but that only works for the inductions marked so.  We'd need to mark the
> others as well, but only for the early exits.
>
> > although I don't understand why we use the scalar count,  I suppose
> > the reasoning is that we don't really want to keep it around, and
> > referencing it forces it to be kept?
>
> Referencing it will cause the scalar compute to be retained, but since we do not
> adjust the scalar compute during vectorization (but expect it to be dead) the
> scalar compute will compute the wrong thing (as shown by the reduction
> example - I suspect inductions will suffer from the same problem).
>
> > At the moment it just does `init + (final - init) * vf` which is correct no?
>
> The issue is that 'final' is not computed correctly in the vectorized loop.  This
> formula might work for affine evolutions of course.
>
> Extracting the correct value from the vectorized induction would be the
> preferred solution.

Ok, so I should be able to just mark IVs as live during process_use if there
are multiple exits, right?  It would simply be unused on the main exit, since
we use niters there.

Since it's the PHI inside the loop that needs to be marked live, I can't do it
for specific exits only, no?

And if I create a copy of the PHI node during peeling for use in the early
exits and mark that copy live, it won't work, no?

Tamar
>
> > Also you missed the question below about how to avoid the creation of
> > the block,  You ok with changing that?
> >
> > Thanks,
> > Tamar
> >
> > > Or for now disable early-break for inductions that are not the main
> > > exit control IV (in vect_can_advance_ivs_p)?
> > >
> > > > > > >
> > > > > > > It seems your change handles different kinds of inductions differently.
> > > > > > > Specifically
> > > > > > >
> > > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > >       if (restart_loop && ivtemp)
> > > > > > >         {
> > > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > >           ni = build_int_cst (type, vf);
> > > > > > >           if (inversed_iv)
> > > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > >                               fold_convert (type, step_expr));
> > > > > > >         }
> > > > > > >
> > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > > > as the new value.  That seems to be very odd special casing
> > > > > > > for unknown reasons.  And while you adjust vec_step_op_add,
> > > > > > > you don't adjust vect_peel_nonlinear_iv_init (maybe not
> > > > > > > supported - better assert here).
> > > > > >
> > > > > > The VF case is for a normal "non-inverted" loop, where if you
> > > > > > take an early exit you know that you have to do at most VF iterations.
> > > > > > The VF - step is to account for the inverted loop control flow
> > > > > > where you exit after adjusting the IV already by + step.
> > > > >
> > > > > But doesn't that assume the IV counts from niter to zero?  I
> > > > > don't see this special case is actually necessary, no?
> > > > >
> > > >
> > > > I needed it because otherwise the scalar loop iterates one
> > > > iteration too little So I got a miscompile with the inverter loop
> > > > stuff.  I'll look at it again perhaps It can be solved differently.
> > > >
> > > > > >
> > > > > > Peeling doesn't matter here, since you know you were able to
> > > > > > do a vector iteration so it's safe to do VF iterations.  So
> > > > > > having peeled doesn't affect the remaining iters count.
> > > > > >
> > > > > > >
> > > > > > > Also the vec_step_op_add case will keep the original scalar
> > > > > > > IV live even when it is a vectorized induction.  The code
> > > > > > > recomputing the value from scratch avoids this.
> > > > > > >
> > > > > > >       /* For non-main exit create an intermediat edge to get any updated iv
> > > > > > >          calculations.  */
> > > > > > >       if (needs_interm_block
> > > > > > >           && !iv_block
> > > > > > >           && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > >         {
> > > > > > >           iv_block = split_edge (update_e);
> > > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > > >         }
> > > > > > >
> > > > > > > this is also odd, can we adjust the API instead?  I suppose
> > > > > > > this is because your computation uses the original loop IV,
> > > > > > > if you based the computation off the initial value only this
> > > > > > > might not be necessary?
> > > > > >
> > > > > > No, on the main exit the code updates the value in the loop
> > > > > > header and puts the Calculation in the merge block.  This
> > > > > > works because it only needs to consume PHI nodes in the merge
> > > > > > block and things like niters are adjusted in the guard block.
> > > > > >
> > > > > > For an early exit, we don't have a guard block, only the merge block.
> > > > > > We have to update the PHI nodes in that block,  but can't do
> > > > > > so since you can't produce a value and consume it in a PHI
> > > > > > node in the same BB.
> > > > > > So we need to create the block to put the values in for use in
> > > > > > the merge block.  Because there's no "guard" block for early exits.
> > > > >
> > > > > ?  then compute niters in that block as well.
> > > >
> > > > We can't since it'll not be reachable through the right edge.
> > > > What we can do if you want is slightly change peeling,  we currently peel as:
> > > >
> > > >   \        \             /
> > > >   E1     E2        Normal exit
> > > >     \       |          |
> > > >        \    |          Guard
> > > >           \ |          |
> > > >          Merge block
> > > >               |
> > > >          Pre Header
> > > >
> > > > If we instead peel as:
> > > >
> > > >   \        \             /
> > > >   E1     E2        Normal exit
> > > >     \       |          |
> > > >        Exit join   Guard
> > > >           \ |          |
> > > >          Merge block
> > > >               |
> > > >          Pre Header
> > > >
> > > > We can use the exit join block.  This would also mean
> > > > vect_update_ivs_after_vectorizer Doesn't need to iterate over all
> > > > exits and only really needs to adjust the phi nodes Coming out of
> > > > the exit join and guard block.
> > > >
> > > > Does this work for you?
>
> Yeah, I think that would work.  But I'd like to sort out the correctness details of
> the IV update itself before sorting out this code placement detail.
>
> Richard.
>
> > > > Thanks,
> > > > Tamar
> > > > >
> > > > > > The API can be adjusted by always creating the empty block
> > > > > > either during peeling.
> > > > > > That would prevent us from having to do anything special here.
> > > > > > Would that work better?  Or I can do it in the loop that
> > > > > > iterates over the exits to before the call to
> > > > > > vect_update_ivs_after_vectorizer, which I think
> > > > > > might be more consistent.
> > > > > >
> > > > > > >
> > > > > > > That said, I wonder why we cannot simply pass in an adjusted
> > > > > > > niter which would be niters_vector_mult_vf - vf and be done with that?
> > > > > > >
> > > > > >
> > > > > > We can ofcourse not have this and recompute it from niters
> > > > > > itself, however this does affect the epilog code layout.
> > > > > > Particularly knowing the static number if iterations left
> > > > > > causes it to usually unroll the loop and share some of the
> > > > > > computations.  i.e. the scalar code is often more efficient.
> > > > > >
> > > > > > The computation would be niters_vector_mult_vf - iters_done *
> > > > > > vf, since the value put Here is the remaining iteration count.
> > > > > > It's static for early exits.
> > > > >
> > > > > Well, it might be "static" in that it doesn't really matter what
> > > > > you use for the epilog main IV initial value as long as you are
> > > > > sure you're not going to take that exit as you are sure we're
> > > > > going to take one of the early exits.  So yeah, the special code
> > > > > is probably OK, but it needs a better comment and as said the
> > > > > structure of vect_update_ivs_after_vectorizer is a bit hard to
> > > > > follow now.
> > > > >
> > > > > As said an important part for optimization is to not keep the
> > > > > scalar IVs live in the vector loop.
> > > > >
> > > > > > But can do whatever you prefer here.  Let me know what you
> > > > > > prefer for the above.
> > > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > >
> > > > > > > > Regards,
> > > > > > > > Tamar
> > > > > > > > >
> > > > > > > > > > It has to do this since you have to perform the side
> > > > > > > > > > effects for the non-matching elements still.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Tamar
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > +          if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > > +            continue;
> > > > > > > > > > > > +
> > > > > > > > > > > > +          /* For early break the final loop IV is:
> > > > > > > > > > > > +             init + (final - init) * vf which takes into account peeling
> > > > > > > > > > > > +             values and non-single steps.  The main exit can use niters
> > > > > > > > > > > > +             since if you exit from the main exit you've done all vector
> > > > > > > > > > > > +             iterations.
> > > > > > > > > > > > +             For an early exit we don't know when we exit so we
> > > > > > > > > > > > +             must re-calculate this on the exit.  */
> > > > > > > > > > > > +          tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > > +          off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > > +                             fold_convert (stype, start_expr),
> > > > > > > > > > > > +                             fold_convert (stype, init_expr));
> > > > > > > > > > > > +          /* Now adjust for VF to get the final iteration value.  */
> > > > > > > > > > > > +          off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > > +                             build_int_cst (stype, vf));
> > > > > > > > > > > > +        }
> > > > > > > > > > > > +      else
> > > > > > > > > > > > +        off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > > +                           fold_convert (stype, niters), step_expr);
> > > > > > > > > > > > +
> > > > > > > > > > > >        if (POINTER_TYPE_P (type))
> > > > > > > > > > > >          ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > > >        else
> > > > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > > >          ni = init_expr;
> > > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > > +        continue;
> > > > > > > > > > >
> > > > > > > > > > > This looks all a bit complicated - why wouldn't we
> > > > > > > > > > > simply always use the PHI result when 'restart_loop'?
> > > > > > > > > > > Isn't that the correct old start value in all cases?
> > > > > > > > > > >
> > > > > > > > > > > >        else
> > > > > > > > > > > >          ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > > > >                                            niters, step_expr,
> > > > > > > > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > >
> > > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > > >
> > > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > > > > > > > > +
> > > > > > > > > > > > +      /* For non-main exit create an intermediat edge to get any updated iv
> > > > > > > > > > > > +         calculations.  */
> > > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > > +          && !iv_block
> > > > > > > > > > > > +          && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > > > +        {
> > > > > > > > > > > > +          iv_block = split_edge (update_e);
> > > > > > > > > > > > +          update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > > +          last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > > +        }
> > > > > > > > > > > > +
> > > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > > >          {
> > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > > > > > > >           niters_vector_mult_vf steps.  */
> > > > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > > > > > >        update_e = skip_vector ?
> > > > > > > > > > > >          e : loop_preheader_edge (epilog);
> > > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > > > > > > > -                                        update_e);
> > > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > > +        update_e = single_succ_edge (e->dest);
> > > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > > +        = !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > > > > +                                         LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > > > > >
> > > > > > > > > > > You are computing this here and in vect_update_ivs_after_vectorizer?
> > > > > > > > > > >
> > > > > > > > > > > > +
> > > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
> > > > > > > > > > > > +                                        update_e, inversed_iv);
> > > > > > > > > > > > +
> > > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > > +        {
> > > > > > > > > > > > +          if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > > > +            continue;
> > > > > > > > > > > > +
> > > > > > > > > > > > +          vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > > > +                                            niters_vector_mult_vf,
> > > > > > > > > > > > +                                            exit, true);
> > > > > > > > > > >
> > > > > > > > > > > ... why does the same not work here?  Wouldn't the
> > > > > > > > > > > proper condition be !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT (loop_vinfo)->src)
> > > > > > > > > > > or similar?  That is, whether the
> > > > > > > > > > > exit is at or after the main IV exit?
> > > > > > > > > > > (consider having two)
> > > > > > > > > > >
> > > > > > > > > > > > +        }
> > > > > > > > > > > >
> > > > > > > > > > > >        if (skip_epilog)
> > > > > > > > > > > >          {
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Richard Biener
> > > > > > > > > SUSE Software Solutions Germany GmbH,
> > > > > > > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
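
For reference, the `init + (final - init) * vf` restart computation discussed
in this thread can be illustrated outside of GCC with a small toy model.  This
is only a sketch under stated assumptions, not the vectorizer's code: the names
scalar_loop, early_exit_restart and vector_then_epilogue are hypothetical, the
IV is assumed affine with step >= 1 and vf >= 1, and "final" is modelled as a
header PHI that advances by one step per vector iteration.  On an early exit
the model does not know which lane matched, so it restarts the scalar loop at
the first element of the vector iteration that exited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: walk a[] from `start` in strides of `step` and return
// the IV value at which the loop stops (a match, or the first value >= size).
std::size_t scalar_loop(const std::vector<int> &a, std::size_t start,
                        std::size_t step, int key)
{
  std::size_t i = start;
  for (; i < a.size(); i += step)
    if (a[i] == key)
      break;  // early exit
  return i;
}

// Restart value for an early exit: init + (final - init) * vf, where
// `final_iv` models the header PHI of the not yet adjusted IV, which
// advanced by one `step` per completed vector iteration.
std::size_t early_exit_restart(std::size_t init, std::size_t final_iv,
                               std::size_t vf)
{
  return init + (final_iv - init) * vf;
}

// Toy model (hypothetical, not GCC code): run whole vector iterations of
// `vf` lanes, then hand over to the scalar epilogue at the restart value.
std::size_t vector_then_epilogue(const std::vector<int> &a, std::size_t init,
                                 std::size_t step, int key, std::size_t vf)
{
  const std::size_t n = a.size();
  // Scalar iteration count of the loop if no early exit is taken.
  const std::size_t niters = init < n ? (n - init + step - 1) / step : 0;
  const std::size_t vec_iters = niters / vf;

  std::size_t phi = init;  // models the IV's header PHI
  for (std::size_t k = 0; k < vec_iters; ++k)
    {
      for (std::size_t lane = 0; lane < vf; ++lane)
        if (a[init + (k * vf + lane) * step] == key)
          // Early exit: some lane matched, but we do not know which one,
          // so the epilogue redoes this vector iteration in scalar form.
          return scalar_loop(a, early_exit_restart(init, phi, vf), step, key);
      phi += step;
    }

  // Main exit: all vector iterations ran, so niters_vector_mult_vf is known.
  const std::size_t niters_vector_mult_vf = vec_iters * vf;
  return scalar_loop(a, init + niters_vector_mult_vf * step, step, key);
}
```

For example, with a = {5, 1, 7, 3, 9, 4, 8, 2, 6, 0, 11}, init = 0, step = 1
and vf = 4, searching for 4 exits early in the second vector iteration; the
PHI holds 1 at that point, so the epilogue restarts at 0 + (1 - 0) * 4 = 4 and
stops at index 5, the same place the scalar loop stops.  Note that this model
only covers the affine case; as pointed out above, the formula is not
guaranteed to hold for other kinds of inductions.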