From: Tamar Christina
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org", nd, "jlaw@ventanamicro.com"
Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
Date: Thu, 16 Nov 2023 15:19:06 +0000

> -----Original Message-----
> From: Richard Biener
> Sent: Thursday, November 16, 2023 2:18 PM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd; jlaw@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
>
> On Thu, 16 Nov 2023, Tamar Christina wrote:
>
> > > -----Original Message-----
> > > From: Richard Biener
> > > Sent: Thursday, November 16, 2023 1:36 PM
> > > To: Tamar Christina
> > > Cc: gcc-patches@gcc.gnu.org; nd; jlaw@ventanamicro.com
> > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > support early breaks and arbitrary exits
> > >
> > > On Thu, 16 Nov 2023, Tamar Christina wrote:
> > >
> > > > > > > > > >
> > > > > > > > > > Perhaps I'm missing something here?
> > > > > > > > >
> > > > > > > > > OK, so I refreshed my mind of what
> > > > > > > > > vect_update_ivs_after_vectorizer does.
> > > > > > > > >
> > > > > > > > > I still do not understand the (complexity of the) patch.
> > > > > > > > > Basically the function computes the new value of the IV
> > > > > > > > > "from scratch" based on the number of scalar iterations
> > > > > > > > > of the vector loop, the 'niter'
> > > > > > > > > argument. I would have expected that for the early
> > > > > > > > > exits we either pass in a different 'niter' or alternatively a
> > > > > > > > > 'niter_adjustment'.
> > > > > > > >
> > > > > > > > But for an early exit there's no static value for adjusted
> > > > > > > > niter, since you don't know which iteration you exited from.
> > > > > > > > Unlike the normal exit when you know if you get there
> > > > > > > > you've done all possible iterations.
> > > > > > > >
> > > > > > > > So you must compute the scalar iteration count on the exit itself.
> > > > > > >
> > > > > > > ? You do not need the actual scalar iteration you exited
> > > > > > > (you don't compute that either), you need the scalar
> > > > > > > iteration the vector iteration started with when it exited
> > > > > > > prematurely and that's readily available?
> > > > > >
> > > > > > For a normal exit yes, not for an early exit no?
> > > > > > niters_vector_mult_vf is only valid for the main exit.
> > > > > >
> > > > > > There's the unadjusted scalar count, which is what it's using
> > > > > > to adjust it to the final count. Unless I'm missing something?
> > > > >
> > > > > Ah, of course - niters_vector_mult_vf is for the countable exit.
> > > > > For the early exits we can't precompute the scalar iteration value.
> > > > > But that then means we should compute the appropriate "continuation"
> > > > > as live value of the vectorized IVs even when they were not
> > > > > originally used outside of the loop.
> > > > > I don't see how we can
> > > > > express this in terms of the scalar IVs in the (not yet)
> > > > > vectorized loop - similar to the reduction case you are going to
> > > > > end up with the wrong values here.
> > > > >
> > > > > That said, I've for a long time wanted to preserve the original
> > > > > control IV also for the vector code (leaving any "optimization"
> > > > > to IVOPTs there), that would enable us to compute the correct
> > > > > "niters_vector_mult_vf" based on that IV.
> > > > >
> > > > > So given we cannot use the scalar IVs you have to handle all
> > > > > inductions (besides the main exit control IV) in
> > > > > vectorizable_live_operation I think.
> > > > >
> > > >
> > > > That's what I currently do, that's why there was the
> > > >   if (STMT_VINFO_LIVE_P (phi_info))
> > > >     continue;
> > >
> > > Yes, but that only works for the inductions marked so. We'd need to
> > > mark the others as well, but only for the early exits.
> > >
> > > > although I don't understand why we use the scalar count, I
> > > > suppose the reasoning is that we don't really want to keep it
> > > > around, and referencing it forces it to be kept?
> > >
> > > Referencing it will cause the scalar compute to be retained, but
> > > since we do not adjust the scalar compute during vectorization (but
> > > expect it to be dead) the scalar compute will compute the wrong
> > > thing (as shown by the reduction example - I suspect inductions will suffer
> > > from the same problem).
> > >
> > > > At the moment it just does `init + (final - init) * vf` which is correct no?
> > >
> > > The issue is that 'final' is not computed correctly in the
> > > vectorized loop. This formula might work for affine evolutions of course.
> > >
> > > Extracting the correct value from the vectorized induction would be
> > > the preferred solution.
> >
> > Ok, so I should be able to just mark IVs as live during process_use if
> > there are multiple exits right?
> > Since it's just gonna be unused on the
> > main exit since we use niters?
> >
> > Because since it's the PHI inside the loop that needs to be marked
> > live I can't just do it for a specific exits no?
> >
> > If I create a copy of the PHI node during peeling for use in early
> > exits and mark it live it won't work no?
>
> I guess I wouldn't actually mark it STMT_VINFO_LIVE_P but somehow arrange
> vectorizable_live_operation to be called, possibly adding a edge argument to
> that as well.
>
> Maybe the thing to do for the moment is to reject vectorization with early
> breaks if there's any (non-STMT_VINFO_LIVE_P?) induction or reduction
> besides the main counting IV one you can already special-case?

Ok so I did a quick hack with:

      if (!virtual_operand_p (PHI_RESULT (phi))
          && !STMT_VINFO_LIVE_P (phi_info))
        {
          use_operand_p use_p;
          imm_use_iterator imm_iter;
          bool non_exit_use = false;
          FOR_EACH_IMM_USE_FAST (use_p, imm_iter, PHI_RESULT (phi))
            if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
              for (auto exit : get_loop_exit_edges (loop))
                {
                  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
                    continue;

                  if (gimple_bb (USE_STMT (use_p)) != exit->dest)
                    {
                      non_exit_use = true;
                      goto fail;
                    }
                }
fail:
          if (non_exit_use)
            return false;
        }

And it does seem to still allow all the cases I want. I've placed this in
vect_can_advance_ivs_p.

Does this cover what you meant?

Thanks,
Tamar

>
> Richard.
>
> > Tamar
> > >
> > > > Also you missed the question below about how to avoid the creation
> > > > of the block, You ok with changing that?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > > Or for now disable early-break for inductions that are not the
> > > > > main exit control IV (in vect_can_advance_ivs_p)?
> > > > >
> > > > > > > > >
> > > > > > > > > It seems your change handles different kinds of
> > > > > > > > > inductions differently.
> > > > > > > > > Specifically
> > > > > > > > >
> > > > > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > > >       if (restart_loop && ivtemp)
> > > > > > > > >         {
> > > > > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > > >           ni = build_int_cst (type, vf);
> > > > > > > > >           if (inversed_iv)
> > > > > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > > >                               fold_convert (type, step_expr));
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > > > > > as the new value. That seems to be very odd special
> > > > > > > > > casing for unknown reasons. And while you adjust
> > > > > > > > > vec_step_op_add, you don't adjust
> > > > > > > > > vect_peel_nonlinear_iv_init (maybe not supported -
> > > > > > > > > better assert here).
> > > > > > > >
> > > > > > > > The VF case is for a normal "non-inverted" loop, where if
> > > > > > > > you take an early exit you know that you have to do at most VF iterations.
> > > > > > > > The VF - step is to account for the inverted loop control flow
> > > > > > > > where you exit after adjusting the IV already by + step.
> > > > > > >
> > > > > > > But doesn't that assume the IV counts from niter to zero? I
> > > > > > > don't see this special case is actually necessary, no?
> > > > > > >
> > > > > >
> > > > > > I needed it because otherwise the scalar loop iterates one
> > > > > > iteration too little So I got a miscompile with the inverter
> > > > > > loop stuff. I'll look at it again perhaps It can be solved differently.
> > > > > >
> > > > > > > >
> > > > > > > > Peeling doesn't matter here, since you know you were able
> > > > > > > > to do a vector iteration so it's safe to do VF iterations.
> > > > > > > > So having peeled doesn't affect the remaining iters count.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Also the vec_step_op_add case will keep the original
> > > > > > > > > scalar IV live even when it is a vectorized induction.
> > > > > > > > > The code recomputing the value from scratch avoids this.
> > > > > > > > >
> > > > > > > > >       /* For non-main exit create an intermediat edge to get any updated iv
> > > > > > > > >          calculations.  */
> > > > > > > > >       if (needs_interm_block
> > > > > > > > >           && !iv_block
> > > > > > > > >           && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > >         {
> > > > > > > > >           iv_block = split_edge (update_e);
> > > > > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > this is also odd, can we adjust the API instead? I
> > > > > > > > > suppose this is because your computation uses the
> > > > > > > > > original loop IV, if you based the computation off the
> > > > > > > > > initial value only this might not be necessary?
> > > > > > > >
> > > > > > > > No, on the main exit the code updates the value in the
> > > > > > > > loop header and puts the Calculation in the merge block.
> > > > > > > > This works because it only needs to consume PHI nodes in
> > > > > > > > the merge block and things like niters are adjusted in the guard block.
> > > > > > > >
> > > > > > > > For an early exit, we don't have a guard block, only the merge block.
> > > > > > > > We have to update the PHI nodes in that block, but can't
> > > > > > > > do so since you can't produce a value and consume it in a
> > > > > > > > PHI node in the same BB.
> > > > > > > > So we need to create the block to put the values in for
> > > > > > > > use in the merge block. Because there's no "guard" block for early exits.
> > > > > > >
> > > > > > > ?
> > > > > > > then compute niters in that block as well.
> > > > > >
> > > > > > We can't since it'll not be reachable through the right edge.
> > > > > > What we can do if you want is slightly change peeling, we
> > > > > > currently peel as:
> > > > > >
> > > > > >   \      \      /
> > > > > >   E1     E2    Normal exit
> > > > > >    \     |      |
> > > > > >     \    |     Guard
> > > > > >      \   |      |
> > > > > >      Merge block
> > > > > >          |
> > > > > >      Pre Header
> > > > > >
> > > > > > If we instead peel as:
> > > > > >
> > > > > >   \      \      /
> > > > > >   E1     E2    Normal exit
> > > > > >    \     |      |
> > > > > >    Exit join   Guard
> > > > > >      \   |      |
> > > > > >      Merge block
> > > > > >          |
> > > > > >      Pre Header
> > > > > >
> > > > > > We can use the exit join block. This would also mean
> > > > > > vect_update_ivs_after_vectorizer Doesn't need to iterate over
> > > > > > all exits and only really needs to adjust the phi nodes Coming
> > > > > > out of the exit join and guard block.
> > > > > >
> > > > > > Does this work for you?
> > >
> > > Yeah, I think that would work. But I'd like to sort out the
> > > correctness details of the IV update itself before sorting out this code
> > > placement detail.
> > >
> > > Richard.
> > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > > >
> > > > > > > > The API can be adjusted by always creating the empty block
> > > > > > > > either during peeling.
> > > > > > > > That would prevent us from having to do anything special here.
> > > > > > > > Would that work better? Or I can do it in the loop that
> > > > > > > > iterates over the exits to before the call to
> > > > > > > > vect_update_ivs_after_vectorizer, which I think might be more consistent.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > That said, I wonder why we cannot simply pass in an
> > > > > > > > > adjusted niter which would be niters_vector_mult_vf - vf
> > > > > > > > > and be done with that?
> > > > > > > > >
> > > > > > > >
> > > > > > > > We can ofcourse not have this and recompute it from niters
> > > > > > > > itself, however this does affect the epilog code layout.
> > > > > > > > Particularly knowing the static number if iterations left
> > > > > > > > causes it to usually unroll the loop and share some of the
> > > > > > > > computations. i.e. the scalar code is often more efficient.
> > > > > > > >
> > > > > > > > The computation would be niters_vector_mult_vf -
> > > > > > > > iters_done * vf, since the value put Here is the remaining iteration count.
> > > > > > > > It's static for early exits.
> > > > > > >
> > > > > > > Well, it might be "static" in that it doesn't really matter
> > > > > > > what you use for the epilog main IV initial value as long as
> > > > > > > you are sure you're not going to take that exit as you are
> > > > > > > sure we're going to take one of the early exits. So yeah,
> > > > > > > the special code is probably OK, but it needs a better
> > > > > > > comment and as said the structure of
> > > > > > > vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > > > > > >
> > > > > > > As said an important part for optimization is to not keep
> > > > > > > the scalar IVs live in the vector loop.
> > > > > > >
> > > > > > > > But can do whatever you prefer here. Let me know what you
> > > > > > > > prefer for the above.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Richard.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Tamar
> > > > > > > > > > >
> > > > > > > > > > > > It has to do this since you have to perform the
> > > > > > > > > > > > side effects for the non-matching elements still.
> > > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Tamar
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +          if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > > > > +            continue;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +          /* For early break the final loop IV is:
> > > > > > > > > > > > > > +             init + (final - init) * vf which takes into account peeling
> > > > > > > > > > > > > > +             values and non-single steps.  The main exit can use niters
> > > > > > > > > > > > > > +             since if you exit from the main exit you've done all vector
> > > > > > > > > > > > > > +             iterations.  For an early exit we don't know when we exit so we
> > > > > > > > > > > > > > +             must re-calculate this on the exit.  */
> > > > > > > > > > > > > > +          tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > > > > +          off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > > > > +                             fold_convert (stype, start_expr),
> > > > > > > > > > > > > > +                             fold_convert (stype, init_expr));
> > > > > > > > > > > > > > +          /* Now adjust for VF to get the final iteration value.
> > > > > > > > > > > > > > +             */
> > > > > > > > > > > > > > +          off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > > > > +                             build_int_cst (stype, vf));
> > > > > > > > > > > > > > +        }
> > > > > > > > > > > > > > +      else
> > > > > > > > > > > > > > +        off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > > > > +                           fold_convert (stype, niters), step_expr);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >        if (POINTER_TYPE_P (type))
> > > > > > > > > > > > > >          ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > > > > >        else
> > > > > > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > > > > >          ni = init_expr;
> > > > > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > > > > +        continue;
> > > > > > > > > > > > >
> > > > > > > > > > > > > This looks all a bit complicated - why wouldn't
> > > > > > > > > > > > > we simply always use the PHI result when 'restart_loop'?
> > > > > > > > > > > > > Isn't that the correct old start value in all cases?
> > > > > > > > > > > > >
> > > > > > > > > > > > > >        else
> > > > > > > > > > > > > >          ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > > > > > >                                            niters, step_expr,
> > > > > > > > > > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +      /* For non-main exit create an intermediat edge to get any updated iv
> > > > > > > > > > > > > > +         calculations.  */
> > > > > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > > > > +          && !iv_block
> > > > > > > > > > > > > > +          && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > > > > > +        {
> > > > > > > > > > > > > > +          iv_block = split_edge (update_e);
> > > > > > > > > > > > > > +          update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > > > > +          last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > > > > +        }
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > > > > >          {
> > > > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > > > > > > > > >           niters_vector_mult_vf steps.
> > > > > > > > > > > > > >           */
> > > > > > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > > > > > > > > > -                                        update_e);
> > > > > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > > > > +        update_e = single_succ_edge (e->dest);
> > > > > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > > > > +        = !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > > > > > > +                                         LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > > > > > > >
> > > > > > > > > > > > > You are computing this here and in vect_update_ivs_after_vectorizer?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
> > > > > > > > > > > > > > +                                        update_e, inversed_iv);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > > > > +        {
> > > > > > > > > > > > > > +          if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > > > > > +            continue;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +          vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > > > > > +                                            niters_vector_mult_vf,
> > > > > > > > > > > > > > +                                            exit, true);
> > > > > > > > > > > > >
> > > > > > > > > > > > > ... why does the same not work here?
Wouldn't > > > > > > > > > > > > > the proper condition be !dominated_by_p > > > > > > > > > > > > > (CDI_DOMINATORS, > > > > > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT > > > > > > > > > > > > > (loop_vinfo)->src) or similar? That is, whether > > > > > > > > > > > > > the exit is at or after the main IV exit? > > > > > > > > > > > > > (consider having > > > > > > > > > > > > > two) > > > > > > > > > > > > > > > > > > > > > > > > > > > + } > > > > > > > > > > > > > > > > > > > > > > > > > > > > if (skip_epilog) > > > > > > > > > > > > > > { > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > Richard Biener SUSE Software > > > > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146, 90461 > > > > > > > > > > > Nuernberg, Germany; > > > > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; > > > > > > > > > > > (HRB 36809, AG > > > > > > > > > > > Nuernberg) > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Richard Biener SUSE Software > > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146, 90461 > > > > > > > > > Nuernberg, Germany; > > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB > > > > > > > > > 36809, AG > > > > > > > > > Nuernberg) > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Richard Biener SUSE Software Solutions > > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, > > > > > > > AG > > > > > > > Nuernberg) > > > > > > > > > > > > > > > > -- > > > > > Richard Biener SUSE Software Solutions > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > > > > Nuernberg) > > > > > > > > > > -- > > > Richard Biener > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 > > > Nuernberg, Germany; > > > GF: Ivo Totev, 
Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > > Nuernberg) > > >=20 > -- > Richard Biener > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > Nuernberg)