From: Tamar Christina
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org", nd, "jlaw@ventanamicro.com"
Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
Date: Thu, 16 Nov 2023 13:22:23 +0000

> > > > > >
> > > > > > Perhaps I'm missing something here?
> > > > >
> > > > > OK, so I refreshed my mind of what vect_update_ivs_after_vectorizer
> > > > > does.
> > > > >
> > > > > I still do not understand the (complexity of the) patch.
> > > > > Basically the function computes the new value of the IV "from
> > > > > scratch" based on the number of scalar iterations of the vector loop,
> > > > > the 'niter' argument.  I would have expected that for the early exits we
> > > > > either pass in a different 'niter' or alternatively a 'niter_adjustment'.
> > > >
> > > > But for an early exit there's no static value for adjusted niter,
> > > > since you don't know which iteration you exited from.  Unlike the
> > > > normal exit when you know if you get there you've done all
> > > > possible iterations.
> > > >
> > > > So you must compute the scalar iteration count on the exit itself.
> > >
> > > ?
> > > You do not need the actual scalar iteration you exited (you don't
> > > compute that either), you need the scalar iteration the vector
> > > iteration started with when it exited prematurely and that's readily
> > > available?
> >
> > For a normal exit yes, not for an early exit no?  niters_vector_mult_vf
> > is only valid for the main exit.
> >
> > There's the unadjusted scalar count, which is what it's using to
> > adjust it to the final count.  Unless I'm missing something?
>
> Ah, of course - niters_vector_mult_vf is for the countable exit.  For the early
> exits we can't precompute the scalar iteration value.  But that then means we
> should compute the appropriate "continuation" as live value of the vectorized
> IVs even when they were not originally used outside of the loop.  I don't see
> how we can express this in terms of the scalar IVs in the (not yet) vectorized
> loop - similar to the reduction case you are going to end up with the wrong
> values here.
>
> That said, I've for a long time wanted to preserve the original control IV also for
> the vector code (leaving any "optimization" to IVOPTs there), that would
> enable us to compute the correct "niters_vector_mult_vf" based on that IV.
>
> So given we cannot use the scalar IVs you have to handle all inductions
> (besides the main exit control IV) in vectorizable_live_operation I think.
>

That's what I currently do, that's why there was the

	      if (STMT_VINFO_LIVE_P (phi_info))
		continue;

although I don't understand why we use the scalar count. I suppose the reasoning
is that we don't really want to keep it around, and referencing it forces it to be
kept?

At the moment it just does `init + (final - init) * vf`, which is correct, no?

Also, you missed the question below about how to avoid the creation of the block.
Are you OK with changing that?

Thanks,
Tamar

> Or for now disable early-break for inductions that are not the main exit control
> IV (in vect_can_advance_ivs_p)?
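To make the `init + (final - init) * vf` expression above concrete, here is a small standalone sketch in plain C++ (not GCC internals). It assumes that `final` is the value of the un-vectorized IV PHI when the early exit is taken, and that this PHI advances by one `step` per vector iteration; both are assumptions made for the illustration, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the expression quoted in the thread:
//   init + (final - init) * vf
// `final_val` stands for the value the IV PHI holds at the early exit.
std::int64_t restart_value(std::int64_t init, std::int64_t final_val,
                           std::int64_t vf)
{
  assert(vf > 0);
  return init + (final_val - init) * vf;
}

// Assumption for this sketch: inside the vector loop the PHI still steps
// by `step` once per *vector* iteration.
std::int64_t phi_after_vector_iters(std::int64_t init, std::int64_t step,
                                    std::int64_t vector_iters)
{
  return init + vector_iters * step;
}

// Reference value: the scalar IV after `scalar_iters` scalar iterations.
std::int64_t scalar_iv_after(std::int64_t init, std::int64_t step,
                             std::int64_t scalar_iters)
{
  return init + scalar_iters * step;
}
```

With init = 3, step = 2, vf = 4 and five completed vector iterations, the PHI holds 13 and the expression gives 3 + (13 - 3) * 4 = 43, which is the value the scalar IV has after 20 scalar iterations. Under the stated assumption the step cancels out of the formula, which is why it also copes with non-unit steps.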
>
> > > > >
> > > > > It seems your change handles different kinds of inductions differently.
> > > > > Specifically
> > > > >
> > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > >       if (restart_loop && ivtemp)
> > > > >         {
> > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > >           ni = build_int_cst (type, vf);
> > > > >           if (inversed_iv)
> > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > >                               fold_convert (type, step_expr));
> > > > >         }
> > > > >
> > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > as the new value.  That seems to be very odd special casing for
> > > > > unknown reasons.  And while you adjust vect_step_op_add, you
> > > > > don't adjust vect_peel_nonlinear_iv_init (maybe not supported -
> > > > > better assert here).
> > > >
> > > > The VF case is for a normal "non-inverted" loop, where if you take
> > > > an early exit you know that you have to do at most VF iterations.
> > > > The VF - step is to account for the inverted loop control flow where you
> > > > exit after adjusting the IV already by + step.
> > >
> > > But doesn't that assume the IV counts from niter to zero?  I don't
> > > see this special case is actually necessary, no?
> > >
> >
> > I needed it because otherwise the scalar loop iterates one iteration
> > too few, so I got a miscompile with the inverted loop stuff.  I'll
> > look at it again; perhaps it can be solved differently.
> >
> > > >
> > > > Peeling doesn't matter here, since you know you were able to do a
> > > > vector iteration so it's safe to do VF iterations.  So having
> > > > peeled doesn't affect the remaining iters count.
> > > >
> > > > >
> > > > > Also the vect_step_op_add case will keep the original scalar IV
> > > > > live even when it is a vectorized induction.  The code
> > > > > recomputing the value from scratch avoids this.
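The "at most VF iterations" point can be seen in a toy model. The sketch below is a scalar simulation of an early-break search loop, not the code GCC emits: the "vector" loop tests VF elements per trip, and on an early exit the scalar epilog restarts at the element the exiting vector iteration began with. The function `find_first` and the struct `SearchResult` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SearchResult {
  std::size_t index;         // first match, or data.size() if there is none
  std::size_t epilog_iters;  // scalar iterations run after the vector loop
};

SearchResult find_first(const std::vector<int>& data, int key, std::size_t vf)
{
  assert(vf > 0);
  const std::size_t n = data.size();
  std::size_t i = 0;

  // Simulated vector loop: each trip inspects VF elements at once.
  while (i + vf <= n) {
    bool any_match = false;
    for (std::size_t lane = 0; lane < vf; ++lane)
      any_match = any_match || (data[i + lane] == key);
    if (any_match)
      break;   // early exit: i is still the start of this vector iteration
    i += vf;   // no lane matched, the whole block is done
  }

  // Scalar epilog: restarts at i and redoes the block element by element.
  std::size_t epilog_iters = 0;
  for (; i < n; ++i) {
    ++epilog_iters;
    if (data[i] == key)
      break;
  }
  return {i, epilog_iters};
}
```

In this model, an early exit hands the epilog the start of a block that is known to contain a match, so the epilog runs at most VF iterations before it breaks. When the vector loop instead runs to completion, the epilog only sees the remainder of fewer than VF elements.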
> > > > >
> > > > >       /* For non-main exit create an intermediate edge to get any updated iv
> > > > >          calculations.  */
> > > > >       if (needs_interm_block
> > > > >           && !iv_block
> > > > >           && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > >         {
> > > > >           iv_block = split_edge (update_e);
> > > > >           update_e = single_succ_edge (update_e->dest);
> > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > >         }
> > > > >
> > > > > this is also odd, can we adjust the API instead?  I suppose this
> > > > > is because your computation uses the original loop IV, if you
> > > > > based the computation off the initial value only this might not be
> > > > > necessary?
> > > >
> > > > No, on the main exit the code updates the value in the loop header
> > > > and puts the calculation in the merge block.  This works because
> > > > it only needs to consume PHI nodes in the merge block and things
> > > > like niters are adjusted in the guard block.
> > > >
> > > > For an early exit, we don't have a guard block, only the merge block.
> > > > We have to update the PHI nodes in that block, but can't do so
> > > > since you can't produce a value and consume it in a PHI node in the same BB.
> > > > So we need to create the block to put the values in for use in the
> > > > merge block.  Because there's no "guard" block for early exits.
> > >
> > > ? then compute niters in that block as well.
> >
> > We can't since it'll not be reachable through the right edge.  What we
> > can do if you want is slightly change peeling, we currently peel as:
> >
> >   \     \        /
> >   E1    E2   Normal exit
> >     \    |       |
> >      \   |     Guard
> >       \  |       |
> >        Merge block
> >            |
> >        Pre Header
> >
> > If we instead peel as:
> >
> >
> >   \     \        /
> >   E1    E2   Normal exit
> >     \    |       |
> >    Exit join   Guard
> >       \  |       |
> >        Merge block
> >            |
> >        Pre Header
> >
> > We can use the exit join block.
> > This would also mean vect_update_ivs_after_vectorizer
> > doesn't need to iterate over all exits and only really needs to adjust
> > the phi nodes coming out of the exit join and guard block.
> >
> > Does this work for you?
> >
> > Thanks,
> > Tamar
> > >
> > > > The API can be adjusted by always creating the empty block either
> > > > during peeling.
> > > > That would prevent us from having to do anything special here.
> > > > Would that work better?  Or I can do it in the loop that iterates
> > > > over the exits to before the call to
> > > > vect_update_ivs_after_vectorizer, which I think might be more
> > > > consistent.
> > > >
> > > > >
> > > > > That said, I wonder why we cannot simply pass in an adjusted
> > > > > niter which would be niters_vector_mult_vf - vf and be done with that?
> > > > >
> > > >
> > > > We can of course not have this and recompute it from niters itself,
> > > > however this does affect the epilog code layout.  Particularly
> > > > knowing the static number of iterations left causes it to usually
> > > > unroll the loop and share some of the computations, i.e. the
> > > > scalar code is often more efficient.
> > > >
> > > > The computation would be niters_vector_mult_vf - iters_done * vf,
> > > > since the value put here is the remaining iteration count.  It's
> > > > static for early exits.
> > >
> > > Well, it might be "static" in that it doesn't really matter what you
> > > use for the epilog main IV initial value as long as you are sure
> > > you're not going to take that exit as you are sure we're going to
> > > take one of the early exits.  So yeah, the special code is probably
> > > OK, but it needs a better comment and as said the structure of
> > > vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > >
> > > As said an important part for optimization is to not keep the scalar
> > > IVs live in the vector loop.
> > >
> > > > But can do whatever you prefer here.
> > > > Let me know what you prefer for the above.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > >
> > > > > > Regards,
> > > > > > Tamar
> > > > > >
> > > > > > > > It has to do this since you have to perform the side
> > > > > > > > effects for the non-matching elements still.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > > > +              if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > +                continue;
> > > > > > > > > > +
> > > > > > > > > > +              /* For early break the final loop IV is:
> > > > > > > > > > +                 init + (final - init) * vf which takes into account peeling
> > > > > > > > > > +                 values and non-single steps.  The main exit can use niters
> > > > > > > > > > +                 since if you exit from the main exit you've done all vector
> > > > > > > > > > +                 iterations.  For an early exit we don't know when we exit so we
> > > > > > > > > > +                 must re-calculate this on the exit.  */
> > > > > > > > > > +              tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > +              off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > +                                 fold_convert (stype, start_expr),
> > > > > > > > > > +                                 fold_convert (stype, init_expr));
> > > > > > > > > > +              /* Now adjust for VF to get the final iteration value.  */
> > > > > > > > > > +              off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > +                                 build_int_cst (stype, vf));
> > > > > > > > > > +            }
> > > > > > > > > > +          else
> > > > > > > > > > +            off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > +                               fold_convert (stype, niters), step_expr);
> > > > > > > > > > +
> > > > > > > > > >            if (POINTER_TYPE_P (type))
> > > > > > > > > >              ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > >            else
> > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > >          ni = init_expr;
> > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > +        continue;
> > > > > > > > >
> > > > > > > > > This looks all a bit complicated - why wouldn't we
> > > > > > > > > simply always use the PHI result when 'restart_loop'?
> > > > > > > > > Isn't that the correct old start value in all cases?
> > > > > > > > >
> > > > > > > > > >        else
> > > > > > > > > >          ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > >                                            niters, step_expr,
> > > > > > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > >
> > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > >
> > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > > > > > > +
> > > > > > > > > > +      /* For non-main exit create an intermediate edge to get any updated iv
> > > > > > > > > > +         calculations.  */
> > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > +          && !iv_block
> > > > > > > > > > +          && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > +        {
> > > > > > > > > > +          iv_block = split_edge (update_e);
> > > > > > > > > > +          update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > +          last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > +        }
> > > > > > > > > > +
> > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > >          {
> > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > > > > >           niters_vector_mult_vf steps.  */
> > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > > > > > -                                        update_e);
> > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > +        update_e = single_succ_edge (e->dest);
> > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > +        = !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > > +                                         LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > > >
> > > > > > > > > You are computing this here and in vect_update_ivs_after_vectorizer?
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
> > > > > > > > > > +                                        update_e, inversed_iv);
> > > > > > > > > > +
> > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > +        {
> > > > > > > > > > +          if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > +            continue;
> > > > > > > > > > +
> > > > > > > > > > +          vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > +                                            niters_vector_mult_vf,
> > > > > > > > > > +                                            exit, true);
> > > > > > > > >
> > > > > > > > > ... why does the same not work here?  Wouldn't the
> > > > > > > > > proper condition be !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT (loop_vinfo)->src) or similar?
> > > > > > > > > That is, whether the exit is at or after the main IV exit?
> > > > > > > > > (consider having two)
> > > > > > > > >
> > > > > > > > > > +        }
> > > > > > > > > >
> > > > > > > > > >        if (skip_epilog)
> > > > > > > > > >          {
> > > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener
> > > > > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)