From: Tamar Christina
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org", nd, "jlaw@ventanamicro.com"
Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
Date: Fri, 17 Nov 2023 10:40:45 +0000

> > > > > Yes, but that only works for the inductions marked so.  We'd
> > > > > need to mark the others as well, but only for the early exits.
> > > > >
> > > > > > although I don't understand why we use the scalar count, I
> > > > > > suppose the reasoning is that we don't really want to keep it
> > > > > > around, and referencing it forces it to be kept?
> > > > >
> > > > > Referencing it will cause the scalar compute to be retained, but
> > > > > since we do not adjust the scalar compute during vectorization
> > > > > (but expect it to be dead) the scalar compute will compute the
> > > > > wrong thing (as shown by the reduction example - I suspect
> > > > > inductions will suffer from the same problem).
> > > > >
> > > > > > At the moment it just does `init + (final - init) * vf` which is correct, no?
> > > > >
> > > > > The issue is that 'final' is not computed correctly in the
> > > > > vectorized loop.
> > > > > This formula might work for affine evolutions of course.
> > > > >
> > > > > Extracting the correct value from the vectorized induction would
> > > > > be the preferred solution.
> > > >
> > > > Ok, so I should be able to just mark IVs as live during
> > > > process_use if there are multiple exits, right?  Since it's just
> > > > gonna be unused on the main exit, since we use niters?
> > > >
> > > > Because it's the PHI inside the loop that needs to be marked
> > > > live, I can't just do it for specific exits, no?
> > > >
> > > > If I create a copy of the PHI node during peeling for use in early
> > > > exits and mark it live, it won't work, no?
> > >
> > > I guess I wouldn't actually mark it STMT_VINFO_LIVE_P but somehow
> > > arrange for vectorizable_live_operation to be called, possibly adding
> > > an edge argument to that as well.
> > >
> > > Maybe the thing to do for the moment is to reject vectorization with
> > > early breaks if there's any (non-STMT_VINFO_LIVE_P?) induction or
> > > reduction besides the main counting IV one you can already special-case?
> >
> > Ok, so I did a quick hack with:
> >
> >       if (!virtual_operand_p (PHI_RESULT (phi))
> >           && !STMT_VINFO_LIVE_P (phi_info))
> >         {
> >           use_operand_p use_p;
> >           imm_use_iterator imm_iter;
> >           bool non_exit_use = false;
> >           FOR_EACH_IMM_USE_FAST (use_p, imm_iter, PHI_RESULT (phi))
> >             if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
> >               for (auto exit : get_loop_exit_edges (loop))
> >                 {
> >                   if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> >                     continue;
> >
> >                   if (gimple_bb (USE_STMT (use_p)) != exit->dest)
> >                     {
> >                       non_exit_use = true;
> >                       goto fail;
> >                     }
> >                 }
> > fail:
> >           if (non_exit_use)
> >             return false;
> >         }
> >
> > And it does seem to still allow all the cases I want.  I've placed
> > this in vect_can_advance_ivs_p.
> >
> > Does this cover what you meant?
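Richard's remark that `init + (final - init) * vf` "might work for affine evolutions" can be checked with a small standalone C model. This is a sketch under stated assumptions, not GCC code: it assumes FINAL is the value of the unvectorized scalar PHI, which advances only once per vector iteration, and the helper names (`iv_after`, `restart_value_exact`, `restart_value_formula`) are hypothetical. Even where the formula is right, it relies on keeping that scalar IV live in the vector loop, which is what the thread wants to avoid.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC code: an induction variable that starts at
   INIT and is updated by STEP, either additively (affine, i += step) or
   multiplicatively (nonlinear, i *= step).  */
static long
iv_after (long init, long step, bool multiplicative, int iters)
{
  assert (iters >= 0);
  long v = init;
  for (int k = 0; k < iters; k++)
    v = multiplicative ? v * step : v + step;
  return v;
}

/* Ground truth: the scalar IV value at the start of vector iteration J,
   i.e. after J * VF scalar iterations.  */
static long
restart_value_exact (long init, long step, bool multiplicative, int vf, int j)
{
  return iv_after (init, step, multiplicative, j * vf);
}

/* The quoted formula init + (final - init) * vf, assuming FINAL is the
   unvectorized scalar PHI that advances once per vector iteration.  */
static long
restart_value_formula (long init, long step, bool multiplicative, int vf, int j)
{
  long final = iv_after (init, step, multiplicative, j);
  return init + (final - init) * vf;
}
```

For an additive IV the formula reduces algebraically to `init + j * vf * step`, so it matches the exact value for every `j`; for a doubling IV starting at 1 it already diverges at `j = 1` (exact 16, formula 5 with `vf = 4`).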
> >
>
> Ok, I've rewritten this in a nicer form, but doesn't this mean we now block any
> loop where the index is not live?
> i.e. we block simple loops such as
>
> #ifndef N
> #define N 800
> #endif
> unsigned vect_a[N];
>
> unsigned test4(unsigned x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>    if (vect_a[i]*2 != x)
>      break;
>    vect_a[i] = x;
>  }
>  return ret;
> }
>
> because it does a simple `break`.  If I force it to be live it works, but then I need
> to differentiate between the counter and the IV.
>
> # i_15 = PHI
> # ivtmp_7 = PHI
>
> It seems like if we don't want to keep i_15 around (at the moment it will be kept
> because of its usage in the exit block; it won't be DCEd) then we need to mark it
> live early during analysis.
>
> Most likely if we do this I don't need to care about the "inverted" workflow
> here at all.  What do you think?
>
> Yes, that doesn't work for SLP, but I don't think I can get SLP working in the
> remaining time anyway.
>
> I'll fix reduction and multiple exit live values in the meantime.
>

Ok, so I currently have the following solution.  Let me know if you agree with it
and I'll polish it up today and tomorrow and respin things.

1. During vect_update_ivs_after_vectorizer we no longer touch any PHIs aside from
   just updating IVtemps with the expected remaining iteration count.
2. During vect_transform_loop, after vectorizing any induction or reduction, I call
   vectorizable_live_operation for any PHI node that still has uses in the early
   exit merge block.
3. vectorizable_live_operation is taught to materialize the same PHI in multiple
   exits.
4. vectorizable_reduction, or maybe vect_create_epilog_for_reduction, needs to be
   modified to materialize the previous iteration value for early exits.
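Steps 2 and 4 amount to recovering the scalar restart value from the vectorized induction instead of from the scalar IV. A minimal C emulation of that idea for a test4-like loop, assuming VF = 4 and a variant of the loop that returns its exit index so the IV is live after the loop (both assumptions; `scalar_loop` and `emulated_vector_loop` are hypothetical names, not GCC APIs). On an early exit no lane stores, and the scalar epilogue restarts from lane 0 of the vector IV for the failing vector iteration, which appears to be what `fmov w1, s28` does in the AArch64 output that follows.

```c
#include <assert.h>
#include <stdbool.h>

#define VF 4 /* hypothetical vectorization factor: 4 x 32-bit lanes */

/* Scalar reference with the shape of test4, except that it returns the
   index at which the loop exited.  */
static int
scalar_loop (unsigned *a, int n, unsigned x)
{
  int i;
  for (i = 0; i < n; i++)
    {
      if (a[i] * 2 != x)
        break;
      a[i] = x;
    }
  return i;
}

/* Emulated vector loop: the IV lives only in the vector VIV.  */
static int
emulated_vector_loop (unsigned *a, int n, unsigned x)
{
  assert (n >= 0);
  int viv[VF];
  for (int l = 0; l < VF; l++)
    viv[l] = l;

  /* Main exit: all full vector iterations done (niters_vector_mult_vf).  */
  int restart = n - n % VF;
  for (int v = 0; v < n / VF; v++)
    {
      bool any_break = false;
      for (int l = 0; l < VF; l++)
        any_break |= (a[viv[l]] * 2 != x);
      if (any_break)
        {
          /* Early exit: extract lane 0 of the vector IV.  */
          restart = viv[0];
          break;
        }
      for (int l = 0; l < VF; l++)
        a[viv[l]] = x;
      for (int l = 0; l < VF; l++)
        viv[l] += VF;
    }

  /* Scalar epilogue: redo the remaining iterations, including the side
     effects of lanes before the breaking one.  */
  int i;
  for (i = restart; i < n; i++)
    {
      if (a[i] * 2 != x)
        break;
      a[i] = x;
    }
  return i;
}
```

Because the vector iteration that would break performs no stores, the epilogue can safely redo all of its lanes; this is what lets the scalar IV stay dead inside the vector loop.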
This seems to work, and now produces the following for the simple loop above:

.L2:
        str     q27, [x1, x3]
        str     q29, [x2, x1]
        add     x1, x1, 16
        cmp     x1, 3200
        beq     .L11
.L4:
        ldr     q31, [x2, x1]
        mov     v28.16b, v30.16b
        add     v30.4s, v30.4s, v26.4s
        shl     v31.4s, v31.4s, 1
        add     v27.4s, v28.4s, v29.4s
        cmeq    v31.4s, v31.4s, v29.4s
        not     v31.16b, v31.16b
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x4, d31
        cbz     x4, .L2
        fmov    w1, s28
        mov     w6, 4
.L3:

So now the scalar index is no longer kept, and the exit extracts the value from
the vector IV instead:

        fmov    w1, s28

Does this work as you expected?

Thanks,
Tamar

> Thanks,
> Tamar
> > Thanks,
> > Tamar
> >
> > >
> > > Richard.
> > >
> > > > Tamar
> > > >
> > > > >
> > > > > > Also you missed the question below about how to avoid the
> > > > > > creation of the block.  Are you ok with changing that?
> > > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > > > Or for now disable early-break for inductions that are not
> > > > > > > the main exit control IV (in vect_can_advance_ivs_p)?
> > > > > > >
> > > > > > > > > > > It seems your change handles different kinds of inductions
> > > > > > > > > > > differently.  Specifically
> > > > > > > > > > >
> > > > > > > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > > > > >       if (restart_loop && ivtemp)
> > > > > > > > > > >         {
> > > > > > > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > > > > >           ni = build_int_cst (type, vf);
> > > > > > > > > > >           if (inversed_iv)
> > > > > > > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > > > > >                               fold_convert (type, step_expr));
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > > > > > > > as the new value.  That seems to be very odd special casing for
> > > > > > > > > > > unknown reasons.  And while you adjust vec_step_op_add, you don't
> > > > > > > > > > > adjust vect_peel_nonlinear_iv_init (maybe not supported - better
> > > > > > > > > > > assert here).
> > > > > > > > > >
> > > > > > > > > > The VF case is for a normal "non-inverted" loop, where if you take
> > > > > > > > > > an early exit you know that you have to do at most VF iterations.
> > > > > > > > > > The VF - step is to account for the inverted loop control flow, where
> > > > > > > > > > you exit after already adjusting the IV by + step.
> > > > > > > > >
> > > > > > > > > But doesn't that assume the IV counts from niter to zero?  I don't
> > > > > > > > > see that this special case is actually necessary, no?
> > > > > > > >
> > > > > > > > I needed it because otherwise the scalar loop iterates one iteration
> > > > > > > > too few, so I got a miscompile with the inverted loop stuff.  I'll
> > > > > > > > look at it again; perhaps it can be solved differently.
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Peeling doesn't matter here, since you know you were able to do a
> > > > > > > > > > vector iteration, so it's safe to do VF iterations.  So having
> > > > > > > > > > peeled doesn't affect the remaining iters count.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Also the vec_step_op_add case will keep the original scalar IV live
> > > > > > > > > > > even when it is a vectorized induction.  The code recomputing the
> > > > > > > > > > > value from scratch avoids this.
> > > > > > > > > > >
> > > > > > > > > > >       /* For non-main exit create an intermediate edge to get any
> > > > > > > > > > >          updated iv calculations.  */
> > > > > > > > > > >       if (needs_interm_block
> > > > > > > > > > >           && !iv_block
> > > > > > > > > > >           && (!gimple_seq_empty_p (stmts)
> > > > > > > > > > >               || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > >         {
> > > > > > > > > > >           iv_block = split_edge (update_e);
> > > > > > > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > > this is also odd, can we adjust the API instead?  I suppose this is
> > > > > > > > > > > because your computation uses the original loop IV; if you based the
> > > > > > > > > > > computation off the initial value only, this might not be necessary?
> > > > > > > > > >
> > > > > > > > > > No, on the main exit the code updates the value in the loop header
> > > > > > > > > > and puts the calculation in the merge block.  This works because it
> > > > > > > > > > only needs to consume PHI nodes in the merge block, and things like
> > > > > > > > > > niters are adjusted in the guard block.
> > > > > > > > > >
> > > > > > > > > > For an early exit, we don't have a guard block, only the merge block.
> > > > > > > > > > We have to update the PHI nodes in that block, but can't do so since
> > > > > > > > > > you can't produce a value and consume it in a PHI node in the same BB.
> > > > > > > > > > So we need to create the block to put the values in for use in the
> > > > > > > > > > merge block, because there's no "guard" block for early exits.
> > > > > > > > >
> > > > > > > > > ? then compute niters in that block as well.
> > > > > > > >
> > > > > > > > We can't, since it'll not be reachable through the right edge.
> > > > > > > > What we can do, if you want, is slightly change peeling.  We
> > > > > > > > currently peel as:
> > > > > > > >
> > > > > > > >  \   \        /
> > > > > > > >   E1  E2   Normal exit
> > > > > > > >    \   |       |
> > > > > > > >     \  |     Guard
> > > > > > > >      \ |       |
> > > > > > > >      Merge block
> > > > > > > >           |
> > > > > > > >       Pre Header
> > > > > > > >
> > > > > > > > If we instead peel as:
> > > > > > > >
> > > > > > > >  \   \        /
> > > > > > > >   E1  E2   Normal exit
> > > > > > > >    \   |       |
> > > > > > > >   Exit join  Guard
> > > > > > > >       \        |
> > > > > > > >       Merge block
> > > > > > > >            |
> > > > > > > >        Pre Header
> > > > > > > >
> > > > > > > > we can use the exit join block.  This would also mean
> > > > > > > > vect_update_ivs_after_vectorizer doesn't need to iterate over all
> > > > > > > > exits and only really needs to adjust the PHI nodes coming out of
> > > > > > > > the exit join and guard block.
> > > > > > > >
> > > > > > > > Does this work for you?
> > > > >
> > > > > Yeah, I think that would work.  But I'd like to sort out the
> > > > > correctness details of the IV update itself before sorting out
> > > > > this code placement detail.
> > > > >
> > > > > Richard.
> > > > >
> > > > > > > > Thanks,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > > > The API can be adjusted by always creating the empty block
> > > > > > > > > > during peeling.  That would prevent us from having to do
> > > > > > > > > > anything special here.  Would that work better?  Or I can do it
> > > > > > > > > > in the loop that iterates over the exits, before the call to
> > > > > > > > > > vect_update_ivs_after_vectorizer, which I think might be more
> > > > > > > > > > consistent.
> > > > > > > > > > >
> > > > > > > > > > > That said, I wonder why we cannot simply pass in an adjusted
> > > > > > > > > > > niter which would be niters_vector_mult_vf - vf and be done
> > > > > > > > > > > with that?
> > > > > > > > > >
> > > > > > > > > > We can of course not have this and recompute it from niters itself;
> > > > > > > > > > however, this does affect the epilog code layout.  In particular,
> > > > > > > > > > knowing the static number of iterations left usually causes it to
> > > > > > > > > > unroll the loop and share some of the computations, i.e. the scalar
> > > > > > > > > > code is often more efficient.
> > > > > > > > > >
> > > > > > > > > > The computation would be niters_vector_mult_vf - iters_done * vf,
> > > > > > > > > > since the value put here is the remaining iteration count.  It's
> > > > > > > > > > static for early exits.
> > > > > > > > >
> > > > > > > > > Well, it might be "static" in that it doesn't really matter what
> > > > > > > > > you use for the epilog main IV initial value as long as you are
> > > > > > > > > sure you're not going to take that exit, as you are sure we're
> > > > > > > > > going to take one of the early exits.  So yeah, the special code
> > > > > > > > > is probably OK, but it needs a better comment and, as said, the
> > > > > > > > > structure of vect_update_ivs_after_vectorizer is a bit hard to
> > > > > > > > > follow now.
> > > > > > > > >
> > > > > > > > > As said, an important part for optimization is to not keep the
> > > > > > > > > scalar IVs live in the vector loop.
> > > > > > > > >
> > > > > > > > > > But can do whatever you prefer here.  Let me know what you prefer
> > > > > > > > > > for the above.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Tamar
> > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Richard.
> > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Tamar
> > > > > > > > > > > >
> > > > > > > > > > > > > > It has to do this since you have to perform the side effects
> > > > > > > > > > > > > > for the non-matching elements still.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > Tamar
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +          if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > > > > > > +            continue;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +          /* For early break the final loop IV is:
> > > > > > > > > > > > > > > > +             init + (final - init) * vf which takes into account peeling
> > > > > > > > > > > > > > > > +             values and non-single steps.  The main exit can use niters
> > > > > > > > > > > > > > > > +             since if you exit from the main exit you've done all vector
> > > > > > > > > > > > > > > > +             iterations.  For an early exit we don't know when we exit so we
> > > > > > > > > > > > > > > > +             must re-calculate this on the exit.  */
> > > > > > > > > > > > > > > > +          tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > > > > > > +          off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > > > > > > +                             fold_convert (stype, start_expr),
> > > > > > > > > > > > > > > > +                             fold_convert (stype, init_expr));
> > > > > > > > > > > > > > > > +          /* Now adjust for VF to get the final iteration value.  */
> > > > > > > > > > > > > > > > +          off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > > > > > > +                             build_int_cst (stype, vf));
> > > > > > > > > > > > > > > > +        }
> > > > > > > > > > > > > > > > +      else
> > > > > > > > > > > > > > > > +        off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > > > > > > +                           fold_convert (stype, niters), step_expr);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >        if (POINTER_TYPE_P (type))
> > > > > > > > > > > > > > > >          ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > > > > > > >        else
> > > > > > > > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > > > > > > >          ni = init_expr;
> > > > > > > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > > > > > > +        continue;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This looks all a bit complicated - why wouldn't we simply
> > > > > > > > > > > > > > > always use the PHI result when 'restart_loop'?  Isn't that
> > > > > > > > > > > > > > > the correct old start value in all cases?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >        else
> > > > > > > > > > > > > > > >          ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > > > > > > > >                                            niters, step_expr,
> > > > > > > > > > > > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +      /* For non-main exit create an intermediate edge to get any updated iv
> > > > > > > > > > > > > > > > +         calculations.  */
> > > > > > > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > > > > > > +          && !iv_block
> > > > > > > > > > > > > > > > +          && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > > > > > > > +        {
> > > > > > > > > > > > > > > > +          iv_block = split_edge (update_e);
> > > > > > > > > > > > > > > > +          update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > > > > > > +          last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > > > > > > +        }
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > > > > > > >          {
> > > > > > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > > > > > > > > > > >           niters_vector_mult_vf steps.  */
> > > > > > > > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > > > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > > > > > > > > > > > -                                        update_e);
> > > > > > > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > > > > > > +        update_e = single_succ_edge (e->dest);
> > > > > > > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > > > > > > +        = !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > > > > > > > > +                                         LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > You are computing this here and in
> > > > > > > > > > > > > > > vect_update_ivs_after_vectorizer?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
> > > > > > > > > > > > > > > > +                                        update_e, inversed_iv);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > > > > > > +        {
> > > > > > > > > > > > > > > > +          if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > > > > > > > +            continue;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +          vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > > > > > > > +                                            niters_vector_mult_vf,
> > > > > > > > > > > > > > > > +                                            exit, true);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ... why does the same not work here?  Wouldn't the proper
> > > > > > > > > > > > > > > condition be !dominated_by_p (CDI_DOMINATORS, exit->src,
> > > > > > > > > > > > > > > LOOP_VINFO_IV_EXIT (loop_vinfo)->src) or similar?  That is,
> > > > > > > > > > > > > > > whether the exit is at or after the main IV exit?  (consider
> > > > > > > > > > > > > > > having two)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +        }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >        if (skip_epilog)
> > > > > > > > > > > > > > > >          {
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Richard Biener
> > > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> > > > > > > > > > > > > Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
36809, AG > > > > > > > > > > > Nuernberg) > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Richard Biener SUSE Software > > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146, 90461 > > > > > > > > > Nuernberg, Germany; > > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB > > > > > > > > > 36809, AG > > > > > > > > > Nuernberg) > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Richard Biener SUSE Software Solutions > > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, > > > > > > > AG > > > > > > > Nuernberg) > > > > > > > > > > > > > > > > -- > > > > > Richard Biener SUSE Software Solutions > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > > > > Nuernberg) > > > > > > > > > > -- > > > Richard Biener > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 > > > Nuernberg, Germany; > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > > Nuernberg)
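The dominance test suggested in the review (whether an exit's source block is dominated by the source block of `LOOP_VINFO_IV_EXIT`) can be tried out on a small self-contained C model. This is a sketch under stated assumptions, not GCC's API: `struct toy_cfg`, `add_edge`, `compute_dominators`, `toy_dominated_by`, `build_example_loop` and the block numbering are all hypothetical, and the dominators are computed with the classic iterative set-intersection algorithm, chosen only for brevity. The example loop has two early exits, one before and one after the main IV exit.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BB 32

/* Hypothetical toy CFG (not GCC's basic_block/edge types).
   Block 0 is the entry block.  */
struct toy_cfg
{
  int n_bb;
  int n_preds[MAX_BB];
  int preds[MAX_BB][MAX_BB];
};

static void
add_edge (struct toy_cfg *g, int src, int dest)
{
  assert (g->n_preds[dest] < MAX_BB);
  g->preds[dest][g->n_preds[dest]++] = src;
}

/* Iterative set-intersection dominator computation:
   bit D of DOM[B] is set iff block D dominates block B.  */
static void
compute_dominators (const struct toy_cfg *g, unsigned dom[MAX_BB])
{
  assert (g->n_bb > 0 && g->n_bb <= MAX_BB);
  unsigned all = g->n_bb == MAX_BB ? ~0u : (1u << g->n_bb) - 1;
  dom[0] = 1u;                        /* entry dominates only itself */
  for (int b = 1; b < g->n_bb; b++)
    dom[b] = all;

  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int b = 1; b < g->n_bb; b++)
        {
          /* Intersect the dominator sets of all predecessors, add B.  */
          unsigned d = all;
          for (int p = 0; p < g->n_preds[b]; p++)
            d &= dom[g->preds[b][p]];
          d |= 1u << b;
          if (d != dom[b])
            {
              dom[b] = d;
              changed = true;
            }
        }
    }
}

/* Toy analogue of dominated_by_p (CDI_DOMINATORS, bb, dom_bb).  */
static bool
toy_dominated_by (const unsigned dom[MAX_BB], int bb, int dom_bb)
{
  return (dom[bb] >> dom_bb) & 1u;
}

/* Example loop: 0 preheader, 1 header with early exit A, 2 block with
   the main IV exit, 3 block with early exit B, 4 latch, and 5/6/7 the
   destinations of exit A, the IV exit and exit B.  */
static void
build_example_loop (struct toy_cfg *g)
{
  g->n_bb = 8;
  for (int b = 0; b < MAX_BB; b++)
    g->n_preds[b] = 0;
  add_edge (g, 0, 1);
  add_edge (g, 1, 2);
  add_edge (g, 1, 5);   /* early exit A, before the IV exit */
  add_edge (g, 2, 3);
  add_edge (g, 2, 6);   /* main IV exit */
  add_edge (g, 3, 4);
  add_edge (g, 3, 7);   /* early exit B, after the IV exit */
  add_edge (g, 4, 1);   /* latch back edge */
}
```

Block 3 (exit B, after the IV exit) is dominated by block 2, while block 1 (exit A, before it) is not, so `!dominated_by_p (...)` gives different answers for the two early exits rather than the constant `true` passed in the patch.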