From: Tamar Christina
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org", nd, "jlaw@ventanamicro.com"
Subject: RE: [PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling
Date: Wed, 11 Oct 2023 11:16:11 +0000

> > +  auto loop_exits = get_loop_exit_edges (loop);
> > +  auto_vec doms;
> > +
> >    if (at_exit) /* Add the loop copy at exit. */
> >      {
> > -      if (scalar_loop != loop)
> > +      if (scalar_loop != loop && new_exit->dest != exit_dest)
> >          {
> > -          gphi_iterator gsi;
> >            new_exit = redirect_edge_and_branch (new_exit, exit_dest);
> > +          flush_pending_stmts (new_exit);
> > +        }
> >
> > -          for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
> > -               gsi_next (&gsi))
> > -            {
> > -              gphi *phi = gsi.phi ();
> > -              tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> > -              location_t orig_locus
> > -                = gimple_phi_arg_location_from_edge (phi, e);
> > +      auto_vec new_phis;
> > +      hash_map new_phi_args;
> > +      /* First create the empty phi nodes so that when we flush the
> > +         statements they can be filled in. However because there is no order
> > +         between the PHI nodes in the exits and the loop headers we need to
> > +         order them base on the order of the two headers. First record the new
> > +         phi nodes. */
> > +      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> > +           !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> > +        {
> > +          gimple *from_phi = gsi_stmt (gsi_from);
> > +          tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > +          gphi *res = create_phi_node (new_res, new_preheader);
> > +          new_phis.safe_push (res);
> > +        }
> >
> > -              add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> > +      /* Then redirect the edges and flush the changes. This writes out the new
> > +         SSA names. */
> > +      for (edge exit : loop_exits)
>
> I realize at the moment it's the same, but we are redirecting multiple exit edges
> here and from the walk above expect them all to have the same set of PHI
> nodes - that looks a bit fragile?

No, it only expects the two preheaders to have the same PHI nodes. Since one
loop is copied from the other, we know that to be true.

Now of course there are cases where your exit blocks have more PHI nodes than
the headers (e.g. live values), but those are handled later in the hunk below
(with new_phi_args).

For flush_pending_stmts to work I had to make sure the order of the phi nodes
is the same as in the original. This is why I can't iterate over the values in
the exit block instead and need to handle it in two steps.

> Does this need adjustments later for the early exit vectorization?
>

I believe (I need to finish the rebase) that the only adjustment I'll need here
for multiple exits is the update of the dominators. I don't think I'll need
more. I had issues with live values that I had to handle specially before, but
I think this new approach should deal with it already.

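To make the two-step handling above concrete, here is a minimal toy sketch in
plain C++. It is not GCC code: ToyPhi, create_empty_phis and flush_pending are
hypothetical names, and it assumes that the flush pairs pending values with PHI
nodes purely by position, which is what the reply above relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a PHI node with a single incoming edge.
// This is an illustration only, not a GCC type.
struct ToyPhi {
  std::string result;  // name defined by the PHI
  std::string arg;     // incoming value, empty until flushed
};

// Step 1: create one empty PHI per source PHI, preserving the source order.
std::vector<ToyPhi> create_empty_phis(const std::vector<std::string>& source_order) {
  std::vector<ToyPhi> phis;
  for (const std::string& name : source_order)
    phis.push_back({name + "_lcssa", ""});
  return phis;
}

// Step 2: positional flush. The i-th pending value fills the i-th PHI, so the
// result is only right if both sequences were built in the same order.
void flush_pending(std::vector<ToyPhi>& phis, const std::vector<std::string>& pending) {
  for (std::size_t i = 0; i < phis.size() && i < pending.size(); ++i)
    phis[i].arg = pending[i];
}
```

If the empty nodes were created in a different order from the pending values,
the same flush would silently pair the wrong value with each result, which is
why the order is fixed before the edge is redirected.
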
> This also somewhat confuses the original redirection of 'e', the main exit with
> the later (*)
>
> > +        {
> > +          edge e = redirect_edge_and_branch (exit, new_preheader);
> > +          flush_pending_stmts (e);
> > +        }
> > +
> > +      /* Record the new SSA names in the cache so that we can skip materializing
> > +         them again when we fill in the rest of the LCSSA variables. */
> > +      for (auto phi : new_phis)
> > +        {
> > +          tree new_arg = gimple_phi_arg (phi, 0)->def;
>
> and here you look at the (for now) single edge we redirected ...
>
> > +          new_phi_args.put (new_arg, gimple_phi_result (phi));
> > +        }
> > +
> > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > +         block and the new loop header. This allows us to later split the
> > +         preheader block and still find the right LC nodes. */
> > +      edge latch_new = single_succ_edge (new_preheader);
>
> odd name - the single successor of a loop preheader is the loop header and the
> corresponding edge is the loop entry edge, not the latch?
>
> > +      for (auto gsi_from = gsi_start_phis (loop->header),
> > +           gsi_to = gsi_start_phis (new_loop->header);
> > +           flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
>
> Eh, can we have
>
>   if (flow_loops)
>     for (auto ...)
>
> please, even if that indents more?
>
> > +           gsi_next (&gsi_from), gsi_next (&gsi_to))
> > +        {
> > +          gimple *from_phi = gsi_stmt (gsi_from);
> > +          gimple *to_phi = gsi_stmt (gsi_to);
> > +          tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > +                                                loop_latch_edge (loop));
> > +
> > +          /* Check if we've already created a new phi node during edge
> > +             redirection. If we have, only propagate the value downwards. */
> > +          if (tree *res = new_phi_args.get (new_arg))
> > +            {
> > +              adjust_phi_and_debug_stmts (to_phi, latch_new, *res);
> > +              continue;
> >              }
> > +
> > +          tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > +          gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > +
> > +          /* Main loop exit should use the final iter value. */
> > +          add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
>
> For all other edges into the loop besides 'e' there's missing PHI arguments?
> You are using 'e' here again, but also use that as temporary in for blocks,
> shadowing the parameter - that makes it difficult to read. Also it's sometimes
> 'e->dest' and sometimes new_preheader - I think you want to use
> new_preheader here as well (in create_phi_node) for consistency and ease of
> understanding.
>
> ISTR when early break vectorization lands we're going to redirect the alternate
> exits away again "fixing" the missing PHI args.
>

We indeed had a discussion about this, and I'll expand more on the reasoning in
the patch for early breaks. But I think not redirecting the edges away for
early break makes more sense, as it treats early break, alignment peeling and
epilogue vectorization the same way, and the only difference is in the
statements inside the guard blocks.

But more importantly, this representation also makes it easier to implement
First-Faulting Loads support. For FFL we'll copy the main loop, and at the
"fault" check we branch to a new loop remainder that has the same sequences as
the remainder of the main vector loop but with different predicates. The reason
for this is to remove the predicate mangling from the optimal/likely loop body,
which is critical for performance.

Now since FFL is intended to pair naturally with early break, having the early
exit edges all lead into the same block makes the flow a lot easier to manage.

But I'll make sure to include a diagram in the early break peeling patch.

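As a rough illustration of the point about all exit edges leading into the same
block, and of the missing PHI arguments raised in the review comment, here is a
toy model in plain C++. None of it is GCC API: ToyEdge, ToyMergePhi,
redirect_exits and phi_complete are hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy CFG pieces; none of these are GCC types.
struct ToyEdge {
  std::string src;   // block the edge leaves
  std::string dest;  // block the edge enters
};

struct ToyMergePhi {
  std::string result;
  std::map<std::string, std::string> arg_for_pred;  // pred block -> incoming value
};

// Point every exit edge at one shared block and return that block's predecessors.
std::vector<std::string> redirect_exits(std::vector<ToyEdge>& exits,
                                        const std::string& shared) {
  std::vector<std::string> preds;
  for (ToyEdge& e : exits) {
    e.dest = shared;
    preds.push_back(e.src);
  }
  return preds;
}

// A PHI in the shared block is complete only if every predecessor supplies a value.
bool phi_complete(const ToyMergePhi& phi, const std::vector<std::string>& preds) {
  for (const std::string& p : preds)
    if (phi.arg_for_pred.count(p) == 0)
      return false;
  return true;
}
```

In this model a PHI that only has an argument for the main exit is reported as
incomplete once a second exit is redirected into the same block, so each
additional exit edge has to bring its own argument.
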
Thanks,
Tamar

> > +
> > +          adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
> >          }
> > -      redirect_edge_and_branch_force (e, new_preheader);
> > -      flush_pending_stmts (e);
> > +
> >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> > -      if (was_imm_dom || duplicate_outer_loop)
> > +
> > +      if ((was_imm_dom || duplicate_outer_loop))
>
> extra ()s
>
> >          set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
> >
> >        /* And remove the non-necessary forwarder again. Keep the other
> > @@ -1598,6 +1680,22 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
> >      }
> >    else /* Add the copy at entry. */
> >      {
> > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > +         block and the new loop header. This allows us to later split the
> > +         preheader block and still find the right LC nodes. */
> > +      for (auto gsi_from = gsi_start_phis (new_loop->header),
> > +           gsi_to = gsi_start_phis (loop->header);
> > +           flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
>
> same if (flow_loops)
>
> > +           gsi_next (&gsi_from), gsi_next (&gsi_to))
> > +        {
> > +          gimple *from_phi = gsi_stmt (gsi_from);
> > +          gimple *to_phi = gsi_stmt (gsi_to);
> > +          tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > +                                                loop_latch_edge (new_loop));
>
> this looks wrong? IMHO it should be the PHI_RESULT, no? Note this only
> triggers for alignment peeling ...
>
> Otherwise looks OK.
>
> Thanks,
> Richard.
>
>
> > +          adjust_phi_and_debug_stmts (to_phi, loop_preheader_edge (loop),
> > +                                      new_arg);
> > +        }
> > +
> >        if (scalar_loop != loop)
> >          {
> >            /* Remove the non-necessary forwarder of scalar_loop again. */
> > @@ -1627,29 +1725,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
> >                                 loop_preheader_edge (new_loop)->src);
> >      }
> >
> > -  if (scalar_loop != loop)
> > -    {
> > -      /* Update new_loop->header PHIs, so that on the preheader
> > -         edge they are the ones from loop rather than scalar_loop. */
> > -      gphi_iterator gsi_orig, gsi_new;
> > -      edge orig_e = loop_preheader_edge (loop);
> > -      edge new_e = loop_preheader_edge (new_loop);
> > -
> > -      for (gsi_orig = gsi_start_phis (loop->header),
> > -           gsi_new = gsi_start_phis (new_loop->header);
> > -           !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > -           gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > -        {
> > -          gphi *orig_phi = gsi_orig.phi ();
> > -          gphi *new_phi = gsi_new.phi ();
> > -          tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > -          location_t orig_locus
> > -            = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > -
> > -          add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > -        }
> > -    }
> > -
> >    free (new_bbs);
> >    free (bbs);
> >
> > @@ -2579,139 +2654,36 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
> >
> >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> >     this function searches for the corresponding lcssa phi node in exit
> > -   bb of LOOP. If it is found, return the phi result; otherwise return
> > -   NULL. */
> > +   bb of LOOP following the LCSSA_EDGE to the exit node. If it is found,
> > +   return the phi result; otherwise return NULL. */
> >
> >  static tree
> >  find_guard_arg (class loop *loop ATTRIBUTE_UNUSED,
> >                  class loop *epilog ATTRIBUTE_UNUSED,
> > -                const_edge e, gphi *lcssa_phi)
> > +                const_edge e, gphi *lcssa_phi, int lcssa_edge = 0)
> >  {
> >    gphi_iterator gsi;
> >
> > -  gcc_assert (single_pred_p (e->dest));
> >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> >      {
> >        gphi *phi = gsi.phi ();
> > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > -                           PHI_ARG_DEF (lcssa_phi, 0), 0))
> > -        return PHI_RESULT (phi);
> > -    }
> > -  return NULL_TREE;
> > -}
> > -
> > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
> > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > -   edge, the two loops are arranged as below:
> > -
> > -       preheader_a:
> > -     first_loop:
> > -       header_a:
> > -         i_1 = PHI;
> > -         ...
> > -         i_2 = i_1 + 1;
> > -         if (cond_a)
> > -           goto latch_a;
> > -         else
> > -           goto between_bb;
> > -       latch_a:
> > -         goto header_a;
> > -
> > -       between_bb:
> > -         ;; i_x = PHI;   ;; LCSSA phi node to be created for FIRST,
> > -
> > -     second_loop:
> > -       header_b:
> > -         i_3 = PHI;      ;; Use of i_0 to be replaced with i_x,
> > -                         or with i_2 if no LCSSA phi is created
> > -                         under condition of CREATE_LCSSA_FOR_IV_PHIS.
> > -         ...
> > -         i_4 = i_3 + 1;
> > -         if (cond_b)
> > -           goto latch_b;
> > -         else
> > -           goto exit_bb;
> > -       latch_b:
> > -         goto header_b;
> > -
> > -       exit_bb:
> > -
> > -   This function creates loop closed SSA for the first loop; update the
> > -   second loop's PHI nodes by replacing argument on incoming edge with the
> > -   result of newly created lcssa PHI nodes. IF CREATE_LCSSA_FOR_IV_PHIS
> > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > -   the first loop.
> > -
> > -   This function assumes exit bb of the first loop is preheader bb of the
> > -   second loop, i.e, between_bb in the example code. With PHIs updated,
> > -   the second loop will execute rest iterations of the first. */
> > -
> > -static void
> > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > -                                   class loop *first, edge first_loop_e,
> > -                                   class loop *second, edge second_loop_e,
> > -                                   bool create_lcssa_for_iv_phis)
> > -{
> > -  gphi_iterator gsi_update, gsi_orig;
> > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > -
> > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > -  edge second_preheader_e = loop_preheader_edge (second);
> > -  basic_block between_bb = first_loop_e->dest;
> > -
> > -  gcc_assert (between_bb == second_preheader_e->src);
> > -  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
> > -  /* Either the first loop or the second is the loop to be vectorized. */
> > -  gcc_assert (loop == first || loop == second);
> > -
> > -  for (gsi_orig = gsi_start_phis (first->header),
> > -       gsi_update = gsi_start_phis (second->header);
> > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > -    {
> > -      gphi *orig_phi = gsi_orig.phi ();
> > -      gphi *update_phi = gsi_update.phi ();
> > -
> > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > -      /* Generate lcssa PHI node for the first loop. */
> > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > +      /* Nested loops with multiple exits can have different no# phi node
> > +         arguments between the main loop and epilog as epilog falls to the
> > +         second loop. */
> > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> >          {
> > -          tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > -          gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > -          add_phi_arg (lcssa_phi, arg, first_loop_e, UNKNOWN_LOCATION);
> > -          arg = new_res;
> > -        }
> > -
> > -      /* Update PHI node in the second loop by replacing arg on the loop's
> > -         incoming edge. */
> > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
> > -    }
> > -
> > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > -     for correct vectorization of live stmts. */
> > -  if (loop == first)
> > -    {
> > -      basic_block orig_exit = second_loop_e->dest;
> > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > -           !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > -        {
> > -          gphi *orig_phi = gsi_orig.phi ();
> > -          tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > -          if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p (orig_arg))
> > -            continue;
> > -
> > -          const_edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > -          /* Already created in the above loop. */
> > -          if (find_guard_arg (first, second, exit_e, orig_phi))
> > +          tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > +          if (TREE_CODE (var) != SSA_NAME)
> >              continue;
> > -
> > -          tree new_res = copy_ssa_name (orig_arg);
> > -          gphi *lcphi = create_phi_node (new_res, between_bb);
> > -          add_phi_arg (lcphi, orig_arg, first_loop_e, UNKNOWN_LOCATION);
> > +          tree def = get_current_def (var);
> > +          if (!def)
> > +            continue;
> > +          if (operand_equal_p (def,
> > +                               PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > +            return PHI_RESULT (phi);
> >          }
> >      }
> > +  return NULL_TREE;
> >  }
> >
> >  /* Function slpeel_add_loop_guard adds guard skipping from the beginning
> > @@ -2796,11 +2768,11 @@ slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
> >      }
> >  }
> >
> > -/* LOOP and EPILOG are two consecutive loops in CFG and EPILOG is copied
> > -   from LOOP. Function slpeel_add_loop_guard adds guard skipping from a
> > -   point between the two loops to the end of EPILOG. Edges GUARD_EDGE
> > -   and MERGE_EDGE are the two pred edges of merge_bb at the end of EPILOG.
> > -   The CFG looks like:
> > +/* LOOP and EPILOG are two consecutive loops in CFG connected by LOOP_EXIT edge
> > +   and EPILOG is copied from LOOP. Function slpeel_add_loop_guard adds guard
> > +   skipping from a point between the two loops to the end of EPILOG. Edges
> > +   GUARD_EDGE and MERGE_EDGE are the two pred edges of merge_bb at the end of
> > +   EPILOG. The CFG looks like:
> >
> >       loop:
> >         header_a:
> > @@ -2851,6 +2823,7 @@ slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
> >
> >  static void
> >  slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > +                                    const_edge loop_exit,
> >                                      edge guard_edge, edge merge_edge)
> >  {
> >    gphi_iterator gsi;
> > @@ -2859,13 +2832,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> >    gcc_assert (single_succ_p (merge_bb));
> >    edge e = single_succ_edge (merge_bb);
> >    basic_block exit_bb = e->dest;
> > -  gcc_assert (single_pred_p (exit_bb));
> > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> >
> >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >      {
> >        gphi *update_phi = gsi.phi ();
> > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> >
> >        tree merge_arg = NULL_TREE;
> >
> > @@ -2877,8 +2848,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> >        if (!merge_arg)
> >          merge_arg = old_arg;
> >
> > -      tree guard_arg
> > -        = find_guard_arg (loop, epilog, single_exit (loop), update_phi);
> > +      tree guard_arg = find_guard_arg (loop, epilog, loop_exit,
> > +                                       update_phi, e->dest_idx);
> >        /* If the var is live after loop but not a reduction, we simply
> >           use the old arg. */
> >        if (!guard_arg)
> > @@ -2898,21 +2869,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> >      }
> >  }
> >
> > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > -   the arg of its loop closed ssa PHI needs to be updated. */
> > -
> > -static void
> > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > -{
> > -  gphi_iterator gsi;
> > -  basic_block exit_bb = single_exit (epilog)->dest;
> > -
> > -  gcc_assert (single_pred_p (exit_bb));
> > -  edge e = EDGE_PRED (exit_bb, 0);
> > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > -}
> > -
> >  /* LOOP_VINFO is an epilogue loop whose corresponding main loop can be skipped.
> >     Return a value that equals:
> >
> > @@ -3255,8 +3211,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >                                                         e, &prolog_e);
> >        gcc_assert (prolog);
> >        prolog->force_vectorize = false;
> > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
> > -                                         exit_e, true);
> > +
> >        first_loop = prolog;
> >        reset_original_copy_tables ();
> >
> > @@ -3336,8 +3291,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >        LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
> >        gcc_assert (epilog);
> >        epilog->force_vectorize = false;
> > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
> > -                                         new_epilog_e, false);
> >        bb_before_epilog = loop_preheader_edge (epilog)->src;
> >
> >        /* Scalar version loop may be preferred. In this case, add guard
> > @@ -3430,7 +3383,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >                                             irred_flag);
> >            if (vect_epilogues)
> >              epilogue_vinfo->skip_this_loop_edge = guard_e;
> > -          slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, epilog_e);
> > +          edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > +          slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv, guard_e,
> > +                                              epilog_e);
> >            /* Only need to handle basic block before epilog loop if it's not
> >               the guard_bb, which is the case when skip_vector is true. */
> >            if (guard_bb != bb_before_epilog)
> > @@ -3441,8 +3396,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >              }
> >            scale_loop_profile (epilog, prob_epilog, -1);
> >          }
> > -      else
> > -        slpeel_update_phi_nodes_for_lcssa (epilog);
> >
> >        unsigned HOST_WIDE_INT bound;
> >        if (bound_scalar.is_constant (&bound))
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index f1caa5f207d3b13da58c3a313b11d1ef98374349..327cab0f736da7f1bd3e024d666df46ef9208107 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -5877,7 +5877,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> >    basic_block exit_bb;
> >    tree scalar_dest;
> >    tree scalar_type;
> > -  gimple *new_phi = NULL, *phi;
> > +  gimple *new_phi = NULL, *phi = NULL;
> >    gimple_stmt_iterator exit_gsi;
> >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> >    gimple *epilog_stmt = NULL;
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index 55b6771b271d5072fa1327d595e1dddb112cfdf6..25ceb6600673d71fd6012443403997e921066483 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -2183,7 +2183,7 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
> >                                           const_edge);
> >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
> >                                                      class loop *, edge,
> > -                                                    edge, edge *);
> > +                                                    edge, edge *, bool = true);
> >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> >                                      tree *, tree *, tree *, int, bool, bool,
> >
> >
> >
> >
> >
>
> --
> Richard Biener
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)