From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=qeCr=ED=arm.com=Tamar.Christina@sourceware.org>
Received: from EUR04-DB3-obe.outbound.protection.outlook.com (mail-db3eur04on2062.outbound.protection.outlook.com [40.107.6.62])
	by sourceware.org (Postfix) with ESMTPS id 3EF3F3858401
	for <gcc-patches@gcc.gnu.org>; Fri, 18 Aug 2023 11:36:04 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3EF3F3858401
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com;
 s=selector2-armh-onmicrosoft-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=UfYcQbELZriNG1NY0dfxV1VTXr21IRaymd7ehLtnEmU=;
 b=c6rwV6h8czCf+/+TPPGaEr8bvH34xK2xRhGa60KpKlWBOJz1o6OlU7VxE2BAVjDA/WvNwfeQh9cEtExNUuccN1goQlmL8txr2SOoU1a1Vz5fcuPZc5sX6IR7ulk+EGmoWkvQALfqAYOKEGH4e3v/Vw1db0zOvFz6tiqxmvg67ZI=
Received: from AM4PR0302CA0033.eurprd03.prod.outlook.com (2603:10a6:205:2::46)
 by DBBPR08MB6172.eurprd08.prod.outlook.com (2603:10a6:10:1f4::6) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6699.20; Fri, 18 Aug
 2023 11:36:00 +0000
Received: from AM7EUR03FT060.eop-EUR03.prod.protection.outlook.com
 (2603:10a6:205:2:cafe::93) by AM4PR0302CA0033.outlook.office365.com
 (2603:10a6:205:2::46) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6652.33 via Frontend
 Transport; Fri, 18 Aug 2023 11:36:00 +0000
X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123)
 smtp.mailfrom=arm.com; dkim=pass (signature was verified)
 header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com;
Received-SPF: Pass (protection.outlook.com: domain of arm.com designates
 63.35.35.123 as permitted sender) receiver=protection.outlook.com;
 client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com;
 pr=C
Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by
 AM7EUR03FT060.mail.protection.outlook.com (100.127.140.216) with Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.6699.20 via Frontend Transport; Fri, 18 Aug 2023 11:36:00 +0000
Received: ("Tessian outbound 1eb4e931b055:v175"); Fri, 18 Aug 2023 11:35:59 +0000
X-CheckRecipientChecked: true
X-CR-MTA-CID: ddb02dcacfd72f17
X-CR-MTA-TID: 64aa7808
Received: from 0d8e473e9f4c.1
	by 64aa7808-outbound-1.mta.getcheckrecipient.com id C0DC40B8-5B63-4B45-A5C4-C0D8CDFC17D4.1;
	Fri, 18 Aug 2023 11:35:52 +0000
Received: from EUR05-VI1-obe.outbound.protection.outlook.com
    by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 0d8e473e9f4c.1
    (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384);
    Fri, 18 Aug 2023 11:35:52 +0000
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=nisQ6WW4dsc58Ty8ic9qSCHiOBCQ3EKBvjPitWepQGyu8uBOuod//AJIUlDTvraPf8HWv2yBCNM5Uyd3ohUVfcvTwZzaKbLm9CXVsFyi1tWvxjUuSQtG1yCyxyG4EQFnEFWKsucECGrjvO+DsQsp7r9DBiaOlvlIfONkkAaGvGWgfDtkU2tfIiKAezSOHzLxNxFFhCZBMXkReH/lFd/yrcMAqKGUyqj9TU1kVAVpJ298m71XT0Bc7FaKzuRjQYaCsggo2Ao8uAaDdONHUVnYR2vxD0PfbynjEZe1dBlI5eZNmPZYyOd8kbvGBT788HR7UGl5T2wpY8mkPtI2VUJ2ug==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=UfYcQbELZriNG1NY0dfxV1VTXr21IRaymd7ehLtnEmU=;
 b=PsGpOLCtUDlh7Q7VFGRXwxhfw4T0ey/meS2gIqviwgn8mK+mbfLvkNprR5Axg0TrZLsDtn0clW4/f0Xv52iBdrRqGkYyc5WR5YAIROMnVv7ekP2bqYAkg83uFVotTgtAJPDlV5w4Js0VJcHZsE6P9lyRXuPZDwf9IvsusNKPnLtqFQQeroEzzDDzt5SvD+CCmqeY4nimNKoes99z8AFBFHMUBveiHuVnlV7NMTNhl1nCIqo3Vl0wtjhb126E4rAF11tecHR8HORkc7LchgY2Zkzgr9JolGt24R3ISGEWb6ojhlnDSHTnCSvSlfXQoSWiFwerHM2K9XHNJTcbPQ1shQ==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass
 header.d=arm.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com;
 s=selector2-armh-onmicrosoft-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=UfYcQbELZriNG1NY0dfxV1VTXr21IRaymd7ehLtnEmU=;
 b=c6rwV6h8czCf+/+TPPGaEr8bvH34xK2xRhGa60KpKlWBOJz1o6OlU7VxE2BAVjDA/WvNwfeQh9cEtExNUuccN1goQlmL8txr2SOoU1a1Vz5fcuPZc5sX6IR7ulk+EGmoWkvQALfqAYOKEGH4e3v/Vw1db0zOvFz6tiqxmvg67ZI=
Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17)
 by PA4PR08MB6047.eurprd08.prod.outlook.com (2603:10a6:102:e3::16) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6678.31; Fri, 18 Aug
 2023 11:35:48 +0000
Received: from VI1PR08MB5325.eurprd08.prod.outlook.com
 ([fe80::a85:6d3:5dd7:7d3]) by VI1PR08MB5325.eurprd08.prod.outlook.com
 ([fe80::a85:6d3:5dd7:7d3%7]) with mapi id 15.20.6678.029; Fri, 18 Aug 2023
 11:35:46 +0000
From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
CC: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>,
	"jlaw@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates
 for early break.
Thread-Topic: [PATCH 12/19]middle-end: implement loop peeling and IV updates
 for early break.
Thread-Index: AQHZqccHX5IjMK39iE64m4nr7rdxO6+4DBQAgAAQ3sCAAT9ugIA2vfCw
Date: Fri, 18 Aug 2023 11:35:45 +0000
Message-ID:
 <VI1PR08MB53251F997A2B28903E364359FF1BA@VI1PR08MB5325.eurprd08.prod.outlook.com>
References: <ZJw510c3/NpFrqbh@arm.com>
 <pnos2033-74o3-881-8260-8n6qspn6031@fhfr.qr>
  <VI1PR08MB532590F231318958BC6D0016FF37A@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <2s4ro37o-pnn0-7750-4286-5982q36opn5@fhfr.qr>
In-Reply-To: <2s4ro37o-pnn0-7750-4286-5982q36opn5@fhfr.qr>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Authentication-Results-Original: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=arm.com;
x-ms-traffictypediagnostic:
	VI1PR08MB5325:EE_|PA4PR08MB6047:EE_|AM7EUR03FT060:EE_|DBBPR08MB6172:EE_
X-MS-Office365-Filtering-Correlation-Id: 93b3cbfb-dceb-4a2f-a422-08db9fdf4bc8
x-checkrecipientrouted: true
nodisclaimer: true
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam-Untrusted: BCL:0;
X-Microsoft-Antispam-Message-Info-Original:
 mUeXORDHl1OQVvKDocuznBymssQMzec5zMxJ05LGbqEh/hpvCxtD9c7vWsFCqA4pZIKFV3S7NrZvN1Og+BU10i+CJ+71/nIj1w2ZlijFIqqI6gViNrsECDbSr4sw8owKr+2M94FTpPl1RWoyXsVInfrgVBSIwH6P7KAM5Y0sT+JexrJPXTQWjKbVKVW2VEofDxocECynKSYks4ZhGcLsUId4W5cvqxZYnNqemRCo6XhDsvqjHsANpXhHwg94Ki5fTq4Po0OpEE4P8DLU2VlqxQCPI09YhJRKh/rrGx3HhgV8Al/PXlDCd6BB6TUs/WkEdKIgoA2Am2ZZUogkABtQaZGCL1bf2kDW0wFYEQMThYgGAL0kgo1TQZ/0k5wnWmxBLfE75vufV6idcMxHeFJc81TSFag30uI2hJ/uB100BNw5BcTLxAlLIThHLRrDriZ9r29v2LCSv/YYsGfroenQMJTWhU6ZgL9PcwqKAv/Pkh+4WoUzO3aqoiKZsSAaHH2JHsl/7EqZbkvd0WC4V4TkXaAgsv74YDP3aeEVHNWUNevR60gcNyKfCWznvTp0dLY6pZMOiywAsQKhVW99TPGLhWiX0/qwqnXCuCQ3oMiHHnVw0V1/aYdgVdQvn/ysDT0N
X-Forefront-Antispam-Report-Untrusted:
 CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:VI1PR08MB5325.eurprd08.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230031)(396003)(376002)(39860400002)(346002)(366004)(136003)(1800799009)(186009)(451199024)(71200400001)(7696005)(9686003)(6506007)(66899024)(83380400001)(38070700005)(26005)(38100700002)(122000001)(33656002)(86362001)(55016003)(2906002)(6916009)(316002)(41300700001)(15650500001)(30864003)(66946007)(64756008)(54906003)(66446008)(76116006)(66476007)(66556008)(5660300002)(8936002)(4326008)(8676002)(52536014)(478600001)(579004)(559001);DIR:OUT;SFP:1101;
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-Transport-CrossTenantHeadersStamped: PA4PR08MB6047
Original-Authentication-Results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=arm.com;
X-EOPAttributedMessage: 0
X-MS-Exchange-Transport-CrossTenantHeadersStripped:
 AM7EUR03FT060.eop-EUR03.prod.protection.outlook.com
X-MS-PublicTrafficType: Email
X-MS-Office365-Filtering-Correlation-Id-Prvs:
	e4ac92a1-8f3b-4b43-83b8-08db9fdf433e
X-Microsoft-Antispam: BCL:0;
X-Microsoft-Antispam-Message-Info:
	dtZD0Z02wfZMbSMBZhn4bUcIsDigLvDQ0wzDNKDha3MSLLIQdTcn/ktxTTpzWvIoO/AkgX5VK62P2lv21J8YMc/IeB5Eb155bR5osWzLx+AC1xYkwabE+NEE/sj5gadbjim2zcfpsrNzpoGPwyPdWY6PCaIsyr8pu9hJCwNkKyD0gKa9T6mYeOfyqrwT0eTPDFzZL1+Esqp/duoKVNzeNiy04uuLvcqFwh78qiuVxC7AYU9JJ8ph8rnDja0AFSKckIx5H8n7jLS8E+xMlULoF5NnjF6gfKCXTWNImZU9S6drc8H2dzLV53A7YJX6Ub91UxcoShyxhJHCCUivaOHWVY+sAzsPvmaGDMyTVYG03bsYMPD9PqseNnMBkhrSe25JfwTyJg/mjQvspj8vOB47Bkk+CcaJiZ87A1uFKHswsHfW88ZM22uEd3acsMKZemyMUtnop6Y5wL7oEXv2I0kBGLzZrlSWWAhfge+uyiOsF1LpJ8syO5sHCx4WS9xVBtRG8DUaJjPoMIwTAOYl82gChu/bU8hYz4haQfmjjsfk8/XjQpp7QUcqjcgXqIb5Z/f3NLISpOQ/siONb+p2x1aFcRiOAa3ArWlm7ZvtZwDSUSZ0KR9HdPrKya9d0NoOZdlGd+hu5AuoCH0nblXmkSZmK69suvKCZERNRFTJ8pBaqtX48YqpUYYytnok9xC92F3alijL9QpHTJ0IGKonwuiG2Q8FAGzgCqAfD3wQRB7k50fdzpYfX6aDX+xXHSiGz+R/
X-Forefront-Antispam-Report:
	CIP:63.35.35.123;CTRY:IE;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:64aa7808-outbound-1.mta.getcheckrecipient.com;PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com;CAT:NONE;SFS:(13230031)(4636009)(346002)(136003)(39860400002)(376002)(396003)(186009)(1800799009)(82310400011)(451199024)(36840700001)(46966006)(40470700004)(6506007)(7696005)(9686003)(40460700003)(86362001)(66899024)(336012)(107886003)(55016003)(26005)(40480700001)(83380400001)(36860700001)(47076005)(81166007)(82740400003)(356005)(33656002)(2906002)(54906003)(316002)(15650500001)(30864003)(41300700001)(70206006)(70586007)(5660300002)(52536014)(8676002)(6862004)(4326008)(8936002)(478600001)(559001)(579004);DIR:OUT;SFP:1101;
X-OriginatorOrg: arm.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 18 Aug 2023 11:36:00.0083
 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 93b3cbfb-dceb-4a2f-a422-08db9fdf4bc8
X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d;Ip=[63.35.35.123];Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com]
X-MS-Exchange-CrossTenant-AuthSource:
	AM7EUR03FT060.eop-EUR03.prod.protection.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DBBPR08MB6172
X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,FORGED_SPF_HELO,GIT_PATCH_0,KAM_DMARC_NONE,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE,TXREP,UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

> > Yeah if you comment it out one of the testcases should fail.
>=20
> using new_preheader instead of e->dest would make things clearer.
>=20
> You are now adding the same arg to every exit (you've just queried the
> main exit redirect_edge_var_map_vector).
>=20
> OK, so I think I understand what you're doing.  If I understand
> correctly we know that when we exit the main loop via one of the
> early exits we are definitely going to enter the epilog but when
> we take the main exit we might not.
>=20

Correct.. but..

> Looking at the CFG we create currently this isn't reflected and
> this complicates this PHI node updating.  What I'd try to do
> is leave redirecting the alternate exits until after

It is, in the case of the alternate exits this is reflected in copying
the same values, as they are the values of the number of completed=20
iterations since the scalar code restarts the last iteration.

So all the PHI nodes of the alternate exits are correct.  The vector
Iteration doesn't handle the partial iteration.

> slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> means leaving it almost unchanged besides the LC SSA maintaining
> changes.  After that for the multi-exit case split the
> epilog preheader edge and redirect all the alternate exits to the
> new preheader.  So the CFG becomes
>=20
>                  <original loop>
>                 /      |
>                /    <main exit w/ original LC PHI>
>               /      if (epilog)
>    alt exits /        /  \
>             /        /    loop around
>             |       /
>            preheader with "header" PHIs
>               |
>           <epilog>
>=20
> note you need the header PHIs also on the main exit path but you
> only need the loop end PHIs there.
>=20
> It seems so that at least currently the order of things makes
> them more complicated than necessary.

I've been trying to, but this representation seems a lot harder to work wit=
h,
In particular at the moment once we exit slpeel_tree_duplicate_loop_to_edge=
_cfg
the loop structure is exactly the same as one expects from any normal epilo=
g vectorization.

But this new representation requires me to place the guard much earlier tha=
n the epilogue
preheader,  yet I still have to adjust the PHI nodes in the preheader.  So =
it seems that this split
is there to only indicate that we always enter the epilog when taking an ea=
rly exit.

Today this is reflected in the values of the PHI nodes rather than structur=
ally.  Once we place
The guard we update the nodes and the alternate exits get their value for i=
vtmp updated to VF.

This representation also forces me to do the redirection in every call site=
 of
slpeel_tree_duplicate_loop_to_edge_cfg making the code more complicated in =
all use sites.

But I think this doesn't address the main reason why the slpeel_tree_duplic=
ate_loop_to_edge_cfg
code has a large block of code to deal with PHI node updates.

The reason as you mentioned somewhere else is that after we redirect the ed=
ges I have to reconstruct
the phi nodes.  For most it's straight forwards, but for live values or vus=
e chains it requires extra code.

You're right in that before we redirect the edges they are all correct in t=
he exit block, you mentioned that
the API for the edge redirection is supposed to copy the values over if I c=
reate the phi nodes before hand.

However this doesn't seem to work:

     for (auto gsi_from =3D gsi_start_phis (scalar_exit->dest);
	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
	{
	  gimple *from_phi =3D gsi_stmt (gsi_from);
	  tree new_res =3D copy_ssa_name (gimple_phi_result (from_phi));
	  create_phi_node (new_res, new_preheader);
	}

      for (edge exit : loop_exits)
	redirect_edge_and_branch (exit, new_preheader);

Still leaves them empty.  Grepping around most code seems to pair redirect_=
edge_and_branch with
copy_phi_arg_into_existing_phi.  The problem is that in all these cases aft=
er redirecting an edge they
call copy_phi_arg_into_existing_phi from a predecessor edge to fill in the =
phi nodes.

This is because as I redirect_edge_and_branch destroys the phi node entries=
 and copy_phi_arg_into_existing_phi
simply just reads the gimple_phi_arg_def which would be NULL.

You could point it to the src block of the exit, in which case it copies th=
e wrong values in for the vuses.  At the end
of vectorization the cfgcleanup code does the same thing to maintain LCSSA =
if you haven't.  This code always goes
wrong for multiple exits because of the problem described above.  There's n=
o node for it to copy the right value
from.

As an alternate approach I can split the exit edges, copy the phi nodes int=
o the split and after that redirect them.
This however creates the awkwardness of having the exit edges no longer con=
nect to the preheader.

All of this then begs the question if this is all easier than the current a=
pproach which is just to read the edge var
map to figure out the nodes that were removed during the redirect.

Maybe I'm still misunderstanding the API, but reading the sources of the fu=
nctions, they all copy values from *existing*
phi nodes.  And any existing phi node after the redirect are not correct.

gimple_redirect_edge_and_branch has a chunk that indicates it should have u=
pdated the PHI nodes before calling
ssa_redirect_edge to remove the old ones, but there's no code there. It's a=
ll empty.

Most of the other refactorings/changes were easy enough to do, but this one=
 I seem to be a struggling with.

Thanks,
Tamar
>=20
> > >
> > > >  	}
> > > > -      redirect_edge_and_branch_force (e, new_preheader);
> > > > -      flush_pending_stmts (e);
> > > > +
> > > >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e-
> >src);
> > > > -      if (was_imm_dom || duplicate_outer_loop)
> > > > +
> > > > +      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits=
_p)
> > > >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit-
> > > >src);
> > > >
> > > >        /* And remove the non-necessary forwarder again.  Keep the o=
ther
> > > > @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> (class
> > > loop *loop,
> > > >        delete_basic_block (preheader);
> > > >        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header=
,
> > > >  			       loop_preheader_edge (scalar_loop)->src);
> > > > +
> > > > +      /* Finally after wiring the new epilogue we need to update i=
ts main
> exit
> > > > +	 to the original function exit we recorded.  Other exits are alre=
ady
> > > > +	 correct.  */
> > > > +      if (multiple_exits_p)
> > > > +	{
> > > > +	  for (edge e : get_loop_exit_edges (loop))
> > > > +	    doms.safe_push (e->dest);
> > > > +	  update_loop =3D new_loop;
> > > > +	  doms.safe_push (exit_dest);
> > > > +
> > > > +	  /* Likely a fall-through edge, so update if needed.  */
> > > > +	  if (single_succ_p (exit_dest))
> > > > +	    doms.safe_push (single_succ (exit_dest));
> > > > +	}
> > > >      }
> > > >    else /* Add the copy at entry.  */
> > > >      {
> > > > +      /* Copy the current loop LC PHI nodes between the original l=
oop exit
> > > > +	 block and the new loop header.  This allows us to later split th=
e
> > > > +	 preheader block and still find the right LC nodes.  */
> > > > +      edge old_latch_loop =3D loop_latch_edge (loop);
> > > > +      edge old_latch_init =3D loop_preheader_edge (loop);
> > > > +      edge new_latch_loop =3D loop_latch_edge (new_loop);
> > > > +      edge new_latch_init =3D loop_preheader_edge (new_loop);
> > > > +      for (auto gsi_from =3D gsi_start_phis (new_latch_init->dest)=
,
> > >
> > > see above
> > >
> > > > +	   gsi_to =3D gsi_start_phis (old_latch_loop->dest);
> > > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > > +	{
> > > > +	  gimple *from_phi =3D gsi_stmt (gsi_from);
> > > > +	  gimple *to_phi =3D gsi_stmt (gsi_to);
> > > > +	  tree new_arg =3D PHI_ARG_DEF_FROM_EDGE (from_phi,
> > > new_latch_loop);
> > > > +	  adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg);
> > > > +	}
> > > > +
> > > >        if (scalar_loop !=3D loop)
> > > >  	{
> > > >  	  /* Remove the non-necessary forwarder of scalar_loop again.  */
> > > > @@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> (class
> > > loop *loop,
> > > >        delete_basic_block (new_preheader);
> > > >        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
> > > >  			       loop_preheader_edge (new_loop)->src);
> > > > +
> > > > +      if (multiple_exits_p)
> > > > +	update_loop =3D loop;
> > > >      }
> > > >
> > > > -  if (scalar_loop !=3D loop)
> > > > +  if (multiple_exits_p)
> > > >      {
> > > > -      /* Update new_loop->header PHIs, so that on the preheader
> > > > -	 edge they are the ones from loop rather than scalar_loop.  */
> > > > -      gphi_iterator gsi_orig, gsi_new;
> > > > -      edge orig_e =3D loop_preheader_edge (loop);
> > > > -      edge new_e =3D loop_preheader_edge (new_loop);
> > > > -
> > > > -      for (gsi_orig =3D gsi_start_phis (loop->header),
> > > > -	   gsi_new =3D gsi_start_phis (new_loop->header);
> > > > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > > > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > > > +      for (edge e : get_loop_exit_edges (update_loop))
> > > >  	{
> > > > -	  gphi *orig_phi =3D gsi_orig.phi ();
> > > > -	  gphi *new_phi =3D gsi_new.phi ();
> > > > -	  tree orig_arg =3D PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > > > -	  location_t orig_locus
> > > > -	    =3D gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > > > -
> > > > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > > > +	  edge ex;
> > > > +	  edge_iterator ei;
> > > > +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> > > > +	    {
> > > > +	      /* Find the first non-fallthrough block as fall-throughs ca=
n't
> > > > +		 dominate other blocks.  */
> > > > +	      while ((ex->flags & EDGE_FALLTHRU)
>=20
> For the prologue peeling any early exit we take would skip all other
> loops so we can simply leave them and their LC PHI nodes in place.
> We need extra PHIs only on the path to the main vector loop.  I
> think the comment isn't accurately reflecting what we do.  In
> fact we do not add any LC PHI nodes here but simply adjust the
> main loop header PHI arguments?
>=20
> > > I don't think EDGE_FALLTHRU is set correctly, what's wrong with
> > > just using single_succ_p here?  A fallthru edge src dominates the
> > > fallthru edge dest, so the sentence above doesn't make sense.
> >
> > I wanted to say, that the immediate dominator of a block is never
> > an fall through block.  At least from what I understood from how
> > the dominators are calculated in the code, though may have missed
> > something.
>=20
>  BB1
>   |
>  BB2
>   |
>  BB3
>=20
> here the immediate dominator of BB3 is BB2 and that of BB2 is BB1.
>=20
> > >
> > > > +		     && single_succ_p (ex->dest))
> > > > +		{
> > > > +		  doms.safe_push (ex->dest);
> > > > +		  ex =3D single_succ_edge (ex->dest);
> > > > +		}
> > > > +	      doms.safe_push (ex->dest);
> > > > +	    }
> > > > +	  doms.safe_push (e->dest);
> > > >  	}
> > > > -    }
> > > >
> > > > +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > +      if (updated_doms)
> > > > +	updated_doms->safe_splice (doms);
> > > > +    }
> > > >    free (new_bbs);
> > > >    free (bbs);
> > > >
> > > > @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const
> > > loop_vec_info loop_vinfo, const_edge e)
> > > >    gimple_stmt_iterator loop_exit_gsi =3D gsi_last_bb (exit_e->src)=
;
> > > >    unsigned int num_bb =3D loop->inner? 5 : 2;
> > > >
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    num_bb +=3D LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
> > > > +
> > >
> > > I think checking the number of BBs is odd, I don't remember anything
> > > in slpeel is specifically tied to that?  I think we can simply drop
> > > this or do you remember anything that would depend on ->num_nodes
> > > being only exactly 5 or 2?
> >
> > Never actually seemed to require it, but they're used as some check to
> > see if there are unexpected control flow in the loop.
> >
> > i.e. this would say no if you have an if statement in the loop that was=
n't
> > converted.  The other part of this and the accompanying explanation is =
in
> > vect_analyze_loop_form.  In the patch series I had to remove the hard
> > num_nodes =3D=3D 2 check from there because number of nodes restricted
> > things too much.  If you have an empty fall through block, which seems =
to
> > happen often between the main exit and the latch block then we'd not
> > vectorize.
> >
> > So instead I now rejects loops after analyzing the gcond.  So think thi=
s check
> > can go/needs to be different.
>=20
> Lets remove it from this function then.
>=20
> > >
> > > >    /* All loops have an outer scope; the only case loop->outer is N=
ULL is for
> > > >       the function itself.  */
> > > >    if (!loop_outer (loop)
> > > > @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer
> > > (loop_vec_info loop_vinfo,
> > > >    class loop *loop =3D LOOP_VINFO_LOOP (loop_vinfo);
> > > >    basic_block update_bb =3D update_e->dest;
> > > >
> > > > +  /* For early exits we'll update the IVs in
> > > > +     vect_update_ivs_after_early_break.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    return;
> > > > +
> > > >    basic_block exit_bb =3D LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > >
> > > >    /* Make sure there exists a single-predecessor exit bb:  */
> > > > @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer
> > > (loop_vec_info loop_vinfo,
> > > >        /* Fix phi expressions in the successor bb.  */
> > > >        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > >      }
> > > > +  return;
> > >
> > > we don't usually place a return at the end of void functions
> > >
> > > > +}
> > > > +
> > > > +/*   Function vect_update_ivs_after_early_break.
> > > > +
> > > > +     "Advance" the induction variables of LOOP to the value they s=
hould
> take
> > > > +     after the execution of LOOP.  This is currently necessary bec=
ause the
> > > > +     vectorizer does not handle induction variables that are used =
after the
> > > > +     loop.  Such a situation occurs when the last iterations of LO=
OP are
> > > > +     peeled, because of the early exit.  With an early exit we alw=
ays peel
> the
> > > > +     loop.
> > > > +
> > > > +     Input:
> > > > +     - LOOP_VINFO - a loop info structure for the loop that is goi=
ng to be
> > > > +		    vectorized. The last few iterations of LOOP were peeled.
> > > > +     - LOOP - a loop that is going to be vectorized. The last few =
iterations
> > > > +	      of LOOP were peeled.
> > > > +     - VF - The loop vectorization factor.
> > > > +     - NITERS_ORIG - the number of iterations that LOOP executes (=
before
> it is
> > > > +		     vectorized). i.e, the number of times the ivs should be
> > > > +		     bumped.
> > > > +     - NITERS_VECTOR - The number of iterations that the vector LO=
OP
> > > executes.
> > > > +     - UPDATE_E - a successor edge of LOOP->exit that is on the (o=
nly)
> path
> > > > +		  coming out from LOOP on which there are uses of the LOOP
> > > ivs
> > > > +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> > > > +
> > > > +		  The new definitions of the ivs are placed in LOOP->exit.
> > > > +		  The phi args associated with the edge UPDATE_E in the bb
> > > > +		  UPDATE_E->dest are updated accordingly.
> > > > +
> > > > +     Output:
> > > > +       - If available, the LCSSA phi node for the loop IV temp.
> > > > +
> > > > +     Assumption 1: Like the rest of the vectorizer, this function =
assumes
> > > > +     a single loop exit that has a single predecessor.
> > > > +
> > > > +     Assumption 2: The phi nodes in the LOOP header and in update_=
bb
> are
> > > > +     organized in the same order.
> > > > +
> > > > +     Assumption 3: The access function of the ivs is simple enough=
 (see
> > > > +     vect_can_advance_ivs_p).  This assumption will be relaxed in =
the
> future.
> > > > +
> > > > +     Assumption 4: Exactly one of the successors of LOOP exit-bb i=
s on a
> path
> > > > +     coming out of LOOP on which the ivs of LOOP are used (this is=
 the
> path
> > > > +     that leads to the epilog loop; other paths skip the epilog lo=
op).  This
> > > > +     path starts with the edge UPDATE_E, and its destination (deno=
ted
> > > update_bb)
> > > > +     needs to have its phis updated.
> > > > + */
> > > > +
> > > > +static tree
> > > > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class
> loop *
> > > epilog,
> > > > +				   poly_int64 vf, tree niters_orig,
> > > > +				   tree niters_vector, edge update_e)
> > > > +{
> > > > +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    return NULL;
> > > > +
> > > > +  gphi_iterator gsi, gsi1;
> > > > +  tree ni_name, ivtmp =3D NULL;
> > > > +  basic_block update_bb =3D update_e->dest;
> > > > +  vec<edge> alt_exits =3D LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > +  edge loop_iv =3D LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > +  basic_block exit_bb =3D loop_iv->dest;
> > > > +  class loop *loop =3D LOOP_VINFO_LOOP (loop_vinfo);
> > > > +  gcond *cond =3D LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
> > > > +
> > > > +  gcc_assert (cond);
> > > > +
> > > > +  for (gsi =3D gsi_start_phis (loop->header), gsi1 =3D gsi_start_p=
his
> (update_bb);
> > > > +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > +       gsi_next (&gsi), gsi_next (&gsi1))
> > > > +    {
> > > > +      tree init_expr, final_expr, step_expr;
> > > > +      tree type;
> > > > +      tree var, ni, off;
> > > > +      gimple_stmt_iterator last_gsi;
> > > > +
> > > > +      gphi *phi =3D gsi1.phi ();
> > > > +      tree phi_ssa =3D PHI_ARG_DEF_FROM_EDGE (phi,
> loop_preheader_edge
> > > (epilog));
> > >
> > > I'm confused about the setup.  update_bb looks like the block with th=
e
> > > loop-closed PHI nodes of 'loop' and the exit (update_e)?  How does
> > > loop_preheader_edge (epilog) come into play here?  That would feed in=
to
> > > epilog->header PHIs?!
> >
> > We can't query the type of the phis in the block with the LC PHI nodes,=
 so the
> > Typical pattern seems to be that we iterate over a block that's part of=
 the
> loop
> > and that would have the PHIs in the same order, just so we can get to t=
he
> > stmt_vec_info.
> >
> > >
> > > It would be nice to name 'gsi[1]', 'update_e' and 'update_bb' in a
> > > better way?  Is update_bb really epilog->header?!
> > >
> > > We're missing checking in PHI_ARG_DEF_FROM_EDGE, namely that
> > > E->dest =3D=3D gimple_bb (PHI) - we're just using E->dest_idx there
> > > which "works" even for totally unrelated edges.
> > >
> > > > +      gphi *phi1 =3D dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa=
));
> > > > +      if (!phi1)
> > >
> > > shouldn't that be an assert?
> > >
> > > > +	continue;
> > > > +      stmt_vec_info phi_info =3D loop_vinfo->lookup_stmt (gsi.phi =
());
> > > > +      if (dump_enabled_p ())
> > > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > > +			 "vect_update_ivs_after_early_break: phi: %G",
> > > > +			 (gimple *)phi);
> > > > +
> > > > +      /* Skip reduction and virtual phis.  */
> > > > +      if (!iv_phi_p (phi_info))
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > +			     "reduc or virtual phi. skip.\n");
> > > > +	  continue;
> > > > +	}
> > > > +
> > > > +      /* For multiple exits where we handle early exits we need to=
 carry on
> > > > +	 with the previous IV as loop iteration was not done because we e=
xited
> > > > +	 early.  As such just grab the original IV.  */
> > > > +      phi_ssa =3D PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_ed=
ge
> > > (loop));
> > >
> > > but this should be taken care of by LC SSA?
> >
> > It is, the comment is probably missing details, this part just scales t=
he
> counter
> > from VF to scalar counts.  It's just a reminder that this scaling is do=
ne
> differently
> > from normal single exit vectorization.
> >
> > >
> > > OK, have to continue tomorrow from here.
> >
> > Cheers, Thank you!
> >
> > Tamar
> >
> > >
> > > Richard.
> > >
> > > > +      if (gimple_cond_lhs (cond) !=3D phi_ssa
> > > > +	  && gimple_cond_rhs (cond) !=3D phi_ssa)
>=20
> so this is a way to avoid touching the main IV?  Looks a bit fragile to
> me.  Hmm, we're iterating over the main loop header PHIs here?
> Can't you check, say, the relevancy of the PHI node instead?  Though
> it might also be used as induction.  Can't it be used as alternate
> exit like
>=20
>   for (i)
>    {
>      if (i & bit)
>        break;
>    }
>=20
> and would we need to adjust 'i' then?
>=20
> > > > +	{
> > > > +	  type =3D TREE_TYPE (gimple_phi_result (phi));
> > > > +	  step_expr =3D STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > +	  step_expr =3D unshare_expr (step_expr);
> > > > +
> > > > +	  /* We previously generated the new merged phi in the same BB as
> > > the
> > > > +	     guard.  So use that to perform the scaling on rather than th=
e
> > > > +	     normal loop phi which don't take the early breaks into accou=
nt.  */
> > > > +	  final_expr =3D gimple_phi_result (phi1);
> > > > +	  init_expr =3D PHI_ARG_DEF_FROM_EDGE (gsi.phi (),
> > > loop_preheader_edge (loop));
> > > > +
> > > > +	  tree stype =3D TREE_TYPE (step_expr);
> > > > +	  /* For early break the final loop IV is:
> > > > +	     init + (final - init) * vf which takes into account peeling
> > > > +	     values and non-single steps.  */
> > > > +	  off =3D fold_build2 (MINUS_EXPR, stype,
> > > > +			     fold_convert (stype, final_expr),
> > > > +			     fold_convert (stype, init_expr));
> > > > +	  /* Now adjust for VF to get the final iteration value.  */
> > > > +	  off =3D fold_build2 (MULT_EXPR, stype, off, build_int_cst (styp=
e, vf));
> > > > +
> > > > +	  /* Adjust the value with the offset.  */
> > > > +	  if (POINTER_TYPE_P (type))
> > > > +	    ni =3D fold_build_pointer_plus (init_expr, off);
> > > > +	  else
> > > > +	    ni =3D fold_convert (type,
> > > > +			       fold_build2 (PLUS_EXPR, stype,
> > > > +					    fold_convert (stype, init_expr),
> > > > +					    off));
> > > > +	  var =3D create_tmp_var (type, "tmp");
>=20
> so how does the non-early break code deal with updating inductions?
> And how do you avoid altering this when we flow in from the normal
> exit?  That is, you are updating the value in the epilog loop
> header but don't you need to instead update the value only on
> the alternate exit edges from the main loop (and keep the not
> updated value on the main exit edge)?
>=20
> > > > +	  last_gsi =3D gsi_last_bb (exit_bb);
> > > > +	  gimple_seq new_stmts =3D NULL;
> > > > +	  ni_name =3D force_gimple_operand (ni, &new_stmts, false, var);
> > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > +	  if (!gsi_end_p (last_gsi))
> > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +	  else
> > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +
> > > > +	  /* Fix phi expressions in the successor bb.  */
> > > > +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > > > +	}
> > > > +      else
> > > > +	{
> > > > +	  type =3D TREE_TYPE (gimple_phi_result (phi));
> > > > +	  step_expr =3D STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > +	  step_expr =3D unshare_expr (step_expr);
> > > > +
> > > > +	  /* We previously generated the new merged phi in the same BB as
> > > the
> > > > +	     guard.  So use that to perform the scaling on rather than th=
e
> > > > +	     normal loop phi which don't take the early breaks into accou=
nt.  */
> > > > +	  init_expr =3D PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge
> > > (loop));
> > > > +	  tree stype =3D TREE_TYPE (step_expr);
> > > > +
> > > > +	  if (vf.is_constant ())
> > > > +	    {
> > > > +	      ni =3D fold_build2 (MULT_EXPR, stype,
> > > > +				fold_convert (stype,
> > > > +					      niters_vector),
> > > > +				build_int_cst (stype, vf));
> > > > +
> > > > +	      ni =3D fold_build2 (MINUS_EXPR, stype,
> > > > +				fold_convert (stype,
> > > > +					      niters_orig),
> > > > +				fold_convert (stype, ni));
> > > > +	    }
> > > > +	  else
> > > > +	    /* If the loop's VF isn't constant then the loop must have be=
en
> > > > +	       masked, so at the end of the loop we know we have finished
> > > > +	       the entire loop and found nothing.  */
> > > > +	    ni =3D build_zero_cst (stype);
> > > > +
> > > > +	  ni =3D fold_convert (type, ni);
> > > > +	  /* We don't support variable n in this version yet.  */
> > > > +	  gcc_assert (TREE_CODE (ni) =3D=3D INTEGER_CST);
> > > > +
> > > > +	  var =3D create_tmp_var (type, "tmp");
> > > > +
> > > > +	  last_gsi =3D gsi_last_bb (exit_bb);
> > > > +	  gimple_seq new_stmts =3D NULL;
> > > > +	  ni_name =3D force_gimple_operand (ni, &new_stmts, false, var);
> > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > +	  if (!gsi_end_p (last_gsi))
> > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +	  else
> > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +
> > > > +	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
> > > > +
> > > > +	  for (edge exit : alt_exits)
> > > > +	    adjust_phi_and_debug_stmts (phi1, exit,
> > > > +					build_int_cst (TREE_TYPE (step_expr),
> > > > +						       vf));
> > > > +	  ivtmp =3D gimple_phi_result (phi1);
> > > > +	}
> > > > +    }
> > > > +
> > > > +  return ivtmp;
> > > >  }
> > > >
> > > >  /* Return a gimple value containing the misalignment (measured in
> vector
> > > > @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf
> > > (loop_vec_info loop_vinfo,
> > > >
> > > >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LO=
OP,
> > > >     this function searches for the corresponding lcssa phi node in =
exit
> > > > -   bb of LOOP.  If it is found, return the phi result; otherwise r=
eturn
> > > > -   NULL.  */
> > > > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is=
 found,
> > > > +   return the phi result; otherwise return NULL.  */
> > > >
> > > >  static tree
> > > >  find_guard_arg (class loop *loop, class loop *epilog
> ATTRIBUTE_UNUSED,
> > > > -		gphi *lcssa_phi)
> > > > +		gphi *lcssa_phi, int lcssa_edge =3D 0)
> > > >  {
> > > >    gphi_iterator gsi;
> > > >    edge e =3D loop->vec_loop_iv;
> > > >
> > > > -  gcc_assert (single_pred_p (e->dest));
> > > >    for (gsi =3D gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_nex=
t (&gsi))
> > > >      {
> > > >        gphi *phi =3D gsi.phi ();
> > > > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > > -	return PHI_RESULT (phi);
> > > > -    }
> > > > -  return NULL_TREE;
> > > > -}
> > > > -
> > > > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates
> > > FIRST/SECOND
> > > > -   from SECOND/FIRST and puts it at the original loop's preheader/=
exit
> > > > -   edge, the two loops are arranged as below:
> > > > -
> > > > -       preheader_a:
> > > > -     first_loop:
> > > > -       header_a:
> > > > -	 i_1 =3D PHI<i_0, i_2>;
> > > > -	 ...
> > > > -	 i_2 =3D i_1 + 1;
> > > > -	 if (cond_a)
> > > > -	   goto latch_a;
> > > > -	 else
> > > > -	   goto between_bb;
> > > > -       latch_a:
> > > > -	 goto header_a;
> > > > -
> > > > -       between_bb:
> > > > -	 ;; i_x =3D PHI<i_2>;   ;; LCSSA phi node to be created for FIRST=
,
> > > > -
> > > > -     second_loop:
> > > > -       header_b:
> > > > -	 i_3 =3D PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > > > -				 or with i_2 if no LCSSA phi is created
> > > > -				 under condition of
> > > CREATE_LCSSA_FOR_IV_PHIS.
> > > > -	 ...
> > > > -	 i_4 =3D i_3 + 1;
> > > > -	 if (cond_b)
> > > > -	   goto latch_b;
> > > > -	 else
> > > > -	   goto exit_bb;
> > > > -       latch_b:
> > > > -	 goto header_b;
> > > > -
> > > > -       exit_bb:
> > > > -
> > > > -   This function creates loop closed SSA for the first loop; updat=
e the
> > > > -   second loop's PHI nodes by replacing argument on incoming edge =
with
> the
> > > > -   result of newly created lcssa PHI nodes.  IF
> CREATE_LCSSA_FOR_IV_PHIS
> > > > -   is false, Loop closed ssa phis will only be created for non-iv =
phis for
> > > > -   the first loop.
> > > > -
> > > > -   This function assumes exit bb of the first loop is preheader bb=
 of the
> > > > -   second loop, i.e, between_bb in the example code.  With PHIs up=
dated,
> > > > -   the second loop will execute rest iterations of the first.  */
> > > > -
> > > > -static void
> > > > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > -				   class loop *first, class loop *second,
> > > > -				   bool create_lcssa_for_iv_phis)
> > > > -{
> > > > -  gphi_iterator gsi_update, gsi_orig;
> > > > -  class loop *loop =3D LOOP_VINFO_LOOP (loop_vinfo);
> > > > -
> > > > -  edge first_latch_e =3D EDGE_SUCC (first->latch, 0);
> > > > -  edge second_preheader_e =3D loop_preheader_edge (second);
> > > > -  basic_block between_bb =3D single_exit (first)->dest;
> > > > -
> > > > -  gcc_assert (between_bb =3D=3D second_preheader_e->src);
> > > > -  gcc_assert (single_pred_p (between_bb) && single_succ_p
> (between_bb));
> > > > -  /* Either the first loop or the second is the loop to be vectori=
zed.  */
> > > > -  gcc_assert (loop =3D=3D first || loop =3D=3D second);
> > > > -
> > > > -  for (gsi_orig =3D gsi_start_phis (first->header),
> > > > -       gsi_update =3D gsi_start_phis (second->header);
> > > > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > > > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > > > -    {
> > > > -      gphi *orig_phi =3D gsi_orig.phi ();
> > > > -      gphi *update_phi =3D gsi_update.phi ();
> > > > -
> > > > -      tree arg =3D PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e)=
;
> > > > -      /* Generate lcssa PHI node for the first loop.  */
> > > > -      gphi *vect_phi =3D (loop =3D=3D first) ? orig_phi : update_p=
hi;
> > > > -      stmt_vec_info vect_phi_info =3D loop_vinfo->lookup_stmt (vec=
t_phi);
> > > > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > > > +      /* Nested loops with multiple exits can have different no# p=
hi node
> > > > +	 arguments between the main loop and epilog as epilog falls to th=
e
> > > > +	 second loop.  */
> > > > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> > > >  	{
> > > > -	  tree new_res =3D copy_ssa_name (PHI_RESULT (orig_phi));
> > > > -	  gphi *lcssa_phi =3D create_phi_node (new_res, between_bb);
> > > > -	  add_phi_arg (lcssa_phi, arg, single_exit (first),
> > > UNKNOWN_LOCATION);
> > > > -	  arg =3D new_res;
> > > > -	}
> > > > -
> > > > -      /* Update PHI node in the second loop by replacing arg on th=
e loop's
> > > > -	 incoming edge.  */
> > > > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e,
> arg);
> > > > -    }
> > > > -
> > > > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > > > -     for correct vectorization of live stmts.  */
> > > > -  if (loop =3D=3D first)
> > > > -    {
> > > > -      basic_block orig_exit =3D single_exit (second)->dest;
> > > > -      for (gsi_orig =3D gsi_start_phis (orig_exit);
> > > > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > > -	{
> > > > -	  gphi *orig_phi =3D gsi_orig.phi ();
> > > > -	  tree orig_arg =3D PHI_ARG_DEF (orig_phi, 0);
> > > > -	  if (TREE_CODE (orig_arg) !=3D SSA_NAME || virtual_operand_p
> > > (orig_arg))
> > > > -	    continue;
> > > > -
> > > > -	  /* Already created in the above loop.   */
> > > > -	  if (find_guard_arg (first, second, orig_phi))
> > > > +	  tree var =3D PHI_ARG_DEF (phi, e->dest_idx);
> > > > +	  if (TREE_CODE (var) !=3D SSA_NAME)
> > > >  	    continue;
> > > >
> > > > -	  tree new_res =3D copy_ssa_name (orig_arg);
> > > > -	  gphi *lcphi =3D create_phi_node (new_res, between_bb);
> > > > -	  add_phi_arg (lcphi, orig_arg, single_exit (first),
> > > UNKNOWN_LOCATION);
> > > > +	  if (operand_equal_p (get_current_def (var),
> > > > +			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > > > +	    return PHI_RESULT (phi);
> > > >  	}
> > > >      }
> > > > +  return NULL_TREE;
> > > >  }
> > > >
> > > >  /* Function slpeel_add_loop_guard adds guard skipping from the
> beginning
> > > > @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2
> (class
> > > loop *loop, class loop *epilog,
> > > >    gcc_assert (single_succ_p (merge_bb));
> > > >    edge e =3D single_succ_edge (merge_bb);
> > > >    basic_block exit_bb =3D e->dest;
> > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > -  gcc_assert (single_pred (exit_bb) =3D=3D single_exit (epilog)->d=
est);
> > > >
> > > >    for (gsi =3D gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_nex=
t (&gsi))
> > > >      {
> > > >        gphi *update_phi =3D gsi.phi ();
> > > > -      tree old_arg =3D PHI_ARG_DEF (update_phi, 0);
> > > > +      tree old_arg =3D PHI_ARG_DEF (update_phi, e->dest_idx);
> > > >
> > > >        tree merge_arg =3D NULL_TREE;
> > > >
> > > > @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class
> loop
> > > *loop, class loop *epilog,
> > > >        if (!merge_arg)
> > > >  	merge_arg =3D old_arg;
> > > >
> > > > -      tree guard_arg =3D find_guard_arg (loop, epilog, update_phi)=
;
> > > > +      tree guard_arg =3D find_guard_arg (loop, epilog, update_phi,=
 e-
> >dest_idx);
> > > >        /* If the var is live after loop but not a reduction, we sim=
ply
> > > >  	 use the old arg.  */
> > > >        if (!guard_arg)
> > > > @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class
> > > loop *loop, class loop *epilog,
> > > >      }
> > > >  }
> > > >
> > > > -/* EPILOG loop is duplicated from the original loop for vectorizin=
g,
> > > > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > > > -
> > > > -static void
> > > > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > > -{
> > > > -  gphi_iterator gsi;
> > > > -  basic_block exit_bb =3D single_exit (epilog)->dest;
> > > > -
> > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > -  edge e =3D EDGE_PRED (exit_bb, 0);
> > > > -  for (gsi =3D gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_nex=
t (&gsi))
> > > > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > > -}
> > > > -
>=20
> I wonder if we can still split these changes out to before early break
> vect?
>=20
> > > >  /* EPILOGUE_VINFO is an epilogue loop that we now know would need
> to
> > > >     iterate exactly CONST_NITERS times.  Make a final decision abou=
t
> > > >     whether the epilogue loop should be used, returning true if so.=
  */
> > > > @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > > niters, tree nitersm1,
> > > >      bound_epilog +=3D vf - 1;
> > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > >      bound_epilog +=3D 1;
> > > > +  /* For early breaks the scalar loop needs to execute at most VF =
times
> > > > +     to find the element that caused the break.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    {
> > > > +      bound_epilog =3D vf;
> > > > +      /* Force a scalar epilogue as we can't vectorize the index f=
inding.  */
> > > > +      vect_epilogues =3D false;
> > > > +    }
> > > >    bool epilog_peeling =3D maybe_ne (bound_epilog, 0U);
> > > >    poly_uint64 bound_scalar =3D bound_epilog;
> > > >
> > > > @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree niters, tree nitersm1,
> > > >  				  bound_prolog + bound_epilog)
> > > >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > > >  			 || vect_epilogues));
> > > > +
> > > > +  /* We only support early break vectorization on known bounds at =
this
> > > time.
> > > > +     This means that if the vector loop can't be entered then we w=
on't
> > > generate
> > > > +     it at all.  So for now force skip_vector off because the addi=
tional
> control
> > > > +     flow messes with the BB exits and we've already analyzed them=
.  */
> > > > + skip_vector =3D skip_vector && !LOOP_VINFO_EARLY_BREAKS
> (loop_vinfo);
> > > > +
>=20
> I think it should be as "easy" as entering the epilog via the block takin=
g
> the regular exit?
>=20
> > > >    /* Epilog loop must be executed if the number of iterations for =
epilog
> > > >       loop is known at compile time, otherwise we need to add a che=
ck at
> > > >       the end of vector loop and skip to the end of epilog loop.  *=
/
> > > >    bool skip_epilog =3D (prolog_peeling < 0
> > > >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >  		      || !vf.is_constant ());
> > > > -  /* PEELING_FOR_GAPS is special because epilog loop must be execu=
ted.
> */
> > > > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > +  /* PEELING_FOR_GAPS and peeling for early breaks are special bec=
ause
> > > epilog
> > > > +     loop must be executed.  */
> > > > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      skip_epilog =3D false;
> > > > -
> > > >    class loop *scalar_loop =3D LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > >    auto_vec<profile_count> original_counts;
> > > >    basic_block *original_bbs =3D NULL;
> > > > @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree niters, tree nitersm1,
> > > >    if (prolog_peeling)
> > > >      {
> > > >        e =3D loop_preheader_edge (loop);
> > > > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > > > -
> > > > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo=
, e));
> > > >        /* Peel prolog and put it on preheader edge of loop.  */
> > > > -      prolog =3D slpeel_tree_duplicate_loop_to_edge_cfg (loop, sca=
lar_loop,
> e);
> > > > +      prolog =3D slpeel_tree_duplicate_loop_to_edge_cfg (loop, sca=
lar_loop,
> e,
> > > > +						       true);
> > > >        gcc_assert (prolog);
> > > >        prolog->force_vectorize =3D false;
> > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop,=
 true);
> > > > +
> > > >        first_loop =3D prolog;
> > > >        reset_original_copy_tables ();
> > > >
> > > > @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree niters, tree nitersm1,
> > > >  	 as the transformations mentioned above make less or no sense whe=
n
> > > not
> > > >  	 vectorizing.  */
> > > >        epilog =3D vect_epilogues ? get_loop_copy (loop) : scalar_lo=
op;
> > > > -      epilog =3D slpeel_tree_duplicate_loop_to_edge_cfg (loop, epi=
log, e);
> > > > +      auto_vec<basic_block> doms;
> > > > +      epilog =3D slpeel_tree_duplicate_loop_to_edge_cfg (loop, epi=
log, e,
> true,
> > > > +						       &doms);
> > > >        gcc_assert (epilog);
> > > >
> > > >        epilog->force_vectorize =3D false;
> > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog,=
 false);
> > > >
> > > >        /* Scalar version loop may be preferred.  In this case, add =
guard
> > > >  	 and skip to epilog.  Note this only happens when the number of
> > > > @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > > niters, tree nitersm1,
> > > >        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_=
mult_vf,
> > > >  					update_e);
> > > >
> > > > +      /* For early breaks we must create a guard to check how many
> iterations
> > > > +	 of the scalar loop are yet to be performed.  */
>=20
> We have this check anyway, no?  In fact don't we know that we always ente=
r
> the epilog (see above)?
>=20
> > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	{
> > > > +	  tree ivtmp =3D
> > > > +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, ni=
ters,
> > > > +					       *niters_vector, update_e);
> > > > +
> > > > +	  gcc_assert (ivtmp);
> > > > +	  tree guard_cond =3D fold_build2 (EQ_EXPR, boolean_type_node,
> > > > +					 fold_convert (TREE_TYPE (niters),
> > > > +						       ivtmp),
> > > > +					 build_zero_cst (TREE_TYPE (niters)));
> > > > +	  basic_block guard_bb =3D LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > +
> > > > +	  /* If we had a fallthrough edge, the guard will the threaded th=
rough
> > > > +	     and so we may need to find the actual final edge.  */
> > > > +	  edge final_edge =3D epilog->vec_loop_iv;
> > > > +	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
> > > > +	     between the guard and the exit edge.  It only adds new nodes=
 and
> > > > +	     doesn't update existing one in the current scheme.  */
> > > > +	  basic_block guard_to =3D split_edge (final_edge);
> > > > +	  edge guard_e =3D slpeel_add_loop_guard (guard_bb, guard_cond,
> > > guard_to,
> > > > +						guard_bb, prob_epilog.invert
> > > (),
> > > > +						irred_flag);
> > > > +	  doms.safe_push (guard_bb);
> > > > +
> > > > +	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > +
> > > > +	  /* We must update all the edges from the new guard_bb.  */
> > > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > > +					      final_edge);
> > > > +
> > > > +	  /* If the loop was versioned we'll have an intermediate BB betw=
een
> > > > +	     the guard and the exit.  This intermediate block is required
> > > > +	     because in the current scheme of things the guard block phi
> > > > +	     updating can only maintain LCSSA by creating new blocks.  In=
 this
> > > > +	     case we just need to update the uses in this block as well. =
 */
> > > > +	  if (loop !=3D scalar_loop)
> > > > +	    {
> > > > +	      for (gphi_iterator gsi =3D gsi_start_phis (guard_to);
> > > > +		   !gsi_end_p (gsi); gsi_next (&gsi))
> > > > +		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (),
> > > guard_e));
> > > > +	    }
> > > > +
> > > > +	  flush_pending_stmts (guard_e);
> > > > +	}
> > > > +
> > > >        if (skip_epilog)
> > > >  	{
> > > >  	  guard_cond =3D fold_build2 (EQ_EXPR, boolean_type_node,
> > > > @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > > niters, tree nitersm1,
> > > >  	    }
> > > >  	  scale_loop_profile (epilog, prob_epilog, 0);
> > > >  	}
> > > > -      else
> > > > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> > > >
> > > >        unsigned HOST_WIDE_INT bound;
> > > >        if (bound_scalar.is_constant (&bound))
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > index
> > >
> b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49
> > > f21421958e7ee25eada 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop
> > > *loop_in, vec_info_shared *shared)
> > > >      partial_load_store_bias (0),
> > > >      peeling_for_gaps (false),
> > > >      peeling_for_niter (false),
> > > > +    early_breaks (false),
> > > > +    non_break_control_flow (false),
> > > >      no_data_dependencies (false),
> > > >      has_mask_store (false),
> > > >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > > > @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p
> > > (loop_vec_info loop_vinfo)
> > > >      th =3D LOOP_VINFO_COST_MODEL_THRESHOLD
> > > (LOOP_VINFO_ORIG_LOOP_INFO
> > > >  					  (loop_vinfo));
> > > >
> > > > +  /* When we have multiple exits and VF is unknown, we must requir=
e
> > > partial
> > > > +     vectors because the loop bounds is not a minimum but a maximu=
m.
> > > That is to
> > > > +     say we cannot unpredicate the main loop unless we peel or use=
 partial
> > > > +     vectors in the epilogue.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > > > +    return true;
> > > > +
> > > >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >=3D 0)
> > > >      {
> > > > @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost
> > > (loop_vec_info loop_vinfo)
> > > >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > > >  }
> > > >
> > > > -
> > > >  /* Function vect_analyze_loop_form.
> > > >
> > > >     Verify that certain CFG restrictions hold, including:
> > > >     - the loop has a pre-header
> > > > -   - the loop has a single entry and exit
> > > > +   - the loop has a single entry
> > > > +   - nested loops can have only a single exit.
> > > >     - the loop exit condition is simple enough
> > > >     - the number of iterations can be analyzed, i.e, a countable lo=
op.  The
> > > >       niter could be analyzed under some assumptions.  */
> > > > @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop,
> > > vect_loop_form_info *info)
> > > >                             |
> > > >                          (exit-bb)  */
> > > >
> > > > -      if (loop->num_nodes !=3D 2)
> > > > -	return opt_result::failure_at (vect_location,
> > > > -				       "not vectorized:"
> > > > -				       " control flow in loop.\n");
> > > > -
> > > >        if (empty_block_p (loop->header))
> > > >  	return opt_result::failure_at (vect_location,
> > > >  				       "not vectorized: empty loop.\n");
> > > > @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop,
> > > vect_loop_form_info *info)
> > > >          dump_printf_loc (MSG_NOTE, vect_location,
> > > >  			 "Considering outer-loop vectorization.\n");
> > > >        info->inner_loop_cond =3D inner.loop_cond;
> > > > +
> > > > +      if (!single_exit (loop))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized: multiple exits.\n");
> > > > +
> > > >      }
> > > >
> > > > -  if (!single_exit (loop))
> > > > -    return opt_result::failure_at (vect_location,
> > > > -				   "not vectorized: multiple exits.\n");
> > > >    if (EDGE_COUNT (loop->header->preds) !=3D 2)
> > > >      return opt_result::failure_at (vect_location,
> > > >  				   "not vectorized:"
> > > > @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop,
> > > vect_loop_form_info *info)
> > > >  				   "not vectorized: latch block not empty.\n");
> > > >
> > > >    /* Make sure the exit is not abnormal.  */
> > > > -  edge e =3D single_exit (loop);
> > > > -  if (e->flags & EDGE_ABNORMAL)
> > > > -    return opt_result::failure_at (vect_location,
> > > > -				   "not vectorized:"
> > > > -				   " abnormal loop exit edge.\n");
> > > > +  auto_vec<edge> exits =3D get_loop_exit_edges (loop);
> > > > +  edge nexit =3D loop->vec_loop_iv;
> > > > +  for (edge e : exits)
> > > > +    {
> > > > +      if (e->flags & EDGE_ABNORMAL)
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " abnormal loop exit edge.\n");
> > > > +      /* Early break BB must be after the main exit BB.  In theory=
 we should
> > > > +	 be able to vectorize the inverse order, but the current flow in =
the
> > > > +	 the vectorizer always assumes you update successor PHI nodes, no=
t
> > > > +	 preds.  */
> > > > +      if (e !=3D nexit && !dominated_by_p (CDI_DOMINATORS, nexit->=
src, e-
> > > >src))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " abnormal loop exit edge order.\n");
>=20
> "unsupported loop exit order", but I don't understand the comment.
>=20
> > > > +    }
> > > > +
> > > > +  /* We currently only support early exit loops with known bounds.=
   */
>=20
> Btw, why's that?  Is that because we don't support the loop-around edge?
> IMHO this is the most serious limitation (and as said above it should be
> trivial to fix).
>=20
> > > > +  if (exits.length () > 1)
> > > > +    {
> > > > +      class tree_niter_desc niter;
> > > > +      if (!number_of_iterations_exit_assumptions (loop, nexit, &ni=
ter,
> NULL)
> > > > +	  || chrec_contains_undetermined (niter.niter)
> > > > +	  || !evolution_function_is_constant_p (niter.niter))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " early breaks only supported on loops"
> > > > +				       " with known iteration bounds.\n");
> > > > +    }
> > > >
> > > >    info->conds
> > > >      =3D vect_get_loop_niters (loop, &info->assumptions,
> > > > @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop,
> > > vec_info_shared *shared,
> > > >    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info-
> > > >alt_loop_conds);
> > > >    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) =3D info->loop_cond;
> > > >
> > > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > +    =3D !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > > +
> > > >    if (info->inner_loop_cond)
> > > >      {
> > > >        stmt_vec_info inner_loop_cond_info
> > > > @@ -3070,7 +3106,8 @@ start_over:
> > > >
> > > >    /* If an epilogue loop is required make sure we can create one. =
 */
> > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      {
> > > >        if (dump_enabled_p ())
> > > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop
> required\n");
> > > > @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction
> (loop_vec_info
> > > loop_vinfo,
> > > >    basic_block exit_bb;
> > > >    tree scalar_dest;
> > > >    tree scalar_type;
> > > > -  gimple *new_phi =3D NULL, *phi;
> > > > +  gimple *new_phi =3D NULL, *phi =3D NULL;
> > > >    gimple_stmt_iterator exit_gsi;
> > > >    tree new_temp =3D NULL_TREE, new_name, new_scalar_dest;
> > > >    gimple *epilog_stmt =3D NULL;
> > > > @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction
> > > (loop_vec_info loop_vinfo,
> > > >  	  new_def =3D gimple_convert (&stmts, vectype, new_def);
> > > >  	  reduc_inputs.quick_push (new_def);
> > > >  	}
> > > > +
> > > > +	/* Update the other exits.  */
> > > > +	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	  {
> > > > +	    vec<edge> alt_exits =3D LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > +	    gphi_iterator gsi, gsi1;
> > > > +	    for (edge exit : alt_exits)
> > > > +	      {
> > > > +		/* Find the phi node to propaget into the exit block for each
> > > > +		   exit edge.  */
> > > > +		for (gsi =3D gsi_start_phis (exit_bb),
> > > > +		     gsi1 =3D gsi_start_phis (exit->src);
>=20
> exit->src =3D=3D loop->header, right?  I think this won't work for multip=
le
> alternate exits.  It's probably easier to do this where we create the
> LC PHI node for the reduction result?
>=20
> > > > +		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > +		     gsi_next (&gsi), gsi_next (&gsi1))
> > > > +		  {
> > > > +		    /* There really should be a function to just get the number
> > > > +		       of phis inside a bb.  */
> > > > +		    if (phi && phi =3D=3D gsi.phi ())
> > > > +		      {
> > > > +			gphi *phi1 =3D gsi1.phi ();
> > > > +			SET_PHI_ARG_DEF (phi, exit->dest_idx,
> > > > +					 PHI_RESULT (phi1));
>=20
> I think we know the header PHI of a reduction perfectly well, there
> shouldn't be the need to "search" for it.
>=20
> > > > +			break;
> > > > +		      }
> > > > +		  }
> > > > +	      }
> > > > +	  }
> > > >        gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > > >      }
> > > >
> > > > @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info
> *vinfo,
> > > >  	   new_tree =3D lane_extract <vec_lhs', ...>;
> > > >  	   lhs' =3D new_tree;  */
> > > >
> > > > +      /* When vectorizing an early break, any live statement that =
is used
> > > > +	 outside of the loop are dead.  The loop will never get to them.
> > > > +	 We could change the liveness value during analysis instead but s=
ince
> > > > +	 the below code is invalid anyway just ignore it during codegen. =
 */
> > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	return true;
>=20
> But what about the value that's live across the main exit when the
> epilogue is not entered?
>=20
> > > > +
> > > >        class loop *loop =3D LOOP_VINFO_LOOP (loop_vinfo);
> > > >        basic_block exit_bb =3D LOOP_VINFO_IV_EXIT (loop_vinfo)->des=
t;
> > > >        gcc_assert (single_pred_p (exit_bb));
> > > > @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info
> > > loop_vinfo, gimple *loop_vectorized_call)
> > > >    /* Make sure there exists a single-predecessor exit bb.  Do this=
 before
> > > >       versioning.   */
> > > >    edge e =3D LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > -  if (! single_pred_p (e->dest))
> > > > +  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS
> > > (loop_vinfo))
>=20
> e can be NULL here?  I think we should reject such loops earlier.
>=20
> > > >      {
> > > >        split_loop_exit_edge (e, true);
> > > >        if (dump_enabled_p ())
> > > > @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info
> > > loop_vinfo, gimple *loop_vectorized_call)
> > > >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > > >      {
> > > >        e =3D single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > > -      if (! single_pred_p (e->dest))
> > > > +      if (e && ! single_pred_p (e->dest))
> > > >  	{
> > > >  	  split_loop_exit_edge (e, true);
> > > >  	  if (dump_enabled_p ())
> > > > @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info
> > > loop_vinfo, gimple *loop_vectorized_call)
> > > >
> > > >    /* Loops vectorized with a variable factor won't benefit from
> > > >       unrolling/peeling.  */
>=20
> update the comment?  Why would we unroll a VLA loop with early breaks?
> Or did you mean to use || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)?
>=20
> > > > -  if (!vf.is_constant ())
> > > > +  if (!vf.is_constant ()
> > > > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      {
> > > >        loop->unroll =3D 1;
> > > >        if (dump_enabled_p ())
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index
> > >
> 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82af
> > > e78ee2a7c627be45b 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info
> stmt_info,
> > > loop_vec_info loop_vinfo,
> > > >    *live_p =3D false;
> > > >
> > > >    /* cond stmt other than loop exit cond.  */
> > > > -  if (is_ctrl_stmt (stmt_info->stmt)
> > > > -      && STMT_VINFO_TYPE (stmt_info) !=3D loop_exit_ctrl_vec_info_=
type)
> > > > -    *relevant =3D vect_used_in_scope;
>=20
> how was that ever hit before?  For outer loop processing with outer loop
> vectorization?
>=20
> > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > +    {
> > > > +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit ed=
ge,
> but
> > > > +	 it looks like loop_manip doesn't do that..  So we have to do it
> > > > +	 the hard way.  */
> > > > +      basic_block bb =3D gimple_bb (stmt_info->stmt);
> > > > +      bool exit_bb =3D false, early_exit =3D false;
> > > > +      edge_iterator ei;
> > > > +      edge e;
> > > > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > > > +        if (!flow_bb_inside_loop_p (loop, e->dest))
> > > > +	  {
> > > > +	    exit_bb =3D true;
> > > > +	    early_exit =3D loop->vec_loop_iv->src !=3D bb;
> > > > +	    break;
> > > > +	  }
> > > > +
> > > > +      /* We should have processed any exit edge, so an edge not an=
 early
> > > > +	 break must be a loop IV edge.  We need to distinguish between th=
e
> > > > +	 two as we don't want to generate code for the main loop IV.  */
> > > > +      if (exit_bb)
> > > > +	{
> > > > +	  if (early_exit)
> > > > +	    *relevant =3D vect_used_in_scope;
> > > > +	}
>=20
> I wonder why you can't simply do
>=20
>          if (is_ctrl_stmt (stmt_info->stmt)
>              && stmt_info->stmt !=3D LOOP_VINFO_COND (loop_info))
>=20
> ?
>=20
> > > > +      else if (bb->loop_father =3D=3D loop)
> > > > +	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) =3D true;
>=20
> so for control flow not exiting the loop you can check
> loop_exits_from_bb_p ().
>=20
> > > > +    }
> > > >
> > > >    /* changing memory.  */
> > > >    if (gimple_code (stmt_info->stmt) !=3D GIMPLE_PHI)
> > > > @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info
> stmt_info,
> > > loop_vec_info loop_vinfo,
> > > >  	*relevant =3D vect_used_in_scope;
> > > >        }
> > > >
> > > > +  auto_vec<edge> exits =3D get_loop_exit_edges (loop);
> > > > +  auto_bitmap exit_bbs;
> > > > +  for (edge exit : exits)
> > > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > +
> > > >    /* uses outside the loop.  */
> > > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter,
> > > SSA_OP_DEF)
> > > >      {
> > > > @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> > > loop_vec_info loop_vinfo,
> > > >  	      /* We expect all such uses to be in the loop exit phis
> > > >  		 (because of loop closed form)   */
> > > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) =3D=3D GIMPLE_PH=
I);
> > > > -	      gcc_assert (bb =3D=3D single_exit (loop)->dest);
> > > > +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
>=20
> That now becomes quite expensive checking already covered by the LC SSA
> verifier so I suggest to simply drop this assert instead.
>=20
> > > >                *live_p =3D true;
> > > >  	    }
> > > > @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized
> > > (loop_vec_info loop_vinfo, bool *fatal)
> > > >  	}
> > > >      }
> > > >
> > > > +  /* Ideally this should be in vect_analyze_loop_form but we haven=
't
> seen all
> > > > +     the conds yet at that point and there's no quick way to retri=
eve them.
> */
> > > > +  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
> > > > +    return opt_result::failure_at (vect_location,
> > > > +				   "not vectorized:"
> > > > +				   " unsupported control flow in loop.\n");
>=20
> so we didn't do this before?  But see above where I wondered.  So when
> does this hit with early exits and why can't we check for this in
> vect_verify_loop_form?
>=20
> > > > +
> > > >    /* 2. Process_worklist */
> > > >    while (worklist.length () > 0)
> > > >      {
> > > > @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized
> > > (loop_vec_info loop_vinfo, bool *fatal)
> > > >  			return res;
> > > >  		    }
> > > >                   }
> > > > +	    }
> > > > +	  else if (gcond *cond =3D dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > > +	    {
> > > > +	      enum tree_code rhs_code =3D gimple_cond_code (cond);
> > > > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) =3D=3D tcc_compariso=
n);
> > > > +	      opt_result res
> > > > +		=3D process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > > +			       loop_vinfo, relevant, &worklist, false);
> > > > +	      if (!res)
> > > > +		return res;
> > > > +	      res =3D process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > > +				loop_vinfo, relevant, &worklist, false);
> > > > +	      if (!res)
> > > > +		return res;
> > > >              }
> > > >  	  else if (gcall *call =3D dyn_cast <gcall *> (stmt_vinfo->stmt))
> > > >  	    {
> > > > @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >  			     node_instance, cost_vec);
> > > >        if (!res)
> > > >  	return res;
> > > > -   }
> > > > +    }
> > > > +
> > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > +    STMT_VINFO_DEF_TYPE (stmt_info) =3D vect_early_exit_def;
> > > >
> > > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > >      {
> > > >        case vect_internal_def:
> > > > +      case vect_early_exit_def:
> > > >          break;
> > > >
> > > >        case vect_reduction_def:
> > > > @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >      {
> > > >        gcall *call =3D dyn_cast <gcall *> (stmt_info->stmt);
> > > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > > +		  || gimple_code (stmt_info->stmt) =3D=3D GIMPLE_COND
> > > >  		  || (call && gimple_call_lhs (call) =3D=3D NULL_TREE));
> > > >        *need_to_vectorize =3D true;
> > > >      }
> > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > index
> > >
> ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b3
> > > 40baff61d18da8e242dd 100644
> > > > --- a/gcc/tree-vectorizer.h
> > > > +++ b/gcc/tree-vectorizer.h
> > > > @@ -63,6 +63,7 @@ enum vect_def_type {
> > > >    vect_internal_def,
> > > >    vect_induction_def,
> > > >    vect_reduction_def,
> > > > +  vect_early_exit_def,
>=20
> can you avoid putting this inbetween reduction and double reduction
> please?  Just put it before vect_unknown_def_type.  In fact the COND
> isn't a def ... maybe we should have pattern recogized
>=20
>  if (a < b) exit;
>=20
> as
>=20
>  cond =3D a < b;
>  if (cond !=3D 0) exit;
>=20
> so the part that we need to vectorize is more clear.
>=20
> > > >    vect_double_reduction_def,
> > > >    vect_nested_cycle,
> > > >    vect_first_order_recurrence,
> > > > @@ -876,6 +877,13 @@ public:
> > > >       we need to peel off iterations at the end to form an epilogue=
 loop.  */
> > > >    bool peeling_for_niter;
> > > >
> > > > +  /* When the loop has early breaks that we can vectorize we need =
to
> peel
> > > > +     the loop for the break finding loop.  */
> > > > +  bool early_breaks;
> > > > +
> > > > +  /* When the loop has a non-early break control flow inside.  */
> > > > +  bool non_break_control_flow;
> > > > +
> > > >    /* List of loop additional IV conditionals found in the loop.  *=
/
> > > >    auto_vec<gcond *> conds;
> > > >
> > > > @@ -985,9 +993,11 @@ public:
> > > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > > >  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)-
> > > >early_break_conflict
> > > >  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)-
> >early_break_dest_bb
> > > >  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > > > +#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)-
> > > >non_break_control_flow
> > > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > > >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)-
> > > >no_data_dependencies
> > > > @@ -1038,8 +1048,8 @@ public:
> > > >     stack.  */
> > > >  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> > > >
> > > > -inline loop_vec_info
> > > > -loop_vec_info_for_loop (class loop *loop)
> > > > +static inline loop_vec_info
> > > > +loop_vec_info_for_loop (const class loop *loop)
> > > >  {
> > > >    return (loop_vec_info) loop->aux;
> > > >  }
> > > > @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
> > > >  {
> > > >    if (bb =3D=3D (bb->loop_father)->header)
> > > >      return true;
> > > > -  gcc_checking_assert (EDGE_COUNT (bb->preds) =3D=3D 1);
> > > > +
> > > >    return false;
> > > >  }
> > > >
> > > > @@ -2176,9 +2186,10 @@ class auto_purge_vect_location
> > > >     in tree-vect-loop-manip.cc.  */
> > > >  extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > >  				     tree, tree, tree, bool);
> > > > -extern bool slpeel_can_duplicate_loop_p (const class loop *,
> const_edge);
> > > > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info,
> > > const_edge);
> > > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > > -						     class loop *, edge);
> > > > +						    class loop *, edge, bool,
> > > > +						    vec<basic_block> * =3D NULL);
> > > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > > >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > > >  				    tree *, tree *, tree *, int, bool, bool,
> > > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > > index
> > >
> a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c
> > > 67f29f5dc9a8d72235f4 100644
> > > > --- a/gcc/tree-vectorizer.cc
> > > > +++ b/gcc/tree-vectorizer.cc
> > > > @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
> > > >  	 predicates that need to be shared for optimal predicate usage.
> > > >  	 However reassoc will re-order them and prevent CSE from working
> > > >  	 as it should.  CSE only the loop body, not the entry.  */
> > > > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > > +      auto_vec<edge> exits =3D get_loop_exit_edges (loop);
>=20
> seeing this more and more I think we want a simple way to iterate over
> all exits without copying to a vector when we have them recorded.  My
> C++ fu is too limited to support
>=20
>   for (auto exit : recorded_exits (loop))
>     ...
>=20
> (maybe that's enough for somebody to jump onto this ;))
>=20
> Don't treat all review comments as change orders, but it should be clear
> the code isn't 100% obvious.  Maybe the patch can be simplified by
> splitting out the LC SSA cleanup parts.
>=20
> Thanks,
> Richard.
>=20
> > > > +      for (edge exit : exits)
> > > > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> > > >
> > > >        edge entry =3D EDGE_PRED (loop_preheader_edge (loop)->src, 0=
);
> > > >        do_rpo_vn (fun, entry, exit_bbs);
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg,
> > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien
> > > Moerman;
> > > HRB 36809 (AG Nuernberg)
> >