From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id CF99A3858D20 for ; Thu, 16 Nov 2023 13:35:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CF99A3858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org CF99A3858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=195.135.220.29 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700141735; cv=none; b=lnHXLRrxU5PSbAR3Mlr2yZUpiqFeE4FQxZn1Gbm2YlZmSsr+XrGqpzrEMJiPetvA8DAw2jvFNtu7ontR1fkBf9h5XllLKHsD5MeI95zfeRQY+eqwHL8WV0iFFu+JiF3t2y6I2wMQ09W89K81mZfQKmg35n7j20jzNqKRC4KxNQM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700141735; c=relaxed/simple; bh=sXe2uSIl2Cp22uNN6j4gcx8LCHFIixtIWAjyToggh08=; h=Date:From:To:Subject:Message-ID:MIME-Version; b=Ki2zWKTRSIdYb1T/+HvCVyCveRrDeMgH2QAW0d7SCimck9eEBfn89IzyK5n6MLY3mftNhxqYz8GNCFD6DM2JlVO3jvt0Xdn/JuVr6UPUlOxDR/lqciY/R8WzLADPKiicjDV1wfe7esy4ap6jm5E5ENCig9eTeI8i/JGJkHI7LPk= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 5C9A320503; Thu, 16 Nov 2023 13:35:31 +0000 (UTC) Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 2CE192C43D; Thu, 16 Nov 2023 13:35:31 +0000 (UTC) Date: Thu, 16 Nov 2023 13:35:31 +0000 (UTC) From: Richard Biener To: Tamar Christina cc: "gcc-patches@gcc.gnu.org" , nd , "jlaw@ventanamicro.com" Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits In-Reply-To: Message-ID: References: User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Level: Authentication-Results: smtp-out2.suse.de; none X-Rspamd-Server: rspamd2 X-Spamd-Result: default: False [-4.00 / 50.00]; REPLY(-4.00)[] X-Spam-Score: -4.00 X-Rspamd-Queue-Id: 5C9A320503 X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00,KAM_DMARC_STATUS,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, 16 Nov 2023, Tamar Christina wrote: > > > > > > > > > > > > > > Perhaps I'm missing something here? > > > > > > > > > > > > OK, so I refreshed my mind of what > > > > > > vect_update_ivs_after_vectorizer > > > > does. > > > > > > > > > > > > I still do not understand the (complexity of the) patch. > > > > > > Basically the function computes the new value of the IV "from > > > > > > scratch" based on the number of scalar iterations of the vector loop, > > the 'niter' > > > > > > argument. I would have expected that for the early exits we > > > > > > either pass in a different 'niter' or alternatively a 'niter_adjustment'. > > > > > > > > > > But for an early exit there's no static value for adjusted niter, > > > > > since you don't know which iteration you exited from. Unlike the > > > > > normal exit when you know if you get there you've done all > > > > > possible > > > > iterations. > > > > > > > > > > So you must compute the scalar iteration count on the exit itself. > > > > > > > > ? You do not need the actual scalar iteration you exited (you don't > > > > compute that either), you need the scalar iteration the vector > > > > iteration started with when it exited prematurely and that's readily > > available? > > > > > > For a normal exit yes, not for an early exit no? niters_vector_mult_vf > > > is only valid for the main exit. > > > > > > There's the unadjusted scalar count, which is what it's using to > > > adjust it to the final count. Unless I'm missing something? > > > > Ah, of course - niters_vector_mult_vf is for the countable exit. For the early > > exits we can't precompute the scalar iteration value. But that then means we > > should compute the appropriate "continuation" as live value of the vectorized > > IVs even when they were not originally used outside of the loop. I don't see > > how we can express this in terms of the scalar IVs in the (not yet) vectorized > > loop - similar to the reduction case you are going to end up with the wrong > > values here. > > > > That said, I've for a long time wanted to preserve the original control IV also for > > the vector code (leaving any "optimization" > > to IVOPTs there), that would enable us to compute the correct > > "niters_vector_mult_vf" based on that IV. > > > > So given we cannot use the scalar IVs you have to handle all inductions > > (besides the main exit control IV) in vectorizable_live_operation I think. > > > > That's what I currently do, that's why there was the > if (STMT_VINFO_LIVE_P (phi_info)) > continue; Yes, but that only works for the inductions marked so. We'd need to mark the others as well, but only for the early exits. > although I don't understand why we use the scalar count, I suppose the reasoning > is that we don't really want to keep it around, and referencing it forces it to be kept? Referencing it will cause the scalar compute to be retained, but since we do not adjust the scalar compute during vectorization (but expect it to be dead) the scalar compute will compute the wrong thing (as shown by the reduction example - I suspect inductions will suffer from the same problem). > At the moment it just does `init + (final - init) * vf` which is correct no? The issue is that 'final' is not computed correctly in the vectorized loop. This formula might work for affine evolutions of course. Extracting the correct value from the vectorized induction would be the preferred solution. > Also you missed the question below about how to avoid the creation of the block, > You ok with changing that? > > Thanks, > Tamar > > > Or for now disable early-break for inductions that are not the main exit control > > IV (in vect_can_advance_ivs_p)? > > > > > > > > > > > > > > It seems your change handles different kinds of inductions differently. > > > > > > Specifically > > > > > > > > > > > > bool ivtemp = gimple_cond_lhs (cond) == iv_var; > > > > > > if (restart_loop && ivtemp) > > > > > > { > > > > > > type = TREE_TYPE (gimple_phi_result (phi)); > > > > > > ni = build_int_cst (type, vf); > > > > > > if (inversed_iv) > > > > > > ni = fold_build2 (MINUS_EXPR, type, ni, > > > > > > fold_convert (type, step_expr)); > > > > > > } > > > > > > > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step' > > > > > > as the new value. That seems to be very odd special casing for > > > > > > unknown reasons. And while you adjust vec_step_op_add, you > > > > > > don't adjust vect_peel_nonlinear_iv_init (maybe not supported - > > > > > > better assert > > > > here). > > > > > > > > > > The VF case is for a normal "non-inverted" loop, where if you take > > > > > an early exit you know that you have to do at most VF iterations. > > > > > The VF > > > > > - step is to account for the inverted loop control flow where you > > > > > exit after adjusting the IV already by + step. > > > > > > > > But doesn't that assume the IV counts from niter to zero? I don't > > > > see this special case is actually necessary, no? > > > > > > > > > > I needed it because otherwise the scalar loop iterates one iteration > > > too little So I got a miscompile with the inverter loop stuff. I'll > > > look at it again perhaps It can be solved differently. > > > > > > > > > > > > > Peeling doesn't matter here, since you know you were able to do a > > > > > vector iteration so it's safe to do VF iterations. So having > > > > > peeled doesn't affect the remaining iters count. > > > > > > > > > > > > > > > > > Also the vec_step_op_add case will keep the original scalar IV > > > > > > live even when it is a vectorized induction. The code > > > > > > recomputing the value from scratch avoids this. > > > > > > > > > > > > /* For non-main exit create an intermediat edge to get any updated > > iv > > > > > > calculations. */ > > > > > > if (needs_interm_block > > > > > > && !iv_block > > > > > > && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p > > > > > > (new_stmts))) > > > > > > { > > > > > > iv_block = split_edge (update_e); > > > > > > update_e = single_succ_edge (update_e->dest); > > > > > > last_gsi = gsi_last_bb (iv_block); > > > > > > } > > > > > > > > > > > > this is also odd, can we adjust the API instead? I suppose this > > > > > > is because your computation uses the original loop IV, if you > > > > > > based the computation off the initial value only this might not be > > necessary? > > > > > > > > > > No, on the main exit the code updates the value in the loop header > > > > > and puts the Calculation in the merge block. This works because > > > > > it only needs to consume PHI nodes in the merge block and things > > > > > like niters are > > > > adjusted in the guard block. > > > > > > > > > > For an early exit, we don't have a guard block, only the merge block. > > > > > We have to update the PHI nodes in that block, but can't do so > > > > > since you can't produce a value and consume it in a PHI node in the same > > BB. > > > > > So we need to create the block to put the values in for use in the > > > > > merge block. Because there's no "guard" block for early exits. > > > > > > > > ? then compute niters in that block as well. > > > > > > We can't since it'll not be reachable through the right edge. What we > > > can do if you want is slightly change peeling, we currently peel as: > > > > > > \ \ / > > > E1 E2 Normal exit > > > \ | | > > > \ | Guard > > > \ | | > > > Merge block > > > | > > > Pre Header > > > > > > If we instead peel as: > > > > > > > > > \ \ / > > > E1 E2 Normal exit > > > \ | | > > > Exit join Guard > > > \ | | > > > Merge block > > > | > > > Pre Header > > > > > > We can use the exit join block. This would also mean > > > vect_update_ivs_after_vectorizer Doesn't need to iterate over all > > > exits and only really needs to adjust the phi nodes Coming out of the exit join > > and guard block. > > > > > > Does this work for you? Yeah, I think that would work. But I'd like to sort out the correctness details of the IV update itself before sorting out this code placement detail. Richard. > > > Thanks, > > > Tamar > > > > > > > > > The API can be adjusted by always creating the empty block either > > > > > during > > > > peeling. > > > > > That would prevent us from having to do anything special here. > > > > > Would that work better? Or I can do it in the loop that iterates > > > > > over the exits to before the call to > > > > > vect_update_ivs_after_vectorizer, which I think > > > > might be more consistent. > > > > > > > > > > > > > > > > > That said, I wonder why we cannot simply pass in an adjusted > > > > > > niter which would be niters_vector_mult_vf - vf and be done with that? > > > > > > > > > > > > > > > > We can ofcourse not have this and recompute it from niters itself, > > > > > however this does affect the epilog code layout. Particularly > > > > > knowing the static number if iterations left causes it to usually > > > > > unroll the loop and share some of the computations. i.e. the > > > > > scalar code is often more > > > > efficient. > > > > > > > > > > The computation would be niters_vector_mult_vf - iters_done * vf, > > > > > since the value put Here is the remaining iteration count. It's > > > > > static for early > > > > exits. > > > > > > > > Well, it might be "static" in that it doesn't really matter what you > > > > use for the epilog main IV initial value as long as you are sure > > > > you're not going to take that exit as you are sure we're going to > > > > take one of the early exits. So yeah, the special code is probably > > > > OK, but it needs a better comment and as said the structure of > > vect_update_ivs_after_vectorizer is a bit hard to follow now. > > > > > > > > As said an important part for optimization is to not keep the scalar > > > > IVs live in the vector loop. > > > > > > > > > But can do whatever you prefer here. Let me know what you prefer > > > > > for the > > > > above. > > > > > > > > > > Thanks, > > > > > Tamar > > > > > > > > > > > Thanks, > > > > > > Richard. > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > Tamar > > > > > > > > > > > > > > > > > It has to do this since you have to perform the side > > > > > > > > > effects for the non-matching elements still. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > Tamar > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > + if (STMT_VINFO_LIVE_P (phi_info)) > > > > > > > > > > > + continue; > > > > > > > > > > > + > > > > > > > > > > > + /* For early break the final loop IV is: > > > > > > > > > > > + init + (final - init) * vf which takes into account > > peeling > > > > > > > > > > > + values and non-single steps. The main exit > > can > > > > > > > > > > > +use > > > > > > niters > > > > > > > > > > > + since if you exit from the main exit you've > > done > > > > > > > > > > > +all > > > > > > vector > > > > > > > > > > > + iterations. For an early exit we don't know > > when > > > > > > > > > > > +we > > > > > > exit > > > > > > > > > > > +so > > > > > > > > > > we > > > > > > > > > > > + must re-calculate this on the exit. */ > > > > > > > > > > > + tree start_expr = gimple_phi_result (phi); > > > > > > > > > > > + off = fold_build2 (MINUS_EXPR, stype, > > > > > > > > > > > + fold_convert (stype, > > start_expr), > > > > > > > > > > > + fold_convert (stype, > > init_expr)); > > > > > > > > > > > + /* Now adjust for VF to get the final iteration value. > > */ > > > > > > > > > > > + off = fold_build2 (MULT_EXPR, stype, off, > > > > > > > > > > > + build_int_cst (stype, vf)); > > > > > > > > > > > + } > > > > > > > > > > > + else > > > > > > > > > > > + off = fold_build2 (MULT_EXPR, stype, > > > > > > > > > > > + fold_convert (stype, niters), > > step_expr); > > > > > > > > > > > + > > > > > > > > > > > if (POINTER_TYPE_P (type)) > > > > > > > > > > > ni = fold_build_pointer_plus (init_expr, off); > > > > > > > > > > > else > > > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer > > > > > > > > > > > (loop_vec_info > > > > > > > > > > loop_vinfo, > > > > > > > > > > > /* Don't bother call vect_peel_nonlinear_iv_init. */ > > > > > > > > > > > else if (induction_type == vect_step_op_neg) > > > > > > > > > > > ni = init_expr; > > > > > > > > > > > + else if (restart_loop) > > > > > > > > > > > + continue; > > > > > > > > > > > > > > > > > > > > This looks all a bit complicated - why wouldn't we > > > > > > > > > > simply always use the PHI result when 'restart_loop'? > > > > > > > > > > Isn't that the correct old start value in > > > > > > > > all cases? > > > > > > > > > > > > > > > > > > > > > else > > > > > > > > > > > ni = vect_peel_nonlinear_iv_init (&stmts, init_expr, > > > > > > > > > > > niters, step_expr, @@ - > > > > 2245,9 +2295,20 @@ > > > > > > > > > > > vect_update_ivs_after_vectorizer > > > > > > > > > > (loop_vec_info > > > > > > > > > > > loop_vinfo, > > > > > > > > > > > > > > > > > > > > > > var = create_tmp_var (type, "tmp"); > > > > > > > > > > > > > > > > > > > > > > - last_gsi = gsi_last_bb (exit_bb); > > > > > > > > > > > gimple_seq new_stmts = NULL; > > > > > > > > > > > ni_name = force_gimple_operand (ni, &new_stmts, > > > > > > > > > > > false, var); > > > > > > > > > > > + > > > > > > > > > > > + /* For non-main exit create an intermediat edge > > > > > > > > > > > + to get any > > > > > > updated iv > > > > > > > > > > > + calculations. */ > > > > > > > > > > > + if (needs_interm_block > > > > > > > > > > > + && !iv_block > > > > > > > > > > > + && (!gimple_seq_empty_p (stmts) || > > > > > > > > > > > +!gimple_seq_empty_p > > > > > > > > > > (new_stmts))) > > > > > > > > > > > + { > > > > > > > > > > > + iv_block = split_edge (update_e); > > > > > > > > > > > + update_e = single_succ_edge (update_e->dest); > > > > > > > > > > > + last_gsi = gsi_last_bb (iv_block); > > > > > > > > > > > + } > > > > > > > > > > > + > > > > > > > > > > > /* Exit_bb shouldn't be empty. */ > > > > > > > > > > > if (!gsi_end_p (last_gsi)) > > > > > > > > > > > { > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info > > > > > > > > > > > loop_vinfo, tree > > > > > > > > > > niters, tree nitersm1, > > > > > > > > > > > niters_vector_mult_vf steps. */ > > > > > > > > > > > gcc_checking_assert (vect_can_advance_ivs_p > > (loop_vinfo)); > > > > > > > > > > > update_e = skip_vector ? e : loop_preheader_edge (epilog); > > > > > > > > > > > - vect_update_ivs_after_vectorizer (loop_vinfo, > > > > > > niters_vector_mult_vf, > > > > > > > > > > > - update_e); > > > > > > > > > > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) > > > > > > > > > > > + update_e = single_succ_edge (e->dest); > > > > > > > > > > > + bool inversed_iv > > > > > > > > > > > + = !vect_is_loop_exit_latch_pred > > (LOOP_VINFO_IV_EXIT > > > > > > (loop_vinfo), > > > > > > > > > > > + LOOP_VINFO_LOOP > > > > > > (loop_vinfo)); > > > > > > > > > > > > > > > > > > > > You are computing this here and in > > > > vect_update_ivs_after_vectorizer? > > > > > > > > > > > > > > > > > > > > > + > > > > > > > > > > > + /* Update the main exit first. */ > > > > > > > > > > > + vect_update_ivs_after_vectorizer (loop_vinfo, > > > > > > > > > > > + vf, > > > > > > > > niters_vector_mult_vf, > > > > > > > > > > > + update_e, > > inversed_iv); > > > > > > > > > > > + > > > > > > > > > > > + /* And then update the early exits. */ > > > > > > > > > > > + for (auto exit : get_loop_exit_edges (loop)) > > > > > > > > > > > + { > > > > > > > > > > > + if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo)) > > > > > > > > > > > + continue; > > > > > > > > > > > + > > > > > > > > > > > + vect_update_ivs_after_vectorizer (loop_vinfo, vf, > > > > > > > > > > > + > > niters_vector_mult_vf, > > > > > > > > > > > + exit, true); > > > > > > > > > > > > > > > > > > > > ... why does the same not work here? Wouldn't the > > > > > > > > > > proper condition be !dominated_by_p (CDI_DOMINATORS, > > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT > > > > > > > > > > (loop_vinfo)->src) or similar? That is, whether the > > > > > > > > > > exit is at or after the main IV exit? (consider having > > > > > > > > > > two) > > > > > > > > > > > > > > > > > > > > > + } > > > > > > > > > > > > > > > > > > > > > > if (skip_epilog) > > > > > > > > > > > { > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Richard Biener SUSE Software Solutions > > > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, > > > > > > > > AG > > > > > > > > Nuernberg) > > > > > > > > > > > > > > > > > > > -- > > > > > > Richard Biener SUSE Software Solutions > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > > > > > Nuernberg) > > > > > > > > > > > > > -- > > > > Richard Biener > > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 > > > > Nuernberg, Germany; > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > > > Nuernberg) > > > > > > > -- > > Richard Biener > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > Nuernberg) > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)