On 11/19/21 11:00, Richard Biener wrote:
> On Tue, Nov 16, 2021 at 3:40 PM Martin Liška <mliska@suse.cz> wrote:
>>
>> On 11/11/21 08:15, Richard Biener wrote:
>>> So I'd try to do no functional change first, improving the costing and
>>> setting up the transform to simply pick up the stmts to "fold" as discovered
>>> during analysis (as I hinted you possibly can use gimple_uid to mark
>>> the stmts that simplify, IIRC gimple_uid is preserved during copying.
>>> gimple_uid would also scale better than gimple_plf in case we do
>>> the analysis for all candidates at once).
>>
>> Thinking about the analysis. Am I correct that we want to properly calculate
>> loop size for true and false edge of a potential gcond before the actually unswitching?
> 
> Yes.
> 
>> We can do that by finding a first gcond candidate, evaluate (symbolic + irange approache)
>> all other gcond in the loop body and use BB_REACHABLE discovery. Similarly to what we do now
>> at lines 378-446. Then tree_num_loop_insns can be adjusted for only these reachable blocks.
>> Having that, we can calculate # of insns that will live in true/false loops.
> 
> So whatever we do here we should record as "this control stmt folds to
> {true,false}" (or {true,unknown},
> or in future, "this control stmt will lead to edge {e,unknown}"),
> recording the simplification
> on the true/false loop version in a way we can apply it after the transform.
> 
>> Then we can call tree_unswitch_loop and make the gcond folding as we do in the versioned loops.
>>
>> Is it a step in good direction? Having that we can then extend it to gswitch statements.
> 
> One issue I see is that BB_REACHABLE is there only once but you could use
> auto_bb_flag reachable_true, reachable_false to distinguish the
> true/false loop version
> copies.
> 
> So yes, I think that sounds reasonable.  At the point we want to
> evaluate different
> (first) unswitching opportunities against each other storing this only
> as BB flag is
> likely to hit limits.  When we want to evaluate multiple levels of
> unswitching before
> doing any transforms even more so (if there are 3 opportunities there'd be
> many cases to be considered when going to level 3 ;)).  I _think_ that a sparse
> lattice of stmt UID -> edge might do the trick if we change tree_num_loop_insns
> do to a DFS walk from the loop header, ignoring not taken edges by
> consulting the
> lattice.  Or, for speed reason, pre-compute tree_num_loop_insns for each BB
> so we just have to sum a different set of BBs rather than walking all
> stmts again.
> 
> That said, the second step would definitely be to choose the "best" opportunity
> on the current level.
> 
> Richard.
> 
>> Cheers,
>> Martin

Hello.

I'm sending a new version where I changed:
1) all unswitch_predicates are find for a loop
2) context sensitive costing happens based on an unswitch_predicate and BB reachability
    is implemented
3) folding happens in recursive invocation once we decide to unswitch
4) the patch folds both symbolic gcond predicates and irange provided by ranger
5) debug counter was added

Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Plus, I tested it
on SPEC2006 and SPEC2017 with -size=ref.

Thoughts?
Thanks,
Martin