on 2021/8/3 8:08 PM, Richard Biener wrote:
> On Fri, Jul 30, 2021 at 7:20 AM Kewen.Lin wrote:
>>
>> on 2021/7/29 4:01 PM, Richard Biener wrote:
>>> On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin wrote:
>>>>
>>>> on 2021/7/22 8:56 PM, Richard Biener wrote:
>>>>> On Tue, Jul 20, 2021 at 4:37 PM Kewen.Lin wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This v2 has addressed some review comments/suggestions:
>>>>>>
>>>>>>   - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
>>>>>>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
>>>>>>     to support the loop hierarchy tree rather than just a function,
>>>>>>     and adjust to use loops* accordingly.
>>>>>
>>>>> I actually meant struct loop *, not struct loops * ;)  At the point
>>>>> we pondered making loop invariant motion work on single loop nests
>>>>> we gave up, not least because it iterates over the loop nest but
>>>>> all the iterators can only ever process all loops, not, say, all
>>>>> loops inside a specific 'loop' (and including that 'loop' if
>>>>> LI_INCLUDE_ROOT).  So the CTOR would take the 'root' of the loop
>>>>> tree as argument.
>>>>>
>>>>> I see that doesn't trivially fit how loops_list works, at least not
>>>>> for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST could be
>>>>> adjusted to do ONLY_INNERMOST as well?
>>>>
>>>> Thanks for the clarification!  I just realized that the previous
>>>> version with struct loops* is problematic: all traversal is still
>>>> bounded by outer_loop == NULL.  I think what you expect is to
>>>> respect the given loop_p root boundary.  Since we just record the
>>>> loops' nums, I think we still need the function* fn?
>>>
>>> Would it simplify things if we recorded the actual loop *?
>>
>> I'm afraid it's unsafe to record the loop*.  I had the same question
>> about why the loop iterator uses an index rather than a loop* when I
>> read this for the first time.  I guess the design of processing loops
>> allows its user to update or even delete the following loops to be
>> visited.  For example, when the user does some tricks on one loop,
>> duplicating the loop and its children to somewhere and then removing
>> the loop and its children, when iterating onto its children later the
>> "index" way will check their validity via get_loop at that point, but
>> the "loop *" way will be left with recorded pointers that have become
>> dangling; it can't do the validity check by itself and seems to need
>> a side linear search to ensure validity.
>>
>>> There's still the to_visit reserve which needs a bound on
>>> the number of loops for efficiency reasons.
>>
>> Yes, I still keep the fn in the updated version.
>>
>>>> So I add one optional argument loop_p root and update the visiting
>>>> code accordingly.  Before this change, the visiting used
>>>> outer_loop == NULL as the termination condition, which perfectly
>>>> includes the root itself, but with this given root we have to use
>>>> the root as the termination condition to avoid iterating onto its
>>>> possibly existing next sibling.
>>>>
>>>> For LI_ONLY_INNERMOST, I was thinking whether we can use code like:
>>>>
>>>>   vec<loop_p, va_gc> *fn_loops = loops_for_fn (fn)->larray;
>>>>   for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
>>>>     if (aloop != NULL
>>>>         && aloop->inner == NULL
>>>>         && flow_loop_nested_p (tree_root, aloop))
>>>>       this->to_visit.quick_push (aloop->num);
>>>>
>>>> It has a stable bound, but if the given root only has several child
>>>> loops, it can be much worse if there are many loops in fn.  It
>>>> seems impossible to predict the given root's loop hierarchy size;
>>>> maybe we can still use the original linear search for the case
>>>> loops_for_fn (fn) == root?  But since this visiting seems not so
>>>> performance critical, I chose to share the code originally used for
>>>> FROM_INNERMOST, hoping it has better readability and
>>>> maintainability.
>>>
>>> I was indeed looking for something that has an execution/storage
>>> bound on the subtree we're interested in.  If we pull the CTOR
>>> out-of-line we can probably keep the linear search for
>>> LI_ONLY_INNERMOST when looking at the whole loop tree.
>>
>> OK, I've moved the suggested single loop tree walker out-of-line to
>> cfgloop.c, and brought the linear search back for LI_ONLY_INNERMOST
>> when looking at the whole loop tree.
>>
>>> It just seemed to me that we can eventually re-use a single loop
>>> tree walker for all orders, just adjusting the places we push.
>>
>> Wow, good point!  Indeed, I have further unified the handling of all
>> orders into a single function walk_loop_tree.
>>
>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
>>>> x86_64-redhat-linux and aarch64-linux-gnu, also bootstrapped on
>>>> ppc64le P9 with bootstrap-O3 config.
>>>>
>>>> Does the attached patch meet what you expect?
>>>
>>> So yeah, it's probably close to what is sensible.  Not sure whether
>>> optimizing the loops for the !only_push_innermost_p case is
>>> important - if we manage to produce a single walker with
>>> conditionals based on 'flags' then IPA-CP should produce optimal
>>> clones as well, I guess.
>>
>> Thanks for the comments, the updated v2 is attached.
>> Compared with v1, it does:
>>
>>   - Unify one single loop tree walker for all orders.
>>   - Move walk_loop_tree out-of-line to cfgloop.c.
>>   - Keep the linear search for LI_ONLY_INNERMOST with tree_root of
>>     fn loops.
>>   - Use class loop * instead of loop_p.
>>
>> Bootstrapped & regtested on powerpc64le-linux-gnu Power9
>> (with/without the hunk for the LI_ONLY_INNERMOST linear search; the
>> "without" run gives coverage to exercise LI_ONLY_INNERMOST in
>> walk_loop_tree).
>>
>> Is it ok for trunk?
>
> Looks good to me.  I think that the 'mn' was an optimization for the
> linear walk and it's cheaper to pointer test against the actual
> 'root' loop (no need to dereference).  Thus
>
> +  if (flags & LI_ONLY_INNERMOST && tree_root == loops->tree_root)
>      {
> -      for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, &aloop); i++)
> +      class loop *aloop;
> +      unsigned int i;
> +      for (i = 0; vec_safe_iterate (loops->larray, i, &aloop); i++)
>          if (aloop != NULL
>              && aloop->inner == NULL
> -            && aloop->num >= mn)
> +            && aloop->num != mn)
>            this->to_visit.quick_push (aloop->num);
>
> could elide the aloop->num != mn check and start iterating from 1,
> since loops->tree_root->num == 0,
>
> and walk_loop_tree could simply do
>
>   class loop *exclude = flags & LI_INCLUDE_ROOT ? NULL : root;
>
> and pointer test aloop against exclude.  That avoids the idea that
> 'mn' is a vehicle to exclude one random loop from the iteration.

Good idea!  Thanks for the comments!

The attached v3 has addressed the review comments on "mn".

Bootstrapped & regtested again on powerpc64le-linux-gnu Power9
(with/without the hunk for the LI_ONLY_INNERMOST linear search).

Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

* cfgloop.h (loops_list::loops_list): Add one optional argument root
and adjust accordingly, update loop tree walking and factor out to ...
* cfgloop.c (loops_list::walk_loop_tree): ...this.  New function.
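
[Editorial note: the following is a rough, self-contained sketch added
for illustration only; it is not the attached patch.  It models the idea
discussed above - one loop tree walker for all orders, with the root
excluded by a pointer test instead of the 'mn' number check.  The
loop_node type, the flag values and the recursive shape are simplified
stand-ins for GCC's class loop, the LI_* flags and
loops_list::walk_loop_tree, which works on GCC's own data structures and
records loop numbers into to_visit.  The recursion is only to keep the
sketch short; the shape of the actual walk_loop_tree may differ.]

#include <cstdio>
#include <vector>

/* Hypothetical stand-in for GCC's 'class loop': just the tree links and
   the loop number that the iterator records.  */
struct loop_node
{
  int num;
  loop_node *inner = nullptr;   /* first child */
  loop_node *next = nullptr;    /* next sibling */
};

/* Modeled after the LI_* flags discussed in the thread (values assumed).  */
enum
{
  LI_INCLUDE_ROOT   = 1,        /* visit the root of the subtree too */
  LI_FROM_INNERMOST = 2,        /* post-order: children before parents */
  LI_ONLY_INNERMOST = 4         /* leaves of the loop tree only */
};

/* Recursive helper: push the numbers of the subtree rooted at LOOP in
   the order selected by FLAGS, skipping EXCLUDE.  */
static void
walk_1 (loop_node *loop, loop_node *exclude, unsigned flags,
        std::vector<int> &to_visit)
{
  bool only_innermost_p = (flags & LI_ONLY_INNERMOST) != 0;
  bool from_innermost_p = (flags & LI_FROM_INNERMOST) != 0;
  bool self_p = loop != exclude;

  /* Default order is pre-order: parent before its children.  */
  if (self_p && !only_innermost_p && !from_innermost_p)
    to_visit.push_back (loop->num);

  for (loop_node *child = loop->inner; child; child = child->next)
    walk_1 (child, exclude, flags, to_visit);

  /* Post-order for FROM_INNERMOST; leaves only for ONLY_INNERMOST.  */
  if (self_p
      && (from_innermost_p
          || (only_innermost_p && loop->inner == nullptr)))
    to_visit.push_back (loop->num);
}

/* Single walker for all orders; the root is excluded via a pointer
   test, as suggested in the review, rather than a loop number check.  */
static void
walk_loop_tree (loop_node *root, unsigned flags, std::vector<int> &to_visit)
{
  loop_node *exclude = (flags & LI_INCLUDE_ROOT) ? nullptr : root;
  walk_1 (root, exclude, flags, to_visit);
}

int
main ()
{
  /* Tiny loop tree: 0 -> { 1 -> { 3 }, 2 }.  */
  loop_node l3{3}, l2{2}, l1{1}, root{0};
  l1.inner = &l3;
  l1.next = &l2;
  root.inner = &l1;

  std::vector<int> order;
  walk_loop_tree (&root, LI_FROM_INNERMOST, order);
  for (int n : order)
    std::printf ("%d ", n);     /* prints "3 1 2": innermost loops first */
  std::printf ("\n");
  return 0;
}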