Hi,

On 2021/8/16 19:46, Richard Biener wrote:
> On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
> 
>> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
>> nested loops.  inn_loop is updated to inner loop, so it need be restored
>> when exiting from innermost loop. With this patch, the store instruction
>> in outer loop could also be moved out of outer loop by store motion.
>> Any comments?  Thanks.
> 
>> gcc/ChangeLog:
>>
>> 	* tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
>> 	inn_loop when exiting from innermost loop.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
>> ---
>>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++
>>   gcc/tree-ssa-loop-im.c                     |  6 +++++-
>>   2 files changed, 29 insertions(+), 1 deletion(-)
>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>> new file mode 100644
>> index 00000000000..097a5ee4a4b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>> @@ -0,0 +1,24 @@
>> +/* PR/101293 */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
>> +
>> +struct X { int i; int j; int k;};
>> +
>> +void foo(struct X *x, int n, int l)
>> +{
>> +  for (int j = 0; j < l; j++)
>> +    {
>> +      for (int i = 0; i < n; ++i)
>> +	{
>> +	  int *p = &x->j;
>> +	  int tem = *p;
>> +	  x->j += tem * i;
>> +	}
>> +      int *r = &x->k;
>> +      int tem2 = *r;
>> +      x->k += tem2 * j;
>> +    }
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */
>> +
>> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
>> index b24bc64f2a7..5ca4738b20e 100644
>> --- a/gcc/tree-ssa-loop-im.c
>> +++ b/gcc/tree-ssa-loop-im.c
>> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
>>   	  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>   	    last = bb;
>>   
>> +	  if (inn_loop != loop
>> +	      && flow_loop_nested_p (bb->loop_father, inn_loop))
>> +	    inn_loop = bb->loop_father;
>> +
> 
> The comment says
> 
>                /* In a loop that is always entered we may proceed anyway.
>                   But record that we entered it and stop once we leave it.
> */
>                inn_loop = bb->loop_father;
> 
> and your change would defeat that early return, no?

The issue is the search method exits too early when iterating the outer
loop.  For example of a nested loop, loop 1 includes 5,8,3,10,4,9
and loop2 includes 3,10.  Currently, it breaks when bb is 3 as bb 3
doesn't dominate bb 9 of loop 1.  But actually, both bb 5 and bb 4 are
ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4
they won't be processed by store motion again.
 

     5<----
     |\   |
     8 \  9
     |  \ |
 --->3--->4
|    | 
10---|
 

SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this
patch, it will continue search when meet bb 3 until bb 4, then last is updated
to bb 4, it will break until exit edge is found at bb 4 by
"if (!flow_bb_inside_loop_p (loop, e->dest))".  Then the followed loop code will
set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5.


      while (1)
	{
	  SET_ALWAYS_EXECUTED_IN (last, loop);
	  if (last == loop->header)
	    break;
	  last = get_immediate_dominator (CDI_DOMINATORS, last);
	}

After further discussion with Kewen, we found that the inn_loop variable is
totally useless and could be removed.


> 
>>   	  if (bitmap_bit_p (contains_call, bb->index))
>>   	    break;
>>   
>> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
>>   
>>   	  if (bb->loop_father->header == bb)
>>   	    {
>> -	      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>> +	      if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, bb))
>>   		break;
> 
> That's now a always false condition - a loops latch is always dominated
> by its header.  The condition as written tries to verify whether the
> loop is always entered - mind we visit all blocks, not only those
> always executed.

Thanks for the catch!  I am afraid the piece of code should be removed since it stops
search of potential ALWAYS EXECUTED bb after inner loop...

> 
> In fact for your testcase the x->j ref is _not_ always executed
> since the inner loop is conditional on n > 0.

Yes.  But I want to move x->k (not x->j) out of loop 1 when l > 0 in store-motion.
Attached the diff file without and with my patch to show the extra optimization.

x->j is already moved out of loop 2 on master code.
If change n and l to constant numbers like 100, master code could also do 2 store
motions as expected. The edge from bb 5 to bb 4 doesn't exist now, so bb 4, bb 3
and bb 5 are ALWAYS EXECUTED for loop 1.


struct X { int i; int j; int k;};

void foo(struct X *x, int n, int l)
{
  for (int j = 0; j < l; j++) // loop 1
    {
      for (int i = 0; i < n; ++i)  // loop 2
        {
          int *p = &x->j;
          int tem = *p;
          x->j += tem * i;
        }
      int *r = &x->k;
      int tem2 = *r;
      x->k += tem2 * j;
    }
}

> 
> Richard.
> 

-- 
Thanks,
Xionghu