public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, PR66846] Mark inner loop for fixup in parloops
@ 2015-07-15 19:49 Tom de Vries
  2015-07-16  8:46 ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2015-07-15 19:49 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2920 bytes --]

Hi,

I.

In openmp expansion of loops, we do some effort to try to create 
matching loops in the loop state of the child function, f.i.in 
expand_omp_for_generic:
...
       struct loop *outer_loop;
       if (seq_loop)
         outer_loop = l0_bb->loop_father;
       else
         {
           outer_loop = alloc_loop ();
           outer_loop->header = l0_bb;
           outer_loop->latch = l2_bb;
           add_loop (outer_loop, l0_bb->loop_father);
         }

       if (!gimple_omp_for_combined_p (fd->for_stmt))
         {
           struct loop *loop = alloc_loop ();
           loop->header = l1_bb;
           /* The loop may have multiple latches.  */
           add_loop (loop, outer_loop);
         }
...

And if that doesn't work out, we try to mark the loop state for fixup, 
in expand_omp_taskreg and expand_omp_target:
...
       /* When the OMP expansion process cannot guarantee an up-to-date
          loop tree arrange for the child function to fixup loops.  */
       if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
         child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
...

and expand_omp_for:
...
   else
     /* If there isn't a continue then this is a degerate case where
        the introduction of abnormal edges during lowering will prevent
        original loops from being detected.  Fix that up.  */
     loops_state_set (LOOPS_NEED_FIXUP);
...

However, loops are fixed up anyway, because the first pass we execute 
with the new child function is pass_fixup_cfg.

The new child function contains a function call to 
__builtin_omp_get_num_threads, which is marked with ECF_CONST, so 
execute_fixup_cfg marks the function for TODO_cleanup_cfg, and 
subsequently the loops with LOOPS_NEED_FIXUP.


II.

This patch adds a verification that at the end of the omp-expand 
processing of the child function, either the loop structure is ok, or 
marked for fixup.

This verfication triggered a failure in parloops. When an outer loop is 
being parallelized, both the outer and inner loop are cancelled. Then 
during omp-expansion, we create a loop in the loop state for the outer 
loop (the one that is transformed), but not for the inner, which causes 
the verification failure:
...
outer-1.c:11:3: error: loop with header 5 not in loop tree
...

[ I ran into this verification failure with an openacc kernels testcase 
on the gomp-4_0-branch, where parloops is called additionally from a 
different location, and pass_fixup_cfg is not the first pass that the 
child function is processed by. ]

The patch contains a bit that makes sure that the loop state of the 
child function is marked for fixup in parloops. The bit is non-trival 
since it create a loop state and sets the fixup flag on the loop state, 
but postpones the init_loops_structure call till move_sese_region_to_fn, 
where it can succeed.


III.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 0001-Mark-inner-loop-for-fixup-in-parloops.patch --]
[-- Type: text/x-patch, Size: 3649 bytes --]

Mark inner loop for fixup in parloops

2015-07-13  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/66846
	* omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: If
	!LOOPS_NEED_FIXUP, verify_loop_structure in child_fn.
	(expand_omp_target) [ENABLE_CHECKING]: Same.
	* tree-cfg.c (move_sese_region_to_fn): Only allocate struct loops if
	necessary.  Preserve dest_cfun->x_current_loops->state while calling
	init_loops_structure.
	* tree-parloops.c (gen_parallel_loop): If inner loop is cancelled,
	allocate struct loop and mark child_fn for loop fixup.
---
 gcc/omp-low.c       |  8 ++++++++
 gcc/tree-cfg.c      | 26 +++++++++++++++++++++-----
 gcc/tree-parloops.c | 10 +++++++++-
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 0e69bc2..64d9742 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5603,6 +5603,10 @@ expand_omp_taskreg (struct omp_region *region)
 	}
       if (gimple_in_ssa_p (cfun))
 	update_ssa (TODO_update_ssa);
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
@@ -8983,6 +8987,10 @@ expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index d97b824..6b415fe 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -7126,11 +7126,27 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
 	}
     }
 
-  /* Initialize an empty loop tree.  */
-  struct loops *loops = ggc_cleared_alloc<struct loops> ();
-  init_loops_structure (dest_cfun, loops, 1);
-  loops->state = LOOPS_MAY_HAVE_MULTIPLE_LATCHES;
-  set_loops_for_fn (dest_cfun, loops);
+  struct loops *loops;
+  int loop_state_flags = 0;
+  if (dest_cfun->x_current_loops == NULL)
+    {
+      /* Initialize an empty loop tree.  */
+      loops = ggc_cleared_alloc<struct loops> ();
+      set_loops_for_fn (dest_cfun, loops);
+    }
+  else
+    {
+      loops = dest_cfun->x_current_loops;
+      loop_state_flags = loops->state;
+    }
+
+  if (loops->tree_root == NULL)
+    {
+      init_loops_structure (dest_cfun, loops, 1);
+      loops->state |= loop_state_flags;
+    }
+
+  loops->state |= LOOPS_MAY_HAVE_MULTIPLE_LATCHES;
 
   /* Move the outlined loop tree part.  */
   num_nodes = bbs.length ();
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 036677b..23b134f 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2267,7 +2267,8 @@ gen_parallel_loop (struct loop *loop,
   cond_stmt = last_stmt (loop->header);
   if (cond_stmt)
     loc = gimple_location (cond_stmt);
-  create_parallel_loop (loop, create_loop_fn (loc), arg_struct,
+  tree child_fn = create_loop_fn (loc);
+  create_parallel_loop (loop, child_fn, arg_struct,
 			new_arg_struct, n_threads, loc);
   if (reduction_list->elements () > 0)
     create_call_for_reduction (loop, reduction_list, &clsn_data);
@@ -2276,6 +2277,13 @@ gen_parallel_loop (struct loop *loop,
 
   /* Cancel the loop (it is simpler to do it here rather than to teach the
      expander to do it).  */
+  if (loop->inner)
+    {
+      struct function *child_cfun = DECL_STRUCT_FUNCTION (child_fn);
+      struct loops *loops = ggc_cleared_alloc<struct loops> ();
+      set_loops_for_fn (child_cfun, loops);
+      child_cfun->x_current_loops->state = LOOPS_NEED_FIXUP;
+    }
   cancel_loop_tree (loop);
 
   /* Free loop bound estimations that could contain references to
-- 
1.9.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
  2015-07-15 19:49 [PATCH, PR66846] Mark inner loop for fixup in parloops Tom de Vries
@ 2015-07-16  8:46 ` Richard Biener
  2015-07-16 10:12   ` Tom de Vries
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2015-07-16  8:46 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> Hi,
>
> I.
>
> In openmp expansion of loops, we do some effort to try to create matching
> loops in the loop state of the child function, f.i.in
> expand_omp_for_generic:
> ...
>       struct loop *outer_loop;
>       if (seq_loop)
>         outer_loop = l0_bb->loop_father;
>       else
>         {
>           outer_loop = alloc_loop ();
>           outer_loop->header = l0_bb;
>           outer_loop->latch = l2_bb;
>           add_loop (outer_loop, l0_bb->loop_father);
>         }
>
>       if (!gimple_omp_for_combined_p (fd->for_stmt))
>         {
>           struct loop *loop = alloc_loop ();
>           loop->header = l1_bb;
>           /* The loop may have multiple latches.  */
>           add_loop (loop, outer_loop);
>         }
> ...
>
> And if that doesn't work out, we try to mark the loop state for fixup, in
> expand_omp_taskreg and expand_omp_target:
> ...
>       /* When the OMP expansion process cannot guarantee an up-to-date
>          loop tree arrange for the child function to fixup loops.  */
>       if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>         child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
> ...
>
> and expand_omp_for:
> ...
>   else
>     /* If there isn't a continue then this is a degerate case where
>        the introduction of abnormal edges during lowering will prevent
>        original loops from being detected.  Fix that up.  */
>     loops_state_set (LOOPS_NEED_FIXUP);
> ...
>
> However, loops are fixed up anyway, because the first pass we execute with
> the new child function is pass_fixup_cfg.
>
> The new child function contains a function call to
> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and subsequently
> the loops with LOOPS_NEED_FIXUP.
>
>
> II.
>
> This patch adds a verification that at the end of the omp-expand processing
> of the child function, either the loop structure is ok, or marked for fixup.
>
> This verfication triggered a failure in parloops. When an outer loop is
> being parallelized, both the outer and inner loop are cancelled. Then during
> omp-expansion, we create a loop in the loop state for the outer loop (the
> one that is transformed), but not for the inner, which causes the
> verification failure:
> ...
> outer-1.c:11:3: error: loop with header 5 not in loop tree
> ...
>
> [ I ran into this verification failure with an openacc kernels testcase on
> the gomp-4_0-branch, where parloops is called additionally from a different
> location, and pass_fixup_cfg is not the first pass that the child function
> is processed by. ]
>
> The patch contains a bit that makes sure that the loop state of the child
> function is marked for fixup in parloops. The bit is non-trival since it
> create a loop state and sets the fixup flag on the loop state, but postpones
> the init_loops_structure call till move_sese_region_to_fn, where it can
> succeed.
>
>
> III.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?

+  struct loops *loops;
+  int loop_state_flags = 0;
+  if (dest_cfun->x_current_loops == NULL)
+    {
+      /* Initialize an empty loop tree.  */
+      loops = ggc_cleared_alloc<struct loops> ();
+      set_loops_for_fn (dest_cfun, loops);
+    }
+  else
+    {
+      loops = dest_cfun->x_current_loops;
+      loop_state_flags = loops->state;
+    }
+
+  if (loops->tree_root == NULL)
+    {
+      init_loops_structure (dest_cfun, loops, 1);
+      loops->state |= loop_state_flags;
+    }
+
+  loops->state |= LOOPS_MAY_HAVE_MULTIPLE_LATCHES;

this looks twisted just because you do a half-way init of the loop tree here:

+  if (loop->inner)
+    {
+      struct function *child_cfun = DECL_STRUCT_FUNCTION (child_fn);
+      struct loops *loops = ggc_cleared_alloc<struct loops> ();
+      set_loops_for_fn (child_cfun, loops);
+      child_cfun->x_current_loops->state = LOOPS_NEED_FIXUP;
+    }

why not unconditionally initialize the loop tree properly in autopar?

> Thanks,
> - Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
  2015-07-16  8:46 ` Richard Biener
@ 2015-07-16 10:12   ` Tom de Vries
  2015-07-16 10:23     ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2015-07-16 10:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On 16/07/15 10:44, Richard Biener wrote:
> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> Hi,
>>
>> I.
>>
>> In openmp expansion of loops, we do some effort to try to create matching
>> loops in the loop state of the child function, f.i.in
>> expand_omp_for_generic:
>> ...
>>        struct loop *outer_loop;
>>        if (seq_loop)
>>          outer_loop = l0_bb->loop_father;
>>        else
>>          {
>>            outer_loop = alloc_loop ();
>>            outer_loop->header = l0_bb;
>>            outer_loop->latch = l2_bb;
>>            add_loop (outer_loop, l0_bb->loop_father);
>>          }
>>
>>        if (!gimple_omp_for_combined_p (fd->for_stmt))
>>          {
>>            struct loop *loop = alloc_loop ();
>>            loop->header = l1_bb;
>>            /* The loop may have multiple latches.  */
>>            add_loop (loop, outer_loop);
>>          }
>> ...
>>
>> And if that doesn't work out, we try to mark the loop state for fixup, in
>> expand_omp_taskreg and expand_omp_target:
>> ...
>>        /* When the OMP expansion process cannot guarantee an up-to-date
>>           loop tree arrange for the child function to fixup loops.  */
>>        if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>          child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>> ...
>>
>> and expand_omp_for:
>> ...
>>    else
>>      /* If there isn't a continue then this is a degerate case where
>>         the introduction of abnormal edges during lowering will prevent
>>         original loops from being detected.  Fix that up.  */
>>      loops_state_set (LOOPS_NEED_FIXUP);
>> ...
>>
>> However, loops are fixed up anyway, because the first pass we execute with
>> the new child function is pass_fixup_cfg.
>>
>> The new child function contains a function call to
>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and subsequently
>> the loops with LOOPS_NEED_FIXUP.
>>
>>
>> II.
>>
>> This patch adds a verification that at the end of the omp-expand processing
>> of the child function, either the loop structure is ok, or marked for fixup.
>>
>> This verfication triggered a failure in parloops. When an outer loop is
>> being parallelized, both the outer and inner loop are cancelled. Then during
>> omp-expansion, we create a loop in the loop state for the outer loop (the
>> one that is transformed), but not for the inner, which causes the
>> verification failure:
>> ...
>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>> ...
>>
>> [ I ran into this verification failure with an openacc kernels testcase on
>> the gomp-4_0-branch, where parloops is called additionally from a different
>> location, and pass_fixup_cfg is not the first pass that the child function
>> is processed by. ]
>>
>> The patch contains a bit that makes sure that the loop state of the child
>> function is marked for fixup in parloops. The bit is non-trival since it
>> create a loop state and sets the fixup flag on the loop state, but postpones
>> the init_loops_structure call till move_sese_region_to_fn, where it can
>> succeed.
>>
>>
>> III.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> OK for trunk?
>
> +  struct loops *loops;
> +  int loop_state_flags = 0;
> +  if (dest_cfun->x_current_loops == NULL)
> +    {
> +      /* Initialize an empty loop tree.  */
> +      loops = ggc_cleared_alloc<struct loops> ();
> +      set_loops_for_fn (dest_cfun, loops);
> +    }
> +  else
> +    {
> +      loops = dest_cfun->x_current_loops;
> +      loop_state_flags = loops->state;
> +    }
> +
> +  if (loops->tree_root == NULL)
> +    {
> +      init_loops_structure (dest_cfun, loops, 1);
> +      loops->state |= loop_state_flags;
> +    }
> +
> +  loops->state |= LOOPS_MAY_HAVE_MULTIPLE_LATCHES;
>
> this looks twisted just because you do a half-way init of the loop tree here:
>
> +  if (loop->inner)
> +    {
> +      struct function *child_cfun = DECL_STRUCT_FUNCTION (child_fn);
> +      struct loops *loops = ggc_cleared_alloc<struct loops> ();
> +      set_loops_for_fn (child_cfun, loops);
> +      child_cfun->x_current_loops->state = LOOPS_NEED_FIXUP;
> +    }
>
> why not unconditionally initialize the loop tree properly in autopar?
>

Because init_loops_structure accesses f.i. n_basic_blocks_for_fn 
(child_cfun), in other words child_cfun->cfg->x_n_basic_blocks. At this 
point in parloops, there's no child_cfun->cfg yet, that field is set by 
the following pass_expand_omp_ssa.

Normally, the solution is to do loops_state_set (LOOPS_NEED_FIXUP) for 
the parent function, which will get propagated to the child. But I'm 
trying to be more precise than that, by only setting LOOPS_NEED_FIXUP 
for the child, not the parent.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
  2015-07-16 10:12   ` Tom de Vries
@ 2015-07-16 10:23     ` Richard Biener
  2015-07-20 13:38       ` Tom de Vries
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2015-07-16 10:23 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 16/07/15 10:44, Richard Biener wrote:
>>
>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> I.
>>>
>>> In openmp expansion of loops, we do some effort to try to create matching
>>> loops in the loop state of the child function, f.i.in
>>> expand_omp_for_generic:
>>> ...
>>>        struct loop *outer_loop;
>>>        if (seq_loop)
>>>          outer_loop = l0_bb->loop_father;
>>>        else
>>>          {
>>>            outer_loop = alloc_loop ();
>>>            outer_loop->header = l0_bb;
>>>            outer_loop->latch = l2_bb;
>>>            add_loop (outer_loop, l0_bb->loop_father);
>>>          }
>>>
>>>        if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>          {
>>>            struct loop *loop = alloc_loop ();
>>>            loop->header = l1_bb;
>>>            /* The loop may have multiple latches.  */
>>>            add_loop (loop, outer_loop);
>>>          }
>>> ...
>>>
>>> And if that doesn't work out, we try to mark the loop state for fixup, in
>>> expand_omp_taskreg and expand_omp_target:
>>> ...
>>>        /* When the OMP expansion process cannot guarantee an up-to-date
>>>           loop tree arrange for the child function to fixup loops.  */
>>>        if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>          child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>> ...
>>>
>>> and expand_omp_for:
>>> ...
>>>    else
>>>      /* If there isn't a continue then this is a degerate case where
>>>         the introduction of abnormal edges during lowering will prevent
>>>         original loops from being detected.  Fix that up.  */
>>>      loops_state_set (LOOPS_NEED_FIXUP);
>>> ...
>>>
>>> However, loops are fixed up anyway, because the first pass we execute
>>> with
>>> the new child function is pass_fixup_cfg.
>>>
>>> The new child function contains a function call to
>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>> subsequently
>>> the loops with LOOPS_NEED_FIXUP.
>>>
>>>
>>> II.
>>>
>>> This patch adds a verification that at the end of the omp-expand
>>> processing
>>> of the child function, either the loop structure is ok, or marked for
>>> fixup.
>>>
>>> This verfication triggered a failure in parloops. When an outer loop is
>>> being parallelized, both the outer and inner loop are cancelled. Then
>>> during
>>> omp-expansion, we create a loop in the loop state for the outer loop (the
>>> one that is transformed), but not for the inner, which causes the
>>> verification failure:
>>> ...
>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>> ...
>>>
>>> [ I ran into this verification failure with an openacc kernels testcase
>>> on
>>> the gomp-4_0-branch, where parloops is called additionally from a
>>> different
>>> location, and pass_fixup_cfg is not the first pass that the child
>>> function
>>> is processed by. ]
>>>
>>> The patch contains a bit that makes sure that the loop state of the child
>>> function is marked for fixup in parloops. The bit is non-trival since it
>>> create a loop state and sets the fixup flag on the loop state, but
>>> postpones
>>> the init_loops_structure call till move_sese_region_to_fn, where it can
>>> succeed.
>>>
>>>
>>> III.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for trunk?
>>
>>
>> +  struct loops *loops;
>> +  int loop_state_flags = 0;
>> +  if (dest_cfun->x_current_loops == NULL)
>> +    {
>> +      /* Initialize an empty loop tree.  */
>> +      loops = ggc_cleared_alloc<struct loops> ();
>> +      set_loops_for_fn (dest_cfun, loops);
>> +    }
>> +  else
>> +    {
>> +      loops = dest_cfun->x_current_loops;
>> +      loop_state_flags = loops->state;
>> +    }
>> +
>> +  if (loops->tree_root == NULL)
>> +    {
>> +      init_loops_structure (dest_cfun, loops, 1);
>> +      loops->state |= loop_state_flags;
>> +    }
>> +
>> +  loops->state |= LOOPS_MAY_HAVE_MULTIPLE_LATCHES;
>>
>> this looks twisted just because you do a half-way init of the loop tree
>> here:
>>
>> +  if (loop->inner)
>> +    {
>> +      struct function *child_cfun = DECL_STRUCT_FUNCTION (child_fn);
>> +      struct loops *loops = ggc_cleared_alloc<struct loops> ();
>> +      set_loops_for_fn (child_cfun, loops);
>> +      child_cfun->x_current_loops->state = LOOPS_NEED_FIXUP;
>> +    }
>>
>> why not unconditionally initialize the loop tree properly in autopar?
>>
>
> Because init_loops_structure accesses f.i. n_basic_blocks_for_fn
> (child_cfun), in other words child_cfun->cfg->x_n_basic_blocks. At this
> point in parloops, there's no child_cfun->cfg yet, that field is set by the
> following pass_expand_omp_ssa.

Well.  The above exposes too much internals about how this all works
(IMHO).

> Normally, the solution is to do loops_state_set (LOOPS_NEED_FIXUP) for the
> parent function, which will get propagated to the child. But I'm trying to
> be more precise than that, by only setting LOOPS_NEED_FIXUP for the child,
> not the parent.

Can we fix the root-cause of the issue instead?  That is, build a valid loop
structure in the first place?

Richard.

> Thanks,
> - Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
  2015-07-16 10:23     ` Richard Biener
@ 2015-07-20 13:38       ` Tom de Vries
  2015-07-24 10:31         ` Tom de Vries
  0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2015-07-20 13:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4450 bytes --]

On 16/07/15 12:15, Richard Biener wrote:
> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 16/07/15 10:44, Richard Biener wrote:
>>>
>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I.
>>>>
>>>> In openmp expansion of loops, we do some effort to try to create matching
>>>> loops in the loop state of the child function, f.i.in
>>>> expand_omp_for_generic:
>>>> ...
>>>>         struct loop *outer_loop;
>>>>         if (seq_loop)
>>>>           outer_loop = l0_bb->loop_father;
>>>>         else
>>>>           {
>>>>             outer_loop = alloc_loop ();
>>>>             outer_loop->header = l0_bb;
>>>>             outer_loop->latch = l2_bb;
>>>>             add_loop (outer_loop, l0_bb->loop_father);
>>>>           }
>>>>
>>>>         if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>           {
>>>>             struct loop *loop = alloc_loop ();
>>>>             loop->header = l1_bb;
>>>>             /* The loop may have multiple latches.  */
>>>>             add_loop (loop, outer_loop);
>>>>           }
>>>> ...
>>>>
>>>> And if that doesn't work out, we try to mark the loop state for fixup, in
>>>> expand_omp_taskreg and expand_omp_target:
>>>> ...
>>>>         /* When the OMP expansion process cannot guarantee an up-to-date
>>>>            loop tree arrange for the child function to fixup loops.  */
>>>>         if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>           child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>> ...
>>>>
>>>> and expand_omp_for:
>>>> ...
>>>>     else
>>>>       /* If there isn't a continue then this is a degerate case where
>>>>          the introduction of abnormal edges during lowering will prevent
>>>>          original loops from being detected.  Fix that up.  */
>>>>       loops_state_set (LOOPS_NEED_FIXUP);
>>>> ...
>>>>
>>>> However, loops are fixed up anyway, because the first pass we execute
>>>> with
>>>> the new child function is pass_fixup_cfg.
>>>>
>>>> The new child function contains a function call to
>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>> subsequently
>>>> the loops with LOOPS_NEED_FIXUP.
>>>>
>>>>
>>>> II.
>>>>
>>>> This patch adds a verification that at the end of the omp-expand
>>>> processing
>>>> of the child function, either the loop structure is ok, or marked for
>>>> fixup.
>>>>
>>>> This verfication triggered a failure in parloops. When an outer loop is
>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>> during
>>>> omp-expansion, we create a loop in the loop state for the outer loop (the
>>>> one that is transformed), but not for the inner, which causes the
>>>> verification failure:
>>>> ...
>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>> ...
>>>>
>>>> [ I ran into this verification failure with an openacc kernels testcase
>>>> on
>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>> different
>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>> function
>>>> is processed by. ]
>>>>
>>>> The patch contains a bit that makes sure that the loop state of the child
>>>> function is marked for fixup in parloops. The bit is non-trival since it
>>>> create a loop state and sets the fixup flag on the loop state, but
>>>> postpones
>>>> the init_loops_structure call till move_sese_region_to_fn, where it can
>>>> succeed.
>>>>
>>>>

<SNIP>

> Can we fix the root-cause of the issue instead?  That is, build a valid loop
> structure in the first place?
>

This patch manages to keep the loop structure, that is, to not cancel 
the loop tree in parloops, and guarantee a valid loop structure at the 
end of parloops.

The transformation to insert the omp_for invalidates the loop state 
properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so 
we drop those in parloops.

In expand_omp_for_static_nochunk, we detect the existing loop struct of 
the omp_for, and keep it.

Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get 
the loop state properties LOOPS_HAVE_RECORDED_EXITS and 
LOOPS_HAVE_SIMPLE_LATCHES back.

Tested by running:
- gcc dg.exp "parloops*.c"
- gcc autopar.exp
- target-libgomp c.exp

Currently bootstrapping and reg-testing on x86_64.

If that succeeds, OK for trunk?

Thanks,
- Tom



[-- Attachment #2: 0001-Don-t-cancel-loop-tree-in-parloops.patch --]
[-- Type: text/x-patch, Size: 6026 bytes --]

Don't cancel loop tree in parloops

2015-07-20  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/66846
	* omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call
	verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP.
	(expand_omp_target) [ENABLE_CHECKING]: Same.
	(execute_expand_omp)  [ENABLE_CHECKING]: Call verify_loop_structure for
	cfun if !LOOPS_NEED_FIXUP.
	(expand_omp_for_static_nochunk): Handle case that omp_for already has
	its own loop struct.
	* passes.def: Add pass_tree_loop_init after pass_expand_omp_ssa in
	pass_parallelize_loops.
	* tree-parloops.c (create_parallel_loop): Add comment.
	(gen_parallel_loop): Remove call to cancel_loop_tree.
	(parallelize_loops): Skip loops that are inner loops of parallelized
	loops.
	(pass_parallelize_loops::execute): Clear LOOPS_HAVE_RECORDED_EXITS and
	LOOPS_HAVE_SIMPLE_LATCHES on loop state.
	[ENABLE_CHECKING]: Call verify_loop_structure.
	* tree-ssa-loop.c (pass_tree_loop_init::clone): New function.
---
 gcc/omp-low.c       | 22 +++++++++++++++++++++-
 gcc/passes.def      |  1 +
 gcc/tree-parloops.c | 33 +++++++++++++++++++++++++++++----
 gcc/tree-ssa-loop.c |  1 +
 4 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3135606..dce0c02 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5604,6 +5604,10 @@ expand_omp_taskreg (struct omp_region *region)
 	}
       if (gimple_in_ssa_p (cfun))
 	update_ssa (TODO_update_ssa);
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
@@ -6843,9 +6847,17 @@ expand_omp_for_static_nochunk (struct omp_region *region,
   set_immediate_dominator (CDI_DOMINATORS, fin_bb,
 			   recompute_dominator (CDI_DOMINATORS, fin_bb));
 
+  struct loop *loop = body_bb->loop_father;
+  if (loop != entry_bb->loop_father)
+    {
+      gcc_assert (loop->header == body_bb);
+      gcc_assert (broken_loop || loop->latch == region->cont);
+      return;
+    }
+
   if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt))
     {
-      struct loop *loop = alloc_loop ();
+      loop = alloc_loop ();
       loop->header = body_bb;
       if (collapse_bb == NULL)
 	loop->latch = cont_bb;
@@ -8984,6 +8996,10 @@ expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
@@ -9492,6 +9508,10 @@ execute_expand_omp (void)
 
   expand_omp (root_omp_region);
 
+#ifdef ENABLE_CHECKING
+  if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+    verify_loop_structure ();
+#endif
   cleanup_tree_cfg ();
 
   free_omp_regions ();
diff --git a/gcc/passes.def b/gcc/passes.def
index 6b66f8f..ea3a800 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -246,6 +246,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_parallelize_loops);
 	  PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
 	      NEXT_PASS (pass_expand_omp_ssa);
+	      NEXT_PASS (pass_tree_loop_init);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_ch_vect);
 	  NEXT_PASS (pass_if_conversion);
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index ec41834..e33071d 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2045,7 +2045,14 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 
   guard = make_edge (for_bb, ex_bb, 0);
   single_succ_edge (loop->latch)->flags = 0;
+
+  /* After creating this edge, the latch has two successors, so
+     LOOPS_HAVE_SIMPLE_LATCHES is no longer valid.  Furthermore, this exit is
+     not recorded.  We'll update the loop state as such at the end of the
+     pass, since it's not needed earlier, and doing it earlier will invalidate
+     info for loops we still need to process.  */
   end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU);
+
   for (gphi_iterator gpi = gsi_start_phis (ex_bb);
        !gsi_end_p (gpi); gsi_next (&gpi))
     {
@@ -2282,10 +2289,6 @@ gen_parallel_loop (struct loop *loop,
 
   scev_reset ();
 
-  /* Cancel the loop (it is simpler to do it here rather than to teach the
-     expander to do it).  */
-  cancel_loop_tree (loop);
-
   /* Free loop bound estimations that could contain references to
      removed statements.  */
   FOR_EACH_LOOP (loop, 0)
@@ -2521,6 +2524,7 @@ parallelize_loops (void)
   unsigned n_threads = flag_tree_parallelize_loops;
   bool changed = false;
   struct loop *loop;
+  struct loop *skip_loop = NULL;
   struct tree_niter_desc niter_desc;
   struct obstack parloop_obstack;
   HOST_WIDE_INT estimated;
@@ -2538,6 +2542,19 @@ parallelize_loops (void)
 
   FOR_EACH_LOOP (loop, 0)
     {
+      if (loop == skip_loop)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "Skipping loop %d as inner loop of parallelized loop\n",
+		     loop->num);
+
+	  skip_loop = loop->inner;
+	  continue;
+	}
+      else
+	skip_loop = NULL;
+
       reduction_list.empty ();
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
@@ -2597,6 +2614,7 @@ parallelize_loops (void)
 	continue;
 
       changed = true;
+      skip_loop = loop->inner;
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
 	if (loop->inner)
@@ -2663,6 +2681,13 @@ pass_parallelize_loops::execute (function *fun)
   if (parallelize_loops ())
     {
       fun->curr_properties &= ~(PROP_gimple_eomp);
+
+      release_recorded_exits ();
+      loops_state_clear (LOOPS_HAVE_SIMPLE_LATCHES);
+#ifdef ENABLE_CHECKING
+      verify_loop_structure ();
+#endif
+
       return TODO_update_ssa;
     }
 
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 078b1d1..3520d92 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -210,6 +210,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
 
 }; // class pass_tree_loop_init
 
-- 
1.9.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
  2015-07-20 13:38       ` Tom de Vries
@ 2015-07-24 10:31         ` Tom de Vries
  2015-07-28 10:14           ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2015-07-24 10:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4860 bytes --]

On 20/07/15 15:04, Tom de Vries wrote:
> On 16/07/15 12:15, Richard Biener wrote:
>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>> <Tom_deVries@mentor.com> wrote:
>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>
>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I.
>>>>>
>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>> matching
>>>>> loops in the loop state of the child function, f.i.in
>>>>> expand_omp_for_generic:
>>>>> ...
>>>>>         struct loop *outer_loop;
>>>>>         if (seq_loop)
>>>>>           outer_loop = l0_bb->loop_father;
>>>>>         else
>>>>>           {
>>>>>             outer_loop = alloc_loop ();
>>>>>             outer_loop->header = l0_bb;
>>>>>             outer_loop->latch = l2_bb;
>>>>>             add_loop (outer_loop, l0_bb->loop_father);
>>>>>           }
>>>>>
>>>>>         if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>           {
>>>>>             struct loop *loop = alloc_loop ();
>>>>>             loop->header = l1_bb;
>>>>>             /* The loop may have multiple latches.  */
>>>>>             add_loop (loop, outer_loop);
>>>>>           }
>>>>> ...
>>>>>
>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>> fixup, in
>>>>> expand_omp_taskreg and expand_omp_target:
>>>>> ...
>>>>>         /* When the OMP expansion process cannot guarantee an
>>>>> up-to-date
>>>>>            loop tree arrange for the child function to fixup
>>>>> loops.  */
>>>>>         if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>           child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>> ...
>>>>>
>>>>> and expand_omp_for:
>>>>> ...
>>>>>     else
>>>>>       /* If there isn't a continue then this is a degerate case where
>>>>>          the introduction of abnormal edges during lowering will
>>>>> prevent
>>>>>          original loops from being detected.  Fix that up.  */
>>>>>       loops_state_set (LOOPS_NEED_FIXUP);
>>>>> ...
>>>>>
>>>>> However, loops are fixed up anyway, because the first pass we execute
>>>>> with
>>>>> the new child function is pass_fixup_cfg.
>>>>>
>>>>> The new child function contains a function call to
>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>> subsequently
>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>
>>>>>
>>>>> II.
>>>>>
>>>>> This patch adds a verification that at the end of the omp-expand
>>>>> processing
>>>>> of the child function, either the loop structure is ok, or marked for
>>>>> fixup.
>>>>>
>>>>> This verfication triggered a failure in parloops. When an outer
>>>>> loop is
>>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>>> during
>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>> loop (the
>>>>> one that is transformed), but not for the inner, which causes the
>>>>> verification failure:
>>>>> ...
>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>> ...
>>>>>
>>>>> [ I ran into this verification failure with an openacc kernels
>>>>> testcase
>>>>> on
>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>> different
>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>> function
>>>>> is processed by. ]
>>>>>
>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>> child
>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>> since it
>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>> postpones
>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>> can
>>>>> succeed.
>>>>>
>>>>>
>
> <SNIP>
>
>> Can we fix the root-cause of the issue instead?  That is, build a
>> valid loop
>> structure in the first place?
>>
>
> This patch manages to keep the loop structure, that is, to not cancel
> the loop tree in parloops, and guarantee a valid loop structure at the
> end of parloops.
>
> The transformation to insert the omp_for invalidates the loop state
> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
> we drop those in parloops.
>
> In expand_omp_for_static_nochunk, we detect the existing loop struct of
> the omp_for, and keep it.
>
> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
> LOOPS_HAVE_SIMPLE_LATCHES back.
>

This updated patch tries a more minimal approach.

Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the 
new exit instead.

And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we 
just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0001-Don-t-cancel-loop-tree-in-parloops.patch --]
[-- Type: text/x-patch, Size: 5269 bytes --]

Don't cancel loop tree in parloops

2015-07-20  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/66846
	* omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call
	verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP.
	(expand_omp_target) [ENABLE_CHECKING]: Same.
	(execute_expand_omp): Reinstate LOOPS_HAVE_SIMPLE_LATCHES if in ssa.
	[ENABLE_CHECKING]: Call verify_loop_structure for cfun if
	!LOOPS_NEED_FIXUP.
	(expand_omp_for_static_nochunk): Handle case that omp_for already has
	its own loop struct.
	* tree-parloops.c (create_parallel_loop): Add comment about breaking
	LOOPS_HAVE_SIMPLE_LATCHES.  Record new exit.
	(gen_parallel_loop): Remove call to cancel_loop_tree.
	(parallelize_loops): Skip loops that are inner loops of parallelized
	loops.
	(pass_parallelize_loops::execute): Clear LOOPS_HAVE_SIMPLE_LATCHES on
	loop state.
	[ENABLE_CHECKING]: Call verify_loop_structure.
---
 gcc/omp-low.c       | 27 ++++++++++++++++++++++++++-
 gcc/tree-parloops.c | 32 ++++++++++++++++++++++++++++----
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3135606..ce83abf 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5604,6 +5604,10 @@ expand_omp_taskreg (struct omp_region *region)
 	}
       if (gimple_in_ssa_p (cfun))
 	update_ssa (TODO_update_ssa);
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
@@ -6843,9 +6847,17 @@ expand_omp_for_static_nochunk (struct omp_region *region,
   set_immediate_dominator (CDI_DOMINATORS, fin_bb,
 			   recompute_dominator (CDI_DOMINATORS, fin_bb));
 
+  struct loop *loop = body_bb->loop_father;
+  if (loop != entry_bb->loop_father)
+    {
+      gcc_assert (loop->header == body_bb);
+      gcc_assert (broken_loop || loop->latch == region->cont);
+      return;
+    }
+
   if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt))
     {
-      struct loop *loop = alloc_loop ();
+      loop = alloc_loop ();
       loop->header = body_bb;
       if (collapse_bb == NULL)
 	loop->latch = cont_bb;
@@ -8984,6 +8996,10 @@ expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
@@ -9492,6 +9508,15 @@ execute_expand_omp (void)
 
   expand_omp (root_omp_region);
 
+  /* We dropped this property in parloops because of the omp_for.  Now that the
+     omp_for has been expanded, reinstate it.  */
+  if (gimple_in_ssa_p (cfun))
+    loops_state_set (LOOPS_HAVE_SIMPLE_LATCHES);
+
+#ifdef ENABLE_CHECKING
+  if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+    verify_loop_structure ();
+#endif
   cleanup_tree_cfg ();
 
   free_omp_regions ();
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index ec41834..1899052 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2045,7 +2045,14 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 
   guard = make_edge (for_bb, ex_bb, 0);
   single_succ_edge (loop->latch)->flags = 0;
+
+  /* After creating this edge, the latch has two successors, so
+     LOOPS_HAVE_SIMPLE_LATCHES is no longer valid.  We'll update the loop state
+     as such at the end of the pass, since it's not needed earlier, and doing it
+     earlier will invalidate info for loops we still need to process.  */
   end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU);
+  rescan_loop_exit (end, true, false);
+
   for (gphi_iterator gpi = gsi_start_phis (ex_bb);
        !gsi_end_p (gpi); gsi_next (&gpi))
     {
@@ -2282,10 +2289,6 @@ gen_parallel_loop (struct loop *loop,
 
   scev_reset ();
 
-  /* Cancel the loop (it is simpler to do it here rather than to teach the
-     expander to do it).  */
-  cancel_loop_tree (loop);
-
   /* Free loop bound estimations that could contain references to
      removed statements.  */
   FOR_EACH_LOOP (loop, 0)
@@ -2521,6 +2524,7 @@ parallelize_loops (void)
   unsigned n_threads = flag_tree_parallelize_loops;
   bool changed = false;
   struct loop *loop;
+  struct loop *skip_loop = NULL;
   struct tree_niter_desc niter_desc;
   struct obstack parloop_obstack;
   HOST_WIDE_INT estimated;
@@ -2538,6 +2542,19 @@ parallelize_loops (void)
 
   FOR_EACH_LOOP (loop, 0)
     {
+      if (loop == skip_loop)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "Skipping loop %d as inner loop of parallelized loop\n",
+		     loop->num);
+
+	  skip_loop = loop->inner;
+	  continue;
+	}
+      else
+	skip_loop = NULL;
+
       reduction_list.empty ();
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
@@ -2597,6 +2614,7 @@ parallelize_loops (void)
 	continue;
 
       changed = true;
+      skip_loop = loop->inner;
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
 	if (loop->inner)
@@ -2663,6 +2681,12 @@ pass_parallelize_loops::execute (function *fun)
   if (parallelize_loops ())
     {
       fun->curr_properties &= ~(PROP_gimple_eomp);
+
+      loops_state_clear (LOOPS_HAVE_SIMPLE_LATCHES);
+#ifdef ENABLE_CHECKING
+      verify_loop_structure ();
+#endif
+
       return TODO_update_ssa;
     }
 
-- 
1.9.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
  2015-07-24 10:31         ` Tom de Vries
@ 2015-07-28 10:14           ` Richard Biener
  2015-07-29 12:01             ` Tom de Vries
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2015-07-28 10:14 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 20/07/15 15:04, Tom de Vries wrote:
>>
>> On 16/07/15 12:15, Richard Biener wrote:
>>>
>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>> <Tom_deVries@mentor.com> wrote:
>>>>
>>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I.
>>>>>>
>>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>>> matching
>>>>>> loops in the loop state of the child function, f.i.in
>>>>>> expand_omp_for_generic:
>>>>>> ...
>>>>>>         struct loop *outer_loop;
>>>>>>         if (seq_loop)
>>>>>>           outer_loop = l0_bb->loop_father;
>>>>>>         else
>>>>>>           {
>>>>>>             outer_loop = alloc_loop ();
>>>>>>             outer_loop->header = l0_bb;
>>>>>>             outer_loop->latch = l2_bb;
>>>>>>             add_loop (outer_loop, l0_bb->loop_father);
>>>>>>           }
>>>>>>
>>>>>>         if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>>           {
>>>>>>             struct loop *loop = alloc_loop ();
>>>>>>             loop->header = l1_bb;
>>>>>>             /* The loop may have multiple latches.  */
>>>>>>             add_loop (loop, outer_loop);
>>>>>>           }
>>>>>> ...
>>>>>>
>>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>>> fixup, in
>>>>>> expand_omp_taskreg and expand_omp_target:
>>>>>> ...
>>>>>>         /* When the OMP expansion process cannot guarantee an
>>>>>> up-to-date
>>>>>>            loop tree arrange for the child function to fixup
>>>>>> loops.  */
>>>>>>         if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>>           child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>>> ...
>>>>>>
>>>>>> and expand_omp_for:
>>>>>> ...
>>>>>>     else
>>>>>>       /* If there isn't a continue then this is a degerate case where
>>>>>>          the introduction of abnormal edges during lowering will
>>>>>> prevent
>>>>>>          original loops from being detected.  Fix that up.  */
>>>>>>       loops_state_set (LOOPS_NEED_FIXUP);
>>>>>> ...
>>>>>>
>>>>>> However, loops are fixed up anyway, because the first pass we execute
>>>>>> with
>>>>>> the new child function is pass_fixup_cfg.
>>>>>>
>>>>>> The new child function contains a function call to
>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>>> subsequently
>>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>>
>>>>>>
>>>>>> II.
>>>>>>
>>>>>> This patch adds a verification that at the end of the omp-expand
>>>>>> processing
>>>>>> of the child function, either the loop structure is ok, or marked for
>>>>>> fixup.
>>>>>>
>>>>>> This verfication triggered a failure in parloops. When an outer
>>>>>> loop is
>>>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>>>> during
>>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>>> loop (the
>>>>>> one that is transformed), but not for the inner, which causes the
>>>>>> verification failure:
>>>>>> ...
>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>>> ...
>>>>>>
>>>>>> [ I ran into this verification failure with an openacc kernels
>>>>>> testcase
>>>>>> on
>>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>>> different
>>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>>> function
>>>>>> is processed by. ]
>>>>>>
>>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>>> child
>>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>>> since it
>>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>>> postpones
>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>>> can
>>>>>> succeed.
>>>>>>
>>>>>>
>>
>> <SNIP>
>>
>>> Can we fix the root-cause of the issue instead?  That is, build a
>>> valid loop
>>> structure in the first place?
>>>
>>
>> This patch manages to keep the loop structure, that is, to not cancel
>> the loop tree in parloops, and guarantee a valid loop structure at the
>> end of parloops.
>>
>> The transformation to insert the omp_for invalidates the loop state
>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>> we drop those in parloops.
>>
>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>> the omp_for, and keep it.
>>
>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>
>
> This updated patch tries a more minimal approach.
>
> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new
> exit instead.
>
> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?

I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning
that back on in execute_expand_omp).  The parloops code lacks comments and
the /* Prepare cfg.  */ part looks twisty to me - but I don't see why
it should be difficult
to preserve simple latches as well - is this just because we insert
the GIMPLE_OMP_CONTINUE
in it?

If execute_expand_omp is not performed in a loop pipeline where things
expect simple latches
(well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
at all?  If somebody
needs it it will just request simple latches.

So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
LOOPS_HAVE_SIMPLE_LATCHES
unset in execute_expand_omp.

Ok with that change.

Thanks,
Richard.

> Thanks,
> - Tom
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
  2015-07-28 10:14           ` Richard Biener
@ 2015-07-29 12:01             ` Tom de Vries
  2015-07-29 12:20               ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2015-07-29 12:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6674 bytes --]

On 28/07/15 12:11, Richard Biener wrote:
> On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 20/07/15 15:04, Tom de Vries wrote:
>>>
>>> On 16/07/15 12:15, Richard Biener wrote:
>>>>
>>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>>> <Tom_deVries@mentor.com> wrote:
>>>>>
>>>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I.
>>>>>>>
>>>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>>>> matching
>>>>>>> loops in the loop state of the child function, f.i.in
>>>>>>> expand_omp_for_generic:
>>>>>>> ...
>>>>>>>          struct loop *outer_loop;
>>>>>>>          if (seq_loop)
>>>>>>>            outer_loop = l0_bb->loop_father;
>>>>>>>          else
>>>>>>>            {
>>>>>>>              outer_loop = alloc_loop ();
>>>>>>>              outer_loop->header = l0_bb;
>>>>>>>              outer_loop->latch = l2_bb;
>>>>>>>              add_loop (outer_loop, l0_bb->loop_father);
>>>>>>>            }
>>>>>>>
>>>>>>>          if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>>>            {
>>>>>>>              struct loop *loop = alloc_loop ();
>>>>>>>              loop->header = l1_bb;
>>>>>>>              /* The loop may have multiple latches.  */
>>>>>>>              add_loop (loop, outer_loop);
>>>>>>>            }
>>>>>>> ...
>>>>>>>
>>>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>>>> fixup, in
>>>>>>> expand_omp_taskreg and expand_omp_target:
>>>>>>> ...
>>>>>>>          /* When the OMP expansion process cannot guarantee an
>>>>>>> up-to-date
>>>>>>>             loop tree arrange for the child function to fixup
>>>>>>> loops.  */
>>>>>>>          if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>>>            child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>>>> ...
>>>>>>>
>>>>>>> and expand_omp_for:
>>>>>>> ...
>>>>>>>      else
>>>>>>>        /* If there isn't a continue then this is a degerate case where
>>>>>>>           the introduction of abnormal edges during lowering will
>>>>>>> prevent
>>>>>>>           original loops from being detected.  Fix that up.  */
>>>>>>>        loops_state_set (LOOPS_NEED_FIXUP);
>>>>>>> ...
>>>>>>>
>>>>>>> However, loops are fixed up anyway, because the first pass we execute
>>>>>>> with
>>>>>>> the new child function is pass_fixup_cfg.
>>>>>>>
>>>>>>> The new child function contains a function call to
>>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>>>> subsequently
>>>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>>>
>>>>>>>
>>>>>>> II.
>>>>>>>
>>>>>>> This patch adds a verification that at the end of the omp-expand
>>>>>>> processing
>>>>>>> of the child function, either the loop structure is ok, or marked for
>>>>>>> fixup.
>>>>>>>
>>>>>>> This verfication triggered a failure in parloops. When an outer
>>>>>>> loop is
>>>>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>>>>> during
>>>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>>>> loop (the
>>>>>>> one that is transformed), but not for the inner, which causes the
>>>>>>> verification failure:
>>>>>>> ...
>>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>>>> ...
>>>>>>>
>>>>>>> [ I ran into this verification failure with an openacc kernels
>>>>>>> testcase
>>>>>>> on
>>>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>>>> different
>>>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>>>> function
>>>>>>> is processed by. ]
>>>>>>>
>>>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>>>> child
>>>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>>>> since it
>>>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>>>> postpones
>>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>>>> can
>>>>>>> succeed.
>>>>>>>
>>>>>>>
>>>
>>> <SNIP>
>>>
>>>> Can we fix the root-cause of the issue instead?  That is, build a
>>>> valid loop
>>>> structure in the first place?
>>>>
>>>
>>> This patch manages to keep the loop structure, that is, to not cancel
>>> the loop tree in parloops, and guarantee a valid loop structure at the
>>> end of parloops.
>>>
>>> The transformation to insert the omp_for invalidates the loop state
>>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>>> we drop those in parloops.
>>>
>>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>>> the omp_for, and keep it.
>>>
>>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>>
>>
>> This updated patch tries a more minimal approach.
>>
>> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new
>> exit instead.
>>
>> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
>> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> OK for trunk?
>
> I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning
> that back on in execute_expand_omp).  The parloops code lacks comments and
> the /* Prepare cfg.  */ part looks twisty to me - but I don't see why
> it should be difficult
> to preserve simple latches as well - is this just because we insert
> the GIMPLE_OMP_CONTINUE
> in it?
>

It's not difficult to do. It's just that the omp-for that is generated 
via the normal route (omp-annotated source code) doesn't have this 
simple latch, and parloops just mimics that cfg shape.

Attached updated patch preserves simple latches in parloops. We need 
some minor adjustments in omp-expand to handle that.

> If execute_expand_omp is not performed in a loop pipeline where things
> expect simple latches
> (well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
> at all?  If somebody
> needs it it will just request simple latches.
>

I've looked into this. The (first?) pass that gives me trouble is 
pass_iv_optimize. It doesn't request simple latches, but it does need them.

> So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
> LOOPS_HAVE_SIMPLE_LATCHES
> unset in execute_expand_omp.
>
> Ok with that change.
>

OK with this approach of keeping LOOPS_HAVE_SIMPLE_LATCHES in parloops 
(if bootstrap and reg-test succeeds)?

Thanks,
- Tom


[-- Attachment #2: 0001-Don-t-cancel-loop-tree-in-parloops.patch --]
[-- Type: text/x-patch, Size: 8675 bytes --]

Don't cancel loop tree in parloops

2015-07-28  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/66846
	* omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call
	verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP.
	(expand_omp_target) [ENABLE_CHECKING]: Same.
	(execute_expand_omp) [ENABLE_CHECKING]: Call verify_loop_structure for
	cfun if !LOOPS_NEED_FIXUP.
	(expand_omp_for_static_nochunk): Handle simple latch bb.  Handle case
	that omp_for already has its own loop struct.
	* tree-parloops.c (create_phi_for_local_result)
	(create_call_for_reduction): Handle simple latch bb.
	(create_parallel_loop): Add simple latch bb to preserve
	LOOPS_HAVE_SIMPLE_LATCHES.  Record new exit.  Handle simple latch bb.
	(gen_parallel_loop): Remove call to cancel_loop_tree.
	(parallelize_loops): Skip loops that are inner loops of parallelized
	loops.
	(pass_parallelize_loops::execute) [ENABLE_CHECKING]: Call
	verify_loop_structure.
---
 gcc/omp-low.c       | 32 ++++++++++++++++++++++++++++++--
 gcc/tree-parloops.c | 49 ++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 66 insertions(+), 15 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3135606..0f5c0f1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5604,6 +5604,10 @@ expand_omp_taskreg (struct omp_region *region)
 	}
       if (gimple_in_ssa_p (cfun))
 	update_ssa (TODO_update_ssa);
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
@@ -6535,7 +6539,8 @@ expand_omp_for_static_nochunk (struct omp_region *region,
   body_bb = single_succ (seq_start_bb);
   if (!broken_loop)
     {
-      gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb);
+      gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb
+		  || single_succ (BRANCH_EDGE (cont_bb)->dest) == body_bb);
       gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
     }
   exit_bb = region->exit;
@@ -6818,6 +6823,11 @@ expand_omp_for_static_nochunk (struct omp_region *region,
   if (!broken_loop)
     {
       ep = find_edge (cont_bb, body_bb);
+      if (ep == NULL)
+	{
+	  ep = BRANCH_EDGE (cont_bb);
+	  gcc_assert (single_succ (ep->dest) == body_bb);
+	}
       if (gimple_omp_for_combined_p (fd->for_stmt))
 	{
 	  remove_edge (ep);
@@ -6843,9 +6853,19 @@ expand_omp_for_static_nochunk (struct omp_region *region,
   set_immediate_dominator (CDI_DOMINATORS, fin_bb,
 			   recompute_dominator (CDI_DOMINATORS, fin_bb));
 
+  struct loop *loop = body_bb->loop_father;
+  if (loop != entry_bb->loop_father)
+    {
+      gcc_assert (loop->header == body_bb);
+      gcc_assert (broken_loop
+		  || loop->latch == region->cont
+		  || single_pred (loop->latch) == region->cont);
+      return;
+    }
+
   if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt))
     {
-      struct loop *loop = alloc_loop ();
+      loop = alloc_loop ();
       loop->header = body_bb;
       if (collapse_bb == NULL)
 	loop->latch = cont_bb;
@@ -8984,6 +9004,10 @@ expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
@@ -9492,6 +9516,10 @@ execute_expand_omp (void)
 
   expand_omp (root_omp_region);
 
+#ifdef ENABLE_CHECKING
+  if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+    verify_loop_structure ();
+#endif
   cleanup_tree_cfg ();
 
   free_omp_regions ();
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index b06265c..d017479 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1032,21 +1032,22 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop)
   struct reduction_info *const reduc = *slot;
   edge e;
   gphi *new_phi;
-  basic_block store_bb;
+  basic_block store_bb, continue_bb;
   tree local_res;
   source_location locus;
 
   /* STORE_BB is the block where the phi
      should be stored.  It is the destination of the loop exit.
      (Find the fallthru edge from GIMPLE_OMP_CONTINUE).  */
-  store_bb = FALLTHRU_EDGE (loop->latch)->dest;
+  continue_bb = single_pred (loop->latch);
+  store_bb = FALLTHRU_EDGE (continue_bb)->dest;
 
   /* STORE_BB has two predecessors.  One coming from  the loop
      (the reduction's result is computed at the loop),
      and another coming from a block preceding the loop,
      when no iterations
      are executed (the initial value should be taken).  */
-  if (EDGE_PRED (store_bb, 0) == FALLTHRU_EDGE (loop->latch))
+  if (EDGE_PRED (store_bb, 0) == FALLTHRU_EDGE (continue_bb))
     e = EDGE_PRED (store_bb, 1);
   else
     e = EDGE_PRED (store_bb, 0);
@@ -1055,7 +1056,7 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop)
   locus = gimple_location (reduc->reduc_stmt);
   new_phi = create_phi_node (local_res, store_bb);
   add_phi_arg (new_phi, reduc->init, e, locus);
-  add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (loop->latch), locus);
+  add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (continue_bb), locus);
   reduc->new_phi = new_phi;
 
   return 1;
@@ -1134,7 +1135,8 @@ create_call_for_reduction (struct loop *loop,
 {
   reduction_list->traverse <struct loop *, create_phi_for_local_result> (loop);
   /* Find the fallthru edge from GIMPLE_OMP_CONTINUE.  */
-  ld_st_data->load_bb = FALLTHRU_EDGE (loop->latch)->dest;
+  basic_block continue_bb = single_pred (loop->latch);
+  ld_st_data->load_bb = FALLTHRU_EDGE (continue_bb)->dest;
   reduction_list
     ->traverse <struct clsn_data *, create_call_for_reduction_1> (ld_st_data);
 }
@@ -1981,7 +1983,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 		      tree new_data, unsigned n_threads, location_t loc)
 {
   gimple_stmt_iterator gsi;
-  basic_block bb, paral_bb, for_bb, ex_bb;
+  basic_block bb, paral_bb, for_bb, ex_bb, continue_bb;
   tree t, param;
   gomp_parallel *omp_par_stmt;
   gimple omp_return_stmt1, omp_return_stmt2;
@@ -2052,8 +2054,12 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   gcc_assert (exit == single_dom_exit (loop));
 
   guard = make_edge (for_bb, ex_bb, 0);
-  single_succ_edge (loop->latch)->flags = 0;
-  end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU);
+  /* Split the latch edge, so LOOPS_HAVE_SIMPLE_LATCHES is still valid.  */
+  loop->latch = split_edge (single_succ_edge (loop->latch));
+  single_pred_edge (loop->latch)->flags = 0;
+  end = make_edge (single_pred (loop->latch), ex_bb, EDGE_FALLTHRU);
+  rescan_loop_exit (end, true, false);
+
   for (gphi_iterator gpi = gsi_start_phis (ex_bb);
        !gsi_end_p (gpi); gsi_next (&gpi))
     {
@@ -2102,7 +2108,8 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   SSA_NAME_DEF_STMT (initvar) = for_stmt;
 
   /* Emit GIMPLE_OMP_CONTINUE.  */
-  gsi = gsi_last_bb (loop->latch);
+  continue_bb = single_pred (loop->latch);
+  gsi = gsi_last_bb (continue_bb);
   omp_cont_stmt = gimple_build_omp_continue (cvar_next, cvar);
   gimple_set_location (omp_cont_stmt, loc);
   gsi_insert_after (&gsi, omp_cont_stmt, GSI_NEW_STMT);
@@ -2298,10 +2305,6 @@ gen_parallel_loop (struct loop *loop,
 
   scev_reset ();
 
-  /* Cancel the loop (it is simpler to do it here rather than to teach the
-     expander to do it).  */
-  cancel_loop_tree (loop);
-
   /* Free loop bound estimations that could contain references to
      removed statements.  */
   FOR_EACH_LOOP (loop, 0)
@@ -2587,6 +2590,7 @@ parallelize_loops (void)
   unsigned n_threads = flag_tree_parallelize_loops;
   bool changed = false;
   struct loop *loop;
+  struct loop *skip_loop = NULL;
   struct tree_niter_desc niter_desc;
   struct obstack parloop_obstack;
   HOST_WIDE_INT estimated;
@@ -2604,6 +2608,19 @@ parallelize_loops (void)
 
   FOR_EACH_LOOP (loop, 0)
     {
+      if (loop == skip_loop)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "Skipping loop %d as inner loop of parallelized loop\n",
+		     loop->num);
+
+	  skip_loop = loop->inner;
+	  continue;
+	}
+      else
+	skip_loop = NULL;
+
       reduction_list.empty ();
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
@@ -2663,6 +2680,7 @@ parallelize_loops (void)
 	continue;
 
       changed = true;
+      skip_loop = loop->inner;
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
 	if (loop->inner)
@@ -2729,6 +2747,11 @@ pass_parallelize_loops::execute (function *fun)
   if (parallelize_loops ())
     {
       fun->curr_properties &= ~(PROP_gimple_eomp);
+
+#ifdef ENABLE_CHECKING
+      verify_loop_structure ();
+#endif
+
       return TODO_update_ssa;
     }
 
-- 
1.9.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
  2015-07-29 12:01             ` Tom de Vries
@ 2015-07-29 12:20               ` Richard Biener
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Biener @ 2015-07-29 12:20 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

On Wed, Jul 29, 2015 at 1:38 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 28/07/15 12:11, Richard Biener wrote:
>>
>> On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <Tom_deVries@mentor.com>
>> wrote:
>>>
>>> On 20/07/15 15:04, Tom de Vries wrote:
>>>>
>>>>
>>>> On 16/07/15 12:15, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>>>> <Tom_deVries@mentor.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries
>>>>>>> <Tom_deVries@mentor.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I.
>>>>>>>>
>>>>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>>>>> matching
>>>>>>>> loops in the loop state of the child function, f.i.in
>>>>>>>>
>>>>>>>> expand_omp_for_generic:
>>>>>>>> ...
>>>>>>>>          struct loop *outer_loop;
>>>>>>>>          if (seq_loop)
>>>>>>>>            outer_loop = l0_bb->loop_father;
>>>>>>>>          else
>>>>>>>>            {
>>>>>>>>              outer_loop = alloc_loop ();
>>>>>>>>              outer_loop->header = l0_bb;
>>>>>>>>              outer_loop->latch = l2_bb;
>>>>>>>>              add_loop (outer_loop, l0_bb->loop_father);
>>>>>>>>            }
>>>>>>>>
>>>>>>>>          if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>>>>            {
>>>>>>>>              struct loop *loop = alloc_loop ();
>>>>>>>>              loop->header = l1_bb;
>>>>>>>>              /* The loop may have multiple latches.  */
>>>>>>>>              add_loop (loop, outer_loop);
>>>>>>>>            }
>>>>>>>> ...
>>>>>>>>
>>>>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>>>>> fixup, in
>>>>>>>> expand_omp_taskreg and expand_omp_target:
>>>>>>>> ...
>>>>>>>>          /* When the OMP expansion process cannot guarantee an
>>>>>>>> up-to-date
>>>>>>>>             loop tree arrange for the child function to fixup
>>>>>>>> loops.  */
>>>>>>>>          if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>>>>            child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>>>>> ...
>>>>>>>>
>>>>>>>> and expand_omp_for:
>>>>>>>> ...
>>>>>>>>      else
>>>>>>>>        /* If there isn't a continue then this is a degerate case
>>>>>>>> where
>>>>>>>>           the introduction of abnormal edges during lowering will
>>>>>>>> prevent
>>>>>>>>           original loops from being detected.  Fix that up.  */
>>>>>>>>        loops_state_set (LOOPS_NEED_FIXUP);
>>>>>>>> ...
>>>>>>>>
>>>>>>>> However, loops are fixed up anyway, because the first pass we
>>>>>>>> execute
>>>>>>>> with
>>>>>>>> the new child function is pass_fixup_cfg.
>>>>>>>>
>>>>>>>> The new child function contains a function call to
>>>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>>>>> subsequently
>>>>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>>>>
>>>>>>>>
>>>>>>>> II.
>>>>>>>>
>>>>>>>> This patch adds a verification that at the end of the omp-expand
>>>>>>>> processing
>>>>>>>> of the child function, either the loop structure is ok, or marked
>>>>>>>> for
>>>>>>>> fixup.
>>>>>>>>
>>>>>>>> This verfication triggered a failure in parloops. When an outer
>>>>>>>> loop is
>>>>>>>> being parallelized, both the outer and inner loop are cancelled.
>>>>>>>> Then
>>>>>>>> during
>>>>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>>>>> loop (the
>>>>>>>> one that is transformed), but not for the inner, which causes the
>>>>>>>> verification failure:
>>>>>>>> ...
>>>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>>>>> ...
>>>>>>>>
>>>>>>>> [ I ran into this verification failure with an openacc kernels
>>>>>>>> testcase
>>>>>>>> on
>>>>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>>>>> different
>>>>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>>>>> function
>>>>>>>> is processed by. ]
>>>>>>>>
>>>>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>>>>> child
>>>>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>>>>> since it
>>>>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>>>>> postpones
>>>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>>>>> can
>>>>>>>> succeed.
>>>>>>>>
>>>>>>>>
>>>>
>>>> <SNIP>
>>>>
>>>>> Can we fix the root-cause of the issue instead?  That is, build a
>>>>> valid loop
>>>>> structure in the first place?
>>>>>
>>>>
>>>> This patch manages to keep the loop structure, that is, to not cancel
>>>> the loop tree in parloops, and guarantee a valid loop structure at the
>>>> end of parloops.
>>>>
>>>> The transformation to insert the omp_for invalidates the loop state
>>>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>>>> we drop those in parloops.
>>>>
>>>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>>>> the omp_for, and keep it.
>>>>
>>>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>>>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>>>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>>>
>>>
>>> This updated patch tries a more minimal approach.
>>>
>>> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the
>>> new
>>> exit instead.
>>>
>>> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
>>> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of
>>> pass_expand_omp_ssa.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for trunk?
>>
>>
>> I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp.
>> turning
>> that back on in execute_expand_omp).  The parloops code lacks comments and
>> the /* Prepare cfg.  */ part looks twisty to me - but I don't see why
>> it should be difficult
>> to preserve simple latches as well - is this just because we insert
>> the GIMPLE_OMP_CONTINUE
>> in it?
>>
>
> It's not difficult to do. It's just that the omp-for that is generated via
> the normal route (omp-annotated source code) doesn't have this simple latch,
> and parloops just mimics that cfg shape.
>
> Attached updated patch preserves simple latches in parloops. We need some
> minor adjustments in omp-expand to handle that.
>
>> If execute_expand_omp is not performed in a loop pipeline where things
>> expect simple latches
>> (well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
>> at all?  If somebody
>> needs it it will just request simple latches.
>>
>
> I've looked into this. The (first?) pass that gives me trouble is
> pass_iv_optimize. It doesn't request simple latches, but it does need them.

Well, pass_iv_optimize is inside the loop pass which ensures we have
LOOPS_NORMAL.
Passes in it rely on that.

>> So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
>> LOOPS_HAVE_SIMPLE_LATCHES
>> unset in execute_expand_omp.
>>
>> Ok with that change.
>>
>
> OK with this approach of keeping LOOPS_HAVE_SIMPLE_LATCHES in parloops (if
> bootstrap and reg-test succeeds)?

Yes.

Thanks,
Richard.

> Thanks,
> - Tom
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-07-29 12:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-15 19:49 [PATCH, PR66846] Mark inner loop for fixup in parloops Tom de Vries
2015-07-16  8:46 ` Richard Biener
2015-07-16 10:12   ` Tom de Vries
2015-07-16 10:23     ` Richard Biener
2015-07-20 13:38       ` Tom de Vries
2015-07-24 10:31         ` Tom de Vries
2015-07-28 10:14           ` Richard Biener
2015-07-29 12:01             ` Tom de Vries
2015-07-29 12:20               ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).