* [PATCH, PR66846] Mark inner loop for fixup in parloops
@ 2015-07-15 19:49 Tom de Vries
2015-07-16 8:46 ` Richard Biener
0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2015-07-15 19:49 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 2920 bytes --]
Hi,
I.
In openmp expansion of loops, we do some effort to try to create
matching loops in the loop state of the child function, f.i.in
expand_omp_for_generic:
...
struct loop *outer_loop;
if (seq_loop)
outer_loop = l0_bb->loop_father;
else
{
outer_loop = alloc_loop ();
outer_loop->header = l0_bb;
outer_loop->latch = l2_bb;
add_loop (outer_loop, l0_bb->loop_father);
}
if (!gimple_omp_for_combined_p (fd->for_stmt))
{
struct loop *loop = alloc_loop ();
loop->header = l1_bb;
/* The loop may have multiple latches. */
add_loop (loop, outer_loop);
}
...
And if that doesn't work out, we try to mark the loop state for fixup,
in expand_omp_taskreg and expand_omp_target:
...
/* When the OMP expansion process cannot guarantee an up-to-date
loop tree arrange for the child function to fixup loops. */
if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
...
and expand_omp_for:
...
else
/* If there isn't a continue then this is a degerate case where
the introduction of abnormal edges during lowering will prevent
original loops from being detected. Fix that up. */
loops_state_set (LOOPS_NEED_FIXUP);
...
However, loops are fixed up anyway, because the first pass we execute
with the new child function is pass_fixup_cfg.
The new child function contains a function call to
__builtin_omp_get_num_threads, which is marked with ECF_CONST, so
execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
subsequently the loops with LOOPS_NEED_FIXUP.
II.
This patch adds a verification that at the end of the omp-expand
processing of the child function, either the loop structure is ok, or
marked for fixup.
This verfication triggered a failure in parloops. When an outer loop is
being parallelized, both the outer and inner loop are cancelled. Then
during omp-expansion, we create a loop in the loop state for the outer
loop (the one that is transformed), but not for the inner, which causes
the verification failure:
...
outer-1.c:11:3: error: loop with header 5 not in loop tree
...
[ I ran into this verification failure with an openacc kernels testcase
on the gomp-4_0-branch, where parloops is called additionally from a
different location, and pass_fixup_cfg is not the first pass that the
child function is processed by. ]
The patch contains a bit that makes sure that the loop state of the
child function is marked for fixup in parloops. The bit is non-trival
since it create a loop state and sets the fixup flag on the loop state,
but postpones the init_loops_structure call till move_sese_region_to_fn,
where it can succeed.
III.
Bootstrapped and reg-tested on x86_64.
OK for trunk?
Thanks,
- Tom
[-- Attachment #2: 0001-Mark-inner-loop-for-fixup-in-parloops.patch --]
[-- Type: text/x-patch, Size: 3649 bytes --]
Mark inner loop for fixup in parloops
2015-07-13 Tom de Vries <tom@codesourcery.com>
PR tree-optimization/66846
* omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: If
!LOOPS_NEED_FIXUP, verify_loop_structure in child_fn.
(expand_omp_target) [ENABLE_CHECKING]: Same.
* tree-cfg.c (move_sese_region_to_fn): Only allocate struct loops if
necessary. Preserve dest_cfun->x_current_loops->state while calling
init_loops_structure.
* tree-parloops.c (gen_parallel_loop): If inner loop is cancelled,
allocate struct loop and mark child_fn for loop fixup.
---
gcc/omp-low.c | 8 ++++++++
gcc/tree-cfg.c | 26 +++++++++++++++++++++-----
gcc/tree-parloops.c | 10 +++++++++-
3 files changed, 38 insertions(+), 6 deletions(-)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 0e69bc2..64d9742 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5603,6 +5603,10 @@ expand_omp_taskreg (struct omp_region *region)
}
if (gimple_in_ssa_p (cfun))
update_ssa (TODO_update_ssa);
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
pop_cfun ();
}
@@ -8983,6 +8987,10 @@ expand_omp_target (struct omp_region *region)
if (changed)
cleanup_tree_cfg ();
}
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
pop_cfun ();
}
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index d97b824..6b415fe 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -7126,11 +7126,27 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
}
}
- /* Initialize an empty loop tree. */
- struct loops *loops = ggc_cleared_alloc<struct loops> ();
- init_loops_structure (dest_cfun, loops, 1);
- loops->state = LOOPS_MAY_HAVE_MULTIPLE_LATCHES;
- set_loops_for_fn (dest_cfun, loops);
+ struct loops *loops;
+ int loop_state_flags = 0;
+ if (dest_cfun->x_current_loops == NULL)
+ {
+ /* Initialize an empty loop tree. */
+ loops = ggc_cleared_alloc<struct loops> ();
+ set_loops_for_fn (dest_cfun, loops);
+ }
+ else
+ {
+ loops = dest_cfun->x_current_loops;
+ loop_state_flags = loops->state;
+ }
+
+ if (loops->tree_root == NULL)
+ {
+ init_loops_structure (dest_cfun, loops, 1);
+ loops->state |= loop_state_flags;
+ }
+
+ loops->state |= LOOPS_MAY_HAVE_MULTIPLE_LATCHES;
/* Move the outlined loop tree part. */
num_nodes = bbs.length ();
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 036677b..23b134f 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2267,7 +2267,8 @@ gen_parallel_loop (struct loop *loop,
cond_stmt = last_stmt (loop->header);
if (cond_stmt)
loc = gimple_location (cond_stmt);
- create_parallel_loop (loop, create_loop_fn (loc), arg_struct,
+ tree child_fn = create_loop_fn (loc);
+ create_parallel_loop (loop, child_fn, arg_struct,
new_arg_struct, n_threads, loc);
if (reduction_list->elements () > 0)
create_call_for_reduction (loop, reduction_list, &clsn_data);
@@ -2276,6 +2277,13 @@ gen_parallel_loop (struct loop *loop,
/* Cancel the loop (it is simpler to do it here rather than to teach the
expander to do it). */
+ if (loop->inner)
+ {
+ struct function *child_cfun = DECL_STRUCT_FUNCTION (child_fn);
+ struct loops *loops = ggc_cleared_alloc<struct loops> ();
+ set_loops_for_fn (child_cfun, loops);
+ child_cfun->x_current_loops->state = LOOPS_NEED_FIXUP;
+ }
cancel_loop_tree (loop);
/* Free loop bound estimations that could contain references to
--
1.9.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
2015-07-15 19:49 [PATCH, PR66846] Mark inner loop for fixup in parloops Tom de Vries
@ 2015-07-16 8:46 ` Richard Biener
2015-07-16 10:12 ` Tom de Vries
0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2015-07-16 8:46 UTC (permalink / raw)
To: Tom de Vries; +Cc: gcc-patches
On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> Hi,
>
> I.
>
> In openmp expansion of loops, we do some effort to try to create matching
> loops in the loop state of the child function, f.i.in
> expand_omp_for_generic:
> ...
> struct loop *outer_loop;
> if (seq_loop)
> outer_loop = l0_bb->loop_father;
> else
> {
> outer_loop = alloc_loop ();
> outer_loop->header = l0_bb;
> outer_loop->latch = l2_bb;
> add_loop (outer_loop, l0_bb->loop_father);
> }
>
> if (!gimple_omp_for_combined_p (fd->for_stmt))
> {
> struct loop *loop = alloc_loop ();
> loop->header = l1_bb;
> /* The loop may have multiple latches. */
> add_loop (loop, outer_loop);
> }
> ...
>
> And if that doesn't work out, we try to mark the loop state for fixup, in
> expand_omp_taskreg and expand_omp_target:
> ...
> /* When the OMP expansion process cannot guarantee an up-to-date
> loop tree arrange for the child function to fixup loops. */
> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
> ...
>
> and expand_omp_for:
> ...
> else
> /* If there isn't a continue then this is a degerate case where
> the introduction of abnormal edges during lowering will prevent
> original loops from being detected. Fix that up. */
> loops_state_set (LOOPS_NEED_FIXUP);
> ...
>
> However, loops are fixed up anyway, because the first pass we execute with
> the new child function is pass_fixup_cfg.
>
> The new child function contains a function call to
> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and subsequently
> the loops with LOOPS_NEED_FIXUP.
>
>
> II.
>
> This patch adds a verification that at the end of the omp-expand processing
> of the child function, either the loop structure is ok, or marked for fixup.
>
> This verfication triggered a failure in parloops. When an outer loop is
> being parallelized, both the outer and inner loop are cancelled. Then during
> omp-expansion, we create a loop in the loop state for the outer loop (the
> one that is transformed), but not for the inner, which causes the
> verification failure:
> ...
> outer-1.c:11:3: error: loop with header 5 not in loop tree
> ...
>
> [ I ran into this verification failure with an openacc kernels testcase on
> the gomp-4_0-branch, where parloops is called additionally from a different
> location, and pass_fixup_cfg is not the first pass that the child function
> is processed by. ]
>
> The patch contains a bit that makes sure that the loop state of the child
> function is marked for fixup in parloops. The bit is non-trival since it
> create a loop state and sets the fixup flag on the loop state, but postpones
> the init_loops_structure call till move_sese_region_to_fn, where it can
> succeed.
>
>
> III.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?
+ struct loops *loops;
+ int loop_state_flags = 0;
+ if (dest_cfun->x_current_loops == NULL)
+ {
+ /* Initialize an empty loop tree. */
+ loops = ggc_cleared_alloc<struct loops> ();
+ set_loops_for_fn (dest_cfun, loops);
+ }
+ else
+ {
+ loops = dest_cfun->x_current_loops;
+ loop_state_flags = loops->state;
+ }
+
+ if (loops->tree_root == NULL)
+ {
+ init_loops_structure (dest_cfun, loops, 1);
+ loops->state |= loop_state_flags;
+ }
+
+ loops->state |= LOOPS_MAY_HAVE_MULTIPLE_LATCHES;
this looks twisted just because you do a half-way init of the loop tree here:
+ if (loop->inner)
+ {
+ struct function *child_cfun = DECL_STRUCT_FUNCTION (child_fn);
+ struct loops *loops = ggc_cleared_alloc<struct loops> ();
+ set_loops_for_fn (child_cfun, loops);
+ child_cfun->x_current_loops->state = LOOPS_NEED_FIXUP;
+ }
why not unconditionally initialize the loop tree properly in autopar?
> Thanks,
> - Tom
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
2015-07-16 8:46 ` Richard Biener
@ 2015-07-16 10:12 ` Tom de Vries
2015-07-16 10:23 ` Richard Biener
0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2015-07-16 10:12 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches
On 16/07/15 10:44, Richard Biener wrote:
> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> Hi,
>>
>> I.
>>
>> In openmp expansion of loops, we do some effort to try to create matching
>> loops in the loop state of the child function, f.i.in
>> expand_omp_for_generic:
>> ...
>> struct loop *outer_loop;
>> if (seq_loop)
>> outer_loop = l0_bb->loop_father;
>> else
>> {
>> outer_loop = alloc_loop ();
>> outer_loop->header = l0_bb;
>> outer_loop->latch = l2_bb;
>> add_loop (outer_loop, l0_bb->loop_father);
>> }
>>
>> if (!gimple_omp_for_combined_p (fd->for_stmt))
>> {
>> struct loop *loop = alloc_loop ();
>> loop->header = l1_bb;
>> /* The loop may have multiple latches. */
>> add_loop (loop, outer_loop);
>> }
>> ...
>>
>> And if that doesn't work out, we try to mark the loop state for fixup, in
>> expand_omp_taskreg and expand_omp_target:
>> ...
>> /* When the OMP expansion process cannot guarantee an up-to-date
>> loop tree arrange for the child function to fixup loops. */
>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>> ...
>>
>> and expand_omp_for:
>> ...
>> else
>> /* If there isn't a continue then this is a degerate case where
>> the introduction of abnormal edges during lowering will prevent
>> original loops from being detected. Fix that up. */
>> loops_state_set (LOOPS_NEED_FIXUP);
>> ...
>>
>> However, loops are fixed up anyway, because the first pass we execute with
>> the new child function is pass_fixup_cfg.
>>
>> The new child function contains a function call to
>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and subsequently
>> the loops with LOOPS_NEED_FIXUP.
>>
>>
>> II.
>>
>> This patch adds a verification that at the end of the omp-expand processing
>> of the child function, either the loop structure is ok, or marked for fixup.
>>
>> This verfication triggered a failure in parloops. When an outer loop is
>> being parallelized, both the outer and inner loop are cancelled. Then during
>> omp-expansion, we create a loop in the loop state for the outer loop (the
>> one that is transformed), but not for the inner, which causes the
>> verification failure:
>> ...
>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>> ...
>>
>> [ I ran into this verification failure with an openacc kernels testcase on
>> the gomp-4_0-branch, where parloops is called additionally from a different
>> location, and pass_fixup_cfg is not the first pass that the child function
>> is processed by. ]
>>
>> The patch contains a bit that makes sure that the loop state of the child
>> function is marked for fixup in parloops. The bit is non-trival since it
>> create a loop state and sets the fixup flag on the loop state, but postpones
>> the init_loops_structure call till move_sese_region_to_fn, where it can
>> succeed.
>>
>>
>> III.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> OK for trunk?
>
> + struct loops *loops;
> + int loop_state_flags = 0;
> + if (dest_cfun->x_current_loops == NULL)
> + {
> + /* Initialize an empty loop tree. */
> + loops = ggc_cleared_alloc<struct loops> ();
> + set_loops_for_fn (dest_cfun, loops);
> + }
> + else
> + {
> + loops = dest_cfun->x_current_loops;
> + loop_state_flags = loops->state;
> + }
> +
> + if (loops->tree_root == NULL)
> + {
> + init_loops_structure (dest_cfun, loops, 1);
> + loops->state |= loop_state_flags;
> + }
> +
> + loops->state |= LOOPS_MAY_HAVE_MULTIPLE_LATCHES;
>
> this looks twisted just because you do a half-way init of the loop tree here:
>
> + if (loop->inner)
> + {
> + struct function *child_cfun = DECL_STRUCT_FUNCTION (child_fn);
> + struct loops *loops = ggc_cleared_alloc<struct loops> ();
> + set_loops_for_fn (child_cfun, loops);
> + child_cfun->x_current_loops->state = LOOPS_NEED_FIXUP;
> + }
>
> why not unconditionally initialize the loop tree properly in autopar?
>
Because init_loops_structure accesses f.i. n_basic_blocks_for_fn
(child_cfun), in other words child_cfun->cfg->x_n_basic_blocks. At this
point in parloops, there's no child_cfun->cfg yet, that field is set by
the following pass_expand_omp_ssa.
Normally, the solution is to do loops_state_set (LOOPS_NEED_FIXUP) for
the parent function, which will get propagated to the child. But I'm
trying to be more precise than that, by only setting LOOPS_NEED_FIXUP
for the child, not the parent.
Thanks,
- Tom
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
2015-07-16 10:12 ` Tom de Vries
@ 2015-07-16 10:23 ` Richard Biener
2015-07-20 13:38 ` Tom de Vries
0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2015-07-16 10:23 UTC (permalink / raw)
To: Tom de Vries; +Cc: gcc-patches
On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 16/07/15 10:44, Richard Biener wrote:
>>
>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> I.
>>>
>>> In openmp expansion of loops, we do some effort to try to create matching
>>> loops in the loop state of the child function, f.i.in
>>> expand_omp_for_generic:
>>> ...
>>> struct loop *outer_loop;
>>> if (seq_loop)
>>> outer_loop = l0_bb->loop_father;
>>> else
>>> {
>>> outer_loop = alloc_loop ();
>>> outer_loop->header = l0_bb;
>>> outer_loop->latch = l2_bb;
>>> add_loop (outer_loop, l0_bb->loop_father);
>>> }
>>>
>>> if (!gimple_omp_for_combined_p (fd->for_stmt))
>>> {
>>> struct loop *loop = alloc_loop ();
>>> loop->header = l1_bb;
>>> /* The loop may have multiple latches. */
>>> add_loop (loop, outer_loop);
>>> }
>>> ...
>>>
>>> And if that doesn't work out, we try to mark the loop state for fixup, in
>>> expand_omp_taskreg and expand_omp_target:
>>> ...
>>> /* When the OMP expansion process cannot guarantee an up-to-date
>>> loop tree arrange for the child function to fixup loops. */
>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>> ...
>>>
>>> and expand_omp_for:
>>> ...
>>> else
>>> /* If there isn't a continue then this is a degerate case where
>>> the introduction of abnormal edges during lowering will prevent
>>> original loops from being detected. Fix that up. */
>>> loops_state_set (LOOPS_NEED_FIXUP);
>>> ...
>>>
>>> However, loops are fixed up anyway, because the first pass we execute
>>> with
>>> the new child function is pass_fixup_cfg.
>>>
>>> The new child function contains a function call to
>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>> subsequently
>>> the loops with LOOPS_NEED_FIXUP.
>>>
>>>
>>> II.
>>>
>>> This patch adds a verification that at the end of the omp-expand
>>> processing
>>> of the child function, either the loop structure is ok, or marked for
>>> fixup.
>>>
>>> This verfication triggered a failure in parloops. When an outer loop is
>>> being parallelized, both the outer and inner loop are cancelled. Then
>>> during
>>> omp-expansion, we create a loop in the loop state for the outer loop (the
>>> one that is transformed), but not for the inner, which causes the
>>> verification failure:
>>> ...
>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>> ...
>>>
>>> [ I ran into this verification failure with an openacc kernels testcase
>>> on
>>> the gomp-4_0-branch, where parloops is called additionally from a
>>> different
>>> location, and pass_fixup_cfg is not the first pass that the child
>>> function
>>> is processed by. ]
>>>
>>> The patch contains a bit that makes sure that the loop state of the child
>>> function is marked for fixup in parloops. The bit is non-trival since it
>>> create a loop state and sets the fixup flag on the loop state, but
>>> postpones
>>> the init_loops_structure call till move_sese_region_to_fn, where it can
>>> succeed.
>>>
>>>
>>> III.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for trunk?
>>
>>
>> + struct loops *loops;
>> + int loop_state_flags = 0;
>> + if (dest_cfun->x_current_loops == NULL)
>> + {
>> + /* Initialize an empty loop tree. */
>> + loops = ggc_cleared_alloc<struct loops> ();
>> + set_loops_for_fn (dest_cfun, loops);
>> + }
>> + else
>> + {
>> + loops = dest_cfun->x_current_loops;
>> + loop_state_flags = loops->state;
>> + }
>> +
>> + if (loops->tree_root == NULL)
>> + {
>> + init_loops_structure (dest_cfun, loops, 1);
>> + loops->state |= loop_state_flags;
>> + }
>> +
>> + loops->state |= LOOPS_MAY_HAVE_MULTIPLE_LATCHES;
>>
>> this looks twisted just because you do a half-way init of the loop tree
>> here:
>>
>> + if (loop->inner)
>> + {
>> + struct function *child_cfun = DECL_STRUCT_FUNCTION (child_fn);
>> + struct loops *loops = ggc_cleared_alloc<struct loops> ();
>> + set_loops_for_fn (child_cfun, loops);
>> + child_cfun->x_current_loops->state = LOOPS_NEED_FIXUP;
>> + }
>>
>> why not unconditionally initialize the loop tree properly in autopar?
>>
>
> Because init_loops_structure accesses f.i. n_basic_blocks_for_fn
> (child_cfun), in other words child_cfun->cfg->x_n_basic_blocks. At this
> point in parloops, there's no child_cfun->cfg yet, that field is set by the
> following pass_expand_omp_ssa.
Well. The above exposes too much internals about how this all works
(IMHO).
> Normally, the solution is to do loops_state_set (LOOPS_NEED_FIXUP) for the
> parent function, which will get propagated to the child. But I'm trying to
> be more precise than that, by only setting LOOPS_NEED_FIXUP for the child,
> not the parent.
Can we fix the root-cause of the issue instead? That is, build a valid loop
structure in the first place?
Richard.
> Thanks,
> - Tom
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
2015-07-16 10:23 ` Richard Biener
@ 2015-07-20 13:38 ` Tom de Vries
2015-07-24 10:31 ` Tom de Vries
0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2015-07-20 13:38 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 4450 bytes --]
On 16/07/15 12:15, Richard Biener wrote:
> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 16/07/15 10:44, Richard Biener wrote:
>>>
>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I.
>>>>
>>>> In openmp expansion of loops, we do some effort to try to create matching
>>>> loops in the loop state of the child function, f.i.in
>>>> expand_omp_for_generic:
>>>> ...
>>>> struct loop *outer_loop;
>>>> if (seq_loop)
>>>> outer_loop = l0_bb->loop_father;
>>>> else
>>>> {
>>>> outer_loop = alloc_loop ();
>>>> outer_loop->header = l0_bb;
>>>> outer_loop->latch = l2_bb;
>>>> add_loop (outer_loop, l0_bb->loop_father);
>>>> }
>>>>
>>>> if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>> {
>>>> struct loop *loop = alloc_loop ();
>>>> loop->header = l1_bb;
>>>> /* The loop may have multiple latches. */
>>>> add_loop (loop, outer_loop);
>>>> }
>>>> ...
>>>>
>>>> And if that doesn't work out, we try to mark the loop state for fixup, in
>>>> expand_omp_taskreg and expand_omp_target:
>>>> ...
>>>> /* When the OMP expansion process cannot guarantee an up-to-date
>>>> loop tree arrange for the child function to fixup loops. */
>>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>> ...
>>>>
>>>> and expand_omp_for:
>>>> ...
>>>> else
>>>> /* If there isn't a continue then this is a degerate case where
>>>> the introduction of abnormal edges during lowering will prevent
>>>> original loops from being detected. Fix that up. */
>>>> loops_state_set (LOOPS_NEED_FIXUP);
>>>> ...
>>>>
>>>> However, loops are fixed up anyway, because the first pass we execute
>>>> with
>>>> the new child function is pass_fixup_cfg.
>>>>
>>>> The new child function contains a function call to
>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>> subsequently
>>>> the loops with LOOPS_NEED_FIXUP.
>>>>
>>>>
>>>> II.
>>>>
>>>> This patch adds a verification that at the end of the omp-expand
>>>> processing
>>>> of the child function, either the loop structure is ok, or marked for
>>>> fixup.
>>>>
>>>> This verfication triggered a failure in parloops. When an outer loop is
>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>> during
>>>> omp-expansion, we create a loop in the loop state for the outer loop (the
>>>> one that is transformed), but not for the inner, which causes the
>>>> verification failure:
>>>> ...
>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>> ...
>>>>
>>>> [ I ran into this verification failure with an openacc kernels testcase
>>>> on
>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>> different
>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>> function
>>>> is processed by. ]
>>>>
>>>> The patch contains a bit that makes sure that the loop state of the child
>>>> function is marked for fixup in parloops. The bit is non-trival since it
>>>> create a loop state and sets the fixup flag on the loop state, but
>>>> postpones
>>>> the init_loops_structure call till move_sese_region_to_fn, where it can
>>>> succeed.
>>>>
>>>>
<SNIP>
> Can we fix the root-cause of the issue instead? That is, build a valid loop
> structure in the first place?
>
This patch manages to keep the loop structure, that is, to not cancel
the loop tree in parloops, and guarantee a valid loop structure at the
end of parloops.
The transformation to insert the omp_for invalidates the loop state
properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
we drop those in parloops.
In expand_omp_for_static_nochunk, we detect the existing loop struct of
the omp_for, and keep it.
Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
the loop state properties LOOPS_HAVE_RECORDED_EXITS and
LOOPS_HAVE_SIMPLE_LATCHES back.
Tested by running:
- gcc dg.exp "parloops*.c"
- gcc autopar.exp
- target-libgomp c.exp
Currently bootstrapping and reg-testing on x86_64.
If that succeeds, OK for trunk?
Thanks,
- Tom
[-- Attachment #2: 0001-Don-t-cancel-loop-tree-in-parloops.patch --]
[-- Type: text/x-patch, Size: 6026 bytes --]
Don't cancel loop tree in parloops
2015-07-20 Tom de Vries <tom@codesourcery.com>
PR tree-optimization/66846
* omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call
verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP.
(expand_omp_target) [ENABLE_CHECKING]: Same.
(execute_expand_omp) [ENABLE_CHECKING]: Call verify_loop_structure for
cfun if !LOOPS_NEED_FIXUP.
(expand_omp_for_static_nochunk): Handle case that omp_for already has
its own loop struct.
* passes.def: Add pass_tree_loop_init after pass_expand_omp_ssa in
pass_parallelize_loops.
* tree-parloops.c (create_parallel_loop): Add comment.
(gen_parallel_loop): Remove call to cancel_loop_tree.
(parallelize_loops): Skip loops that are inner loops of parallelized
loops.
(pass_parallelize_loops::execute): Clear LOOPS_HAVE_RECORDED_EXITS and
LOOPS_HAVE_SIMPLE_LATCHES on loop state.
[ENABLE_CHECKING]: Call verify_loop_structure.
* tree-ssa-loop.c (pass_tree_loop_init::clone): New function.
---
gcc/omp-low.c | 22 +++++++++++++++++++++-
gcc/passes.def | 1 +
gcc/tree-parloops.c | 33 +++++++++++++++++++++++++++++----
gcc/tree-ssa-loop.c | 1 +
4 files changed, 52 insertions(+), 5 deletions(-)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3135606..dce0c02 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5604,6 +5604,10 @@ expand_omp_taskreg (struct omp_region *region)
}
if (gimple_in_ssa_p (cfun))
update_ssa (TODO_update_ssa);
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
pop_cfun ();
}
@@ -6843,9 +6847,17 @@ expand_omp_for_static_nochunk (struct omp_region *region,
set_immediate_dominator (CDI_DOMINATORS, fin_bb,
recompute_dominator (CDI_DOMINATORS, fin_bb));
+ struct loop *loop = body_bb->loop_father;
+ if (loop != entry_bb->loop_father)
+ {
+ gcc_assert (loop->header == body_bb);
+ gcc_assert (broken_loop || loop->latch == region->cont);
+ return;
+ }
+
if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt))
{
- struct loop *loop = alloc_loop ();
+ loop = alloc_loop ();
loop->header = body_bb;
if (collapse_bb == NULL)
loop->latch = cont_bb;
@@ -8984,6 +8996,10 @@ expand_omp_target (struct omp_region *region)
if (changed)
cleanup_tree_cfg ();
}
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
pop_cfun ();
}
@@ -9492,6 +9508,10 @@ execute_expand_omp (void)
expand_omp (root_omp_region);
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
cleanup_tree_cfg ();
free_omp_regions ();
diff --git a/gcc/passes.def b/gcc/passes.def
index 6b66f8f..ea3a800 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -246,6 +246,7 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_parallelize_loops);
PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
NEXT_PASS (pass_expand_omp_ssa);
+ NEXT_PASS (pass_tree_loop_init);
POP_INSERT_PASSES ()
NEXT_PASS (pass_ch_vect);
NEXT_PASS (pass_if_conversion);
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index ec41834..e33071d 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2045,7 +2045,14 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
guard = make_edge (for_bb, ex_bb, 0);
single_succ_edge (loop->latch)->flags = 0;
+
+ /* After creating this edge, the latch has two successors, so
+ LOOPS_HAVE_SIMPLE_LATCHES is no longer valid. Furthermore, this exit is
+ not recorded. We'll update the loop state as such at the end of the
+ pass, since it's not needed earlier, and doing it earlier will invalidate
+ info for loops we still need to process. */
end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU);
+
for (gphi_iterator gpi = gsi_start_phis (ex_bb);
!gsi_end_p (gpi); gsi_next (&gpi))
{
@@ -2282,10 +2289,6 @@ gen_parallel_loop (struct loop *loop,
scev_reset ();
- /* Cancel the loop (it is simpler to do it here rather than to teach the
- expander to do it). */
- cancel_loop_tree (loop);
-
/* Free loop bound estimations that could contain references to
removed statements. */
FOR_EACH_LOOP (loop, 0)
@@ -2521,6 +2524,7 @@ parallelize_loops (void)
unsigned n_threads = flag_tree_parallelize_loops;
bool changed = false;
struct loop *loop;
+ struct loop *skip_loop = NULL;
struct tree_niter_desc niter_desc;
struct obstack parloop_obstack;
HOST_WIDE_INT estimated;
@@ -2538,6 +2542,19 @@ parallelize_loops (void)
FOR_EACH_LOOP (loop, 0)
{
+ if (loop == skip_loop)
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "Skipping loop %d as inner loop of parallelized loop\n",
+ loop->num);
+
+ skip_loop = loop->inner;
+ continue;
+ }
+ else
+ skip_loop = NULL;
+
reduction_list.empty ();
if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -2597,6 +2614,7 @@ parallelize_loops (void)
continue;
changed = true;
+ skip_loop = loop->inner;
if (dump_file && (dump_flags & TDF_DETAILS))
{
if (loop->inner)
@@ -2663,6 +2681,13 @@ pass_parallelize_loops::execute (function *fun)
if (parallelize_loops ())
{
fun->curr_properties &= ~(PROP_gimple_eomp);
+
+ release_recorded_exits ();
+ loops_state_clear (LOOPS_HAVE_SIMPLE_LATCHES);
+#ifdef ENABLE_CHECKING
+ verify_loop_structure ();
+#endif
+
return TODO_update_ssa;
}
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 078b1d1..3520d92 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -210,6 +210,7 @@ public:
/* opt_pass methods: */
virtual unsigned int execute (function *);
+ opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
}; // class pass_tree_loop_init
--
1.9.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
2015-07-20 13:38 ` Tom de Vries
@ 2015-07-24 10:31 ` Tom de Vries
2015-07-28 10:14 ` Richard Biener
0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2015-07-24 10:31 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 4860 bytes --]
On 20/07/15 15:04, Tom de Vries wrote:
> On 16/07/15 12:15, Richard Biener wrote:
>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>> <Tom_deVries@mentor.com> wrote:
>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>
>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I.
>>>>>
>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>> matching
>>>>> loops in the loop state of the child function, f.i.in
>>>>> expand_omp_for_generic:
>>>>> ...
>>>>> struct loop *outer_loop;
>>>>> if (seq_loop)
>>>>> outer_loop = l0_bb->loop_father;
>>>>> else
>>>>> {
>>>>> outer_loop = alloc_loop ();
>>>>> outer_loop->header = l0_bb;
>>>>> outer_loop->latch = l2_bb;
>>>>> add_loop (outer_loop, l0_bb->loop_father);
>>>>> }
>>>>>
>>>>> if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>> {
>>>>> struct loop *loop = alloc_loop ();
>>>>> loop->header = l1_bb;
>>>>> /* The loop may have multiple latches. */
>>>>> add_loop (loop, outer_loop);
>>>>> }
>>>>> ...
>>>>>
>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>> fixup, in
>>>>> expand_omp_taskreg and expand_omp_target:
>>>>> ...
>>>>> /* When the OMP expansion process cannot guarantee an
>>>>> up-to-date
>>>>> loop tree arrange for the child function to fixup
>>>>> loops. */
>>>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>> ...
>>>>>
>>>>> and expand_omp_for:
>>>>> ...
>>>>> else
>>>>> /* If there isn't a continue then this is a degerate case where
>>>>> the introduction of abnormal edges during lowering will
>>>>> prevent
>>>>> original loops from being detected. Fix that up. */
>>>>> loops_state_set (LOOPS_NEED_FIXUP);
>>>>> ...
>>>>>
>>>>> However, loops are fixed up anyway, because the first pass we execute
>>>>> with
>>>>> the new child function is pass_fixup_cfg.
>>>>>
>>>>> The new child function contains a function call to
>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>> subsequently
>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>
>>>>>
>>>>> II.
>>>>>
>>>>> This patch adds a verification that at the end of the omp-expand
>>>>> processing
>>>>> of the child function, either the loop structure is ok, or marked for
>>>>> fixup.
>>>>>
>>>>> This verfication triggered a failure in parloops. When an outer
>>>>> loop is
>>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>>> during
>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>> loop (the
>>>>> one that is transformed), but not for the inner, which causes the
>>>>> verification failure:
>>>>> ...
>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>> ...
>>>>>
>>>>> [ I ran into this verification failure with an openacc kernels
>>>>> testcase
>>>>> on
>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>> different
>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>> function
>>>>> is processed by. ]
>>>>>
>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>> child
>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>> since it
>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>> postpones
>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>> can
>>>>> succeed.
>>>>>
>>>>>
>
> <SNIP>
>
>> Can we fix the root-cause of the issue instead? That is, build a
>> valid loop
>> structure in the first place?
>>
>
> This patch manages to keep the loop structure, that is, to not cancel
> the loop tree in parloops, and guarantee a valid loop structure at the
> end of parloops.
>
> The transformation to insert the omp_for invalidates the loop state
> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
> we drop those in parloops.
>
> In expand_omp_for_static_nochunk, we detect the existing loop struct of
> the omp_for, and keep it.
>
> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
> LOOPS_HAVE_SIMPLE_LATCHES back.
>
This updated patch tries a more minimal approach.
Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the
new exit instead.
And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.
Bootstrapped and reg-tested on x86_64.
OK for trunk?
Thanks,
- Tom
[-- Attachment #2: 0001-Don-t-cancel-loop-tree-in-parloops.patch --]
[-- Type: text/x-patch, Size: 5269 bytes --]
Don't cancel loop tree in parloops
2015-07-20 Tom de Vries <tom@codesourcery.com>
PR tree-optimization/66846
* omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call
verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP.
(expand_omp_target) [ENABLE_CHECKING]: Same.
(execute_expand_omp): Reinstate LOOPS_HAVE_SIMPLE_LATCHES if in ssa.
[ENABLE_CHECKING]: Call verify_loop_structure for cfun if
!LOOPS_NEED_FIXUP.
(expand_omp_for_static_nochunk): Handle case that omp_for already has
its own loop struct.
* tree-parloops.c (create_parallel_loop): Add comment about breaking
LOOPS_HAVE_SIMPLE_LATCHES. Record new exit.
(gen_parallel_loop): Remove call to cancel_loop_tree.
(parallelize_loops): Skip loops that are inner loops of parallelized
loops.
(pass_parallelize_loops::execute): Clear LOOPS_HAVE_SIMPLE_LATCHES on
loop state.
[ENABLE_CHECKING]: Call verify_loop_structure.
---
gcc/omp-low.c | 27 ++++++++++++++++++++++++++-
gcc/tree-parloops.c | 32 ++++++++++++++++++++++++++++----
2 files changed, 54 insertions(+), 5 deletions(-)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3135606..ce83abf 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5604,6 +5604,10 @@ expand_omp_taskreg (struct omp_region *region)
}
if (gimple_in_ssa_p (cfun))
update_ssa (TODO_update_ssa);
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
pop_cfun ();
}
@@ -6843,9 +6847,17 @@ expand_omp_for_static_nochunk (struct omp_region *region,
set_immediate_dominator (CDI_DOMINATORS, fin_bb,
recompute_dominator (CDI_DOMINATORS, fin_bb));
+ struct loop *loop = body_bb->loop_father;
+ if (loop != entry_bb->loop_father)
+ {
+ gcc_assert (loop->header == body_bb);
+ gcc_assert (broken_loop || loop->latch == region->cont);
+ return;
+ }
+
if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt))
{
- struct loop *loop = alloc_loop ();
+ loop = alloc_loop ();
loop->header = body_bb;
if (collapse_bb == NULL)
loop->latch = cont_bb;
@@ -8984,6 +8996,10 @@ expand_omp_target (struct omp_region *region)
if (changed)
cleanup_tree_cfg ();
}
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
pop_cfun ();
}
@@ -9492,6 +9508,15 @@ execute_expand_omp (void)
expand_omp (root_omp_region);
+ /* We dropped this property in parloops because of the omp_for. Now that the
+ omp_for has been expanded, reinstate it. */
+ if (gimple_in_ssa_p (cfun))
+ loops_state_set (LOOPS_HAVE_SIMPLE_LATCHES);
+
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
cleanup_tree_cfg ();
free_omp_regions ();
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index ec41834..1899052 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2045,7 +2045,14 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
guard = make_edge (for_bb, ex_bb, 0);
single_succ_edge (loop->latch)->flags = 0;
+
+ /* After creating this edge, the latch has two successors, so
+ LOOPS_HAVE_SIMPLE_LATCHES is no longer valid. We'll update the loop state
+ as such at the end of the pass, since it's not needed earlier, and doing it
+ earlier will invalidate info for loops we still need to process. */
end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU);
+ rescan_loop_exit (end, true, false);
+
for (gphi_iterator gpi = gsi_start_phis (ex_bb);
!gsi_end_p (gpi); gsi_next (&gpi))
{
@@ -2282,10 +2289,6 @@ gen_parallel_loop (struct loop *loop,
scev_reset ();
- /* Cancel the loop (it is simpler to do it here rather than to teach the
- expander to do it). */
- cancel_loop_tree (loop);
-
/* Free loop bound estimations that could contain references to
removed statements. */
FOR_EACH_LOOP (loop, 0)
@@ -2521,6 +2524,7 @@ parallelize_loops (void)
unsigned n_threads = flag_tree_parallelize_loops;
bool changed = false;
struct loop *loop;
+ struct loop *skip_loop = NULL;
struct tree_niter_desc niter_desc;
struct obstack parloop_obstack;
HOST_WIDE_INT estimated;
@@ -2538,6 +2542,19 @@ parallelize_loops (void)
FOR_EACH_LOOP (loop, 0)
{
+ if (loop == skip_loop)
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "Skipping loop %d as inner loop of parallelized loop\n",
+ loop->num);
+
+ skip_loop = loop->inner;
+ continue;
+ }
+ else
+ skip_loop = NULL;
+
reduction_list.empty ();
if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -2597,6 +2614,7 @@ parallelize_loops (void)
continue;
changed = true;
+ skip_loop = loop->inner;
if (dump_file && (dump_flags & TDF_DETAILS))
{
if (loop->inner)
@@ -2663,6 +2681,12 @@ pass_parallelize_loops::execute (function *fun)
if (parallelize_loops ())
{
fun->curr_properties &= ~(PROP_gimple_eomp);
+
+ loops_state_clear (LOOPS_HAVE_SIMPLE_LATCHES);
+#ifdef ENABLE_CHECKING
+ verify_loop_structure ();
+#endif
+
return TODO_update_ssa;
}
--
1.9.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
2015-07-24 10:31 ` Tom de Vries
@ 2015-07-28 10:14 ` Richard Biener
2015-07-29 12:01 ` Tom de Vries
0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2015-07-28 10:14 UTC (permalink / raw)
To: Tom de Vries; +Cc: gcc-patches
On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 20/07/15 15:04, Tom de Vries wrote:
>>
>> On 16/07/15 12:15, Richard Biener wrote:
>>>
>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>> <Tom_deVries@mentor.com> wrote:
>>>>
>>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I.
>>>>>>
>>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>>> matching
>>>>>> loops in the loop state of the child function, f.i.in
>>>>>> expand_omp_for_generic:
>>>>>> ...
>>>>>> struct loop *outer_loop;
>>>>>> if (seq_loop)
>>>>>> outer_loop = l0_bb->loop_father;
>>>>>> else
>>>>>> {
>>>>>> outer_loop = alloc_loop ();
>>>>>> outer_loop->header = l0_bb;
>>>>>> outer_loop->latch = l2_bb;
>>>>>> add_loop (outer_loop, l0_bb->loop_father);
>>>>>> }
>>>>>>
>>>>>> if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>> {
>>>>>> struct loop *loop = alloc_loop ();
>>>>>> loop->header = l1_bb;
>>>>>> /* The loop may have multiple latches. */
>>>>>> add_loop (loop, outer_loop);
>>>>>> }
>>>>>> ...
>>>>>>
>>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>>> fixup, in
>>>>>> expand_omp_taskreg and expand_omp_target:
>>>>>> ...
>>>>>> /* When the OMP expansion process cannot guarantee an
>>>>>> up-to-date
>>>>>> loop tree arrange for the child function to fixup
>>>>>> loops. */
>>>>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>>> ...
>>>>>>
>>>>>> and expand_omp_for:
>>>>>> ...
>>>>>> else
>>>>>> /* If there isn't a continue then this is a degerate case where
>>>>>> the introduction of abnormal edges during lowering will
>>>>>> prevent
>>>>>> original loops from being detected. Fix that up. */
>>>>>> loops_state_set (LOOPS_NEED_FIXUP);
>>>>>> ...
>>>>>>
>>>>>> However, loops are fixed up anyway, because the first pass we execute
>>>>>> with
>>>>>> the new child function is pass_fixup_cfg.
>>>>>>
>>>>>> The new child function contains a function call to
>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>>> subsequently
>>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>>
>>>>>>
>>>>>> II.
>>>>>>
>>>>>> This patch adds a verification that at the end of the omp-expand
>>>>>> processing
>>>>>> of the child function, either the loop structure is ok, or marked for
>>>>>> fixup.
>>>>>>
>>>>>> This verfication triggered a failure in parloops. When an outer
>>>>>> loop is
>>>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>>>> during
>>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>>> loop (the
>>>>>> one that is transformed), but not for the inner, which causes the
>>>>>> verification failure:
>>>>>> ...
>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>>> ...
>>>>>>
>>>>>> [ I ran into this verification failure with an openacc kernels
>>>>>> testcase
>>>>>> on
>>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>>> different
>>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>>> function
>>>>>> is processed by. ]
>>>>>>
>>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>>> child
>>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>>> since it
>>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>>> postpones
>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>>> can
>>>>>> succeed.
>>>>>>
>>>>>>
>>
>> <SNIP>
>>
>>> Can we fix the root-cause of the issue instead? That is, build a
>>> valid loop
>>> structure in the first place?
>>>
>>
>> This patch manages to keep the loop structure, that is, to not cancel
>> the loop tree in parloops, and guarantee a valid loop structure at the
>> end of parloops.
>>
>> The transformation to insert the omp_for invalidates the loop state
>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>> we drop those in parloops.
>>
>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>> the omp_for, and keep it.
>>
>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>
>
> This updated patch tries a more minimal approach.
>
> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new
> exit instead.
>
> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?
I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning
that back on in execute_expand_omp). The parloops code lacks comments and
the /* Prepare cfg. */ part looks twisty to me - but I don't see why
it should be difficult
to preserve simple latches as well - is this just because we insert
the GIMPLE_OMP_CONTINUE
in it?
If execute_expand_omp is not performed in a loop pipeline where things
expect simple latches
(well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
at all? If somebody
needs it it will just request simple latches.
So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
LOOPS_HAVE_SIMPLE_LATCHES
unset in execute_expand_omp.
Ok with that change.
Thanks,
Richard.
> Thanks,
> - Tom
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
2015-07-28 10:14 ` Richard Biener
@ 2015-07-29 12:01 ` Tom de Vries
2015-07-29 12:20 ` Richard Biener
0 siblings, 1 reply; 9+ messages in thread
From: Tom de Vries @ 2015-07-29 12:01 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 6674 bytes --]
On 28/07/15 12:11, Richard Biener wrote:
> On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 20/07/15 15:04, Tom de Vries wrote:
>>>
>>> On 16/07/15 12:15, Richard Biener wrote:
>>>>
>>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>>> <Tom_deVries@mentor.com> wrote:
>>>>>
>>>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I.
>>>>>>>
>>>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>>>> matching
>>>>>>> loops in the loop state of the child function, f.i.in
>>>>>>> expand_omp_for_generic:
>>>>>>> ...
>>>>>>> struct loop *outer_loop;
>>>>>>> if (seq_loop)
>>>>>>> outer_loop = l0_bb->loop_father;
>>>>>>> else
>>>>>>> {
>>>>>>> outer_loop = alloc_loop ();
>>>>>>> outer_loop->header = l0_bb;
>>>>>>> outer_loop->latch = l2_bb;
>>>>>>> add_loop (outer_loop, l0_bb->loop_father);
>>>>>>> }
>>>>>>>
>>>>>>> if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>>> {
>>>>>>> struct loop *loop = alloc_loop ();
>>>>>>> loop->header = l1_bb;
>>>>>>> /* The loop may have multiple latches. */
>>>>>>> add_loop (loop, outer_loop);
>>>>>>> }
>>>>>>> ...
>>>>>>>
>>>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>>>> fixup, in
>>>>>>> expand_omp_taskreg and expand_omp_target:
>>>>>>> ...
>>>>>>> /* When the OMP expansion process cannot guarantee an
>>>>>>> up-to-date
>>>>>>> loop tree arrange for the child function to fixup
>>>>>>> loops. */
>>>>>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>>>> ...
>>>>>>>
>>>>>>> and expand_omp_for:
>>>>>>> ...
>>>>>>> else
>>>>>>> /* If there isn't a continue then this is a degerate case where
>>>>>>> the introduction of abnormal edges during lowering will
>>>>>>> prevent
>>>>>>> original loops from being detected. Fix that up. */
>>>>>>> loops_state_set (LOOPS_NEED_FIXUP);
>>>>>>> ...
>>>>>>>
>>>>>>> However, loops are fixed up anyway, because the first pass we execute
>>>>>>> with
>>>>>>> the new child function is pass_fixup_cfg.
>>>>>>>
>>>>>>> The new child function contains a function call to
>>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>>>> subsequently
>>>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>>>
>>>>>>>
>>>>>>> II.
>>>>>>>
>>>>>>> This patch adds a verification that at the end of the omp-expand
>>>>>>> processing
>>>>>>> of the child function, either the loop structure is ok, or marked for
>>>>>>> fixup.
>>>>>>>
>>>>>>> This verfication triggered a failure in parloops. When an outer
>>>>>>> loop is
>>>>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>>>>> during
>>>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>>>> loop (the
>>>>>>> one that is transformed), but not for the inner, which causes the
>>>>>>> verification failure:
>>>>>>> ...
>>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>>>> ...
>>>>>>>
>>>>>>> [ I ran into this verification failure with an openacc kernels
>>>>>>> testcase
>>>>>>> on
>>>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>>>> different
>>>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>>>> function
>>>>>>> is processed by. ]
>>>>>>>
>>>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>>>> child
>>>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>>>> since it
>>>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>>>> postpones
>>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>>>> can
>>>>>>> succeed.
>>>>>>>
>>>>>>>
>>>
>>> <SNIP>
>>>
>>>> Can we fix the root-cause of the issue instead? That is, build a
>>>> valid loop
>>>> structure in the first place?
>>>>
>>>
>>> This patch manages to keep the loop structure, that is, to not cancel
>>> the loop tree in parloops, and guarantee a valid loop structure at the
>>> end of parloops.
>>>
>>> The transformation to insert the omp_for invalidates the loop state
>>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>>> we drop those in parloops.
>>>
>>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>>> the omp_for, and keep it.
>>>
>>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>>
>>
>> This updated patch tries a more minimal approach.
>>
>> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new
>> exit instead.
>>
>> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
>> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> OK for trunk?
>
> I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning
> that back on in execute_expand_omp). The parloops code lacks comments and
> the /* Prepare cfg. */ part looks twisty to me - but I don't see why
> it should be difficult
> to preserve simple latches as well - is this just because we insert
> the GIMPLE_OMP_CONTINUE
> in it?
>
It's not difficult to do. It's just that the omp-for that is generated
via the normal route (omp-annotated source code) doesn't have this
simple latch, and parloops just mimics that cfg shape.
Attached updated patch preserves simple latches in parloops. We need
some minor adjustments in omp-expand to handle that.
> If execute_expand_omp is not performed in a loop pipeline where things
> expect simple latches
> (well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
> at all? If somebody
> needs it it will just request simple latches.
>
I've looked into this. The (first?) pass that gives me trouble is
pass_iv_optimize. It doesn't request simple latches, but it does need them.
> So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
> LOOPS_HAVE_SIMPLE_LATCHES
> unset in execute_expand_omp.
>
> Ok with that change.
>
OK with this approach of keeping LOOPS_HAVE_SIMPLE_LATCHES in parloops
(if bootstrap and reg-test succeeds)?
Thanks,
- Tom
[-- Attachment #2: 0001-Don-t-cancel-loop-tree-in-parloops.patch --]
[-- Type: text/x-patch, Size: 8675 bytes --]
Don't cancel loop tree in parloops
2015-07-28 Tom de Vries <tom@codesourcery.com>
PR tree-optimization/66846
* omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call
verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP.
(expand_omp_target) [ENABLE_CHECKING]: Same.
(execute_expand_omp) [ENABLE_CHECKING]: Call verify_loop_structure for
cfun if !LOOPS_NEED_FIXUP.
(expand_omp_for_static_nochunk): Handle simple latch bb. Handle case
that omp_for already has its own loop struct.
* tree-parloops.c (create_phi_for_local_result)
(create_call_for_reduction): Handle simple latch bb.
(create_parallel_loop): Add simple latch bb to preserve
LOOPS_HAVE_SIMPLE_LATCHES. Record new exit. Handle simple latch bb.
(gen_parallel_loop): Remove call to cancel_loop_tree.
(parallelize_loops): Skip loops that are inner loops of parallelized
loops.
(pass_parallelize_loops::execute) [ENABLE_CHECKING]: Call
verify_loop_structure.
---
gcc/omp-low.c | 32 ++++++++++++++++++++++++++++++--
gcc/tree-parloops.c | 49 ++++++++++++++++++++++++++++++++++++-------------
2 files changed, 66 insertions(+), 15 deletions(-)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3135606..0f5c0f1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5604,6 +5604,10 @@ expand_omp_taskreg (struct omp_region *region)
}
if (gimple_in_ssa_p (cfun))
update_ssa (TODO_update_ssa);
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
pop_cfun ();
}
@@ -6535,7 +6539,8 @@ expand_omp_for_static_nochunk (struct omp_region *region,
body_bb = single_succ (seq_start_bb);
if (!broken_loop)
{
- gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb);
+ gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb
+ || single_succ (BRANCH_EDGE (cont_bb)->dest) == body_bb);
gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
}
exit_bb = region->exit;
@@ -6818,6 +6823,11 @@ expand_omp_for_static_nochunk (struct omp_region *region,
if (!broken_loop)
{
ep = find_edge (cont_bb, body_bb);
+ if (ep == NULL)
+ {
+ ep = BRANCH_EDGE (cont_bb);
+ gcc_assert (single_succ (ep->dest) == body_bb);
+ }
if (gimple_omp_for_combined_p (fd->for_stmt))
{
remove_edge (ep);
@@ -6843,9 +6853,19 @@ expand_omp_for_static_nochunk (struct omp_region *region,
set_immediate_dominator (CDI_DOMINATORS, fin_bb,
recompute_dominator (CDI_DOMINATORS, fin_bb));
+ struct loop *loop = body_bb->loop_father;
+ if (loop != entry_bb->loop_father)
+ {
+ gcc_assert (loop->header == body_bb);
+ gcc_assert (broken_loop
+ || loop->latch == region->cont
+ || single_pred (loop->latch) == region->cont);
+ return;
+ }
+
if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt))
{
- struct loop *loop = alloc_loop ();
+ loop = alloc_loop ();
loop->header = body_bb;
if (collapse_bb == NULL)
loop->latch = cont_bb;
@@ -8984,6 +9004,10 @@ expand_omp_target (struct omp_region *region)
if (changed)
cleanup_tree_cfg ();
}
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
pop_cfun ();
}
@@ -9492,6 +9516,10 @@ execute_expand_omp (void)
expand_omp (root_omp_region);
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
cleanup_tree_cfg ();
free_omp_regions ();
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index b06265c..d017479 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1032,21 +1032,22 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop)
struct reduction_info *const reduc = *slot;
edge e;
gphi *new_phi;
- basic_block store_bb;
+ basic_block store_bb, continue_bb;
tree local_res;
source_location locus;
/* STORE_BB is the block where the phi
should be stored. It is the destination of the loop exit.
(Find the fallthru edge from GIMPLE_OMP_CONTINUE). */
- store_bb = FALLTHRU_EDGE (loop->latch)->dest;
+ continue_bb = single_pred (loop->latch);
+ store_bb = FALLTHRU_EDGE (continue_bb)->dest;
/* STORE_BB has two predecessors. One coming from the loop
(the reduction's result is computed at the loop),
and another coming from a block preceding the loop,
when no iterations
are executed (the initial value should be taken). */
- if (EDGE_PRED (store_bb, 0) == FALLTHRU_EDGE (loop->latch))
+ if (EDGE_PRED (store_bb, 0) == FALLTHRU_EDGE (continue_bb))
e = EDGE_PRED (store_bb, 1);
else
e = EDGE_PRED (store_bb, 0);
@@ -1055,7 +1056,7 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop)
locus = gimple_location (reduc->reduc_stmt);
new_phi = create_phi_node (local_res, store_bb);
add_phi_arg (new_phi, reduc->init, e, locus);
- add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (loop->latch), locus);
+ add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (continue_bb), locus);
reduc->new_phi = new_phi;
return 1;
@@ -1134,7 +1135,8 @@ create_call_for_reduction (struct loop *loop,
{
reduction_list->traverse <struct loop *, create_phi_for_local_result> (loop);
/* Find the fallthru edge from GIMPLE_OMP_CONTINUE. */
- ld_st_data->load_bb = FALLTHRU_EDGE (loop->latch)->dest;
+ basic_block continue_bb = single_pred (loop->latch);
+ ld_st_data->load_bb = FALLTHRU_EDGE (continue_bb)->dest;
reduction_list
->traverse <struct clsn_data *, create_call_for_reduction_1> (ld_st_data);
}
@@ -1981,7 +1983,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
tree new_data, unsigned n_threads, location_t loc)
{
gimple_stmt_iterator gsi;
- basic_block bb, paral_bb, for_bb, ex_bb;
+ basic_block bb, paral_bb, for_bb, ex_bb, continue_bb;
tree t, param;
gomp_parallel *omp_par_stmt;
gimple omp_return_stmt1, omp_return_stmt2;
@@ -2052,8 +2054,12 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
gcc_assert (exit == single_dom_exit (loop));
guard = make_edge (for_bb, ex_bb, 0);
- single_succ_edge (loop->latch)->flags = 0;
- end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU);
+ /* Split the latch edge, so LOOPS_HAVE_SIMPLE_LATCHES is still valid. */
+ loop->latch = split_edge (single_succ_edge (loop->latch));
+ single_pred_edge (loop->latch)->flags = 0;
+ end = make_edge (single_pred (loop->latch), ex_bb, EDGE_FALLTHRU);
+ rescan_loop_exit (end, true, false);
+
for (gphi_iterator gpi = gsi_start_phis (ex_bb);
!gsi_end_p (gpi); gsi_next (&gpi))
{
@@ -2102,7 +2108,8 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
SSA_NAME_DEF_STMT (initvar) = for_stmt;
/* Emit GIMPLE_OMP_CONTINUE. */
- gsi = gsi_last_bb (loop->latch);
+ continue_bb = single_pred (loop->latch);
+ gsi = gsi_last_bb (continue_bb);
omp_cont_stmt = gimple_build_omp_continue (cvar_next, cvar);
gimple_set_location (omp_cont_stmt, loc);
gsi_insert_after (&gsi, omp_cont_stmt, GSI_NEW_STMT);
@@ -2298,10 +2305,6 @@ gen_parallel_loop (struct loop *loop,
scev_reset ();
- /* Cancel the loop (it is simpler to do it here rather than to teach the
- expander to do it). */
- cancel_loop_tree (loop);
-
/* Free loop bound estimations that could contain references to
removed statements. */
FOR_EACH_LOOP (loop, 0)
@@ -2587,6 +2590,7 @@ parallelize_loops (void)
unsigned n_threads = flag_tree_parallelize_loops;
bool changed = false;
struct loop *loop;
+ struct loop *skip_loop = NULL;
struct tree_niter_desc niter_desc;
struct obstack parloop_obstack;
HOST_WIDE_INT estimated;
@@ -2604,6 +2608,19 @@ parallelize_loops (void)
FOR_EACH_LOOP (loop, 0)
{
+ if (loop == skip_loop)
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "Skipping loop %d as inner loop of parallelized loop\n",
+ loop->num);
+
+ skip_loop = loop->inner;
+ continue;
+ }
+ else
+ skip_loop = NULL;
+
reduction_list.empty ();
if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -2663,6 +2680,7 @@ parallelize_loops (void)
continue;
changed = true;
+ skip_loop = loop->inner;
if (dump_file && (dump_flags & TDF_DETAILS))
{
if (loop->inner)
@@ -2729,6 +2747,11 @@ pass_parallelize_loops::execute (function *fun)
if (parallelize_loops ())
{
fun->curr_properties &= ~(PROP_gimple_eomp);
+
+#ifdef ENABLE_CHECKING
+ verify_loop_structure ();
+#endif
+
return TODO_update_ssa;
}
--
1.9.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
2015-07-29 12:01 ` Tom de Vries
@ 2015-07-29 12:20 ` Richard Biener
0 siblings, 0 replies; 9+ messages in thread
From: Richard Biener @ 2015-07-29 12:20 UTC (permalink / raw)
To: Tom de Vries; +Cc: gcc-patches
On Wed, Jul 29, 2015 at 1:38 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 28/07/15 12:11, Richard Biener wrote:
>>
>> On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <Tom_deVries@mentor.com>
>> wrote:
>>>
>>> On 20/07/15 15:04, Tom de Vries wrote:
>>>>
>>>>
>>>> On 16/07/15 12:15, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>>>> <Tom_deVries@mentor.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries
>>>>>>> <Tom_deVries@mentor.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I.
>>>>>>>>
>>>>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>>>>> matching
>>>>>>>> loops in the loop state of the child function, f.i.in
>>>>>>>>
>>>>>>>> expand_omp_for_generic:
>>>>>>>> ...
>>>>>>>> struct loop *outer_loop;
>>>>>>>> if (seq_loop)
>>>>>>>> outer_loop = l0_bb->loop_father;
>>>>>>>> else
>>>>>>>> {
>>>>>>>> outer_loop = alloc_loop ();
>>>>>>>> outer_loop->header = l0_bb;
>>>>>>>> outer_loop->latch = l2_bb;
>>>>>>>> add_loop (outer_loop, l0_bb->loop_father);
>>>>>>>> }
>>>>>>>>
>>>>>>>> if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>>>> {
>>>>>>>> struct loop *loop = alloc_loop ();
>>>>>>>> loop->header = l1_bb;
>>>>>>>> /* The loop may have multiple latches. */
>>>>>>>> add_loop (loop, outer_loop);
>>>>>>>> }
>>>>>>>> ...
>>>>>>>>
>>>>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>>>>> fixup, in
>>>>>>>> expand_omp_taskreg and expand_omp_target:
>>>>>>>> ...
>>>>>>>> /* When the OMP expansion process cannot guarantee an
>>>>>>>> up-to-date
>>>>>>>> loop tree arrange for the child function to fixup
>>>>>>>> loops. */
>>>>>>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>>>>> ...
>>>>>>>>
>>>>>>>> and expand_omp_for:
>>>>>>>> ...
>>>>>>>> else
>>>>>>>> /* If there isn't a continue then this is a degerate case
>>>>>>>> where
>>>>>>>> the introduction of abnormal edges during lowering will
>>>>>>>> prevent
>>>>>>>> original loops from being detected. Fix that up. */
>>>>>>>> loops_state_set (LOOPS_NEED_FIXUP);
>>>>>>>> ...
>>>>>>>>
>>>>>>>> However, loops are fixed up anyway, because the first pass we
>>>>>>>> execute
>>>>>>>> with
>>>>>>>> the new child function is pass_fixup_cfg.
>>>>>>>>
>>>>>>>> The new child function contains a function call to
>>>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>>>>> subsequently
>>>>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>>>>
>>>>>>>>
>>>>>>>> II.
>>>>>>>>
>>>>>>>> This patch adds a verification that at the end of the omp-expand
>>>>>>>> processing
>>>>>>>> of the child function, either the loop structure is ok, or marked
>>>>>>>> for
>>>>>>>> fixup.
>>>>>>>>
>>>>>>>> This verfication triggered a failure in parloops. When an outer
>>>>>>>> loop is
>>>>>>>> being parallelized, both the outer and inner loop are cancelled.
>>>>>>>> Then
>>>>>>>> during
>>>>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>>>>> loop (the
>>>>>>>> one that is transformed), but not for the inner, which causes the
>>>>>>>> verification failure:
>>>>>>>> ...
>>>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>>>>> ...
>>>>>>>>
>>>>>>>> [ I ran into this verification failure with an openacc kernels
>>>>>>>> testcase
>>>>>>>> on
>>>>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>>>>> different
>>>>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>>>>> function
>>>>>>>> is processed by. ]
>>>>>>>>
>>>>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>>>>> child
>>>>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>>>>> since it
>>>>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>>>>> postpones
>>>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>>>>> can
>>>>>>>> succeed.
>>>>>>>>
>>>>>>>>
>>>>
>>>> <SNIP>
>>>>
>>>>> Can we fix the root-cause of the issue instead? That is, build a
>>>>> valid loop
>>>>> structure in the first place?
>>>>>
>>>>
>>>> This patch manages to keep the loop structure, that is, to not cancel
>>>> the loop tree in parloops, and guarantee a valid loop structure at the
>>>> end of parloops.
>>>>
>>>> The transformation to insert the omp_for invalidates the loop state
>>>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>>>> we drop those in parloops.
>>>>
>>>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>>>> the omp_for, and keep it.
>>>>
>>>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>>>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>>>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>>>
>>>
>>> This updated patch tries a more minimal approach.
>>>
>>> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the
>>> new
>>> exit instead.
>>>
>>> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
>>> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of
>>> pass_expand_omp_ssa.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for trunk?
>>
>>
>> I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp.
>> turning
>> that back on in execute_expand_omp). The parloops code lacks comments and
>> the /* Prepare cfg. */ part looks twisty to me - but I don't see why
>> it should be difficult
>> to preserve simple latches as well - is this just because we insert
>> the GIMPLE_OMP_CONTINUE
>> in it?
>>
>
> It's not difficult to do. It's just that the omp-for that is generated via
> the normal route (omp-annotated source code) doesn't have this simple latch,
> and parloops just mimics that cfg shape.
>
> Attached updated patch preserves simple latches in parloops. We need some
> minor adjustments in omp-expand to handle that.
>
>> If execute_expand_omp is not performed in a loop pipeline where things
>> expect simple latches
>> (well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
>> at all? If somebody
>> needs it it will just request simple latches.
>>
>
> I've looked into this. The (first?) pass that gives me trouble is
> pass_iv_optimize. It doesn't request simple latches, but it does need them.
Well, pass_iv_optimize is inside the loop pass which ensures we have
LOOPS_NORMAL.
Passes in it rely on that.
>> So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
>> LOOPS_HAVE_SIMPLE_LATCHES
>> unset in execute_expand_omp.
>>
>> Ok with that change.
>>
>
> OK with this approach of keeping LOOPS_HAVE_SIMPLE_LATCHES in parloops (if
> bootstrap and reg-test succeeds)?
Yes.
Thanks,
Richard.
> Thanks,
> - Tom
>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-07-29 12:03 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-15 19:49 [PATCH, PR66846] Mark inner loop for fixup in parloops Tom de Vries
2015-07-16 8:46 ` Richard Biener
2015-07-16 10:12 ` Tom de Vries
2015-07-16 10:23 ` Richard Biener
2015-07-20 13:38 ` Tom de Vries
2015-07-24 10:31 ` Tom de Vries
2015-07-28 10:14 ` Richard Biener
2015-07-29 12:01 ` Tom de Vries
2015-07-29 12:20 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).