public inbox for gcc-patches@gcc.gnu.org
* [gomp4, committed] Fix parallelization for fortran oacc kernels tests
From: Tom de Vries @ 2015-06-18 10:48 UTC
  To: GCC Patches; +Cc: Jakub Jelinek, Richard Biener

Hi,

I ran into a problem with Fortran loops in oacc kernels regions not 
being parallelized after introducing transform_to_exit_first_loop_alt.

For gfortran.dg/goacc/kernels-loop.f95, we get:
...
#pragma omp target oacc_parallel num_gangs(1)
...
instead of the desired num_gangs (32).

transform_to_exit_first_loop_alt fails because nit is _135, where nit is 
defined by:
...
*_105 = 0;
D__lsm.27_50 = *_105;
_32 = (unsigned int) D__lsm.27_50;
_135 = 1023 - _32;
...

pass_fre would manage to propagate the '*_105 = 0' assignment. But in the 
current pass order, pass_fre runs before pass_lim, which introduces this 
pattern:
...
               NEXT_PASS (pass_ch_oacc_kernels);
               NEXT_PASS (pass_fre);
               NEXT_PASS (pass_tree_loop_init);
               NEXT_PASS (pass_lim);
               NEXT_PASS (pass_copy_prop);
               NEXT_PASS (pass_scev_cprop);
               NEXT_PASS (pass_parallelize_loops_oacc_kernels);
               NEXT_PASS (pass_expand_omp_ssa);
               NEXT_PASS (pass_tree_loop_done);
...

The patch moves pass_fre to the location of pass_copy_prop, and replaces 
it.  Furthermore, it adds scans to the fortran test-cases to make sure 
they get properly parallelized.

Bootstrapped and reg-tested on x86_64.

Committed to gomp-4_0-branch.

Thanks,
- Tom

[-- Attachment #2: 0003-Fix-parallelization-for-fortran-oacc-kernels-tests.patch --]
[-- Type: text/x-patch, Size: 8124 bytes --]

Fix parallelization for fortran oacc kernels tests

2015-06-18  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Move pass_fre later in pass group pass_oacc_kernels,
	replacing pass_copy_prop.

	* gfortran.dg/goacc/kernels-loop-2.f95: Add check for num_gangs (32).
	* gfortran.dg/goacc/kernels-loop-data-2.f95: Same.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Same.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Same.
	* gfortran.dg/goacc/kernels-loop-data-update.f95: Same.
	* gfortran.dg/goacc/kernels-loop-data.f95: Same.
	* gfortran.dg/goacc/kernels-loop.f95: Same.
	* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: Same.
---
 gcc/passes.def                                                       | 5 +++--
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95                   | 2 ++
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95              | 2 ++
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95   | 2 ++
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95     | 2 ++
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95         | 2 ++
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95                | 2 ++
 gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95                     | 2 ++
 .../gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95      | 2 ++
 9 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index da497ed..8b00f17 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -93,10 +93,11 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_oacc_kernels);
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
 	      NEXT_PASS (pass_ch_oacc_kernels);
-	      NEXT_PASS (pass_fre);
 	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_lim);
-	      NEXT_PASS (pass_copy_prop);
+	      NEXT_PASS (pass_tree_loop_done);
+	      NEXT_PASS (pass_fre);
+	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_scev_cprop);
       	      NEXT_PASS (pass_parallelize_loops_oacc_kernels);
 	      NEXT_PASS (pass_expand_omp_ssa);
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
index bef69f8..634c670 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
@@ -42,5 +42,7 @@ end program main
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
 
+! { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 3 "parloops_oacc_kernels" } }
+
 ! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
 ! { dg-final { cleanup-tree-dump "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
index 1b75a23..58d92ff 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
@@ -48,5 +48,7 @@ end program main
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
 
+! { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 3 "parloops_oacc_kernels" } }
+
 ! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
 ! { dg-final { cleanup-tree-dump "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
index 4ba83b6..689002d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
@@ -48,5 +48,7 @@ end program main
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
 
+! { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 3 "parloops_oacc_kernels" } }
+
 ! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
 ! { dg-final { cleanup-tree-dump "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
index 2b05b33..0145754 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
@@ -46,5 +46,7 @@ end program main
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
 
+! { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 3 "parloops_oacc_kernels" } }
+
 ! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
 ! { dg-final { cleanup-tree-dump "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
index b3c80dc..2c2da27 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
@@ -45,5 +45,7 @@ end program main
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 
+! { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 2 "parloops_oacc_kernels" } }
+
 ! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
 ! { dg-final { cleanup-tree-dump "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
index 98c5e7a..f61fe7e 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
@@ -46,5 +46,7 @@ end program main
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
 
+! { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 3 "parloops_oacc_kernels" } }
+
 ! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
 ! { dg-final { cleanup-tree-dump "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
index be5f26d..f9f8bbd 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
@@ -36,5 +36,7 @@ end program main
 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
 
+! { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 1 "parloops_oacc_kernels" } }
+
 ! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
 ! { dg-final { cleanup-tree-dump "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
index 7ea2b49..c6ab436 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
@@ -47,5 +47,7 @@ end program main
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
 
+! { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 2 "parloops_oacc_kernels" } }
+
 ! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
 ! { dg-final { cleanup-tree-dump "optimized" } }
-- 
1.9.1


* Re: [gomp4, committed] Fix parallelization for fortran oacc kernels tests
From: Richard Biener @ 2015-06-18 10:55 UTC
  To: Tom de Vries; +Cc: GCC Patches, Jakub Jelinek

On Thu, 18 Jun 2015, Tom de Vries wrote:

> The patch moves pass_fre to the location of pass_copy_prop, and replaces it.
> Furthermore, it adds scans to the fortran test-cases to make sure they get
> properly parallelized.

You may now figure out that LIM needs FRE to detect equal memory
references to apply store-motion.  But maybe the issues oacc
lowering introduces are limited and under your control.

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

* Re: [gomp4, committed] Fix parallelization for fortran oacc kernels tests
From: Tom de Vries @ 2015-06-18 11:35 UTC
  To: Richard Biener; +Cc: GCC Patches, Jakub Jelinek

On 18/06/15 12:48, Richard Biener wrote:
> On Thu, 18 Jun 2015, Tom de Vries wrote:
>
>> The patch moves pass_fre to the location of pass_copy_prop, and replaces it.
>> Furthermore, it adds scans to the fortran test-cases to make sure they get
>> properly parallelized.
>
> You may now figure out that LIM needs FRE to detect equal memory
> references to apply store-motion.  But maybe the issues oacc
> lowering introduces are limited and under your control.
>

To show the context of the pass group, after this commit the pass group 
looks like this:
...
           NEXT_PASS (pass_sra_early);
           NEXT_PASS (pass_build_ealias);
           NEXT_PASS (pass_fre);
           NEXT_PASS (pass_oacc_kernels);
           PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
               NEXT_PASS (pass_ch_oacc_kernels);
               NEXT_PASS (pass_tree_loop_init);
               NEXT_PASS (pass_lim);
               NEXT_PASS (pass_tree_loop_done);
               NEXT_PASS (pass_fre);
               NEXT_PASS (pass_tree_loop_init);
               NEXT_PASS (pass_scev_cprop);
               NEXT_PASS (pass_parallelize_loops_oacc_kernels);
               NEXT_PASS (pass_expand_omp_ssa);
               NEXT_PASS (pass_tree_loop_done);
           POP_INSERT_PASSES ()
           NEXT_PASS (pass_merge_phi);
           NEXT_PASS (pass_dse);
           NEXT_PASS (pass_cd_dce);
...
In other words, the pass group is run directly after pass_fre.

When I move the pass_fre that precedes the pass group to directly after 
the pass group, I start seeing the failure mode you describe.

Thanks,
- Tom

* Re: [gomp4, committed] Fix parallelization for fortran oacc kernels tests
From: Richard Biener @ 2015-06-18 11:50 UTC
  To: Tom de Vries; +Cc: GCC Patches, Jakub Jelinek

On Thu, 18 Jun 2015, Tom de Vries wrote:

> On 18/06/15 12:48, Richard Biener wrote:
> > On Thu, 18 Jun 2015, Tom de Vries wrote:
> > 
> > > The patch moves pass_fre to the location of pass_copy_prop, and replaces
> > > it.
> > > Furthermore, it adds scans to the fortran test-cases to make sure they get
> > > properly parallelized.
> > 
> > You may now figure out that LIM needs FRE to detect equal memory
> > references to apply store-motion.  But maybe the issues oacc
> > lowering introduces are limited and under your control.
> > 
> 
> To show the context of the pass group, after this commit the pass group looks
> like this:
> ...
>           NEXT_PASS (pass_sra_early);
>           NEXT_PASS (pass_build_ealias);
>           NEXT_PASS (pass_fre);
>           NEXT_PASS (pass_oacc_kernels);
>           PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
>               NEXT_PASS (pass_ch_oacc_kernels);
>               NEXT_PASS (pass_tree_loop_init);
>               NEXT_PASS (pass_lim);
>               NEXT_PASS (pass_tree_loop_done);
>               NEXT_PASS (pass_fre);
>               NEXT_PASS (pass_tree_loop_init);
>               NEXT_PASS (pass_scev_cprop);
>               NEXT_PASS (pass_parallelize_loops_oacc_kernels);
>               NEXT_PASS (pass_expand_omp_ssa);
>               NEXT_PASS (pass_tree_loop_done);
>           POP_INSERT_PASSES ()
>           NEXT_PASS (pass_merge_phi);
>           NEXT_PASS (pass_dse);
>           NEXT_PASS (pass_cd_dce);
> ...
> In other words, the pass group is run directly after pass_fre.
> 
> When I move pass_fre before the pass group to directly after the pass group, I
> start seeing the failure mode you describe.

Yes, it really depends on what kind of changes pass_oacc_kernels
makes (though pass_ch_oacc_kernels, which is loop-header copying?, may
also make relevant changes that enable LIM/store-motion after FRE cleanup
if there is a loop nest involved).

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
