Hi!

On Mon, 18 Jan 2016 14:07:11 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> Add oacc_kernels_p argument to pass_parallelize_loops

> --- a/gcc/tree-parloops.c
> +++ b/gcc/tree-parloops.c

> @@ -2315,6 +2367,9 @@ gen_parallel_loop (struct loop *loop,

|   /* Ensure that the exit condition is the first statement in the loop.
|      The common case is that latch of the loop is empty (apart from the
|      increment) and immediately follows the loop exit test.  Attempt to move the
|      entry of the loop directly before the exit check and increase the number of
|      iterations of the loop by one.  */
|   if (try_transform_to_exit_first_loop_alt (loop, reduction_list, nit))
|     {
|       if (dump_file
| 	  && (dump_flags & TDF_DETAILS))
| 	fprintf (dump_file,
| 		 "alternative exit-first loop transform succeeded"
| 		 " for loop %d\n", loop->num);
|     }
|   else
|     {
> +      if (oacc_kernels_p)
> +	n_threads = 1;
> +
|       /* Fall back on the method that handles more cases, but duplicates the
| 	 loop body: move the exit condition of LOOP to the beginning of its
| 	 header, and duplicate the part of the last iteration that gets disabled
| 	 to the exit of the loop.  */
|       transform_to_exit_first_loop (loop, reduction_list, nit);
|     }

Just for my own education: is this pessimization "n_threads = 1" for OpenACC
kernels there because the duplicated loop body generated by
transform_to_exit_first_loop is not appropriate for parallel OpenACC
offloading execution?  (If so, could a source code comment be added here
to say that?)  Testing on gomp-4_0-branch, I see no changes in the
testsuite results if I remove this hunk.
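
For reference, here is a rough source-level sketch of the two loop shapes
being discussed.  It is only an analogy in plain C, not what the pass does
on GIMPLE, and every name in it (body, original_loop, exit_first_alt,
exit_first_dup, shapes_agree, MAX_N) is made up for illustration.  It
assumes a loop whose body runs nit + 1 times with the exit test after the
body:

```c
#include <string.h>

#define MAX_N 64   /* Hypothetical array bound; nit must stay below it.  */

/* Hypothetical loop body; stands in for whatever the kernel computes.  */
static int
body (int i)
{
  return 3 * i + 1;
}

/* Original shape: the exit test follows the body, so the body runs
   nit + 1 times.  */
static void
original_loop (int *a, int nit)
{
  int i = 0;
  while (1)
    {
      a[i] = body (i);
      if (i == nit)
	break;
      i++;
    }
}

/* Shape comparable to the "alt" transform: exit test first, iteration
   count increased by one, no duplicated body.  */
static void
exit_first_alt (int *a, int nit)
{
  for (int i = 0; i < nit + 1; i++)
    a[i] = body (i);
}

/* Shape comparable to the fallback transform: exit test first, nit
   iterations in the loop, and the body of the last iteration duplicated
   after the loop exit.  */
static void
exit_first_dup (int *a, int nit)
{
  int i;
  for (i = 0; i < nit; i++)
    a[i] = body (i);
  a[nit] = body (nit);   /* Duplicated part, outside the loop.  */
}

/* Return nonzero if all three shapes compute the same array.  */
static int
shapes_agree (int nit)
{
  int x[MAX_N] = { 0 }, y[MAX_N] = { 0 }, z[MAX_N] = { 0 };
  original_loop (x, nit);
  exit_first_alt (y, nit);
  exit_first_dup (z, nit);
  return memcmp (x, y, sizeof x) == 0 && memcmp (x, z, sizeof x) == 0;
}
```

All three compute the same values; the difference is that in
exit_first_dup the last iteration's work sits outside the loop, which is
the duplicated code the question above is about.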


Regards
 Thomas