[was: Re: [PIING][PATCH, 9/16] Add pass_parallelize_loops_oacc_kernels ] On 14/12/15 16:22, Richard Biener wrote: > On Sun, Dec 13, 2015 at 5:58 PM, Tom de Vries wrote: >> On 24/11/15 13:24, Tom de Vries wrote: >>> >>> On 16/11/15 12:59, Tom de Vries wrote: >>>> >>>> On 09/11/15 20:52, Tom de Vries wrote: >>>>> >>>>> On 09/11/15 16:35, Tom de Vries wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> this patch series for stage1 trunk adds support to: >>>>>> - parallelize oacc kernels regions using parloops, and >>>>>> - map the loops onto the oacc gang dimension. >>>>>> >>>>>> The patch series contains these patches: >>>>>> >>>>>> 1 Insert new exit block only when needed in >>>>>> transform_to_exit_first_loop_alt >>>>>> 2 Make create_parallel_loop return void >>>>>> 3 Ignore reduction clause on kernels directive >>>>>> 4 Implement -foffload-alias >>>>>> 5 Add in_oacc_kernels_region in struct loop >>>>>> 6 Add pass_oacc_kernels >>>>>> 7 Add pass_dominator_oacc_kernels >>>>>> 8 Add pass_ch_oacc_kernels >>>>>> 9 Add pass_parallelize_loops_oacc_kernels >>>>>> 10 Add pass_oacc_kernels pass group in passes.def >>>>>> 11 Update testcases after adding kernels pass group >>>>>> 12 Handle acc loop directive >>>>>> 13 Add c-c++-common/goacc/kernels-*.c >>>>>> 14 Add gfortran.dg/goacc/kernels-*.f95 >>>>>> 15 Add libgomp.oacc-c-c++-common/kernels-*.c >>>>>> 16 Add libgomp.oacc-fortran/kernels-*.f95 >>>>>> >>>>>> The first 9 patches are more or less independent, but patches 10-16 are >>>>>> intended to be committed at the same time. >>>>>> >>>>>> Bootstrapped and reg-tested on x86_64. >>>>>> >>>>>> Build and reg-tested with nvidia accelerator, in combination with a >>>>>> patch that enables accelerator testing (which is submitted at >>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). >>>>>> >>>>>> I'll post the individual patches in reply to this message. >>>>> >>>>> >>>>> This patch adds pass_parallelize_loops_oacc_kernels. >>>>> >>>>> There's a number of things we do differently in parloops for oacc >>>>> kernels: >>>>> - in normal parloops, we generate code to choose between a parallel >>>>> version of the loop, and a sequential (low iteration count) version. >>>>> Since the code in oacc kernels region is supposed to run on the >>>>> accelerator anyway, we skip this check, and don't add a low iteration >>>>> count loop. >>>>> - in normal parloops, we generate an #pragma omp parallel / >>>>> GIMPLE_OMP_RETURN pair to delimit the region which will we split off >>>>> into a thread function. Since the oacc kernels region is already >>>>> split off, we don't add this pair. >>>>> - we indicate the parallelization factor by setting the oacc function >>>>> attributes >>>>> - we generate an #pragma oacc loop instead of an #pragma omp for, and >>>>> we add the gang clause >>>>> - in normal parloops, we rewrite the variable accesses in the loop in >>>>> terms into accesses relative to a thread function parameter. For the >>>>> oacc kernels region, that rewrite has already been done at omp-lower, >>>>> so we skip this. >>>>> - we need to ensure that the entire kernels region can be run in >>>>> parallel. The loop independence check is already present, so for oacc >>>>> kernels we add a check between blocks outside the loop and the entire >>>>> region. >>>>> - we guard stores in the blocks outside the loop with gang_pos == 0. >>>>> There's no need for each gang to write to a single location, we can >>>>> do this in just one gang. (Typically this is the write of the final >>>>> value of the iteration variable if that one is copied back to the >>>>> host). >>>>> >>>> >>>> Reposting with loop optimizer init added in >>>> pass_parallelize_loops_oacc_kernels::execute. >>>> >>> >>> Reposting with loop_optimizer_finalize,scev_initialize and scev_finalize >>> added in pass_parallelize_loops_oacc_kernels::execute. >>> >> >> Ping. >> >> Anything I can do to facilitate the review? > > Document new functions. Done. avoid if (1). Done. > Ideally some refactoring would avoid some of the if (!oacc_kernels_p) spaghetti Ack. For now, i've tried to minimize the number of oacc_kernels_p tests in the code. Further suggestions on how to improve here are much appreciated. > but I'm considering tree-parloops.c (and its bugs) yours. Ack. > Can the pass not just use a pass parameter to switch between oacc/non-oacc? > This patch introduces the pass parameter oacc_kernels_p (but does not instantiate an oacc_kernels_p == true pass version yet). Bootstrapped and reg-tested on x86_64. Committed to trunk. Thanks, - Tom