From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 63678 invoked by alias); 12 Jul 2015 12:46:27 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 63666 invoked by uid 89); 12 Jul 2015 12:46:25 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.7 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 X-HELO: fencepost.gnu.org Received: from fencepost.gnu.org (HELO fencepost.gnu.org) (208.118.235.10) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Sun, 12 Jul 2015 12:46:24 +0000 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37792) by fencepost.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1ZEGeE-00066C-8Y for gcc-patches@gnu.org; Sun, 12 Jul 2015 08:46:22 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZEGe8-0007Z8-Qe for gcc-patches@gnu.org; Sun, 12 Jul 2015 08:46:21 -0400 Received: from relay1.mentorg.com ([192.94.38.131]:48778) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZEGe8-0007VQ-HS for gcc-patches@gnu.org; Sun, 12 Jul 2015 08:46:16 -0400 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-02.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1ZEGe3-0003Xg-BU from Tom_deVries@mentor.com ; Sun, 12 Jul 2015 05:46:11 -0700 Received: from [127.0.0.1] (137.202.0.76) by SVR-IES-FEM-02.mgc.mentorg.com (137.202.0.106) with Microsoft SMTP Server id 14.3.224.2; Sun, 12 Jul 2015 13:46:07 +0100 Message-ID: <55A2618A.7050503@mentor.com> Date: Sun, 12 Jul 2015 12:46:00 -0000 From: Tom de Vries User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: "gcc-patches@gnu.org" CC: Jakub Jelinek , Thomas Schwinge Subject: [gomp4, committed] Handle nested loops in kernels regions Content-Type: multipart/mixed; boundary="------------050008070206060407060006" X-detected-operating-system: by eggs.gnu.org: Windows NT kernel [generic] [fuzzy] X-Received-From: 192.94.38.131 X-SW-Source: 2015-07/txt/msg00966.txt.bz2 --------------050008070206060407060006 Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit Content-length: 1578 Hi, I. This patch allows parallelization of an outer loop in an openacc kernels region. The testcase is based on autopar/outer-1.c. II. We rely on pass_lim to move the *.omp_data_i loads out of the loop nest. For the test-case, pass_lim was managing to move the load out of the inner loop, but not the outer loop, because the load was classified as 'MOVE_PRESERVE_EXECUTION'. By marking the *.omp_data_i load non-trapping, it's now classified as 'MOVE_POSSIBLE', and moved out of the loop nest. III. The 'loops_state_set (LOOPS_NEED_FIXUP)' is a somewhat blunt and temporary fix for the oacc kernels variant of PR66846 - parloops does not always mark loops for fixup if needed. The original PR needs an added verify_loop_structure to trigger the problem. Normally the problem is hidden by the fact that the first pass that runs on the new function is pass_fixup_cfg, which happens to fixup the loops (The loops are fixed up because TODO_cleanup_cfg is set during pass_fixup_cfg, because the function contains an ECF_CONST function: __builtin_omp_get_num_threads). For the oacc kernels variant, the problem triggers without adding verify_loop_structure. During pass_ipa_inline, we call loop_optimizer_init, which (given that LOOPS_NEED_FIXUP is not set) verifies the loop structure, which fails. Pass_fixup_cfg is not run inbetween the discovery of the new function and pass_ipa_inline. IV. I've committed this patch to gomp-4_0-branch. Bootstrapped and reg-tested on x86_64. Build and reg-tested on setup with nvidia accelerator. Thanks, - Tom --------------050008070206060407060006 Content-Type: text/x-patch; name="0001-Handle-nested-loops-in-kernels-regions.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="0001-Handle-nested-loops-in-kernels-regions.patch" Content-length: 4374 Handle nested loops in kernels regions 2015-07-12 Tom de Vries * omp-low.c (build_receiver_ref): Mark *.omp_data_i as non-trapping. * tree-parloops.c (gen_parallel_loop): Add LOOPS_NEED_FIXUP to loop state. (parallelize_loops): Allow nested loops. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c: New test. * c-c++-common/goacc/kernels-loop-nest.c: New test. --- gcc/omp-low.c | 1 + .../c-c++-common/goacc/kernels-loop-nest.c | 42 ++++++++++++++++++++++ gcc/tree-parloops.c | 5 +-- .../libgomp.oacc-c-c++-common/kernels-loop-nest.c | 26 ++++++++++++++ 4 files changed, 70 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 11ac909..a938ce0 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -1147,6 +1147,7 @@ build_receiver_ref (tree var, bool by_ref, omp_context *ctx) field = x; x = build_simple_mem_ref (ctx->receiver_decl); + TREE_THIS_NOTRAP (x) = 1; x = omp_build_component_ref (x, field); if (by_ref) x = build_simple_mem_ref (x); diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c new file mode 100644 index 0000000..3e06c9f --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c @@ -0,0 +1,42 @@ +/* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-ftree-parallelize-loops=32" } */ +/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */ +/* { dg-additional-options "-fdump-tree-optimized" } */ + +/* Based on autopar/outer-1.c. */ + +#include + +#define N 1000 + +int +main (void) +{ + int x[N][N]; + +#pragma acc kernels copyout (x) + { + for (int ii = 0; ii < N; ii++) + for (int jj = 0; jj < N; jj++) + x[ii][jj] = ii + jj + 3; + } + + for (int i = 0; i < N; i++) + for (int j = 0; j < N; j++) + if (x[i][j] != i + j + 3) + abort (); + + return 0; +} + +/* Check that only one loop is analyzed, and that it can be parallelized. */ +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */ +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */ + +/* Check that the loop has been split off into a function. */ +/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */ + +/* { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 1 "parloops_oacc_kernels" } } */ + +/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 04708c0..492ffcb 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2442,6 +2442,7 @@ gen_parallel_loop (struct loop *loop, /* Cancel the loop (it is simpler to do it here rather than to teach the expander to do it). */ cancel_loop_tree (loop); + loops_state_set (LOOPS_NEED_FIXUP); /* Free loop bound estimations that could contain references to removed statements. */ @@ -2761,10 +2762,6 @@ parallelize_loops (bool oacc_kernels_p) if (!loop->in_oacc_kernels_region) continue; - /* TODO: Allow nested loops. */ - if (loop->inner) - continue; - if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Trying loop %d with header bb %d in oacc kernels region\n", diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c new file mode 100644 index 0000000..21d2599 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { dg-additional-options "-ftree-parallelize-loops=32" } */ + +#include + +#define N 1000 + +int +main (void) +{ + int x[N][N]; + +#pragma acc kernels copyout (x) + { + for (int ii = 0; ii < N; ii++) + for (int jj = 0; jj < N; jj++) + x[ii][jj] = ii + jj + 3; + } + + for (int i = 0; i < N; i++) + for (int j = 0; j < N; j++) + if (x[i][j] != i + j + 3) + abort (); + + return 0; +} -- 1.9.1 --------------050008070206060407060006--