Subject: Re: [committed] Add oacc_kernels_p argument to pass_parallelize_loops
To: Thomas Schwinge
CC: gcc-patches@gnu.org, Jakub Jelinek, Richard Biener
From: Tom de Vries
Date: Wed, 20 Jan 2016 10:31:00 -0000

On 20/01/16 09:54, Thomas Schwinge wrote:
> Hi!
>
> On Mon, 18 Jan 2016 14:07:11 +0100, Tom de Vries wrote:
>> Add oacc_kernels_p argument to pass_parallelize_loops
>
>> --- a/gcc/tree-parloops.c
>> +++ b/gcc/tree-parloops.c
>
>> @@ -2315,6 +2367,9 @@ gen_parallel_loop (struct loop *loop,
>
> |    /* Ensure that the exit condition is the first statement in the loop.
> |       The common case is that latch of the loop is empty (apart from the
> |       increment) and immediately follows the loop exit test.  Attempt to move the
> |       entry of the loop directly before the exit check and increase the number of
> |       iterations of the loop by one.  */
> |    if (try_transform_to_exit_first_loop_alt (loop, reduction_list, nit))
> |      {
> |        if (dump_file
> |            && (dump_flags & TDF_DETAILS))
> |          fprintf (dump_file,
> |                   "alternative exit-first loop transform succeeded"
> |                   " for loop %d\n", loop->num);
> |      }
> |    else
> |      {
>> +      if (oacc_kernels_p)
>> +        n_threads = 1;
>> +
> |        /* Fall back on the method that handles more cases, but duplicates the
> |           loop body: move the exit condition of LOOP to the beginning of its
> |           header, and duplicate the part of the last iteration that gets disabled
> |           to the exit of the loop.  */
> |        transform_to_exit_first_loop (loop, reduction_list, nit);
> |      }
>
> Just for my own education: this pessimization "n_threads = 1" for OpenACC
> kernels is because the duplicated loop bodies generated by
> transform_to_exit_first_loop are not appropriate for parallel OpenACC
> offloading execution?

In the case of standard parloops, only the loop is executed in
parallel, so the duplicated loop body is outside the parallel region.

In the case of oacc parloops, the duplicated body is included in the
kernels region, and is executed in parallel.

The duplicated body for the last iteration can be executed in parallel
with the loop that executes all the other iterations; we've done the
dependency analysis for that. But the duplicated loop body for the last
iteration is now also executed in parallel with itself. We've got code
that deals with that by guarding the side effects such that they're
only executed by a single gang. But at the moment that code is only
effective in oacc_entry_exit_ok, which runs before
transform_to_exit_first_loop introduces the duplicated loop body.

> (Might add a source code comment here?)  Testing
> on gomp-4_0-branch, there are no changes in the testsuite if I remove
> this hunk.

If you want to see the effect of removing the 'n_threads = 1' hunk,
make try_transform_to_exit_first_loop_alt always return false. I'd
expect that a loop like

  for (i = 0; i < N; ++i)
    a[i] = a[i] + 1;

would then give an incorrect result in a[N - 1].

Thanks,
- Tom