From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 91160 invoked by alias); 14 Jul 2015 09:35:43 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 91147 invoked by uid 89); 14 Jul 2015 09:35:42 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: relay1.mentorg.com
Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Tue, 14 Jul 2015 09:35:41 +0000
Received: from svr-orw-fem-05.mgc.mentorg.com ([147.34.97.43])
 by relay1.mentorg.com with esmtp id 1ZEwcj-0002vz-3X
 from ChungLin_Tang@mentor.com ; Tue, 14 Jul 2015 02:35:37 -0700
Received: from [0.0.0.0] (147.34.91.1)
 by svr-orw-fem-05.mgc.mentorg.com (147.34.97.43)
 with Microsoft SMTP Server id 14.3.224.2; Tue, 14 Jul 2015 02:35:35 -0700
Message-ID: <55A4D7E0.2020303@codesourcery.com>
Date: Tue, 14 Jul 2015 09:35:00 -0000
From: Chung-Lin Tang
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Jakub Jelinek
CC: gcc-patches, Tom de Vries, Thomas Schwinge
Subject: Re: [PATCH, gomp4] Propagate independent clause for OpenACC kernels pass
References: <55A4A21C.1070004@codesourcery.com> <20150714070010.GY1788@tucnak.redhat.com>
In-Reply-To: <20150714070010.GY1788@tucnak.redhat.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2015-07/txt/msg01113.txt.bz2

On 15/7/14 3:00 PM, Jakub Jelinek wrote:
> On Tue, Jul 14, 2015 at 01:46:04PM +0800, Chung-Lin Tang wrote:
>> this patch provides a 'bool independent' field in struct loop, which
>> will be switched on by an "independent" clause in a #pragma acc loop directive.
>> I assume you'll be wiring it to the kernels parloops pass in a followup patch.
>>
>> Note: there are already a few other similar fields in struct loop, namely
>> 'safelen' and 'can_be_parallel', used by OMP simd safelen and GRAPHITE respectively.
>> The intention and/or setting of these fields are all a bit different, so I've
>> decided to add a new bool for OpenACC.
>
> How is it different though? Can you cite exact definition of the
> independent clause vs. safelen (set to INT_MAX)?
> The OpenMP definition is:
> "A SIMD loop has logical iterations numbered 0,1,...,N-1 where N is the
> number of loop iterations, and the logical numbering denotes the sequence in which the iterations would
> be executed if the associated loop(s) were executed with no SIMD instructions. If the safelen
> clause is used then no two iterations executed concurrently with SIMD instructions can have a
> greater distance in the logical iteration space than its value."
> ...
> "Lexical forward dependencies in the iterations of the
> original loop must be preserved within each SIMD chunk."

The wording of the OpenACC independent clause is simpler:
"... the independent clause tells the implementation that the iterations of this loop
are data-independent with respect to each other." -- OpenACC spec 2.7.9

I would say this implies even more relaxed conditions than OpenMP simd safelen,
essentially saying that the compiler doesn't even need dependence analysis; it can
just assume independence of iterations.

> So e.g. safelen >= 32 means for PTX you can safely implement it by
> running up to 32 consecutive iterations by all threads in the warp
> (assuming code that for some reason must be run by a single thread
> (e.g. calls to functions that are marked so that they expect to be run
> by the first thread in a warp initially) is run sequentially by increasing
> iterator), but it doesn't mean the iterations have no dependencies in between
> them whatsoever (see the above note about lexical forward dependencies),
> so you can't parallelize it by assigning different iterations to different
> threads outside of warp (or pthread_create created threads).
> So if OpenACC independent means there are no dependencies in between
> iterations, the OpenMP counterpart here is #pragma omp for simd schedule (auto)
> or #pragma omp distribute parallel for simd schedule (auto).

schedule(auto) appears to correspond to the OpenACC 'auto' clause, or to what is
implied in a kernels compute construct, but I'm not sure it implies that there are
no dependencies between iterations?

Putting aside the semantic issues, safelen>0 currently turns on a certain amount of
vectorization code that we are not using (and are unlikely to use at all for nvptx).

Right now, we're just trying to pass the new flag to a kernels tree-parloops based pass.
Maybe this can all be reconciled later in a more precise way, e.g. by having flags that
correspond specifically to phases of internal compiler passes (and are selected by the
needs of the accel target), instead of ones that are "sort of" associated with
high-level language features.

Chung-Lin
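
To make the safelen vs. independent distinction discussed above concrete, here is a
minimal sketch in plain C. It is not GCC code and not part of the patch; the names
N, CHUNK, run_sequential, run_simd_chunks, run_reversed and matches_sequential are
all hypothetical, chosen for illustration. The loop body has a lexically forward,
loop-carried dependence (S2 reads what S1 stored one iteration earlier). Executing
it chunk-wise with S1 before S2, as a SIMD implementation honouring safelen would,
preserves the sequential result. Executing the iterations in an arbitrary order,
which is only legitimate if they are truly data-independent, does not.

```c
#include <string.h>

enum { N = 8, CHUNK = 4 };  /* hypothetical trip count and SIMD chunk width */

typedef void (*loop_runner)(const int *b, int *a, int *c);

/* Reference: iterations run one after another in increasing order. */
static void run_sequential(const int *b, int *a, int *c)
{
  for (int i = 1; i < N; i++)
    {
      a[i] = b[i] + 1;  /* S1 */
      c[i] = a[i - 1];  /* S2: reads the value S1 stored in iteration i-1 */
    }
}

/* SIMD-style order: within each chunk, S1 runs for all lanes before S2,
   so the lexically forward dependence S1 -> S2 is preserved.  */
static void run_simd_chunks(const int *b, int *a, int *c)
{
  for (int s = 1; s < N; s += CHUNK)
    {
      int e = s + CHUNK < N ? s + CHUNK : N;
      for (int i = s; i < e; i++)
        a[i] = b[i] + 1;
      for (int i = s; i < e; i++)
        c[i] = a[i - 1];
    }
}

/* One ordering that distributing iterations over unrelated threads could
   produce: whole iterations executed in decreasing order.  */
static void run_reversed(const int *b, int *a, int *c)
{
  for (int i = N - 1; i >= 1; i--)
    {
      a[i] = b[i] + 1;
      c[i] = a[i - 1];
    }
}

/* Return 1 if RUN leaves a[] and c[] identical to the sequential result.  */
static int matches_sequential(loop_runner run)
{
  int b[N], a_ref[N] = {0}, c_ref[N] = {0}, a[N] = {0}, c[N] = {0};

  for (int i = 0; i < N; i++)
    b[i] = 10 * i;

  run_sequential(b, a_ref, c_ref);
  run(b, a, c);

  return memcmp(a_ref, a, sizeof a) == 0
         && memcmp(c_ref, c, sizeof c) == 0;
}
```

Called from a small main, matches_sequential (run_simd_chunks) returns 1 while
matches_sequential (run_reversed) returns 0. Under these assumptions, such a loop
could tolerate a safelen of at least CHUNK, but marking it "independent" would be
a user error, since the clause asserts that no such dependence exists.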