From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-402852-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 91160 invoked by alias); 14 Jul 2015 09:35:43 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 91147 invoked by uid 89); 14 Jul 2015 09:35:42 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: relay1.mentorg.com
Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 14 Jul 2015 09:35:41 +0000
Received: from svr-orw-fem-05.mgc.mentorg.com ([147.34.97.43])	by relay1.mentorg.com with esmtp 	id 1ZEwcj-0002vz-3X from ChungLin_Tang@mentor.com ; Tue, 14 Jul 2015 02:35:37 -0700
Received: from [0.0.0.0] (147.34.91.1) by svr-orw-fem-05.mgc.mentorg.com (147.34.97.43) with Microsoft SMTP Server id 14.3.224.2; Tue, 14 Jul 2015 02:35:35 -0700
Message-ID: <55A4D7E0.2020303@codesourcery.com>
Date: Tue, 14 Jul 2015 09:35:00 -0000
From: Chung-Lin Tang <cltang@codesourcery.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Jakub Jelinek <jakub@redhat.com>
CC: gcc-patches <gcc-patches@gcc.gnu.org>, Tom de Vries	<vries@codesourcery.com>, Thomas Schwinge <thomas@codesourcery.com>
Subject: Re: [PATCH, gomp4] Propagate independent clause for OpenACC kernels pass
References: <55A4A21C.1070004@codesourcery.com> <20150714070010.GY1788@tucnak.redhat.com>
In-Reply-To: <20150714070010.GY1788@tucnak.redhat.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2015-07/txt/msg01113.txt.bz2

On 15/7/14 3:00 PM, Jakub Jelinek wrote:
> On Tue, Jul 14, 2015 at 01:46:04PM +0800, Chung-Lin Tang wrote:
>> this patch provides a 'bool independent' field in struct loop, which
>> will be switched on by an "independent" clause in a #pragma acc loop directive.
>> I assume you'll be wiring it to the kernels parloops pass in a followup patch.
>>
>> Note: there are already a few other similar fields in struct loop, namely
>> 'safelen' and 'can_be_parallel', used by OMP simd safelen and GRAPHITE respectively.
>> The intention and/or setting of these fields are all a bit different, so I've
>> decided to add a new bool for OpenACC.
> 
> How is it different though?  Can you cite exact definition of the
> independent clause vs. safelen (set to INT_MAX)?
> The OpenMP definition is:
> "A SIMD loop has logical iterations numbered 0,1,...,N-1 where N is the
> number of loop iterations, and the logical numbering denotes the sequence in which the iterations would
> be executed if the associated loop(s) were executed with no SIMD instructions. If the safelen
> clause is used then no two iterations executed concurrently with SIMD instructions can have a
> greater distance in the logical iteration space than its value."
> ...
> "Lexical forward dependencies in the iterations of the
> original loop must be preserved within each SIMD chunk."

The wording of the OpenACC independent clause is simpler:
"... the independent clause tells the implementation that the iterations of this loop
are data-independent with respect to each other." -- OpenACC spec 2.7.9

I would say this gives the compiler even more freedom than OpenMP simd safelen:
it essentially says that the compiler doesn't need dependence analysis at all and
can just assume that the iterations are independent.
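
To make the difference concrete, here is a minimal sketch. It is my own
illustration, not code from the patch, and the function names scale and chain
are made up. The first loop is truly data-independent, so "independent" is an
honest annotation. The second carries a dependence of distance 4, which
safelen(4) tolerates but "independent" would not.

```c
/* Hypothetical example: every iteration touches only its own element,
   so the iterations are data-independent and may run in any order.  */
void
scale (int *a, int n)
{
#pragma acc kernels copy(a[0:n])
  {
#pragma acc loop independent
    for (int i = 0; i < n; i++)
      a[i] = a[i] * 2;
  }
}

/* Hypothetical example: iteration i reads what iteration i - 4 wrote,
   a loop-carried dependence of distance 4.  Running up to 4 consecutive
   iterations together is still safe, but handing arbitrary iterations
   to unrelated threads is not.  */
void
chain (int *a, int n)
{
#pragma omp simd safelen(4)
  for (int i = 4; i < n; i++)
    a[i] = a[i - 4] + 1;
}
```

Without -fopenacc and -fopenmp the pragmas are ignored and both loops simply
run sequentially, which is the reference behaviour any parallel execution
has to reproduce.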

> So e.g. safelen >= 32 means for PTX you can safely implement it by
> running up to 32 consecutive iterations by all threads in the warp
> (assuming code that for some reason must be run by a single thread
> (e.g. calls to functions that are marked so that they expect to be run
> by the first thread in a warp initially) is run sequentially by increasing
> iterator), but it doesn't mean the iterations have no dependencies in between
> them whatsoever (see the above note about lexical forward dependencies),
> so you can't parallelize it by assigning different iterations to different
> threads outside of warp (or pthread_create created threads).

> So if OpenACC independent means there are no dependencies in between
> iterations, the OpenMP counterpart here is #pragma omp for simd schedule (auto)
> or #pragma omp distribute parallel for simd schedule (auto).

schedule(auto) appears to correspond to the OpenACC 'auto' clause, or to
what is implied in a kernels compute construct, but does it actually imply
that there are no dependencies between iterations?

Putting aside the semantic issues, safelen > 0 currently turns on a certain amount of
vectorization code that we are not using (and are unlikely to use at all for nvptx).
Right now, we're just trying to pass the new flag to a tree-parloops-based kernels pass.
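
As a rough sketch of what I mean by "passing the flag": the struct below is a
simplified, hypothetical stand-in (struct loop_hints is not a GCC type; the real
struct loop has many more members), and can_skip_dependence_analysis is a
made-up name for whatever check the kernels pass ends up doing. Only the three
field names come from this thread.

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-in for the loop annotations discussed
   in this thread.  */
struct loop_hints
{
  int safelen;           /* From OMP simd safelen; assumed 0 when unset.  */
  bool can_be_parallel;  /* The field used by GRAPHITE.  */
  bool independent;      /* Proposed: from "#pragma acc loop independent".  */
};

/* Hypothetical predicate for a parloops-style consumer.  Only the
   independent flag is consulted: safelen does not rule out loop-carried
   dependences, and how can_be_parallel should interact is left open.  */
bool
can_skip_dependence_analysis (const struct loop_hints *l)
{
  return l->independent;
}
```

The point of keeping the fields separate is that a large safelen alone never
makes this predicate true.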

Maybe this can all be reconciled later in a more precise way, e.g. by having flags that correspond
specifically to phases of internal compiler passes (selected according to the needs of the accel target),
instead of ones that are "sort of" associated with high-level language features.

Chung-Lin