Date: Thu, 22 Oct 2015 16:42:00 -0000
From: Alexander Monakov
To: Jakub Jelinek
Cc: Bernd Schmidt, gcc-patches@gcc.gnu.org, Dmitry Melnik
Subject: Re: [gomp4 00/14] NVPTX: further porting
In-Reply-To: <20151022095442.GN478@tucnak.redhat.com>
References: <1445366076-16082-1-git-send-email-amonakov@ispras.ru> <562779F9.9070800@redhat.com> <20151022095442.GN478@tucnak.redhat.com>

On Thu, 22 Oct 2015, Jakub Jelinek wrote:

> Does that apply also to threads within a warp? I.e. is .local local to each
> thread in the warp, or to the whole warp, and if the former, how can say at
> the start of a SIMD region or at its end the local vars be broadcast to
> other threads and collected back? One thing is scalar vars, another
> pointers, or references to various types, or even bigger indirection.

.local is indeed local to each warp member, not the warp as a whole.
What the OpenACC/PTX implementation does is copy the whole stack frame plus live registers; the implementation is in nvptx.c:nvptx_propagate.  I see two possible alternative approaches for OpenMP/PTX.

The first approach is to try and follow the OpenACC scheme.  In OpenMP that will be more complicated.  First, we won't have a single stack frame, so we'll need to emit stack propagation at call sites.  Second, we'll have to ensure that each libgomp function that can appear in a call chain from the target region entry to a simd loop runs in "vector-neutered" mode, that is, threads 1-31 in each warp follow the branches that thread 0 executes.

The second approach is to run all threads in the warp all the time, making sure they execute the same code with the same data, and thus build up the same local state.  In this case we'd need to ensure this invariant: if all threads in the warp have the same state prior to executing an instruction, they also have the same state after executing that instruction (with global state changed as if only one thread had executed it).  Most instructions are safe w.r.t. this invariant.  Atomics break it, so to maintain the invariant for atomics we'd need to conditionally execute each atomic in only one thread, and then copy the register holding the result to the other threads.

Apart from atomics, I see only two more hazards: calls and user asm.  For calls, I think the solution is to execute the call in all threads, demanding that callees uphold the invariant.  To ensure that, we'd need to recompile newlib and other libraries in that mode.  Finally, a few callees are out of our control since they are provided by the driver: malloc, free, vprintf.  Those we can treat like atomics.

What do you think?  Does that sound correct?  Was something like this considered (and rejected?) for OpenACC?

Thanks.
Alexander