public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [OpenACC 0/11] execution model
@ 2015-10-21 19:00 Nathan Sidwell
  2015-10-21 19:00 ` [OpenACC 1/11] UNIQUE internal function Nathan Sidwell
                   ` (10 more replies)
  0 siblings, 11 replies; 120+ messages in thread
From: Nathan Sidwell @ 2015-10-21 19:00 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jakub Jelinek, Bernd Schmidt, Jason Merrill, Joseph S. Myers

I'll be posting a patch series for trunk, which implements the core of the 
OpenACC execution model.  This is split into the following patches:

01-trunk-unique.patch
   Internal function with a 'uniqueness' property
02-trunk-nvptx-partition.patch
   NVPTX backend patch set for partitioned execution
03-trunk-hook.patch
   OpenACC hook
04-trunk-c.patch
   C FE changes
05-trunk-cxx.patch
   C++ FE changes
06-trunk-red-init.patch
   Placeholder to keep reductions functioning
07-trunk-loop-mark.patch
   Annotate OpenACC loops in device-agnostic manner
08-trunk-dev-lower.patch
   Device-specific lowering of loop markers
09-trunk-lower-gate.patch
   Run oacc_device_lower pass regardless of errors
10-trunk-libgomp.patch
   Libgomp change (remove dimension check)
11-trunk-tests.patch
   Initial set of execution tests
[let's try that again, after slapping my mail agent for using an old address]

With the exception of patch 6, these are all on the gomp4 branch.  This patch 
set does not change reduction handling, which will be dealt with in a subsequent 
set.

An offloaded region is spawned on a set of execution engines.   These are 
organized as a cube, with specific axes controlled by the programmer.  The 
engines may operate in a 'partitioned' mode, where each engine executes as a 
separate thread, or they may operate in a 'single' mode, where one engine of a 
particular set executes the program and the other engines are idled (in an 
implementation-specific manner).

A driving example is the following:
#pragma acc parallel ...
{
  // single mode here
#pragma acc loop ...
for (i = 0; i < N; i++) // loop 1
   ... // partitioned mode here

if (expr) // single mode here
#pragma acc loop ...
   for (i = 0; i < N; i++) // loop 2
     ... // partitioned mode here
}

While it's clear all paths lead to loop 1, it's not statically determinable 
whether loop 2 is executed or not.

This implementation marks the head and tail of partitioned execution regions 
with builtin functions indicating the axes of partitioning.  After 
device-specific lowering, these will eventually make it to RTL expansion time, 
where they get expanded to backend-specific  RTL.  In the PTX implementation 
'single' mode is implemented by a 'neutering' mechanism, where the non-active 
execution engines skip each basic block and 'follow along' conditional branches 
to get to a subsequent block.  In this manner all engines can reach a 
dynamically determinable partitioned region.

On entry to a partitioned region, we execute a 'fork' operation, cloning live 
state from the single active engine before the region, into the other threads 
that become activated.

This patchset has been tested on x86_64-linux & ptx accelerator.

nathan

^ permalink raw reply	[flat|nested] 120+ messages in thread

end of thread, other threads:[~2020-11-24 10:34 UTC | newest]

Thread overview: 120+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-21 19:00 [OpenACC 0/11] execution model Nathan Sidwell
2015-10-21 19:00 ` [OpenACC 1/11] UNIQUE internal function Nathan Sidwell
2015-10-22  7:49   ` Richard Biener
2015-10-22  7:55     ` Richard Biener
2015-10-22  8:04       ` Jakub Jelinek
2015-10-22  8:07         ` Richard Biener
2015-10-22 11:42           ` Julian Brown
2015-10-22 13:12             ` Nathan Sidwell
2015-10-22 13:20               ` Jakub Jelinek
2015-10-22 13:27                 ` Nathan Sidwell
2015-10-22 14:31                   ` Richard Biener
2015-10-22 14:47                     ` Nathan Sidwell
2015-10-22  8:05   ` Jakub Jelinek
2015-10-22  8:12     ` Richard Biener
2015-10-22 13:08       ` Nathan Sidwell
2015-10-22 14:04       ` Nathan Sidwell
2015-10-22 14:28         ` Richard Biener
2015-10-22 14:31           ` Nathan Sidwell
2015-10-22 18:08           ` Nathan Sidwell
2015-10-23  8:46             ` Jakub Jelinek
2015-10-23 13:03               ` Nathan Sidwell
2015-10-23 13:03                 ` Richard Biener
2015-10-23 13:16                   ` Nathan Sidwell
2015-10-23 13:16                     ` Jakub Jelinek
2015-10-23 14:46                       ` Nathan Sidwell
2015-10-23 13:12                 ` Jakub Jelinek
2015-10-23 13:38                   ` Nathan Sidwell
2015-10-25 14:29                   ` Nathan Sidwell
2015-10-26 22:35                     ` Nathan Sidwell
2015-10-27  8:18                       ` Jakub Jelinek
2015-10-27 13:47                         ` Richard Biener
2015-10-27 14:06                           ` Nathan Sidwell
2015-10-27 14:07                             ` Jakub Jelinek
2015-10-27 20:18                             ` Nathan Sidwell
2015-10-27 14:15                         ` Nathan Sidwell
2015-10-23  9:40             ` Richard Biener
2015-10-22 17:39       ` Nathan Sidwell
2015-10-22 20:25     ` Nathan Sidwell
2015-10-23  8:05       ` Jakub Jelinek
2015-10-21 19:11 ` [OpenACC 2/11] PTX backend changes Nathan Sidwell
2015-10-22  8:16   ` Jakub Jelinek
2015-10-22  9:58     ` Bernd Schmidt
2015-10-22 13:02       ` Nathan Sidwell
2015-10-22 13:23         ` Nathan Sidwell
2015-10-22 14:05   ` Bernd Schmidt
2015-10-22 14:26     ` Nathan Sidwell
2015-10-22 14:30       ` Bernd Schmidt
2015-10-22 14:36         ` Jakub Jelinek
2015-10-22 14:52           ` Nathan Sidwell
2015-10-28 14:28             ` Nathan Sidwell
2015-10-22 14:42         ` Nathan Sidwell
2015-10-21 19:16 ` [OpenACC 3/11] new target hook Nathan Sidwell
2015-10-22  8:23   ` Jakub Jelinek
2015-10-22 13:17     ` Nathan Sidwell
2015-10-27 22:15     ` Nathan Sidwell
2015-10-21 19:19 ` [OpenACC 5/11] C++ FE changes Nathan Sidwell
2015-10-22  8:58   ` Jakub Jelinek
2015-10-23 20:26     ` Cesar Philippidis
2015-10-24  2:39       ` Cesar Philippidis
2015-10-24 21:15         ` Cesar Philippidis
2015-10-26 10:30           ` Jakub Jelinek
2015-10-26 22:44             ` Cesar Philippidis
2015-10-27  8:03               ` Jakub Jelinek
2015-10-27 20:21                 ` Nathan Sidwell
2015-10-21 19:19 ` [OpenACC 4/11] C " Nathan Sidwell
2015-10-22  8:25   ` Jakub Jelinek
2015-10-23 20:20     ` Cesar Philippidis
2015-10-23 20:40       ` Jakub Jelinek
2015-10-23 21:31         ` Jakub Jelinek
2015-10-23 21:32         ` Cesar Philippidis
2015-10-24  2:37           ` Cesar Philippidis
2015-10-24 13:08             ` Jakub Jelinek
2015-10-24 21:11               ` Cesar Philippidis
2015-10-26  9:47                 ` Jakub Jelinek
2015-10-26 10:09                   ` Jakub Jelinek
2015-10-26 22:32                   ` Cesar Philippidis
2015-10-27 20:23                     ` Nathan Sidwell
2015-10-23 21:25       ` Nathan Sidwell
2015-10-25 14:18         ` Nathan Sidwell
2015-10-21 19:32 ` [OpenACC 6/11] Reduction initialization Nathan Sidwell
2015-10-22  9:11   ` Jakub Jelinek
2015-10-27 22:27     ` Nathan Sidwell
2015-10-21 19:47 ` [OpenACC 7/11] execution model Nathan Sidwell
2015-10-22  9:32   ` Jakub Jelinek
2015-10-22 12:51     ` Nathan Sidwell
2015-10-22 13:01       ` Jakub Jelinek
2015-10-22 13:08         ` Nathan Sidwell
2015-10-25 15:03     ` Nathan Sidwell
2015-10-26 23:39       ` Nathan Sidwell
2015-10-27  8:33         ` Jakub Jelinek
2015-10-27 14:03           ` Nathan Sidwell
2015-10-28  5:45             ` Nathan Sidwell
2020-11-24 10:34   ` Thomas Schwinge
2015-10-21 19:50 ` [OpenACC 8/11] device-specific lowering Nathan Sidwell
2015-10-22  9:32   ` Jakub Jelinek
2015-10-22 12:59     ` Nathan Sidwell
2015-10-26 15:21   ` Jakub Jelinek
2015-10-26 16:23     ` Nathan Sidwell
2015-10-26 16:56       ` Jakub Jelinek
2015-10-26 18:10         ` Nathan Sidwell
2015-10-28  1:06     ` Nathan Sidwell
2015-10-21 19:51 ` [OpenACC 9/11] oacc_device_lower pass gate Nathan Sidwell
2015-10-22  9:33   ` Jakub Jelinek
2015-10-27 20:31     ` Nathan Sidwell
2015-10-21 19:52 ` [OpenACC 10/11] remove plugin restriction Nathan Sidwell
2015-10-22  9:38   ` Jakub Jelinek
2015-10-21 19:59 ` [OpenACC 11/11] execution tests Nathan Sidwell
2015-10-21 20:15   ` Ilya Verbin
2015-10-21 20:17     ` Nathan Sidwell
2015-10-28 14:30       ` Nathan Sidwell
2015-10-22  9:54   ` Jakub Jelinek
2015-10-22 14:02     ` Nathan Sidwell
2015-10-22 14:07       ` Jakub Jelinek
2015-10-22 14:23         ` Nathan Sidwell
2015-10-22 14:47           ` Cesar Philippidis
2015-10-22 14:58             ` Nathan Sidwell
2015-10-22 15:03             ` Jakub Jelinek
2015-10-22 15:08               ` Cesar Philippidis
2015-10-23 20:32               ` Cesar Philippidis
2015-10-24  2:56                 ` Cesar Philippidis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).