From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-179624-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 19040 invoked by alias); 29 Jul 2013 13:18:58 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 19031 invoked by uid 89); 29 Jul 2013 13:18:57 -0000
X-Spam-SWARE-Status: No, score=-4.4 required=5.0 tests=AWL,BAYES_50,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL,RDNS_NONE,SPF_HELO_PASS,SPF_PASS autolearn=no version=3.3.1
Received: from Unknown (HELO mx1.redhat.com) (209.132.183.28)    by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Mon, 29 Jul 2013 13:18:55 +0000
Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r6TDImu8029099	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)	for <gcc@gcc.gnu.org>; Mon, 29 Jul 2013 09:18:48 -0400
Received: from [10.36.6.212] (vpn1-6-212.ams2.redhat.com [10.36.6.212])	by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r6TDIlhF016171	for <gcc@gcc.gnu.org>; Mon, 29 Jul 2013 09:18:47 -0400
Subject: Summary of the Accelerator BOF at Cauldron
From: Torvald Riegel <triegel@redhat.com>
To: gcc <gcc@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
Date: Mon, 29 Jul 2013 13:18:00 -0000
Message-ID: <1375103926.7129.7694.camel@triegel.csb>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-SW-Source: 2013-07/txt/msg00428.txt.bz2

The Accelerator BOF at the GNU Tools Cauldron was worthwhile.  Several
people presented their current work or announced upcoming projects in
the accelerator space.  There was significant interest from the Cauldron
attendees; we had about 40 people in the room on the first day, and
about 20 in a follow-up session that we scheduled for the next day
because people wanted to continue the discussion.  How to support
accelerators is a very broad topic, so we often didn't have time to
discuss everything in detail or to cover all the subtopics we would have
liked to talk about; still, we covered the main issues and got the
discussion started.

Thomas Schwinge (Mentor Graphics) talked about their plans to support
OpenACC in GCC; OpenACC constructs and the enclosed code are to be
transformed into NVIDIA's PTX virtual ISA and into calls to CUDA
libraries.
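For readers less familiar with OpenACC, here is a minimal sketch in C of
the kind of input such a pipeline would consume; `saxpy_acc` is an
illustrative name, not code from the Mentor Graphics work.  Under the
plan described above, the loop would become a PTX kernel and the data
clauses would become calls into CUDA libraries that allocate device
memory and move the data.  A compiler without OpenACC support ignores
the pragma and simply runs the loop on the host.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i], written with OpenACC directives.
   An OpenACC-enabled compiler turns the loop into an accelerator kernel;
   a compiler without OpenACC support ignores the pragma, so the loop
   runs on the host with the same result. */
void saxpy_acc(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* copyin(x[0:n]): copy x to the device before the region.
       copy(y[0:n]):   copy y to the device before the region and back
                       to the host after it. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```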

Martin Jambor (SuSE) is working on expanding GIMPLE into HSAIL, the HSA
virtual ISA.  They are currently focusing on code suitable for kernels
(i.e., the parallel tasks that one could send off to an accelerator) and
haven't started looking at transforming programming-language-level
parallelism constructs (e.g., parallel loops) into code that makes use
of the HSA queues or runtime.
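To illustrate the distinction between kernel code and language-level
parallelism, here is a hedged sketch in C.  The names `vadd_kernel`,
`dispatch_vadd`, and `vadd_loop` are hypothetical, and the dispatch is
simulated by a sequential host loop rather than an actual HSA queue.

```c
#include <stddef.h>

/* Kernel-style code: one invocation computes one element, identified by
   its work-item index `i`.  This is the shape of code that the
   GIMPLE-to-HSAIL work targets first. */
static void vadd_kernel(size_t i, const int *a, const int *b, int *c)
{
    c[i] = a[i] + b[i];
}

/* Hypothetical stand-in for launching the kernel over `n` work-items.
   With HSA, this would roughly correspond to enqueuing a dispatch packet
   on an HSA queue; here it is a plain host loop so the sketch runs
   anywhere. */
void dispatch_vadd(size_t n, const int *a, const int *b, int *c)
{
    for (size_t i = 0; i < n; i++)
        vadd_kernel(i, a, b, c);
}

/* The same computation as a language-level parallel loop.  Outlining
   this into a kernel plus a dispatch is the transformation that had not
   been started yet. */
void vadd_loop(size_t n, const int *a, const int *b, int *c)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```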

Kirill Yukhin (Intel) gave an overview of how offloading works in the
current generation of MIC accelerators.

Jakub Jelinek (Red Hat) has been working on OpenMP 4.0, including the
parsing of OpenMP 4.0's accelerator-related constructs.  However, such
accelerator code is not yet offloaded to an actual accelerator; it is
still executed on the host.
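As an illustration of OpenMP 4.0's accelerator constructs, here is a
minimal sketch; `dot_omp` is a hypothetical name.  The `target`
construct marks the region for offloading and the `map` clauses describe
the data it needs.  With host-only execution as described above, the
region runs on the host, and a compiler without OpenMP enabled ignores
the pragmas entirely; either way the result is the same.

```c
#include <stddef.h>

/* Dot product of x and y using OpenMP 4.0 accelerator constructs. */
double dot_omp(size_t n, const double *x, const double *y)
{
    double sum = 0.0;

    /* map(to: ...):     the inputs are made available on the device.
       map(tofrom: sum): the reduced result is brought back to the host.
       If no device is used, the region executes on the host. */
    #pragma omp target map(to: x[0:n], y[0:n]) map(tofrom: sum)
    #pragma omp parallel for reduction(+: sum)
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];

    return sum;
}
```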

The topics we discussed fell roughly into three categories: programming
abstractions and front-end, middle-end, and back-end.  In the first
category, OpenMP and OpenACC are the most obvious candidates that we
could support; they have a lot of similarities, but also differ in a few
aspects, such as how they treat potential aliasing between data on the
host and on the accelerator: OpenACC specifies that copying happens,
whereas OpenMP specifies a mapping between the data (which does not
require copying and thus allows data to remain in the same place).  We
want interoperability in this regard, and at least for the
OpenACC/OpenMP integration, libgomp is probably the centerpiece.
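The practical difference between copying and mapping shows up when a
program (perhaps unintentionally) relies on aliasing.  The following C
sketch models it without any real accelerator.  `dev_memory_kind`,
`model_map_to`, `model_unmap`, and `host_view_after_region` are
hypothetical, and the model only reflects the two semantics as
summarized above, not every detail of either specification.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of how an accelerator sees host data. */
typedef enum { DEV_SEPARATE_MEMORY, DEV_SHARED_MEMORY } dev_memory_kind;

/* Give the "device" its view of a host buffer.  With separate memory the
   data is copied (the copying semantics described for OpenACC); with
   shared memory a mapping may simply reuse the host storage (which the
   OpenMP mapping semantics allow). */
void *model_map_to(dev_memory_kind kind, void *host, size_t bytes)
{
    if (kind == DEV_SHARED_MEMORY)
        return host;                    /* no copy: device view aliases host */
    void *dev = malloc(bytes);
    if (dev != NULL)
        memcpy(dev, host, bytes);
    return dev;
}

/* End the mapping without copying anything back (like copyin / map(to:)). */
void model_unmap(dev_memory_kind kind, void *dev)
{
    if (kind == DEV_SEPARATE_MEMORY)
        free(dev);
}

/* A "kernel" writes to data that was only mapped to the device.
   Returns the value the host observes afterwards, or -1 if the
   simulated device allocation fails. */
int host_view_after_region(dev_memory_kind kind)
{
    int data = 1;
    int *dev = model_map_to(kind, &data, sizeof data);
    if (dev == NULL)
        return -1;
    *dev = 2;                           /* the region modifies its view */
    model_unmap(kind, dev);
    return data;                        /* 1 if copied, 2 if aliased */
}
```

With separate device memory the host still sees 1 after the region,
whereas with shared memory it sees 2; code whose result depends on this
difference is where interoperability between the two models needs care.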

It was also noted that the ISO C++ committee does not seem very
favorable toward including OpenMP-like constructs in the standard;
however, neither the ISO C++ nor the ISO C study group on parallelism is
currently working on support for accelerators.  Furthermore, while
auto-parallelization and auto-vectorization are very useful features,
they also have their limits, and there is no reason not to support
standards such as OpenMP or OpenACC.
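One concrete limit of auto-vectorization: compilers generally may not
reorder floating-point additions on their own, because that changes
rounding, so a plain sum loop often stays scalar unless options such as
`-ffast-math` permit reassociation.  OpenMP 4.0's `simd` construct lets
the programmer state explicitly that reassociating the reduction is
acceptable.  A minimal sketch (the function name is illustrative):

```c
#include <stddef.h>

/* Sum of x[0..n-1].  The simd construct with a reduction clause tells
   the compiler it may split the sum into vector lanes and combine them
   at the end, i.e., reassociate the additions.  Without OpenMP enabled
   the pragma is ignored and the loop is compiled as ordinary code. */
double sum_simd(const double *x, size_t n)
{
    double sum = 0.0;
    #pragma omp simd reduction(+: sum)
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```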

Regarding the middle-end, the most interesting topic is how to represent
accelerator constructs in the intermediate representation.  We didn't
discuss in detail how to do this, but there was strong support for
striving for one way of representing accelerator code that's common
across the several programming abstractions / frontends we might
support.  It was noted that finding the right semantics to target
internally would be key for this.  The various GCC accelerator efforts
are all in a rather early "experimentation stage"; thus, to find the
right general abstraction for describing any kind of parallel region,
it could make sense to first try implementing, for example, the OpenACC
support using the existing OpenMP constructs (and extending them as
necessary).
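A hedged sketch of what implementing OpenACC via OpenMP constructs could
mean for data clauses: several OpenACC 2.0 data clauses have a close
OpenMP 4.0 `map` counterpart, while others would need an extension.  The
table and `openmp_equivalent` are hypothetical, and the correspondence
ignores details such as how each model treats data that is already
present on the device.

```c
#include <string.h>

/* Hypothetical lowering table from OpenACC data clauses to OpenMP 4.0
   map kinds.  A NULL entry marks a clause with no direct OpenMP 4.0
   counterpart, i.e., where the OpenMP constructs would need extending. */
struct clause_mapping {
    const char *openacc_clause;
    const char *openmp_map;
};

static const struct clause_mapping acc_to_omp[] = {
    { "copy",    "map(tofrom:)" },  /* copy in before, copy out after */
    { "copyin",  "map(to:)"     },  /* copy in before only */
    { "copyout", "map(from:)"   },  /* copy out after only */
    { "create",  "map(alloc:)"  },  /* allocate on the device, no copies */
    { "present", NULL           },  /* data must already be on the device */
};

/* Return the OpenMP map kind for an OpenACC clause name, or NULL if the
   clause has no direct equivalent or is not in the table. */
const char *openmp_equivalent(const char *clause)
{
    for (size_t i = 0; i < sizeof acc_to_omp / sizeof acc_to_omp[0]; i++)
        if (strcmp(acc_to_omp[i].openacc_clause, clause) == 0)
            return acc_to_omp[i].openmp_map;
    return NULL;
}
```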

The main issue we discussed in the backend category was how to target
more than one ISA when generating code (i.e., we need code in the host's
ISA and in the accelerators' (virtual) ISAs).  Multi-target support in
GCC might be one option, but it would likely take a long time to
implement, so depending on it would probably delay the accelerator
efforts.  It might be simpler to stream code several times to different
backends using the LTO infrastructure.  However, there is a (likely)
risk that the code already contains target dependencies before LTO; we
concluded that for this approach to work, the target ABIs would thus
need to be sufficiently compatible (e.g., regarding data type sizes).
A third option, which SuSE is experimenting with, is not to write a new
backend but instead to generate code right after the last GIMPLE pass;
however, HSAIL needs register allocation, so it was noted that writing a
lightweight backend might be easier.
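The ABI-compatibility condition can be made concrete with a small
check; `struct target_abi` and `abi_compatible` are hypothetical, and
real ABIs differ in more than these four properties.  The idea is that
values such as a folded `sizeof(long)` or a computed struct field offset
may already be baked into the IR before LTO streaming, so they must mean
the same thing for every target that later receives the code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of target properties that front-end and early
   middle-end passes may already have baked into the IR (for example,
   folded sizeof expressions or computed struct field offsets). */
struct target_abi {
    const char *name;
    size_t sizeof_int;
    size_t sizeof_long;
    size_t sizeof_pointer;
    size_t alignof_double;
};

/* True if IR produced under the host's ABI could be streamed to the
   accelerator backend without these particular baked-in values
   becoming wrong. */
bool abi_compatible(const struct target_abi *host,
                    const struct target_abi *accel)
{
    return host->sizeof_int == accel->sizeof_int
        && host->sizeof_long == accel->sizeof_long
        && host->sizeof_pointer == accel->sizeof_pointer
        && host->alignof_double == accel->alignof_double;
}
```

For example, an LP64 host (4-byte `int`, 8-byte `long` and pointers)
would be compatible with an accelerator target using the same sizes, but
not with a hypothetical accelerator whose `long` and pointers are 4
bytes, because any `sizeof(long)` already folded to 8 would be wrong
there.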

We also discussed a few other topics, such as where to store accelerator
code in binaries (e.g., Thomas is thinking about storing the PTX textual
representation in separate ELF segments) and how to dispatch functions
that might have both an accelerator and a host variant (e.g., math
functions).  We didn't have time to discuss other topics, such as how to
debug or link applications with accelerator code.
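A minimal sketch of the dispatch question, assuming a host variant and
an accelerator variant of the same math function.  `exec_target`,
`select_sqrt`, and both variants are hypothetical, and both variants use
the same Newton iteration so that the sketch runs on the host.

```c
/* Hypothetical: where the code calling the math function executes. */
typedef enum { EXEC_HOST, EXEC_ACCEL } exec_target;

typedef double (*unary_fn)(double);

/* Newton's iteration for sqrt(x), x >= 0.  Starting at or above the
   root, the iterates decrease; stop once they no longer do.  The
   iteration cap only guards against non-finite input. */
static double newton_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 2000; i++) {
        double next = 0.5 * (r + x / r);
        if (next >= r)
            break;
        r = next;
    }
    return r;
}

/* Host variant: stands in for the host C library's sqrt. */
double sqrt_host_variant(double x) { return newton_sqrt(x); }

/* Accelerator variant: stands in for a device math-library sqrt that
   would be compiled for the accelerator ISA; here it runs on the host. */
double sqrt_accel_variant(double x) { return newton_sqrt(x); }

/* Pick the variant matching where the calling code executes. */
unary_fn select_sqrt(exec_target where)
{
    return where == EXEC_ACCEL ? sqrt_accel_variant : sqrt_host_variant;
}
```

A real implementation would more likely make this choice when generating
code for each ISA (at compile or link time) rather than through a
runtime function pointer; the sketch only shows that each call site
needs the variant matching the ISA it executes on.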

Thanks to Thomas Schwinge for providing additional notes for this
summary.

Torvald