From: Torvald Riegel
To: gcc
Subject: Summary of the Accelerator BOF at Cauldron
Date: Mon, 29 Jul 2013 13:18:00 -0000
Message-ID: <1375103926.7129.7694.camel@triegel.csb>

The Accelerator BOF at the GNU Tools Cauldron was worthwhile. Several
people presented their current work or announced upcoming projects in
the accelerator space. There was significant interest from the Cauldron
attendees; we had about 40 people in the room on the first day, and
about 20 in the follow-up session that we had to schedule on the next
day due to interest in further discussion.
How to support accelerators is a very broad topic, so we often didn't
have time to discuss everything in detail, nor all the subtopics we
would have liked to talk about -- but we covered the main issues, and
got the discussion started.

Thomas Schwinge (Mentor Graphics) talked about their plans to support
OpenACC in GCC; OpenACC constructs and the enclosed code are transformed
into NVIDIA's PTX virtual ISA and calls into CUDA libraries.

Martin Jambor (SuSE) is working on expanding GIMPLE into HSAIL, the HSA
virtual ISA. They are currently focusing on code suitable for kernels
(i.e., the parallel tasks that one could send off to an accelerator) and
haven't started looking at transforming programming-language-level
parallelism constructs (e.g., parallel loops) into code that makes use
of the HSA queues or runtime.

Kirill Yukhin (Intel) gave an overview of how offloading works in the
current generation of MIC accelerators.

Jakub Jelinek (Red Hat) has been working on OpenMP 4.0, including the
parsing of OpenMP 4.0's accelerator-related constructs. However, any
such accelerator code isn't offloaded to an actual accelerator yet but
is still executed on the host.

The topics we discussed fell roughly into three categories: programming
abstractions and front-end, middle-end, and back-end.

In the first category, OpenMP and OpenACC are the most obvious
candidates that we could support; they have a lot of similarities, but
also differ in a few aspects such as how they treat potential aliasing
between data on the host and on the accelerator: OpenACC specifies that
copying happens, whereas OpenMP specifies a mapping between the data
(which does not require copying, and would allow data to remain at the
same place). We want interoperability in this regard, and at least for
the OpenACC/OpenMP integration, libgomp is probably the centerpiece.
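
To make the copying-versus-mapping distinction concrete, here is a
minimal sketch in C of an OpenMP 4.0 target region. It was not shown at
the BOF; the function name saxpy and its parameters are made up for
illustration. The map clauses describe a host/device correspondence for
the arrays, and an implementation is free to satisfy it without a
physical copy. The comment shows what an OpenACC spelling of the same
loop might look like, with its data clauses expressed as copies.

```c
#include <stddef.h>

/*
 * Hypothetical example: y = a * x + y over n elements.
 *
 * With OpenMP enabled, the loop is wrapped in a target region:
 *   map(to: ...)     - x is made available on the device
 *   map(tofrom: ...) - y is made available and its result is visible
 *                      on the host afterwards
 * Without OpenMP (or without an offload device), the same loop simply
 * runs on the host.
 *
 * A roughly equivalent OpenACC spelling (assumption, not compiled here):
 *   #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
 */
void saxpy(size_t n, float a, const float *x, float *y)
{
#ifdef _OPENMP
#pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
#pragma omp parallel for
#endif
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Calling saxpy(4, 2.0f, x, y) gives the same result whether or not the
region is actually offloaded, which matches the current state described
above, where the constructs are parsed but the code still runs on the
host.
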
It was also noted that the ISO C++ committee doesn't seem to be very
favorable toward including OpenMP-like constructs in the standard;
however, neither the ISO C++ nor the C study group on parallelism is
currently working on support for accelerators. Furthermore, while
auto-parallelization (or auto-vectorization) are very useful features,
they also have their limits, and there's no reason not to support
standards such as OpenMP or OpenACC.

Regarding the middle-end, the most interesting topic is how to represent
accelerator constructs in the intermediate representation. We didn't
discuss in detail how to do this, but there was strong support for
striving for one way of representing accelerator code that's common
across the several programming abstractions / frontends we might
support. It was noted that finding the right semantics to target
internally would be key for this. The various GCC accelerator efforts
are all in a rather early "experimentation stage"; thus, to find the
right general abstraction for how to describe any kind of parallel
regions, it could make sense to first try implementing, for example, the
OpenACC support by making use of existing OpenMP constructs (and
extending them as necessary).

The main issue we discussed in the backend category was how to target
more than one ISA when generating code (i.e., we need code in the host's
ISA and in the accelerator(s)' (virtual) ISA(s)). Multi-target support
in GCC might be one option, but would probably need quite some time, and
thus depending on it would probably delay the accelerator efforts. It
might be simpler to stream code several times to different backends
using the LTO infrastructure. However, there is a (likely) risk of
having target dependencies in the code before LTO; we concluded that for
this approach to work, the target ABIs would thus need to be
sufficiently compatible (e.g., regarding data type sizes).
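
The kind of ABI compatibility meant here can be illustrated with a small
C sketch. It is not part of any GCC infrastructure; the struct abi_sizes
and both functions are hypothetical names introduced only for this
example. It compares the sizes of a few basic types between two targets;
if they differ, code that was lowered with one target's sizes baked in
could not safely be streamed to the other.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the basic type sizes of one target. */
struct abi_sizes {
    size_t int_size;
    size_t long_size;
    size_t pointer_size;
    size_t double_size;
};

/* Sizes of the target this file is compiled for (the "host"). */
struct abi_sizes host_abi_sizes(void)
{
    struct abi_sizes s = {
        sizeof(int), sizeof(long), sizeof(void *), sizeof(double)
    };
    return s;
}

/* Returns true only if every recorded type size agrees. */
bool abi_sizes_match(const struct abi_sizes *a, const struct abi_sizes *b)
{
    return a->int_size == b->int_size
        && a->long_size == b->long_size
        && a->pointer_size == b->pointer_size
        && a->double_size == b->double_size;
}
```

A real check would have to cover much more than type sizes (alignment,
struct layout, calling conventions), so this only shows the simplest of
the dependencies mentioned above.
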
A third option that SuSE is experimenting with is not writing a new
backend but instead generating code right after the last GIMPLE pass;
however, HSAIL needs register allocation, so it was noted that writing a
light-weight backend might be easier.

We also discussed a few other topics, such as where to store accelerator
code in binaries (e.g., Thomas is thinking about storing the PTX textual
representation in separate ELF segments), or how to deal with
dispatching of functions that might have both an accelerator and a host
variant (e.g., math functions).

We didn't have time to discuss other topics such as how to debug or link
applications with accelerator code.

Thanks to Thomas Schwinge for providing additional notes for this
summary.

Torvald