From: Cesar Philippidis
To: Chung-Lin Tang, gcc-patches, Thomas Schwinge
Subject: Re: [gomp4] OpenACC async re-work
Date: Mon, 26 Jun 2017 22:45:00 -0000
Message-ID: <469ff96f-b7de-cac6-0051-e23f70997cf4@codesourcery.com>

I still need more time to review this, but ...

On 06/24/2017 12:54 AM, Chung-Lin Tang wrote:
> Hi Cesar, Thomas,
> This patch is the re-implementation of OpenACC async we talked about.
> The changes are rather large, so I am putting it here for a few days before
> actually committing them to gomp-4_0-branch. Would appreciate if you guys
> take a look.
>
> To overall describe the highlights of the changes:
>
> (1) Instead of essentially implementing the entire OpenACC async support
> inside the plugin, we now use an opaque 'goacc_asyncqueue' implemented
> by the plugin, along with core 'test', 'synchronize', 'serialize', etc.
> plugin functions. Most of the OpenACC specific logic is pulled into
> libgomp/oacc-async.c

I'm not sure if plugins need to maintain backwards compatibility.
However, I don't see any changes inside libgomp.map, so maybe it's not
required.

> (2) CUDA events are no longer used. The limitation of no CUDA calls inside
> CUDA callbacks were a problem for resource freeing, but we now stash
> them onto the ptx_device and free them later.

Yay!

> (3) For 'wait + async', we now add a local thread synchronize, instead
> of just ordering the streams.
>
> (4) To work with the (3) change, some front end changes were added to
> propagate argument-less wait clauses as 'wait(GOACC_ASYNC_NOVAL)' to
> represent a 'wait all'.

What's the significance of GOMP_ASYNC_NOVAL? Wouldn't it have been
easier to make that change in the gimplifier?

> Patch was tested to have no regressions on gomp-4_0-branch. I'll commit
> this after the weekend (or Tues.)

> * plugin/plugin-nvptx.c (struct cuda_map): Remove.
> (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code.
> (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function.

These two functions seem extremely similar. I wonder if you should
consolidate them.

Overall, I like how you were able to eliminate the externally managed
map_* data structure which was used to pass in arguments to nvptx_exec.
Although I wonder if we should just pass in those individual arguments
directly to cuLaunchKernel. But that's a big change in itself.

Cesar
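
For readers following along, the deferred-free idea in (2) can be sketched roughly as below. This is only an illustration and is not taken from the patch: struct demo_device, demo_stash_block and demo_flush_blocks are hypothetical names, plain free() stands in for whatever driver call releases the resource, and the locking that a real callback thread would need is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node recording one block that still has to be released. */
struct pending_free
{
  void *ptr;
  struct pending_free *next;
};

/* Hypothetical stand-in for the device structure; only the stash is modelled. */
struct demo_device
{
  struct pending_free *free_list;   /* blocks waiting to be released */
  int freed_count;                  /* total blocks released so far */
};

/* Meant to be callable from a completion callback: it makes no driver
   calls, it only pushes the pointer onto the device's list.
   Returns 0 on success, -1 if the list node cannot be allocated.
   NOTE: no locking here; a real multi-threaded version would need it.  */
static int
demo_stash_block (struct demo_device *dev, void *ptr)
{
  assert (dev != NULL);
  struct pending_free *n = malloc (sizeof *n);
  if (n == NULL)
    return -1;
  n->ptr = ptr;
  n->next = dev->free_list;
  dev->free_list = n;
  return 0;
}

/* Called later from ordinary host code, where releasing resources is
   allowed.  Returns the number of blocks released by this call.  */
static int
demo_flush_blocks (struct demo_device *dev)
{
  assert (dev != NULL);
  int released = 0;
  while (dev->free_list != NULL)
    {
      struct pending_free *p = dev->free_list;
      dev->free_list = p->next;
      free (p->ptr);   /* stands in for the real resource release */
      free (p);
      released++;
    }
  dev->freed_count += released;
  return released;
}
```

The point of the split is that the callback side only touches host memory, while the actual release happens at a later point chosen by the host thread, for example before the next allocation or at device shutdown.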