From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 15601 invoked by alias); 22 Aug 2013 14:28:21 -0000 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org Received: (qmail 15582 invoked by uid 89); 22 Aug 2013 14:28:20 -0000 X-Spam-SWARE-Status: No, score=-7.6 required=5.0 tests=AWL,BAYES_00,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Thu, 22 Aug 2013 14:28:19 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r7MESHbZ006022 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 22 Aug 2013 10:28:17 -0400 Received: from zalov.cz (vpn1-4-190.ams2.redhat.com [10.36.4.190]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r7MESFr1002999 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 22 Aug 2013 10:28:17 -0400 Received: from zalov.cz (localhost [127.0.0.1]) by zalov.cz (8.14.5/8.14.5) with ESMTP id r7MESFJM004652; Thu, 22 Aug 2013 16:28:15 +0200 Received: (from jakub@localhost) by zalov.cz (8.14.5/8.14.5/Submit) id r7MESEfS004651; Thu, 22 Aug 2013 16:28:14 +0200 Date: Fri, 23 Aug 2013 00:22:00 -0000 From: Jakub Jelinek To: "Michael V. Zolotukhin" Cc: Kirill Yukhin , Richard Henderson , gcc@gcc.gnu.org, triegel@redhat.com Subject: Re: [RFC] Offloading Support in libgomp Message-ID: <20130822142814.GB1814@tucnak.redhat.com> Reply-To: Jakub Jelinek References: <20130822140810.GA27868@msticlxl57.ims.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130822140810.GA27868@msticlxl57.ims.intel.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-SW-Source: 2013-08/txt/msg00260.txt.bz2 On Thu, Aug 22, 2013 at 06:08:10PM +0400, Michael V. Zolotukhin wrote: > We're working on design for offloading support in GCC (part of OpenMP4), and I > have a question regarding libgomp part. > > Suppose we expand '#pragma omp target' like we expand '#pragma omp parallel', > i.e. the compiler expands the following code: > #pragma omp target > { > body; > } > to this: > void subfunction (void *data) > { > use data; > body; > } > > setup data; > function_name = "subfunction"; > GOMP_offload (subfunction, &data, function_name); Roughly. We have 3 directives here, #pragma omp target #pragma omp target data #pragma omp target update and all of them have various clauses, some that are allowed at most once (e.g. the device clause, if clause) and others that can be used many times (the data movement clauses). I'd prefer GOMP_target instead of GOMP_offload for the function name, to make it clearly related to the corresponding directive. The question is if we want to emit multiple calls for the single directive, say one for each data movement clause (where for each one we need address, length, direction and some way how to propagate the transformed address to the accelerator code), or if we build an array of the data movement structures and just pass that down to a single routine. Because of the device clause which should be probably passed just as an integer with -1 meaning the default, perhaps single routine might be better. > GOMP_offload is a call to libgomp, which will be implemented somehow like this: > void GOMP_offload (void (*fn)(void*), void *data, const char *fname) > { > if (gomp_offload_available ()) This really isn't just check whether accelerator is available, we need to query all accelerators in the system (and cache that somehow in the library), assign device numbers to individual devices (say, you could have two Intel MIC cards, one AMD HSAIL capable GPGPU and 4 Nvidia PTX capable GPGPUs or similar), ensure that already assigned device numbers aren't reused when discovering new ones and then just check what device user requested (if not available, fall back to host), next check see if we have corresponding code for that accelerator (again, fallback to host otherwise), optionally compile the code if not compiled yet (HSAIL/PTX code only) then finally do the name lookup and spawn it. Stuff specific to the HW should be in libgomp plugins IMHO, so we have one dlopenable module for each of the 3 variants, where one fn in the plugin would be about checking what HW is available, one about trying to run the code etc. Jakub