From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 10095 invoked by alias); 23 Aug 2013 09:52:56 -0000 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org Received: (qmail 10085 invoked by uid 89); 23 Aug 2013 09:52:56 -0000 X-Spam-SWARE-Status: No, score=-7.6 required=5.0 tests=AWL,BAYES_00,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Fri, 23 Aug 2013 09:52:55 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r7N9qsk6000724 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 23 Aug 2013 05:52:54 -0400 Received: from zalov.cz (vpn1-4-247.ams2.redhat.com [10.36.4.247]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r7N9qpFC004018 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 23 Aug 2013 05:52:52 -0400 Received: from zalov.cz (localhost [127.0.0.1]) by zalov.cz (8.14.5/8.14.5) with ESMTP id r7N9qoqm004989; Fri, 23 Aug 2013 11:52:51 +0200 Received: (from jakub@localhost) by zalov.cz (8.14.5/8.14.5/Submit) id r7N9qosw004988; Fri, 23 Aug 2013 11:52:50 +0200 Date: Fri, 23 Aug 2013 12:37:00 -0000 From: Jakub Jelinek To: "Michael V. Zolotukhin" Cc: Kirill Yukhin , Richard Henderson , gcc@gcc.gnu.org, triegel@redhat.com Subject: Re: [RFC] Offloading Support in libgomp Message-ID: <20130823095250.GJ1814@tucnak.redhat.com> Reply-To: Jakub Jelinek References: <20130822140810.GA27868@msticlxl57.ims.intel.com> <20130822142814.GB1814@tucnak.redhat.com> <20130823092810.GA36483@msticlxl57.ims.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130823092810.GA36483@msticlxl57.ims.intel.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-SW-Source: 2013-08/txt/msg00271.txt.bz2 On Fri, Aug 23, 2013 at 01:28:10PM +0400, Michael V. Zolotukhin wrote: > Sure, I used '#pragma omp target' just for a simple example. The > question about '#pragma omp target data' is still open. As far as I > understand, all three of the pragmas could require data marshalling (but > not necessary - 'omp target', if it's located inside 'omp target data' > which specifies all needed for 'omp dtarget' variables, won't need any > data marshalling - right?). This data movement could be done, as you > noted by a single call or a set of calls (one for each clause) - and > while single call seems appealing, it could be better to use separate > calls in case, when e.g. we have a 'target update' for only a subset of > all described in 'target data' variables. One-call approach has > difficulties with specifying, which subset of the data we want to > update. The single call approach would just be passed array of control structures that would describe each of the MAP clauses, and you'd simply loop over them; the standard requires that if some object is already mapped into the device, then nothing is performed (well, target update is an exception). So you'd have some data structure that maps [address,address+length) intervals for each device into target addresses and start with doing a lookup on that (note, if [addr,addr+6) -> devaddr is already in, then addr+4 should be just mapped to devaddr+4). If not found, you'd allocate the memory on the device, add into the mapping data structure and record in a vector for the target/target data in question, so that on exit from that it would be copied back if requested, and deallocated. I don't see how single call vs. multiple calls would change anything on that. Plus a single call allows the device id to be looked up just once and treat all the mappings as a batch (GOMP_target_data would probably need corresponding GOMP_target_data_end call, perhaps just with a device_id and would pop from the stack of pending target data regions for that device. [B > > I'd prefer GOMP_target instead of GOMP_offload for the function name, to > > make it clearly related to the corresponding directive. > That makes sense - I just used first name that came to my mind here. > > > > GOMP_offload is a call to libgomp, which will be implemented somehow like this: > > > void GOMP_offload (void (*fn)(void*), void *data, const char *fname) > > > { > > > if (gomp_offload_available ()) > > > > This really isn't just check whether accelerator is available, we need to > > query all accelerators in the system (and cache that somehow in the > > library), assign device numbers to individual devices (say, you could have > > two Intel MIC cards, one AMD HSAIL capable GPGPU and 4 Nvidia PTX capable > > GPGPUs or similar), ensure that already assigned device numbers aren't > > reused when discovering new ones and then just check what device user > > requested (if not available, fall back to host), next check see if we > > have corresponding code for that accelerator (again, fallback to host > > otherwise), optionally compile the code if not compiled yet (HSAIL/PTX code > > only) then finally do the name lookup and spawn it. > Multi-target option arises another bunch of questions:) Could you > please check if my vision of how GCC would handle multiple offload > targets? Here it is: > We have GCC with a set of plugins for compiling code for each available > offloading target. These plugins work similarly to lto-plugin, i.e. > they consume gimple as the input, but produce a code for the specific > target. Libgomp also has similar set of plugins for HW specific > implementation of functions for remote running code, data transferring, > getting device status etc. > For example, for Intel MIC, AMD HSAIL and Nvidia PTX we'll have host-GCC > with three plugins and host-libgomp, also with three plugins. > Invoking GCC with options like '-mmic', '-mhsail', '-mptx' triggers > usage of a corresponding plugins in GCC driver. In result, after the > compilation we'd have four binaries: one for host and three for possible > targets. I meant just a single plugin that would handle all of them, or as richi said, perhaps teach LTO plugin to do that. For options, my vision was something like: -ftarget=mic -ftarget=hsail='-mfoobaz=4 -mbazbaz' which would mean: 1) compile LTO IL from the accelerator section for mic with the originally recorded gcc command line options with the Target options removed and no extra options added 2) compile LTO IL also for hsail target, with originally recorded gcc command line options but Target options and -mfoobaz=4 -mbazbaz options added 3) don't compile for ptx The thing is if you originally compile with -O3 -ftree-vectorize -march=corei7-avx -minline-all-stringops the -m* options might not apply to the target compiler at all. So you'd construct the command line from the original command line sans CL_TARGET options, append to that the link time override for the accelerator. Then another thing is how to find out the corresponding compiler (including its driver) for the target from the plugin. > Now, libgomp part. The host binary consists calls to libgomp.so, which > is target-independent (i.e. it's host-specific). It should be able to > call plugins for all three targets, so in functions like > gomp_offload_available it probably would iterate through all available > plugins, asking for device status and the code availability. This > iterating would include dlopen of the corresponding plugin, calling a > function from it and moving to the next plugin. > Is this correct? libgomp would start by trying to dlopen all available plugins, and for each of them call some routine in them that would query the hw for available devices, then libgomp would assign device ids to them (0 and up) and then for target specific parts just dispatch again to the plugin corresponding to the chosen device. Jakub