From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 449A2385C676 for ; Fri, 9 Dec 2022 14:44:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 449A2385C676 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670597069; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7cRr8AVIQLHAlckiXnDbYBgpzkIPIFaJxPg/ae2tPRM=; b=E8Tm3zBUzLgbbfk49ADpswz6SqcD4GR+oLyGWexzDWtpxDTsJ/xkm8YBb1MU3N+lKVxyxi Gowc6orpAWXd7Y7XLHYxlXUs+0ZnPSTK+OgqYy33z16W9psNb6027qO+3M6Y0Cpbj5gu9a GKit8Kn3TRDZ+NaHAEGX4t/uRrJyeg4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-639-u_s4n9qTOompPFA9-u00KQ-1; Fri, 09 Dec 2022 09:44:26 -0500 X-MC-Unique: u_s4n9qTOompPFA9-u00KQ-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 8E8EB801231; Fri, 9 Dec 2022 14:44:26 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.195.114]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 4D7D940C2065; Fri, 9 Dec 2022 14:44:26 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 2B9EiLBT2009055 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Fri, 9 Dec 2022 15:44:21 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 2B9EiKTX2009054; Fri, 9 Dec 2022 15:44:20 +0100 Date: Fri, 9 Dec 2022 15:44:19 +0100 From: Jakub Jelinek To: Tobias Burnus Cc: gcc-patches Subject: Re: [Patch] libgomp: Handle OpenMP's reverse offloads Message-ID: Reply-To: Jakub Jelinek References: <0567b7c6-fede-72b8-63d1-1fc10dca36a0@codesourcery.com> MIME-Version: 1.0 In-Reply-To: <0567b7c6-fede-72b8-63d1-1fc10dca36a0@codesourcery.com> X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, Dec 06, 2022 at 08:45:07AM +0100, Tobias Burnus wrote: > 32bit vs. 64bit: libgomp itself is compiled with both -m32 and -m64; however, > nvptx and gcn requires -m64 on the device side and assume that the device > pointers are representable on the host (i.e. all are 64bit). The new code > tries to be in principle compatible with uint32_t pointers and uses uint64_t > to represent it consistently. – The code should be mostly fine, except that > one called function requires an array of void* and size_t. Instead of handling > that case, I added some code to permit optimizing away the function content > without offloading - and a run-time assert if it should ever happen that this > function gets called on a 32bit host from the target side. I think we just shouldn't support libgomp plugins for 32-bit libgomp, only host fallback. If you want offloading, use 64-bit host... > libgomp: Handle OpenMP's reverse offloads > > This commit enabled reverse offload for nvptx such that gomp_target_rev > actually gets called. And it fills the latter function to do all of > the following: finding the host function to the device func ptr and > copying the arguments to the host, processing the mapping/firstprivate, > calling the host function, copying back the data and freeing as needed. > > The data handling is made easier by assuming that all host variables > either existed before (and are in the mapping) or that those are > devices variables not yet available on the host. Thus, the reverse > mapping can do without refcounts etc. Note that the spec disallows > inside a target region device-affecting constructs other than target > plus ancestor device-modifier and it also limits the clauses permitted > on this construct. > > For the function addresses, an additional splay tree is used; for > the lookup of mapped variables, the existing splay-tree is used. > Unfortunately, its data structure requires a full walk of the tree; > Additionally, the just mapped variables are recorded in a separate > data structure an extra lookup. While the lookup is slow, assuming > that only few variables get mapped in each reverse offload construct > and that reverse offload is the exception and not performance critical, > this seems to be acceptable. > > libgomp/ChangeLog: > > * libgomp.h (struct target_mem_desc): Predeclare; move > below after 'reverse_splay_tree_node' and add rev_array > member. > (struct reverse_splay_tree_key_s, reverse_splay_compare): New. > (reverse_splay_tree_node, reverse_splay_tree, > reverse_splay_tree_key): New typedef. > (struct gomp_device_descr): Add mem_map_rev member. > * oacc-host.c (host_dispatch): NULL init .mem_map_rev. > * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim > support for GOMP_REQUIRES_REVERSE_OFFLOAD. > * splay-tree.h (splay_tree_callback_stop): New typedef; like > splay_tree_callback but returning int not void. > (splay_tree_foreach_lazy): Define; like splay_tree_foreach but > taking splay_tree_callback_stop as argument. > * splay-tree.c (splay_tree_foreach_internal_lazy, > splay_tree_foreach_lazy): New; but early exit if callback returns > nonzero. > * target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'. > (gomp_map_lookup_rev): New. > (gomp_load_image_to_device): Handle reverse-offload function > lookup table. > (gomp_unload_image_from_device): Free devicep->mem_map_rev. > (struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup, > gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int, > gomp_map_cdata_lookup): New auxiliary structs and functions for > gomp_target_rev. > (gomp_target_rev): Implement reverse offloading and its mapping. > (gomp_target_init): Init current_device.mem_map_rev.root. > * testsuite/libgomp.fortran/reverse-offload-2.f90: New test. > * testsuite/libgomp.fortran/reverse-offload-3.f90: New test. > * testsuite/libgomp.fortran/reverse-offload-4.f90: New test. > * testsuite/libgomp.fortran/reverse-offload-5.f90: New test. > * testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without > mapping of on-device allocated variables. > + /* Likeverse for the reverse lookup device->host for reverse offload. */ Likewise > + reverse_splay_tree_node rev_array; Do we need reverse_splay_tree* stuff in libgomp.h? As splay_tree_node is just a pointer, perhaps just struct reverse_splay_tree_node_s; early and struct reverse_splay_tree_node_s *rev_array; in libgomp.h and include the extra splay-tree.h only in target.c? Unless one needs it anywhere else... Otherwise LGTM. Jakub