From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x333.google.com (mail-wm1-x333.google.com [IPv6:2a00:1450:4864:20::333]) by sourceware.org (Postfix) with ESMTPS id EDA28384AB76 for ; Thu, 11 Apr 2024 10:10:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org EDA28384AB76 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=baylibre.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=baylibre.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org EDA28384AB76 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::333 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712830216; cv=none; b=PIrTwfB3s9zOHyuKD8U0vkrIjPDmzq7nkna1X9Pkif0r8dIh7u0fBdsEfh58WjNnToMefGxfxZGIMDAp7EAWfVP5ck3grLzspL+ICvszer+ieoWCwjOb+1V/iAFKxrqneqbmwO1lXaxhjsUvtPGA836SLktfYMmPtZ3+bxdHr5k= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712830216; c=relaxed/simple; bh=KaHmQdxiSk+KX7XGfQV5k5lghpQJNgbKJggsUZOTtlo=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=UZxpKbNrRmiiUWWWx2ZNH8bTt3Qi8NFqdOPntAbQIHfLlF4ndIfBFBVdlgUlmUR3CEPmDV7bAFjqPuQLnOhhxivAuvNQfbn4xRXv6G4p/4iaFO2x066W68OnLl6CGbq8N5kbdo44apOH5ayuIxu5vs3g7ixn1vw+mJRbDh/BfoQ= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x333.google.com with SMTP id 5b1f17b1804b1-41551500a7eso58663695e9.2 for ; Thu, 11 Apr 2024 03:10:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=baylibre-com.20230601.gappssmtp.com; s=20230601; t=1712830211; x=1713435011; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:message-id:date:user-agent :references:in-reply-to:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=qKOM0SmNp5xn17iM95fnl19q9YsRWosHMJsSyNTCj4Y=; b=ak7MOyH9gGjgC7YyDPJAj5gJArRm+EctxkpZdry7/Xdu9obfMsPlr2Ur+GQt7kq4gW cynrOpbUNBzCfID7NNt95u/IsaqLAS/MrsYN2471MSL38aZEjA4gzqA3UwYeZnTF+xCh 7q4aa71AzNa+/qeJy5IoyD+0mwK1dBIZThufBUEOigdH8uFZ/l61oUXce4/XbVbK5htp cJ8R3QT3QG6xINb65chUFhCVxikQCC4RIfJrFHAGey6sUK+FbJ5DMF8SRbDRdK6x+j7T XhU+nKHcQybgRcKSoI13iZ2kANayBkdpIuI2aTy3pV8oEN1xnMk2bXJNOoaMFx+FnQDv qS+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712830211; x=1713435011; h=content-transfer-encoding:mime-version:message-id:date:user-agent :references:in-reply-to:subject:cc:to:from:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=qKOM0SmNp5xn17iM95fnl19q9YsRWosHMJsSyNTCj4Y=; b=fzTELD2ucb0M58FDCL7ItCfh/TJdZDVZjfmPkdkxf8fTS2TX/cTch0Y/2HhSZm+DHD jTk8MfU4xpxOg/Dci+9XNqPjbiCF4mxplBsnqohhiS2AgWQEXkZkRhWrJPo0VKg9FlVT 27qEavctr552B2E5A0Ea37msHHC0N6aOFbAAXKNIVvhkkmrfTjeTztiIvA+GMkZOJ8Qd TZqJQ4BEFNQk/MUd95rydKxM6eQM/1dhIg+ApWPZCj+9O7PovWiz8gWaixEqOTPKzazK dKTiRnoGpaI9hpo/8Up3v6oD9V2ZP0Goo2lhfz3QvT2A0lbpFuWWvQwf/0qJDzKzK3WC /47A== X-Gm-Message-State: AOJu0YwJAKHnt25tCDzAWFZRlmA0phMFuMv0v7+DXWd0r93iHrOJjAKA Y4SKzA+txukZW9PblbJxSMAbIlbQLUZS1Y6aAFwqUtW3EW7FOvU03fcLolV6gwo= X-Google-Smtp-Source: AGHT+IGbqGHoBHIufANgrk/vs0dhnqukk37yo2vhdeHUy0hwVACXkxWHkFyZnqieyk/XGHJ8bCv3aA== X-Received: by 2002:a05:600c:1911:b0:416:5d63:e651 with SMTP id j17-20020a05600c191100b004165d63e651mr3521520wmq.37.1712830211592; Thu, 11 Apr 2024 03:10:11 -0700 (PDT) Received: from euler.schwinge.homeip.net (p200300c8b70ce600fbf8323f8abe0da4.dip0.t-ipconnect.de. [2003:c8:b70c:e600:fbf8:323f:8abe:da4]) by smtp.gmail.com with ESMTPSA id bd12-20020a05600c1f0c00b00416645e7e47sm5109871wmb.13.2024.04.11.03.10.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 11 Apr 2024 03:10:11 -0700 (PDT) From: Thomas Schwinge To: Tobias Burnus , Kwok Cheung Yeung Cc: gcc-patches@gcc.gnu.org, Jakub Jelinek Subject: Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++ In-Reply-To: <7ed57b08-3895-49b8-8b10-642609f0a8d2@codesourcery.com> References: <37f412ee-58e7-4bde-a763-591268e8f8f4@codesourcery.com> <87wmurru61.fsf@euler.schwinge.homeip.net> <875y25udf9.fsf@euler.schwinge.homeip.net> <7ed57b08-3895-49b8-8b10-642609f0a8d2@codesourcery.com> User-Agent: Notmuch/0.30+8~g47a4bad (https://notmuchmail.org) Emacs/29.2 (x86_64-pc-linux-gnu) Date: Thu, 11 Apr 2024 12:10:09 +0200 Message-ID: <87v84odxby.fsf@euler.schwinge.ddns.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-5.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi! I've filed "OpenMP 'indirect' clause: dynamic image loading/unloading" for the following issue: On 2023-11-13T12:47:04+0100, Tobias Burnus wrote: > On 13.11.23 11:59, Thomas Schwinge wrote: >>>> Also, for my understanding: why is 'build_indirect_map' done at kernel >>>> invocation time (here) instead of at image load time? >>> The splay_tree is generated on the device itself - and we currently do >>> not start a kernel during GOMP_OFFLOAD_load_image. We could, the >>> question is whether it makes sense. (Generating the splay_tree on the >>> host for the device is a hassle and error prone as it needs to use >>> device pointers at the end.) >> Hmm. It seems conceptually cleaner to me to set this up upfront, and >> avoids potentially slowing down every device kernel invocation (at least >> another function call, and 'gomp_mutex_lock' check). Though, I agree >> this may be "in the noise" with regards to all the other stuff going on >> in 'gomp_gcn_enter_kernel' and elsewhere... > > I think the most common case is GOMP_INDIRECT_ADDR_MAP =3D=3D NULL. > > The question is whether the lock should/could be moved inside if (!indir= ect_array) > or not. Probably yes: > * doing an atomic load for the outer '!indirect array', work on a local a= rray for > the build up and only assign it at the end - and just after the lock chec= k again > whether '!indirect array'. > > That way, it is lock free once build but when build there is no race. > >> What I just realize, what's also unclear to me is how the current >> implementation works with regards to several images getting loaded -- >> don't we then overwrite 'GOMP_INDIRECT_ADDR_MAP' instead of >> (conceptually) appending to it? > > Yes, I think that will happen - but it looks as if the same issue exists > also the other code? I think that's not the first variable that has that > issue? > > I think we should try to cleanup that handling, also to support calling > a device function in a shared library from a target region in the main > program, which currently also fails. > > All device routines that are in normal static libraries and in the > object files of the main program should simply work thanks to offload > LTO such that there is only a single GOMP_offload_register_ver call (per > device type) and GOMP_OFFLOAD_load_image call (per device). > > Likewise if the offloading is only done via a single shared library. =E2= =80=94 > Any mixing will currently fail, unfortunately. This patch just adds > another item which does not handle it properly. > > (Not good but IMHO also not a showstopper for this patch.) > >> In the general case, additional images may also get loaded during >> execution. We thus need proper locking of the shared data structure, uh? >> Or, can we have separate on-device data structures per image? (I've not >> yet thought about that in detail.) > > I think we could - but in the main-program 'omp target' case that calls > a shared-library 'declare target' function means that we need to handle > multiple GOMP_offload_register_ver / load_image calls such that they can > work together. > > Obviously, it gets harder if the user keeps doing dlopen() / dlclose() > of libraries containing offload code where a target/compute region is > run before, between, and after those calls (but hopefully not running > when calling dlopen/dlclose). > >> Relatedly then, when images are unloaded, we also need to remove stale >> items from the table, and release resources (for example, the >> 'GOMP_OFFLOAD_alloc' for 'map_target_addr'). > > True. I think the general assumption is that images only get unloaded at > the very end, which matches most but not all code. Yet another work item. > > I think we should open a new PR about this topic and collect work items > there. Gr=C3=BC=C3=9Fe Thomas