From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 92CC33858428 for ; Mon, 13 Nov 2023 11:47:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 92CC33858428 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 92CC33858428 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=68.232.129.153 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699876031; cv=none; b=a1uAZ63VOVlUxzQBd/uN0yTyliEuu+qihIzAxt/hbkPf6NRPU4LscqhZdr5rI9FTczns6rhHHrHZgFP4ThMPNh2YZX/+5RoNSBE7UJP8WqpwobIzQf3ix5FJcCVKltZzqi0aR2GusPHG2t8Jc6MN3Vha+XUY2iL4LK2LYIMtFUk= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699876031; c=relaxed/simple; bh=Pk40kkjjgzJekBtmEWGyjTy3856SpemniBN7BakeJGI=; h=Message-ID:Date:MIME-Version:Subject:To:From; b=p9zYjI5vE6Uh4IFx3qbn5xpajDHZSDodXm73hHk36Ev0sjwuJzWYMAGV6fs0XZeDRZJmOdkJjwJPzmka/955Ne4nB7+E6MzvEoiHtiSXZuQYNvurpqkOE9O8kJyJRuCavGj2bkO1XekPnqZ6weQgTrEN8L72/YOhQM+3Bm0nJpo= ARC-Authentication-Results: i=1; server2.sourceware.org X-CSE-ConnectionGUID: nn5vz9yCQxiZUN9yMJfdbA== X-CSE-MsgGUID: /Noa99e0SjqIp8AlOLfzfA== X-IronPort-AV: E=Sophos;i="6.03,299,1694764800"; d="scan'208";a="25575499" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 13 Nov 2023 03:47:08 -0800 IronPort-SDR: +bp7Z94MPSO0iuMS6fIhw0A58f5JUI7IDoKJxEAuHdo8a966Ckedy12OQfXr1RhkWV9iNfQFTx ft6TINjD8VHm+8E7IHdCs/sGHe8nqkPo77IiKO92CVyBHlfzeqXokWI2Y4SzWIIyCzQAWg7ZyL eVzKN5jiQcRpcz1g3uaoYduKP3QPesGHcr8V8fn81DGEfKxYFuVo5tr17WbKOz0s2XhxHAbDxD Jcmnte26G0bVBVGNLTrSxWcK3DTJ6kjEqIR127DXgcxG2FWi+agJY0HAXEa4GfhzZJ6bliyx3v LNw= Message-ID: <7ed57b08-3895-49b8-8b10-642609f0a8d2@codesourcery.com> Date: Mon, 13 Nov 2023 12:47:04 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++ Content-Language: en-US To: Thomas Schwinge , Kwok Cheung Yeung CC: , Jakub Jelinek References: <37f412ee-58e7-4bde-a763-591268e8f8f4@codesourcery.com> <87wmurru61.fsf@euler.schwinge.homeip.net> <875y25udf9.fsf@euler.schwinge.homeip.net> From: Tobias Burnus In-Reply-To: <875y25udf9.fsf@euler.schwinge.homeip.net> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: quoted-printable X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To svr-ies-mbx-12.mgc.mentorg.com (139.181.222.12) X-Spam-Status: No, score=-5.3 required=5.0 tests=BAYES_00,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,SPF_HELO_PASS,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Thomas, On 13.11.23 11:59, Thomas Schwinge wrote: >>> - 'gcc/cp/pt.cc:tsubst_omp_clauses', >>> - 'gcc/gimplify.cc:gimplify_scan_omp_clauses', >>> 'gcc/gimplify.cc:gimplify_adjust_omp_clauses' >>> - 'gcc/omp-low.cc:scan_sharing_clauses' (twice) >>> - 'gcc/tree-nested.cc:convert_nonlocal_omp_clauses', >>> 'gcc/tree-nested.cc:convert_local_omp_clauses' >>> - 'gcc/tree-pretty-print.cc:dump_omp_clause' > [...] >> Most of those seem to relate to executable directives > (That remark I don't understand.) OpenMP classifies directives into categories. The following exists (grep + count of TR12's json/directive/ files.) 2 meta 2 subsidiary 2 utility 4 informational 11 declarative 40 executable where "declare target' + 'indirect' (like declare variant, declare simd) is= declarative, i.e. "declare target directive - A declarative directive that ensures that proce= dures and/or variables can be executed or accessed on a device." For those, we usually add an attribute to the function declaration. And 'ex= ecutive' is defined as "executable directive - A directive that appears in an executable context a= nd results in implementation code and/or prescribes the manner in which associate user co= de must execute." Those either are turned into a libgomp call or transform some code - possib= ly including calling the library. That would be, e.g. 'omp target', 'omp parallel', 'omp= atomic'. The listed functions all deal with executable code, i.e. some tree ..._EXPR= , possibly associated with a structured block (at least the scan_* only apply for block-associate= d directives as they scan of usage inside that block - affecting the directive/the directive's c= lauses). * * * >> We use noclone,noinline attributes for 'declare target', thus, there >> should be no issue on this side and regarding tsubst_omp_clauses, as the >> clause is either present or not (either bare or with a parse-time >> constant logical), there is not much post processing needed. > That's not obvious to the casual reader of GCC source code, though. True, but that's the general problem with code - without background, you don't always understand the flow in the code and when something is called. I think there is no good way how this can be solved; or rather, it can be solved for a specific question but the next person looks for something else and then has the same problem and the previous "solution" (like comment) doesn't help. In some cases, I think it helps to add comments, but I have the feeling that your question is to specific (you look at a single patch) to be really helpful here. But I am happy to be proven wrong and see useful code changes/comments. >> Thus, I bet that there is nothing to do for those. >> >>> Please verify, and add handling as well as test cases as necessary, or, >>> as applicable, put 'case OMP_CLAUSE_INDIRECT:' next to >>> 'default: gcc_unreachable ();' etc., if indeed that clause is not >>> expected there. >> What's the point of having it next to default if it is gcc_unreachable? > Instead of "bet", I suggest to document intentions: so that it's clear > that 'OMP_CLAUSE_INDIRECT' is not meant to be seen here vs. an accidental > omission. Done - in this email thread, which can be found by patch archeology. >> I mean there are several others which shouldn't be needed like >> OMP_CLAUSE_DEVICE_TYPE which also does not show up at gcc/cp/pt.cc. > Quite possible. :-) I certainly wouldn't object to "handling" those, > too. > > Generally, in my opinion, we should usually see 'case's listed for all > clause codes where we 'switch' on them, for example. If there are plenty of 'default:', I am no sure it makes sense. But in general, the question is (for a switch where most 'enum' values are = used) whether it would make more sense to remove the 'default: gcc_unreachable();= '. In that case, we do not handle the others - but the -Wswitch-enum will warn= about it. Thus, a bootstrap build will ensure that all values are covered due to -Wer= ror=3Dswitch-enum. Downside is that without -Werror/bootstrap, it will silently fall through b= ut for normal FE/ME code, it is guaranteed to be bootstrapped and will show up. * * * >>> Also, for my understanding: why is 'build_indirect_map' done at kernel >>> invocation time (here) instead of at image load time? >> The splay_tree is generated on the device itself - and we currently do >> not start a kernel during GOMP_OFFLOAD_load_image. We could, the >> question is whether it makes sense. (Generating the splay_tree on the >> host for the device is a hassle and error prone as it needs to use >> device pointers at the end.) > Hmm. It seems conceptually cleaner to me to set this up upfront, and > avoids potentially slowing down every device kernel invocation (at least > another function call, and 'gomp_mutex_lock' check). Though, I agree > this may be "in the noise" with regards to all the other stuff going on > in 'gomp_gcn_enter_kernel' and elsewhere... I think the most common case is GOMP_INDIRECT_ADDR_MAP =3D=3D NULL. The question is whether the lock should/could be moved inside if (!indirec= t_array) or not. Probably yes: * doing an atomic load for the outer '!indirect array', work on a local arr= ay for the build up and only assign it at the end - and just after the lock check = again whether '!indirect array'. That way, it is lock free once build but when build there is no race. > What I just realize, what's also unclear to me is how the current > implementation works with regards to several images getting loaded -- > don't we then overwrite 'GOMP_INDIRECT_ADDR_MAP' instead of > (conceptually) appending to it? Yes, I think that will happen - but it looks as if the same issue exists also the other code? I think that's not the first variable that has that issue? I think we should try to cleanup that handling, also to support calling a device function in a shared library from a target region in the main program, which currently also fails. All device routines that are in normal static libraries and in the object files of the main program should simply work thanks to offload LTO such that there is only a single GOMP_offload_register_ver call (per device type) and GOMP_OFFLOAD_load_image call (per device). Likewise if the offloading is only done via a single shared library. =E2=80= =94 Any mixing will currently fail, unfortunately. This patch just adds another item which does not handle it properly. (Not good but IMHO also not a showstopper for this patch.) > In the general case, additional images may also get loaded during > execution. We thus need proper locking of the shared data structure, uh? > Or, can we have separate on-device data structures per image? (I've not > yet thought about that in detail.) I think we could - but in the main-program 'omp target' case that calls a shared-library 'declare target' function means that we need to handle multiple GOMP_offload_register_ver / load_image calls such that they can work together. Obviously, it gets harder if the user keeps doing dlopen() / dlclose() of libraries containing offload code where a target/compute region is run before, between, and after those calls (but hopefully not running when calling dlopen/dlclose). > Relatedly then, when images are unloaded, we also need to remove stale > items from the table, and release resources (for example, the > 'GOMP_OFFLOAD_alloc' for 'map_target_addr'). True. I think the general assumption is that images only get unloaded at the very end, which matches most but not all code. Yet another work item. I think we should open a new PR about this topic and collect work items there. >>>> +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2= .c > Another thing regarding this test case: testing > '-foffload-options=3Damdgcn-amdhsa=3D-march=3Dgfx900' offloading on our > amdnano4 system, I see: > ... > Re-running this manually a few times, I got: > pass: 5 > fail (as above): 3 > hang: 1 Looks like something that needs to be investigated. Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstra=C3=9Fe 201= , 80634 M=C3=BCnchen; Gesellschaft mit beschr=C3=A4nkter Haftung; Gesch=C3= =A4ftsf=C3=BChrer: Thomas Heurung, Frank Th=C3=BCrauf; Sitz der Gesellschaf= t: M=C3=BCnchen; Registergericht M=C3=BCnchen, HRB 106955