From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 8DA5E3858C52 for ; Mon, 13 Nov 2023 10:59:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8DA5E3858C52 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 8DA5E3858C52 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=68.232.129.153 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699873173; cv=none; b=KGp6Nx6bkcgehAIPJPm3dRnSS25+SIS6pg7WGF6Ds55xs35DoXd6DKrWgOXaCDDtYjb394Zm9aVpXvYMAJOa0iB3OLPJau19beNfOTLhZB8ejWRvrgM1XA9OEYk00r5ZfSWOQwRHL+fkwjbahD/RTVnniLsX8aBVupL+I/t6B+0= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699873173; c=relaxed/simple; bh=2EGdsb9gmKBLddKcHzEUaA8cD4KzrPfMRlPUDw60zOU=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=n3lgH1/jIdlfr2j6X1BtpJMTJjaAId1TDipXpJfmssc2q33t3/yDvlU54E/vJy5f9IoMTt8HDjNxz7ODr1hs5GHjxm/ZiVtFq5ZIHXdhQPvq5MF/gW2KGKSTkd9DRjEpHlAInU01XpXpUiFwv6Z6lBoRKpV2UAsLlW1DUXLmytg= ARC-Authentication-Results: i=1; server2.sourceware.org X-CSE-ConnectionGUID: Vl0+HuhOQUGfj6uadDmBkA== X-CSE-MsgGUID: UKn6IIY+Tdq4HK3JnlryCA== X-IronPort-AV: E=Sophos;i="6.03,299,1694764800"; d="scan'208";a="25571652" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 13 Nov 2023 02:59:30 -0800 IronPort-SDR: NMjqh7zbXnsNLtuDxVTmtvJ6R2IXzmYdcOjgNhdrSFVuVeY265u6uB1ctQyDs9ZW9Fcn4xieDP P0pvkc3TrVlRNoUIbaM2XbMEuqXVcpyG/K1YysKIWJ6jN26YRQbXINUN4wEvqbpXkBrqUIW672 HXD/ADnQdpBKv8bHZTs4CKSZHjVtiVtNqeKA1xG3Pdxp9tKNm8KpIUlIZFKC5zypKKf4M9hxKD /8837XayyuefjhnHY1s/9tWknmRSn7qsDvSuTAXjQVZ5fzD0ZqmLjJ0sqLGtxNCQo2zmacNEm5 +9Q= From: Thomas Schwinge To: Tobias Burnus , Kwok Cheung Yeung CC: , Jakub Jelinek Subject: Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++ In-Reply-To: References: <37f412ee-58e7-4bde-a763-591268e8f8f4@codesourcery.com> <87wmurru61.fsf@euler.schwinge.homeip.net> User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/28.2 (x86_64-pc-linux-gnu) Date: Mon, 13 Nov 2023 11:59:22 +0100 Message-ID: <875y25udf9.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-12.mgc.mentorg.com (139.181.222.12) To svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) X-Spam-Status: No, score=-5.8 required=5.0 tests=BAYES_00,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,SPF_HELO_PASS,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi! On 2023-11-09T17:00:11+0100, Tobias Burnus wrote: > On 09.11.23 13:24, Thomas Schwinge wrote: >> Also, assuming that the order of appearance of 'IND_FUNC_MAP' does matte= r >> as it does for 'FUNC_MAP', ... https://github.com/MentorEmbedded/nvptx-t= ools/pull/29 ... > > It should matter. Thus, we should indeed update nvptx-tools for this! > > For hello-world it probably does not show up that easily as there are > only very few such tagged functions. But especially once it gets used > for C++ virtual functions, the number of function will be that large > that the ordering issue is likely to occur in the real world. > > (I shouldn't have missed this =E2=80=93 given that I debugging and report= ed the > original issue.) Let's blame this on the inadequate general (non-)handling of these directives in nvptx-tools 'as' -- which, as I said, I'll address once Kwok has fixed this specific issue (with test case, please). (Generalize/refactor after fixing specific issue.) > (BTW: OMP_CLAUSE_INDIRECT is only used intermittendly in the C/C++ FEs > and not in the ME as it is soon turned into an attribute string.) OK, that does explain: >> I would've assumed handling for 'OMP_CLAUSE_INDIRECT' to also be >> necessary in the following places: >> >> - 'gcc/c-family/c-omp.cc:c_omp_split_clauses' > "split_clauses" applies only to combined composite constructs like > 'target'+'parallel' +'for' + 'simd' where clauses have to be added to > the right constituent clause(s). Declarative directives cannot be combine= d. >> - 'gcc/cp/pt.cc:tsubst_omp_clauses', >> - 'gcc/gimplify.cc:gimplify_scan_omp_clauses', >> 'gcc/gimplify.cc:gimplify_adjust_omp_clauses' >> - 'gcc/omp-low.cc:scan_sharing_clauses' (twice) >> - 'gcc/tree-nested.cc:convert_nonlocal_omp_clauses', >> 'gcc/tree-nested.cc:convert_local_omp_clauses' >> - 'gcc/tree-pretty-print.cc:dump_omp_clause' ... this. > Most of those seem to relate to executable directives (That remark I don't understand.) > =E2=80=93 and not to > declarative ones, where we attach DECL_ATTRIBUTES to a decl and process > them. For functions, the pretty printer prints the attributes. > Here, we use "omp declare target indirect" as attribute. ACK. > We use noclone,noinline attributes for 'declare target', thus, there > should be no issue on this side and regarding tsubst_omp_clauses, as the > clause is either present or not (either bare or with a parse-time > constant logical), there is not much post processing needed. That's not obvious to the casual reader of GCC source code, though. > Thus, I bet that there is nothing to do for those. > >> Please verify, and add handling as well as test cases as necessary, or, >> as applicable, put 'case OMP_CLAUSE_INDIRECT:' next to >> 'default: gcc_unreachable ();' etc., if indeed that clause is not >> expected there. > > What's the point of having it next to default if it is gcc_unreachable? Instead of "bet", I suggest to document intentions: so that it's clear that 'OMP_CLAUSE_INDIRECT' is not meant to be seen here vs. an accidental omission. > I mean there are several others which shouldn't be needed like > OMP_CLAUSE_DEVICE_TYPE which also does not show up at gcc/cp/pt.cc. Quite possible. :-) I certainly wouldn't object to "handling" those, too. Generally, in my opinion, we should usually see 'case's listed for all clause codes where we 'switch' on them, for example. >>> --- a/libgomp/config/gcn/team.c >>> +++ b/libgomp/config/gcn/team.c >>> @@ -45,6 +46,9 @@ gomp_gcn_enter_kernel (void) >>> { >>> int threadid =3D __builtin_gcn_dim_pos (1); >>> >> Shouldn't this: >> >>> + /* Initialize indirect function support. */ >>> + build_indirect_map (); >>> + >> ... be called inside here: >> >>> if (threadid =3D=3D 0) >>> { >> ..., so that it's only executed by one thread? > (concur) >> Also, for my understanding: why is 'build_indirect_map' done at kernel >> invocation time (here) instead of at image load time? > > The splay_tree is generated on the device itself - and we currently do > not start a kernel during GOMP_OFFLOAD_load_image. We could, the > question is whether it makes sense. (Generating the splay_tree on the > host for the device is a hassle and error prone as it needs to use > device pointers at the end.) Hmm. It seems conceptually cleaner to me to set this up upfront, and avoids potentially slowing down every device kernel invocation (at least another function call, and 'gomp_mutex_lock' check). Though, I agree this may be "in the noise" with regards to all the other stuff going on in 'gomp_gcn_enter_kernel' and elsewhere... What I just realize, what's also unclear to me is how the current implementation works with regards to several images getting loaded -- don't we then overwrite 'GOMP_INDIRECT_ADDR_MAP' instead of (conceptually) appending to it? In the general case, additional images may also get loaded during execution. We thus need proper locking of the shared data structure, uh? Or, can we have separate on-device data structures per image? (I've not yet thought about that in detail.) Relatedly then, when images are unloaded, we also need to remove stale items from the table, and release resources (for example, the 'GOMP_OFFLOAD_alloc' for 'map_target_addr'). >>> +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.= c Another thing regarding this test case: testing '-foffload-options=3Damdgcn-amdhsa=3D-march=3Dgfx900' offloading on our amdnano4 system, I see: +PASS: libgomp.c/../libgomp.c-c++-common/declare-target-indirect-2.c ex= ecution test ..., but: +FAIL: libgomp.c++/../libgomp.c-c++-common/declare-target-indirect-2.c = execution test Memory access fault by GPU node-2 (Agent handle: 0x21b0530) on address = 0x401000. Reason: Page not present or supervisor privilege. Re-running this manually a few times, I got: pass: 5 fail (as above): 3 hang: 1 Otherwise, that system appears to behave normally, and a reboot did not cure this. Gr=C3=BC=C3=9Fe Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstra=C3=9Fe 201= , 80634 M=C3=BCnchen; Gesellschaft mit beschr=C3=A4nkter Haftung; Gesch=C3= =A4ftsf=C3=BChrer: Thomas Heurung, Frank Th=C3=BCrauf; Sitz der Gesellschaf= t: M=C3=BCnchen; Registergericht M=C3=BCnchen, HRB 106955