From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 6E16C3858407 for ; Thu, 9 Nov 2023 12:24:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6E16C3858407 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 6E16C3858407 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=68.232.137.180 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699532693; cv=none; b=fS+lixbUAwLgavB/CzbNcyipIIqrYO37/S43zAQOYvZG+EjUvpAoc5iHd3w9f0phkNS2C5jzbZyaX0zqJjnH3ueKU5aqH/8fYTkiSgsqwGcB3uvfL3Ji1LZGQ8NCaEn+Nm+iwj5SQ/FbwBpKt3ddVfZlFo1hLL6DjXjTxONROMw= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699532693; c=relaxed/simple; bh=2Fq0XVvdE+VJ/byd24yXYdLJeyWU72ha7uwK/zUsTyM=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=JTr1jJdj7Db0XMPk7NVSmy+Y3n1eUfrKlATPFbbzUQfetifNqwTEKMizbsJlX3h8parQd983GpTC0M979rmfTlhFSn3n61+KmHxc01qmf36ppshoUd6NMrQK+Tq8x9731IYn1+yZhmGtVtzHp73fdzxhyAIPvO3eXo2ai6ftIgI= ARC-Authentication-Results: i=1; server2.sourceware.org X-CSE-ConnectionGUID: Lx6NJosyQ7qzSnR/oiobew== X-CSE-MsgGUID: bv/uRjZLRXu+yKqWRsx48Q== X-IronPort-AV: E=Sophos;i="6.03,289,1694764800"; d="scan'208";a="22354167" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 09 Nov 2023 04:24:48 -0800 IronPort-SDR: bIouGaEtF85DjCIIjJfBvDKZdQZ3ZLpSu5RQUal1/CdnPN5EhNatPUjfYtxm57xfBPVCS+LEy1 VHKJjczjir81uNZrwD0aGCSZaWsOzAnnQ5IqFmMXjpUpLqhdzw1pzejBkCH6FFQtrDbmJf/2+1 yfOviyVpLRozPQJutNEP4Rt7n4RA1/CcAUSnsHXjmqbcS2B8zBvrFl7O4YkGAIcKCzp4PKloAJ 1ZRB3qdtS6D98C91esJz0AdjgbgO0f9AZwIrz6zUNGaP0uYsrgT6fxNFBSHggCvRnyvcFOJ3C5 GT4= From: Thomas Schwinge To: Kwok Cheung Yeung CC: Tobias Burnus , , "Jakub Jelinek" Subject: Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++ In-Reply-To: <37f412ee-58e7-4bde-a763-591268e8f8f4@codesourcery.com> References: <37f412ee-58e7-4bde-a763-591268e8f8f4@codesourcery.com> User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/28.2 (x86_64-pc-linux-gnu) Date: Thu, 9 Nov 2023 13:24:38 +0100 Message-ID: <87wmurru61.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-14.mgc.mentorg.com (139.181.222.14) To svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) X-Spam-Status: No, score=-5.8 required=5.0 tests=BAYES_00,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,KAM_SHORT,SPF_HELO_PASS,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Kwok! Nice work! A few comments: On 2023-11-03T19:53:28+0000, Kwok Cheung Yeung wrote= : > Subject: [PATCH] openmp: Add support for the 'indirect' clause in C/C++ > > This adds support for the 'indirect' clause in the 'declare target' > directive. Functions declared as indirect may be called via function > pointers passed from the host in offloaded code. > > Virtual calls to member functions via the object pointer in C++ are > currently not supported in target regions. Similar to how you have it here: > --- a/gcc/config/nvptx/mkoffload.cc > +++ b/gcc/config/nvptx/mkoffload.cc > @@ -51,6 +51,7 @@ struct id_map > }; > > static id_map *func_ids, **funcs_tail =3D &func_ids; > +static id_map *ind_func_ids, **ind_funcs_tail =3D &ind_func_ids; > static id_map *var_ids, **vars_tail =3D &var_ids; > > /* Files to unlink. */ > @@ -302,6 +303,11 @@ process (FILE *in, FILE *out, uint32_t omp_requires) | else if (startswith (input + i, "FUNC_MAP ")) | { > output_fn_ptr =3D true; > record_id (input + i + 9, &funcs_tail); > } > + else if (startswith (input + i, "IND_FUNC_MAP ")) > + { > + output_fn_ptr =3D true; > + record_id (input + i + 13, &ind_funcs_tail); > + } > else > abort (); > /* Skip to next line. */ ..., please also here: > --- a/gcc/config/nvptx/nvptx.cc > +++ b/gcc/config/nvptx/nvptx.cc > @@ -5919,7 +5919,11 @@ nvptx_record_offload_symbol (tree decl) > /* OpenMP offloading does not set this attribute. */ > tree dims =3D attr ? TREE_VALUE (attr) : NULL_TREE; > > - fprintf (asm_out_file, "//:FUNC_MAP \"%s\"", > + fprintf (asm_out_file, "//:"); > + if (lookup_attribute ("omp declare target indirect", > + DECL_ATTRIBUTES (decl))) > + fprintf (asm_out_file, "IND_"); > + fprintf (asm_out_file, "FUNC_MAP \"%s\"", > IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl))); ... maintain separate 'if' branches for 'FUNC_MAP' vs. 'IND_FUNC_MAP', so that we're able to easily locate those with 'grep', for example. Also, assuming that the order of appearance of 'IND_FUNC_MAP' does matter as it does for 'FUNC_MAP', don't we have to handle the former in nvptx-tools 'as', too? For example, see "Ensure :VAR_MAP and :FUNC_MAP are output in order", in particular nvptx-tools commit aa3404ad5a496cda5d79a50bedb1344fd63e8763 "Ensure :VAR_MAP and :FUNC_MAP are output in order, part II [#29]". Please check that, and submit on , if necessary. If yes, maybe that's another nudge towards: "Instead of 'K_comment', a point could be made to have these be represented as their own 'Kind'." That is, '//:' be its own token kind, and handled generically, instead of '//:VAR_MAP', '//:FUNC_MAP', '//:IND_FUNC_MAP' specially/only. I shall then look into that, later. > --- a/gcc/lto-cgraph.cc > +++ b/gcc/lto-cgraph.cc > @@ -68,6 +68,7 @@ enum LTO_symtab_tags > LTO_symtab_edge, > LTO_symtab_indirect_edge, > LTO_symtab_variable, > + LTO_symtab_indirect_function, > LTO_symtab_last_tag > }; I did wonder if that new tag should have "offload" in its name, as that's the only case where it's used? But then I noticed that here ('output_offload_tables'): > @@ -1111,6 +1112,18 @@ output_offload_tables (void) > (*offload_vars)[i]); > } > > + for (unsigned i =3D 0; i < vec_safe_length (offload_ind_funcs); i++) > + { > + symtab_node *node =3D symtab_node::get ((*offload_ind_funcs)[i]); > + if (!node) > + continue; > + node->force_output =3D true; > + streamer_write_enum (ob->main_stream, LTO_symtab_tags, > + LTO_symtab_last_tag, LTO_symtab_indirect_functio= n); > + lto_output_fn_decl_ref (ob->decl_state, ob->main_stream, > + (*offload_ind_funcs)[i]); > + } > + ..., and correspondingly here ('input_offload_tables'): > @@ -1863,6 +1877,19 @@ input_offload_tables (bool do_force_output) > varpool_node::get (var_decl)->force_output =3D 1; > tmp_decl =3D var_decl; > } > + else if (tag =3D=3D LTO_symtab_indirect_function) > + { > + tree fn_decl > + =3D lto_input_fn_decl_ref (ib, file_data); > + vec_safe_push (offload_ind_funcs, fn_decl); > + > + /* Prevent IPA from removing fn_decl as unreachable, since th= ere > + may be no refs from the parent function to child_fn in off= load > + LTO mode. */ > + if (do_force_output) > + cgraph_node::get (fn_decl)->mark_force_output (); > + tmp_decl =3D fn_decl; > + } > else if (tag =3D=3D LTO_symtab_edge) > { > static bool error_emitted =3D false; ..., we're currently using 'LTO_symtab_unavail_node' for 'offload_funcs' and 'LTO_symtab_variable' for 'offload_vars', also with "LTO" (non-"offload") tags, so 'LTO_symtab_indirect_function' isn't any worse. Maybe, though, we should generally have separate tags for offloading use? Possibly aliasing (in value) the LTO ones -- but maybe actually not, to improve "type safety". I shall look into that, later. > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -350,6 +350,9 @@ enum omp_clause_code { > /* OpenMP clause: doacross ({source,sink}:vec). */ > OMP_CLAUSE_DOACROSS, > > + /* OpenMP clause: indirect [(constant-integer-expression)]. */ > + OMP_CLAUSE_INDIRECT, > + > /* Internal structure to hold OpenACC cache directive's variable-list. > #pragma acc cache (variable-list). */ > OMP_CLAUSE__CACHE_, In this position here, isn't 'OMP_CLAUSE_INDIRECT' applicable to the 'OMP_CLAUSE_RANGE_CHECK' in 'gcc/tree.h:OMP_CLAUSE_SIZE' and 'gcc/tree.h:OMP_CLAUSE_DECL': #define OMP_CLAUSE_SIZE(NODE) = \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE), = \ OMP_CLAUSE_FROM, \ OMP_CLAUSE__CACHE_), 1) #define OMP_CLAUSE_DECL(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE), = \ OMP_CLAUSE_PRIVATE, \ OMP_CLAUSE__SCANTEMP_), 0) That's probably not intentional? In that case, maybe simply move it at the end of the clause list? (..., and generally then match that ordering in any 'switch'es, as applicable, and likewise position 'gcc/tree.h:OMP_CLAUSE_INDIRECT_EXPR' correspondingly.) I would've assumed handling for 'OMP_CLAUSE_INDIRECT' to also be necessary in the following places: - 'gcc/c-family/c-omp.cc:c_omp_split_clauses' - 'gcc/cp/pt.cc:tsubst_omp_clauses', - 'gcc/gimplify.cc:gimplify_scan_omp_clauses', 'gcc/gimplify.cc:gimplify_adjust_omp_clauses' - 'gcc/omp-low.cc:scan_sharing_clauses' (twice) - 'gcc/tree-nested.cc:convert_nonlocal_omp_clauses', 'gcc/tree-nested.cc:convert_local_omp_clauses' - 'gcc/tree-pretty-print.cc:dump_omp_clause' Please verify, and add handling as well as test cases as necessary, or, as applicable, put 'case OMP_CLAUSE_INDIRECT:' next to 'default: gcc_unreachable ();' etc., if indeed that clause is not expected there. In this file here: > --- /dev/null > +++ b/libgomp/config/accel/target-indirect.c > @@ -0,0 +1,126 @@ > +/* Copyright (C) 2023 Free Software Foundation, Inc. > + > + Contributed by Siemens. > + > + This file is part of the GNU Offloading and Multi Processing Library > + (libgomp). > + > + Libgomp is free software; you can redistribute it and/or modify it > + under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3, or (at your option) > + any later version. > + > + Libgomp is distributed in the hope that it will be useful, but WITHOU= T ANY > + WARRANTY; without even the implied warranty of MERCHANTABILITY or FIT= NESS > + FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + more details. > + > + Under Section 7 of GPL version 3, you are granted additional > + permissions described in the GCC Runtime Library Exception, version > + 3.1, as published by the Free Software Foundation. > + > + You should have received a copy of the GNU General Public License and > + a copy of the GCC Runtime Library Exception along with this program; > + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > + . */ > + > +#include > +#include "libgomp.h" > + > +#define splay_tree_prefix indirect > +#define splay_tree_c > +#include "splay-tree.h" > + > +volatile void **GOMP_INDIRECT_ADDR_MAP =3D NULL; > + > +/* Use a splay tree to lookup the target address instead of using a > + linear search. */ > +#define USE_SPLAY_TREE_LOOKUP > + > +#ifdef USE_SPLAY_TREE_LOOKUP > + > +static struct indirect_splay_tree_s indirect_map; > +static indirect_splay_tree_node indirect_array =3D NULL; > + > +/* Build the splay tree used for host->target address lookups. */ > + > +void > +build_indirect_map (void) > +{ > + size_t num_ind_funcs =3D 0; > + volatile void **map_entry; > + static int lock =3D 0; /* =3D=3D gomp_mutex_t lock; gomp_mutex_init (&= lock); */ > + > + if (!GOMP_INDIRECT_ADDR_MAP) > + return; > + > + gomp_mutex_lock (&lock); > + > + if (!indirect_array) > + { > + /* Count the number of entries in the NULL-terminated address map.= */ > + for (map_entry =3D GOMP_INDIRECT_ADDR_MAP; *map_entry; > + map_entry +=3D 2, num_ind_funcs++); > + > + /* Build splay tree for address lookup. */ > + indirect_array =3D gomp_malloc (num_ind_funcs * sizeof (*indirect_= array)); > + indirect_splay_tree_node array =3D indirect_array; > + map_entry =3D GOMP_INDIRECT_ADDR_MAP; > + > + for (int i =3D 0; i < num_ind_funcs; i++, array++) > + { > + indirect_splay_tree_key k =3D &array->key; > + k->host_addr =3D (uint64_t) *map_entry++; > + k->target_addr =3D (uint64_t) *map_entry++; > + array->left =3D NULL; > + array->right =3D NULL; > + indirect_splay_tree_insert (&indirect_map, array); > + } > + } > + > + gomp_mutex_unlock (&lock); > +} > + > +void * > +GOMP_target_map_indirect_ptr (void *ptr) > +{ > + /* NULL pointers always resolve to NULL. */ > + if (!ptr) > + return ptr; > + > + assert (indirect_array); > + > + struct indirect_splay_tree_key_s k; > + indirect_splay_tree_key node =3D NULL; > + > + k.host_addr =3D (uint64_t) ptr; > + node =3D indirect_splay_tree_lookup (&indirect_map, &k); > + > + return node ? (void *) node->target_addr : ptr; > +} > + > +#else > + > +void > +build_indirect_map (void) > +{ > +} > + > +void * > +GOMP_target_map_indirect_ptr (void *ptr) > +{ > + /* NULL pointers always resolve to NULL. */ > + if (!ptr) > + return ptr; > + > + assert (GOMP_INDIRECT_ADDR_MAP); > + > + for (volatile void **map_entry =3D GOMP_INDIRECT_ADDR_MAP; *map_entry; > + map_entry +=3D 2) > + if (*map_entry =3D=3D ptr) > + return (void *) *(map_entry + 1); > + > + return ptr; > +} > + > +#endif ..., I'm curious why certain variables are declared 'volatile'? Is that really the right approach for whatever exactly the (concurrency?) requirements here are? > --- a/libgomp/config/gcn/team.c > +++ b/libgomp/config/gcn/team.c > @@ -30,6 +30,7 @@ > +extern void build_indirect_map (void); Why not generally have a prototype for this (new 'libgomp/config/accel/target-indirect.h', or maybe just 'libgomp/libgomp.h'?)? > @@ -45,6 +46,9 @@ gomp_gcn_enter_kernel (void) > { > int threadid =3D __builtin_gcn_dim_pos (1); > Shouldn't this: > + /* Initialize indirect function support. */ > + build_indirect_map (); > + ... be called inside here: > if (threadid =3D=3D 0) > { ..., so that it's only executed by one thread? Also, for my understanding: why is 'build_indirect_map' done at kernel invocation time (here) instead of at image load time? > --- a/libgomp/config/nvptx/team.c > +++ b/libgomp/config/nvptx/team.c > @@ -35,6 +35,7 @@ struct gomp_thread *nvptx_thrs __attribute__((shared,no= common)); > +extern void build_indirect_map (void); Likewise to 'libgomp/config/gcn/team.c'. > @@ -52,6 +53,10 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data) > int tid, ntids; > asm ("mov.u32 %0, %%tid.y;" : "=3Dr" (tid)); > asm ("mov.u32 %0, %%ntid.y;" : "=3Dr" (ntids)); > + > + /* Initialize indirect function support. */ > + build_indirect_map (); > + > if (tid =3D=3D 0) > { Likewise to 'libgomp/config/gcn/team.c'. > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c > @@ -0,0 +1,33 @@ > +/* { dg-do run } */ > + > +#define N 256 > + > +#pragma omp begin declare target indirect > +int foo(void) { return 5; } > +int bar(void) { return 8; } > +int baz(void) { return 11; } > +#pragma omp end declare target > + > +int main (void) > +{ > + int i, x =3D 0, expected =3D 0; > + int (*fn_ptr[N])(void); > + > + for (i =3D 0; i < N; i++) > + { > + switch (i % 3) > + { > + case 0: fn_ptr[i] =3D &foo; > + case 1: fn_ptr[i] =3D &bar; > + case 2: fn_ptr[i] =3D &baz; > + } > + expected +=3D (*fn_ptr[i]) (); > + } > + > +#pragma omp target teams distribute parallel for reduction(+: x) \ > + map (to: fn_ptr) map (tofrom: x) > + for (int i =3D 0; i < N; i++) > + x +=3D (*fn_ptr[i]) (); > + > + return x - expected; > +} [...]/libgomp.c-c++-common/declare-target-indirect-2.c: In function =E2= =80=98main=E2=80=99: [...]/libgomp.c-c++-common/declare-target-indirect-2.c:20:27: warning: = this statement may fall through [-Wimplicit-fallthrough=3D] 20 | case 0: fn_ptr[i] =3D &foo; | ~~~~~~~~~~^~~~~~ [...]/libgomp.c-c++-common/declare-target-indirect-2.c:21:9: note: here 21 | case 1: fn_ptr[i] =3D &bar; | ^~~~ [...]/libgomp.c-c++-common/declare-target-indirect-2.c:21:27: warning: = this statement may fall through [-Wimplicit-fallthrough=3D] 21 | case 1: fn_ptr[i] =3D &bar; | ~~~~~~~~~~^~~~~~ [...]/libgomp.c-c++-common/declare-target-indirect-2.c:22:9: note: here 22 | case 2: fn_ptr[i] =3D &baz; | ^~~~ ..., so I suppose that's effectively testing 'fn_ptr[i] =3D &baz;' only for all 'i's? Gr=C3=BC=C3=9Fe Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstra=C3=9Fe 201= , 80634 M=C3=BCnchen; Gesellschaft mit beschr=C3=A4nkter Haftung; Gesch=C3= =A4ftsf=C3=BChrer: Thomas Heurung, Frank Th=C3=BCrauf; Sitz der Gesellschaf= t: M=C3=BCnchen; Registergericht M=C3=BCnchen, HRB 106955