From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x12d.google.com (mail-lf1-x12d.google.com [IPv6:2a00:1450:4864:20::12d]) by sourceware.org (Postfix) with ESMTPS id 463173857736 for ; Fri, 8 Mar 2024 13:40:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 463173857736 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=baylibre.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=baylibre.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 463173857736 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::12d ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709905261; cv=none; b=mpv/3hXLuWCD+61k99e5pikHIrr1roU5MBowK3NqQ/q0u6VKRbFsq2oO8qSYFdn0h4fgAuz/QWm0f9HU4lF5GkkiAGL5o69+AMYDklA0UubQUt00OSxFPlEalPhYZa35jBZxcjuv0/WntIWnJCdBwFKFtrSF2WEgIEEQ6P3Wohs= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709905261; c=relaxed/simple; bh=s/BK1oUKD7QNpDbzZSl3sKxA71L5AC3VDK2e+YvALhQ=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=aiWF9/jC2to4pSQ7i/cWsxS920kgMz8pTKrVIw+FvoirEflmgYvHD4+ECGwdUzHwG7wiZsCKuArW90M3XofyI0U47jHsNcwMWZd6vIkjgt7FoeTFgOrb2LtE2Di7XtVIcPDzah3Mx9qn8Ms6T7Lu68UKxXNOmqjLyPaBd1sNIkw= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lf1-x12d.google.com with SMTP id 2adb3069b0e04-51320ca689aso1113787e87.2 for ; Fri, 08 Mar 2024 05:40:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=baylibre-com.20230601.gappssmtp.com; s=20230601; t=1709905256; x=1710510056; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:message-id:date:user-agent :references:in-reply-to:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=vrKi1Bgvdn3/XeV/f1eJgq6AQ8ST1VUi++q+8AdJHD4=; b=u0RnpNWBjRl32PlbDP2jNgctUS8FUm5etivg/47eWYe00BfRVS+rkotg9mS8mUU9iQ Ac0gRmc7rrM9UJ/qlJ8cuIfWOgft14Nrb99ia7GE5tWg9L9ZMCnVvlnFyik3QKwmhd/w Nk3hXtgr+9l0cuukEFPZ/gKrb45ICb5rZ/TYkkWfEwcpDL+cM3kImU2JmO4OibE06v+h LQVum2TvwUBVJYYkhToLARzGHSs3XydvCdJquiWDub3wBW3rls3L37YuCfbbpxk6jTpl 5kMFRe1GmgjmaHi25Ut38pI8ys3c7yR7InafAKaYPGCVujc+F190ceohHc8aZuhsSzMI LabQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709905256; x=1710510056; h=content-transfer-encoding:mime-version:message-id:date:user-agent :references:in-reply-to:subject:cc:to:from:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=vrKi1Bgvdn3/XeV/f1eJgq6AQ8ST1VUi++q+8AdJHD4=; b=RlucsbNYMZ3glycG0bN92H1jZe6tV1S1k+Qtmaek9FbjSBtH9p7yvpvHpNPzh+lntM 44tk57/A6EEGVjfsM1zhK1ylR95WeuTZw3gu9K94ZcXCHFaBVWqzpNzE8sbvhczh92gx K804RU7ygG2tAF0+KRsrhOzw39oZzfp7pLKkTQwaXzLjEd7SnMHPkS0JvsZK+VdSO0vw Xymk74/t71icTZk3whst2FwgyjizyEwPxPkKxmmbqNSodxJGLlVVTVpyrsb5UrSCRXhD DOXmUCGj2eGk6P5jTlof9i3++sy4HtWDHIflMn1ZRkq9gCKLSWpgZZbDLSbcVOpcxYxs vgcw== X-Gm-Message-State: AOJu0Yzt5ncBDozBYKaalF1lolNJS9Uu6Vi4R1jX20/XIWzX4uUGftFQ 3igQW1ytsy/Ll1OzNyTtrH1ez9yG9TPgNpklhBYDxG5yR7Cg2dc9IT15OmGGvNs= X-Google-Smtp-Source: AGHT+IHI+u0gGQN8lJhGIFz/fuIYoBEC0WNbVEViLfzFPH1bCm6o7AGIU0NgPfMWV6/ZzniUk880Pw== X-Received: by 2002:a05:6512:b9d:b0:513:97eb:2810 with SMTP id b29-20020a0565120b9d00b0051397eb2810mr964097lfv.54.1709905256280; Fri, 08 Mar 2024 05:40:56 -0800 (PST) Received: from euler.schwinge.homeip.net (p200300c8b70336000b0134869109dcb1.dip0.t-ipconnect.de. [2003:c8:b703:3600:b01:3486:9109:dcb1]) by smtp.gmail.com with ESMTPSA id t14-20020a05600c198e00b0041312c4865asm4287824wmq.2.2024.03.08.05.40.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Mar 2024 05:40:56 -0800 (PST) From: Thomas Schwinge To: Kwok Cheung Yeung , Jakub Jelinek , Tobias Burnus Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH v2] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls In-Reply-To: <679889de-bf47-4a01-887e-db96f7fad427@baylibre.com> References: <679889de-bf47-4a01-887e-db96f7fad427@baylibre.com> User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/29.1 (x86_64-pc-linux-gnu) Date: Fri, 08 Mar 2024 14:40:47 +0100 Message-ID: <87o7boet9c.fsf@euler.schwinge.ddns.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi! On 2024-01-29T17:48:47+0000, Kwok Cheung Yeung wrote: > A splay-tree was previously used to lookup equivalent target addresses > for a given host address on offload targets. However, as splay-trees can > modify their structure on lookup, they are not suitable for concurrent > access from separate teams/threads without some form of locking. Heh. ,-) > This > patch changes the lookup data structure to a hashtab instead, which does > not have these issues. (I've not looked into which data structure is most suitable here; not my area of expertise.) > The call to build_indirect_map to initialize the data structure is now > called from just the first thread of the first team to avoid redundant > calls to this function. ACK, and also you've removed a number of 'volatile's, as I had questioned earlier. It remains open the question when to do the initialization, and how to react to dynamic device image load and unload, and possibly other (but not many?) raised during review. I cannot formally approve this patch, but it seems a good incremental step forward to me: per my testing so far, (a) 'libgomp.c-c++-common/declare-target-indirect-2.c' is all-PASS, with 'warning: this statement may fall through' resolved, and (b) for 'libgomp.fortran/declare-target-indirect-2.f90': no more timeouts (applies to nvptx only), and all-PASS execution test (both GCN, nvptx): PASS: libgomp.fortran/declare-target-indirect-2.f90 -O0 (test for ex= cess errors) [-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -O0 executi= on test program timed out.-] [-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -O0= execution test PASS: libgomp.fortran/declare-target-indirect-2.f90 -O1 (test for ex= cess errors) [-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -O1 executi= on test program timed out.-] [-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -O1= execution test PASS: libgomp.fortran/declare-target-indirect-2.f90 -O2 (test for ex= cess errors) [-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -O2 executi= on test program timed out.-] [-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -O2= execution test PASS: libgomp.fortran/declare-target-indirect-2.f90 -O3 -fomit-frame-= pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for = excess errors) [-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -O3 -fomit-f= rame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execu= tion test program timed out.-] [-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -O3= -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functio= ns execution test PASS: libgomp.fortran/declare-target-indirect-2.f90 -O3 -g (test for= excess errors) [-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -O3 -g exec= ution test program timed out.-] [-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -O3= -g execution test PASS: libgomp.fortran/declare-target-indirect-2.f90 -Os (test for ex= cess errors) [-WARNING: libgomp.fortran/declare-target-indirect-2.f90 -Os executi= on test program timed out.-] [-XFAIL:-]{+PASS:+} libgomp.fortran/declare-target-indirect-2.f90 -Os= execution test (Of course, the patch now needs un-XFAILing of 'libgomp.fortran/declare-target-indirect-2.f90' merged in.) Gr=C3=BC=C3=9Fe Thomas > libgomp/ > * config/accel/target-indirect.c: Include string.h and hashtab.h. > Remove include of splay-tree.h. Update comments. > (splay_tree_prefix, splay_tree_c): Delete. > (struct indirect_map_t): New. > (hash_entry_type, htab_alloc, htab_free, htab_hash, htab_eq): New. > (GOMP_INDIRECT_ADD_MAP): Remove volatile qualifier. > (USE_SPLAY_TREE_LOOKUP): Rename to... > (USE_HASHTAB_LOOKUP): ..this. > (indirect_map, indirect_array): Delete. > (indirect_htab): New. > (build_indirect_map): Remove locking. Build indirect map using > hashtab. > (GOMP_target_map_indirect_ptr): Use indirect_htab to lookup target > address. > (GOMP_target_map_indirect_ptr): Remove volatile qualifier. > * config/gcn/team.c (gomp_gcn_enter_kernel): Call build_indirect_map > from first thread of first team only. > * config/nvptx/team.c (gomp_nvptx_main): Likewise. > * testsuite/libgomp.c-c++-common/declare-target-indirect-2.c (main): > Add missing break statements. > --- > libgomp/config/accel/target-indirect.c | 83 ++++++++++--------- > libgomp/config/gcn/team.c | 7 +- > libgomp/config/nvptx/team.c | 9 +- > .../declare-target-indirect-2.c | 14 ++-- > 4 files changed, 63 insertions(+), 50 deletions(-) > > diff --git a/libgomp/config/accel/target-indirect.c b/libgomp/config/acce= l/target-indirect.c > index c60fd547cb6..cfef1ddbc49 100644 > --- a/libgomp/config/accel/target-indirect.c > +++ b/libgomp/config/accel/target-indirect.c > @@ -25,60 +25,73 @@ > . */ >=20=20 > #include > +#include > #include "libgomp.h" >=20=20 > -#define splay_tree_prefix indirect > -#define splay_tree_c > -#include "splay-tree.h" > +struct indirect_map_t > +{ > + void *host_addr; > + void *target_addr; > +}; > + > +typedef struct indirect_map_t *hash_entry_type; > + > +static inline void * htab_alloc (size_t size) { return gomp_malloc (size= ); } > +static inline void htab_free (void *ptr) { free (ptr); } > + > +#include "hashtab.h" > + > +static inline hashval_t > +htab_hash (hash_entry_type element) > +{ > + return hash_pointer (element->host_addr); > +} >=20=20 > -volatile void **GOMP_INDIRECT_ADDR_MAP =3D NULL; > +static inline bool > +htab_eq (hash_entry_type x, hash_entry_type y) > +{ > + return x->host_addr =3D=3D y->host_addr; > +} >=20=20 > -/* Use a splay tree to lookup the target address instead of using a > - linear search. */ > -#define USE_SPLAY_TREE_LOOKUP > +void **GOMP_INDIRECT_ADDR_MAP =3D NULL; >=20=20 > -#ifdef USE_SPLAY_TREE_LOOKUP > +/* Use a hashtab to lookup the target address instead of using a linear > + search. */ > +#define USE_HASHTAB_LOOKUP >=20=20 > -static struct indirect_splay_tree_s indirect_map; > -static indirect_splay_tree_node indirect_array =3D NULL; > +#ifdef USE_HASHTAB_LOOKUP >=20=20 > -/* Build the splay tree used for host->target address lookups. */ > +static htab_t indirect_htab =3D NULL; > + > +/* Build the hashtab used for host->target address lookups. */ >=20=20 > void > build_indirect_map (void) > { > size_t num_ind_funcs =3D 0; > - volatile void **map_entry; > - static int lock =3D 0; /* =3D=3D gomp_mutex_t lock; gomp_mutex_init (&= lock); */ > + void **map_entry; >=20=20 > if (!GOMP_INDIRECT_ADDR_MAP) > return; >=20=20 > - gomp_mutex_lock (&lock); > - > - if (!indirect_array) > + if (!indirect_htab) > { > /* Count the number of entries in the NULL-terminated address map.= */ > for (map_entry =3D GOMP_INDIRECT_ADDR_MAP; *map_entry; > map_entry +=3D 2, num_ind_funcs++); >=20=20 > - /* Build splay tree for address lookup. */ > - indirect_array =3D gomp_malloc (num_ind_funcs * sizeof (*indirect_= array)); > - indirect_splay_tree_node array =3D indirect_array; > + /* Build hashtab for address lookup. */ > + indirect_htab =3D htab_create (num_ind_funcs); > map_entry =3D GOMP_INDIRECT_ADDR_MAP; >=20=20 > - for (int i =3D 0; i < num_ind_funcs; i++, array++) > + for (int i =3D 0; i < num_ind_funcs; i++, map_entry +=3D 2) > { > - indirect_splay_tree_key k =3D &array->key; > - k->host_addr =3D (uint64_t) *map_entry++; > - k->target_addr =3D (uint64_t) *map_entry++; > - array->left =3D NULL; > - array->right =3D NULL; > - indirect_splay_tree_insert (&indirect_map, array); > + struct indirect_map_t element =3D { *map_entry, NULL }; > + hash_entry_type *slot =3D htab_find_slot (&indirect_htab, &element, > + INSERT); > + *slot =3D (hash_entry_type) map_entry; > } > } > - > - gomp_mutex_unlock (&lock); > } >=20=20 > void * > @@ -88,15 +101,11 @@ GOMP_target_map_indirect_ptr (void *ptr) > if (!ptr) > return ptr; >=20=20 > - assert (indirect_array); > - > - struct indirect_splay_tree_key_s k; > - indirect_splay_tree_key node =3D NULL; > - > - k.host_addr =3D (uint64_t) ptr; > - node =3D indirect_splay_tree_lookup (&indirect_map, &k); > + assert (indirect_htab); >=20=20 > - return node ? (void *) node->target_addr : ptr; > + struct indirect_map_t element =3D { ptr, NULL }; > + hash_entry_type entry =3D htab_find (indirect_htab, &element); > + return entry ? entry->target_addr : ptr; > } >=20=20 > #else > @@ -115,7 +124,7 @@ GOMP_target_map_indirect_ptr (void *ptr) >=20=20 > assert (GOMP_INDIRECT_ADDR_MAP); >=20=20 > - for (volatile void **map_entry =3D GOMP_INDIRECT_ADDR_MAP; *map_entry; > + for (void **map_entry =3D GOMP_INDIRECT_ADDR_MAP; *map_entry; > map_entry +=3D 2) > if (*map_entry =3D=3D ptr) > return (void *) *(map_entry + 1); > diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c > index 61e9c616b67..bd3df448b52 100644 > --- a/libgomp/config/gcn/team.c > +++ b/libgomp/config/gcn/team.c > @@ -52,14 +52,15 @@ gomp_gcn_enter_kernel (void) > { > int threadid =3D __builtin_gcn_dim_pos (1); >=20=20 > - /* Initialize indirect function support. */ > - build_indirect_map (); > - > if (threadid =3D=3D 0) > { > int numthreads =3D __builtin_gcn_dim_size (1); > int teamid =3D __builtin_gcn_dim_pos(0); >=20=20 > + /* Initialize indirect function support. */ > + if (teamid =3D=3D 0) > + build_indirect_map (); > + > /* Set up the global state. > Every team will do this, but that should be harmless. */ > gomp_global_icv.nthreads_var =3D 16; > diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c > index 0cf5dad39ca..d5361917a24 100644 > --- a/libgomp/config/nvptx/team.c > +++ b/libgomp/config/nvptx/team.c > @@ -60,9 +60,6 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data) > asm ("mov.u32 %0, %%tid.y;" : "=3Dr" (tid)); > asm ("mov.u32 %0, %%ntid.y;" : "=3Dr" (ntids)); >=20=20 > - /* Initialize indirect function support. */ > - build_indirect_map (); > - > if (tid =3D=3D 0) > { > gomp_global_icv.nthreads_var =3D ntids; > @@ -74,6 +71,12 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data) > nvptx_thrs =3D alloca (ntids * sizeof (*nvptx_thrs)); > memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs)); >=20=20 > + /* Initialize indirect function support. */ > + unsigned int block_id; > + asm ("mov.u32 %0, %%ctaid.x;" : "=3Dr" (block_id)); > + if (block_id =3D=3D 0) > + build_indirect_map (); > + > /* Find the low-latency heap details .... */ > uint32_t *shared_pool; > uint32_t shared_pool_size =3D 0; > diff --git a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indire= ct-2.c b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c > index 9fe190efce8..545f1a9fcbf 100644 > --- a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c > +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c > @@ -17,17 +17,17 @@ int main (void) > { > switch (i % 3) > { > - case 0: fn_ptr[i] =3D &foo; > - case 1: fn_ptr[i] =3D &bar; > - case 2: fn_ptr[i] =3D &baz; > + case 0: fn_ptr[i] =3D &foo; break; > + case 1: fn_ptr[i] =3D &bar; break; > + case 2: fn_ptr[i] =3D &baz; break; > } > expected +=3D (*fn_ptr[i]) (); > } >=20=20 > -#pragma omp target teams distribute parallel for reduction(+: x) \ > - map (to: fn_ptr) map (tofrom: x) > - for (int i =3D 0; i < N; i++) > - x +=3D (*fn_ptr[i]) (); > + #pragma omp target teams distribute parallel for \ > + reduction (+: x) map (to: fn_ptr) map (tofrom: x) > + for (int i =3D 0; i < N; i++) > + x +=3D (*fn_ptr[i]) (); >=20=20 > return x - expected; > }