From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by sourceware.org (Postfix) with ESMTPS id 8A8C83857BB5 for ; Thu, 14 Mar 2024 11:38:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8A8C83857BB5 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=baylibre.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=baylibre.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 8A8C83857BB5 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::430 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710416309; cv=none; b=XztBOUA9AEVpdB8nVzznm5zkYGj0NYz5koYWRvTLrHR4TJbhU9C+JjwGg6wBlfK/aJ66md8/0Rqp1on7GAZdpp8kcQ/djxtfjgBww9wpwFDbLT4lvaIc9Fmo67zo1YpVd9V3msfKgpb7RMOOVtT9UNxqbAdmcXBbA4ZvOAIsj+U= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1710416309; c=relaxed/simple; bh=cNez+2zVWiPU1gwRSvGbHSHsoUmcONKTE47fWiOwMJ8=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=oJ0M5J3hNUkBUT+1DvqUuUtX+Tp7rE8t8V56xWLvHdkU2SD3iIICV73G61IfKm1C18045rGHc2n5/S7tlNg9rEEuPuao5hXAtvcNAXZJATlq2B1pekeZCGtmywdus1rybdrMHZRxoE4B7Ty2UFZ/dX6QEWo49TLIAyDmW5L0Kok= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wr1-x430.google.com with SMTP id ffacd0b85a97d-33e94c12f33so729065f8f.3 for ; Thu, 14 Mar 2024 04:38:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=baylibre-com.20230601.gappssmtp.com; s=20230601; t=1710416305; x=1711021105; darn=gcc.gnu.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=VYGQmkoHyvwF5DPLFLH4n71ajYIN6lAUaEQ/5SpSRdI=; b=cbpAETSXQlROGo3jF2nLUSCfRJeWAjcBeVlAgHl5NFo3wwerkZ4Ich59De9a/OrKzp FCRRh+Up1RAfZn4N0o1DEyYxHxjzQESDismndUZM8eS3wmvk3/gydxtgQuH3T4DdbxLl c5PSd6djJGE45GtM4LxR4f2Xn99aPGYnSshpyl/SqB4YvVDWRCsPmHgo+I+YAlpm50la l1N11zl1gZxSbaRZrxK4NdtLF3g0gMTB2wDpq4Wu/xxZ3hIp4wySzejoStQSg/vQinMr XBOEY0lh/e8hzDFwaVVdwv9anSSJTbD4df97LNteWBxkbIkyCFAnQdrDiTZCQVJ3fOAO SLag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710416305; x=1711021105; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=VYGQmkoHyvwF5DPLFLH4n71ajYIN6lAUaEQ/5SpSRdI=; b=Pn9SlwZM5W4BV+VD3KrckhpB6BQK4TKepHhx3QUEqFoxZMh0fyIN2DFnzkffaaIaPe 19zm9A7xIUEWAT2xdEEqXS9K2YDFEVHu4ss3bldnROJ5NHb8VH6gFw5ss3IYWTr5lzty xB80ePSkkK6YmlE+WlQ0I2jmda+gZsl325YKmBqZJ1P9c7gKbvXzr1A7QOmUOw5REHQ0 zjt3UUsjY7v16cRk9WB3ccHjULYWfEwbZHconNv4cQ7+0vXuOit2AqEREypeUHBLt6+7 MfcAxYaGgIohPZFnE11C6GkMM0tnJl1fgXKEjxe1JEKg/gRqS6q8UcSg2nhhgGIiywc9 vHHA== X-Forwarded-Encrypted: i=1; AJvYcCUHFoOXAy3AI1XlnPFMUtTB/i6Zn3QBCS3j9OAPWH5fajvzKNDwktSASp5o95Xms5FDv5+gIrvTlOTHk++8hIN88GhkvpwdVQ== X-Gm-Message-State: AOJu0YzRPqkAXDV9tTmJv5ggVg0f9yM3kag/Rwsy8EhKXvjwMxbf+yKE rpRF/KV25745xCaEyJCLNCj9knKPoQvP6vWoWW3Gdcb/LB/wagksamcFvdBHPXw= X-Google-Smtp-Source: AGHT+IF+Tecu/XIv0PDcy6/pr1H8JdTuRSMANvx4W5ff3sX0aG4r05aluiV1RLd9GFkZ8JwzH7Nf+Q== X-Received: by 2002:adf:8b59:0:b0:33d:6984:3f87 with SMTP id v25-20020adf8b59000000b0033d69843f87mr1131008wra.8.1710416305162; Thu, 14 Mar 2024 04:38:25 -0700 (PDT) Received: from ?IPV6:2001:16b8:2a97:9a00:1328:ba90:3636:1a76? (200116b82a979a001328ba9036361a76.dip.versatel-1u1.de. [2001:16b8:2a97:9a00:1328:ba90:3636:1a76]) by smtp.gmail.com with ESMTPSA id bp6-20020a5d5a86000000b0033ecb329f15sm302380wrb.9.2024.03.14.04.38.24 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 14 Mar 2024 04:38:24 -0700 (PDT) Message-ID: <1b7d9cfd-8586-49a0-8bc4-fa65f2c59bf3@baylibre.com> Date: Thu, 14 Mar 2024 12:38:23 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls Content-Language: en-US To: Kwok Cheung Yeung , gcc-patches Cc: Jakub Jelinek , Thomas Schwinge References: <679889de-bf47-4a01-887e-db96f7fad427@baylibre.com> From: Tobias Burnus In-Reply-To: <679889de-bf47-4a01-887e-db96f7fad427@baylibre.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Kwok, On January 22, 2024, Kwok Cheung Yeung wrote: > There was a bug in the declare-target-indirect-2.c libgomp testcase > (testing indirect calls in offloaded target regions, spread over > multiple teams/threads) that due to an errant fallthrough in a switch > statement resulted in only one indirect function ever getting called: (When applying, also the 'dg-xfail-run-if' needs to be removed from libgomp.fortran/declare-target-indirect-2.f90) ... > However, when the missing break statements are added, the testcase > fails with an invalid memory access. Upon investigation, this is due > to the use of a splay-tree as the lookup structure for indirect > addresses, as the splay-tree moves frequently accessed elements closer > to the root node and so needs locking when used from multiple threads. > However, this would end up partially serialising all the threads and > kill performance. I have switched the lookup structure from a splay > tree to a hashtab instead to avoid locking during lookup. > > I have also tidied up the initialisation of the lookup table by > calling it only from the first thread of the first team, instead of > redundantly calling it from every thread and only having the first one > reached do the initialisation. This removes the need for locking > during initialisation. LGTM - except of the following, which we need to solve (as suggested or differently (locking, or ...) or by declaring it a nonissue (e.g. because of thinko of mine). Thoughts about the following? * * * Namely, I wonder whether there will be an issue for #pragma target nowait    ... #pragma target    ... Once the kernel is started, thegcn_expand_prologue creates some setup code and then a call to gomp_gcn_enter_kernel. Likewise for gcc/config/nvptx/nvptx.cc, where nvptx_declare_function_name adds via write_omp_entry a call to gomp_nvptx_main. And one of the first tasks there is 'build_indirect_map'. Assume a very simple kernel for the second item (i.e. it is quickly started) and a very large number of reverse kernels. Now, I wonder whether it is possible to have a race between the two kernels; it seems as if that might happen but is extremely unlikely accounting for all the overhead of launching and the rather small list of reverse offload items. As it is unlikely, I wonder whether doing the following lock free, opportunistic approach will be the best solution. Namely, assuming that no other kernel updates the hash, but if that happens by chance, use the one that was created first. (If we are lucky, the atomic overhead is fully cancelled by using a local variable in the function but neither should matter much.) if (!indirect_htab) // or: __atomic_load_n (&indirect_htab, __ATOMIC_RELAXED) ? { htab_t local_indirect_htab = htab_create (num_ind_funcs); ... htab_t expected = NULL; __atomic_compare_exchange_n (&indirect_htab, &expected, local_indirect_htab, false, ...); if (expected) // Other kernel was faster, drop our version htab_free (local_indirect_htab); } On January 29, 2024, Kwok Cheung Yeung wrote: >> Can you please akso update the comments to talk about hashtab instead >> of splay? > This version has the comments updated and removes a stray 'volatile' > in the #ifdefed out code. Thanks, Tobias