From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id DC0663858034 for ; Fri, 12 Nov 2021 17:58:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DC0663858034 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-243-jWlrS0TqORyTaF7ndh3wng-1; Fri, 12 Nov 2021 12:58:12 -0500 X-MC-Unique: jWlrS0TqORyTaF7ndh3wng-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id EE97F1006AA0; Fri, 12 Nov 2021 17:58:10 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.54]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 4FC5860C05; Fri, 12 Nov 2021 17:58:10 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.16.1/8.16.1) with ESMTPS id 1ACHw6q9823429 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Fri, 12 Nov 2021 18:58:07 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.16.1/8.16.1/Submit) id 1ACHw45h823428; Fri, 12 Nov 2021 18:58:04 +0100 Date: Fri, 12 Nov 2021 18:58:04 +0100 From: Jakub Jelinek To: Tom de Vries , Alexander Monakov Cc: gcc-patches@gcc.gnu.org, Tobias Burnus Subject: [PATCH] libgomp, nvptx, v3: Honor OpenMP 5.1 num_teams lower bound Message-ID: <20211112175804.GJ2710@tucnak> Reply-To: Jakub Jelinek References: <20211112132023.GC2710@tucnak> <20211112132716.GD2710@tucnak> MIME-Version: 1.0 In-Reply-To: <20211112132716.GD2710@tucnak> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-5.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Nov 2021 17:58:19 -0000 On Fri, Nov 12, 2021 at 02:27:16PM +0100, Jakub Jelinek via Gcc-patches wrote: > On Fri, Nov 12, 2021 at 02:20:23PM +0100, Jakub Jelinek via Gcc-patches wrote: > > This patch assumes that .shared variables are initialized to 0, > > https://docs.nvidia.com/cuda/parallel-thread-execution/index.html lists > > in Table 7. .shared as non-initializable. If that isn't the case, > > we need to initialize it somewhere for the case of #pragma omp target > > without #pragma omp teams in it, maybe in libgcc/config/nvptx/crt0.c ? > > A quick look at libgcc/config/nvptx/crt0.c shows the target supports > __attribute__((shared)), so perhaps either following instead, or, if > .shared isn't preinitialized to zero, defining the variable in > libgcc/config/nvptx/crt0.c , adding there __gomp_team_num = 0; > and adding extern keyword before int __gomp_team_num __attribute__((shared)); > in libgomp/config/nvptx/target.c. And finally here is a third version, which fixes a typo in the previous patch (in instead of int) and actually initializes the shared var because PTX documentation doesn't say anything about how the shared vars are initialized. Tested on x86_64-linux with nvptx-none offloading, ok for trunk? 2021-11-12 Jakub Jelinek * config/nvptx/team.c (__gomp_team_num): Define as __attribute__((shared)) var. (gomp_nvptx_main): Initialize __gomp_team_num to 0. * config/nvptx/target.c (__gomp_team_num): Declare as extern __attribute__((shared)) var. (GOMP_teams4): Use __gomp_team_num as the team number instead of %ctaid.x. If first, initialize it to %ctaid.x. If num_teams_lower is bigger than num_blocks, use num_teams_lower teams and arrange for bumping of __gomp_team_num if !first and returning false once we run out of teams. * config/nvptx/teams.c (__gomp_team_num): Declare as extern __attribute__((shared)) var. (omp_get_team_num): Return __gomp_team_num value instead of %ctaid.x. --- libgomp/config/nvptx/team.c.jj 2021-05-25 13:43:02.793121350 +0200 +++ libgomp/config/nvptx/team.c 2021-11-12 17:49:02.847341650 +0100 @@ -32,6 +32,7 @@ #include struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon)); +int __gomp_team_num __attribute__((shared)); static void gomp_thread_start (struct gomp_thread_pool *); @@ -57,6 +58,7 @@ gomp_nvptx_main (void (*fn) (void *), vo /* Starting additional threads is not supported. */ gomp_global_icv.dyn_var = true; + __gomp_team_num = 0; nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs)); memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs)); --- libgomp/config/nvptx/target.c.jj 2021-11-12 15:57:29.400632875 +0100 +++ libgomp/config/nvptx/target.c 2021-11-12 17:47:39.499533296 +0100 @@ -26,28 +26,41 @@ #include "libgomp.h" #include +extern int __gomp_team_num __attribute__((shared)); + bool GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper, unsigned int thread_limit, bool first) { + unsigned int num_blocks, block_id; + asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks)); if (!first) - return false; + { + unsigned int team_num; + if (num_blocks > gomp_num_teams_var) + return false; + team_num = __gomp_team_num; + if (team_num > gomp_num_teams_var - num_blocks) + return false; + __gomp_team_num = team_num + num_blocks; + return true; + } if (thread_limit) { struct gomp_task_icv *icv = gomp_icv (true); icv->thread_limit_var = thread_limit > INT_MAX ? UINT_MAX : thread_limit; } - unsigned int num_blocks, block_id; - asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks)); - asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id)); - /* FIXME: If num_teams_lower > num_blocks, we want to loop multiple - times for some CTAs. */ - (void) num_teams_lower; - if (!num_teams_upper || num_teams_upper >= num_blocks) + if (!num_teams_upper) num_teams_upper = num_blocks; - else if (block_id >= num_teams_upper) + else if (num_blocks < num_teams_lower) + num_teams_upper = num_teams_lower; + else if (num_blocks < num_teams_upper) + num_teams_upper = num_blocks; + asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id)); + if (block_id >= num_teams_upper) return false; + __gomp_team_num = block_id; gomp_num_teams_var = num_teams_upper - 1; return true; } --- libgomp/config/nvptx/teams.c.jj 2021-05-25 13:43:02.793121350 +0200 +++ libgomp/config/nvptx/teams.c 2021-11-12 17:37:18.933361024 +0100 @@ -28,6 +28,8 @@ #include "libgomp.h" +extern int __gomp_team_num __attribute__((shared)); + void GOMP_teams_reg (void (*fn) (void *), void *data, unsigned int num_teams, unsigned int thread_limit, unsigned int flags) @@ -48,9 +50,7 @@ omp_get_num_teams (void) int omp_get_team_num (void) { - int ctaid; - asm ("mov.u32 %0, %%ctaid.x;" : "=r" (ctaid)); - return ctaid; + return __gomp_team_num; } ialias (omp_get_num_teams) Jakub