From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTPS id BA19C3858402 for ; Fri, 12 Nov 2021 13:27:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BA19C3858402 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-467-lCgVh8EAMK6OoZPsGWZ9Ew-1; Fri, 12 Nov 2021 08:27:22 -0500 X-MC-Unique: lCgVh8EAMK6OoZPsGWZ9Ew-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 11B2187D541; Fri, 12 Nov 2021 13:27:21 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.54]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 9DD575C1B4; Fri, 12 Nov 2021 13:27:20 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.16.1/8.16.1) with ESMTPS id 1ACDRHbb159412 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Fri, 12 Nov 2021 14:27:17 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.16.1/8.16.1/Submit) id 1ACDRGAs159411; Fri, 12 Nov 2021 14:27:16 +0100 Date: Fri, 12 Nov 2021 14:27:16 +0100 From: Jakub Jelinek To: Tobias Burnus , Andrew Stubbs , Tom de Vries , gcc-patches@gcc.gnu.org Subject: [PATCH] libgomp, nvptx, v2: Honor OpenMP 5.1 num_teams lower bound Message-ID: <20211112132716.GD2710@tucnak> Reply-To: Jakub Jelinek References: <20211112132023.GC2710@tucnak> MIME-Version: 1.0 In-Reply-To: <20211112132023.GC2710@tucnak> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-5.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Nov 2021 13:27:27 -0000 On Fri, Nov 12, 2021 at 02:20:23PM +0100, Jakub Jelinek via Gcc-patches wrote: > This patch assumes that .shared variables are initialized to 0, > https://docs.nvidia.com/cuda/parallel-thread-execution/index.html lists > in Table 7. .shared as non-initializable. If that isn't the case, > we need to initialize it somewhere for the case of #pragma omp target > without #pragma omp teams in it, maybe in libgcc/config/nvptx/crt0.c ? A quick look at libgcc/config/nvptx/crt0.c shows the target supports __attribute__((shared)), so perhaps either following instead, or, if .shared isn't preinitialized to zero, defining the variable in libgcc/config/nvptx/crt0.c , adding there __gomp_team_num = 0; and adding extern keyword before int __gomp_team_num __attribute__((shared)); in libgomp/config/nvptx/target.c. 2021-11-12 Jakub Jelinek * config/nvptx/target.c (__gomp_team_num): Define as __attribute__((shared)) var. (GOMP_teams4): Use __gomp_team_num as the team number instead of %ctaid.x. If first, initialize it to %ctaid.x. If num_teams_lower is bigger than num_blocks, use num_teams_lower teams and arrange for bumping of __gomp_team_num if !first and returning false once we run out of teams. * config/nvptx/teams.c (__gomp_team_num): Declare as extern __attribute__((shared)) var. (omp_get_team_num): Return __gomp_team_num value instead of %ctaid.x. --- libgomp/config/nvptx/target.c.jj 2021-11-12 12:41:11.433501988 +0100 +++ libgomp/config/nvptx/target.c 2021-11-12 14:21:39.451426717 +0100 @@ -26,28 +26,41 @@ #include "libgomp.h" #include +int __gomp_team_num __attribute__((shared)); + bool GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper, unsigned int thread_limit, bool first) { + unsigned int num_blocks, block_id; + asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks)); if (!first) - return false; + { + unsigned in team_num; + if (num_blocks > gomp_num_teams_var) + return false; + team_num = __gomp_team_num; + if (team_num > gomp_num_teams_var - num_blocks) + return false; + __gomp_team_num = team_num + num_blocks; + return true; + } if (thread_limit) { struct gomp_task_icv *icv = gomp_icv (true); icv->thread_limit_var = thread_limit > INT_MAX ? UINT_MAX : thread_limit; } - unsigned int num_blocks, block_id; - asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks)); - asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id)); - /* FIXME: If num_teams_lower > num_blocks, we want to loop multiple - times for some CTAs. */ - (void) num_teams_lower; - if (!num_teams_upper || num_teams_upper >= num_blocks) + if (!num_teams_upper) num_teams_upper = num_blocks; - else if (block_id >= num_teams_upper) + else if (num_blocks < num_teams_lower) + num_teams_upper = num_teams_lower; + else if (num_blocks < num_teams_upper) + num_teams_upper = num_blocks; + asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id)); + if (block_id >= num_teams_upper) return false; + __gomp_team_num = block_id; gomp_num_teams_var = num_teams_upper - 1; return true; } --- libgomp/config/nvptx/teams.c.jj 2021-01-05 00:13:58.255297642 +0100 +++ libgomp/config/nvptx/teams.c 2021-11-12 14:22:06.443039863 +0100 @@ -28,6 +28,8 @@ #include "libgomp.h" +extern int __gomp_team_num __attribute__((shared)); + void GOMP_teams_reg (void (*fn) (void *), void *data, unsigned int num_teams, unsigned int thread_limit, unsigned int flags) @@ -48,9 +50,7 @@ omp_get_num_teams (void) int omp_get_team_num (void) { - int ctaid; - asm ("mov.u32 %0, %%ctaid.x;" : "=r" (ctaid)); - return ctaid; + return __gomp_team_num; } ialias (omp_get_num_teams) Jakub