From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 391463858402 for ; Fri, 12 Nov 2021 13:20:39 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 391463858402 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-268-TTjwlyXFNQeq8kh4noNlgA-1; Fri, 12 Nov 2021 08:20:29 -0500 X-MC-Unique: TTjwlyXFNQeq8kh4noNlgA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id C553F6D5F5; Fri, 12 Nov 2021 13:20:28 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.54]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 535F810016F5; Fri, 12 Nov 2021 13:20:28 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.16.1/8.16.1) with ESMTPS id 1ACDKPCB159347 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Fri, 12 Nov 2021 14:20:25 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.16.1/8.16.1/Submit) id 1ACDKNkT159346; Fri, 12 Nov 2021 14:20:23 +0100 Date: Fri, 12 Nov 2021 14:20:23 +0100 From: Jakub Jelinek To: Tobias Burnus , Andrew Stubbs , Tom de Vries Cc: gcc-patches@gcc.gnu.org Subject: [PATCH] libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound Message-ID: <20211112132023.GC2710@tucnak> Reply-To: Jakub Jelinek MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-5.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Nov 2021 13:20:40 -0000 Hi! Here is an completely untested attempt at implementing what I was talking about, that for num_teams_upper 0 or whenever num_teams_lower <= num_blocks, the current implementation is fine but if the user explicitly asks for more teams than we can provide in hardware, we need to stop assuming that omp_get_team_num () is equal to the hw team id, but instead need to use some team specific memory (I believe it is .shared for PTX), or if none is provided, array indexed by the hw team id and run some teams serially within the same hw thread. This patch assumes that .shared variables are initialized to 0, https://docs.nvidia.com/cuda/parallel-thread-execution/index.html lists in Table 7. .shared as non-initializable. If that isn't the case, we need to initialize it somewhere for the case of #pragma omp target without #pragma omp teams in it, maybe in libgcc/config/nvptx/crt0.c ? 2021-11-12 Jakub Jelinek * config/nvptx/target.c (__gomp_team_num): Define using inline asm as a .shared var. (GOMP_teams4): Use __gomp_team_num as the team number instead of %ctaid.x. If first, initialize it to %ctaid.x. If num_teams_lower is bigger than num_blocks, use num_teams_lower teams and arrange for bumping of __gomp_team_num if !first and returning false once we run out of teams. * config/nvptx/teams.c (__gomp_team_num): Declare using inline asm as an external .shared var. (omp_get_team_num): Return __gomp_team_num value instead of %ctaid.x. --- libgomp/config/nvptx/target.c.jj 2021-11-12 12:41:11.433501988 +0100 +++ libgomp/config/nvptx/target.c 2021-11-12 14:02:56.231477929 +0100 @@ -26,28 +26,43 @@ #include "libgomp.h" #include +asm ("\n// BEGIN GLOBAL VAR DECL: __gomp_team_num" + "\n.visible .shared .align 4 .u32 __gomp_team_num[1];"); + bool GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper, unsigned int thread_limit, bool first) { + unsigned int num_blocks, block_id; + asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks)); if (!first) - return false; + { + unsigned in team_num; + if (num_blocks > gomp_num_teams_var) + return false; + asm ("ld.shared.u32\t%0, [__gomp_team_num]" : "=r" (team_num)); + if (team_num > gomp_num_teams_var - num_blocks) + return false; + asm ("st.shared.u32\t[__gomp_team_num], %0" + : : "r" (team_num + num_blocks)); + return true; + } if (thread_limit) { struct gomp_task_icv *icv = gomp_icv (true); icv->thread_limit_var = thread_limit > INT_MAX ? UINT_MAX : thread_limit; } - unsigned int num_blocks, block_id; - asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks)); - asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id)); - /* FIXME: If num_teams_lower > num_blocks, we want to loop multiple - times for some CTAs. */ - (void) num_teams_lower; - if (!num_teams_upper || num_teams_upper >= num_blocks) + if (!num_teams_upper) num_teams_upper = num_blocks; - else if (block_id >= num_teams_upper) + else if (num_blocks < num_teams_lower) + num_teams_upper = num_teams_lower; + else if (num_blocks < num_teams_upper) + num_teams_upper = num_blocks; + asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id)); + if (block_id >= num_teams_upper) return false; + asm ("st.shared.u32\t[__gomp_team_num], %0" : : "r" (block_id)); gomp_num_teams_var = num_teams_upper - 1; return true; } --- libgomp/config/nvptx/teams.c.jj 2021-01-05 00:13:58.255297642 +0100 +++ libgomp/config/nvptx/teams.c 2021-11-12 13:55:59.950421993 +0100 @@ -28,6 +28,9 @@ #include "libgomp.h" +asm ("\n// BEGIN GLOBAL VAR DECL: __gomp_team_num" + "\n.extern .shared .align 4 .u32 __gomp_team_num[1];"); + void GOMP_teams_reg (void (*fn) (void *), void *data, unsigned int num_teams, unsigned int thread_limit, unsigned int flags) @@ -48,9 +50,9 @@ omp_get_num_teams (void) int omp_get_team_num (void) { - int ctaid; - asm ("mov.u32 %0, %%ctaid.x;" : "=r" (ctaid)); - return ctaid; + int team_num; + asm ("ld.shared.u32\t%0, [__gomp_team_num]" : "=r" (team_num)); + return team_num; } ialias (omp_get_num_teams) Jakub