From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <amonakov@ispras.ru>
Received: from mail.ispras.ru (mail.ispras.ru [83.149.199.84])
 by sourceware.org (Postfix) with ESMTPS id 8FB80385841B
 for <gcc-patches@gcc.gnu.org>; Fri, 12 Nov 2021 21:22:00 +0000 (GMT)
Received: from [10.10.3.121] (unknown [10.10.3.121])
 by mail.ispras.ru (Postfix) with ESMTPS id 27B2340D403D;
 Fri, 12 Nov 2021 21:21:58 +0000 (UTC)
Date: Sat, 13 Nov 2021 00:21:58 +0300 (MSK)
From: Alexander Monakov <amonakov@ispras.ru>
To: Jakub Jelinek <jakub@redhat.com>
cc: Tobias Burnus <tobias@codesourcery.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] libgomp, nvptx, v3: Honor OpenMP 5.1 num_teams lower bound
In-Reply-To: <20211112194905.GA2664@tucnak>
Message-ID: <126a293f-f6d0-935c-ee9-45720ad48e@ispras.ru>
References: <20211112132023.GC2710@tucnak> <20211112132716.GD2710@tucnak>
 <20211112175804.GJ2710@tucnak>
 <cde518e-b369-c597-5ddd-1086bd7129c5@ispras.ru>
 <20211112194706.GL2710@tucnak> <20211112194905.GA2664@tucnak>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII


On Fri, 12 Nov 2021, Jakub Jelinek via Gcc-patches wrote:

> On Fri, Nov 12, 2021 at 08:47:09PM +0100, Jakub Jelinek wrote:
> > The problem is that the argument of the num_teams clause isn't always known
> > before target is launched.
> 
> There was a design mistake that the clause has been put on teams rather than
> on target (well, for host teams we need it on teams), and 5.1 actually
> partially fixes this up for thread_limit by allowing that clause on both,
> but not for num_teams.

If this is a mistake in the standard, can GCC say "the spec is bad; fix the
spec" and decline to implement it, since supporting it penalizes the common case?

Technically, this could be implemented without penalizing the common case via
CUDA "dynamic parallelism": you initially launch just one block on the device,
which figures out the dimensions and then performs a GPU-side launch of the
required number of blocks. But that's a nontrivial amount of work.

I looked over your patch. I sent a small nitpick about 'nocommon' in a separate
message. I still think it's better to adjust GOMP_OFFLOAD_run to take the lower
bound into account when it's known on the host side. Otherwise you do static
scheduling of blocks, which is going to be inferior to dynamic scheduling:
imagine the lower bound is 3 and the maximum number of resident blocks is 2.
Then you first run teams 0 and 1 in parallel, and then team 2 runs on block 0,
whereas you actually want it to run on whichever block finished its initial
team first.

Alexander