Date: Fri, 30 Sep 2022 11:35:41 +0200
From: Jakub Jelinek
To: Marcel Vollweiler
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
References: <94d20b62-d841-c4f0-f167-ed76a0b4dbfd@codesourcery.com> <3195cfa5-0612-5b52-4c24-9763c9a56864@codesourcery.com>
In-Reply-To: <3195cfa5-0612-5b52-4c24-9763c9a56864@codesourcery.com>

On Sun, Sep 18, 2022 at 10:24:43AM +0200, Marcel Vollweiler wrote:
> gcc/ChangeLog:
>
>   * gimplify.cc (optimize_target_teams): Set initial num_teams_upper
>   to "-2" instead of "1" for non-existing num_teams clause in order to
>   disambiguate from the case of an existing num_teams clause with value 1.
>
> libgomp/ChangeLog:
>
>   * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
>   allow processing of device-specific values.
>   (omp_set_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
>   (omp_set_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * icv-device.c (omp_get_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   (omp_set_teams_thread_limit): Likewise.
>   * icv.c (omp_set_teams_thread_limit): Removed.
>   (omp_get_teams_thread_limit): Likewise.
>   (ialias): Likewise.
>   * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
>   handling.
>   (gomp_load_image_to_device): Added a size check for the ICVs struct
>   variable.
>   (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
>   copy back the ICV values from device to host.
>   (GOMP_target_ext): Update the number of teams and threads in the kernel
>   args also considering device-specific values.
>   * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.

Better to say in words what exactly you changed.

>   * testsuite/libgomp.c-c++-common/icv-5.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-6.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-7.c: Extended.
>   * testsuite/libgomp.c-c++-common/icv-9.c: New test.
>   * testsuite/libgomp.fortran/icv-5.f90: New test.
>   * testsuite/libgomp.fortran/icv-6.f90: New test.
>
> gcc/testsuite/ChangeLog:
>
>   * c-c++-common/gomp/target-teams-1.c: Adapt expected values for
>   num_teams from "1" to "-2" in cases without num_teams clause.
>   * g++.dg/gomp/target-teams-1.C: Likewise.
>   * gfortran.dg/gomp/defaultmap-4.f90: Likewise.
>   * gfortran.dg/gomp/defaultmap-5.f90: Likewise.
>   * gfortran.dg/gomp/defaultmap-6.f90: Likewise.

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
>    struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>
>    if (teams == NULL_TREE)
> -    num_teams_upper = integer_one_node;
> +    num_teams_upper = build_int_cst (integer_type_node, -2);
>    else
>      for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
>        {

The function comment above optimize_target_teams contains a detailed
description of what the values mean and why, so it definitely should
document what -2 means and when it is used.  I know you have documentation
in libgomp for it, but it should be in both places.

> +  intptr_t new_teams = orig_teams, new_threads = orig_threads;
> +  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.

Two spaces after .

> +     ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but the
> +                       value could not be specified. No Change.

Likewise.  Also, lowercase "change"?

> +     ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
> +                      Set device-specific value.
> +     ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
> +                     No change. */
> +  if (orig_teams == -2)
> +    new_teams = 1;
> +  else if (orig_teams == 0)
> +    {
> +      struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device);
> +      if (item != NULL)
> +        new_teams = item->icvs.nteams;
> +    }
> +  /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS
> +     region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by
> +     e.g. a THREAD_LIMIT clause. */
> +  if (orig_teams >= -2 && orig_threads == 0)

The comment talks about ORIG_TEAMS > -2, but the condition is >= -2.
So which one is it?

> +  /* This tests a large number of teams and threads. If it is larger than
> +     2^15+1 then the according argument in the kernels arguments list
> +     is encoded with two items instead of one. On NVIDIA there is an
> +     adjustment for too large teams and threads. For AMD such adjustment
> +     exists only for threads and will cause runtime errors with a two large

s/two/too/ ?
Shouldn't amdgcn also adjust the number of teams?

As for testcases, have you tested this in a native setup where
dg-set-target-env-var actually works?

        Jakub
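
To make the > -2 versus >= -2 question concrete, here is a minimal
standalone sketch of the sentinel handling described in the quoted comment.
It is not the libgomp code: resolve_teams, limit_applies_gt,
limit_applies_ge and check_sentinels are made-up names, and DEVICE_NTEAMS
(8) is an arbitrary stand-in for the device-specific nteams ICV.  It only
shows where the two spellings of the condition disagree; it does not say
which one is intended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device-specific nteams ICV.  */
#define DEVICE_NTEAMS 8

/* Map the encoded ORIG_TEAMS value to the number of teams, following the
   four cases listed in the quoted comment.  */
intptr_t
resolve_teams (intptr_t orig_teams)
{
  if (orig_teams == -2)
    return 1;              /* No explicit teams construct.  */
  if (orig_teams == 0)
    return DEVICE_NTEAMS;  /* teams without num_teams clause.  */
  return orig_teams;       /* -1 or > 0: no change.  */
}

/* The condition as worded in the comment (ORIG_TEAMS > -2).  */
bool
limit_applies_gt (intptr_t orig_teams, intptr_t orig_threads)
{
  return orig_teams > -2 && orig_threads == 0;
}

/* The condition as written in the patch (orig_teams >= -2).  */
bool
limit_applies_ge (intptr_t orig_teams, intptr_t orig_threads)
{
  return orig_teams >= -2 && orig_threads == 0;
}

/* The two spellings differ only when no teams construct is present.  */
void
check_sentinels (void)
{
  assert (resolve_teams (-2) == 1);
  assert (resolve_teams (-1) == -1);
  assert (resolve_teams (0) == DEVICE_NTEAMS);
  assert (resolve_teams (4) == 4);

  assert (!limit_applies_gt (-2, 0));
  assert (limit_applies_ge (-2, 0));
  assert (limit_applies_gt (0, 0) == limit_applies_ge (0, 0));
}
```

With >= -2 the first half of the test is always true for the four encoded
values above, so under that reading only the orig_threads == 0 check has
any effect.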