From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-476988-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 69645 invoked by alias); 30 Apr 2018 13:34:40 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 69632 invoked by uid 89); 30 Apr 2018 13:34:39 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-24.9 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_PASS,URIBL_RED autolearn=ham version=3.3.2 spammy=workers, Allocation, furthermore, atm
X-HELO: relay1.mentorg.com
Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 30 Apr 2018 13:34:34 +0000
Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-MBX-04.mgc.mentorg.com)	by relay1.mentorg.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-SHA384:256)	id 1fD8wq-0007Xg-7I from Tom_deVries@mentor.com 	for gcc-patches@gcc.gnu.org; Mon, 30 Apr 2018 06:34:32 -0700
Received: from [172.30.73.38] (137.202.0.87) by SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) with Microsoft SMTP Server (TLS) id 15.0.1320.4; Mon, 30 Apr 2018 14:33:24 +0100
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: Thomas Schwinge <thomas@codesourcery.com>
From: Tom de Vries <Tom_deVries@mentor.com>
Subject: [og7, libgomp, nvptx, committed] Fix too-many-resources fatal error condition and message
Message-ID: <d00276f6-0642-84db-5144-5d39710d6c34@mentor.com>
Date: Mon, 30 Apr 2018 13:41:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0
MIME-Version: 1.0
Content-Type: multipart/mixed;	boundary="------------E01828344E1DA2D88A5E4182"
X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-SW-Source: 2018-04/txt/msg01316.txt.bz2


--------------E01828344E1DA2D88A5E4182
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Content-length: 2016

Hi,

At the moment, parallel-dims.c fails on Titan V with a CUDA launch failure:
...
libgomp: cuLaunchKernel error: too many resources requested for launch
...

The libgomp nvptx plugin has a check that is meant to preempt the CUDA
launch failure and give a more informative error message:
...
   /* Check if the accelerator has sufficient hardware resources to
      launch the offloaded kernel.  */
   if (dims[GOMP_DIM_WORKER] > 1)
     {
       if (reg_granularity > 0
           && dims[GOMP_DIM_WORKER] > threads_per_block)
         GOMP_PLUGIN_fatal
           ("The Nvidia accelerator has insufficient resources "
            "to launch '%s'; recompile the program with "
            "'num_workers = %d' on that offloaded region or "
            "'-fopenacc-dim=-:%d'.\n",
            targ_fn->launch->fn, threads_per_block,
            threads_per_block);
     }
...

The message doesn't trigger, because reg_granularity == -1.
This value comes from dev->register_allocation_granularity which 
defaults to -1 because libgomp does not have a hardcoded constant for 
sm_70. The hardcoded constants that are present match 'Warp Allocation 
Granularity' in the GPU Data table in CUDA_Occupancy_calculator.xls, but 
AFAICT there's no column published yet for sm_70.
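
To make the failure mode concrete, here is a minimal standalone sketch of
the old guard. The function old_check_fires is hypothetical (it is not
part of libgomp); its parameters just mirror the variables quoted above,
and the numbers in the note below are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the pre-patch condition in nvptx_exec.
   Returns true when the old code would have called GOMP_PLUGIN_fatal.  */
bool
old_check_fires (int reg_granularity, int num_workers, int threads_per_block)
{
  if (num_workers > 1)
    /* With reg_granularity == -1 (no hardcoded constant for the device),
       the first operand is false, so the check is skipped entirely.  */
    return reg_granularity > 0 && num_workers > threads_per_block;
  return false;
}
```

With reg_granularity == -1, old_check_fires (-1, 32, 8) is false even
though 32 workers exceed the limit of 8, so execution carries on to
cuLaunchKernel and fails there instead.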

Furthermore, the comparison to threads_per_block is not correct. What we
want here is the maximum number of threads per block. The
threads_per_block variable only contains an approximation of that, while
the exact value is already available from the CUDA runtime and stored at
targ_fn->max_threads_per_block.

Then, using dims[GOMP_DIM_WORKER] on its own in the comparison is
incorrect. It used to be correct before "[nvptx] Handle large vectors in
libgomp", when we used to do "threads_per_block /= warp_size", but now
the value we need to compare is
dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR].
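
As a rough sketch of the condition we want instead, outside of libgomp:
launch_exceeds_limit is a hypothetical helper, and the limit of 640 used
in the note below is an invented example value, not something measured
on a device.

```c
#include <stdbool.h>

/* Hypothetical helper: true when the requested number of threads per
   block (workers times vector length) is larger than the per-function
   limit, i.e. when the launch should be rejected up front.  */
bool
launch_exceeds_limit (int num_workers, int vector_length,
                      int max_threads_per_block)
{
  return num_workers * vector_length > max_threads_per_block;
}
```

For an assumed limit of 640, 32 workers with vector length 32 ask for
1024 threads and are rejected, while 20 workers with vector length 32
ask for exactly 640 and are accepted.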

Finally, the message has not been updated to reflect that vector length 
can be larger than 32.

The patch addresses these issues.

Committed to og7.

Thanks,
- Tom

--------------E01828344E1DA2D88A5E4182
Content-Type: text/x-patch;
	name="0001-libgomp-nvptx-Fix-too-many-resources-fatal-error-condition-and-message.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
	filename*0="0001-libgomp-nvptx-Fix-too-many-resources-fatal-error-condit";
	filename*1="ion-and-message.patch"
Content-length: 1790

[libgomp, nvptx] Fix too-many-resources fatal error condition and message

2018-04-30  Tom de Vries  <tom@codesourcery.com>

	* plugin/plugin-nvptx.c (nvptx_exec): Fix
	insufficient-resources-to-launch fatal error condition and message.

---
 libgomp/plugin/plugin-nvptx.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 9b4768f..3c00555 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -834,16 +834,15 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 
   /* Check if the accelerator has sufficient hardware resources to
      launch the offloaded kernel.  */
-  if (dims[GOMP_DIM_WORKER] > 1)
-    {
-      if (reg_granularity > 0 && dims[GOMP_DIM_WORKER] > threads_per_block)
-	GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources "
-			   "to launch '%s'; recompile the program with "
-			   "'num_workers = %d' on that offloaded region or "
-			   "'-fopenacc-dim=-:%d'.\n",
-			   targ_fn->launch->fn, threads_per_block,
-			   threads_per_block);
-    }
+  if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
+      > targ_fn->max_threads_per_block)
+    GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
+		       " launch '%s' with num_workers = %d and vector_length ="
+		       " %d; recompile the program with 'num_workers = x and"
+		       " vector_length = y' on that offloaded region or "
+		       "'-fopenacc-dim=-:x:y' where x * y <= %d.\n",
+		       targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
+		       dims[GOMP_DIM_VECTOR], targ_fn->max_threads_per_block);
 
   GOMP_PLUGIN_debug (0, "  %s: kernel %s: launch"
 		     " gangs=%u, workers=%u, vectors=%u\n",

--------------E01828344E1DA2D88A5E4182--