From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-482620-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 70161 invoked by alias); 30 Jul 2018 10:20:14 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 70149 invoked by uid 89); 30 Jul 2018 10:20:13 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_PASS autolearn=ham version=3.3.2 spammy=
X-HELO: mx1.suse.de
Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 30 Jul 2018 10:20:12 +0000
Received: from relay1.suse.de (unknown [195.135.220.254])	by mx1.suse.de (Postfix) with ESMTP id 0D25EADC2;	Mon, 30 Jul 2018 10:20:10 +0000 (UTC)
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Thomas Schwinge <thomas@codesourcery.com>, Jakub Jelinek <jakub@redhat.com>, Cesar Philippidis <cesar@codesourcery.com>
From: Tom de Vries <tdevries@suse.de>
Subject: [libgomp, nvptx, committed] Handle per-function max-threads-per-block in default dims
Message-ID: <d5349c01-97cc-7591-bba3-40542e71a5a4@suse.de>
Date: Mon, 30 Jul 2018 10:20:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------12C562AB242C8D469C9FF2C2"
X-IsSubscribed: yes
X-SW-Source: 2018-07/txt/msg01817.txt.bz2

This is a multi-part message in MIME format.
--------------12C562AB242C8D469C9FF2C2
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Content-length: 96

Hi,

Built and reg-tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

--------------12C562AB242C8D469C9FF2C2
Content-Type: text/x-patch;
 name="0003-libgomp-nvptx-Handle-per-function-max-threads-per-block-in-default-dims.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename*0="0003-libgomp-nvptx-Handle-per-function-max-threads-per-block";
 filename*1="-in-default-dims.patch"
Content-length: 2845

[libgomp, nvptx] Handle per-function max-threads-per-block in default dims

Currently parallel-loop-1.c fails at -O0 on a Quadro M1200, because one of the
kernel launch configurations exceeds the resources available in the device, due
to the default dimensions chosen by the runtime.

This patch fixes that by taking the per-function max_threads_per_block into
account when using the default dimensions.
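
As a standalone illustration of the clamping described above, here is a
minimal sketch in plain C.  It is not the plugin code itself: the DIM_*
enum, min_int and apply_default_dims are hypothetical stand-ins for the
GOMP_DIM_* indices, the MIN macro and the relevant part of nvptx_exec, and
the sketch assumes the per-function limit is at least one warp.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GOMP_DIM_* indices.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* Fill the unset (zero) entries of DIMS from DEFAULTS, then clamp the
   defaulted vector and worker dims so that vector * worker does not
   exceed FN_MAX_THREADS, the per-function thread limit.  Dims that the
   caller set explicitly are left untouched.  */
static void
apply_default_dims (int dims[DIM_MAX], const int defaults[DIM_MAX],
		    int fn_max_threads, int warp_size)
{
  bool default_p[DIM_MAX];

  /* Assumption of this sketch: the limit covers at least one warp, so
     the rounded-down vector length below cannot become zero.  */
  assert (warp_size > 0 && fn_max_threads >= warp_size);

  for (int i = 0; i < DIM_MAX; i++)
    {
      default_p[i] = dims[i] == 0;
      if (default_p[i])
	dims[i] = defaults[i];
    }

  /* Vector length: at most the limit rounded down to a warp multiple.  */
  if (default_p[DIM_VECTOR])
    dims[DIM_VECTOR] = min_int (dims[DIM_VECTOR],
				fn_max_threads / warp_size * warp_size);

  /* Worker count: whatever still fits once the vector length is fixed.  */
  assert (dims[DIM_VECTOR] > 0);
  if (default_p[DIM_WORKER])
    dims[DIM_WORKER] = min_int (dims[DIM_WORKER],
				fn_max_threads / dims[DIM_VECTOR]);
}
```

With illustrative numbers (not measurements from the Quadro M1200): for
defaults of 64 gangs, 32 workers and 32 vector lanes, a warp size of 32 and
a per-function limit of 640 threads, the vector default stays at 32 and the
worker default drops from 32 to 640 / 32 = 20.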

2018-07-27  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (MIN, MAX): Redefine.
	(nvptx_exec): Ensure worker and vector default dims don't exceed
	targ_fn->max_threads_per_block.

---
 libgomp/plugin/plugin-nvptx.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5c522aaf281..b6ec5f88d59 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -141,6 +141,11 @@ init_cuda_lib (void)
 
 #include "secure_getenv.h"
 
+#undef MIN
+#undef MAX
+#define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
+#define MAX(X,Y) ((X) > (Y) ? (X) : (Y))
+
 /* Convenience macros for the frequently used CUDA library call and
    error handling sequence as well as CUDA library calls that
    do the error checking themselves or don't do it at all.  */
@@ -1135,6 +1140,7 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
   void *kargs[1];
   void *hp, *dp;
   struct nvptx_thread *nvthd = nvptx_thread ();
+  int warp_size = nvthd->ptx_dev->warp_size;
   const char *maybe_abort_msg = "(perhaps abort was called)";
 
   function = targ_fn->fn;
@@ -1175,7 +1181,6 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 
 	  int gang, worker, vector;
 	  {
-	    int warp_size = nvthd->ptx_dev->warp_size;
 	    int block_size = nvthd->ptx_dev->max_threads_per_block;
 	    int cpu_size = nvthd->ptx_dev->max_threads_per_multiprocessor;
 	    int dev_size = nvthd->ptx_dev->num_sms;
@@ -1213,9 +1218,25 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 	}
       pthread_mutex_unlock (&ptx_dev_lock);
 
-      for (i = 0; i != GOMP_DIM_MAX; i++)
-	if (!dims[i])
-	  dims[i] = nvthd->ptx_dev->default_dims[i];
+      {
+	bool default_dim_p[GOMP_DIM_MAX];
+	for (i = 0; i != GOMP_DIM_MAX; i++)
+	  {
+	    default_dim_p[i] = !dims[i];
+	    if (default_dim_p[i])
+	      dims[i] = nvthd->ptx_dev->default_dims[i];
+	  }
+
+	if (default_dim_p[GOMP_DIM_VECTOR])
+	  dims[GOMP_DIM_VECTOR]
+	    = MIN (dims[GOMP_DIM_VECTOR],
+		   (targ_fn->max_threads_per_block / warp_size * warp_size));
+
+	if (default_dim_p[GOMP_DIM_WORKER])
+	  dims[GOMP_DIM_WORKER]
+	    = MIN (dims[GOMP_DIM_WORKER],
+		   targ_fn->max_threads_per_block / dims[GOMP_DIM_VECTOR]);
+      }
     }
 
   /* Check if the accelerator has sufficient hardware resources to

--------------12C562AB242C8D469C9FF2C2--